x-extract

Extract tweet content from x.com URLs without credentials using browser automation. Use when user asks to "extract tweet", "download x.com link", "get tweet content", or provides x.com/twitter.com URLs for content extraction. Works without Twitter API credentials.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

3,108

Updated

March 20, 2026

Overall rating

C (0.0)

Composite score

0.0

Best-practice grade

A (92.0)

Install command

npx @skill-hub/cli install openclaw-skills-x-extract

Repository

openclaw/skills

Skill path: skills/chunhualiao/x-extract

Best for

Primary workflow: Write Technical Docs.

Technical facets: Full Stack, Backend, Tech Writer.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install x-extract into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/openclaw/skills before adding x-extract to shared team environments
Use x-extract for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: x-extract
description: Extract tweet content from x.com URLs without credentials using browser automation. Use when user asks to "extract tweet", "download x.com link", "get tweet content", or provides x.com/twitter.com URLs for content extraction. Works without Twitter API credentials.
---

# X.com Tweet Extraction

Extract tweet content (text, media, author, metadata) from x.com URLs without requiring Twitter/X credentials.

## How It Works

The skill uses OpenClaw's browser tool to load the tweet page, captures an ARIA snapshot of the rendered page, and extracts the tweet content from that snapshot.

## Workflow

### 1. Validate URL

Check that the URL is a valid x.com or twitter.com tweet URL:
- Must contain `x.com/*/status/` or `twitter.com/*/status/`
- Extract tweet ID from URL pattern: `/status/(\d+)`
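
The two checks above can be sketched as a single helper. This is a minimal illustration, not part of the skill itself: the function name `parseTweetUrl`, the returned object shape, and the decision to accept `www.` and `mobile.` host prefixes are all assumptions.

```javascript
// Hypothetical helper: validates a tweet URL and extracts the numeric ID.
function parseTweetUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }

  // ASSUMPTION: www. and mobile. prefixes are treated as equivalent hosts.
  const host = url.hostname.toLowerCase().replace(/^(www\.|mobile\.)/, '');
  if (host !== 'x.com' && host !== 'twitter.com') return null;

  // The path must look like /<handle>/status/<digits>.
  const match = url.pathname.match(/^\/([^/]+)\/status\/(\d+)/);
  if (!match) return null;

  // The ID stays a string: tweet IDs exceed Number.MAX_SAFE_INTEGER.
  return { host, handle: match[1], tweetId: match[2] };
}

console.log(parseTweetUrl('https://x.com/vista8/status/2019651804062241077'));
```

Keeping the ID as a string avoids silent precision loss, since a 19-digit ID cannot be represented exactly as a JavaScript number.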

### 2. Open in Browser

```
browser action=open profile=openclaw targetUrl=<x.com-url>
```

Wait for the page to load. The call returns a `targetId`, which the snapshot step needs.

### 3. Capture Snapshot

```
browser action=snapshot targetId=<TARGET_ID> snapshotFormat=aria
```

### 4. Extract Content

From the snapshot, extract:

**Required fields:**
- **Tweet text**: Look for role=article containing the main tweet content
- **Author**: role=link with author name/handle (usually @username format)
- **Timestamp**: role=time element

**Optional fields:**
- **Media**: role=img or role=link containing /photo/, /video/
- **Engagement**: Like count, retweet count, reply count (in role=group or role=button)
- **Thread context**: If the tweet is part of a thread, note the previous/next tweet references
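
As a rough sketch of how the required fields and media links might be pulled from a snapshot, the code below walks a tree of nodes by role. The node shape `{ role, name, url, children }`, the `text` role for plain text, and both function names are assumptions made for illustration. OpenClaw's actual snapshot format is not documented here and may differ.

```javascript
// ASSUMPTION: the ARIA snapshot has already been parsed into a tree of
// { role, name, url, children } nodes. The real snapshot format may differ.

// Depth-first search collecting every node that satisfies the predicate.
function findAll(node, predicate, found = []) {
  if (predicate(node)) found.push(node);
  for (const child of node.children ?? []) findAll(child, predicate, found);
  return found;
}

function extractTweet(snapshotRoot) {
  // Treat the first article in document order as the main tweet.
  const article = findAll(snapshotRoot, (n) => n.role === 'article')[0];
  if (!article) return null;

  const links = findAll(article, (n) => n.role === 'link');
  const handleLink = links.find((n) => /^@\w+$/.test(n.name ?? ''));
  const timeNode = findAll(article, (n) => n.role === 'time')[0];

  return {
    handle: handleLink?.name ?? null,
    timestamp: timeNode?.name ?? null,
    // Hypothetical: plain text is exposed as role "text" nodes.
    text: findAll(article, (n) => n.role === 'text').map((n) => n.name).join('\n'),
    // Media links are recognised by /photo/ or /video/ in their target.
    media: links.filter((n) => /\/(photo|video)\//.test(n.url ?? '')).map((n) => n.url),
  };
}
```

Returning `null` for missing fields rather than throwing makes it straightforward to report partial results to the user.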

### 5. Format Output

Output as structured markdown:

```markdown
# Tweet by @username

**Author:** Full Name (@handle)  
**Posted:** YYYY-MM-DD HH:MM  
**Source:** <original-url>

---

<Tweet text content here>

---

**Media:**
- ![Image 1](<media-url-1>)
- ![Image 2](<media-url-2>)

**Engagement:**
- 👍 Likes: 1,234
- 🔄 Retweets: 567
- 💬 Replies: 89

**Thread:** [Part 2/5] | [View full thread](<thread-url>)
```
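
The template above could be produced by a small formatter along these lines. This is a sketch under assumptions: the function name `formatTweet` and the input object shape are hypothetical, and the thread line is omitted for brevity.

```javascript
// Hypothetical input shape:
// { name, handle, posted, url, text, media?: string[],
//   engagement?: { likes, retweets, replies } }
function formatTweet(tweet) {
  const lines = [
    `# Tweet by @${tweet.handle}`,
    '',
    `**Author:** ${tweet.name} (@${tweet.handle})  `, // two trailing spaces force a line break
    `**Posted:** ${tweet.posted}  `,
    `**Source:** ${tweet.url}`,
    '',
    '---',
    '',
    tweet.text,
    '',
    '---',
  ];

  // Optional sections are emitted only when the data was extracted.
  if (tweet.media?.length) {
    lines.push('', '**Media:**');
    tweet.media.forEach((src, i) => lines.push(`- ![Image ${i + 1}](${src})`));
  }
  if (tweet.engagement) {
    const fmt = (n) => Number(n).toLocaleString('en-US');
    const { likes, retweets, replies } = tweet.engagement;
    lines.push(
      '',
      '**Engagement:**',
      `- 👍 Likes: ${fmt(likes)}`,
      `- 🔄 Retweets: ${fmt(retweets)}`,
      `- 💬 Replies: ${fmt(replies)}`,
    );
  }
  return lines.join('\n');
}
```

Skipping empty sections keeps the output honest about what was actually extracted, instead of printing placeholder headings.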

### 6. Download Media (Optional)

If the user requests `--download-media` or "download images":

1. Extract all media URLs from snapshot
2. Use `exec` with `curl` or `wget` to download:
   ```bash
   curl -L -o "tweet-{tweetId}-image-{n}.jpg" "<media-url>"
   ```
3. Report downloaded files with paths
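
The `curl` command above hardcodes a `.jpg` extension. One way to pick the extension per file is to read it from the media URL, as in the sketch below. The function name `planDownloads` and its return shape are assumptions. It only builds the file names and `curl` arguments and does not download anything.

```javascript
// Hypothetical helper: derives an output file name and curl arguments
// for each media URL, without performing any network request.
function planDownloads(tweetId, mediaUrls) {
  return mediaUrls.map((mediaUrl, index) => {
    const url = new URL(mediaUrl);
    // Prefer ?format=..., then the path's file extension, then fall back to jpg.
    const fromPath = url.pathname.match(/\.(\w+)$/)?.[1];
    const ext = url.searchParams.get('format') ?? fromPath ?? 'jpg';
    const file = `tweet-${tweetId}-image-${index + 1}.${ext}`;
    // Same flags as the command above: follow redirects, write to file.
    return { file, args: ['-L', '-o', file, mediaUrl] };
  });
}

console.log(planDownloads('123', ['https://pbs.twimg.com/media/abc?format=png&name=large']));
```

Passing the arguments as an array, for example to Node's `child_process.execFile('curl', args)`, avoids shell quoting problems with the `&` characters in media URLs.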

## Error Handling

**If the page fails to load:**
- Check that the URL is valid
- Try the alternative domain: replace `x.com` with `twitter.com`, which may still resolve
- Some tweets (for example, age-restricted or sensitive ones) require login; report this to the user

**If content extraction fails:**
- The X.com layout may have changed; check `references/selectors.md`
- Provide the raw snapshot to the user for manual review
- Report which fields were successfully extracted
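
Reporting which fields were extracted can be as simple as checking an extraction result against the required and optional field lists from step 4. The sketch below is illustrative only: the field keys, the `summarizeExtraction` name, and the result shape are assumptions.

```javascript
// Hypothetical field keys mirroring the required/optional lists in step 4.
const REQUIRED_FIELDS = ['text', 'author', 'timestamp'];
const OPTIONAL_FIELDS = ['media', 'engagement', 'thread'];

function summarizeExtraction(result) {
  // A field counts as present if it is not null/undefined, not an empty
  // string, and not an empty array.
  const present = (field) => {
    const value = result[field];
    return value != null && value !== '' && !(Array.isArray(value) && value.length === 0);
  };

  const extracted = [...REQUIRED_FIELDS, ...OPTIONAL_FIELDS].filter(present);
  const missingRequired = REQUIRED_FIELDS.filter((field) => !present(field));

  return { ok: missingRequired.length === 0, extracted, missingRequired };
}

console.log(summarizeExtraction({ text: 'Hello', author: '@janedoe', media: [] }));
```

When `ok` is false, the agent can fall back to handing the raw snapshot to the user together with the `missingRequired` list.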

## Common Selectors

See [references/selectors.md](references/selectors.md) for detailed CSS/ARIA selectors used by x.com (updated as layout changes).

## Limitations

- **No credentials**: Cannot access protected tweets, DMs, or login-required content
- **Rate limiting**: X.com may block excessive automated requests
- **Layout changes**: Selectors may break if X updates their HTML structure
- **Dynamic content**: Some content (comments, threads) may load lazily

## Examples

**Extract single tweet:**
```
User: "Extract this tweet: https://x.com/vista8/status/2019651804062241077"
Agent: [Opens browser, captures snapshot, formats markdown output]
```

**Extract with media download:**
```
User: "Get the tweet text and download all images from https://x.com/user/status/123"
Agent: [Extracts content, downloads images to ./downloads/, reports paths]
```

**Thread extraction:**
```
User: "Extract this thread: https://x.com/user/status/456"
Agent: [Detects thread, extracts all tweets in sequence, formats as numbered list]
```


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/selectors.md

```markdown
# X.com Content Selectors

Reference for extracting content from x.com pages using ARIA roles and CSS selectors.

**Last updated:** 2026-02-15  
**X.com layout version:** Current as of Feb 2026

## ARIA Snapshot Selectors (Preferred)

When using `browser action=snapshot snapshotFormat=aria`:

### Tweet Content

- **Main tweet article**: `role=article` (primary tweet container)
- **Tweet text**: Text content within `role=article`, usually in a `<div>` with `lang` attribute
- **Author name**: `role=link` containing author's display name (e.g., "John Doe")
- **Author handle**: `role=link` containing `@username` format
- **Timestamp**: `role=time` with `datetime` attribute
- **Media images**: `role=img` within the tweet article
- **Media links**: `role=link` with `href` containing `/photo/` or `/video/`

### Engagement Metrics

- **Like button**: `role=button` with label containing "Like" or heart emoji
- **Retweet button**: `role=button` with label containing "Retweet" or retweet icon
- **Reply button**: `role=button` with label containing "Reply" or comment icon
- **Share button**: `role=button` with label containing "Share"

Counts appear as text within or adjacent to these buttons.

### Thread Context

- **Thread indicator**: Text like "Show this thread" or numbered indicators "1/5"
- **Previous tweet**: `role=link` with "Show previous tweets" or similar
- **Next tweet**: Following `role=article` elements in snapshot

## CSS Selectors (Fallback)

If ARIA snapshot is not available or incomplete:

### Tweet Content

```css
/* Main tweet container */
article[data-testid="tweet"]

/* Tweet text */
div[data-testid="tweetText"]

/* Author name */
div[data-testid="User-Name"] a

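/* Author handle: :contains() is a jQuery extension, not valid CSS.
   Select the spans and keep the one whose text starts with "@" in script. */
div[data-testid="User-Name"] span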

/* Timestamp */
time

/* Images */
div[data-testid="tweetPhoto"] img

/* Videos */
div[data-testid="videoPlayer"]
```

### Engagement Metrics

```css
/* Like count */
button[data-testid="like"] span

/* Retweet count */
button[data-testid="retweet"] span

/* Reply count */
button[data-testid="reply"] span

/* View count */
a[href$="/analytics"] span
```

## Content Extraction Patterns

### Text Extraction

Tweet text often includes:
- **Line breaks**: Preserved as `\n` in text content
- **Links**: May appear as shortened t.co URLs or full URLs
- **Mentions**: @username format (clickable links)
- **Hashtags**: #hashtag format (clickable links)
- **Emojis**: Unicode characters

**Pattern:**
```javascript
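// Optional chaining avoids a TypeError when the text node is missing (e.g. media-only tweets)
const tweetText = articleElement.querySelector('[data-testid="tweetText"]')?.innerText ?? '';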
```

### Media URL Extraction

**Images:**
```javascript
// Pattern: https://pbs.twimg.com/media/{id}?format=jpg&name=large
const images = Array.from(articleElement.querySelectorAll('img[src*="pbs.twimg.com/media"]'))
  .map(img => img.src.replace(/&name=\w+/, '&name=large')); // Get highest quality
```

**Videos:**
```javascript
// Video preview images
const videoThumbs = Array.from(articleElement.querySelectorAll('img[src*="ext_tw_video_thumb"]'));

// Note: Actual video URLs require additional extraction from video player data
```

### Engagement Numbers

Numbers may be formatted as:
- Raw numbers: `1234`
- Shortened: `1.2K`, `45.6M`
- Localized: `1,234` (with commas)

**Parsing pattern:**
```javascript
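function parseEngagementCount(text) {
  if (!text) return 0;
  // Normalize case and drop thousands separators before parsing
  const value = text.trim().toUpperCase().replace(/,/g, '');

  // Round the scaled result: 1.1 * 1000 is 1100.0000000000002 in floating point
  if (value.endsWith('K')) return Math.round(parseFloat(value) * 1000) || 0;
  if (value.endsWith('M')) return Math.round(parseFloat(value) * 1000000) || 0;

  return parseInt(value, 10) || 0;
}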
```

## Layout Changes & Maintenance

X.com frequently updates their HTML structure. If selectors break:

1. **Check data-testid attributes**: These are most stable
2. **Verify ARIA roles**: Usually preserved for accessibility
3. **Inspect network requests**: XHR responses may contain structured data
4. **Use browser DevTools**: Inspect live page to identify new selectors

**Known changes:**
- 2023-07: Rebrand from Twitter to X; the move from twitter.com to x.com domains followed and completed in 2024-05
- 2024-03: Updated engagement button layouts
- 2025-11: Redesigned tweet card structure

Update this document when selectors change. Include date and description of changes.

## Alternative Data Sources

If browser extraction fails, consider:

**1. Twitter/X API** (requires credentials - not used by this skill)

**2. Third-party services:**
   - Nitter instances (open-source Twitter frontend)
   - Tweet archival services
   - Social media data providers

**3. Browser extensions:**
   - Some extensions provide structured data extraction
   - Requires user installation

## Debug Tips

When extraction fails:

1. **Capture full snapshot**: Save entire browser snapshot for manual inspection
2. **Check role hierarchy**: ARIA tree may have nested structures
3. **Look for lazy-loaded content**: Some elements load after initial render
4. **Try alternative URLs**: twitter.com vs x.com, mobile.twitter.com
5. **Check for error messages**: "This tweet is unavailable" etc.

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# x-extract

Extract tweet content from x.com URLs without requiring Twitter/X API credentials.

## Description

Browser-based tweet extraction tool that captures tweet text, author information, media, and engagement metrics from public x.com/twitter.com URLs using OpenClaw's browser automation.

## Usage

Trigger phrases:
- "extract tweet [URL]"
- "get tweet content from [URL]"
- "download x.com link [URL]"
- Any x.com/*/status/* or twitter.com/*/status/* URL

## Features

- ✅ No API credentials required
- ✅ Extract text, author, timestamp, media URLs
- ✅ Capture engagement metrics (likes, retweets, replies)
- ✅ Thread detection and extraction
- ✅ Optional media download
- ✅ Structured markdown output

## Requirements

- OpenClaw with browser tool enabled
- Profile: `openclaw` (or any browser profile)

## Limitations

- Cannot access protected/private tweets
- Cannot access login-required content (age-restricted, controversial)
- May be affected by X.com layout changes
- Subject to X.com rate limiting

## Documentation

See [SKILL.md](SKILL.md) for detailed workflow and technical documentation.

## Version

1.0.0 (2026-02-16)

```

### _meta.json

```json
{
  "owner": "chunhualiao",
  "slug": "x-extract",
  "displayName": "X Extract",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1771271796009,
    "commit": "https://github.com/openclaw/skills/commit/b759728cf09ffd62475c98e28be4667af6cb8f22"
  },
  "history": []
}

```