decodo-scraper
Search Google, scrape web pages, Amazon product pages, YouTube subtitles, or Reddit (post/subreddit) using the Decodo Scraper OpenClaw Skill.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-decodo-scraper
Repository
Skill path: skills/donatasdecodo/decodo-scraper
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install decodo-scraper into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding decodo-scraper to shared team environments
- Use decodo-scraper for development workflows
Catalog stats
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: decodo-scraper
description: Search Google, scrape web pages, Amazon product pages, YouTube subtitles, or Reddit (post/subreddit) using the Decodo Scraper OpenClaw Skill.
homepage: https://decodo.com
credentials:
- DECODO_AUTH_TOKEN
env:
required:
- DECODO_AUTH_TOKEN
---
# Decodo Scraper OpenClaw Skill
Use this skill to search Google, scrape any URL, or fetch YouTube subtitles via the [Decodo Web Scraping API](https://help.decodo.com/docs/web-scraping-api-google-search). **Search** outputs a JSON object of result sections; **Scrape URL** outputs plain markdown; **Amazon** and **Amazon search** output parsed product-page or search results (JSON). Amazon search uses `--query`. **YouTube subtitles** outputs transcript/subtitles. **Reddit post** and **Reddit subreddit** output post/listing content (JSON).
**Authentication:** Set `DECODO_AUTH_TOKEN` (Basic auth token from Decodo Dashboard → Scraping APIs) in your environment or in a `.env` file in the repo root.
**Errors:** On failure the script writes a JSON error to stderr and exits with code 1.
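Scripts or agents wrapping this tool can rely on that contract: exit code 0 with the payload on stdout, or exit code 1 with a one-line JSON error on stderr. A minimal wrapper sketch (not part of the skill; `run_scrape` and `parse_error` are hypothetical helper names, and `run_scrape` only works from the repo root with the token set):

```python
import json
import subprocess

def run_scrape(cli_args):
    """Run tools/scrape.py and return (stdout, error_obj).

    On success error_obj is None; on failure stdout is None and
    error_obj holds the JSON the script wrote to stderr.
    """
    proc = subprocess.run(
        ["python3", "tools/scrape.py", *cli_args],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return None, parse_error(proc.stderr)
    return proc.stdout, None

def parse_error(stderr_text):
    """Parse the last non-empty stderr line as the script's JSON error."""
    lines = [line for line in stderr_text.strip().splitlines() if line.strip()]
    if not lines:
        return {"error": "no stderr output"}
    try:
        return json.loads(lines[-1])
    except json.JSONDecodeError:
        # Fall back to the raw line if something non-JSON leaked to stderr.
        return {"error": lines[-1]}
```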
---
## Tools
### 1. Search Google
Use this to find URLs, answers, or structured search results. The API returns a JSON object whose `results` key contains several sections (not all may be present for every query):
| Section | Description |
|--------|--------------|
| `organic` | Main search results (titles, links, snippets). |
| `ai_overviews` | AI-generated overviews or summaries when Google shows them. |
| `paid` | Paid/sponsored results (ads). |
| `related_questions` | “People also ask”–style questions and answers. |
| `related_searches` | Suggested related search queries. |
| `discussions_and_forums` | Forum or discussion results (e.g. Reddit, Stack Exchange). |
The script outputs only the inner `results` object (these sections); pagination info (`page`, `last_visible_page`, `parse_status_code`) is not included.
**Command:**
```bash
python3 tools/scrape.py --target google_search --query "your search query"
```
**Examples:**
```bash
python3 tools/scrape.py --target google_search --query "best laptops 2025"
python3 tools/scrape.py --target google_search --query "python requests tutorial"
```
Optional: `--geo us` or `--locale en` for location/language.
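The printed sections can be consumed directly as JSON. A minimal sketch of pulling links out of the `organic` section; the `title`/`url` field names here are assumptions, so check them against real output before relying on them:

```python
import json

# Hypothetical sample of the inner "results" object the script prints
# for --target google_search; real field names may differ.
SAMPLE = json.dumps({
    "organic": [
        {"title": "Example Domain", "url": "https://example.com", "desc": "..."},
    ],
    "related_searches": ["example queries"],
})

def organic_links(results_json):
    """Return (title, url) pairs from the 'organic' section, if present."""
    sections = json.loads(results_json)
    return [(r.get("title"), r.get("url")) for r in sections.get("organic", [])]
```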
---
### 2. Scrape URL
Use this to get the content of a specific web page. By default the API returns content as **Markdown** (cleaner for LLMs and lower token usage).
**Command:**
```bash
python3 tools/scrape.py --target universal --url "https://example.com"
```
**Examples:**
```bash
python3 tools/scrape.py --target universal --url "https://example.com"
python3 tools/scrape.py --target universal --url "https://news.ycombinator.com/"
```
---
### 3. Amazon product page
Use this to get parsed data from an Amazon product page. Pass the product-page URL as `--url`. The script sends `parse: true` and outputs the inner **results** object (e.g. `ads`, product details).
**Command:**
```bash
python3 tools/scrape.py --target amazon --url "https://www.amazon.com/dp/PRODUCT_ID"
```
**Examples:**
```bash
python3 tools/scrape.py --target amazon --url "https://www.amazon.com/dp/B09H74FXNW"
```
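If a product URL arrives in some other shape, the ASIN (typically ten alphanumeric characters) can usually be extracted first and a clean `/dp/` URL rebuilt. An illustrative helper, not part of the skill:

```python
import re

def amazon_asin(url):
    """Pull a ten-character ASIN out of /dp/ or /gp/product/ paths."""
    match = re.search(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)", url)
    return match.group(1) if match else None
```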
---
### 4. Amazon search
Use this to search Amazon and get parsed results (search results list, delivery_postcode, etc.). Pass the search query as `--query`.
**Command:**
```bash
python3 tools/scrape.py --target amazon_search --query "your search query"
```
**Examples:**
```bash
python3 tools/scrape.py --target amazon_search --query "laptop"
```
---
### 5. YouTube subtitles
Use this to get subtitles/transcript for a YouTube video. Pass the **video ID** (e.g. from `youtube.com/watch?v=VIDEO_ID`) as `--query`.
**Command:**
```bash
python3 tools/scrape.py --target youtube_subtitles --query "VIDEO_ID"
```
**Examples:**
```bash
python3 tools/scrape.py --target youtube_subtitles --query "dFu9aKJoqGg"
```
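If you start from a full YouTube URL rather than a bare ID, a small helper can extract the ID for the common URL shapes (illustrative, not part of the skill):

```python
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Extract the video ID from common YouTube URL shapes.

    Handles watch URLs (?v=ID), youtu.be short links, and
    /shorts/ID or /embed/ID paths. Returns None when no ID is found.
    """
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/") or None
    query = parse_qs(parsed.query)
    if "v" in query:
        return query["v"][0]
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) == 2 and parts[0] in ("shorts", "embed"):
        return parts[1]
    return None
```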
---
### 6. Reddit post
Use this to get the content of a Reddit post (thread). Pass the full post URL as `--url`.
**Command:**
```bash
python3 tools/scrape.py --target reddit_post --url "https://www.reddit.com/r/SUBREDDIT/comments/ID/..."
```
**Examples:**
```bash
python3 tools/scrape.py --target reddit_post --url "https://www.reddit.com/r/nba/comments/17jrqc5/serious_next_day_thread_postgame_discussion/"
```
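A quick sanity check that a URL is a post (thread) link rather than a subreddit listing can avoid a wasted API call; an illustrative pattern:

```python
import re

def is_reddit_post_url(url):
    """True when a URL looks like a full post link, not a listing."""
    return re.match(
        r"https?://(?:www\.)?reddit\.com/r/[^/]+/comments/\w+", url
    ) is not None
```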
---
### 7. Reddit subreddit
Use this to get the listing (posts) of a Reddit subreddit. Pass the subreddit URL as `--url`.
**Command:**
```bash
python3 tools/scrape.py --target reddit_subreddit --url "https://www.reddit.com/r/SUBREDDIT/"
```
**Examples:**
```bash
python3 tools/scrape.py --target reddit_subreddit --url "https://www.reddit.com/r/nba/"
```
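If you only have a subreddit name, the listing URL this target expects can be built with a one-liner (illustrative helper; `str.removeprefix` needs Python 3.9+, which matches the skill's prerequisites):

```python
def subreddit_url(name):
    """Build the listing URL the reddit_subreddit target expects.

    Accepts "nba", "r/nba", or "/r/nba/" and normalizes all of them.
    """
    return f"https://www.reddit.com/r/{name.strip('/').removeprefix('r/')}/"
```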
---
## Summary
| Action | Target | Argument | Example command |
|--------------------|----------------------|------------|-----------------|
| Search | `google_search` | `--query` | `python3 tools/scrape.py --target google_search --query "laptop"` |
| Scrape page | `universal` | `--url` | `python3 tools/scrape.py --target universal --url "https://example.com"` |
| Amazon product | `amazon` | `--url` | `python3 tools/scrape.py --target amazon --url "https://www.amazon.com/dp/B09H74FXNW"` |
| Amazon search | `amazon_search` | `--query` | `python3 tools/scrape.py --target amazon_search --query "laptop"` |
| YouTube subtitles | `youtube_subtitles` | `--query` | `python3 tools/scrape.py --target youtube_subtitles --query "dFu9aKJoqGg"` |
| Reddit post | `reddit_post` | `--url` | `python3 tools/scrape.py --target reddit_post --url "https://www.reddit.com/r/nba/comments/17jrqc5/..."` |
| Reddit subreddit | `reddit_subreddit` | `--url` | `python3 tools/scrape.py --target reddit_subreddit --url "https://www.reddit.com/r/nba/"` |
**Output:** Search → JSON (sections). Scrape URL → markdown. Amazon / Amazon search → JSON (results e.g. ads, product info, delivery_postcode). YouTube → transcript. Reddit → JSON (content).
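The query/URL split in the table mirrors the script's `TARGETS_NEED_QUERY` and `TARGETS_NEED_URL` constants, and can be reproduced as a lookup when generating commands programmatically (sketch):

```python
# Same sets as tools/scrape.py uses for argument validation.
QUERY_TARGETS = {"google_search", "amazon_search", "youtube_subtitles"}
URL_TARGETS = {"universal", "amazon", "reddit_post", "reddit_subreddit"}

def required_flag(target):
    """Return the CLI flag a given target requires, per the summary table."""
    if target in QUERY_TARGETS:
        return "--query"
    if target in URL_TARGETS:
        return "--url"
    raise ValueError(f"unknown target: {target}")
```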
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### tools/scrape.py
```python
#!/usr/bin/env python3
"""Decodo Scraper OpenClaw Skill: search Google, Amazon search/product, scrape URL, YouTube subtitles, Reddit post or subreddit."""
import argparse
import json
import os
import sys
import requests
from dotenv import load_dotenv
load_dotenv(os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", ".env"))
SCRAPE_URL = "https://scraper-api.decodo.com/v2/scrape"
TARGETS_NEED_QUERY = ("google_search", "amazon_search", "youtube_subtitles")
TARGETS_NEED_URL = ("universal", "amazon", "reddit_post", "reddit_subreddit")
def _first_result_content(data):
"""Get results[0].content from API response, or None if missing."""
results = data.get("results") or []
if not results or not isinstance(results[0], dict):
return None
return results[0].get("content")
def _err(msg, hint=None):
obj = {"error": msg}
if hint:
obj["hint"] = hint
print(json.dumps(obj), file=sys.stderr)
def scrape(args):
token = os.environ.get("DECODO_AUTH_TOKEN")
if not token:
_err("Set DECODO_AUTH_TOKEN.")
sys.exit(1)
headers = {"Content-Type": "application/json", "Authorization": f"Basic {token}", "x-integration": "openclaw"}
payloads = {
"google_search": {"target": "google_search", "query": args.query, "headless": "html", "parse": True},
"youtube_subtitles": {"target": "youtube_subtitles", "query": args.query},
"reddit_post": {"target": "reddit_post", "url": args.url},
"reddit_subreddit": {"target": "reddit_subreddit", "url": args.url},
"amazon": {"target": "amazon", "url": args.url, "parse": True},
"amazon_search": {"target": "amazon_search", "query": args.query, "parse": True},
"universal": {"target": "universal", "url": args.url, "markdown": True},
}
payload = payloads[args.target]
if args.target == "google_search":
if args.geo:
payload["geo"] = args.geo
if args.locale:
payload["locale"] = args.locale
try:
resp = requests.post(SCRAPE_URL, json=payload, headers=headers, timeout=120)
resp.raise_for_status()
except requests.RequestException as e:
status_code = e.response.status_code if e.response is not None else None
print(json.dumps({"error": str(e), "status_code": status_code}), file=sys.stderr)
sys.exit(1)
try:
data = resp.json()
except json.JSONDecodeError:
_err("Invalid JSON in response")
sys.exit(1)
content = _first_result_content(data)
if content is None:
_err("Empty or unexpected response structure")
if args.target in ("google_search", "universal"):
print(resp.text)
sys.exit(1)
if args.target == "google_search":
inner = (content or {}).get("results", {}).get("results") if isinstance(content, dict) else None
if inner is not None:
print(json.dumps(inner, ensure_ascii=False))
else:
_err("Could not extract search results", "API structure may have changed")
print(resp.text)
elif args.target == "youtube_subtitles":
print(content if isinstance(content, str) else json.dumps(content, ensure_ascii=False))
elif args.target in ("reddit_post", "reddit_subreddit"):
print(json.dumps(content, ensure_ascii=False))
elif args.target in ("amazon", "amazon_search"):
inner = (content or {}).get("results") if isinstance(content, dict) else None
if inner is not None:
print(json.dumps(inner, ensure_ascii=False))
else:
_err(f"Could not extract {args.target} results")
sys.exit(1)
else:
# universal
print(content if isinstance(content, str) else json.dumps(content, ensure_ascii=False))
def main():
parser = argparse.ArgumentParser(description="Decodo Scraper OpenClaw Skill: search Google, scrape URL, Amazon, YouTube subtitles, Reddit.")
parser.add_argument("--target", required=True, choices=["google_search", "universal", "amazon", "amazon_search", "youtube_subtitles", "reddit_post", "reddit_subreddit"])
parser.add_argument("--query", help="Required for google_search, amazon_search, or youtube_subtitles (video ID for YouTube).")
parser.add_argument("--url", help="Required for universal, amazon, reddit_post, or reddit_subreddit.")
parser.add_argument("--geo", help="Google search geo (e.g. us, gb).")
parser.add_argument("--locale", help="Google search locale (e.g. en, de).")
args = parser.parse_args()
if args.target in TARGETS_NEED_QUERY and not args.query:
parser.error(f"--query required for {args.target}")
if args.target in TARGETS_NEED_URL and not args.url:
parser.error(f"--url required for {args.target}")
scrape(args)
if __name__ == "__main__":
main()
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### README.md
```markdown
# Decodo Scraper OpenClaw Skill


<p align="center">
<a href="https://dashboard.decodo.com/scrapers/pricing?utm_source=github&utm_medium=social&utm_campaign=openclaw"><img src="https://github.com/user-attachments/assets/13b08523-32b0-4c85-8e99-580d7c2a9055" alt="Decodo Web Scraping API"></a>
</p>

[Join the Decodo community on Discord](https://discord.gg/Ja8dqKgvbZ)
## Overview
This [OpenClaw](https://openclaw.ai/) skill integrates [Decodo's Web Scraping API](https://decodo.com/scraping/web) into any OpenClaw-compatible AI agent or LLM pipeline. It exposes seven tools that agents can call directly:
- `google_search` – query Google Search and receive structured JSON (organic results, AI overviews, paid, related questions, and more)
- `universal` – fetch and parse any public webpage, returning clean Markdown
- `amazon` – fetch parsed Amazon product-page data (e.g. ads, product info) by product URL
- `amazon_search` – search Amazon by query; get parsed results (e.g. results list, delivery_postcode)
- `youtube_subtitles` – fetch subtitles/transcript for a YouTube video (by video ID)
- `reddit_post` – fetch a Reddit post's content (by post URL)
- `reddit_subreddit` – fetch a Reddit subreddit listing (by subreddit URL)
Backed by Decodo's residential and datacenter proxy infrastructure, the skill handles JavaScript rendering, bot detection bypass, and geo-targeting out of the box.
## Features
- Real-time Google Search results scraping
- Universal URL scraping
- Amazon product page parsing (by URL)
- Amazon search (by query)
- YouTube subtitles/transcript by video ID
- Reddit post content by URL
- Reddit subreddit listing by URL
- Structured JSON or Markdown results
- Simple CLI interface compatible with any OpenClaw agent runtime
- Minimal dependencies — just Python with Requests
- Authentication via a single Base64 token from the [Decodo dashboard](https://dashboard.decodo.com/)
## Prerequisites
- [Python 3.9](https://www.python.org/downloads/) or higher
- [Decodo account](https://dashboard.decodo.com/) with access to the Web Scraping API
- [OpenClaw](https://openclaw.ai/) installed on your machine
## Setup
1. Clone this repo.
```
git clone https://github.com/Decodo/decodo-openclaw-skill.git
```
2. Install dependencies.
```
pip install -r requirements.txt
```
3. Set your Decodo auth token as an environment variable (or create a `.env` file in the project root):
```
# Terminal
export DECODO_AUTH_TOKEN="your_base64_token"
```
```
# .env file
DECODO_AUTH_TOKEN=your_base64_token
```
## OpenClaw agent integration
This skill ships with a [SKILL.md](https://github.com/Decodo/decodo-openclaw-skill/blob/main/SKILL.md) file that defines all tools in the OpenClaw skill format. OpenClaw-compatible agents automatically discover and invoke the tools from this file without additional configuration.
To register the skill with your OpenClaw agent, point it at the repo root. The agent will read `SKILL.md` and expose `google_search`, `universal`, `amazon`, `amazon_search`, `youtube_subtitles`, `reddit_post`, and `reddit_subreddit` as callable tools.
## Usage
### Google Search
Search Google and receive structured JSON. Results are grouped by type: **organic** (main results), **ai_overviews** (AI-generated summaries), **paid** (ads), **related_questions**, **related_searches**, **discussions_and_forums**, and others depending on the query.
```
python3 tools/scrape.py --target google_search --query "your query"
```
### Scrape a URL
Fetch any webpage and convert it to clean Markdown:
```
python3 tools/scrape.py --target universal --url "https://example.com/article"
```
### Amazon product page
Fetch parsed data from an Amazon product page (e.g. ads, product details). Use the product URL:
```
python3 tools/scrape.py --target amazon --url "https://www.amazon.com/dp/B09H74FXNW"
```
### Amazon search
Search Amazon and get parsed results (e.g. results list, delivery_postcode):
```
python3 tools/scrape.py --target amazon_search --query "laptop"
```
### YouTube subtitles
Fetch subtitles/transcript for a YouTube video (use the video ID, e.g. from `?v=VIDEO_ID`):
```
python3 tools/scrape.py --target youtube_subtitles --query "dFu9aKJoqGg"
```
### Reddit post
Fetch a Reddit post’s content (use the full post URL):
```
python3 tools/scrape.py --target reddit_post --url "https://www.reddit.com/r/nba/comments/17jrqc5/serious_next_day_thread_postgame_discussion/"
```
### Reddit subreddit
Fetch a Reddit subreddit listing (use the subreddit URL):
```
python3 tools/scrape.py --target reddit_subreddit --url "https://www.reddit.com/r/nba/"
```
## Related resources
[Decodo Web Scraping API documentation](https://help.decodo.com/docs/web-scraping-api-introduction)
[OpenClaw documentation](https://docs.openclaw.ai/start/getting-started)
[ClaWHub – OpenClaw skill registry](https://docs.openclaw.ai/tools/clawhub)
## License
All code is released under the [MIT License](https://github.com/Decodo/Decodo/blob/master/LICENSE).
```
### _meta.json
```json
{
"owner": "donatasdecodo",
"slug": "decodo-scraper",
"displayName": "Decodo Scraper",
"latest": {
"version": "1.1.0",
"publishedAt": 1771534138338,
"commit": "https://github.com/openclaw/skills/commit/4f55d5b7de7877cc29268933b6735ba9aa6f55ca"
},
"history": [
{
"version": "0.1.0",
"publishedAt": 1770994986688,
"commit": "https://github.com/openclaw/skills/commit/a37826b700ded42bdfc6f504af6314f11bc9976c"
}
]
}
```