auto-skill-generator
Generate skills from web research. Given a topic like "how to use Stripe API" or "Prisma ORM", this skill searches for authoritative documentation, crawls the best source, and generates a ready-to-use .md skill file. Use when: (1) User wants to create a skill about a library/tool/API, (2) User says "create a skill for X", "make a skill about X", or "generate skill for X", (3) User wants to capture documentation as a reusable skill.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install benchflow-ai-skillsbench-docs-to-skill
Repository
Skill path: .claude/skills/docs-to-skill
Best for
Primary workflow: Research & Ops.
Technical facets: Full Stack, Backend.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: benchflow-ai.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install auto-skill-generator into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/benchflow-ai/SkillsBench before adding auto-skill-generator to shared team environments
- Use auto-skill-generator to capture library, tool, and API documentation as reusable skill files
Works across
Claude Code, Codex CLI, Gemini CLI, and OpenCode.
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: auto-skill-generator
description: >
Generate skills from web research. Given a topic like "how to use Stripe API"
or "Prisma ORM", this skill searches for authoritative documentation, crawls the best source,
and generates a ready-to-use .md skill file. Use when: (1) User wants to create a skill
about a library/tool/API, (2) User says "create a skill for X", "make a skill about X",
or "generate skill for X", (3) User wants to capture documentation as a reusable skill.
---
# Auto Skill Generator
Generate skills by researching and crawling authoritative documentation.
## Tool: fetch_docs.py
```bash
# Search - returns all URLs with snippets
python scripts/fetch_docs.py search "Modal GPU Python documentation"
# Crawl - with domain/path filtering to stay focused
python scripts/fetch_docs.py crawl \
--url https://modal.com/docs/guide/gpu \
--no-external \
--select-paths "/docs/.*" \
--instructions "Focus on GPU setup and code examples" \
--limit 30
```
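The script requires the `tavily-python` package and a Tavily API key. A minimal preflight sketch (the `TAVILY_API_KEY` variable name is Tavily's documented default, not something this skill defines):

```python
import os
import sys

# Preflight: confirm the Tavily SDK and API key are available before
# invoking scripts/fetch_docs.py.
try:
    import tavily  # noqa: F401  -- installed via `pip install tavily-python`
except ImportError:
    sys.exit("tavily-python not installed; run: pip install tavily-python")

if not os.environ.get("TAVILY_API_KEY"):
    sys.exit("Set TAVILY_API_KEY before running fetch_docs.py")
```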
## Workflow
### 1. Search for Documentation
```bash
python scripts/fetch_docs.py search "{topic} documentation"
```
Returns JSON with all URLs, titles, scores, and content snippets.
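An illustrative shape of that output, matching the fields `search_docs` returns in the script below (the values are invented for illustration):

```python
# Illustrative search output; field names match search_docs(), values are made up.
example = {
    "query": "Prisma ORM documentation",
    "results_count": 2,
    "results": [
        {"url": "https://www.prisma.io/docs", "title": "Prisma Documentation", "score": 0.91, "content": "..."},
        {"url": "https://example.com/blog/prisma-tips", "title": "Blog post", "score": 0.42, "content": "..."},
    ],
}
```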
### 2. Select Best URL
Review search results and select based on:
- **Official docs**: `*.com/docs/`, `docs.*.com`, `*.readthedocs.io`
- **Content relevance**: Check snippets for API docs, code examples
- **Avoid**: Blog posts, changelogs, marketing, glossaries
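One way to apply these heuristics programmatically over the search JSON; a sketch in which the scoring weights and query are illustrative, not part of the script:

```python
import json
import re
import subprocess

# Run the search subcommand and parse its JSON output.
raw = subprocess.run(
    ["python", "scripts/fetch_docs.py", "search", "Prisma ORM documentation"],
    capture_output=True, text=True, check=True,
).stdout
results = json.loads(raw)["results"]

OFFICIAL = re.compile(r"docs\.[^/]+|/docs/|readthedocs\.io")
AVOID = re.compile(r"/(blog|changelog|pricing|glossary)")

def rank(result: dict) -> float:
    score = result["score"]
    if OFFICIAL.search(result["url"]):
        score += 0.5  # boost official documentation hosts and paths
    if AVOID.search(result["url"]):
        score -= 0.5  # penalize blog posts, changelogs, marketing, glossaries
    return score

best = max(results, key=rank)
print(best["url"], best["title"])
```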
### 3. Crawl with Filtering
```bash
python scripts/fetch_docs.py crawl \
--url {selected_url} \
--no-external \
--select-paths "/docs/.*" "/guide/.*" \
--instructions "Focus on API methods and code examples"
```
**Core Parameters:**
| Parameter | Description |
|-----------|-------------|
| `--url` | Required. URL to crawl |
| `--instructions` | Natural language guidance for crawler |
| `--limit` | Total pages (default: 50) |
| `--max-depth` | Link depth (default: 2) |
**Domain/Path Filtering (Critical):**
| Parameter | Description |
|-----------|-------------|
| `--no-external` | Block external domains |
| `--select-paths` | Regex patterns to include (e.g., `/docs/.*`) |
| `--exclude-paths` | Regex patterns to exclude (e.g., `/blog/.*`) |
| `--select-domains` | Regex for allowed domains |
| `--exclude-domains` | Regex for blocked domains |
**Quality Options:**
| Parameter | Description |
|-----------|-------------|
| `--extract-depth` | `basic` (1 credit/5 URLs) or `advanced` (2 credits/5 URLs) |
| `--format` | `markdown` or `text` |
| `--timeout` | Seconds (10-150, default: 150) |
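For programmatic use, the same filters map onto the script's `crawl_docs` helper; a sketch assuming `scripts/` is importable and `TAVILY_API_KEY` is set:

```python
import sys

sys.path.insert(0, "scripts")  # make fetch_docs importable; path is an assumption
from fetch_docs import crawl_docs
from tavily import TavilyClient

client = TavilyClient()  # reads TAVILY_API_KEY from the environment
result = crawl_docs(
    client,
    url="https://modal.com/docs/guide/gpu",
    instructions="Focus on GPU setup and code examples",
    allow_external=False,        # mirrors --no-external
    select_paths=["/docs/.*"],   # mirrors --select-paths
    exclude_paths=["/blog/.*"],  # mirrors --exclude-paths
    limit=30,
)
print(result["pages_crawled"], "pages from", result["source_url"])
```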
### 4. Generate Skill File
From crawled content, create:
```markdown
---
name: {topic-slug}
description: >
{What the skill does}. Use when: {specific triggers}.
---
# {Topic Name}
## Quick Start
## Core API
## Common Patterns
```
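A minimal sketch that fills this template from crawled content; the slug derivation and section layout here are assumptions, not rules the skill prescribes:

```python
from pathlib import Path

def write_skill(topic: str, description: str, body: str) -> Path:
    # Derive a slug like "prisma-orm" from the topic name (assumed convention).
    slug = "-".join(topic.lower().split())
    skill_md = (
        "---\n"
        f"name: {slug}\n"
        "description: >\n"
        f"  {description}\n"
        "---\n\n"
        f"# {topic}\n\n"
        "## Quick Start\n\n"
        "## Core API\n\n"
        "## Common Patterns\n\n"
        + body
    )
    out = Path.home() / ".claude" / "skills" / slug / "SKILL.md"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(skill_md)
    return out
```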
## Output
- Location: `~/.claude/skills/{topic-slug}/SKILL.md`
- Extract ALL code blocks from crawled content
- Keep SKILL.md under 500 lines; split to `references/` if longer
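A hedged sketch of these output rules, extracting fenced blocks from crawled pages and enforcing the 500-line budget (the regex and split point are assumptions):

```python
import re

FENCE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # matches ```...``` fenced blocks

def extract_code_blocks(pages: list[dict]) -> list[str]:
    # Pull every fenced code block out of the crawled markdown content.
    blocks: list[str] = []
    for page in pages:
        blocks.extend(FENCE.findall(page["content"]))
    return blocks

def enforce_budget(skill_md: str, max_lines: int = 500) -> tuple[str, str | None]:
    # Return (SKILL.md body, overflow destined for references/) when over budget.
    lines = skill_md.splitlines(keepends=True)
    if len(lines) <= max_lines:
        return skill_md, None
    return "".join(lines[:max_lines]), "".join(lines[max_lines:])
```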
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/fetch_docs.py
```python
#!/usr/bin/env python3
"""
Fetch documentation using Tavily Search + Crawl.
Two commands:
1. search: Find URLs with snippets - LLM reviews and decides which to crawl
2. crawl: Crawl a URL with full control over domain/path filtering
Usage:
python fetch_docs.py search "Modal GPU Python documentation"
python fetch_docs.py crawl \
--url https://modal.com/docs/guide/gpu \
--instructions "Focus on GPU setup and code examples" \
--no-external \
--select-paths "/docs/.*"
"""
import argparse
import json
import sys
try:
from tavily import TavilyClient
except ImportError:
print("❌ tavily-python not installed. Run:")
print(" pip install tavily-python")
sys.exit(1)
def search_docs(client: TavilyClient, query: str, max_results: int = 10) -> dict:
"""
Search for documentation. Returns all results with snippets
for LLM to review and decide which URL to crawl.
"""
response = client.search(query=query, search_depth="advanced", max_results=max_results, chunks_per_source=3)
results = response.get("results", [])
return {
"query": query,
"results_count": len(results),
"results": [{"url": r["url"], "title": r.get("title", ""), "score": r["score"], "content": r.get("content", "")} for r in results],
}
def crawl_docs(
client: TavilyClient,
url: str,
instructions: str | None = None,
max_depth: int = 2,
max_breadth: int = 50,
limit: int = 50,
select_paths: list | None = None,
exclude_paths: list | None = None,
select_domains: list | None = None,
exclude_domains: list | None = None,
allow_external: bool = True,
extract_depth: str = "basic",
format: str = "markdown",
timeout: int = 150,
) -> dict:
"""
Crawl a URL for full documentation content.
All parameters controlled by LLM.
"""
crawl_kwargs = {
"url": url,
"max_depth": max_depth,
"max_breadth": max_breadth,
"limit": limit,
"allow_external": allow_external,
"extract_depth": extract_depth,
"format": format,
"timeout": timeout,
}
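    # Optional filters are only added when supplied, so unset options are omitted from the API call.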
if instructions:
crawl_kwargs["instructions"] = instructions
if select_paths:
crawl_kwargs["select_paths"] = select_paths
if exclude_paths:
crawl_kwargs["exclude_paths"] = exclude_paths
if select_domains:
crawl_kwargs["select_domains"] = select_domains
if exclude_domains:
crawl_kwargs["exclude_domains"] = exclude_domains
response = client.crawl(**crawl_kwargs)
results = response.get("results", [])
return {
"source_url": url,
"pages_crawled": len(results),
"pages": [{"url": page.get("url", ""), "content": page.get("raw_content", "")} for page in results],
}
def main():
parser = argparse.ArgumentParser(description="Fetch documentation using Tavily Search + Crawl")
subparsers = parser.add_subparsers(dest="command", help="Command to run")
# Search subcommand
search_parser = subparsers.add_parser("search", help="Search for documentation URLs")
search_parser.add_argument("query", help="Search query")
search_parser.add_argument("--max-results", type=int, default=10, help="Max results (default: 10)")
# Crawl subcommand
crawl_parser = subparsers.add_parser("crawl", help="Crawl a specific URL")
crawl_parser.add_argument("--url", required=True, help="URL to crawl")
crawl_parser.add_argument("--instructions", help="Natural language instructions for crawler")
# Depth/breadth/limit
crawl_parser.add_argument("--max-depth", type=int, default=2, help="Crawl depth (default: 2)")
crawl_parser.add_argument("--max-breadth", type=int, default=50, help="Links per level (default: 50)")
crawl_parser.add_argument("--limit", type=int, default=50, help="Total pages limit (default: 50)")
# Domain/path filtering - CRITICAL for staying on target
crawl_parser.add_argument("--select-paths", nargs="+", help="Regex patterns to include (e.g., /docs/.* /api/.*)")
crawl_parser.add_argument("--exclude-paths", nargs="+", help="Regex patterns to exclude (e.g., /blog/.* /pricing.*)")
crawl_parser.add_argument("--select-domains", nargs="+", help="Regex patterns for allowed domains")
crawl_parser.add_argument("--exclude-domains", nargs="+", help="Regex patterns for excluded domains")
crawl_parser.add_argument("--no-external", action="store_true", help="Block external domain links")
# Quality/format options
crawl_parser.add_argument(
"--extract-depth", choices=["basic", "advanced"], default="basic", help="basic (1 credit/5 URLs) or advanced (2 credits/5 URLs)"
)
crawl_parser.add_argument("--format", choices=["markdown", "text"], default="markdown", help="Output format (default: markdown)")
crawl_parser.add_argument("--timeout", type=int, default=150, help="Timeout in seconds (10-150)")
args = parser.parse_args()
if not args.command:
parser.print_help()
sys.exit(1)
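    # TavilyClient() falls back to the TAVILY_API_KEY environment variable when no key is passed.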
client = TavilyClient()
if args.command == "search":
result = search_docs(client, args.query, args.max_results)
print(json.dumps(result, indent=2))
elif args.command == "crawl":
result = crawl_docs(
client,
args.url,
instructions=args.instructions,
max_depth=args.max_depth,
max_breadth=args.max_breadth,
limit=args.limit,
select_paths=args.select_paths,
exclude_paths=args.exclude_paths,
select_domains=args.select_domains,
exclude_domains=args.exclude_domains,
allow_external=not args.no_external,
extract_depth=args.extract_depth,
format=args.format,
timeout=args.timeout,
)
print(json.dumps(result, indent=2))
if __name__ == "__main__":
main()
```