
auto-skill-generator

Generate skills from web research. Given a topic like "how to use Stripe API" or "Prisma ORM", this skill searches for authoritative documentation, crawls the best source, and generates a ready-to-use .md skill file. Use when: (1) User wants to create a skill about a library/tool/API, (2) User says "create a skill for X", "make a skill about X", or "generate skill for X", (3) User wants to capture documentation as a reusable skill.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 785
Hot score: 99
Updated: March 20, 2026
Overall rating: 4.9
Composite score: 4.9
Best-practice grade: A (92.0)

Install command

npx @skill-hub/cli install benchflow-ai-skillsbench-docs-to-skill

Repository

benchflow-ai/SkillsBench

Skill path: .claude/skills/docs-to-skill



Best for

Primary workflow: Research & Ops.

Technical facets: Full Stack, Backend.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: benchflow-ai.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install auto-skill-generator into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/benchflow-ai/SkillsBench before adding auto-skill-generator to shared team environments
  • Use auto-skill-generator to turn library, tool, and API documentation into reusable skill files for your development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: auto-skill-generator
description: >
  Generate skills from web research. Given a topic like "how to use Stripe API"
  or "Prisma ORM", this skill searches for authoritative documentation, crawls the best source,
  and generates a ready-to-use .md skill file. Use when: (1) User wants to create a skill
  about a library/tool/API, (2) User says "create a skill for X", "make a skill about X",
  or "generate skill for X", (3) User wants to capture documentation as a reusable skill.
---

# Auto Skill Generator

Generate skills by researching and crawling authoritative documentation.

## Tool: fetch_docs.py

```bash
# Search - returns all URLs with snippets
python scripts/fetch_docs.py search "Modal GPU Python documentation"

# Crawl - with domain/path filtering to stay focused
python scripts/fetch_docs.py crawl \
  --url https://modal.com/docs/guide/gpu \
  --no-external \
  --select-paths "/docs/.*" \
  --instructions "Focus on GPU setup and code examples" \
  --limit 30
```

## Workflow

### 1. Search for Documentation

```bash
python scripts/fetch_docs.py search "{topic} documentation"
```

Returns JSON with all URLs, titles, scores, and content snippets.
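
The JSON shape mirrors what `search_docs()` in the bundled `scripts/fetch_docs.py` returns. A minimal sketch of consuming it from Python, assuming the command is run from the skill's root directory and a Tavily API key is configured (the query and the sample values in the comments are illustrative):

```python
import json
import subprocess

# Run the search step and parse its JSON output; the shape below follows
# search_docs() in scripts/fetch_docs.py. Query and values are illustrative.
proc = subprocess.run(
    ["python", "scripts/fetch_docs.py", "search", "Prisma ORM documentation"],
    capture_output=True, text=True, check=True,
)
data = json.loads(proc.stdout)

# data == {
#   "query": "Prisma ORM documentation",
#   "results_count": 10,
#   "results": [
#     {"url": "https://www.prisma.io/docs", "title": "...", "score": 0.91, "content": "..."},
#     ...
#   ]
# }
for r in data["results"]:
    print(f'{r["score"]:.2f}  {r["url"]}')
```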

### 2. Select Best URL

Review search results and select based on:
- **Official docs**: `*.com/docs/`, `docs.*.com`, `*.readthedocs.io`
- **Content relevance**: Check snippets for API docs, code examples
- **Avoid**: Blog posts, changelogs, marketing, glossaries
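
One way to encode these criteria is a small scoring pass over the search results before committing to a URL; the patterns and weights below are illustrative assumptions, not part of the skill itself:

```python
import re

# Heuristic ranking of search results against the criteria above.
# The OFFICIAL/AVOID patterns and the +/-1.0 weights are assumptions.
OFFICIAL = [r"/docs/", r"^https?://docs\.", r"\.readthedocs\.io"]
AVOID = [r"/blog/", r"/changelog", r"/pricing", r"/glossary"]

def rank(results: list[dict]) -> list[dict]:
    def score(r: dict) -> float:
        s = r.get("score", 0.0)                      # Tavily relevance score
        if any(re.search(p, r["url"]) for p in OFFICIAL):
            s += 1.0                                 # prefer official documentation
        if any(re.search(p, r["url"]) for p in AVOID):
            s -= 1.0                                 # penalize blogs, changelogs, marketing
        return s
    return sorted(results, key=score, reverse=True)

# best_url = rank(data["results"])[0]["url"]
```

The top-ranked URL should still be sanity-checked against the content snippets before crawling.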

### 3. Crawl with Filtering

```bash
python scripts/fetch_docs.py crawl \
  --url {selected_url} \
  --no-external \
  --select-paths "/docs/.*" "/guide/.*" \
  --instructions "Focus on API methods and code examples"
```

**Core Parameters:**
| Parameter | Description |
|-----------|-------------|
| `--url` | Required. URL to crawl |
| `--instructions` | Natural language guidance for crawler |
| `--limit` | Total pages (default: 50) |
| `--max-depth` | Link depth (default: 2) |

**Domain/Path Filtering (Critical):**
| Parameter | Description |
|-----------|-------------|
| `--no-external` | Block external domains |
| `--select-paths` | Regex patterns to include (e.g., `/docs/.*`) |
| `--exclude-paths` | Regex patterns to exclude (e.g., `/blog/.*`) |
| `--select-domains` | Regex for allowed domains |
| `--exclude-domains` | Regex for blocked domains |

**Quality Options:**
| Parameter | Description |
|-----------|-------------|
| `--extract-depth` | `basic` (1 credit/5 URLs) or `advanced` (2 credits/5 URLs) |
| `--format` | `markdown` or `text` |
| `--timeout` | Seconds (10-150, default: 150) |
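
The same options are available programmatically through `crawl_docs()` in the bundled script. A sketch that combines the filtering and quality options, assuming `scripts/` is importable and `TAVILY_API_KEY` is set (the target URL and regex patterns are examples only):

```python
from tavily import TavilyClient
from scripts.fetch_docs import crawl_docs   # assumes scripts/ is on the import path

result = crawl_docs(
    TavilyClient(),                          # reads TAVILY_API_KEY, as in the CLI's main()
    url="https://www.prisma.io/docs",
    instructions="Focus on schema definition and client API examples",
    allow_external=False,                    # equivalent to --no-external
    select_paths=[r"/docs/.*"],
    exclude_paths=[r"/blog/.*", r"/pricing.*"],
    extract_depth="basic",                   # 1 credit per 5 URLs
    limit=30,
)
print(f'{result["pages_crawled"]} pages crawled from {result["source_url"]}')
```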

### 4. Generate Skill File

From crawled content, create:

```markdown
---
name: {topic-slug}
description: >
  {What the skill does}. Use when: {specific triggers}.
---

# {Topic Name}

## Quick Start
## Core API
## Common Patterns
```

## Output

- Location: `~/.claude/skills/{topic-slug}/SKILL.md`
- Extract ALL code blocks from crawled content
- Keep SKILL.md under 500 lines; split to `references/` if longer
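
A naive sketch of that final assembly step, assuming the crawl output's `pages` list is passed in and overflow is simply spilled into `references/` (the helper name, file layout, and line-based split are illustrative; only the output location, the code-block extraction, and the 500-line rule come from this skill):

````python
import re
from pathlib import Path

# Illustrative skill-file assembly; write_skill() and references/reference.md
# are assumed names, not defined by this skill.
def write_skill(topic_slug: str, title: str, description: str, pages: list[dict]) -> Path:
    # Pull every fenced code block out of the crawled markdown pages.
    code_blocks: list[str] = []
    for page in pages:                                   # pages = crawl output["pages"]
        code_blocks += re.findall(r"```.*?```", page["content"], flags=re.DOTALL)

    frontmatter = f"---\nname: {topic_slug}\ndescription: >\n  {description}\n---\n"
    sections = f"\n# {title}\n\n## Quick Start\n\n## Core API\n\n## Common Patterns\n\n"
    body = frontmatter + sections + "\n\n".join(code_blocks) + "\n"

    skill_dir = Path.home() / ".claude" / "skills" / topic_slug
    skill_dir.mkdir(parents=True, exist_ok=True)

    lines = body.splitlines()
    if len(lines) > 500:                                 # keep SKILL.md short, spill the rest
        refs = skill_dir / "references"
        refs.mkdir(exist_ok=True)
        (refs / "reference.md").write_text("\n".join(lines[500:]) + "\n")
        body = "\n".join(lines[:500]) + "\n"

    (skill_dir / "SKILL.md").write_text(body)
    return skill_dir / "SKILL.md"
````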


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/fetch_docs.py

```python
#!/usr/bin/env python3
"""
Fetch documentation using Tavily Search + Crawl.

Two commands:
1. search: Find URLs with snippets - LLM reviews and decides which to crawl
2. crawl: Crawl a URL with full control over domain/path filtering

Usage:
    python fetch_docs.py search "Modal GPU Python documentation"

    python fetch_docs.py crawl \
      --url https://modal.com/docs/guide/gpu \
      --instructions "Focus on GPU setup and code examples" \
      --no-external \
      --select-paths "/docs/.*"
"""

import argparse
import json
import sys

try:
    from tavily import TavilyClient
except ImportError:
    print("❌ tavily-python not installed. Run:")
    print("   pip install tavily-python")
    sys.exit(1)


def search_docs(client: TavilyClient, query: str, max_results: int = 10) -> dict:
    """
    Search for documentation. Returns all results with snippets
    for LLM to review and decide which URL to crawl.
    """
    response = client.search(query=query, search_depth="advanced", max_results=max_results, chunks_per_source=3)

    results = response.get("results", [])

    return {
        "query": query,
        "results_count": len(results),
        "results": [{"url": r["url"], "title": r.get("title", ""), "score": r["score"], "content": r.get("content", "")} for r in results],
    }


def crawl_docs(
    client: TavilyClient,
    url: str,
    instructions: str | None = None,
    max_depth: int = 2,
    max_breadth: int = 50,
    limit: int = 50,
    select_paths: list | None = None,
    exclude_paths: list | None = None,
    select_domains: list | None = None,
    exclude_domains: list | None = None,
    allow_external: bool = True,
    extract_depth: str = "basic",
    format: str = "markdown",
    timeout: int = 150,
) -> dict:
    """
    Crawl a URL for full documentation content.
    All parameters controlled by LLM.
    """
    crawl_kwargs = {
        "url": url,
        "max_depth": max_depth,
        "max_breadth": max_breadth,
        "limit": limit,
        "allow_external": allow_external,
        "extract_depth": extract_depth,
        "format": format,
        "timeout": timeout,
    }

    if instructions:
        crawl_kwargs["instructions"] = instructions
    if select_paths:
        crawl_kwargs["select_paths"] = select_paths
    if exclude_paths:
        crawl_kwargs["exclude_paths"] = exclude_paths
    if select_domains:
        crawl_kwargs["select_domains"] = select_domains
    if exclude_domains:
        crawl_kwargs["exclude_domains"] = exclude_domains

    response = client.crawl(**crawl_kwargs)

    results = response.get("results", [])

    return {
        "source_url": url,
        "pages_crawled": len(results),
        "pages": [{"url": page.get("url", ""), "content": page.get("raw_content", "")} for page in results],
    }


def main():
    parser = argparse.ArgumentParser(description="Fetch documentation using Tavily Search + Crawl")
    subparsers = parser.add_subparsers(dest="command", help="Command to run")

    # Search subcommand
    search_parser = subparsers.add_parser("search", help="Search for documentation URLs")
    search_parser.add_argument("query", help="Search query")
    search_parser.add_argument("--max-results", type=int, default=10, help="Max results (default: 10)")

    # Crawl subcommand
    crawl_parser = subparsers.add_parser("crawl", help="Crawl a specific URL")
    crawl_parser.add_argument("--url", required=True, help="URL to crawl")
    crawl_parser.add_argument("--instructions", help="Natural language instructions for crawler")

    # Depth/breadth/limit
    crawl_parser.add_argument("--max-depth", type=int, default=2, help="Crawl depth (default: 2)")
    crawl_parser.add_argument("--max-breadth", type=int, default=50, help="Links per level (default: 50)")
    crawl_parser.add_argument("--limit", type=int, default=50, help="Total pages limit (default: 50)")

    # Domain/path filtering - CRITICAL for staying on target
    crawl_parser.add_argument("--select-paths", nargs="+", help="Regex patterns to include (e.g., /docs/.* /api/.*)")
    crawl_parser.add_argument("--exclude-paths", nargs="+", help="Regex patterns to exclude (e.g., /blog/.* /pricing.*)")
    crawl_parser.add_argument("--select-domains", nargs="+", help="Regex patterns for allowed domains")
    crawl_parser.add_argument("--exclude-domains", nargs="+", help="Regex patterns for excluded domains")
    crawl_parser.add_argument("--no-external", action="store_true", help="Block external domain links")

    # Quality/format options
    crawl_parser.add_argument(
        "--extract-depth", choices=["basic", "advanced"], default="basic", help="basic (1 credit/5 URLs) or advanced (2 credits/5 URLs)"
    )
    crawl_parser.add_argument("--format", choices=["markdown", "text"], default="markdown", help="Output format (default: markdown)")
    crawl_parser.add_argument("--timeout", type=int, default=150, help="Timeout in seconds (10-150)")

    args = parser.parse_args()

    if not args.command:
        parser.print_help()
        sys.exit(1)

    client = TavilyClient()

    if args.command == "search":
        result = search_docs(client, args.query, args.max_results)
        print(json.dumps(result, indent=2))

    elif args.command == "crawl":
        result = crawl_docs(
            client,
            args.url,
            instructions=args.instructions,
            max_depth=args.max_depth,
            max_breadth=args.max_breadth,
            limit=args.limit,
            select_paths=args.select_paths,
            exclude_paths=args.exclude_paths,
            select_domains=args.select_domains,
            exclude_domains=args.exclude_domains,
            allow_external=not args.no_external,
            extract_depth=args.extract_depth,
            format=args.format,
            timeout=args.timeout,
        )
        print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()

```
