Back to skills
SkillHub ClubWrite Technical DocsTech WriterFull Stack

doc-splitter

A sophisticated skill for intelligently splitting massive documentation sites into manageable sub-skills with router-based navigation, featuring excellent structure and mitigation strategies.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars
98
Hot score
94
Updated
March 20, 2026
Overall rating
A7.9
Composite score
6.5
Best-practice grade
S96.0

Install command

npx @skill-hub/cli install jmagly-ai-writing-guide-doc-splitter
documentation-splittinglarge-scale-docsskill-routingworkflow-automation

Repository

jmagly/ai-writing-guide

Skill path: .factory/skills/doc-splitter

A sophisticated skill for intelligently splitting massive documentation sites into manageable sub-skills with router-based navigation, featuring excellent structure and mitigation strategies.

Open repository

Best for

Primary workflow: Write Technical Docs.

Technical facets: Tech Writer, Full Stack.

Target audience: Technical writers, documentation engineers, and developers managing large-scale documentation sites (10K+ pages) who need to create structured, navigable skills from massive doc repositories..

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: jmagly.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install doc-splitter into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/jmagly/ai-writing-guide before adding doc-splitter to shared team environments
  • Use doc-splitter for documentation workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: doc-splitter
description: Split large documentation (10K+ pages) into focused sub-skills with intelligent routing. Use for massive doc sites like Godot, AWS, or MSDN. Use when relevant to the task.
---


# Documentation Splitter Skill

## Purpose

Single responsibility: Split large documentation sites into multiple focused sub-skills with an optional router skill for intelligent navigation. (BP-4)

## Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

- [ ] Total page count is known (run estimation first)
- [ ] Documentation categories are identifiable
- [ ] Target skill size determined (default: 5,000 pages per skill)
- [ ] Router strategy selected (category, size, or hybrid)

**DO NOT split without understanding documentation structure.**

## Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

- Category boundaries unclear
- Optimal skill size uncertain for target use case
- Cross-references between sections complicate splitting
- Router vs flat structure decision needed

**NEVER arbitrarily split - seek user guidance on boundaries.**

## Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Doc structure, categories, page counts | Actual page content |
| PERIPHERAL | Similar large doc examples | Other documentation |
| DISTRACTOR | Content quality concerns | Individual page issues |

## Size Guidelines

| Documentation Size | Recommendation | Strategy |
|-------------------|----------------|----------|
| < 5,000 pages | One skill | No splitting |
| 5,000 - 10,000 pages | Consider splitting | Category-based |
| 10,000 - 30,000 pages | Recommended | Router + Categories |
| 30,000+ pages | Strongly recommended | Router + Categories |

## Workflow Steps

### Step 1: Estimate Documentation Size (Grounding)

```bash
# Quick estimation with skill-seekers
skill-seekers estimate configs/large-docs.json

# Output:
# πŸ“Š ESTIMATION RESULTS
# βœ… Pages Discovered: 28,450
# πŸ“ˆ Estimated Total: 32,000
# ⏱️  Time Elapsed: 2.1 minutes
# πŸ’‘ Recommended: Split into 6-7 sub-skills
```

### Step 2: Analyze Category Structure

```bash
# Identify natural category boundaries
skill-seekers analyze --config configs/large-docs.json --categories

# Output:
# Categories detected:
# - scripting: 8,200 pages
# - 2d: 5,400 pages
# - 3d: 9,100 pages
# - physics: 4,300 pages
# - networking: 2,800 pages
# - editor: 2,200 pages
```

### Step 3: Choose Split Strategy

| Strategy | Best For | Description |
|----------|----------|-------------|
| `category` | Clear topic divisions | Split by documentation sections |
| `size` | Uniform distribution | Split every N pages |
| `router` | User navigation | Hub skill + specialized sub-skills |
| `hybrid` | Complex docs | Categories + size limits per category |

### Step 4: Execute Split

**Option A: With skill-seekers**

```bash
# Category-based split
skill-seekers split --config configs/godot.json --strategy category

# Router-based split (recommended for large docs)
skill-seekers split --config configs/godot.json --strategy router

# Size-based split
skill-seekers split --config configs/godot.json --strategy size --pages-per-skill 5000
```

**Option B: Manual split configuration**

```json
{
  "name": "godot",
  "max_pages": 40000,
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "categories": {
      "scripting": {
        "patterns": ["/scripting/", "/gdscript/", "/c_sharp/"],
        "max_pages": 8000
      },
      "2d": {
        "patterns": ["/2d/", "/sprite/", "/tilemap/"],
        "max_pages": 6000
      },
      "3d": {
        "patterns": ["/3d/", "/mesh/", "/spatial/"],
        "max_pages": 10000
      },
      "physics": {
        "patterns": ["/physics/", "/collision/", "/rigidbody/"],
        "max_pages": 5000
      }
    }
  }
}
```

### Step 5: Scrape Sub-Skills

```bash
# Scrape all sub-skills in parallel
for config in configs/godot-*.json; do
  skill-seekers scrape --config $config &
done
wait

# Or sequentially with progress
for config in configs/godot-*.json; do
  echo "Processing: $config"
  skill-seekers scrape --config $config
done
```

### Step 6: Generate Router Skill

```bash
# Auto-generate router from sub-skills
skill-seekers generate-router configs/godot-*.json

# Creates godot-router skill that intelligently routes queries
```

### Step 7: Validate Split Results

```bash
# Check sub-skill sizes
for dir in output/godot-*/; do
  echo "$dir: $(find $dir -name "*.md" | wc -l) files"
done

# Verify router coverage
cat output/godot-router/SKILL.md | grep -A 50 "## Sub-Skills"
```

## Recovery Protocol (Archetype 4 Mitigation)

On error:

1. **PAUSE** - Note which sub-skill failed
2. **DIAGNOSE** - Check error type:
   - `Category overlap` β†’ Refine URL patterns
   - `Uneven split` β†’ Adjust page limits
   - `Orphan pages` β†’ Add catch-all category
   - `Router incomplete` β†’ Regenerate after all sub-skills done
3. **ADAPT** - Modify split configuration
4. **RETRY** - Re-split affected category (max 3 attempts)
5. **ESCALATE** - Present split preview, ask user for boundary adjustments

## Checkpoint Support

State saved to: `.aiwg/working/checkpoints/doc-splitter/`

```
checkpoints/doc-splitter/
β”œβ”€β”€ estimation.json         # Page count results
β”œβ”€β”€ category_analysis.json  # Category breakdown
β”œβ”€β”€ split_plan.json         # Planned split configuration
β”œβ”€β”€ progress/
β”‚   β”œβ”€β”€ godot-scripting.json
β”‚   β”œβ”€β”€ godot-2d.json
β”‚   └── ...
└── router_draft.md         # Router skill draft
```

## Output Structure

After splitting large documentation:

```
configs/
β”œβ”€β”€ godot.json              # Original config
β”œβ”€β”€ godot-scripting.json    # Generated sub-config
β”œβ”€β”€ godot-2d.json
β”œβ”€β”€ godot-3d.json
β”œβ”€β”€ godot-physics.json
└── godot-router.json       # Router config

output/
β”œβ”€β”€ godot-scripting/        # Sub-skill
β”‚   β”œβ”€β”€ SKILL.md
β”‚   └── references/
β”œβ”€β”€ godot-2d/               # Sub-skill
β”œβ”€β”€ godot-3d/               # Sub-skill
β”œβ”€β”€ godot-physics/          # Sub-skill
└── godot-router/           # Router skill
    β”œβ”€β”€ SKILL.md            # Routing logic
    └── references/
        └── routing-table.md
```

## Router Skill Structure

The generated router skill:

```markdown
# Godot Documentation Router

## Purpose
Route queries to the appropriate specialized Godot sub-skill.

## Sub-Skills

| Topic | Skill | Coverage |
|-------|-------|----------|
| GDScript, C#, scripting patterns | godot-scripting | 8,200 pages |
| 2D graphics, sprites, tilemaps | godot-2d | 5,400 pages |
| 3D graphics, meshes, materials | godot-3d | 9,100 pages |
| Physics, collisions, rigid bodies | godot-physics | 4,300 pages |

## Routing Rules

1. **Scripting questions** β†’ godot-scripting
   - Keywords: script, gdscript, c#, function, variable, class

2. **2D graphics questions** β†’ godot-2d
   - Keywords: sprite, 2d, tilemap, animation2d, canvas

3. **3D graphics questions** β†’ godot-3d
   - Keywords: mesh, 3d, spatial, material, shader, camera3d

4. **Physics questions** β†’ godot-physics
   - Keywords: physics, collision, rigidbody, area, raycast

## Usage

Ask your question naturally. This router will direct you to the appropriate specialized skill.

Example:
- "How do I create a player movement script?" β†’ godot-scripting
- "How do I set up tilemap collisions?" β†’ godot-2d
- "How do I apply materials to a mesh?" β†’ godot-3d
```

## Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Uneven splits | Category size varies | Use hybrid strategy with max_pages |
| Orphan pages | URL patterns incomplete | Add catch-all or refine patterns |
| Router confusion | Overlapping keywords | Make routing rules more specific |
| Too many skills | Over-segmented | Merge related categories |

## References

- Skill Seekers Large Documentation: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LARGE_DOCUMENTATION.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-9 KISS)
- REF-002: LLM Failure Modes (Archetype 3 context filtering, Archetype 4 recovery)
doc-splitter | SkillHub