
pdf-split

PDF chapter splitting


This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 1.

Hot score: 77.

Updated: March 19, 2026.

Overall rating: C (2.4).

Composite score: 2.4.

Best-practice grade: B (78.7).

Install command

npx @skill-hub/cli install jongwony-cc-plugin-pdf-split

Repository

jongwony/cc-plugin

Skill path: pdf-split/skills/pdf-split



Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: jongwony.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install pdf-split into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/jongwony/cc-plugin before adding pdf-split to shared team environments
  • Use pdf-split for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: pdf-split
description: PDF chapter splitting
---

# PDF Chapter Splitting

Split PDF documents into individual chapter files based on table of contents or text pattern detection.

## Overview

Use this skill when a book or document needs to be divided into chapters and either:
- the PDF has embedded bookmarks/outlines, or
- chapter boundaries can be detected from text patterns (e.g., "Chapter 1:", "Part One")

## Prerequisites

The scripts declare pypdf as uv inline script metadata (PEP 723), so `uv run` installs it automatically:
```python
# /// script
# dependencies = ["pypdf"]
# ///
```

## Workflow

### Phase 1: Analyze PDF Structure

Run `scripts/extract_toc.py` to analyze the PDF:

```bash
uv run ~/.claude/skills/pdf-split/scripts/extract_toc.py <pdf_path>
```

Output includes:
- Total page count
- Embedded bookmarks/outline (if present)
- Detected chapter patterns from text
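The bookmark part of this analysis can be sketched as follows. This is not the skill's `extract_toc.py`; it assumes pypdf's `PdfReader.outline` (where a nested list holds the children of the entry just before it) and `PdfReader.get_destination_page_number` (which returns a 0-based page index). `flatten_outline` is a hypothetical helper that takes the page lookup as a parameter so it can be tried without a real PDF.

```python
# /// script
# dependencies = ["pypdf"]
# ///
"""Hypothetical sketch of the bookmark part of Phase 1 (not the skill's extract_toc.py)."""
import sys
from typing import Callable, Iterable, Optional


def flatten_outline(
    outline: Iterable,
    page_of: Callable[[object], Optional[int]],
    depth: int = 0,
) -> list[tuple[int, str, int]]:
    """Flatten a pypdf-style outline into (depth, title, 1-based page) tuples.

    In pypdf's outline, a nested list holds the children of the entry just before it.
    """
    entries = []
    for item in outline:
        if isinstance(item, list):
            # Children of the previous bookmark: recurse one level deeper.
            entries.extend(flatten_outline(item, page_of, depth + 1))
            continue
        index = page_of(item)  # pypdf returns a 0-based page index here
        if index is None or index < 0:
            continue  # bookmark does not resolve to a page; skip it
        entries.append((depth, item.title, index + 1))
    return entries


# Only runs when a PDF path is given, e.g.: uv run outline_sketch.py ~/book.pdf
if __name__ == "__main__" and len(sys.argv) > 1:
    from pypdf import PdfReader

    reader = PdfReader(sys.argv[1])
    print(f"Total pages: {len(reader.pages)}")
    bookmarks = flatten_outline(reader.outline, reader.get_destination_page_number)
    if not bookmarks:
        print("No bookmarks found; fall back to text pattern detection.")
    for depth, title, page in bookmarks:
        print(f"{'  ' * depth}{title} -> page {page}")
```

Converting to 1-based pages here keeps the output in the same numbering as the `--chapters` example in Phase 3.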

### Phase 2: Define Chapter Boundaries

Based on Phase 1 output, define chapter boundaries as a list of tuples:
```python
chapters = [
    (start_page, end_page, "chapter_name"),
    # ...
]
```

**If bookmarks exist**: Use bookmark page numbers directly.

**If no bookmarks**:
1. Search for chapter heading patterns in text
2. Verify boundaries by checking page content
3. Present proposed boundaries for user confirmation
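Whichever way the chapter start pages are found, they still have to become `(start_page, end_page, "chapter_name")` tuples. A minimal sketch, assuming 1-based inclusive page numbers (which is how the Phase 3 example `[[1,22,...],[23,45,...]]` reads); `boundaries_from_starts` is a hypothetical helper, not part of the skill's scripts. Pages before the first start are not covered, so add a front-matter entry starting at page 1 if you want to keep them.

```python
def boundaries_from_starts(
    starts: list[tuple[int, str]], total_pages: int
) -> list[tuple[int, int, str]]:
    """Turn (start_page, name) pairs into (start_page, end_page, name) tuples.

    Pages are 1-based and inclusive: each chapter ends on the page before the
    next chapter starts, and the last chapter runs to the end of the document.
    """
    ordered = sorted(starts)
    chapters = []
    for i, (start, name) in enumerate(ordered):
        end = ordered[i + 1][0] - 1 if i + 1 < len(ordered) else total_pages
        if end < start:
            # Only happens when two chapters claim the same start page.
            raise ValueError(f"{name!r} shares start page {start} with another chapter")
        chapters.append((start, end, name))
    return chapters
```

For example, `boundaries_from_starts([(1, "00_Intro"), (23, "01_Chapter1")], 45)` yields the same ranges as the Phase 3 example, ready to show the user for confirmation.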

### Phase 3: Execute Split

Run `scripts/split_by_chapters.py` with the chapter definitions:

```bash
uv run ~/.claude/skills/pdf-split/scripts/split_by_chapters.py <pdf_path> <output_dir> --chapters '<json_chapters>'
```

Example:
```bash
uv run ~/.claude/skills/pdf-split/scripts/split_by_chapters.py \
  ~/book.pdf \
  ~/book_chapters \
  --chapters '[[1,22,"00_Intro"],[23,45,"01_Chapter1"]]'
```
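The split itself can be sketched with pypdf's `PdfReader` and `PdfWriter`. This is a hypothetical stand-in for `split_by_chapters.py`, not its actual code: it assumes the `--chapters` JSON uses 1-based inclusive pages and writes each range to `<name>.pdf` exactly as named, whereas the real script may apply the naming convention described under Output Structure.

```python
# /// script
# dependencies = ["pypdf"]
# ///
"""Hypothetical sketch of a chapter split; not the skill's actual split_by_chapters.py."""
import json
from pathlib import Path


def parse_chapters(raw: str, total_pages: int) -> list[tuple[int, int, str]]:
    """Validate a --chapters JSON string such as '[[1,22,"00_Intro"],[23,45,"01_Chapter1"]]'."""
    chapters = []
    for entry in json.loads(raw):
        start, end, name = entry  # raises ValueError if an entry is not a triple
        start, end = int(start), int(end)
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"bad range {start}-{end} for {name!r} ({total_pages} pages)")
        chapters.append((start, end, str(name)))
    return chapters


def split_pdf(pdf_path: str, output_dir: str, raw_chapters: str) -> list[Path]:
    """Write one PDF per chapter range and return the written paths."""
    from pypdf import PdfReader, PdfWriter  # imported here so parse_chapters works without pypdf

    reader = PdfReader(pdf_path)
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for start, end, name in parse_chapters(raw_chapters, len(reader.pages)):
        writer = PdfWriter()
        for index in range(start - 1, end):  # 1-based inclusive pages -> 0-based indices
            writer.add_page(reader.pages[index])
        target = out / f"{name}.pdf"
        with open(target, "wb") as fh:
            writer.write(fh)
        written.append(target)
    return written
```

Validating the ranges before writing anything means a typo in the JSON fails fast instead of leaving a half-written output directory.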

## Common Chapter Patterns

| Pattern | Regex | Example |
|---------|-------|---------|
| Numbered | `Chapter\s+\d+` | "Chapter 1", "Chapter 12" |
| Part + Chapter | `Part\s+\w+.*Chapter` | "Part One: Chapter 1" |
| Section | `Section\s+\d+` | "Section 1.1" |
| Roman numerals | `Chapter\s+[IVXLC]+\b` | "Chapter IV" |
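The table's patterns can be tried against candidate lines with Python's `re` module. A minimal sketch; `CHAPTER_PATTERNS` and `classify_heading` are hypothetical names. The Roman-numeral pattern ends with `\b` so that a line like "Chapter Contents" is not read as Roman numeral "C".

```python
import re

# Heading patterns from the table above, compiled case-sensitively.
CHAPTER_PATTERNS = {
    "numbered": re.compile(r"Chapter\s+\d+"),
    "part_chapter": re.compile(r"Part\s+\w+.*Chapter"),
    "section": re.compile(r"Section\s+\d+"),
    # \b stops "Chapter Contents" from matching as the numeral "C".
    "roman": re.compile(r"Chapter\s+[IVXLC]+\b"),
}


def classify_heading(line: str) -> list[str]:
    """Return the names of every pattern that matches somewhere in the line."""
    return [name for name, pattern in CHAPTER_PATTERNS.items() if pattern.search(line)]
```

A line can match more than one pattern ("Part One: Chapter 1" is both `numbered` and `part_chapter`), so decide which pattern defines a boundary before using the matches.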

## Edge Cases

### Large Chapter Detection (100+ pages)
When a detected chapter exceeds 100 pages, verify the boundary:
- Check if appendix content is included
- Look for sub-sections that should be separate files
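Flagging chapters for this check is a one-liner over the boundary tuples. A minimal sketch with a hypothetical `oversized_chapters` helper; a chapter of exactly 100 pages is not flagged, since the rule is "exceeds".

```python
def oversized_chapters(
    chapters: list[tuple[int, int, str]], limit: int = 100
) -> list[tuple[int, int, str]]:
    """Return chapters whose inclusive page span is longer than `limit` pages."""
    return [(start, end, name) for start, end, name in chapters if end - start + 1 > limit]
```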

### Missing TOC
When no bookmarks or clear patterns exist:
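1. Extract the text of the first 20 pages
2. Look for a printed table of contents
3. Parse chapter titles and page numbers from it, then convert printed page numbers to PDF page numbers (front matter often shifts them by a fixed offset)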
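Parsing a printed TOC can be sketched with a regular expression over the extracted text. This assumes a hypothetical line format of title, optional dot leaders, then an Arabic page number at the end (e.g. `Chapter 1 Getting Started ........ 23`); entries with Roman page numbers such as `Preface ..... ix` are skipped. `page_offset` is the PDF page of printed page 1 minus one, e.g. 12 if printed page 1 is the 13th page of the file.

```python
import re

# Title, optional dot leaders, then a printed page number at the end of the line.
TOC_LINE = re.compile(r"^(?P<title>.*?\S)[\s.]*\s(?P<page>\d+)\s*$")


def parse_printed_toc(text: str, page_offset: int = 0) -> list[tuple[int, str]]:
    """Return (pdf_page, title) pairs, where pdf_page = printed page + page_offset."""
    entries = []
    for line in text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            entries.append((int(match["page"]) + page_offset, match["title"]))
    return entries
```

Any body line that happens to end in a number will also match, which is another reason to restrict this to the TOC pages and confirm the result with the user.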

### Duplicate Pattern Matches
Filter results to keep only actual chapter starts:
- Chapter headings typically appear at the top of a page
- Ignore references to chapters in body text (e.g., "see Chapter 3")
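Both rules can be sketched as a filter over per-page text. A minimal sketch with hypothetical names (`HEADING`, `chapter_start_pages`): a heading only counts when it starts one of the first few non-empty lines of a page, which drops "see Chapter 3" mid-sentence, and only the first page for each chapter label is kept, which also absorbs running headers that repeat "Chapter 1" on every page.

```python
import re

# A heading must start the line: "Chapter 3" or "Chapter IV".
HEADING = re.compile(r"^\s*Chapter\s+(\d+|[IVXLC]+)\b")


def chapter_start_pages(page_texts: list[str], top_lines: int = 3) -> list[tuple[int, str]]:
    """Return (1-based page, chapter label) for the first page of each chapter."""
    seen = set()
    starts = []
    for page_number, text in enumerate(page_texts, start=1):
        # Only the first few non-empty lines count as "top of the page".
        lines = [line for line in text.splitlines() if line.strip()][:top_lines]
        for line in lines:
            match = HEADING.match(line)
            if match:
                label = match.group(1)
                if label not in seen:  # repeated labels are running headers
                    seen.add(label)
                    starts.append((page_number, label))
                break
    return starts
```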

## Output Structure

```
output_dir/
├── 00_Front_Matter.pdf
├── 01_Chapter_Name.pdf
├── 02_Chapter_Name.pdf
├── ...
└── Appendix.pdf
```

Naming convention: `{index:02d}_{sanitized_name}.pdf`
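The convention can be sketched as below. The sanitizing rule is an assumption (the source does not define it): runs of characters other than Unicode letters, digits, `_` and `-` become a single underscore, so non-Latin titles survive intact. `chapter_filename` is a hypothetical helper.

```python
import re


def chapter_filename(index: int, name: str) -> str:
    """Build '{index:02d}_{sanitized_name}.pdf' using a hypothetical sanitizer."""
    # \w is Unicode-aware for str patterns, so letters in any script are kept.
    sanitized = re.sub(r"[^\w-]+", "_", name).strip("_") or "Untitled"
    return f"{index:02d}_{sanitized}.pdf"
```

For example, `chapter_filename(0, "Front Matter")` gives `00_Front_Matter.pdf`, matching the first entry in the tree above.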

## Integration Notes

### For NotebookLM Upload
Split PDFs are suitable for NotebookLM sources:
- Uploading each chapter as a separate source enables targeted queries
- Keep files under 500KB where possible
- Large chapters may need further splitting
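Finding the chapters that need further splitting can be sketched with `pathlib`. This assumes "500KB" means 500 × 1024 bytes; pass `limit=500_000` if you read it as decimal kilobytes. `oversized_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumption: 500KB is read as 500 * 1024 bytes.
SIZE_LIMIT = 500 * 1024


def oversized_files(output_dir: str, limit: int = SIZE_LIMIT) -> list[tuple[str, int]]:
    """List split PDFs larger than `limit` bytes, largest first."""
    sizes = [(path.name, path.stat().st_size) for path in Path(output_dir).glob("*.pdf")]
    return sorted((item for item in sizes if item[1] > limit), key=lambda item: item[1], reverse=True)
```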

### For RAG Systems
Chapter-level splitting provides natural semantic boundaries for:
- Document chunking
- Retrieval granularity
- Citation accuracy
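Citation accuracy depends on mapping pages back to the original document, because each split file starts again at page 1. A minimal sketch, assuming the same 1-based inclusive `(start_page, end_page, name)` tuples used in Phase 2; `cite` and its output format are hypothetical.

```python
def cite(source: str, chapter: tuple[int, int, str], page_in_file: int) -> str:
    """Map a 1-based page in a split chapter file back to the original page."""
    start, end, name = chapter
    original = start + page_in_file - 1  # chapter files restart numbering at page 1
    if not start <= original <= end:
        raise ValueError(f"page {page_in_file} is outside {name!r} ({end - start + 1} pages)")
    return f"{source}, {name}, p. {original}"
```

Storing the chapter tuple alongside each chunk's metadata is enough to produce citations that point at the original book rather than the split file.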

## Scripts Reference

| Script | Purpose |
|--------|---------|
| `scripts/extract_toc.py` | Analyze PDF, extract bookmarks and detect chapter patterns |
| `scripts/split_by_chapters.py` | Execute split with provided chapter definitions |

## Additional Resources

- **`references/pypdf-guide.md`** - pypdf API quick reference for custom operations