pdf-split
PDF chapter splitting
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Stars: 1.
Hot score: 77.
Updated: March 19, 2026.
Overall rating: C (2.4).
Composite score: 2.4.
Best-practice grade: B (78.7).
Install command
npx @skill-hub/cli install jongwony-cc-plugin-pdf-split
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: jongwony.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install pdf-split into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/jongwony/cc-plugin before adding pdf-split to shared team environments
- Use pdf-split for development workflows
Works across
Claude Code, Codex CLI, Gemini CLI, OpenCode
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: pdf-split
description: PDF chapter splitting
---
# PDF Chapter Splitting
Split PDF documents into individual chapter files based on table of contents or text pattern detection.
## Overview
This skill handles PDF splitting when:
- A book or document needs to be divided by chapters
- The PDF has embedded bookmarks/outlines, OR
- Chapter boundaries can be detected from text patterns (e.g., "Chapter 1:", "Part One")
## Prerequisites
pypdf is declared as a uv inline script dependency, which `uv run` resolves automatically:
```python
# /// script
# dependencies = ["pypdf"]
# ///
```
## Workflow
### Phase 1: Analyze PDF Structure
Run `scripts/extract_toc.py` to analyze the PDF:
```bash
uv run ~/.claude/skills/pdf-split/scripts/extract_toc.py <pdf_path>
```
Output includes:
- Total page count
- Embedded bookmarks/outline (if present)
- Detected chapter patterns from text
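
As a rough illustration of the bookmark part of this analysis (not the actual contents of `scripts/extract_toc.py`, which are not shown here), the sketch below flattens a nested outline into `(depth, title, page)` rows. It assumes pypdf's `PdfReader.outline`, where children of an entry appear as a nested list, and `PdfReader.get_destination_page_number`, which returns a 0-based page index. The helper names `flatten_outline` and `read_outline` are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def flatten_outline(
    outline: Iterable,
    page_index_of: Callable[[object], Optional[int]],
    depth: int = 0,
) -> List[Tuple[int, str, int]]:
    """Flatten a nested outline into (depth, title, 1-based page) rows."""
    rows = []
    for item in outline:
        if isinstance(item, list):
            # A nested list holds the children of the preceding entry.
            rows.extend(flatten_outline(item, page_index_of, depth + 1))
            continue
        index = page_index_of(item)
        if index is None:
            # Skip bookmarks that do not resolve to a page.
            continue
        rows.append((depth, item.title, index + 1))
    return rows


def read_outline(pdf_path: str) -> List[Tuple[int, str, int]]:
    """Hypothetical wrapper: read the bookmarks of a PDF with pypdf."""
    from pypdf import PdfReader  # imported lazily; requires pypdf

    reader = PdfReader(pdf_path)
    return flatten_outline(reader.outline, reader.get_destination_page_number)
```

Rows at depth 0 are the usual candidates for chapter starts; deeper rows tend to be sections within a chapter.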
### Phase 2: Define Chapter Boundaries
Based on Phase 1 output, define chapter boundaries as a list of tuples:
```python
chapters = [
(start_page, end_page, "chapter_name"),
# ...
]
```
**If bookmarks exist**: Use bookmark page numbers directly.
**If no bookmarks**:
1. Search for chapter heading patterns in text
2. Verify boundaries by checking page content
3. Present proposed boundaries for user confirmation
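
Whichever source the start pages come from, turning them into boundary tuples is mechanical. The sketch below is one way to do it, assuming pages are 1-based and inclusive (as the Phase 3 example suggests); `boundaries_from_starts` is a hypothetical helper, not part of the skill's scripts.

```python
from typing import List, Tuple


def boundaries_from_starts(
    starts: List[Tuple[int, str]], total_pages: int
) -> List[Tuple[int, int, str]]:
    """Turn (start_page, name) pairs into (start_page, end_page, name) tuples."""
    ordered = sorted(starts)
    chapters = []
    for i, (start, name) in enumerate(ordered):
        # A chapter ends on the page before the next one starts;
        # the last chapter runs to the end of the document.
        end = ordered[i + 1][0] - 1 if i + 1 < len(ordered) else total_pages
        chapters.append((start, end, name))
    return chapters


chapters = boundaries_from_starts(
    [(1, "00_Intro"), (23, "01_Chapter1")], total_pages=45
)
print(chapters)  # [(1, 22, '00_Intro'), (23, 45, '01_Chapter1')]
```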
### Phase 3: Execute Split
Run `scripts/split_by_chapters.py` with the chapter definitions:
```bash
uv run ~/.claude/skills/pdf-split/scripts/split_by_chapters.py <pdf_path> <output_dir> --chapters '<json_chapters>'
```
Example:
```bash
uv run ~/.claude/skills/pdf-split/scripts/split_by_chapters.py \
~/book.pdf \
~/book_chapters \
--chapters '[[1,22,"00_Intro"],[23,45,"01_Chapter1"]]'
```
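
Hand-writing the JSON for `--chapters` is error-prone for long books. A minimal sketch of generating it from Python tuples follows; the sanity checks (ascending, non-overlapping ranges) are an assumption about what a valid input looks like, and `chapters_argument` is a hypothetical helper.

```python
import json
import shlex
from typing import List, Tuple


def chapters_argument(chapters: List[Tuple[int, int, str]]) -> str:
    """Validate chapter tuples and serialize them as compact JSON."""
    previous_end = 0
    for start, end, name in chapters:
        # Each chapter must start after the previous one ends
        # and must not end before it starts.
        if start <= previous_end or end < start:
            raise ValueError(f"bad boundary for {name!r}: {start}-{end}")
        previous_end = end
    return json.dumps(chapters, separators=(",", ":"))


arg = chapters_argument([(1, 22, "00_Intro"), (23, 45, "01_Chapter1")])
# shlex.quote wraps the JSON in single quotes for pasting into a shell.
print("--chapters", shlex.quote(arg))
```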
## Common Chapter Patterns
| Pattern | Regex | Example |
|---------|-------|---------|
| Numbered | `Chapter\s+\d+` | "Chapter 1", "Chapter 12" |
| Part + Chapter | `Part\s+\w+.*Chapter` | "Part One: Chapter 1" |
| Section | `Section\s+\d+` | "Section 1.1" |
| Roman numerals | `Chapter\s+[IVXLC]+` | "Chapter IV" |
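
The patterns in the table can be applied to per-page text as in the sketch below. It assumes the page text has already been extracted into a list of strings (one per page); `find_chapter_candidates` is a hypothetical name. The result is a list of candidates only, since body-text references match as well.

```python
import re
from typing import List, Tuple

# Regexes copied from the table above.
CHAPTER_PATTERNS = {
    "numbered": re.compile(r"Chapter\s+\d+"),
    "part_chapter": re.compile(r"Part\s+\w+.*Chapter"),
    "section": re.compile(r"Section\s+\d+"),
    "roman": re.compile(r"Chapter\s+[IVXLC]+"),
}


def find_chapter_candidates(page_texts: List[str]) -> List[Tuple[int, str, str]]:
    """Return (1-based page, pattern name, matched text) for every match."""
    hits = []
    for page_number, text in enumerate(page_texts, start=1):
        for name, pattern in CHAPTER_PATTERNS.items():
            for match in pattern.finditer(text):
                hits.append((page_number, name, match.group(0)))
    return hits
```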
## Edge Cases
### Large Chapter Detection (100+ pages)
When a detected chapter exceeds 100 pages, verify the boundary:
- Check if appendix content is included
- Look for sub-sections that should be separate files
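
Flagging such chapters for review takes only a few lines. This sketch assumes 1-based inclusive `(start_page, end_page, name)` tuples; `oversized_chapters` is a hypothetical helper.

```python
from typing import List, Tuple


def oversized_chapters(
    chapters: List[Tuple[int, int, str]], max_pages: int = 100
) -> List[Tuple[str, int]]:
    """Return (name, page_count) for chapters longer than max_pages."""
    flagged = []
    for start, end, name in chapters:
        page_count = end - start + 1  # inclusive range
        if page_count > max_pages:
            flagged.append((name, page_count))
    return flagged
```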
### Missing TOC
When no bookmarks or clear patterns exist:
1. Extract the text of the first 20 pages
2. Look for a manually typeset TOC listing
3. Parse page numbers from the TOC text
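
A sketch of the parsing step, assuming TOC lines of the form `Title ..... 23` (a title, then dot leaders or at least two separator characters, then a page number); other layouts would need a different pattern. `parse_toc_lines` is a hypothetical helper. Printed page numbers may be offset from PDF page positions when front matter is numbered separately, so the results still need verification.

```python
import re
from typing import List, Tuple

# Title, then a run of dots/whitespace, then the page number at end of line.
TOC_LINE = re.compile(r"^(?P<title>.+?)[\s.]{2,}(?P<page>\d+)\s*$")


def parse_toc_lines(text: str) -> List[Tuple[str, int]]:
    """Return (title, printed_page) for each line that looks like a TOC entry."""
    entries = []
    for line in text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            entries.append((match.group("title"), int(match.group("page"))))
    return entries
```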
### Duplicate Pattern Matches
Filter results to keep only actual chapter starts:
- Chapter headings typically appear at page top
- Ignore references to chapters in body text (e.g., "see Chapter 3")
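
One way to apply the page-top heuristic is to test only the first few non-empty lines of each page and require the heading at the start of a line. The three-line window and the name `chapter_start_on_page` are assumptions for illustration.

```python
import re
from typing import Optional

# Heading must begin the line; \b rejects words like "Chapter Introduction".
HEADING = re.compile(r"^\s*(Chapter\s+(?:\d+|[IVXLC]+))\b")


def chapter_start_on_page(text: str, max_lines: int = 3) -> Optional[str]:
    """Return the heading if one of the first non-empty lines starts with it."""
    top_lines = [line for line in text.splitlines() if line.strip()][:max_lines]
    for line in top_lines:
        match = HEADING.match(line)
        if match:
            return match.group(1)
    return None
```

Books that repeat the chapter name in a running header on every page will still produce false positives, so consecutive pages reporting the same heading should be collapsed to the first one.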
## Output Structure
```
output_dir/
├── 00_Front_Matter.pdf
├── 01_Chapter_Name.pdf
├── 02_Chapter_Name.pdf
├── ...
└── Appendix.pdf
```
Naming convention: `{index:02d}_{sanitized_name}.pdf`
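
The source does not spell out how names are sanitized. The sketch below shows one plausible rule (runs of characters other than letters, digits, underscores, and hyphens become a single underscore); both the rule and the name `chapter_filename` are assumptions.

```python
import re


def chapter_filename(index: int, name: str) -> str:
    """Build a file name following {index:02d}_{sanitized_name}.pdf."""
    # Replace each run of unsafe characters with one underscore,
    # then trim underscores left at either end.
    sanitized = re.sub(r"[^\w\-]+", "_", name.strip()).strip("_")
    return f"{index:02d}_{sanitized}.pdf"


print(chapter_filename(0, "Front Matter"))  # 00_Front_Matter.pdf
```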
## Integration Notes
### For NotebookLM Upload
Split PDFs are suitable for NotebookLM sources:
- Uploading each chapter as a separate source enables targeted queries
- Recommended: Keep files under 500KB when possible
- Large chapters may need further splitting
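
To find chapters that may need further splitting, the output directory can be scanned for files above the recommended size. This sketch treats 500KB as 500 × 1024 bytes, which is an assumption; `files_over_limit` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Tuple


def files_over_limit(
    output_dir: str, limit_bytes: int = 500 * 1024
) -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for PDFs larger than limit_bytes."""
    oversized = []
    for path in Path(output_dir).glob("*.pdf"):
        size = path.stat().st_size
        if size > limit_bytes:
            oversized.append((path.name, size))
    return sorted(oversized)
```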
### For RAG Systems
Chapter-level splitting provides natural semantic boundaries for:
- Document chunking
- Retrieval granularity
- Citation accuracy
## Scripts Reference
| Script | Purpose |
|--------|---------|
| `scripts/extract_toc.py` | Analyze PDF, extract bookmarks and detect chapter patterns |
| `scripts/split_by_chapters.py` | Execute split with provided chapter definitions |
## Additional Resources
- **`references/pypdf-guide.md`** - pypdf API quick reference for custom operations
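
As an example of such a custom operation, the sketch below writes one page range to a new file using pypdf's `PdfReader`, `PdfWriter.add_page`, and `PdfWriter.write`. It is not the code of `scripts/split_by_chapters.py`; the helper names are hypothetical and pages are assumed to be 1-based and inclusive.

```python
from pathlib import Path


def zero_based_indices(start_page: int, end_page: int) -> range:
    """Convert a 1-based inclusive page range to 0-based page indices."""
    if start_page < 1 or end_page < start_page:
        raise ValueError(f"invalid page range: {start_page}-{end_page}")
    return range(start_page - 1, end_page)


def write_page_range(
    pdf_path: str, out_path: str, start_page: int, end_page: int
) -> None:
    """Copy pages start_page..end_page of pdf_path into a new PDF."""
    from pypdf import PdfReader, PdfWriter  # imported lazily; requires pypdf

    reader = PdfReader(pdf_path)
    if end_page > len(reader.pages):
        raise ValueError(f"end page {end_page} is past the last page")
    writer = PdfWriter()
    for index in zero_based_indices(start_page, end_page):
        writer.add_page(reader.pages[index])
    # Create the output directory if needed, then write the new file.
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as handle:
        writer.write(handle)
```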