markdown-tools
Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with markitdown utility.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install nguyendinhquocx-code-ai-markdown-tools
Repository
Skill path: skills/markdown-tools
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: nguyendinhquocx.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install markdown-tools into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/nguyendinhquocx/code-ai before adding markdown-tools to shared team environments
- Use markdown-tools for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: markdown-tools
description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with markitdown utility.
---
# Markdown Tools
Convert documents to markdown with image extraction and Windows/WSL path handling.
## Quick Start
### Install markitdown with PDF Support
```bash
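# IMPORTANT: install the extras for the formats you convert
# ([pdf], [docx], [pptx]), or use [all] for every optional converter
uv tool install "markitdown[pdf,docx,pptx]"
# Or via pip
pip install "markitdown[pdf,docx,pptx]"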
```
### Basic Conversion
```bash
markitdown "document.pdf" -o output.md
# Or redirect: markitdown "document.pdf" > output.md
```
## PDF Conversion with Images
markitdown extracts text only. For PDFs with images, use this workflow:
### Step 1: Convert Text
```bash
markitdown "document.pdf" -o output.md
```
### Step 2: Extract Images
```bash
# Create assets directory alongside the markdown
mkdir -p assets
# Extract images using PyMuPDF
uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
```
### Step 3: Add Image References
Insert image references in the markdown where needed:
```markdown
![Page 1 image](assets/img_page1_1.png)
```
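Inserting references by hand gets tedious for image-heavy PDFs. A minimal Python sketch, assuming the `img_page{N}_{M}.{ext}` names written by `scripts/extract_pdf_images.py` and an `assets/` folder next to the markdown file, that turns the extracted file names into image references ordered by page. `image_references` is a hypothetical helper, not part of the skill, and the alt text is a placeholder to replace with a real description.

```python
import re

# Matches the names written by extract_pdf_images.py, e.g. img_page3_2.png
NAME_RE = re.compile(r"^img_page(\d+)_(\d+)\.(\w+)$")


def image_references(filenames: list[str], assets_dir: str = "assets") -> str:
    """Return one markdown image reference per extracted image, ordered by page."""
    entries = []
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            continue  # not produced by the extraction script
        entries.append((int(match.group(1)), int(match.group(2)), name))

    lines = []
    for page, index, name in sorted(entries):
        # Placeholder alt text: replace it with a real description
        lines.append(f"![Page {page} image {index}]({assets_dir}/{name})")
    return "\n".join(lines)
```

Sorting on the parsed page and image numbers keeps `img_page10_1` after `img_page2_1`, which a plain string sort would not. For example, `print(image_references(os.listdir("assets")))`, then paste each line where that image belongs in the text.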
### Step 4: Format Cleanup
markitdown output often needs manual fixes:
- Add proper heading levels (`#`, `##`, `###`)
- Reconstruct tables in markdown format
- Fix broken line breaks
- Restore indentation structure
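Of these fixes, broken line breaks are the easiest to partly automate. A rough sketch, not part of markitdown, assuming that hard-wrapped paragraph lines should be joined with a space while blank lines, headings, list items, table rows, block quotes and fenced code blocks opened with three backticks are left alone. `unwrap_paragraphs` is a hypothetical helper, and its output still needs a manual review.

```python
import re

# Lines that start a markdown block and must not be merged into the previous line
BLOCK_START = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+[.)]\s|>|\|)")


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped paragraph lines; leave structural lines untouched."""
    out: list[str] = []
    in_fence = False
    joinable = False  # True when the last output line can absorb a continuation
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence  # toggle on opening/closing fence
            out.append(line)
            joinable = False
            continue
        if in_fence or not stripped:
            out.append(line)  # code and blank lines are kept verbatim
            joinable = False
            continue
        if BLOCK_START.match(line):
            out.append(line)
            # Wrapped list items and quotes may continue; headings and table rows may not
            joinable = not stripped.startswith(("#", "|"))
            continue
        if joinable:
            out[-1] = out[-1].rstrip() + " " + stripped
        else:
            out.append(line)
            joinable = True
    return "\n".join(out)
```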
## Path Conversion (Windows/WSL)
```bash
# Windows → WSL conversion
# C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf
# Use the helper script (single quotes keep the backslashes literal)
python scripts/convert_path.py 'C:\Users\name\Documents\file.pdf'
```
## Common Issues
**"dependencies needed to read .pdf files"**
```bash
# Install with PDF support
uv tool install "markitdown[pdf]" --force
```
**FontBBox warnings during PDF conversion**
- These are harmless font-parsing warnings; the text output is generally unaffected
**Images missing from output**
- Use `scripts/extract_pdf_images.py` to extract images separately
## Resources
- `scripts/extract_pdf_images.py` - Extract images from PDF using PyMuPDF
- `scripts/convert_path.py` - Windows to WSL path converter
- `references/conversion-examples.md` - Detailed examples for batch operations
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/extract_pdf_images.py
```python
#!/usr/bin/env python3
"""
Extract images from PDF files using PyMuPDF.

Usage:
    uv run --with pymupdf python extract_pdf_images.py <pdf_path> [output_dir]

Examples:
    uv run --with pymupdf python extract_pdf_images.py document.pdf
    uv run --with pymupdf python extract_pdf_images.py document.pdf ./assets

Output:
    Images are saved to output_dir (default: ./assets) with names like:
    - img_page1_1.png
    - img_page2_1.png
"""
import sys
import os


def extract_images(pdf_path: str, output_dir: str = "assets") -> list[str]:
    """
    Extract all images from a PDF file.

    Args:
        pdf_path: Path to the PDF file
        output_dir: Directory to save extracted images

    Returns:
        List of extracted image file paths
    """
    try:
        import fitz  # PyMuPDF
    except ImportError:
        print("Error: PyMuPDF not installed. Run with:")
        print('  uv run --with pymupdf python extract_pdf_images.py <pdf_path>')
        sys.exit(1)

    os.makedirs(output_dir, exist_ok=True)
    doc = fitz.open(pdf_path)
    extracted_files = []

    for page_num in range(len(doc)):
        page = doc[page_num]
        image_list = page.get_images()

        for img_index, img in enumerate(image_list):
            xref = img[0]
            base_image = doc.extract_image(xref)
            if not base_image:
                continue  # this xref could not be extracted as an image
            image_bytes = base_image["image"]
            image_ext = base_image["ext"]

            # Create descriptive filename
            img_filename = f"img_page{page_num + 1}_{img_index + 1}.{image_ext}"
            img_path = os.path.join(output_dir, img_filename)

            with open(img_path, "wb") as f:
                f.write(image_bytes)

            extracted_files.append(img_path)
            print(f"Extracted: {img_filename} ({len(image_bytes):,} bytes)")

    doc.close()
    print(f"\nTotal: {len(extracted_files)} images extracted to {output_dir}/")
    return extracted_files


def main():
    if len(sys.argv) < 2 or sys.argv[1] in ("-h", "--help"):
        print("Extract images from PDF files using PyMuPDF.")
        print()
        print("Usage: python extract_pdf_images.py <pdf_path> [output_dir]")
        print()
        print("Arguments:")
        print("  pdf_path    Path to the PDF file")
        print("  output_dir  Directory to save images (default: ./assets)")
        print()
        print("Example:")
        print("  uv run --with pymupdf python extract_pdf_images.py document.pdf ./assets")
        sys.exit(0 if "--help" in sys.argv or "-h" in sys.argv else 1)

    pdf_path = sys.argv[1]
    output_dir = sys.argv[2] if len(sys.argv) > 2 else "assets"

    if not os.path.exists(pdf_path):
        print(f"Error: File not found: {pdf_path}")
        sys.exit(1)

    extract_images(pdf_path, output_dir)


if __name__ == "__main__":
    main()
```
### scripts/convert_path.py
```python
#!/usr/bin/env python3
"""
Convert Windows paths to WSL format.

Usage:
    python convert_path.py "C:\\Users\\username\\Downloads\\file.doc"

Output:
    /mnt/c/Users/username/Downloads/file.doc
"""
import sys
import re


def convert_windows_to_wsl(windows_path: str) -> str:
    """
    Convert a Windows path to WSL format.

    Args:
        windows_path: Windows path (e.g., "C:\\Users\\username\\file.doc")

    Returns:
        WSL path (e.g., "/mnt/c/Users/username/file.doc")
    """
    # Remove quotes if present
    path = windows_path.strip('"').strip("'")

    # Handle drive letter (C:\ or C:/)
    drive_pattern = r'^([A-Za-z]):[\\\/]'
    match = re.match(drive_pattern, path)

    if not match:
        # Already a WSL path or relative path
        return path

    drive_letter = match.group(1).lower()
    path_without_drive = path[3:]  # Remove "C:\"

    # Replace backslashes with forward slashes
    path_without_drive = path_without_drive.replace('\\', '/')

    # Construct WSL path
    wsl_path = f"/mnt/{drive_letter}/{path_without_drive}"

    return wsl_path


def main():
    if len(sys.argv) < 2:
        print("Usage: python convert_path.py <windows_path>")
        print('Example: python convert_path.py "C:\\Users\\username\\Downloads\\file.doc"')
        sys.exit(1)

    windows_path = sys.argv[1]
    wsl_path = convert_windows_to_wsl(windows_path)
    print(wsl_path)


if __name__ == "__main__":
    main()
```
### references/conversion-examples.md
````markdown
# Document Conversion Examples
Comprehensive examples for converting various document formats to markdown.
## Basic Document Conversions
### PDF to Markdown
```bash
# Simple PDF conversion
markitdown "document.pdf" > output.md
# WSL path example
markitdown "/mnt/c/Users/username/Documents/report.pdf" > report.md
# With explicit output file
markitdown "slides.pdf" -o "slides.md"
```
### Word Documents to Markdown
```bash
# Modern Word document (.docx)
markitdown "document.docx" > output.md
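# Legacy Word document (.doc): support depends on the file; if markitdown
# rejects it, save it as .docx first (see Confluence Export Handling below)
markitdown "legacy-doc.doc" > output.md
# Write the output to another directory (the directory must already exist)
markitdown "/path/to/docs/file.docx" > "/path/to/output/file.md"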
```
### PowerPoint to Markdown
```bash
# Convert presentation
markitdown "presentation.pptx" > slides.md
# WSL path
markitdown "/mnt/c/Users/username/Desktop/slides.pptx" > slides.md
```
---
## Windows/WSL Path Conversion
### Basic Path Conversion Rules
```text
# Windows path
C:\Users\username\Documents\file.doc
# WSL equivalent
/mnt/c/Users/username/Documents/file.doc
```
### Conversion Examples
```text
# Backslashes become forward slashes
C:\folder\file.txt
→ /mnt/c/folder/file.txt
# Path with spaces (quote it when passing it to a shell command)
C:\Users\John Doe\Documents\report.pdf
→ "/mnt/c/Users/John Doe/Documents/report.pdf"
# OneDrive path
C:\Users\username\OneDrive\Documents\file.doc
→ "/mnt/c/Users/username/OneDrive/Documents/file.doc"
# Different drive letters
D:\Projects\document.docx
→ /mnt/d/Projects/document.docx
```
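Quoting matters most when a converted path is spliced into a shell command string. A small sketch using Python's standard `shlex.quote`, assuming the command is later run by a POSIX shell such as bash inside WSL; `markitdown_command` is a hypothetical helper name.

```python
import shlex


def markitdown_command(src: str, dst: str) -> str:
    """Build a markitdown command line that survives spaces and quotes in paths."""
    # shlex.quote adds single quotes only when the argument contains characters
    # the shell would interpret (spaces, $, quotes, ...)
    return f"markitdown {shlex.quote(src)} -o {shlex.quote(dst)}"


print(markitdown_command("/mnt/c/Users/John Doe/Documents/report.pdf", "report.md"))
# prints: markitdown '/mnt/c/Users/John Doe/Documents/report.pdf' -o report.md
```

When calling markitdown from Python directly, passing an argument list to `subprocess.run` avoids shell quoting altogether.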
### Using convert_path.py Helper
```bash
# Automatic conversion
python scripts/convert_path.py "C:\Users\username\Downloads\document.doc"
# Output: /mnt/c/Users/username/Downloads/document.doc
# Use in conversion command
wsl_path=$(python scripts/convert_path.py "C:\Users\username\file.docx")
markitdown "$wsl_path" > output.md
```
---
## Batch Conversions
### Convert Multiple Files
```bash
# Convert all PDFs in a directory
mkdir -p /path/to/output
for pdf in /path/to/pdfs/*.pdf; do
    [ -e "$pdf" ] || continue  # skip if the glob matched nothing
    filename=$(basename "$pdf" .pdf)
    markitdown "$pdf" > "/path/to/output/${filename}.md"
done
# Convert all Word documents
for doc in /path/to/docs/*.docx; do
    [ -e "$doc" ] || continue
    filename=$(basename "$doc" .docx)
    markitdown "$doc" > "/path/to/output/${filename}.md"
done
```
### Batch Conversion with Path Conversion
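```powershell
# Windows batch (PowerShell)
# -o makes markitdown write the file from inside WSL, so the /mnt/c/... output
# path is resolved by Linux rather than by PowerShell's ">" redirect.
# Assumes file names without spaces and that C:\Output already exists.
Get-ChildItem "C:\Documents\*.pdf" | ForEach-Object {
    $wslPath = "/mnt/c/Documents/$($_.Name)"
    $outFile = "/mnt/c/Output/$($_.BaseName).md"
    wsl markitdown $wslPath -o $outFile
}
```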
---
## Confluence Export Handling
### Simple Confluence Export
```bash
# Direct conversion for exports without special characters
markitdown "confluence-export.doc" > output.md
```
### Export with Special Characters
For Confluence exports containing special characters:
1. Save the .doc file to an accessible location
2. Try direct conversion first:
   ```bash
   markitdown "confluence-export.doc" > output.md
   ```
3. If special characters cause issues:
   - Open in Word and save as .docx
   - Or use LibreOffice to convert: `libreoffice --headless --convert-to docx confluence-export.doc`
   - Then convert the resulting .docx file
### Handling Encoding Issues
```bash
# Check the source file's encoding
file -i "document.doc"
# Re-encode the converted markdown if it is not UTF-8
# (replace ISO-8859-1 with the encoding that actually applies)
iconv -f ISO-8859-1 -t UTF-8 input.md > output.md
```
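The same check-and-re-encode step can be done in Python. A minimal sketch, assuming that a file which fails to decode as UTF-8 is ISO-8859-1 (Latin-1), the same assumption as the `iconv` example; `ensure_utf8` is a hypothetical helper, and other legacy encodings such as Windows-1252 would need a different `fallback`.

```python
from pathlib import Path


def ensure_utf8(path: str, fallback: str = "iso-8859-1") -> bool:
    """Rewrite a text file as UTF-8 if it is not already valid UTF-8.

    Returns True if the file was re-encoded, False if it was left unchanged.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this decode cannot fail;
        # whether the result is *correct* depends on the real source encoding
        p.write_text(raw.decode(fallback), encoding="utf-8")
        return True
```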
---
## Advanced Conversion Scenarios
### Preserving Directory Structure
```bash
# Mirror directory structure
src_dir="/mnt/c/Users/username/Documents"
out_dir="/path/to/output"
# -print0 with read -d '' handles file names containing spaces
find "$src_dir" -type f -name "*.docx" -print0 | while IFS= read -r -d '' file; do
    # Get relative path
    rel_path="${file#"$src_dir"/}"
    out_file="$out_dir/${rel_path%.docx}.md"
    # Create output directory
    mkdir -p "$(dirname "$out_file")"
    # Convert
    markitdown "$file" > "$out_file"
done
```
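A pathlib version of the same source-to-output mapping, as a sketch: it only plans the pairs (it does not call markitdown) and, as an addition beyond the shell loop, skips outputs that already exist and are non-empty. `plan_mirrored_outputs` is a hypothetical name.

```python
from pathlib import Path


def plan_mirrored_outputs(src_dir: str, out_dir: str,
                          pattern: str = "*.docx") -> list[tuple[Path, Path]]:
    """Map each matching source file to an .md path under out_dir, mirroring subfolders."""
    src_root, out_root = Path(src_dir), Path(out_dir)
    plan = []
    for src in sorted(src_root.rglob(pattern)):
        if not src.is_file():
            continue
        # relative_to keeps the subfolder layout; with_suffix swaps the extension for .md
        out = (out_root / src.relative_to(src_root)).with_suffix(".md")
        if out.exists() and out.stat().st_size > 0:
            continue  # already converted
        plan.append((src, out))
    return plan
```

Each planned pair can then be converted with `out.parent.mkdir(parents=True, exist_ok=True)` followed by `subprocess.run(["markitdown", str(src), "-o", str(out)], check=True)`.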
### Conversion with Metadata
```bash
# Add frontmatter to converted file
file="document.pdf"
{
    echo "---"
    echo "title: $(basename "$file" .pdf)"
    echo "converted: $(date -I)"
    echo "source: $file"
    echo "---"
    echo ""
    markitdown "$file"
} > output.md
```
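The same frontmatter can be assembled in Python before prepending it to the converted text. A sketch, assuming the three fields used above; each value goes through `json.dumps`, which produces a double-quoted string that YAML also accepts, so titles or paths containing `:` or `#` stay valid. `frontmatter` is a hypothetical helper, and unlike `basename ... .pdf` it strips any extension.

```python
import json
from datetime import date
from pathlib import Path
from typing import Optional


def frontmatter(source: str, converted: Optional[date] = None) -> str:
    """Return a YAML frontmatter block describing a converted file."""
    converted = converted or date.today()
    fields = {
        "title": Path(source).stem,          # file name without extension
        "converted": converted.isoformat(),  # same YYYY-MM-DD format as `date -I`
        "source": source,
    }
    # json.dumps yields a double-quoted scalar, which is also valid YAML
    body = "\n".join(f"{key}: {json.dumps(value, ensure_ascii=False)}"
                     for key, value in fields.items())
    return f"---\n{body}\n---\n\n"
```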
---
## Error Recovery
### Handling Failed Conversions
```bash
# Check if markitdown succeeded
if markitdown "document.pdf" > output.md 2> error.log; then
    echo "Conversion successful"
else
    echo "Conversion failed, check error.log"
fi
```
### Retry Logic
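```bash
# Retry failed conversions (skips outputs that exist and are non-empty)
for file in *.pdf; do
    output="${file%.pdf}.md"
    if [ ! -s "$output" ]; then
        echo "Converting $file..."
        # The redirect creates $output even if markitdown fails, so remove it
        # on failure to make sure the next run retries this file
        markitdown "$file" > "$output" || { echo "Failed: $file" >> failed.txt; rm -f "$output"; }
    fi
done
```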
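Shell redirection creates the output file before markitdown runs, so a failed run can leave an empty `.md` file behind. A sketch of the resume step in Python that treats a missing *or empty* output as still pending; `pending_conversions` is a hypothetical name, and the `.md` next to each source mirrors the naming used in the loop.

```python
from pathlib import Path


def pending_conversions(directory: str, pattern: str = "*.pdf") -> list[Path]:
    """Return source files whose .md output is missing or empty."""
    pending = []
    for src in sorted(Path(directory).glob(pattern)):
        out = src.with_suffix(".md")
        # An empty output usually means an earlier conversion failed
        # after the redirect had already created the file
        if not out.exists() or out.stat().st_size == 0:
            pending.append(src)
    return pending
```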
---
## Quality Verification
### Check Conversion Quality
```bash
# Line count (a very low count suggests little text was extracted)
wc -l output.md
# Check for common issues (-E enables "|" alternation portably)
grep -nE "TODO|ERROR|MISSING" output.md
# Preview first/last lines
head -n 20 output.md
tail -n 20 output.md
```
### Validate Output
```bash
# Check for empty files
if [ ! -s output.md ]; then
    echo "Warning: Output file is empty"
fi
# Verify markdown syntax with a linter, if one is installed
if command -v markdownlint >/dev/null 2>&1; then
    markdownlint output.md
fi
```
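These checks can be bundled into one function for batch reports. A sketch, assuming the same `TODO`/`ERROR`/`MISSING` markers as the `grep` above and an arbitrary five-line threshold for "suspiciously short" output; `check_markdown` is a hypothetical helper and does not replace a real markdown linter.

```python
import re

ISSUE_MARKERS = re.compile(r"TODO|ERROR|MISSING")


def check_markdown(text: str, min_lines: int = 5) -> list[str]:
    """Return human-readable warnings for a converted markdown string."""
    if not text.strip():
        return ["output is empty"]
    warnings = []
    lines = text.splitlines()
    if len(lines) < min_lines:
        warnings.append(f"only {len(lines)} lines; text may not have been extracted")
    for number, line in enumerate(lines, start=1):
        if ISSUE_MARKERS.search(line):
            warnings.append(f"line {number}: {line.strip()}")
    return warnings
```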
---
## Best Practices
### 1. Path Handling
- Always quote paths with spaces
- Verify paths exist before conversion
- Use absolute paths for scripts
### 2. Batch Processing
- Log conversions for audit trail
- Handle errors gracefully
- Preserve original files
### 3. Output Organization
- Mirror source directory structure
- Use consistent naming conventions
- Separate by document type or date
### 4. Quality Assurance
- Spot-check random conversions
- Validate critical documents manually
- Keep conversion logs
### 5. Performance
- Use parallel processing for large batches
- Skip already converted files
- Clean up temporary files
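For the parallel-processing point, a sketch using the standard `concurrent.futures.ThreadPoolExecutor`. Threads are sufficient here because each conversion runs in its own markitdown process; failures are collected instead of stopping the batch. `run_markitdown` and `convert_all` are hypothetical helpers, the `-o` output flag is the one used in Basic Conversion, and the worker count of 4 is an arbitrary default.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_markitdown(src: str, dst: str) -> None:
    """Convert one file with the markitdown CLI; raises on a non-zero exit."""
    subprocess.run(["markitdown", src, "-o", dst], check=True, capture_output=True)


def convert_all(jobs: list[tuple[str, str]],
                convert: Callable[[str, str], None] = run_markitdown,
                workers: int = 4) -> list[str]:
    """Run conversions concurrently and return the sources that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert, src, dst): src for src, dst in jobs}
        for future, src in futures.items():
            try:
                future.result()
            except Exception:  # keep going; collect failures for a retry pass
                failed.append(src)
    return failed
```

The `convert` parameter makes it easy to swap in a different converter or a stub when testing the batch logic.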
---
## Common Patterns
### Pattern: Convert and Review
```bash
#!/bin/bash
file="$1"
output="${file%.*}.md"
# Convert
markitdown "$file" > "$output"
# Open in editor for review
${EDITOR:-vim} "$output"
```
### Pattern: Safe Conversion
```bash
#!/bin/bash
file="$1"
output="${file%.*}.md"
tmp="${output}.tmp"
# markitdown only reads the source file, so protect the output instead:
# write to a temporary file and replace $output only on success
if markitdown "$file" > "$tmp" 2> conversion.log; then
    mv "$tmp" "$output"
    echo "Success: $output"
else
    rm -f "$tmp"
    echo "Failed: Check conversion.log"
fi
```
### Pattern: Metadata Preservation
```bash
#!/bin/bash
# Extract and preserve document metadata
file="$1"
output="${file%.*}.md"
# Get file metadata
created=$(stat -c %w "$file" 2>/dev/null || stat -f %SB "$file")
modified=$(stat -c %y "$file" 2>/dev/null || stat -f %Sm "$file")
# Convert with metadata
{
echo "---"
echo "original_file: $(basename "$file")"
echo "created: $created"
echo "modified: $modified"
echo "converted: $(date -I)"
echo "---"
echo ""
markitdown "$file"
} > "$output"
```
````