markdown-tools

Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with markitdown utility.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 2.

Hot score: 79.

Updated: March 20, 2026.

Overall rating: C2.0.

Composite score: 2.0.

Best-practice grade: N/A.

Install command

npx @skill-hub/cli install nguyendinhquocx-code-ai-markdown-tools

Tags: document-conversion, markdown, pdf-processing, cross-platform, automation

Repository

nguyendinhquocx/code-ai

Skill path: skills/markdown-tools


Best for

Primary workflow: Write Technical Docs.

Technical facets: Full Stack, Tech Writer.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: nguyendinhquocx.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install markdown-tools into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/nguyendinhquocx/code-ai before adding markdown-tools to shared team environments
  • Use markdown-tools for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: markdown-tools
description: Converts documents to markdown (PDFs, Word docs, PowerPoint, Confluence exports) with Windows/WSL path handling. Activates when converting .doc/.docx/PDF/PPTX files to markdown, processing Confluence exports, handling Windows/WSL path conversions, extracting images from PDFs, or working with markitdown utility.
---

# Markdown Tools

Convert documents to markdown with image extraction and Windows/WSL path handling.

## Quick Start

### Install markitdown with PDF Support

```bash
# IMPORTANT: Use [pdf] extra for PDF support
uv tool install "markitdown[pdf]"

# Or via pip
pip install "markitdown[pdf]"
```

### Basic Conversion

```bash
markitdown "document.pdf" -o output.md
# Or redirect: markitdown "document.pdf" > output.md
```

## PDF Conversion with Images

markitdown extracts text only. For PDFs with images, use this workflow:

### Step 1: Convert Text

```bash
markitdown "document.pdf" -o output.md
```

### Step 2: Extract Images

```bash
# Create assets directory alongside the markdown
mkdir -p assets

# Extract images using PyMuPDF
uv run --with pymupdf python scripts/extract_pdf_images.py "document.pdf" ./assets
```

### Step 3: Add Image References

Insert image references in the markdown where needed:

```markdown
![Description](assets/img_page1_1.png)
```
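
When a PDF yields many images, writing the references by hand gets tedious. The sketch below generates one reference line per extracted file, assuming the `img_page<N>_<M>.<ext>` naming produced by `scripts/extract_pdf_images.py`. The function name `image_references` and the alt-text wording are illustrative choices, not part of the skill.

```python
import re
from pathlib import Path

# Matches the naming scheme used by scripts/extract_pdf_images.py,
# e.g. img_page3_2.png -> page 3, image 2.
IMAGE_NAME = re.compile(r"^img_page(\d+)_(\d+)\.\w+$")


def image_references(assets_dir: str) -> list[str]:
    """Return one markdown image line per extracted file, in page order."""
    found = []
    for path in Path(assets_dir).iterdir():
        match = IMAGE_NAME.match(path.name)
        if match:
            page, index = int(match.group(1)), int(match.group(2))
            found.append((page, index, path.name))
    # Sort numerically so page 10 comes after page 2.
    found.sort()
    folder = Path(assets_dir).name
    return [
        f"![Page {page}, image {index}]({folder}/{name})"
        for page, index, name in found
    ]


if __name__ == "__main__":
    if Path("assets").is_dir():
        print("\n".join(image_references("assets")))
```

The generated lines still have to be moved to the right place in the text by hand, and the placeholder alt text should be replaced with a real description.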

### Step 4: Format Cleanup

markitdown output often needs manual fixes:
- Add proper heading levels (`#`, `##`, `###`)
- Reconstruct tables in markdown format
- Fix broken line breaks
- Restore indentation structure
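
Part of the line-break cleanup can be automated. The following is a minimal sketch, not a feature of markitdown: it merges consecutive plain-text lines into one paragraph and leaves headings, list items, table rows, block quotes, and fenced code untouched. The name `join_wrapped_lines` and the heuristics are assumptions, and a wrapped continuation of a list item becomes its own paragraph line rather than being merged into the item, so review the result.

```python
import re

# Lines that start a markdown structure and must not be merged into a paragraph.
STRUCTURAL = re.compile(r"^\s*(#{1,6}\s|[-*+]\s|\d+[.)]\s|\||>)")


def join_wrapped_lines(text: str) -> str:
    """Merge hard-wrapped lines into paragraphs, keeping structural lines."""
    out: list[str] = []
    in_fence = False
    can_extend = False  # True while the last output line is paragraph text
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            # Fence delimiters toggle code mode and are copied verbatim.
            in_fence = not in_fence
            out.append(line)
            can_extend = False
        elif in_fence or not stripped or STRUCTURAL.match(line):
            out.append(line)
            can_extend = False
        elif can_extend:
            out[-1] = f"{out[-1]} {stripped}"
        else:
            out.append(stripped)
            can_extend = True
    return "\n".join(out)


if __name__ == "__main__":
    print(join_wrapped_lines("The quick brown\nfox jumps.\n\n- item one"))
```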

## Path Conversion (Windows/WSL)

```bash
# Windows → WSL mapping (illustration only, not a command):
#   C:\Users\name\file.pdf → /mnt/c/Users/name/file.pdf

# Use helper script
python scripts/convert_path.py "C:\Users\name\Documents\file.pdf"
```
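
The mapping rule itself is small: lowercase the drive letter, prefix `/mnt/`, and switch to forward slashes. As an illustration, here is a sketch of that rule using `pathlib` instead of a regular expression. It is not the bundled `scripts/convert_path.py`, and the name `windows_to_wsl` is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path: str) -> str:
    """Map an absolute drive-letter path to its /mnt/<drive>/ equivalent."""
    win = PureWindowsPath(path)
    drive = win.drive  # "C:" for drive-letter paths, "" otherwise
    # Leave WSL paths, relative paths, and UNC paths unchanged.
    if len(drive) != 2 or drive[1] != ":" or not win.root:
        return path
    # parts[0] is the anchor ("C:\\"); the rest are the path components.
    return str(PurePosixPath("/mnt", drive[0].lower(), *win.parts[1:]))


if __name__ == "__main__":
    print(windows_to_wsl(r"C:\Users\name\file.pdf"))
```

`PureWindowsPath` accepts both `\` and `/` as separators, so `D:/Projects/file.docx` is handled the same way as `D:\Projects\file.docx`.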

## Common Issues

**"dependencies needed to read .pdf files"**
```bash
# Install with PDF support
uv tool install "markitdown[pdf]" --force
```

**FontBBox warnings during PDF conversion**
- These are harmless font-parsing warnings; the converted output is still correct

**Images missing from output**
- Use `scripts/extract_pdf_images.py` to extract images separately

## Resources

- `scripts/extract_pdf_images.py` - Extract images from PDF using PyMuPDF
- `scripts/convert_path.py` - Windows to WSL path converter
- `references/conversion-examples.md` - Detailed examples for batch operations


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/extract_pdf_images.py

```python
#!/usr/bin/env python3
"""
Extract images from PDF files using PyMuPDF.

Usage:
    uv run --with pymupdf python extract_pdf_images.py <pdf_path> [output_dir]

Examples:
    uv run --with pymupdf python extract_pdf_images.py document.pdf
    uv run --with pymupdf python extract_pdf_images.py document.pdf ./assets

Output:
    Images are saved to output_dir (default: ./assets) with names like:
    - img_page1_1.png
    - img_page2_1.jpeg
    The extension follows each image's embedded format, so it is not always .png.
"""

import sys
import os

def extract_images(pdf_path: str, output_dir: str = "assets") -> list[str]:
    """
    Extract all images from a PDF file.

    Args:
        pdf_path: Path to the PDF file
        output_dir: Directory to save extracted images

    Returns:
        List of extracted image file paths
    """
    try:
        import fitz  # PyMuPDF
    except ImportError:
        print("Error: PyMuPDF not installed. Run with:")
        print('  uv run --with pymupdf python extract_pdf_images.py <pdf_path>')
        sys.exit(1)

    os.makedirs(output_dir, exist_ok=True)

    doc = fitz.open(pdf_path)
    extracted_files = []

    for page_num in range(len(doc)):
        page = doc[page_num]
        image_list = page.get_images()

        for img_index, img in enumerate(image_list):
            xref = img[0]
            base_image = doc.extract_image(xref)
            image_bytes = base_image["image"]
            image_ext = base_image["ext"]

            # Create descriptive filename
            img_filename = f"img_page{page_num + 1}_{img_index + 1}.{image_ext}"
            img_path = os.path.join(output_dir, img_filename)

            with open(img_path, "wb") as f:
                f.write(image_bytes)

            extracted_files.append(img_path)
            print(f"Extracted: {img_filename} ({len(image_bytes):,} bytes)")

    doc.close()

    print(f"\nTotal: {len(extracted_files)} images extracted to {output_dir}/")
    return extracted_files


def main():
    if len(sys.argv) < 2 or sys.argv[1] in ("-h", "--help"):
        print("Extract images from PDF files using PyMuPDF.")
        print()
        print("Usage: python extract_pdf_images.py <pdf_path> [output_dir]")
        print()
        print("Arguments:")
        print("  pdf_path    Path to the PDF file")
        print("  output_dir  Directory to save images (default: ./assets)")
        print()
        print("Example:")
        print("  uv run --with pymupdf python extract_pdf_images.py document.pdf ./assets")
        sys.exit(0 if "--help" in sys.argv or "-h" in sys.argv else 1)

    pdf_path = sys.argv[1]
    output_dir = sys.argv[2] if len(sys.argv) > 2 else "assets"

    if not os.path.exists(pdf_path):
        print(f"Error: File not found: {pdf_path}")
        sys.exit(1)

    extract_images(pdf_path, output_dir)


if __name__ == "__main__":
    main()

```
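
Because the script walks the image list of every page, an image that several pages reference (a logo in a page header, for example) is written once per page. The sketch below is an optional follow-up step, not part of the skill: it groups byte-identical files in the output directory so the repeats can be reviewed or deleted. The name `find_duplicate_images` is hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_images(assets_dir: str) -> list[list[str]]:
    """Group files in assets_dir whose bytes are identical."""
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(assets_dir).iterdir()):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.name)
    # Keep only digests shared by more than one file.
    return [names for names in by_digest.values() if len(names) > 1]


if __name__ == "__main__":
    if Path("assets").is_dir():
        for group in find_duplicate_images("assets"):
            print("Identical:", ", ".join(group))
```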

### scripts/convert_path.py

```python
#!/usr/bin/env python3
"""
Convert Windows paths to WSL format.

Usage:
    python convert_path.py "C:\\Users\\username\\Downloads\\file.doc"

Output:
    /mnt/c/Users/username/Downloads/file.doc
"""

import sys
import re


def convert_windows_to_wsl(windows_path: str) -> str:
    """
    Convert a Windows path to WSL format.

    Args:
        windows_path: Windows path (e.g., "C:\\Users\\username\\file.doc")

    Returns:
        WSL path (e.g., "/mnt/c/Users/username/file.doc")
    """
    # Remove quotes if present
    path = windows_path.strip('"').strip("'")

    # Handle drive letter (C:\ or C:/)
    drive_pattern = r'^([A-Za-z]):[\\\/]'
    match = re.match(drive_pattern, path)

    if not match:
        # Already a WSL path or relative path
        return path

    drive_letter = match.group(1).lower()
    path_without_drive = path[3:]  # Remove "C:\"

    # Replace backslashes with forward slashes
    path_without_drive = path_without_drive.replace('\\', '/')

    # Construct WSL path
    wsl_path = f"/mnt/{drive_letter}/{path_without_drive}"

    return wsl_path


def main():
    if len(sys.argv) < 2:
        print("Usage: python convert_path.py <windows_path>")
        print('Example: python convert_path.py "C:\\Users\\username\\Downloads\\file.doc"')
        sys.exit(1)

    windows_path = sys.argv[1]
    wsl_path = convert_windows_to_wsl(windows_path)
    print(wsl_path)


if __name__ == "__main__":
    main()
```

### references/conversion-examples.md

```markdown
# Document Conversion Examples

Comprehensive examples for converting various document formats to markdown.

## Basic Document Conversions

### PDF to Markdown

```bash
# Simple PDF conversion
markitdown "document.pdf" > output.md

# WSL path example
markitdown "/mnt/c/Users/username/Documents/report.pdf" > report.md

# With explicit output file (-o instead of shell redirection)
markitdown "slides.pdf" -o "slides.md"
```

### Word Documents to Markdown

```bash
# Modern Word document (.docx)
markitdown "document.docx" > output.md

# Legacy Word document (.doc)
markitdown "legacy-doc.doc" > output.md

# Preserve directory structure
markitdown "/path/to/docs/file.docx" > "/path/to/output/file.md"
```

### PowerPoint to Markdown

```bash
# Convert presentation
markitdown "presentation.pptx" > slides.md

# WSL path
markitdown "/mnt/c/Users/username/Desktop/slides.pptx" > slides.md
```

---

## Windows/WSL Path Conversion

### Basic Path Conversion Rules

```text
# Windows path
C:\Users\username\Documents\file.doc

# WSL equivalent
/mnt/c/Users/username/Documents/file.doc
```

### Conversion Examples

```text
# Single backslash to forward slash
C:\folder\file.txt
→ /mnt/c/folder/file.txt

# Path with spaces (must use quotes)
C:\Users\John Doe\Documents\report.pdf
→ "/mnt/c/Users/John Doe/Documents/report.pdf"

# OneDrive path
C:\Users\username\OneDrive\Documents\file.doc
→ "/mnt/c/Users/username/OneDrive/Documents/file.doc"

# Different drive letters
D:\Projects\document.docx
→ /mnt/d/Projects/document.docx
```

### Using convert_path.py Helper

```bash
# Automatic conversion
python scripts/convert_path.py "C:\Users\username\Downloads\document.doc"
# Output: /mnt/c/Users/username/Downloads/document.doc

# Use in conversion command
wsl_path=$(python scripts/convert_path.py "C:\Users\username\file.docx")
markitdown "$wsl_path" > output.md
```

---

## Batch Conversions

### Convert Multiple Files

```bash
# Convert all PDFs in a directory
for pdf in /path/to/pdfs/*.pdf; do
  filename=$(basename "$pdf" .pdf)
  markitdown "$pdf" > "/path/to/output/${filename}.md"
done

# Convert all Word documents
for doc in /path/to/docs/*.docx; do
  filename=$(basename "$doc" .docx)
  markitdown "$doc" > "/path/to/output/${filename}.md"
done
```

### Batch Conversion with Path Conversion

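```powershell
# Windows batch (PowerShell); markitdown runs inside WSL
Get-ChildItem "C:\Documents\*.pdf" | ForEach-Object {
  $wslPath = "/mnt/c/Documents/$($_.Name)"
  $outFile = "/mnt/c/Output/$($_.BaseName).md"
  # Write with -o inside WSL; a PowerShell ">" would resolve $outFile as a Windows path
  wsl markitdown $wslPath -o $outFile
}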
```

---

## Confluence Export Handling

### Simple Confluence Export

```bash
# Direct conversion for exports without special characters
markitdown "confluence-export.doc" > output.md
```

### Export with Special Characters

For Confluence exports containing special characters:

1. Save the .doc file to an accessible location
2. Try direct conversion first:
   ```bash
   markitdown "confluence-export.doc" > output.md
   ```

3. If special characters cause issues:
   - Open in Word and save as .docx
   - Or use LibreOffice to convert: `libreoffice --headless --convert-to docx export.doc`
   - Then convert the .docx file

### Handling Encoding Issues

```bash
# Check file encoding
file -i "document.doc"

# Convert if needed (using iconv)
iconv -f ISO-8859-1 -t UTF-8 input.md > output.md
```

---

## Advanced Conversion Scenarios

### Preserving Directory Structure

```bash
# Mirror directory structure
src_dir="/mnt/c/Users/username/Documents"
out_dir="/path/to/output"

find "$src_dir" -name "*.docx" | while IFS= read -r file; do
  # Get relative path
  rel_path="${file#"$src_dir"/}"
  out_file="$out_dir/${rel_path%.docx}.md"

  # Create output directory
  mkdir -p "$(dirname "$out_file")"

  # Convert
  markitdown "$file" > "$out_file"
done
```
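
The same mirroring logic can be sketched in Python, which avoids shell quoting pitfalls with unusual file names. The helper below only plans the work: it pairs each source file with its mirrored `.md` path and leaves the conversion to the caller. The name `plan_outputs` and its parameters are illustrative, not part of the skill.

```python
from pathlib import Path


def plan_outputs(
    src_dir: str, out_dir: str, suffix: str = ".docx"
) -> list[tuple[Path, Path]]:
    """Pair each source file under src_dir with its mirrored .md path."""
    src_root, out_root = Path(src_dir), Path(out_dir)
    pairs = []
    for source in sorted(src_root.rglob(f"*{suffix}")):
        # Keep the path relative to the source root, swap the extension.
        relative = source.relative_to(src_root)
        pairs.append((source, out_root / relative.with_suffix(".md")))
    return pairs


if __name__ == "__main__":
    for source, target in plan_outputs(".", "converted"):
        print(f"{source} -> {target}")
```

Before converting each pair, create the target directory with `target.parent.mkdir(parents=True, exist_ok=True)`, which matches the `mkdir -p` step in the shell loop.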

### Conversion with Metadata

```bash
# Add frontmatter to converted file
file="document.pdf"
{
  echo "---"
  echo "title: $(basename "$file" .pdf)"
  echo "converted: $(date +%F)"
  echo "source: $file"
  echo "---"
  echo ""
  markitdown "$file"
} > output.md
```
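
One caveat with the shell version: values are written unquoted, so a file name containing a colon (for example `Q3: report.pdf`) produces invalid YAML. A minimal Python sketch that quotes every value is shown below. The function name `front_matter` is hypothetical, the field names follow the shell snippet above, and the date is passed in so the output is reproducible.

```python
from datetime import date
from pathlib import Path


def front_matter(source: str, converted: date) -> str:
    """Build a YAML front matter block for a converted document."""
    fields = {
        "title": Path(source).stem,
        "converted": converted.isoformat(),
        "source": source,
    }
    lines = ["---"]
    for key, value in fields.items():
        # Double-quote values so colons or '#' in file names stay valid YAML.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{key}: "{escaped}"')
    lines.append("---")
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(front_matter("docs/Q3: report.pdf", date.today()), end="")
```

Prepend the returned string to the markdown text produced by the conversion.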

---

## Error Recovery

### Handling Failed Conversions

```bash
# Check if markitdown succeeded
if markitdown "document.pdf" > output.md 2> error.log; then
  echo "Conversion successful"
else
  echo "Conversion failed, check error.log"
fi
```

### Retry Logic

```bash
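# Retry failed conversions (skip files that already have non-empty output)
for file in *.pdf; do
  output="${file%.pdf}.md"
  if ! [ -s "$output" ]; then
    echo "Converting $file..."
    # Remove the partial output on failure so the next run retries this file
    markitdown "$file" > "$output" || { rm -f "$output"; echo "Failed: $file" >> failed.txt; }
  fi
done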
```
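
A `>` redirect creates the output file before the converter runs, so the mere existence of an `.md` file does not prove that a conversion succeeded. The sketch below lists the sources that still need work by treating a missing or empty output as pending. The name `pending_conversions` is hypothetical, and it assumes outputs sit next to their sources as in the loop above.

```python
from pathlib import Path


def pending_conversions(folder: str, suffix: str = ".pdf") -> list[Path]:
    """Return sources whose .md output is missing or empty."""
    pending = []
    for source in sorted(Path(folder).glob(f"*{suffix}")):
        output = source.with_suffix(".md")
        if not output.exists() or output.stat().st_size == 0:
            pending.append(source)
    return pending


if __name__ == "__main__":
    for source in pending_conversions("."):
        print(f"Needs conversion: {source}")
```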

---

## Quality Verification

### Check Conversion Quality

```bash
# Count lines in the converted output
wc -l output.md

# Check for common issues
grep -E "TODO|ERROR|MISSING" output.md

# Preview first/last lines
head -n 20 output.md
tail -n 20 output.md
```

### Validate Output

```bash
# Check for empty files
if [ ! -s output.md ]; then
  echo "Warning: Output file is empty"
fi

# Verify markdown syntax
# Use a markdown linter if available
markdownlint output.md
```
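
If no linter is installed, a rough structural summary can still flag suspicious output, such as a long document with no headings. The following sketch counts a few markdown elements while skipping fenced code. The name `summarize_markdown` and the choice of elements are assumptions, and the patterns are deliberately simple rather than a full markdown parser.

```python
import re


def summarize_markdown(text: str) -> dict[str, int]:
    """Count basic structural elements in converted markdown."""
    counts = {"lines": 0, "headings": 0, "images": 0, "table_rows": 0}
    in_fence = False
    for line in text.splitlines():
        counts["lines"] += 1
        stripped = line.strip()
        if stripped.startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # ignore markup-like text inside code blocks
        if re.match(r"#{1,6}\s", stripped):
            counts["headings"] += 1
        if stripped.startswith("|"):
            counts["table_rows"] += 1
        counts["images"] += len(re.findall(r"!\[[^\]]*\]\([^)]+\)", stripped))
    return counts


if __name__ == "__main__":
    print(summarize_markdown("# Title\n\nSome text.\n"))
```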

---

## Best Practices

### 1. Path Handling
- Always quote paths with spaces
- Verify paths exist before conversion
- Use absolute paths for scripts

### 2. Batch Processing
- Log conversions for audit trail
- Handle errors gracefully
- Preserve original files

### 3. Output Organization
- Mirror source directory structure
- Use consistent naming conventions
- Separate by document type or date

### 4. Quality Assurance
- Spot-check random conversions
- Validate critical documents manually
- Keep conversion logs

### 5. Performance
- Use parallel processing for large batches
- Skip already converted files
- Clean up temporary files

---

## Common Patterns

### Pattern: Convert and Review

```bash
#!/bin/bash
file="$1"
output="${file%.*}.md"

# Convert
markitdown "$file" > "$output"

# Open in editor for review
${EDITOR:-vim} "$output"
```

### Pattern: Safe Conversion

```bash
#!/bin/bash
file="$1"
backup="${file}.backup"
output="${file%.*}.md"

# Backup original
cp "$file" "$backup"

# Convert with error handling
if markitdown "$file" > "$output" 2> conversion.log; then
  echo "Success: $output"
  rm "$backup"
else
  echo "Failed: Check conversion.log"
  mv "$backup" "$file"
fi
```

### Pattern: Metadata Preservation

```bash
#!/bin/bash
# Extract and preserve document metadata

file="$1"
output="${file%.*}.md"

# Get file metadata
created=$(stat -c %w "$file" 2>/dev/null || stat -f %SB "$file")
modified=$(stat -c %y "$file" 2>/dev/null || stat -f %Sm "$file")

# Convert with metadata
{
  echo "---"
  echo "original_file: $(basename "$file")"
  echo "created: $created"
  echo "modified: $modified"
  echo "converted: $(date +%F)"
  echo "---"
  echo ""
  markitdown "$file"
} > "$output"
```

```
