Back to skills
SkillHub ClubWrite Technical DocsFull StackBackendTech Writer

paddleocr-doc-parsing

Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars
3,114
Hot score
99
Updated
March 20, 2026
Overall rating
C4.0
Composite score
4.0
Best-practice grade
B73.2

Install command

npx @skill-hub/cli install openclaw-skills-paddleocr-doc-parsing-v2

Repository

openclaw/skills

Skill path: skills/hiotec/paddleocr-doc-parsing-v2

Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.

Open repository

Best for

Primary workflow: Write Technical Docs.

Technical facets: Full Stack, Backend, Tech Writer.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install paddleocr-doc-parsing into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding paddleocr-doc-parsing to shared team environments
  • Use paddleocr-doc-parsing for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: paddleocr-doc-parsing
description: Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.
homepage: https://www.paddleocr.com
metadata:
  {
    "openclaw":
      {
        "emoji": "📄",
        "os": ["darwin", "linux"],
        "requires":
          {
            "bins": ["curl", "base64", "jq", "python3"],
            "env": ["PADDLEOCR_ACCESS_TOKEN", "PADDLEOCR_API_URL"],
          },
      },
  }
---

# PaddleOCR Document Parsing

Parse images and PDF files using PaddleOCR's API. Supports both synchronous and asynchronous parsing modes with structured output.

## Resource Links

| Resource              | Link                                                                           |
| --------------------- | ------------------------------------------------------------------------------ |
| **Official Website**  | [https://www.paddleocr.com](https://www.paddleocr.com)                                     |
| **API Documentation** | [https://ai.baidu.com/ai-doc/AISTUDIO/Cmkz2m0ma](https://ai.baidu.com/ai-doc/AISTUDIO/Cmkz2m0ma)         |
| **GitHub**            | [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) |

## Key Features

- **Multi-format support**: PDF and image files (JPG, PNG, BMP, TIFF)
- **Two parsing modes**:
  - **Sync mode**: Fast response for small files (<600s timeout)
  - **Async mode**: For large files with progress polling
- **Layout analysis**: Automatic detection of text blocks, tables, formulas
- **Multi-language**: Support for 110+ languages
- **Structured output**: Markdown format with preserved document structure

## Setup

1. Visit [PaddleOCR](https://www.paddleocr.com) to obtain your API credentials
2. Set environment variables:

```bash
export PADDLEOCR_ACCESS_TOKEN="your_token_here"
export PADDLEOCR_API_URL="https://your-endpoint.aistudio-app.com/layout-parsing"

# Optional: For async mode
export PADDLEOCR_JOB_URL="https://your-job-endpoint.aistudio-app.com/api/v2/ocr/jobs"
export PADDLEOCR_MODEL="PaddleOCR-VL-1.5"
```

## Usage Examples

### Sync Mode (Default)

For small files and quick processing:

```bash
# Parse local image
{baseDir}/paddleocr_parse.sh document.jpg

# Parse PDF
{baseDir}/paddleocr_parse.sh -t pdf document.pdf

# Parse from URL
{baseDir}/paddleocr_parse.sh https://example.com/document.jpg

# Save output to file
{baseDir}/paddleocr_parse.sh -o result.json document.jpg

# Verbose output
{baseDir}/paddleocr_parse.sh -v document.jpg
```

### Async Mode

For large files with progress tracking:

```bash
# Parse large PDF with async mode
{baseDir}/paddleocr_parse.sh --async large-document.pdf

# Parse from URL with async mode
{baseDir}/paddleocr_parse.sh --async -t pdf https://example.com/doc.pdf

# Save async result to file
{baseDir}/paddleocr_parse.sh --async -o result.json document.pdf
```

### Using Python Script Directly

```bash
# Sync mode
python3 {baseDir}/paddleocr_parse.py document.jpg

# Async mode
python3 {baseDir}/paddleocr_parse.py --async-mode document.pdf

# With output file
python3 {baseDir}/paddleocr_parse.py -o result.json --async-mode document.pdf
```

## Response Structure

```json
{
  "logId": "unique_request_id",
  "errorCode": 0,
  "errorMsg": "Success",
  "result": {
    "layoutParsingResults": [
      {
        "prunedResult": [...],
        "markdown": {
          "text": "# Document Title\n\nParagraph content...",
          "images": {}
        },
        "outputImages": [...],
        "inputImage": "http://input-image"
      }
    ],
    "dataInfo": {...}
  }
}
```

**Important Fields:**

- **`prunedResult`** - Contains detailed layout element information including positions, categories, etc.
- **`markdown`** - Stores the document content converted to Markdown format with preserved structure and formatting.

## Mode Selection Guide

| Use Case | Recommended Mode |
|----------|-----------------|
| Small images (< 10MB) | Sync |
| Single page PDFs | Sync |
| Large PDFs (> 10MB) | Async |
| Multi-page documents | Async |
| Batch processing | Async |
| Quick text extraction | Sync |

## Error Handling

The script will exit with code 1 and print error message for:
- Missing required environment variables
- File not found
- API authentication failures
- Invalid JSON responses
- API error codes (non-zero)

## Quota Information

See official documentation: https://ai.baidu.com/ai-doc/AISTUDIO/Xmjclapam


---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "hiotec",
  "slug": "paddleocr-doc-parsing-v2",
  "displayName": "PaddleOCR Document Parsing V2",
  "latest": {
    "version": "1.0.4",
    "publishedAt": 1770944761798,
    "commit": "https://github.com/openclaw/skills/commit/f97ed62d867d1ce563dae6348d7133412ca7f11d"
  },
  "history": []
}

```

### scripts/paddleocr_parse.py

```python
#!/usr/bin/env python3
"""
PaddleOCR Async Document Parser

Supports both sync and async parsing modes.
"""

import argparse
import base64
import json
import os
import sys
import time
from pathlib import Path

import requests


def get_env_or_exit(name: str) -> str:
    """Get environment variable or exit with error."""
    value = os.environ.get(name)
    if not value:
        print(f"Error: {name} environment variable is required", file=sys.stderr)
        print(f"Set it with: export {name}=\"your_value_here\"", file=sys.stderr)
        sys.exit(1)
    return value


def sync_parse(file_path: str, file_type: int, api_url: str, token: str, verbose: bool = False) -> dict:
    """Synchronous document parsing."""
    
    if file_path.startswith("http"):
        # URL mode
        payload = {
            "file": file_path,
            "fileType": file_type,
            "useDocOrientationClassify": False,
            "useDocUnwarping": False,
        }
    else:
        # Local file mode
        path = Path(file_path)
        if not path.exists():
            print(f"Error: File not found: {file_path}", file=sys.stderr)
            sys.exit(1)
        
        file_bytes = path.read_bytes()
        file_data = base64.b64encode(file_bytes).decode("ascii")
        
        payload = {
            "file": file_data,
            "fileType": file_type,
            "useDocOrientationClassify": False,
            "useDocUnwarping": False,
        }
    
    headers = {
        "Authorization": f"token {token}",
        "Content-Type": "application/json"
    }
    
    if verbose:
        print(f"Making sync request to: {api_url}", file=sys.stderr)
    
    response = requests.post(api_url, json=payload, headers=headers, timeout=600)
    
    if response.status_code != 200:
        print(f"Error: HTTP {response.status_code}", file=sys.stderr)
        print(response.text, file=sys.stderr)
        sys.exit(1)
    
    return response.json()


def async_parse(file_path: str, model: str, job_url: str, token: str, verbose: bool = False) -> dict:
    """Asynchronous document parsing."""
    
    headers = {
        "Authorization": f"bearer {token}",
    }
    
    optional_payload = {
        "useDocOrientationClassify": False,
        "useDocUnwarping": False,
        "useChartRecognition": False,
    }
    
    if verbose:
        print(f"Processing file: {file_path}", file=sys.stderr)
    
    if file_path.startswith("http"):
        # URL Mode
        headers["Content-Type"] = "application/json"
        payload = {
            "fileUrl": file_path,
            "model": model,
            "optionalPayload": optional_payload
        }
        job_response = requests.post(job_url, json=payload, headers=headers)
    else:
        # Local File Mode
        path = Path(file_path)
        if not path.exists():
            print(f"Error: File not found: {file_path}", file=sys.stderr)
            sys.exit(1)
        
        data = {
            "model": model,
            "optionalPayload": json.dumps(optional_payload)
        }
        with open(file_path, "rb") as f:
            files = {"file": f}
            job_response = requests.post(job_url, headers=headers, data=data, files=files)
    
    if verbose:
        print(f"Response status: {job_response.status_code}", file=sys.stderr)
    
    if job_response.status_code != 200:
        print(f"Error: HTTP {job_response.status_code}", file=sys.stderr)
        print(job_response.text, file=sys.stderr)
        sys.exit(1)
    
    job_id = job_response.json()["data"]["jobId"]
    print(f"Job submitted. ID: {job_id}", file=sys.stderr)
    
    # Poll for results
    jsonl_url = ""
    while True:
        job_result = requests.get(f"{job_url}/{job_id}", headers=headers)
        job_result.raise_for_status()
        
        data = job_result.json()["data"]
        state = data["state"]
        
        if state == 'pending':
            if verbose:
                print("Status: pending", file=sys.stderr)
        elif state == 'running':
            try:
                progress = data['extractProgress']
                total = progress['totalPages']
                extracted = progress['extractedPages']
                print(f"Status: running ({extracted}/{total} pages)", file=sys.stderr)
            except KeyError:
                if verbose:
                    print("Status: running...", file=sys.stderr)
        elif state == 'done':
            extracted = data['extractProgress']['extractedPages']
            print(f"Status: done ({extracted} pages extracted)", file=sys.stderr)
            jsonl_url = data['resultUrl']['jsonUrl']
            break
        elif state == "failed":
            error_msg = data.get('errorMsg', 'Unknown error')
            print(f"Error: Job failed - {error_msg}", file=sys.stderr)
            sys.exit(1)
        
        time.sleep(5)
    
    # Fetch JSONL results
    if jsonl_url:
        jsonl_response = requests.get(jsonl_url)
        jsonl_response.raise_for_status()
        
        lines = jsonl_response.text.strip().split('\n')
        results = []
        
        for line in lines:
            line = line.strip()
            if not line:
                continue
            try:
                result = json.loads(line)["result"]
                results.extend(result.get("layoutParsingResults", []))
            except (json.JSONDecodeError, KeyError) as e:
                if verbose:
                    print(f"Warning: Failed to parse line: {e}", file=sys.stderr)
                continue
        
        return {"result": {"layoutParsingResults": results}}
    
    return {}


def main():
    parser = argparse.ArgumentParser(description="Parse documents using PaddleOCR API")
    parser.add_argument("input", help="Input file path or URL")
    parser.add_argument("-t", "--type", choices=["image", "pdf"], default="image",
                       help="File type (default: image)")
    parser.add_argument("-o", "--output", help="Output file (default: stdout)")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose output")
    parser.add_argument("--async-mode", action="store_true", 
                       help="Use async mode (for large files)")
    
    args = parser.parse_args()
    
    # Get configuration from environment
    token = get_env_or_exit("PADDLEOCR_ACCESS_TOKEN")
    
    if args.async_mode:
        job_url = get_env_or_exit("PADDLEOCR_JOB_URL")
        model = os.environ.get("PADDLEOCR_MODEL", "PaddleOCR-VL-1.5")
        result = async_parse(args.input, model, job_url, token, args.verbose)
    else:
        api_url = get_env_or_exit("PADDLEOCR_API_URL")
        file_type_code = 0 if args.type == "pdf" else 1
        result = sync_parse(args.input, file_type_code, api_url, token, args.verbose)
    
    # Output result
    output = json.dumps(result, ensure_ascii=False, indent=2)
    
    if args.output:
        Path(args.output).write_text(output)
        print(f"Output saved to: {args.output}", file=sys.stderr)
    else:
        print(output)


if __name__ == "__main__":
    main()

```

### scripts/paddleocr_parse.sh

```bash
#!/bin/bash

# PaddleOCR Document Parser Script
# Supports both sync and async modes

set -e

# Default values
file_type="image"
output_file=""
verbose="false"
async_mode="false"

# Function to display usage
usage() {
    cat << EOF
Usage: $0 [OPTIONS] INPUT_FILE_PATH_OR_URL

Parse documents using PaddleOCR API

OPTIONS:
    -t, --type TYPE              File type (image, pdf) [default: image]
    -o, --output FILE            Output file [default: stdout]
    -v, --verbose                Verbose output
    --async                      Use async mode (for large files/PDFs)
    -h, --help                   Show this help message

ENVIRONMENT:
    PADDLEOCR_ACCESS_TOKEN       Required: API access token
    PADDLEOCR_API_URL            Required: Sync mode endpoint URL
    PADDLEOCR_JOB_URL            Required for async: Async endpoint URL
    PADDLEOCR_MODEL              Optional: Model name [default: PaddleOCR-VL-1.5]

SETUP:
    1. Visit https://www.paddleocr.com to get API credentials
    2. Set environment variables:
       export PADDLEOCR_ACCESS_TOKEN="your_token"
       export PADDLEOCR_API_URL="https://your-endpoint/layout-parsing"

EXAMPLES:
    # Sync mode (default)
    $0 document.jpg
    $0 -t pdf document.pdf
    $0 -o result.json document.jpg
    
    # Async mode (for large files)
    $0 --async large-document.pdf

EOF
}

# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -t|--type)
            file_type="$2"
            shift 2
            ;;
        -o|--output)
            output_file="$2"
            shift 2
            ;;
        -v|--verbose)
            verbose="true"
            shift
            ;;
        --async)
            async_mode="true"
            shift
            ;;
        -h|--help)
            usage
            exit 0
            ;;
        -*)
            echo "Unknown option: $1"
            usage
            exit 1
            ;;
        *)
            input_file="$1"
            shift
            ;;
    esac
done

# Validate input
if [[ -z "$input_file" ]]; then
    echo "Error: Input file path or URL is required"
    usage
    exit 1
fi

# Check required environment variables
if [[ -z "$PADDLEOCR_ACCESS_TOKEN" ]]; then
    echo "Error: PADDLEOCR_ACCESS_TOKEN environment variable is required"
    echo "Get it from: https://www.paddleocr.com"
    exit 1
fi

if [[ -z "$PADDLEOCR_API_URL" ]]; then
    echo "Error: PADDLEOCR_API_URL environment variable is required"
    echo "Set it to your PaddleOCR API endpoint"
    echo "Example: export PADDLEOCR_API_URL=\"https://your-endpoint.aistudio-app.com/layout-parsing\""
    exit 1
fi

# Set optional defaults
PADDLEOCR_MODEL="${PADDLEOCR_MODEL:-PaddleOCR-VL-1.5}"

# Check if input is a URL or local file
if [[ "$input_file" =~ ^https?:// ]]; then
    is_url="true"
    if [[ "$verbose" == "true" ]]; then
        echo "Input is a URL: $input_file" >&2
    fi
else
    is_url="false"
    if [[ ! -f "$input_file" ]]; then
        echo "Error: Input file not found: $input_file"
        exit 1
    fi
    if [[ "$verbose" == "true" ]]; then
        echo "Input is a local file: $input_file" >&2
    fi
fi

# Use Python script for async mode
if [[ "$async_mode" == "true" ]]; then
    if [[ -z "$PADDLEOCR_JOB_URL" ]]; then
        echo "Error: PADDLEOCR_JOB_URL environment variable is required for async mode"
        echo "Example: export PADDLEOCR_JOB_URL=\"https://your-endpoint.aistudio-app.com/api/v2/ocr/jobs\""
        exit 1
    fi
    
    if [[ "$verbose" == "true" ]]; then
        echo "Using async mode with model: $PADDLEOCR_MODEL" >&2
    fi
    
    # Get script directory
    SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
    
    # Run Python async parser
    if [[ -n "$output_file" ]]; then
        python3 "$SCRIPT_DIR/paddleocr_parse.py" --async-mode -o "$output_file" "$input_file"
    else
        python3 "$SCRIPT_DIR/paddleocr_parse.py" --async-mode "$input_file"
    fi
    exit 0
fi

# Sync mode (bash implementation)

# Validate file type
case "$file_type" in
    image|img) file_type_code=1 ;;
    pdf) file_type_code=0 ;;
    *)
        echo "Error: Invalid file type '$file_type'. Supported: image, pdf"
        exit 1
        ;;
esac

# Build payload - directly use URL or encode file
if [[ "$is_url" == "true" ]]; then
    # Use URL directly in payload
    payload=$(cat <<EOF
{
    "file": "$input_file",
    "fileType": $file_type_code,
    "useDocOrientationClassify": false,
    "useDocUnwarping": false
}
EOF
)
    if [[ "$verbose" == "true" ]]; then
        echo "Using URL directly in API request" >&2
    fi
else
    # Encode local file to base64
    if [[ "$verbose" == "true" ]]; then
        echo "Encoding $input_file to base64..." >&2
    fi

    file_base64=$(cat "$input_file" | base64 | tr -d '\n')

    payload=$(cat <<EOF
{
    "file": "$file_base64",
    "fileType": $file_type_code,
    "useDocOrientationClassify": false,
    "useDocUnwarping": false
}
EOF
)
fi

if [[ "$verbose" == "true" ]]; then
    echo "Making API request to: $PADDLEOCR_API_URL" >&2
    echo "Payload size: ${#payload} bytes" >&2
fi

# Make API request
# Use temporary file to avoid "Argument list too long" error for large payloads
payload_file=$(mktemp)
echo "$payload" > "$payload_file"

if [[ "$verbose" == "true" ]]; then
    echo "Request payload saved to temporary file: $payload_file" >&2
fi

# Use trap to ensure temporary file cleanup on script exit
cleanup() {
    if [[ -f "$payload_file" ]]; then
        rm -f "$payload_file"
        if [[ "$verbose" == "true" ]]; then
            echo "Cleaned up temporary file: $payload_file" >&2
        fi
    fi
}
trap cleanup EXIT

response=$(curl -s -X POST "$PADDLEOCR_API_URL" \
    -m 600 \
    --fail-with-body \
    -H "Authorization: token $PADDLEOCR_ACCESS_TOKEN" \
    -H "Content-Type: application/json" \
    -d @"$payload_file")

# Check for curl errors
curl_exit_code=$?
if [[ $curl_exit_code -ne 0 ]]; then
    echo "Error: Curl request failed with code $curl_exit_code"
    exit 1
fi

# Check response for errors using jq
if ! echo "$response" | jq -e . >/dev/null 2>&1; then
    echo "Error: Invalid JSON response from API"
    exit 1
fi

error_code=$(echo "$response" | jq -r '.errorCode // empty')
error_msg=$(echo "$response" | jq -r '.errorMsg // empty')

if [[ -n "$error_code" && "$error_code" != "0" ]]; then
    echo "API Error ($error_code): $error_msg"
    exit 1
fi

# Extract and process result
if [[ -n "$output_file" ]]; then
    echo "$response" > "$output_file"
    if [[ "$verbose" == "true" ]]; then
        echo "Output saved to: $output_file" >&2
    fi
else
    echo "$response"
fi

if [[ "$verbose" == "true" ]]; then
    echo "Processing completed successfully" >&2
fi

```

paddleocr-doc-parsing | SkillHub