paddleocr-doc-parsing
Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-paddleocr-doc-parsing-v2
Repository
Skill path: skills/hiotec/paddleocr-doc-parsing-v2
Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.
Open repositoryBest for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Backend, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install paddleocr-doc-parsing into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding paddleocr-doc-parsing to shared team environments
- Use paddleocr-doc-parsing for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: paddleocr-doc-parsing
description: Parse documents using PaddleOCR's API. Supports both sync and async modes for images and PDFs.
homepage: https://www.paddleocr.com
metadata:
{
"openclaw":
{
"emoji": "📄",
"os": ["darwin", "linux"],
"requires":
{
"bins": ["curl", "base64", "jq", "python3"],
"env": ["PADDLEOCR_ACCESS_TOKEN", "PADDLEOCR_API_URL"],
},
},
}
---
# PaddleOCR Document Parsing
Parse images and PDF files using PaddleOCR's API. Supports both synchronous and asynchronous parsing modes with structured output.
## Resource Links
| Resource | Link |
| --------------------- | ------------------------------------------------------------------------------ |
| **Official Website** | [https://www.paddleocr.com](https://www.paddleocr.com) |
| **API Documentation** | [https://ai.baidu.com/ai-doc/AISTUDIO/Cmkz2m0ma](https://ai.baidu.com/ai-doc/AISTUDIO/Cmkz2m0ma) |
| **GitHub** | [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) |
## Key Features
- **Multi-format support**: PDF and image files (JPG, PNG, BMP, TIFF)
- **Two parsing modes**:
- **Sync mode**: Fast response for small files (<600s timeout)
- **Async mode**: For large files with progress polling
- **Layout analysis**: Automatic detection of text blocks, tables, formulas
- **Multi-language**: Support for 110+ languages
- **Structured output**: Markdown format with preserved document structure
## Setup
1. Visit [PaddleOCR](https://www.paddleocr.com) to obtain your API credentials
2. Set environment variables:
```bash
export PADDLEOCR_ACCESS_TOKEN="your_token_here"
export PADDLEOCR_API_URL="https://your-endpoint.aistudio-app.com/layout-parsing"
# Optional: For async mode
export PADDLEOCR_JOB_URL="https://your-job-endpoint.aistudio-app.com/api/v2/ocr/jobs"
export PADDLEOCR_MODEL="PaddleOCR-VL-1.5"
```
## Usage Examples
### Sync Mode (Default)
For small files and quick processing:
```bash
# Parse local image
{baseDir}/paddleocr_parse.sh document.jpg
# Parse PDF
{baseDir}/paddleocr_parse.sh -t pdf document.pdf
# Parse from URL
{baseDir}/paddleocr_parse.sh https://example.com/document.jpg
# Save output to file
{baseDir}/paddleocr_parse.sh -o result.json document.jpg
# Verbose output
{baseDir}/paddleocr_parse.sh -v document.jpg
```
### Async Mode
For large files with progress tracking:
```bash
# Parse large PDF with async mode
{baseDir}/paddleocr_parse.sh --async large-document.pdf
# Parse from URL with async mode
{baseDir}/paddleocr_parse.sh --async -t pdf https://example.com/doc.pdf
# Save async result to file
{baseDir}/paddleocr_parse.sh --async -o result.json document.pdf
```
### Using Python Script Directly
```bash
# Sync mode
python3 {baseDir}/paddleocr_parse.py document.jpg
# Async mode
python3 {baseDir}/paddleocr_parse.py --async-mode document.pdf
# With output file
python3 {baseDir}/paddleocr_parse.py -o result.json --async-mode document.pdf
```
## Response Structure
```json
{
"logId": "unique_request_id",
"errorCode": 0,
"errorMsg": "Success",
"result": {
"layoutParsingResults": [
{
"prunedResult": [...],
"markdown": {
"text": "# Document Title\n\nParagraph content...",
"images": {}
},
"outputImages": [...],
"inputImage": "http://input-image"
}
],
"dataInfo": {...}
}
}
```
**Important Fields:**
- **`prunedResult`** - Contains detailed layout element information including positions, categories, etc.
- **`markdown`** - Stores the document content converted to Markdown format with preserved structure and formatting.
## Mode Selection Guide
| Use Case | Recommended Mode |
|----------|-----------------|
| Small images (< 10MB) | Sync |
| Single page PDFs | Sync |
| Large PDFs (> 10MB) | Async |
| Multi-page documents | Async |
| Batch processing | Async |
| Quick text extraction | Sync |
## Error Handling
The script will exit with code 1 and print error message for:
- Missing required environment variables
- File not found
- API authentication failures
- Invalid JSON responses
- API error codes (non-zero)
## Quota Information
See official documentation: https://ai.baidu.com/ai-doc/AISTUDIO/Xmjclapam
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
"owner": "hiotec",
"slug": "paddleocr-doc-parsing-v2",
"displayName": "PaddleOCR Document Parsing V2",
"latest": {
"version": "1.0.4",
"publishedAt": 1770944761798,
"commit": "https://github.com/openclaw/skills/commit/f97ed62d867d1ce563dae6348d7133412ca7f11d"
},
"history": []
}
```
### scripts/paddleocr_parse.py
```python
#!/usr/bin/env python3
"""
PaddleOCR Async Document Parser
Supports both sync and async parsing modes.
"""
import argparse
import base64
import json
import os
import sys
import time
from pathlib import Path
import requests
def get_env_or_exit(name: str) -> str:
"""Get environment variable or exit with error."""
value = os.environ.get(name)
if not value:
print(f"Error: {name} environment variable is required", file=sys.stderr)
print(f"Set it with: export {name}=\"your_value_here\"", file=sys.stderr)
sys.exit(1)
return value
def sync_parse(file_path: str, file_type: int, api_url: str, token: str, verbose: bool = False) -> dict:
"""Synchronous document parsing."""
if file_path.startswith("http"):
# URL mode
payload = {
"file": file_path,
"fileType": file_type,
"useDocOrientationClassify": False,
"useDocUnwarping": False,
}
else:
# Local file mode
path = Path(file_path)
if not path.exists():
print(f"Error: File not found: {file_path}", file=sys.stderr)
sys.exit(1)
file_bytes = path.read_bytes()
file_data = base64.b64encode(file_bytes).decode("ascii")
payload = {
"file": file_data,
"fileType": file_type,
"useDocOrientationClassify": False,
"useDocUnwarping": False,
}
headers = {
"Authorization": f"token {token}",
"Content-Type": "application/json"
}
if verbose:
print(f"Making sync request to: {api_url}", file=sys.stderr)
response = requests.post(api_url, json=payload, headers=headers, timeout=600)
if response.status_code != 200:
print(f"Error: HTTP {response.status_code}", file=sys.stderr)
print(response.text, file=sys.stderr)
sys.exit(1)
return response.json()
def async_parse(file_path: str, model: str, job_url: str, token: str, verbose: bool = False) -> dict:
"""Asynchronous document parsing."""
headers = {
"Authorization": f"bearer {token}",
}
optional_payload = {
"useDocOrientationClassify": False,
"useDocUnwarping": False,
"useChartRecognition": False,
}
if verbose:
print(f"Processing file: {file_path}", file=sys.stderr)
if file_path.startswith("http"):
# URL Mode
headers["Content-Type"] = "application/json"
payload = {
"fileUrl": file_path,
"model": model,
"optionalPayload": optional_payload
}
job_response = requests.post(job_url, json=payload, headers=headers)
else:
# Local File Mode
path = Path(file_path)
if not path.exists():
print(f"Error: File not found: {file_path}", file=sys.stderr)
sys.exit(1)
data = {
"model": model,
"optionalPayload": json.dumps(optional_payload)
}
with open(file_path, "rb") as f:
files = {"file": f}
job_response = requests.post(job_url, headers=headers, data=data, files=files)
if verbose:
print(f"Response status: {job_response.status_code}", file=sys.stderr)
if job_response.status_code != 200:
print(f"Error: HTTP {job_response.status_code}", file=sys.stderr)
print(job_response.text, file=sys.stderr)
sys.exit(1)
job_id = job_response.json()["data"]["jobId"]
print(f"Job submitted. ID: {job_id}", file=sys.stderr)
# Poll for results
jsonl_url = ""
while True:
job_result = requests.get(f"{job_url}/{job_id}", headers=headers)
job_result.raise_for_status()
data = job_result.json()["data"]
state = data["state"]
if state == 'pending':
if verbose:
print("Status: pending", file=sys.stderr)
elif state == 'running':
try:
progress = data['extractProgress']
total = progress['totalPages']
extracted = progress['extractedPages']
print(f"Status: running ({extracted}/{total} pages)", file=sys.stderr)
except KeyError:
if verbose:
print("Status: running...", file=sys.stderr)
elif state == 'done':
extracted = data['extractProgress']['extractedPages']
print(f"Status: done ({extracted} pages extracted)", file=sys.stderr)
jsonl_url = data['resultUrl']['jsonUrl']
break
elif state == "failed":
error_msg = data.get('errorMsg', 'Unknown error')
print(f"Error: Job failed - {error_msg}", file=sys.stderr)
sys.exit(1)
time.sleep(5)
# Fetch JSONL results
if jsonl_url:
jsonl_response = requests.get(jsonl_url)
jsonl_response.raise_for_status()
lines = jsonl_response.text.strip().split('\n')
results = []
for line in lines:
line = line.strip()
if not line:
continue
try:
result = json.loads(line)["result"]
results.extend(result.get("layoutParsingResults", []))
except (json.JSONDecodeError, KeyError) as e:
if verbose:
print(f"Warning: Failed to parse line: {e}", file=sys.stderr)
continue
return {"result": {"layoutParsingResults": results}}
return {}
def main():
parser = argparse.ArgumentParser(description="Parse documents using PaddleOCR API")
parser.add_argument("input", help="Input file path or URL")
parser.add_argument("-t", "--type", choices=["image", "pdf"], default="image",
help="File type (default: image)")
parser.add_argument("-o", "--output", help="Output file (default: stdout)")
parser.add_argument("-v", "--verbose", action="store_true", help="Verbose output")
parser.add_argument("--async-mode", action="store_true",
help="Use async mode (for large files)")
args = parser.parse_args()
# Get configuration from environment
token = get_env_or_exit("PADDLEOCR_ACCESS_TOKEN")
if args.async_mode:
job_url = get_env_or_exit("PADDLEOCR_JOB_URL")
model = os.environ.get("PADDLEOCR_MODEL", "PaddleOCR-VL-1.5")
result = async_parse(args.input, model, job_url, token, args.verbose)
else:
api_url = get_env_or_exit("PADDLEOCR_API_URL")
file_type_code = 0 if args.type == "pdf" else 1
result = sync_parse(args.input, file_type_code, api_url, token, args.verbose)
# Output result
output = json.dumps(result, ensure_ascii=False, indent=2)
if args.output:
Path(args.output).write_text(output)
print(f"Output saved to: {args.output}", file=sys.stderr)
else:
print(output)
if __name__ == "__main__":
main()
```
### scripts/paddleocr_parse.sh
```bash
#!/bin/bash
# PaddleOCR Document Parser Script
# Supports both sync and async modes
set -e
# Default values
file_type="image"
output_file=""
verbose="false"
async_mode="false"
# Function to display usage
usage() {
cat << EOF
Usage: $0 [OPTIONS] INPUT_FILE_PATH_OR_URL
Parse documents using PaddleOCR API
OPTIONS:
-t, --type TYPE File type (image, pdf) [default: image]
-o, --output FILE Output file [default: stdout]
-v, --verbose Verbose output
--async Use async mode (for large files/PDFs)
-h, --help Show this help message
ENVIRONMENT:
PADDLEOCR_ACCESS_TOKEN Required: API access token
PADDLEOCR_API_URL Required: Sync mode endpoint URL
PADDLEOCR_JOB_URL Required for async: Async endpoint URL
PADDLEOCR_MODEL Optional: Model name [default: PaddleOCR-VL-1.5]
SETUP:
1. Visit https://www.paddleocr.com to get API credentials
2. Set environment variables:
export PADDLEOCR_ACCESS_TOKEN="your_token"
export PADDLEOCR_API_URL="https://your-endpoint/layout-parsing"
EXAMPLES:
# Sync mode (default)
$0 document.jpg
$0 -t pdf document.pdf
$0 -o result.json document.jpg
# Async mode (for large files)
$0 --async large-document.pdf
EOF
}
# Parse command line arguments
while [[ $# -gt 0 ]]; do
case $1 in
-t|--type)
file_type="$2"
shift 2
;;
-o|--output)
output_file="$2"
shift 2
;;
-v|--verbose)
verbose="true"
shift
;;
--async)
async_mode="true"
shift
;;
-h|--help)
usage
exit 0
;;
-*)
echo "Unknown option: $1"
usage
exit 1
;;
*)
input_file="$1"
shift
;;
esac
done
# Validate input
if [[ -z "$input_file" ]]; then
echo "Error: Input file path or URL is required"
usage
exit 1
fi
# Check required environment variables
if [[ -z "$PADDLEOCR_ACCESS_TOKEN" ]]; then
echo "Error: PADDLEOCR_ACCESS_TOKEN environment variable is required"
echo "Get it from: https://www.paddleocr.com"
exit 1
fi
if [[ -z "$PADDLEOCR_API_URL" ]]; then
echo "Error: PADDLEOCR_API_URL environment variable is required"
echo "Set it to your PaddleOCR API endpoint"
echo "Example: export PADDLEOCR_API_URL=\"https://your-endpoint.aistudio-app.com/layout-parsing\""
exit 1
fi
# Set optional defaults
PADDLEOCR_MODEL="${PADDLEOCR_MODEL:-PaddleOCR-VL-1.5}"
# Check if input is a URL or local file
if [[ "$input_file" =~ ^https?:// ]]; then
is_url="true"
if [[ "$verbose" == "true" ]]; then
echo "Input is a URL: $input_file" >&2
fi
else
is_url="false"
if [[ ! -f "$input_file" ]]; then
echo "Error: Input file not found: $input_file"
exit 1
fi
if [[ "$verbose" == "true" ]]; then
echo "Input is a local file: $input_file" >&2
fi
fi
# Use Python script for async mode
if [[ "$async_mode" == "true" ]]; then
if [[ -z "$PADDLEOCR_JOB_URL" ]]; then
echo "Error: PADDLEOCR_JOB_URL environment variable is required for async mode"
echo "Example: export PADDLEOCR_JOB_URL=\"https://your-endpoint.aistudio-app.com/api/v2/ocr/jobs\""
exit 1
fi
if [[ "$verbose" == "true" ]]; then
echo "Using async mode with model: $PADDLEOCR_MODEL" >&2
fi
# Get script directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# Run Python async parser
if [[ -n "$output_file" ]]; then
python3 "$SCRIPT_DIR/paddleocr_parse.py" --async-mode -o "$output_file" "$input_file"
else
python3 "$SCRIPT_DIR/paddleocr_parse.py" --async-mode "$input_file"
fi
exit 0
fi
# Sync mode (bash implementation)
# Validate file type
case "$file_type" in
image|img) file_type_code=1 ;;
pdf) file_type_code=0 ;;
*)
echo "Error: Invalid file type '$file_type'. Supported: image, pdf"
exit 1
;;
esac
# Build payload - directly use URL or encode file
if [[ "$is_url" == "true" ]]; then
# Use URL directly in payload
payload=$(cat <<EOF
{
"file": "$input_file",
"fileType": $file_type_code,
"useDocOrientationClassify": false,
"useDocUnwarping": false
}
EOF
)
if [[ "$verbose" == "true" ]]; then
echo "Using URL directly in API request" >&2
fi
else
# Encode local file to base64
if [[ "$verbose" == "true" ]]; then
echo "Encoding $input_file to base64..." >&2
fi
file_base64=$(cat "$input_file" | base64 | tr -d '\n')
payload=$(cat <<EOF
{
"file": "$file_base64",
"fileType": $file_type_code,
"useDocOrientationClassify": false,
"useDocUnwarping": false
}
EOF
)
fi
if [[ "$verbose" == "true" ]]; then
echo "Making API request to: $PADDLEOCR_API_URL" >&2
echo "Payload size: ${#payload} bytes" >&2
fi
# Make API request
# Use temporary file to avoid "Argument list too long" error for large payloads
payload_file=$(mktemp)
echo "$payload" > "$payload_file"
if [[ "$verbose" == "true" ]]; then
echo "Request payload saved to temporary file: $payload_file" >&2
fi
# Use trap to ensure temporary file cleanup on script exit
cleanup() {
if [[ -f "$payload_file" ]]; then
rm -f "$payload_file"
if [[ "$verbose" == "true" ]]; then
echo "Cleaned up temporary file: $payload_file" >&2
fi
fi
}
trap cleanup EXIT
response=$(curl -s -X POST "$PADDLEOCR_API_URL" \
-m 600 \
--fail-with-body \
-H "Authorization: token $PADDLEOCR_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d @"$payload_file")
# Check for curl errors
curl_exit_code=$?
if [[ $curl_exit_code -ne 0 ]]; then
echo "Error: Curl request failed with code $curl_exit_code"
exit 1
fi
# Check response for errors using jq
if ! echo "$response" | jq -e . >/dev/null 2>&1; then
echo "Error: Invalid JSON response from API"
exit 1
fi
error_code=$(echo "$response" | jq -r '.errorCode // empty')
error_msg=$(echo "$response" | jq -r '.errorMsg // empty')
if [[ -n "$error_code" && "$error_code" != "0" ]]; then
echo "API Error ($error_code): $error_msg"
exit 1
fi
# Extract and process result
if [[ -n "$output_file" ]]; then
echo "$response" > "$output_file"
if [[ "$verbose" == "true" ]]; then
echo "Output saved to: $output_file" >&2
fi
else
echo "$response"
fi
if [[ "$verbose" == "true" ]]; then
echo "Processing completed successfully" >&2
fi
```