ocr

Extract text from images using Tesseract OCR

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 1,021.

Updated: March 20, 2026.

Overall rating: C (6.1).

Composite score: 6.1.

Best-practice grade: B (81.2).

Install command

npx @skill-hub/cli install trpc-group-trpc-agent-go-ocr

Repository

trpc-group/trpc-agent-go

Skill path: examples/skill/skills/ocr

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: trpc-group.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install ocr into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/trpc-group/trpc-agent-go before adding ocr to shared team environments
Use ocr for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: ocr
description: Extract text from images using Tesseract OCR
---

# OCR Image Text Extraction Skill

Extract text from images using Tesseract OCR engine.

## Capabilities

- Extract text from image files (PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP)
- Support for 100+ languages, provided the matching Tesseract language data is installed
- Optional image preprocessing for better accuracy
- Output in plain text or JSON format with confidence scores

## Usage

### Basic OCR

```bash
python3 scripts/ocr.py <image_file> <output_file>
```

### With Options

```bash
# Specify language (default: eng)
python3 scripts/ocr.py image.png text.txt --lang eng

# Chinese text
python3 scripts/ocr.py image.png text.txt --lang chi_sim

# Multiple languages
python3 scripts/ocr.py image.png text.txt --lang eng+chi_sim

# With image preprocessing (improves accuracy)
python3 scripts/ocr.py image.png text.txt --preprocess

# JSON output with confidence scores
python3 scripts/ocr.py image.png output.json --format json
```
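
For batch jobs, the flags above can also be assembled programmatically. The following is a minimal sketch, assuming the script is reachable at `scripts/ocr.py` relative to the working directory; `build_ocr_command` is a hypothetical helper and not part of the skill.

```python
import sys


def build_ocr_command(image_file, output_file, lang="eng", preprocess=False, fmt="text"):
    """Build the argv list for scripts/ocr.py (hypothetical helper, not part of the skill)."""
    if fmt not in ("text", "json"):
        raise ValueError(f"unsupported format: {fmt}")
    # sys.executable reuses the interpreter that is running this code
    cmd = [sys.executable, "scripts/ocr.py", image_file, output_file, "--lang", lang]
    if preprocess:
        cmd.append("--preprocess")
    if fmt != "text":
        cmd += ["--format", fmt]
    return cmd


if __name__ == "__main__":
    # Only prints the command; pass it to subprocess.run(cmd, check=True) to execute it.
    print(build_ocr_command("image.png", "output.json", lang="eng+chi_sim", fmt="json"))
```

Returning a list rather than a single string avoids shell quoting problems when file names contain spaces.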

### Download and OCR from URL

```bash
# OCR from remote image
python3 scripts/ocr_url.py <image_url> <output_file>

# With options
python3 scripts/ocr_url.py https://example.com/image.jpg text.txt --lang eng --preprocess
```

## Parameters

- `image_file` / `image_url` (required): Path to local image or image URL
- `output_file` (required): Path to output text/JSON file
- `--lang`: Language code (e.g., eng, chi_sim, jpn, fra, deu). Default: eng
- `--preprocess`: Apply image preprocessing (grayscale conversion, contrast enhancement, sharpening) for better accuracy
- `--format`: Output format (text/json, default: text)
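
With `--format json`, `scripts/ocr.py` writes an object with the keys `text`, `language`, `confidence`, and `image_path`. A downstream step could read that file roughly as follows. This is a sketch only: `load_ocr_result` is a hypothetical helper, and the threshold of 60 is an arbitrary illustrative value, not something the skill defines.

```python
import json


def load_ocr_result(path, min_confidence=60.0):
    """Read a JSON result written by ocr.py and flag low-confidence output.

    min_confidence is an illustrative threshold, not a value defined by the skill.
    """
    with open(path, encoding="utf-8") as f:
        result = json.load(f)
    # Treat a missing confidence as 0 so the result is flagged for review
    result["low_confidence"] = result.get("confidence", 0.0) < min_confidence
    return result
```

A caller might retry with `--preprocess` when `low_confidence` is set, then keep whichever run scores higher.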

## Common Languages

| Language | Code |
|----------|------|
| English | eng |
| Chinese (Simplified) | chi_sim |
| Chinese (Traditional) | chi_tra |
| Japanese | jpn |
| Korean | kor |
| French | fra |
| German | deu |
| Spanish | spa |
| Russian | rus |
| Arabic | ara |
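
A `--lang` value such as `eng+chi_sim` only works when every listed language pack is installed. The sketch below checks a spec against a list of installed codes; `missing_languages` is a hypothetical helper, and the installed list is passed in so the function stays independent of any Tesseract installation.

```python
def missing_languages(lang_spec, installed):
    """Return the codes in a '+'-joined spec (e.g. 'eng+chi_sim') that are not installed."""
    installed_set = set(installed)
    requested = [code for code in lang_spec.split("+") if code]
    return [code for code in requested if code not in installed_set]
```

In recent pytesseract releases, the installed list can be obtained with `pytesseract.get_languages()`; on older setups, `tesseract --list-langs` prints the same information.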

## Supported Image Formats

PNG, JPG, JPEG, GIF, BMP, TIFF, WEBP
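
The scripts do not check file extensions themselves; Pillow identifies the format from the file contents. If an early check is wanted before invoking OCR, a sketch along these lines could be used. `SUPPORTED_EXTENSIONS` and `has_supported_extension` are hypothetical names that simply mirror the list above.

```python
from pathlib import Path

# Mirrors the format list above; extend it (for example with ".tif") as needed
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".webp"}


def has_supported_extension(path):
    """Return True if the file extension matches one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```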

## Dependencies

- Python 3.8+
- pytesseract
- Pillow (PIL)
- requests (only needed by `scripts/ocr_url.py`)
- tesseract-ocr (system package)

## Installation

```bash
# Python packages (requests is only needed by scripts/ocr_url.py)
pip install pytesseract Pillow requests

# Tesseract OCR engine
sudo apt-get install tesseract-ocr  # Ubuntu/Debian
sudo yum install tesseract           # CentOS/RHEL
brew install tesseract               # macOS
```
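
pytesseract is a wrapper around the `tesseract` executable, so the Python packages alone are not enough. A quick way to confirm that the engine is on `PATH` after installation is sketched below; `tesseract_available` is a hypothetical helper, and the default binary name is an assumption that holds for the package managers listed above.

```python
import shutil


def tesseract_available(binary="tesseract"):
    """Return True if the Tesseract executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    print("tesseract found" if tesseract_available() else "tesseract not found on PATH")
```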


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/ocr.py

```python
#!/usr/bin/env python3
"""
OCR Script for Image Text Extraction
Extract text from images using Tesseract OCR
"""

import argparse
import json
import sys
import os

try:
    import pytesseract
    from PIL import Image, ImageEnhance, ImageFilter
except ImportError as e:
    print(f"Error: Required package not installed: {e}", file=sys.stderr)
    print("Run: pip install pytesseract Pillow", file=sys.stderr)
    sys.exit(1)


def preprocess_image(image):
    """Apply preprocessing to improve OCR accuracy"""
    # Convert to grayscale
    image = image.convert('L')
    
    # Enhance contrast
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(2.0)
    
    # Apply sharpening
    image = image.filter(ImageFilter.SHARPEN)
    
    return image


def ocr_image(image_path, output_path, lang="eng", preprocess=False, output_format="text"):
    """Extract text from image using OCR"""
    
    # Validate input file
    if not os.path.exists(image_path):
        print(f"Error: Image file not found: {image_path}", file=sys.stderr)
        sys.exit(1)
    
    try:
        # Load image
        print(f"Loading image: {image_path}...", file=sys.stderr)
        image = Image.open(image_path)
        
        # Preprocess if requested
        if preprocess:
            print("Applying image preprocessing...", file=sys.stderr)
            image = preprocess_image(image)
        
        # Perform OCR
        print(f"Extracting text (language: {lang})...", file=sys.stderr)
        
        if output_format == "json":
            # Get detailed data with confidence scores
            data = pytesseract.image_to_data(image, lang=lang, output_type=pytesseract.Output.DICT)
            
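            # Keep non-empty words only
            text_parts = [text for text in data['text'] if text.strip()]
            full_text = " ".join(text_parts)

            # Tesseract reports -1 for non-word rows, so skip those. Convert to float
            # first: depending on the pytesseract version, 'conf' values may be
            # ints, floats, or numeric strings
            confidences = [float(c) for c in data['conf'] if float(c) >= 0]
            avg_conf = sum(confidences) / max(len(confidences), 1)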
            
            output_data = {
                "text": full_text.strip(),
                "language": lang,
                "confidence": round(avg_conf, 2),
                "image_path": image_path
            }
            
            with open(output_path, "w", encoding="utf-8") as f:
                json.dump(output_data, f, ensure_ascii=False, indent=2)
        else:
            # Plain text output
            text = pytesseract.image_to_string(image, lang=lang)
            
            with open(output_path, "w", encoding="utf-8") as f:
                f.write(text.strip())
        
        print("✓ Text extracted successfully", file=sys.stderr)
        print(f"  Output saved to: {output_path}", file=sys.stderr)
        
    except Exception as e:
        print(f"Error during OCR processing: {e}", file=sys.stderr)
        sys.exit(1)


def main():
    parser = argparse.ArgumentParser(description="Extract text from images using OCR")
    parser.add_argument("image_file", help="Input image file path")
    parser.add_argument("output_file", help="Output text/JSON file path")
    parser.add_argument("--lang", default="eng",
                       help="Language code (e.g., eng, chi_sim, jpn). Default: eng")
    parser.add_argument("--preprocess", action="store_true",
                       help="Apply image preprocessing for better accuracy")
    parser.add_argument("--format", default="text", choices=["text", "json"],
                       help="Output format (default: text)")
    
    args = parser.parse_args()
    
    ocr_image(
        args.image_file,
        args.output_file,
        lang=args.lang,
        preprocess=args.preprocess,
        output_format=args.format
    )


if __name__ == "__main__":
    main()

```
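
The JSON branch of `ocr_image` averages the per-word confidences and skips the `-1` entries that Tesseract reports for rows that are not words. The same calculation can be sketched as a standalone function, which makes it easy to test without an image. `average_confidence` is a hypothetical name, and accepting numeric strings is an assumption made for robustness across pytesseract versions.

```python
def average_confidence(confidences):
    """Mean of the non-negative confidences, rounded to 2 places; 0.0 when none are present."""
    valid = [float(c) for c in confidences if float(c) >= 0]
    return round(sum(valid) / len(valid), 2) if valid else 0.0
```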

### scripts/ocr_url.py

```python
#!/usr/bin/env python3
"""
OCR Script for Remote Images
Download and extract text from images hosted on the web
"""

import argparse
import json
import sys
import os
import tempfile

try:
    import pytesseract
    from PIL import Image, ImageEnhance, ImageFilter
    import requests
except ImportError as e:
    print(f"Error: Required package not installed: {e}", file=sys.stderr)
    print("Run: pip install pytesseract Pillow requests", file=sys.stderr)
    sys.exit(1)


def preprocess_image(image):
    """Apply preprocessing to improve OCR accuracy"""
    image = image.convert('L')
    enhancer = ImageEnhance.Contrast(image)
    image = enhancer.enhance(2.0)
    image = image.filter(ImageFilter.SHARPEN)
    return image


def ocr_from_url(image_url, output_path, lang="eng", preprocess=False, output_format="text"):
    """Download image from URL and extract text using OCR"""
    tmp_path = None

    try:
        # Download image
        print(f"Downloading image from: {image_url}...", file=sys.stderr)
        response = requests.get(image_url, timeout=30, stream=True)
        response.raise_for_status()

        # Save to temporary file
        with tempfile.NamedTemporaryFile(delete=False, suffix=".img") as tmp_file:
            tmp_path = tmp_file.name
            for chunk in response.iter_content(chunk_size=8192):
                tmp_file.write(chunk)

        print(f"Image downloaded to: {tmp_path}", file=sys.stderr)

        # Load image; the context manager closes the file handle before cleanup
        with Image.open(tmp_path) as image:
            # Preprocess if requested
            if preprocess:
                print("Applying image preprocessing...", file=sys.stderr)
                image = preprocess_image(image)

            # Perform OCR
            print(f"Extracting text (language: {lang})...", file=sys.stderr)
            if output_format == "json":
                # Same JSON shape as ocr.py, with image_url in place of image_path
                data = pytesseract.image_to_data(image, lang=lang, output_type=pytesseract.Output.DICT)
                words = [word for word in data['text'] if word.strip()]
                confidences = [float(c) for c in data['conf'] if float(c) >= 0]
                output_data = {
                    "text": " ".join(words),
                    "language": lang,
                    "confidence": round(sum(confidences) / max(len(confidences), 1), 2),
                    "image_url": image_url
                }
            else:
                text = pytesseract.image_to_string(image, lang=lang)

        # Write output (previously --format was accepted but ignored)
        with open(output_path, "w", encoding="utf-8") as f:
            if output_format == "json":
                json.dump(output_data, f, ensure_ascii=False, indent=2)
            else:
                f.write(text.strip())

        print("✓ Text extracted successfully", file=sys.stderr)
        print(f"  Output saved to: {output_path}", file=sys.stderr)

    except requests.RequestException as e:
        print(f"Error downloading image: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error during OCR processing: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        # Remove the temp file even when the download or OCR step fails
        if tmp_path and os.path.exists(tmp_path):
            os.unlink(tmp_path)


def main():
    parser = argparse.ArgumentParser(description="Extract text from remote images using OCR")
    parser.add_argument("image_url", help="Image URL")
    parser.add_argument("output_file", help="Output text file path")
    parser.add_argument("--lang", default="eng",
                       help="Language code (e.g., eng, chi_sim, jpn). Default: eng")
    parser.add_argument("--preprocess", action="store_true",
                       help="Apply image preprocessing for better accuracy")
    parser.add_argument("--format", default="text", choices=["text", "json"],
                       help="Output format (default: text)")
    
    args = parser.parse_args()
    
    ocr_from_url(
        args.image_url,
        args.output_file,
        lang=args.lang,
        preprocess=args.preprocess,
        output_format=args.format
    )


if __name__ == "__main__":
    main()

```