SkillHub ClubShip Full StackFull Stack

local-stt

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars

3,075

Hot score

Updated

March 20, 2026

Overall rating

C4.0

Composite score

4.0

Best-practice grade

B84.0

Install command

npx @skill-hub/cli install openclaw-skills-local-stt

Repository

openclaw/skills

Skill path: skills/araa47/local-stt

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

Open repository

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install local-stt into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/openclaw/skills before adding local-stt to shared team environments
Use local-stt for development workflows

Works across

Claude CodeCodex CLIGemini CLIOpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: local-stt
description: Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
metadata: {"openclaw":{"emoji":"🎙️","requires":{"bins":["ffmpeg"]}}}
---

# Local STT (Parakeet / Whisper)

Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:

- **Parakeet** (default): Best accuracy for English, correctly captures names and filler words
- **Whisper**: Fastest inference, supports 99 languages

## Usage

```bash
# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg

# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3

# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
```

## Options

- `-b/--backend`: `parakeet` (default), `whisper`
- `-m/--model`: Model variant (see below)
- `--no-int8`: Disable int8 quantization
- `-q/--quiet`: Suppress progress
- `--room-id`: Matrix room ID for direct message

## Models

### Parakeet (default backend)
| Model | Description |
|-------|-------------|
| **v2** (default) | English only, best accuracy |
| v3 | Multilingual |

### Whisper
| Model | Description |
|-------|-------------|
| tiny | Fastest, lower accuracy |
| **base** (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |

## Benchmark (24s audio)

| Backend/Model | Time | RTF | Notes |
|---------------|------|-----|-------|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| **Parakeet v2 int8** | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |

## openclaw.json

```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
            "args": ["--quiet", "{{MediaPath}}"],
            "timeoutSeconds": 30
          }
        ]
      }
    }
  }
}
```


---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "araa47",
  "slug": "local-stt",
  "displayName": "Local STT (Nvidia Parakeet + Whisper Support)",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1770042670809,
    "commit": "https://github.com/clawdbot/skills/commit/355d1a16968e42f8b9d64b89442a7f5b9fa54936"
  },
  "history": []
}

```

### scripts/local-stt.py

```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "onnx-asr",
#     "onnxruntime",
#     "huggingface_hub",
#     "click",
#     "requests",
# ]
# ///
"""Local speech-to-text using Parakeet (default) or Whisper backends.

CLI model for openclaw media understanding. Outputs transcription to stdout.
When --room-id is provided, also sends transcription to that Matrix room.
"""

import subprocess
import tempfile
import warnings
import os
from pathlib import Path

BACKENDS = {
    "parakeet": {
        "models": {
            "v2": "nemo-parakeet-tdt-0.6b-v2",  # English only, best accuracy
            "v3": "nemo-parakeet-tdt-0.6b-v3",  # Multilingual
        },
        "default": "v2",
        "description": "NVIDIA Parakeet TDT - best accuracy for English",
    },
    "whisper": {
        "models": {
            "tiny": "whisper-tiny",
            "base": "whisper-base",
            "small": "whisper-small",
            "large-v3-turbo": "whisper-large-v3-turbo",
        },
        "default": "base",
        "description": "OpenAI Whisper - fastest, 99 languages",
    },
}

DEFAULT_BACKEND = "parakeet"


def load_env_file():
    """Load .env file from home directory if it exists."""
    env_paths = [Path.home() / ".openclaw" / ".env", Path.home() / ".env"]
    for env_path in env_paths:
        if env_path.exists():
            with open(env_path) as f:
                for line in f:
                    line = line.strip()
                    if line and not line.startswith("#") and "=" in line:
                        if line.startswith("export "):
                            line = line[7:]
                        key, value = line.split("=", 1)
                        key = key.strip()
                        if key not in os.environ:
                            os.environ[key] = value.strip().strip('"').strip("'")


warnings.filterwarnings("ignore")

import click
import requests


def send_to_matrix(room_id: str, text: str, quiet: bool = False):
    """Send transcription to Matrix room via REST API."""
    load_env_file()
    homeserver = os.environ.get("MATRIX_HOMESERVER")
    access_token = os.environ.get("MATRIX_ACCESS_TOKEN")

    if not homeserver or not access_token:
        if not quiet:
            click.echo("MATRIX_HOMESERVER or MATRIX_ACCESS_TOKEN not set, skipping Matrix send", err=True)
        return

    try:
        import time
        txn_id = int(time.time() * 1000)

        target_room = room_id
        if target_room.startswith("room:"):
            target_room = target_room[5:]

        url = f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/{target_room}/send/m.room.message/{txn_id}"
        headers = {"Authorization": f"Bearer {access_token}"}
        payload = {
            'msgtype': 'm.text',
            'body': f'🎙️ {text}',
            'format': 'org.matrix.custom.html',
            'formatted_body': f'<blockquote>🎙️ {text}</blockquote>'
        }

        with open("/tmp/stt_matrix.log", "a") as log:
            log.write(f"Attempting send to {room_id} at {txn_id}\n")
            log.write(f"URL: {url}\n")

        resp = requests.put(url, headers=headers, json=payload, timeout=10)

        with open("/tmp/stt_matrix.log", "a") as log:
            log.write(f"Response: {resp.status_code}\n")

        resp.raise_for_status()
        if not quiet:
            click.echo(f"Sent to Matrix room {room_id}", err=True)
    except Exception as e:
        if not quiet:
            click.echo(f"Failed to send Matrix message: {e}", err=True)


def get_all_models():
    """Get all valid model names across all backends."""
    models = []
    for backend_info in BACKENDS.values():
        models.extend(backend_info["models"].keys())
    return models


@click.command()
@click.argument("audio_file", type=click.Path(exists=True))
@click.option("-b", "--backend", default=DEFAULT_BACKEND, type=click.Choice(list(BACKENDS.keys())),
              help=f"STT backend (default: {DEFAULT_BACKEND})")
@click.option("-m", "--model", default=None,
              help="Model variant (default: v2 for parakeet, base for whisper)")
@click.option("--no-int8", is_flag=True, help="Disable int8 quantization (slower)")
@click.option("-q", "--quiet", is_flag=True, help="Suppress progress messages")
@click.option("--room-id", default=None, help="Matrix room ID to send transcription to")
def main(audio_file: str, backend: str, model: str | None, no_int8: bool, quiet: bool, room_id: str | None):
    """Transcribe audio using local STT (Parakeet or Whisper)."""
    if quiet:
        warnings.filterwarnings("ignore")
        os.environ["PYTHONWARNINGS"] = "ignore"

    import onnx_asr

    backend_info = BACKENDS[backend]
    available_models = backend_info["models"]
    
    # Use default model for backend if not specified
    if model is None:
        model = backend_info["default"]
    
    # Validate model for this backend
    if model not in available_models:
        valid = ", ".join(available_models.keys())
        raise click.BadParameter(f"Invalid model '{model}' for {backend}. Valid options: {valid}")

    # Convert to wav format (16kHz mono)
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
        tmp_path = tmp.name

    try:
        subprocess.run(
            ['ffmpeg', '-y', '-i', audio_file, '-ar', '16000', '-ac', '1', tmp_path],
            capture_output=True, check=True
        )

        model_id = available_models[model]
        quantization = None if no_int8 else "int8"

        if not quiet:
            quant_str = "fp32" if no_int8 else "int8"
            click.echo(f"Loading {backend}/{model} ({quant_str})...", err=True)

        asr_model = onnx_asr.load_model(model_id, quantization=quantization)

        if not quiet:
            click.echo(f"Transcribing: {audio_file}...", err=True)

        result = asr_model.recognize(tmp_path)
        text = result.strip()

        # Output to stdout - openclaw captures this for context
        click.echo(text)

        # If room_id provided, also send directly to Matrix
        if room_id and text:
            send_to_matrix(room_id, text, quiet)

    finally:
        os.unlink(tmp_path)


if __name__ == "__main__":
    main()

```