local-stt
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-local-stt
Repository
Skill path: skills/araa47/local-stt
Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
Open repositoryBest for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install local-stt into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding local-stt to shared team environments
- Use local-stt for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: local-stt
description: Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).
metadata: {"openclaw":{"emoji":"🎙️","requires":{"bins":["ffmpeg"]}}}
---
# Local STT (Parakeet / Whisper)
Unified local speech-to-text using ONNX Runtime with int8 quantization. Choose your backend:
- **Parakeet** (default): Best accuracy for English, correctly captures names and filler words
- **Whisper**: Fastest inference, supports 99 languages
## Usage
```bash
# Default: Parakeet v2 (best English accuracy)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg
# Explicit backend selection
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b whisper
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg -b parakeet -m v3
# Quiet mode (suppress progress)
~/.openclaw/skills/local-stt/scripts/local-stt.py audio.ogg --quiet
```
## Options
- `-b/--backend`: `parakeet` (default), `whisper`
- `-m/--model`: Model variant (see below)
- `--no-int8`: Disable int8 quantization
- `-q/--quiet`: Suppress progress
- `--room-id`: Matrix room ID for direct message
## Models
### Parakeet (default backend)
| Model | Description |
|-------|-------------|
| **v2** (default) | English only, best accuracy |
| v3 | Multilingual |
### Whisper
| Model | Description |
|-------|-------------|
| tiny | Fastest, lower accuracy |
| **base** (default) | Good balance |
| small | Better accuracy |
| large-v3-turbo | Best quality, slower |
## Benchmark (24s audio)
| Backend/Model | Time | RTF | Notes |
|---------------|------|-----|-------|
| Whisper Base int8 | 0.43s | 0.018x | Fastest |
| **Parakeet v2 int8** | 0.60s | 0.025x | Best accuracy |
| Parakeet v3 int8 | 0.63s | 0.026x | Multilingual |
## openclaw.json
```json
{
"tools": {
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "~/.openclaw/skills/local-stt/scripts/local-stt.py",
"args": ["--quiet", "{{MediaPath}}"],
"timeoutSeconds": 30
}
]
}
}
}
}
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
"owner": "araa47",
"slug": "local-stt",
"displayName": "Local STT (Nvidia Parakeet + Whisper Support)",
"latest": {
"version": "1.0.0",
"publishedAt": 1770042670809,
"commit": "https://github.com/clawdbot/skills/commit/355d1a16968e42f8b9d64b89442a7f5b9fa54936"
},
"history": []
}
```
### scripts/local-stt.py
```python
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = [
# "onnx-asr",
# "onnxruntime",
# "huggingface_hub",
# "click",
# "requests",
# ]
# ///
"""Local speech-to-text using Parakeet (default) or Whisper backends.
CLI model for openclaw media understanding. Outputs transcription to stdout.
When --room-id is provided, also sends transcription to that Matrix room.
"""
import subprocess
import tempfile
import warnings
import os
from pathlib import Path
BACKENDS = {
"parakeet": {
"models": {
"v2": "nemo-parakeet-tdt-0.6b-v2", # English only, best accuracy
"v3": "nemo-parakeet-tdt-0.6b-v3", # Multilingual
},
"default": "v2",
"description": "NVIDIA Parakeet TDT - best accuracy for English",
},
"whisper": {
"models": {
"tiny": "whisper-tiny",
"base": "whisper-base",
"small": "whisper-small",
"large-v3-turbo": "whisper-large-v3-turbo",
},
"default": "base",
"description": "OpenAI Whisper - fastest, 99 languages",
},
}
DEFAULT_BACKEND = "parakeet"
def load_env_file():
"""Load .env file from home directory if it exists."""
env_paths = [Path.home() / ".openclaw" / ".env", Path.home() / ".env"]
for env_path in env_paths:
if env_path.exists():
with open(env_path) as f:
for line in f:
line = line.strip()
if line and not line.startswith("#") and "=" in line:
if line.startswith("export "):
line = line[7:]
key, value = line.split("=", 1)
key = key.strip()
if key not in os.environ:
os.environ[key] = value.strip().strip('"').strip("'")
warnings.filterwarnings("ignore")
import click
import requests
def send_to_matrix(room_id: str, text: str, quiet: bool = False):
"""Send transcription to Matrix room via REST API."""
load_env_file()
homeserver = os.environ.get("MATRIX_HOMESERVER")
access_token = os.environ.get("MATRIX_ACCESS_TOKEN")
if not homeserver or not access_token:
if not quiet:
click.echo("MATRIX_HOMESERVER or MATRIX_ACCESS_TOKEN not set, skipping Matrix send", err=True)
return
try:
import time
txn_id = int(time.time() * 1000)
target_room = room_id
if target_room.startswith("room:"):
target_room = target_room[5:]
url = f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/{target_room}/send/m.room.message/{txn_id}"
headers = {"Authorization": f"Bearer {access_token}"}
payload = {
'msgtype': 'm.text',
'body': f'🎙️ {text}',
'format': 'org.matrix.custom.html',
'formatted_body': f'<blockquote>🎙️ {text}</blockquote>'
}
with open("/tmp/stt_matrix.log", "a") as log:
log.write(f"Attempting send to {room_id} at {txn_id}\n")
log.write(f"URL: {url}\n")
resp = requests.put(url, headers=headers, json=payload, timeout=10)
with open("/tmp/stt_matrix.log", "a") as log:
log.write(f"Response: {resp.status_code}\n")
resp.raise_for_status()
if not quiet:
click.echo(f"Sent to Matrix room {room_id}", err=True)
except Exception as e:
if not quiet:
click.echo(f"Failed to send Matrix message: {e}", err=True)
def get_all_models():
"""Get all valid model names across all backends."""
models = []
for backend_info in BACKENDS.values():
models.extend(backend_info["models"].keys())
return models
@click.command()
@click.argument("audio_file", type=click.Path(exists=True))
@click.option("-b", "--backend", default=DEFAULT_BACKEND, type=click.Choice(list(BACKENDS.keys())),
help=f"STT backend (default: {DEFAULT_BACKEND})")
@click.option("-m", "--model", default=None,
help="Model variant (default: v2 for parakeet, base for whisper)")
@click.option("--no-int8", is_flag=True, help="Disable int8 quantization (slower)")
@click.option("-q", "--quiet", is_flag=True, help="Suppress progress messages")
@click.option("--room-id", default=None, help="Matrix room ID to send transcription to")
def main(audio_file: str, backend: str, model: str | None, no_int8: bool, quiet: bool, room_id: str | None):
"""Transcribe audio using local STT (Parakeet or Whisper)."""
if quiet:
warnings.filterwarnings("ignore")
os.environ["PYTHONWARNINGS"] = "ignore"
import onnx_asr
backend_info = BACKENDS[backend]
available_models = backend_info["models"]
# Use default model for backend if not specified
if model is None:
model = backend_info["default"]
# Validate model for this backend
if model not in available_models:
valid = ", ".join(available_models.keys())
raise click.BadParameter(f"Invalid model '{model}' for {backend}. Valid options: {valid}")
# Convert to wav format (16kHz mono)
with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as tmp:
tmp_path = tmp.name
try:
subprocess.run(
['ffmpeg', '-y', '-i', audio_file, '-ar', '16000', '-ac', '1', tmp_path],
capture_output=True, check=True
)
model_id = available_models[model]
quantization = None if no_int8 else "int8"
if not quiet:
quant_str = "fp32" if no_int8 else "int8"
click.echo(f"Loading {backend}/{model} ({quant_str})...", err=True)
asr_model = onnx_asr.load_model(model_id, quantization=quantization)
if not quiet:
click.echo(f"Transcribing: {audio_file}...", err=True)
result = asr_model.recognize(tmp_path)
text = result.strip()
# Output to stdout - openclaw captures this for context
click.echo(text)
# If room_id provided, also send directly to Matrix
if room_id and text:
send_to_matrix(room_id, text, quiet)
finally:
os.unlink(tmp_path)
if __name__ == "__main__":
main()
```