Whisper-Transcription
Imported from https://github.com/lawless-m/claude-skills.
Packaged view
This page reorganizes the original catalog entry to put fit, installability, and workflow context first; the original raw source appears below.
Stars: 6
Hot score: 82
Updated: March 20, 2026
Overall rating: C (3.5)
Composite score: 3.5
Best-practice grade: C (57.6)
Install command: npx @skill-hub/cli install lawless-m-claude-skills-whisper-transcription
Repository: lawless-m/claude-skills
Skill path: .claude/skills/Whisper-Transcription
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: lawless-m.
This is a mirrored public skill entry; review the repository before installing it into production workflows.
What it helps with
- Install Whisper-Transcription into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/lawless-m/claude-skills before adding Whisper-Transcription to shared team environments
- Use Whisper-Transcription to transcribe audio with a local whisper.cpp server in development workflows
Works across
Claude Code, Codex CLI, Gemini CLI, OpenCode
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: Whisper-Transcription
description: Audio transcription using local whisper.cpp server with CUDA acceleration. HTTP API for speech-to-text conversion.
---

# Whisper Transcription Server

Local speech-to-text transcription using whisper.cpp with GPU acceleration. The server runs on port 5555 and accepts audio files via HTTP POST.

## Instructions

When helping users with audio transcription, follow these guidelines:

1. **Server Location**: The whisper-server runs at `http://localhost:5555` with the large-v3 model
2. **Audio Format**: Server accepts WAV, MP3, and other common formats; 16kHz mono WAV is optimal
3. **API Endpoint**: Use POST to `/inference` with multipart form data
4. **GPU Memory**: large-v3 uses ~6GB VRAM on the RTX 3090 (24GB total)
5. **VRAM Sharing**: If OOM errors occur, wait 30-60 seconds for other GPU services to unload (see Vram-GPU-OOM skill)

## Examples

### Example 1: Basic Transcription

```
User: Transcribe this audio file
Claude: I'll send the audio to the local Whisper server:

curl -X POST http://localhost:5555/inference \
  -F "file=@audio.wav"

The server returns JSON with the transcribed text.
```

### Example 2: Recording and Transcribing

```
User: Record my voice and transcribe it
Claude: I'll record audio then send it to Whisper:

1. Record 5 seconds of audio:
   arecord -d 5 -f S16_LE -r 16000 -c 1 recording.wav

2. Transcribe:
   curl -X POST http://localhost:5555/inference \
     -F "file=@recording.wav"
```

### Example 3: Python Integration

```
User: How do I use Whisper from Python?
Claude: Use requests to POST to the inference endpoint:

[Provides implementation using the Python pattern from reference material below]
```

---

# Reference Implementation Details

## Server Configuration

**Location**: `~/whisper.cpp/build/bin/whisper-server`
**Model**: `~/whisper.cpp/models/ggml-large-v3.bin`
**Port**: 5555

### Server Startup Command

```bash
~/whisper.cpp/build/bin/whisper-server \
  -m ~/whisper.cpp/models/ggml-large-v3.bin \
  -l en \
  --port 5555 \
  --host 0.0.0.0
```

### Systemd Service

**Location**: `/etc/systemd/system/whisper-server.service`

```ini
[Unit]
Description=Whisper.cpp Transcription Server
After=network.target

[Service]
Type=simple
User=matt
WorkingDirectory=/home/matt/whisper.cpp/build
ExecStart=/home/matt/whisper.cpp/build/bin/whisper-server \
  -m /home/matt/whisper.cpp/models/ggml-large-v3.bin \
  -l en \
  --port 5555 \
  --host 0.0.0.0 \
  --threads 4
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
```

## API Reference

### POST /inference

Transcribe an audio file.

**Request:**

```bash
curl -X POST http://localhost:5555/inference \
  -F "file=@audio.wav" \
  -F "response_format=json"
```

**Response:**

```json
{
  "text": "transcribed text here"
}
```

### GET /

Health check endpoint.
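The skill does not show an example call for the health check; a minimal sketch, assuming `GET /` simply answers with an HTTP success status once whisper-server is up:

```bash
# Hypothetical liveness check before sending audio; assumes GET / returns
# a 2xx status when whisper-server is running and the model is loaded.
if curl -sf http://localhost:5555/ > /dev/null; then
    echo "whisper-server is up"
else
    echo "whisper-server is not responding"
fi
```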
## Python Integration

```python
import requests

def transcribe(audio_path: str, server_url: str = "http://localhost:5555") -> str:
    """Transcribe audio file using local Whisper server."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            f"{server_url}/inference",
            files={"file": f},
            timeout=120
        )
    response.raise_for_status()
    return response.json().get("text", "")

# Usage
text = transcribe("recording.wav")
print(text)
```

## Shell Integration

```bash
#!/bin/bash
# transcribe.sh - Quick transcription helper

WHISPER_URL="${WHISPER_URL:-http://localhost:5555}"

if [ -z "$1" ]; then
    echo "Usage: transcribe.sh <audio_file>"
    exit 1
fi

curl -s -X POST "$WHISPER_URL/inference" \
  -F "file=@$1" | jq -r '.text'
```

## Troubleshooting

### Server Won't Start

**Cause:** Model file missing or CUDA unavailable

**Solution:**

```bash
# Check model exists
ls -lh ~/whisper.cpp/models/ggml-large-v3.bin

# Check CUDA
nvidia-smi
```

### OOM Error

**Cause:** Other GPU services using VRAM

**Solution:**

```bash
# Check GPU memory usage
nvidia-smi

# Wait for other services to unload, or manually stop them
# See Vram-GPU-OOM skill for retry patterns
```

### Slow Transcription

**Cause:** CPU fallback instead of GPU

**Solution:**

```bash
# Verify GPU is being used during transcription
watch -n 1 nvidia-smi
# Should show whisper-server using GPU memory
```

### Connection Refused

**Cause:** Server not running

**Solution:**

```bash
# Check service status
systemctl status whisper-server

# Start if stopped
sudo systemctl start whisper-server

# View logs
journalctl -u whisper-server -f
```

## Performance Notes

- **Speed**: ~2-4x real-time (1 second of audio = 0.25-0.5 seconds of processing)
- **VRAM Usage**: ~6GB for large-v3
- **Accuracy**: Excellent for English speech
- **Latency**: First request may be slower (model loading)
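Not part of the original skill, but the documented `POST /inference` call composes easily into a batch helper. A sketch under the same assumptions as the shell integration above (JSON response with a `text` field, `jq` installed); the script name `batch_transcribe.sh` is hypothetical:

```bash
#!/bin/bash
# batch_transcribe.sh - hypothetical helper: transcribe every WAV file in a
# directory using the same /inference endpoint documented above.
WHISPER_URL="${WHISPER_URL:-http://localhost:5555}"
DIR="${1:-.}"

for f in "$DIR"/*.wav; do
    [ -e "$f" ] || continue            # no matches: the glob stays literal, so skip it
    echo "== $f =="
    curl -s -X POST "$WHISPER_URL/inference" \
        -F "file=@$f" | jq -r '.text'
done
```

Run it as `./batch_transcribe.sh /path/to/recordings`; with no argument it processes WAV files in the current directory.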