elevenlabs-stt
Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
Packaged view
This page reorganizes the original catalog entry to put fit, installability, and workflow context first; the original raw source appears below.
Install command
npx @skill-hub/cli install openclaw-skills-elevenlabs-stt
Repository
Skill path: skills/clawdbotborges/elevenlabs-stt
Open repository
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install elevenlabs-stt into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding elevenlabs-stt to shared team environments
- Use elevenlabs-stt for audio transcription workflows
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: elevenlabs-stt
description: Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
homepage: https://elevenlabs.io/speech-to-text
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":["curl"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
---
# ElevenLabs Speech-to-Text
Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.
## Quick Start
```bash
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3
# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize
# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en
# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
```
## Options
| Flag | Description |
|------|-------------|
| `--diarize` | Identify different speakers |
| `--lang CODE` | ISO language code (e.g., en, pt, es) |
| `--json` | Output full JSON with word timestamps |
| `--events` | Tag audio events (laughter, music, etc.) |
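The flags above map one-to-one onto multipart form fields of the underlying API request, as built in `scripts/transcribe.sh`. The sketch below (not part of the shipped script; `build_form_fields` is a hypothetical helper) prints the field list that would be passed to `curl -F`:

```shell
# Sketch: how each CLI flag becomes a multipart form field on the
# speech-to-text request. This only prints the fields; it does not
# call the API. Pass each printed line to curl as a -F argument.
build_form_fields() {
  file="$1"; diarize="${2:-false}"; lang="${3:-}"; events="${4:-false}"
  printf '%s\n' "file=@$file" "model_id=scribe_v2" \
    "diarize=$diarize" "tag_audio_events=$events"
  # language_code is only sent when a language was requested
  if [ -n "$lang" ]; then
    printf '%s\n' "language_code=$lang"
  fi
}

build_form_fields meeting.mp3 true en
```

Omitting `--lang` simply drops the `language_code` field, letting the API auto-detect the language.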
## Supported Formats
All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.
## API Key
Set the `ELEVENLABS_API_KEY` environment variable, or configure it in clawdbot.json:
```json5
{
skills: {
entries: {
"elevenlabs-stt": {
apiKey: "sk_..."
}
}
}
}
```
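Before the first run, it can help to confirm the key is actually visible to your shell. This small check (a sketch using only POSIX sh; `mask_key` is a hypothetical helper, not part of the skill) prints a short masked prefix, never the full key:

```shell
# Sanity check: is ELEVENLABS_API_KEY visible to this shell?
# Only the first five characters are printed.
mask_key() {
  if [ -n "${1:-}" ]; then
    printf 'ELEVENLABS_API_KEY is set (%s...)\n' "$(printf '%s' "$1" | cut -c1-5)"
  else
    echo 'ELEVENLABS_API_KEY is not set'
  fi
}

mask_key "${ELEVENLABS_API_KEY:-}"
```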
## Examples
```bash
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en
# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### README.md
```markdown
# 🎙️ ElevenLabs Speech-to-Text Skill
A [Clawdbot](https://github.com/clawdbot/clawdbot) skill for transcribing audio files using ElevenLabs' Scribe v2 model.
## Features
- 🌍 **90+ languages** supported with automatic detection
- 👥 **Speaker diarization**: identify different speakers
- 🎵 **Audio event tagging**: detect laughter, music, applause, etc.
- ⏱️ **Word-level timestamps**: precise timing in JSON output
- 🎧 **All major formats**: mp3, m4a, wav, ogg, webm, mp4, and more
## Installation
### For Clawdbot
Add to your `clawdbot.json`:
```json5
{
skills: {
entries: {
"elevenlabs-stt": {
source: "github:clawdbotborges/elevenlabs-stt",
apiKey: "sk_your_api_key_here"
}
}
}
}
```
### Standalone
```bash
git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"
```
## Usage
```bash
# Basic transcription
./scripts/transcribe.sh audio.mp3
# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize
# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en
# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json
# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events
```
## Options
| Flag | Description |
|------|-------------|
| `--diarize` | Enable speaker diarization |
| `--lang CODE` | ISO language code (e.g., `en`, `pt`, `es`, `fr`) |
| `--json` | Output full JSON response with word timestamps |
| `--events` | Tag audio events like laughter, music, applause |
| `-h, --help` | Show help message |
## Examples
### Transcribe a voice message
```bash
./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."
```
### Meeting with multiple speakers
```bash
./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json
```
```json
{
"text": "Welcome everyone. Let's start with updates.",
"words": [
{"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
{"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
]
}
```
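The word-level `speaker` labels in the sample above can be collapsed into a readable transcript with jq. This sketch assumes the field names shown in the sample (`words[].text`, `words[].speaker`); note that `group_by` merges all of a speaker's words into one line, rather than preserving individual turns:

```shell
# Collapse the diarized word list into one line per speaker.
# The canned $transcript below stands in for real --json output.
transcript='{"text":"Welcome everyone. Thanks.","words":[
  {"text":"Welcome","start":0.0,"end":0.5,"speaker":"speaker_0"},
  {"text":"everyone.","start":0.5,"end":1.0,"speaker":"speaker_0"},
  {"text":"Thanks.","start":1.2,"end":1.6,"speaker":"speaker_1"}]}'

printf '%s' "$transcript" | jq -r '
  .words
  | group_by(.speaker)[]
  | "\(.[0].speaker): \(map(.text) | join(" "))"'
```

With real output, replace the canned JSON with `./scripts/transcribe.sh meeting.mp3 --diarize --json`.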
### Process with jq
```bash
# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'
# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
```
## Requirements
- `curl`: for API requests
- `jq`: for JSON parsing (the bundled script uses it for error handling and all output parsing, so treat it as required in practice)
- ElevenLabs API key with Speech-to-Text access
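The binary requirements above can be verified before the first run. This preflight sketch (`check_bins` is a hypothetical helper, not part of the skill) reports any tools missing from `PATH`:

```shell
# Preflight: verify the required binaries are installed.
check_bins() {
  missing=""
  for bin in "$@"; do
    command -v "$bin" >/dev/null 2>&1 || missing="$missing $bin"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all requirements found"
}

check_bins curl jq || echo "install the missing tools before transcribing" >&2
```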
## API Key
Get your API key from [ElevenLabs](https://elevenlabs.io):
1. Sign up or log in
2. Go to Profile → API Keys
3. Create a new key or copy existing one
## License
MIT
## Links
- [ElevenLabs Speech-to-Text](https://elevenlabs.io/speech-to-text)
- [API Documentation](https://elevenlabs.io/docs/api-reference/speech-to-text)
- [Clawdbot](https://github.com/clawdbot/clawdbot)
```
### _meta.json
```json
{
"owner": "clawdbotborges",
"slug": "elevenlabs-stt",
"displayName": "ElevenLabs Speech-to-Text",
"latest": {
"version": "1.0.0",
"publishedAt": 1769436241130,
"commit": "https://github.com/clawdbot/skills/commit/fb94c3580d061df0e429a4196f869f4f60e425e6"
},
"history": []
}
```
### scripts/transcribe.sh
```bash
#!/usr/bin/env bash
set -euo pipefail
# ElevenLabs Speech-to-Text transcription script
# Usage: transcribe.sh <audio_file> [options]
show_help() {
cat << EOF
Usage: $(basename "$0") <audio_file> [options]
Options:
--diarize Enable speaker diarization
--lang CODE ISO language code (e.g., en, pt, es, fr)
--json Output full JSON response
--events Tag audio events (laughter, music, etc.)
-h, --help Show this help
Environment:
ELEVENLABS_API_KEY Required API key
Examples:
$(basename "$0") voice_note.ogg
$(basename "$0") meeting.mp3 --diarize --lang en
$(basename "$0") podcast.mp3 --json > transcript.json
EOF
exit 0
}
# Defaults
DIARIZE="false"
LANG_CODE=""
JSON_OUTPUT="false"
TAG_EVENTS="false"
FILE=""
# Parse arguments
while [[ $# -gt 0 ]]; do
case $1 in
-h|--help) show_help ;;
--diarize) DIARIZE="true"; shift ;;
--lang) [[ $# -ge 2 ]] || { echo "Error: --lang requires a value" >&2; exit 1; }; LANG_CODE="$2"; shift 2 ;;
--json) JSON_OUTPUT="true"; shift ;;
--events) TAG_EVENTS="true"; shift ;;
-*) echo "Unknown option: $1" >&2; exit 1 ;;
*) FILE="$1"; shift ;;
esac
done
# Validate
if [[ -z "$FILE" ]]; then
echo "Error: No audio file specified (run with --help for usage)" >&2
exit 1
fi
if [[ ! -f "$FILE" ]]; then
echo "Error: File not found: $FILE" >&2
exit 1
fi
# API key (required in the environment; Clawdbot is expected to export the configured apiKey as ELEVENLABS_API_KEY)
API_KEY="${ELEVENLABS_API_KEY:-}"
if [[ -z "$API_KEY" ]]; then
echo "Error: ELEVENLABS_API_KEY not set" >&2
exit 1
fi
# Build curl command
CURL_ARGS=(
-s
-X POST
"https://api.elevenlabs.io/v1/speech-to-text"
-H "xi-api-key: $API_KEY"
-F "file=@$FILE"
-F "model_id=scribe_v2"
-F "diarize=$DIARIZE"
-F "tag_audio_events=$TAG_EVENTS"
)
if [[ -n "$LANG_CODE" ]]; then
CURL_ARGS+=(-F "language_code=$LANG_CODE")
fi
# Make request (fail loudly if curl itself errors, e.g. no network)
RESPONSE=$(curl "${CURL_ARGS[@]}") || { echo "Error: request to ElevenLabs API failed" >&2; exit 1; }
# Check for errors
if echo "$RESPONSE" | grep -q '"detail"'; then
echo "Error from API:" >&2
echo "$RESPONSE" | jq -r '.detail.message // .detail' >&2
exit 1
fi
# Output
if [[ "$JSON_OUTPUT" == "true" ]]; then
echo "$RESPONSE" | jq .
else
# Extract just the text
TEXT=$(echo "$RESPONSE" | jq -r '.text // empty')
if [[ -n "$TEXT" ]]; then
echo "$TEXT"
else
echo "$RESPONSE"
fi
fi
```