
elevenlabs-stt

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,122.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: A (85.2).

Install command

npx @skill-hub/cli install openclaw-skills-elevenlabs-stt

Repository

openclaw/skills

Skill path: skills/clawdbotborges/elevenlabs-stt

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install elevenlabs-stt into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding elevenlabs-stt to shared team environments
  • Use elevenlabs-stt for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: elevenlabs-stt
description: Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).
homepage: https://elevenlabs.io/speech-to-text
metadata: {"clawdbot":{"emoji":"🎙️","requires":{"bins":["curl"],"env":["ELEVENLABS_API_KEY"]},"primaryEnv":"ELEVENLABS_API_KEY"}}
---

# ElevenLabs Speech-to-Text

Transcribe audio files using ElevenLabs' Scribe v2 model. Supports 90+ languages with speaker diarization.

## Quick Start

```bash
# Basic transcription
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3

# With speaker diarization
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --diarize

# Specify language (improves accuracy)
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --lang en

# Full JSON output with timestamps
{baseDir}/scripts/transcribe.sh /path/to/audio.mp3 --json
```

## Options

| Flag | Description |
|------|-------------|
| `--diarize` | Identify different speakers |
| `--lang CODE` | ISO language code (e.g., en, pt, es) |
| `--json` | Output full JSON with word timestamps |
| `--events` | Tag audio events (laughter, music, etc.) |

## Supported Formats

All major audio/video formats: mp3, m4a, wav, ogg, webm, mp4, etc.

## API Key

Set the `ELEVENLABS_API_KEY` environment variable, or configure the key in `clawdbot.json`:

```json5
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        apiKey: "sk_..."
      }
    }
  }
}
```
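
Before the first run, it can help to confirm that the key is actually visible to the shell. This is a minimal sketch: `require_api_key` is a hypothetical helper name, and it only checks that the variable is non-empty, not that the key is valid.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if ELEVENLABS_API_KEY is non-empty.
require_api_key() {
    if [[ -z "${ELEVENLABS_API_KEY:-}" ]]; then
        echo "ELEVENLABS_API_KEY is not set; export it before running transcribe.sh" >&2
        return 1
    fi
}

if require_api_key; then
    echo "API key found"
fi
```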

## Examples

```bash
# Transcribe a WhatsApp voice note
{baseDir}/scripts/transcribe.sh ~/Downloads/voice_note.ogg

# Meeting recording with multiple speakers
{baseDir}/scripts/transcribe.sh meeting.mp3 --diarize --lang en

# Get JSON for processing
{baseDir}/scripts/transcribe.sh podcast.mp3 --json > transcript.json
```
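
To run the same command over a folder of recordings, a loop along these lines might work. It is a sketch under assumptions: `TRANSCRIBE_CMD` and `transcribe_dir` are hypothetical names, the extension list covers only the formats named on this page, and each transcript is written next to its audio file as `.txt`.

```bash
#!/usr/bin/env bash
# Hypothetical batch wrapper; point TRANSCRIBE_CMD at the skill's script.
TRANSCRIBE_CMD="${TRANSCRIBE_CMD:-./scripts/transcribe.sh}"

transcribe_dir() {
    local dir="$1" file ext out
    shift   # remaining arguments (e.g. --diarize) are passed through
    for file in "$dir"/*; do
        [[ -f "$file" ]] || continue
        # Lower-case the extension so .OGG and .ogg are treated alike
        ext="$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')"
        case "$ext" in
            mp3|m4a|wav|ogg|webm|mp4)
                out="${file%.*}.txt"
                if "$TRANSCRIBE_CMD" "$file" "$@" > "$out"; then
                    echo "ok: $file"
                else
                    rm -f "$out"   # do not leave an empty transcript behind
                    echo "failed: $file" >&2
                fi
                ;;
        esac
    done
}

# Usage: transcribe_dir ~/Downloads/voice_notes --lang en
```

Files are processed one at a time, so a failure on one recording is reported and the loop moves on to the next.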


---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# 🎙️ ElevenLabs Speech-to-Text Skill

A [Clawdbot](https://github.com/clawdbot/clawdbot) skill for transcribing audio files using ElevenLabs' Scribe v2 model.

## Features

- 🌍 **90+ languages** supported with automatic detection
- πŸ‘₯ **Speaker diarization** β€” identify different speakers
- 🎡 **Audio event tagging** β€” detect laughter, music, applause, etc.
- πŸ“ **Word-level timestamps** β€” precise timing in JSON output
- 🎧 **All major formats** β€” mp3, m4a, wav, ogg, webm, mp4, and more

## Installation

### For Clawdbot

Add to your `clawdbot.json`:

```json5
{
  skills: {
    entries: {
      "elevenlabs-stt": {
        source: "github:clawdbotborges/elevenlabs-stt",
        apiKey: "sk_your_api_key_here"
      }
    }
  }
}
```

### Standalone

```bash
git clone https://github.com/clawdbotborges/elevenlabs-stt.git
cd elevenlabs-stt
export ELEVENLABS_API_KEY="sk_your_api_key_here"
```

## Usage

```bash
# Basic transcription
./scripts/transcribe.sh audio.mp3

# With speaker diarization
./scripts/transcribe.sh meeting.mp3 --diarize

# Specify language for better accuracy
./scripts/transcribe.sh voice_note.ogg --lang en

# Full JSON with timestamps
./scripts/transcribe.sh podcast.mp3 --json

# Tag audio events (laughter, music, etc.)
./scripts/transcribe.sh recording.wav --events
```

## Options

| Flag | Description |
|------|-------------|
| `--diarize` | Enable speaker diarization |
| `--lang CODE` | ISO language code (e.g., `en`, `pt`, `es`, `fr`) |
| `--json` | Output full JSON response with word timestamps |
| `--events` | Tag audio events like laughter, music, applause |
| `-h, --help` | Show help message |

## Examples

### Transcribe a voice message

```bash
./scripts/transcribe.sh ~/Downloads/voice_note.ogg
# Output: "Hey, just wanted to check in about the meeting tomorrow."
```

### Meeting with multiple speakers

```bash
./scripts/transcribe.sh meeting.mp3 --diarize --lang en --json
```

```json
{
  "text": "Welcome everyone. Let's start with updates.",
  "words": [
    {"text": "Welcome", "start": 0.0, "end": 0.5, "speaker": "speaker_0"},
    {"text": "everyone", "start": 0.5, "end": 1.0, "speaker": "speaker_0"}
  ]
}
```

### Process with jq

```bash
# Get just the text
./scripts/transcribe.sh audio.mp3 --json | jq -r '.text'

# Get word count
./scripts/transcribe.sh audio.mp3 --json | jq '.words | length'
```

## Requirements

- `curl`: for API requests
- `jq`: for JSON parsing (the bundled `scripts/transcribe.sh` pipes every response through `jq`, so treat it as required)
- ElevenLabs API key with Speech-to-Text access

## API Key

Get your API key from [ElevenLabs](https://elevenlabs.io):

1. Sign up or log in
2. Go to Profile → API Keys
3. Create a new key or copy an existing one

## License

MIT

## Links

- [ElevenLabs Speech-to-Text](https://elevenlabs.io/speech-to-text)
- [API Documentation](https://elevenlabs.io/docs/api-reference/speech-to-text)
- [Clawdbot](https://github.com/clawdbot/clawdbot)

```

### _meta.json

```json
{
  "owner": "clawdbotborges",
  "slug": "elevenlabs-stt",
  "displayName": "ElevenLabs Speech-to-Text",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1769436241130,
    "commit": "https://github.com/clawdbot/skills/commit/fb94c3580d061df0e429a4196f869f4f60e425e6"
  },
  "history": []
}

```
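
The `publishedAt` value looks like a Unix timestamp in milliseconds. That is an assumption based on its 13-digit length; the file itself does not say. Under that assumption, the sketch below converts it to an ISO 8601 UTC string. `epoch_ms_to_iso` is a hypothetical helper; it tries the GNU `date -d` form first and falls back to the BSD/macOS `date -r` form.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert epoch milliseconds to an ISO 8601 UTC timestamp.
epoch_ms_to_iso() {
    local seconds=$(( $1 / 1000 ))   # drop the millisecond part
    date -u -d "@$seconds" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
        || date -u -r "$seconds" +%Y-%m-%dT%H:%M:%SZ
}

epoch_ms_to_iso 1769436241130   # prints 2026-01-26T14:04:01Z
```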

### scripts/transcribe.sh

```bash
#!/usr/bin/env bash
set -euo pipefail

# ElevenLabs Speech-to-Text transcription script
# Usage: transcribe.sh <audio_file> [options]

show_help() {
    cat << EOF
Usage: $(basename "$0") <audio_file> [options]

Options:
  --diarize     Enable speaker diarization
  --lang CODE   ISO language code (e.g., en, pt, es, fr)
  --json        Output full JSON response
  --events      Tag audio events (laughter, music, etc.)
  -h, --help    Show this help

Environment:
  ELEVENLABS_API_KEY  Required API key

Examples:
  $(basename "$0") voice_note.ogg
  $(basename "$0") meeting.mp3 --diarize --lang en
  $(basename "$0") podcast.mp3 --json > transcript.json
EOF
    exit 0
}

# Defaults
DIARIZE="false"
LANG_CODE=""
JSON_OUTPUT="false"
TAG_EVENTS="false"
FILE=""

# Parse arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help) show_help ;;
        --diarize) DIARIZE="true"; shift ;;
        --lang) LANG_CODE="${2:?--lang requires a language code}"; shift 2 ;;
        --json) JSON_OUTPUT="true"; shift ;;
        --events) TAG_EVENTS="true"; shift ;;
        -*) echo "Unknown option: $1" >&2; exit 1 ;;
        *) FILE="$1"; shift ;;
    esac
done

# Validate
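if [[ -z "$FILE" ]]; then
    echo "Error: No audio file specified" >&2
    # Exit non-zero here; show_help exits 0, which would hide the failure
    echo "Run '$(basename "$0") --help' for usage." >&2
    exit 1
fi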

if [[ ! -f "$FILE" ]]; then
    echo "Error: File not found: $FILE" >&2
    exit 1
fi

# API key (read from the ELEVENLABS_API_KEY environment variable only)
API_KEY="${ELEVENLABS_API_KEY:-}"
if [[ -z "$API_KEY" ]]; then
    echo "Error: ELEVENLABS_API_KEY not set" >&2
    exit 1
fi

# Build curl command
CURL_ARGS=(
    -sS
    -X POST
    "https://api.elevenlabs.io/v1/speech-to-text"
    -H "xi-api-key: $API_KEY"
    -F "file=@$FILE"
    -F "model_id=scribe_v2"
    -F "diarize=$DIARIZE"
    -F "tag_audio_events=$TAG_EVENTS"
)

if [[ -n "$LANG_CODE" ]]; then
    CURL_ARGS+=(-F "language_code=$LANG_CODE")
fi

# Make request
RESPONSE=$(curl "${CURL_ARGS[@]}")

# Check for errors
if echo "$RESPONSE" | grep -q '"detail"'; then
    echo "Error from API:" >&2
    echo "$RESPONSE" | jq -r '.detail.message // .detail' >&2
    exit 1
fi

# Output
if [[ "$JSON_OUTPUT" == "true" ]]; then
    echo "$RESPONSE" | jq .
else
    # Extract just the text
    TEXT=$(echo "$RESPONSE" | jq -r '.text // empty')
    if [[ -n "$TEXT" ]]; then
        echo "$TEXT"
    else
        echo "$RESPONSE"
    fi
fi

```
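
Because the script above calls `curl` for the request and `jq` on every output path, a short pre-flight check can surface a missing tool before any audio is uploaded. This is a sketch; `missing_tools` is a hypothetical helper and not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of any required commands not on PATH.
missing_tools() {
    local tool missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    # Check the length first: expanding an empty array trips `set -u` on older bash
    if [[ ${#missing[@]} -gt 0 ]]; then
        printf '%s\n' "${missing[@]}"
        return 1
    fi
}

if ! missing_tools curl jq; then
    echo "Install the tools listed above before running transcribe.sh" >&2
fi
```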
