
elevenlabs-speech

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,132.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: A (92.0).

Install command

npx @skill-hub/cli install openclaw-skills-elevenlabs-voice

Repository

openclaw/skills

Skill path: skills/amreahmed/elevenlabs-voice

Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install elevenlabs-speech into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding elevenlabs-speech to shared team environments
  • Use elevenlabs-speech for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: elevenlabs-speech
description: Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
---

# ElevenLabs Speech

Complete voice solution — both TTS and STT using one API:
- **TTS**: Text-to-Speech (high-quality voices)
- **STT**: Speech-to-Text via Scribe (accurate transcription)

## Quick Start

### Environment Setup

Set your API key:
```bash
export ELEVENLABS_API_KEY="sk_..."
```

Or create a `.env` file in the workspace root that defines the same `ELEVENLABS_API_KEY` variable.
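
The scripts read the key with `os.getenv` and do not check that it is present, so a missing key only shows up later as a failed request. A minimal sketch of an early check follows; `resolve_api_key` is a hypothetical helper and is not part of the skill's scripts.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the API key, preferring an explicit argument over the environment.

    Hypothetical helper: the skill's own clients do not perform this check.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; export it or add it to your .env file"
        )
    return key
```

The returned value can then be passed as `api_key=` when constructing either client.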

### Text-to-Speech (TTS)

Convert text to natural-sounding speech:

```bash
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
```

With custom voice:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
```

### List Available Voices

```bash
python scripts/elevenlabs_speech.py voices
```

## Using in Code

```python
from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
```
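
Both client methods return a dict with a `success` flag instead of raising, so a failed call is easy to miss. The sketch below shows one way to surface failures; `unwrap` is a hypothetical helper, and the dict shape is taken from the scripts listed under Referenced Files.

```python
def unwrap(result):
    """Return the result dict if the call succeeded, otherwise raise.

    Hypothetical helper; expects the {"success": ..., "error": ...} shape
    returned by the skill's clients.
    """
    if not result.get("success"):
        raise RuntimeError(result.get("error", "unknown ElevenLabs error"))
    return result


# Stand-in result shaped like a successful text_to_speech() return value.
ok = unwrap({"success": True, "file_path": "greeting.mp3", "size_bytes": 1024})
print(ok["file_path"])
```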

## Popular Voices

| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Natural, versatile (default) |
| `AZnzlk1XvdvUeBnXmlld` | Domi | Strong, energetic |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Soft, soothing |
| `ErXwobaYiN019PkySvjV` | Antoni | Well-rounded |
| `MF3mGyEYCl7XYWbV9V6O` | Elli | Warm, friendly |
| `TxGEqnHWrfWFTfGW9XjX` | Josh | Deep, calm |
| `VR6AewLTigWG4xSOukaG` | Arnold | Authoritative |

## Voice Settings

- **stability** (0-1): Lower = more emotional, Higher = more stable
- **similarity_boost** (0-1): Higher = closer to original voice

Default: stability=0.5, similarity_boost=0.75
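
The client passes these values to the API unchanged. If you want to reject out-of-range values before a request is made, something like the following could work; `build_voice_settings` is a hypothetical helper, with defaults copied from the line above.

```python
def build_voice_settings(stability=0.5, similarity_boost=0.75):
    """Validate both settings against the documented 0-1 range.

    Hypothetical helper; returns a dict shaped like the `voice_settings`
    payload built in scripts/elevenlabs_speech.py.
    """
    for name, value in (("stability", stability),
                        ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {"stability": stability, "similarity_boost": similarity_boost}


settings = build_voice_settings(stability=0.3)
# The validated values can be passed straight through, for example:
# client.text_to_speech(text="Hi", **settings)
```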

## Models

- `eleven_turbo_v2_5` - Fast, high quality (default)
- `eleven_multilingual_v2` - Best for non-English
- `eleven_monolingual_v1` - English only
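
`text_to_speech` accepts a `model_id` argument, so the model can be chosen per request. A rough sketch of picking one from the guidance above; `pick_model` and its two-letter language convention are assumptions made for illustration.

```python
def pick_model(language="en"):
    """Choose a model ID from the list above based on the text's language.

    Hypothetical helper: `language` is assumed to be a two-letter code
    such as "en" or "ar".
    """
    if language.strip().lower() == "en":
        return "eleven_turbo_v2_5"  # the skill's default
    return "eleven_multilingual_v2"  # listed as best for non-English


# Example call (needs a configured client):
# client.text_to_speech(text="...", model_id=pick_model("ar"))
```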

## Integration with Telegram

When a user sends text and wants a voice reply:

```python
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")

# Send via the Telegram message tool, passing the path of the generated file
message(action="send", media="reply.mp3", as_voice=True)
```

## Speech-to-Text (STT) with ElevenLabs Scribe

Transcribe voice messages using ElevenLabs Scribe:

### Transcribe Audio

```bash
python scripts/elevenlabs_scribe.py voice_message.ogg
```

With specific language:
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
```

With speaker diarization (multiple speakers):
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
```

### Using in Code

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe

client = ElevenLabsScribe(api_key="sk_...")

# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])

# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")

# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
```

### Supported Formats

- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Works great with Telegram voice messages (`.ogg`)
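
A local pre-flight check can catch unsupported or oversized files before anything is uploaded. The sketch below is one possible approach; `check_audio_file` is hypothetical, the extension set combines the listed formats with the `.ogg` mention, and 100 MB is assumed to mean 100 × 1024 × 1024 bytes.

```python
import os

# Formats from the list above, plus .ogg from the Telegram note.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg"}
MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" read as binary megabytes


def check_audio_file(path):
    """Return None if the file looks acceptable, else a short reason string."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        return f"unsupported extension: .{ext}"
    if not os.path.isfile(path):
        return f"file not found: {path}"
    if os.path.getsize(path) > MAX_BYTES:
        return "file exceeds 100 MB"
    return None
```

Call it before `transcribe()` and skip the upload if it returns a reason.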

### Language Support

Scribe supports 99 languages including:
- Arabic (`ara`)
- English (`eng`)
- Spanish (`spa`)
- French (`fra`)
- And many more...

Without a language hint, Scribe auto-detects the language.
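
The examples in this document use three-letter codes. If your application stores two-letter codes, a small lookup can convert them first; `to_scribe_language` is a hypothetical helper, and its table covers only the four languages listed above.

```python
# Two-letter to three-letter codes for the languages listed above only.
ISO_639_1_TO_3 = {"ar": "ara", "en": "eng", "es": "spa", "fr": "fra"}


def to_scribe_language(code):
    """Return a three-letter code, converting known two-letter codes.

    Hypothetical helper; three-letter input is passed through unchanged.
    """
    code = code.strip().lower()
    if len(code) == 3:
        return code
    if code not in ISO_639_1_TO_3:
        raise ValueError(f"no three-letter mapping known for {code!r}")
    return ISO_639_1_TO_3[code]


print(to_scribe_language("ar"))
```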

## Complete Workflow Example

**User sends voice message → You reply with voice:**

```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient

# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']

# 2. Process/understand the text
# ... your logic here ...

# 3. Generate response text
response_text = "Your response here"

# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")

# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
```
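
The workflow above assumes every step succeeds. Here is a sketch of the same pipeline that stops at the first failure; `voice_reply` and `make_response` are hypothetical names, and the clients are passed in so the function can be exercised with stand-ins.

```python
def voice_reply(stt, tts, audio_path, make_response, output_path="reply.mp3"):
    """Transcribe audio, build a reply, and synthesize it.

    Hypothetical helper. `stt` and `tts` are expected to behave like
    ElevenLabsScribe and ElevenLabsClient, returning dicts with a
    "success" flag. Returns the path of the generated audio file.
    """
    transcription = stt.transcribe(audio_path)
    if not transcription.get("success"):
        raise RuntimeError(f"transcription failed: {transcription.get('error')}")

    # make_response is your own logic: text in, reply text out.
    response_text = make_response(transcription["text"])

    speech = tts.text_to_speech(response_text, output_path=output_path)
    if not speech.get("success"):
        raise RuntimeError(f"speech synthesis failed: {speech.get('error')}")
    return speech["file_path"]
```

The returned path is what you would hand to the message tool in step 5.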

## Pricing

Check https://elevenlabs.io/pricing for current rates:

**TTS (Text-to-Speech):**
- Free tier: 10,000 characters/month
- Paid plans available

**STT (Speech-to-Text) - Scribe:**
- Free tier available
- Check website for current pricing


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/elevenlabs_speech.py

```python
import json  # used by main(); imported here so main() also works when imported
import os

import requests
from dotenv import load_dotenv

# Load environment variables from workspace .env
load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '..', '..', '.env'))

class ElevenLabsClient:
    """Client for ElevenLabs Text-to-Speech API"""
    
    def __init__(self, api_key=None):
        self.api_key = api_key or os.getenv('ELEVENLABS_API_KEY')
        self.base_url = "https://api.elevenlabs.io/v1"
        
    def text_to_speech(self, text, voice_id="21m00Tcm4TlvDq8ikWAM", model_id="eleven_turbo_v2_5", 
                       output_path="output.mp3", stability=0.5, similarity_boost=0.75):
        """
        Convert text to speech using ElevenLabs API
        
        Default voice: Rachel (21m00Tcm4TlvDq8ikWAM) - natural, versatile
        """
        
        url = f"{self.base_url}/text-to-speech/{voice_id}"
        
        headers = {
            "Accept": "audio/mpeg",
            "Content-Type": "application/json",
            "xi-api-key": self.api_key
        }
        
        payload = {
            "text": text,
            "model_id": model_id,
            "voice_settings": {
                "stability": stability,
                "similarity_boost": similarity_boost
            }
        }
        
        try:
            response = requests.post(url, headers=headers, json=payload, timeout=60)
            response.raise_for_status()
            
            # Save audio file
            with open(output_path, 'wb') as f:
                f.write(response.content)
            
            return {
                "success": True,
                "file_path": output_path,
                "size_bytes": len(response.content)
            }
                
        except requests.exceptions.RequestException as e:
            return {"success": False, "error": f"Request failed: {str(e)}"}
        except Exception as e:
            return {"success": False, "error": str(e)}
    
    def get_voices(self):
        """Get list of available voices"""
        
        url = f"{self.base_url}/voices"
        
        headers = {
            "Accept": "application/json",
            "xi-api-key": self.api_key
        }
        
        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            
            data = response.json()
            voices = []
            for voice in data.get('voices', []):
                voices.append({
                    'voice_id': voice['voice_id'],
                    'name': voice['name'],
                    'category': voice.get('category', 'standard'),
                    'preview_url': voice.get('preview_url', '')
                })
            
            return {"success": True, "voices": voices}
                
        except requests.exceptions.RequestException as e:
            return {"success": False, "error": f"Request failed: {str(e)}"}
        except Exception as e:
            return {"success": False, "error": str(e)}


def main():
    """CLI interface for ElevenLabs Speech"""
    import argparse
    
    parser = argparse.ArgumentParser(description='ElevenLabs Speech Client')
    parser.add_argument('action', choices=['tts', 'voices'], help='Action: tts or voices')
    parser.add_argument('--text', '-t', help='Text to convert to speech')
    parser.add_argument('--output', '-o', default='output.mp3', help='Output file path')
    parser.add_argument('--voice', '-v', default='21m00Tcm4TlvDq8ikWAM', help='Voice ID')
    
    args = parser.parse_args()
    
    client = ElevenLabsClient()
    
    if args.action == 'tts':
        if not args.text:
            print("Error: --text required for TTS")
            return
        result = client.text_to_speech(args.text, voice_id=args.voice, output_path=args.output)
        print(json.dumps(result, indent=2))
    
    elif args.action == 'voices':
        result = client.get_voices()
        print(json.dumps(result, indent=2))


if __name__ == "__main__":
    import json
    main()

```

### scripts/elevenlabs_scribe.py

```python
import requests
import os
from dotenv import load_dotenv

load_dotenv()

class ElevenLabsScribe:
    """Speech-to-Text using ElevenLabs Scribe"""
    
    def __init__(self, api_key=None):
        self.api_key = api_key or os.getenv('ELEVENLABS_API_KEY')
        self.base_url = "https://api.elevenlabs.io/v1"
    
    def transcribe(self, audio_file_path, language_code=None, tag_audio_events=True, 
                   num_speakers=None, timestamps_granularity="word"):
        """
        Transcribe audio file to text using ElevenLabs Scribe
        
        Supports: mp3, mp4, mpeg, mpga, m4a, wav, webm
        Max file size: 100 MB
        """
        url = f"{self.base_url}/speech-to-text"
        
        headers = {
            "xi-api-key": self.api_key
        }
        
        data = {
            "model_id": "scribe_v1",
            "tag_audio_events": str(tag_audio_events).lower(),
            "timestamps_granularity": timestamps_granularity
        }
        
        if language_code:
            data["language_code"] = language_code
        if num_speakers:
            data["num_speakers"] = num_speakers
        
        try:
            with open(audio_file_path, 'rb') as audio_file:
                files = {
                    'file': (os.path.basename(audio_file_path), audio_file)
                }
                
                response = requests.post(url, headers=headers, data=data, files=files, timeout=120)
                response.raise_for_status()
                
                result = response.json()
                return {
                    "success": True,
                    "text": result.get('text', ''),
                    "language": result.get('language_code', 'unknown'),
                    "words": result.get('words', []),
                    "speakers": result.get('speakers', [])
                }
                
        except requests.exceptions.RequestException as e:
            return {"success": False, "error": f"Request failed: {str(e)}"}
        except Exception as e:
            return {"success": False, "error": str(e)}


def main():
    import argparse
    
    parser = argparse.ArgumentParser(description='ElevenLabs Scribe STT')
    parser.add_argument('audio_file', help='Path to audio file')
    parser.add_argument('--language', '-l', help='Language code (e.g., ara, eng)')
    parser.add_argument('--speakers', '-s', type=int, help='Number of speakers')
    
    args = parser.parse_args()
    
    client = ElevenLabsScribe()
    result = client.transcribe(args.audio_file, language_code=args.language, num_speakers=args.speakers)
    print(result)


if __name__ == "__main__":
    import json
    main()

```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "amreahmed",
  "slug": "elevenlabs-voice",
  "displayName": "it will help you to send voice messages to your AI Assistant and also can make it talk",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1770060775743,
    "commit": "https://github.com/clawdbot/skills/commit/b4a2108ea7178e8ae1139f2bfc5c0c71c013a3ef"
  },
  "history": []
}

```
