elevenlabs-speech
Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-elevenlabs-voice
Repository
Skill path: skills/amreahmed/elevenlabs-voice
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install elevenlabs-speech into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding elevenlabs-speech to shared team environments
- Use elevenlabs-speech for development workflows
Original source / Raw SKILL.md
---
name: elevenlabs-speech
description: Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.
---
# ElevenLabs Speech
Complete voice solution — both TTS and STT using one API:
- **TTS**: Text-to-Speech (high-quality voices)
- **STT**: Speech-to-Text via Scribe (accurate transcription)
## Quick Start
### Environment Setup
Set your API key:
```bash
export ELEVENLABS_API_KEY="sk_..."
```
Or create a `.env` file in the workspace root that defines the same `ELEVENLABS_API_KEY` variable.
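Both bundled clients fall back to the `ELEVENLABS_API_KEY` environment variable when no key is passed in. The sketch below shows one way to make that lookup fail early with a readable message. The helper name `resolve_api_key` is hypothetical and not part of the skill's scripts.

```python
import os


def resolve_api_key(explicit=None, env=None):
    """Return an explicit key if given, else ELEVENLABS_API_KEY from the environment.

    `resolve_api_key` is an illustrative helper, not part of the bundled scripts.
    """
    env = os.environ if env is None else env
    key = explicit or env.get("ELEVENLABS_API_KEY")
    if not key:
        # Fail before any request is sent with an empty xi-api-key header.
        raise RuntimeError("ELEVENLABS_API_KEY is not set; export it or add it to .env")
    return key
```

The returned value can be passed as `api_key=` to either client.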
### Text-to-Speech (TTS)
Convert text to natural-sounding speech:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3
```
With custom voice:
```bash
python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3
```
### List Available Voices
```bash
python scripts/elevenlabs_speech.py voices
```
## Using in Code
```python
from scripts.elevenlabs_speech import ElevenLabsClient

client = ElevenLabsClient(api_key="sk_...")

# Basic TTS
result = client.text_to_speech(
    text="Hello from zerox",
    output_path="greeting.mp3"
)

# With custom settings
result = client.text_to_speech(
    text="Your text here",
    voice_id="21m00Tcm4TlvDq8ikWAM",  # Rachel
    stability=0.5,
    similarity_boost=0.75,
    output_path="output.mp3"
)

# Get available voices
voices = client.get_voices()
for voice in voices['voices']:
    print(f"{voice['name']}: {voice['voice_id']}")
```
## Popular Voices
| Voice ID | Name | Description |
|----------|------|-------------|
| `21m00Tcm4TlvDq8ikWAM` | Rachel | Natural, versatile (default) |
| `AZnzlk1XvdvUeBnXmlld` | Domi | Strong, energetic |
| `EXAVITQu4vr4xnSDxMaL` | Bella | Soft, soothing |
| `ErXwobaYiN019PkySvjV` | Antoni | Well-rounded |
| `MF3mGyEYCl7XYWbV9V6O` | Elli | Warm, friendly |
| `TxGEqnHWrfWFTfGW9XjX` | Josh | Deep, calm |
| `VR6AewLTigWG4xSOukaG` | Arnold | Authoritative |
## Voice Settings
- **stability** (0-1): Lower = more emotional, Higher = more stable
- **similarity_boost** (0-1): Higher = closer to original voice
Default: stability=0.5, similarity_boost=0.75
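Since both settings are documented as values between 0 and 1, it can help to validate them before building a request. This is a minimal sketch. `build_voice_settings` is a hypothetical helper, and the returned dict mirrors the `voice_settings` payload used in `scripts/elevenlabs_speech.py`.

```python
def build_voice_settings(stability=0.5, similarity_boost=0.75):
    """Validate both settings and return them as a voice_settings dict.

    Illustrative helper; the defaults match the ones listed above.
    """
    for name, value in (("stability", stability), ("similarity_boost", similarity_boost)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return {"stability": stability, "similarity_boost": similarity_boost}


# Lower stability for a more emotional read, higher for a steadier one.
expressive = build_voice_settings(stability=0.3)
steady = build_voice_settings(stability=0.8, similarity_boost=0.9)
```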
## Models
- `eleven_turbo_v2_5` - Fast, high quality (default)
- `eleven_multilingual_v2` - Best for non-English
- `eleven_monolingual_v1` - English only
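One simple way to apply the list above is to keep the default model for English and switch to the multilingual model otherwise. The sketch below is only a heuristic based on that list. `pick_model` is a hypothetical helper, not something the skill ships.

```python
DEFAULT_MODEL = "eleven_turbo_v2_5"
MULTILINGUAL_MODEL = "eleven_multilingual_v2"


def pick_model(language_code=None):
    """Pick a model ID from a language code (illustrative heuristic only)."""
    if language_code is None or language_code.lower() in ("en", "eng"):
        return DEFAULT_MODEL
    return MULTILINGUAL_MODEL
```

The result can be passed as `model_id=` to `text_to_speech`, which accepts that parameter in the bundled script.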
## Integration with Telegram
When a user sends text and wants a voice reply:
```python
# Generate speech
result = client.text_to_speech(text=user_text, output_path="reply.mp3")
# Send via Telegram message tool with media path
message(action="send", media="reply.mp3", as_voice=True)
```
## Speech-to-Text (STT) with ElevenLabs Scribe
Transcribe voice messages using ElevenLabs Scribe:
### Transcribe Audio
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg
```
With specific language:
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --language ara
```
With speaker diarization (multiple speakers):
```bash
python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2
```
### Using in Code
```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
client = ElevenLabsScribe(api_key="sk_...")
# Basic transcription
result = client.transcribe("voice_message.ogg")
print(result['text'])
# With language hint (improves accuracy)
result = client.transcribe("voice_message.ogg", language_code="ara")
# With speaker detection
result = client.transcribe("voice_message.ogg", num_speakers=2)
```
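When speaker detection is on, the `words` list in the result can be folded into per-speaker turns. The sketch below assumes each entry is a dict with `text` and `speaker_id` keys. That shape is an assumption about the Scribe response, not something the bundled script guarantees, so check a real response first. `group_by_speaker` is a hypothetical helper.

```python
def group_by_speaker(words):
    """Merge consecutive words from the same speaker into (speaker, text) turns.

    Assumes entries look like {"text": ..., "speaker_id": ...}; adjust the
    keys if the actual response differs.
    """
    turns = []
    for word in words:
        text = word.get("text", "").strip()
        if not text:
            continue  # skip whitespace-only entries
        speaker = word.get("speaker_id", "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [(speaker, " ".join(parts)) for speaker, parts in turns]


# Hand-written sample data, not real API output.
sample = [
    {"text": "Hi", "speaker_id": "speaker_0"},
    {"text": " ", "speaker_id": "speaker_0"},
    {"text": "there", "speaker_id": "speaker_0"},
    {"text": "Hello", "speaker_id": "speaker_1"},
]
turns = group_by_speaker(sample)
```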
### Supported Formats
- mp3, mp4, mpeg, mpga, m4a, wav, webm
- Max file size: 100 MB
- Works great with Telegram voice messages (`.ogg`)
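A quick local check against these limits can save a failed upload. This is a minimal sketch. The names `is_uploadable` and `check_audio_file` are hypothetical. `ogg` is included only because of the Telegram note above, and the size cap reads "100 MB" conservatively as 100,000,000 bytes.

```python
import os

# Extensions from the list above; "ogg" is added based on the Telegram note.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg"}
MAX_BYTES = 100 * 1000 * 1000  # conservative reading of "100 MB"


def is_uploadable(filename, size_bytes):
    """Return True if the extension is listed and the size is within the cap."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return ext in SUPPORTED_EXTENSIONS and 0 < size_bytes <= MAX_BYTES


def check_audio_file(path):
    """Apply the same check to a file on disk."""
    return is_uploadable(path, os.path.getsize(path))
```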
### Language Support
Scribe supports 99 languages including:
- Arabic (`ara`)
- English (`eng`)
- Spanish (`spa`)
- French (`fra`)
- And many more...
Without a language hint, Scribe auto-detects the language.
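The examples above use three-letter codes. If an application stores two-letter codes or locale tags such as `en-US`, a small lookup can translate them before calling `transcribe`. The sketch covers only the four languages listed above. `to_three_letter` is a hypothetical helper.

```python
# ISO 639-1 to ISO 639-3 for the languages listed above only.
ISO_639_1_TO_3 = {"ar": "ara", "en": "eng", "es": "spa", "fr": "fra"}


def to_three_letter(code):
    """Map a language code or locale tag to a three-letter code, or None."""
    base = code.strip().lower().replace("_", "-").split("-")[0]
    if len(base) == 3:
        return base  # already a three-letter code
    return ISO_639_1_TO_3.get(base)  # None means: let Scribe auto-detect
```

Because the bundled `transcribe` method only sends `language_code` when it is truthy, passing `None` falls back to auto-detection.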
## Complete Workflow Example
**User sends voice message → You reply with voice:**
```python
from scripts.elevenlabs_scribe import ElevenLabsScribe
from scripts.elevenlabs_speech import ElevenLabsClient
# 1. Transcribe user's voice message
stt = ElevenLabsScribe()
transcription = stt.transcribe("user_voice.ogg")
user_text = transcription['text']
# 2. Process/understand the text
# ... your logic here ...
# 3. Generate response text
response_text = "Your response here"
# 4. Convert to speech
tts = ElevenLabsClient()
tts.text_to_speech(response_text, output_path="reply.mp3")
# 5. Send voice reply
message(action="send", media="reply.mp3", as_voice=True)
```
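Both bundled clients return a dict with a `success` flag and, on failure, an `error` message instead of raising. In the workflow above, `transcription['text']` would therefore raise a `KeyError` if transcription failed. A small guard makes the failure explicit. `SpeechError` and `require_success` are hypothetical names used for illustration.

```python
class SpeechError(RuntimeError):
    """Raised when a TTS or STT call reports success=False (illustrative)."""


def require_success(result, step):
    """Return the result unchanged, or raise with the reported error."""
    if not result.get("success"):
        raise SpeechError(f"{step} failed: {result.get('error', 'unknown error')}")
    return result


# Hand-written result dicts in the shape the bundled scripts return.
ok = require_success({"success": True, "text": "hello"}, "transcription")
try:
    require_success({"success": False, "error": "Request failed: 401"}, "tts")
except SpeechError as exc:
    message_text = str(exc)
```

In the workflow, wrapping each call (for example `require_success(stt.transcribe("user_voice.ogg"), "transcription")`) stops the pipeline before a missing file is sent as a reply.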
## Pricing
Check https://elevenlabs.io/pricing for current rates:
**TTS (Text-to-Speech):**
- Free tier: 10,000 characters/month
- Paid plans available
**STT (Speech-to-Text) - Scribe:**
- Free tier available
- Check website for current pricing
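To get a rough sense of how far the free TTS tier stretches, the character count of planned texts can be compared with the 10,000-character figure above. This sketch assumes usage is counted as plain `len()` of each string, which may not match how ElevenLabs meters requests. `remaining_characters` is a hypothetical helper.

```python
FREE_TIER_CHARACTERS = 10_000  # monthly figure quoted above; verify on the pricing page


def remaining_characters(texts, quota=FREE_TIER_CHARACTERS):
    """Return the quota left after synthesizing all texts (negative if over)."""
    used = sum(len(text) for text in texts)
    return quota - used


left = remaining_characters(["Hello world", "Hello"])
```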
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/elevenlabs_speech.py
```python
import json
import os

import requests
from dotenv import load_dotenv

# Load environment variables from workspace .env
load_dotenv(dotenv_path=os.path.join(os.path.dirname(__file__), '..', '..', '..', '.env'))


class ElevenLabsClient:
    """Client for ElevenLabs Text-to-Speech API"""

    def __init__(self, api_key=None):
        self.api_key = api_key or os.getenv('ELEVENLABS_API_KEY')
        self.base_url = "https://api.elevenlabs.io/v1"

    def text_to_speech(self, text, voice_id="21m00Tcm4TlvDq8ikWAM", model_id="eleven_turbo_v2_5",
                       output_path="output.mp3", stability=0.5, similarity_boost=0.75):
        """
        Convert text to speech using ElevenLabs API
        Default voice: Rachel (21m00Tcm4TlvDq8ikWAM) - natural, versatile
        """
        url = f"{self.base_url}/text-to-speech/{voice_id}"
        headers = {
            "Accept": "audio/mpeg",
            "Content-Type": "application/json",
            "xi-api-key": self.api_key
        }
        payload = {
            "text": text,
            "model_id": model_id,
            "voice_settings": {
                "stability": stability,
                "similarity_boost": similarity_boost
            }
        }
        try:
            response = requests.post(url, headers=headers, json=payload, timeout=60)
            response.raise_for_status()
            # Save audio file
            with open(output_path, 'wb') as f:
                f.write(response.content)
            return {
                "success": True,
                "file_path": output_path,
                "size_bytes": len(response.content)
            }
        except requests.exceptions.RequestException as e:
            return {"success": False, "error": f"Request failed: {str(e)}"}
        except Exception as e:
            return {"success": False, "error": str(e)}

    def get_voices(self):
        """Get list of available voices"""
        url = f"{self.base_url}/voices"
        headers = {
            "Accept": "application/json",
            "xi-api-key": self.api_key
        }
        try:
            response = requests.get(url, headers=headers, timeout=30)
            response.raise_for_status()
            data = response.json()
            voices = []
            for voice in data.get('voices', []):
                voices.append({
                    'voice_id': voice['voice_id'],
                    'name': voice['name'],
                    'category': voice.get('category', 'standard'),
                    'preview_url': voice.get('preview_url', '')
                })
            return {"success": True, "voices": voices}
        except requests.exceptions.RequestException as e:
            return {"success": False, "error": f"Request failed: {str(e)}"}
        except Exception as e:
            return {"success": False, "error": str(e)}


def main():
    """CLI interface for ElevenLabs Speech"""
    import argparse
    parser = argparse.ArgumentParser(description='ElevenLabs Speech Client')
    parser.add_argument('action', choices=['tts', 'voices'], help='Action: tts or voices')
    parser.add_argument('--text', '-t', help='Text to convert to speech')
    parser.add_argument('--output', '-o', default='output.mp3', help='Output file path')
    parser.add_argument('--voice', '-v', default='21m00Tcm4TlvDq8ikWAM', help='Voice ID')
    args = parser.parse_args()
    client = ElevenLabsClient()
    if args.action == 'tts':
        if not args.text:
            print("Error: --text required for TTS")
            return
        result = client.text_to_speech(args.text, voice_id=args.voice, output_path=args.output)
        print(json.dumps(result, indent=2))
    elif args.action == 'voices':
        result = client.get_voices()
        print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
```
### scripts/elevenlabs_scribe.py
```python
import json
import os

import requests
from dotenv import load_dotenv

# Load environment variables from a .env file in the current working directory
load_dotenv()


class ElevenLabsScribe:
    """Speech-to-Text using ElevenLabs Scribe"""

    def __init__(self, api_key=None):
        self.api_key = api_key or os.getenv('ELEVENLABS_API_KEY')
        self.base_url = "https://api.elevenlabs.io/v1"

    def transcribe(self, audio_file_path, language_code=None, tag_audio_events=True,
                   num_speakers=None, timestamps_granularity="word"):
        """
        Transcribe audio file to text using ElevenLabs Scribe
        Supports: mp3, mp4, mpeg, mpga, m4a, wav, webm
        Max file size: 100 MB
        """
        url = f"{self.base_url}/speech-to-text"
        headers = {
            "xi-api-key": self.api_key
        }
        data = {
            "model_id": "scribe_v1",
            "tag_audio_events": str(tag_audio_events).lower(),
            "timestamps_granularity": timestamps_granularity
        }
        # Optional hints are only sent when provided
        if language_code:
            data["language_code"] = language_code
        if num_speakers:
            data["num_speakers"] = num_speakers
        try:
            # Keep the file open for the duration of the upload
            with open(audio_file_path, 'rb') as audio_file:
                files = {
                    'file': (os.path.basename(audio_file_path), audio_file)
                }
                response = requests.post(url, headers=headers, data=data, files=files, timeout=120)
            response.raise_for_status()
            result = response.json()
            return {
                "success": True,
                "text": result.get('text', ''),
                "language": result.get('language_code', 'unknown'),
                "words": result.get('words', []),
                "speakers": result.get('speakers', [])
            }
        except requests.exceptions.RequestException as e:
            return {"success": False, "error": f"Request failed: {str(e)}"}
        except Exception as e:
            return {"success": False, "error": str(e)}


def main():
    import argparse
    parser = argparse.ArgumentParser(description='ElevenLabs Scribe STT')
    parser.add_argument('audio_file', help='Path to audio file')
    parser.add_argument('--language', '-l', help='Language code (e.g., ara, eng)')
    parser.add_argument('--speakers', '-s', type=int, help='Number of speakers')
    args = parser.parse_args()
    client = ElevenLabsScribe()
    result = client.transcribe(args.audio_file, language_code=args.language, num_speakers=args.speakers)
    # Print as JSON, keeping non-ASCII text (e.g., Arabic) readable
    print(json.dumps(result, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
"owner": "amreahmed",
"slug": "elevenlabs-voice",
"displayName": "it will help you to send voice messages to your AI Assistant and also can make it talk",
"latest": {
"version": "1.0.0",
"publishedAt": 1770060775743,
"commit": "https://github.com/clawdbot/skills/commit/b4a2108ea7178e8ae1139f2bfc5c0c71c013a3ef"
},
"history": []
}
```