speakturbo-tts
Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) with ~90ms latency and 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-speakturbo-tts
Repository
Skill path: skills/emzod/speakturbo-tts
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install speakturbo-tts into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding speakturbo-tts to shared team environments
- Use speakturbo-tts for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: speakturbo-tts
description: Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
---
# speakturbo - Talk to your Claude!
Give your agent the ability to speak to you in real time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.
## Quick Start
```bash
# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms
# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav # Should show ~50-100KB file
```
**Output explained:** `⚡` = first audio received, `▶` = playback started, `✓` = done
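To use these timings in a script, the status line can be split into its numeric fields. The sketch below is illustrative only: `parse_timings` is a hypothetical helper, it works on a string shaped like the sample output above, and the source does not say whether the CLI prints this line to stdout or stderr.

```bash
# Extract the millisecond values from a speakturbo-style status line.
# Hypothetical helper; assumes whitespace-separated fields such as "92ms".
parse_timings() {
  printf '%s\n' "$1" | awk '{
    out = ""
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9]+ms$/) {          # keep only fields like "92ms"
        sub(/ms$/, "", $i)              # drop the unit
        out = out (out == "" ? "" : " ") $i
      }
    print out
  }'
}

line='⚡ 92ms → ▶ 93ms → ✓ 1245ms'
parse_timings "$line"   # 92 93 1245
```

If the final field is not numeric (the README shows `✓ done`), only the first two numbers are returned.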
## First Run
The **first execution takes 2-5 seconds** while the daemon starts and loads the model into memory. Subsequent calls reach first sound in ~90ms.
```bash
# First run (slow - daemon starting)
speakturbo "Starting up" # ~2-5 seconds
# Second run (fast - daemon already running)
speakturbo "Now I'm fast" # ~90ms
```
## Usage
```bash
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"
# Save to file (no audio playback)
speakturbo "Hello" -o output.wav
# Save to a specific file
speakturbo "Goodbye" -o goodbye.wav
# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q
# List available voices
speakturbo --list-voices
```
## Available Voices
| Voice | Type |
|-------|------|
| `alba` | Female (default) |
| `marius` | Male |
| `javert` | Male |
| `jean` | Male |
| `fantine` | Female |
| `cosette` | Female |
| `eponine` | Female |
| `azelma` | Female |
## Performance
| Metric | Value |
|--------|-------|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
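
The figures above allow some quick estimates. The sketch below assumes 16-bit PCM samples and a 44-byte WAV header, neither of which is stated in the source, and treats the real-time factor as exactly 4. `wav_bytes` and `synth_ms` are hypothetical helper names.

```bash
# Estimated WAV size in bytes for a clip of $1 milliseconds.
# Assumption: 24000 samples/s x 2 bytes/sample (16-bit PCM) x 1 channel, plus a 44-byte header.
wav_bytes() {
  echo $(( 24000 * 2 * 1 * $1 / 1000 + 44 ))
}

# Estimated synthesis time in ms for $1 milliseconds of audio at ~4x real time.
synth_ms() {
  echo $(( $1 / 4 ))
}

wav_bytes 1000    # 48044
wav_bytes 2000    # 96044
synth_ms 10000    # 2500
```

Under these assumptions, one to two seconds of speech comes out at roughly 47 to 94 KB, in line with the ~50-100KB file size mentioned in the Quick Start.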
## Architecture
```
speakturbo (Rust CLI, 2.2MB)
│
│ HTTP streaming (port 7125)
▼
speakturbo-daemon (Python + pocket-tts)
│
│ Model in memory, auto-shutdown after 1hr idle
▼
Audio playback (rodio)
```
## Text Input
- **Encoding:** UTF-8
- **Quotes in text:** Use escaping: `speakturbo "She said \"hello\""`
- **Long text:** Supported, streams as it generates
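
As an alternative to backslash escaping, standard shell quoting keeps the text intact. This is a sketch of general shell behavior rather than anything specific to speakturbo; `count_args` is a hypothetical helper that only shows how many arguments the shell would pass.

```bash
# Single quotes keep inner double quotes literal, so no backslashes are needed.
text='She said "hello"'

# An apostrophe inside single quotes has to be closed, escaped, and reopened.
text2='It'\''s ready'

# Count arguments to confirm that a quoted variable stays one argument.
count_args() { echo "$#"; }
count_args "$text"    # 1 (quoted: the whole sentence is one argument)
count_args $text      # 3 (unquoted: split on spaces)

# speakturbo "$text"  # uncomment once speakturbo is installed
```

Passing the variable in double quotes is what keeps the sentence together as a single argument.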
## Output Path Security
The `-o` flag only writes to directories that are on the allowlist. By default, these are:
- `/tmp` and system temp directories
- Your current working directory
- `~/.speakturbo/`
If you need to write elsewhere, use `--allow-dir`:
```bash
speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path
```
To permanently allow a directory, add it to `~/.speakturbo/config`:
```bash
mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config
```
The config file is one directory per line. Lines starting with `#` are comments.
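
The config format described above can be read with a few lines of shell. The sketch below is not speakturbo's own check: `is_listed` is a hypothetical helper that only looks at entries in the config file (not the built-in defaults) and does a plain prefix match without resolving `..` or symlinks.

```bash
# Return 0 if path $1 sits under a directory listed in config file $2.
# Hypothetical helper; skips blank lines and lines starting with "#".
is_listed() (
  while IFS= read -r dir || [ -n "$dir" ]; do
    case "$dir" in
      '#'*|'') continue ;;            # comment or empty line
    esac
    case "$1" in
      "${dir%/}"/*) exit 0 ;;         # path is inside this directory
    esac
  done < "$2"
  exit 1
)

# Demo with a throwaway config file.
cfg=$(mktemp)
printf '%s\n' '# extra output directories' '/custom/path' > "$cfg"
is_listed /custom/path/audio.wav "$cfg" && echo "listed"
is_listed /etc/audio.wav "$cfg" || echo "not listed"
rm -f "$cfg"
```

A check like this can be handy before a batch job to decide whether `--allow-dir` is needed; the CLI's own validation remains authoritative.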
## Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
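
Because failures are reported through the exit code, a calling script can fall back to plain text when speech is unavailable. The sketch below is one way to do that; `say_or_print` and its optional second argument (a stand-in command, useful for trying the function without speakturbo installed) are hypothetical.

```bash
# Speak $1 quietly; if the command exits non-zero, print the text to stderr instead.
# $2 is a hypothetical override for the command to run (defaults to speakturbo).
say_or_print() {
  if "${2:-speakturbo}" "$1" -q; then
    return 0
  fi
  printf 'TTS failed; text was: %s\n' "$1" >&2
  return 1
}

# Demo with stand-in commands: "true" always exits 0, "false" always exits 1.
say_or_print "Build finished" true  && echo "spoken"
say_or_print "Build finished" false || echo "fell back to text"
```

In real use the second argument is omitted, so the function runs `speakturbo "<text>" -q` as shown in the Usage section.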
## When to Use
**Use speakturbo when:**
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient
**Use `speak` instead when:**
- You need custom voice cloning (Morgan Freeman, etc.): `speak "text" --voice ~/.chatter/voices/morgan_freeman.wav`
- You need emotion tags like `[laugh]`, `[sigh]`
- Quality/variety matters more than speed
See the `speak` skill documentation for full usage.
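
The criteria above can be turned into a small routing helper. This is a sketch under simple assumptions: `choose_tts` is a hypothetical function, it treats any bracketed lowercase word such as `[laugh]` as an emotion tag, and it only prints the name of the tool to use rather than running it.

```bash
# Print which tool fits the request: "speak" or "speakturbo".
# $1 = text, $2 = optional path to a custom voice sample (.wav).
choose_tts() {
  case "$1" in
    *'['[a-z]*']'*) echo speak; return 0 ;;   # text contains something like [laugh]
  esac
  if [ -n "${2:-}" ]; then
    echo speak                                # custom voice requested
    return 0
  fi
  echo speakturbo                             # default: fastest path
}

choose_tts "Hello world"                                          # speakturbo
choose_tts "Nice one [laugh]"                                     # speak
choose_tts "Hello" ~/.chatter/voices/morgan_freeman.wav           # speak
```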
## Troubleshooting
**No audio plays:**
```bash
# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}
# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav # macOS
aplay /tmp/test.wav # Linux
```
**Daemon won't start:**
```bash
# Check port availability
lsof -i :7125
# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test" # Auto-restarts daemon
```
**First run is slow:**
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).
## Daemon Management
The daemon auto-starts on first use and **auto-shuts down after 1 hour idle**.
```bash
# Check status
curl http://127.0.0.1:7125/health
# Manual stop
pkill -f "daemon_streaming"
# View logs
cat /tmp/speakturbo.log
```
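
For scripts that need to know when the daemon has finished loading, the health endpoint can be polled. The sketch below assumes the response looks like the sample in the Troubleshooting section (`"status":"ready"`); `health_is_ready` and `wait_for_daemon` are hypothetical helpers, and since the daemon only auto-starts on first use, polling is useful only after a `speakturbo` call has triggered startup.

```bash
# Return 0 if a /health response body reports status "ready".
health_is_ready() {
  printf '%s\n' "$1" | grep -q '"status"[[:space:]]*:[[:space:]]*"ready"'
}

# Poll the health endpoint once per second, up to $1 attempts (default 10).
wait_for_daemon() (
  tries=${1:-10}
  while [ "$tries" -gt 0 ]; do
    if body=$(curl -s --max-time 2 http://127.0.0.1:7125/health) && health_is_ready "$body"; then
      exit 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  exit 1
)

# Check against the sample response from the Troubleshooting section.
health_is_ready '{"status":"ready","voices":["alba","marius"]}' && echo "ready"

# Typical use once a first call has started the daemon:
# wait_for_daemon 10 && speakturbo "Now I'm fast"
```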
## Comparison with speak
| Feature | speakturbo | speak |
|---------|------------|-------|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### README.md
```markdown
```
███████╗██████╗ ███████╗ █████╗ ██╗ ██╗ ████████╗██╗ ██╗██████╗ ██████╗ ██████╗
██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝ ╚══██╔══╝██║ ██║██╔══██╗██╔══██╗██╔═══██╗
███████╗██████╔╝█████╗ ███████║█████╔╝ ██║ ██║ ██║██████╔╝██████╔╝██║ ██║
╚════██║██╔═══╝ ██╔══╝ ██╔══██║██╔═██╗ ██║ ██║ ██║██╔══██╗██╔══██╗██║ ██║
███████║██║ ███████╗██║ ██║██║ ██╗ ██║ ╚██████╔╝██║ ██║██████╔╝╚██████╔╝
╚══════╝╚═╝ ╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝╚═════╝ ╚═════╝
```
<h3 align="center">Talk to your Claude.</h3>
<p align="center">
<a href="https://speakturbo-site.vercel.app"><img src="https://img.shields.io/badge/website-speakturbo-f97316.svg" alt="Website"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
<img src="https://img.shields.io/badge/latency-~90ms-brightgreen.svg" alt="Latency">
<img src="https://img.shields.io/badge/platform-Apple%20Silicon-orange.svg" alt="Platform">
</p>
<p align="center">
<strong>~90ms to first sound. Realistic. Local. Private. Fast.</strong>
</p>
<p align="center">
<code>speakturbo "Hello world"</code> → <code>⚡ 92ms → ▶ 93ms → ✓ done</code>
</p>
---
## Install
**For AI Agents** (Claude Code, Cursor, Windsurf):
```bash
npx skills add EmZod/Speak-Turbo
```
**CLI only:**
```bash
pip install pocket-tts uvicorn fastapi
cd speakturbo-cli && cargo build --release
```
---
## Usage
```bash
speakturbo "Hello world" # Play instantly
speakturbo "Hello" -o out.wav # Save to file
speakturbo "Hello" -q # Quiet mode
speakturbo --list-voices # Show voices
```
---
## Voices
```
alba ██████████ Female (default)
marius ██████████ Male
javert ██████████ Male
jean ██████████ Male
fantine ██████████ Female
cosette ██████████ Female
eponine ██████████ Female
azelma ██████████ Female
```
---
## Performance
```
Time to first sound ░░░░░░░░░░░░░░░░░░░░ ~90ms
First run (cold) ████░░░░░░░░░░░░░░░░ 2-5s
Real-time factor ████████████████░░░░ 4x faster
```
---
## Architecture
```
┌─────────────────┐
│ speakturbo │
│ (Rust, 2.2MB) │
└────────┬────────┘
│ HTTP :7125
▼
┌─────────────────┐
│ daemon │
│ (Python + MLX) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Audio Output │
│ (rodio) │
└─────────────────┘
```
---
## Troubleshooting
| Problem | Fix |
|---------|-----|
| No audio | `curl http://127.0.0.1:7125/health` |
| Daemon stuck | `pkill -f "daemon_streaming"` |
| Slow first run | Normal - model loading (2-5s) |
---
## See Also
Need voice cloning? Emotion tags? Try [**speak**](https://github.com/EmZod/speak).
---
<p align="center">
<sub>MIT License · Built on <a href="https://github.com/kyutai-labs/pocket-tts">Pocket TTS</a></sub>
</p>
```
### _meta.json
```json
{
"owner": "emzod",
"slug": "speakturbo-tts",
"displayName": "Speak Turbo - Talk to your Claude 90ms latency!",
"latest": {
"version": "1.0.7",
"publishedAt": 1771673322379,
"commit": "https://github.com/openclaw/skills/commit/fc4e5bcc1ac672f26307078830ea209d86433e62"
},
"history": []
}
```