
speakturbo-tts

Give your agent the ability to speak to you in real time. Talk to your Claude! Ultra-fast text-to-speech (TTS) with ~90ms latency and 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,077.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: C (56.0).

Install command

npx @skill-hub/cli install openclaw-skills-speakturbo-tts

Repository

openclaw/skills

Skill path: skills/emzod/speakturbo-tts


Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install speakturbo-tts into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding speakturbo-tts to shared team environments
  • Use speakturbo-tts for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: speakturbo-tts
description: Give your agent the ability to speak to you real-time. Talk to your Claude! Ultra-fast TTS, text-to-speech, voice synthesis, audio output with ~90ms latency. 8 built-in voices for instant voice responses. For voice cloning, use the speak skill.
---

# speakturbo - Talk to your Claude!

Give your agent the ability to speak to you in real time. Ultra-fast text-to-speech with ~90ms latency and 8 built-in voices.

## Quick Start

```bash
# Play immediately - you should hear "Hello world" through your speakers
speakturbo "Hello world"
# Output: ⚡ 92ms → ▶ 93ms → ✓ 1245ms

# Verify it's working by saving to file
speakturbo "Hello world" -o test.wav
ls -lh test.wav  # Should show ~50-100KB file
```

**Output explained:** `⚡` = first audio received, `▶` = playback started, `✓` = done
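
If an agent wants to log or compare these timings, the status line can be split into numbers. The following is a minimal sketch, assuming the line has already been captured as a string in the format shown above; `parse_status` is a hypothetical helper, not part of speakturbo, and the source does not say whether the status line is written to stdout or stderr.

```bash
# Extract the timings (in ms) from a speakturbo status line.
# Assumes the line looks like: ⚡ 92ms → ▶ 93ms → ✓ 1245ms
parse_status() {
  # Pull out every "<digits>ms" token, strip the unit, join with spaces.
  printf '%s\n' "$1" | grep -oE '[0-9]+ms' | tr -d 'ms' | paste -s -d ' ' -
}

# Split the result into positional parameters: $1 $2 $3.
set -- $(parse_status "⚡ 92ms → ▶ 93ms → ✓ 1245ms")
echo "first audio: ${1}ms, playback: ${2}ms, done: ${3}ms"
```

A status line that ends in a word rather than a number would yield only two values, so callers should check how many fields came back.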

## First Run

The **first execution takes 2-5 seconds** while the daemon starts and loads the model into memory. Subsequent calls reach first sound in ~90ms.

```bash
# First run (slow - daemon starting)
speakturbo "Starting up"  # ~2-5 seconds

# Second run (fast - daemon already running)
speakturbo "Now I'm fast"  # ~90ms
```
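
To keep the cold start out of a conversation, an agent could trigger the daemon early and poll until it answers. This is a rough sketch under assumptions: `wait_until` and `daemon_ready` are hypothetical helpers, and the health URL is the one listed under Troubleshooting. The final invocation is left commented out because it needs speakturbo installed.

```bash
# Retry a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command...>
wait_until() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Probe the daemon's health endpoint; -f makes curl fail on HTTP errors.
daemon_ready() {
  curl -fsS http://127.0.0.1:7125/health
}

# Example: once an earlier speakturbo call has triggered startup,
# wait up to ~10 s for the daemon to answer before speaking.
# wait_until 20 0.5 daemon_ready && speakturbo "Now I'm fast"
```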

## Usage

```bash
# Basic - plays immediately (default voice: alba)
speakturbo "Hello world"

# Save to file (no audio playback)
speakturbo "Hello" -o output.wav

# Save to specific file
speakturbo "Goodbye" -o goodbye.wav

# Quiet mode (suppress status messages, still plays audio)
speakturbo "Hello" -q

# List available voices
speakturbo --list-voices
```

## Available Voices

| Voice | Type |
|-------|------|
| `alba` | Female (default) |
| `marius` | Male |
| `javert` | Male |
| `jean` | Male |
| `fantine` | Female |
| `cosette` | Female |
| `eponine` | Female |
| `azelma` | Female |

## Performance

| Metric | Value |
|--------|-------|
| Time to first sound | ~90ms (daemon warm) |
| First run | 2-5s (daemon startup) |
| Real-time factor | ~4x faster than real time |
| Sample rate | 24kHz mono |
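
As a rough worked example of these figures, the duration of a saved file can be estimated from its size. The sketch below assumes 16-bit PCM samples and a 44-byte WAV header, neither of which is stated in the source; `wav_duration_ms` and `synthesis_ms` are hypothetical helpers.

```bash
# Estimate WAV duration in milliseconds from file size.
# Assumptions (not stated in the source): 16-bit PCM samples, 44-byte header.
# From the table: 24 kHz mono -> 24000 samples/s * 2 bytes = 48000 bytes/s.
wav_duration_ms() {
  local bytes=$1
  echo $(( (bytes - 44) * 1000 / 48000 ))
}

# Estimate generation time from the ~4x real-time factor in the table.
synthesis_ms() {
  local audio_ms=$1
  echo $(( audio_ms / 4 ))
}

audio_ms=$(wav_duration_ms 96044)   # a file of roughly 96 KB
echo "audio: ${audio_ms} ms, est. synthesis: $(synthesis_ms "$audio_ms") ms"
# prints: audio: 2000 ms, est. synthesis: 500 ms
```

Under the same assumptions, the ~50-100KB file from Quick Start corresponds to roughly 1-2 seconds of audio.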

## Architecture

```
speakturbo (Rust CLI, 2.2MB)
    │
    │ HTTP streaming (port 7125)
    ▼
speakturbo-daemon (Python + pocket-tts)
    │
    │ Model in memory, auto-shutdown after 1hr idle
    ▼
Audio playback (rodio)
```

## Text Input

- **Encoding:** UTF-8
- **Quotes in text:** Use escaping: `speakturbo "She said \"hello\""`
- **Long text:** Supported, streams as it generates

## Output Path Security

The `-o` flag only writes to directories that are on the allowlist. By default, these are:

- `/tmp` and system temp directories
- Your current working directory
- `~/.speakturbo/`

If you need to write elsewhere, use `--allow-dir`:

```bash
speakturbo "Hello" -o /custom/path/audio.wav --allow-dir /custom/path
```

To permanently allow a directory, add it to `~/.speakturbo/config`:

```bash
mkdir -p ~/.speakturbo && echo "/custom/path" >> ~/.speakturbo/config
```

The config file lists one directory per line. Lines starting with `#` are comments.
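
To illustrate the file format, here is a small sketch that checks whether a directory is listed in such a file. It shows the format only: `dir_in_allowlist` is a hypothetical helper, and how speakturbo itself matches paths (subdirectories, symlinks, `~` expansion) is not documented here.

```bash
# Return 0 if <dir> appears in an allowlist file with one directory per line.
# Blank lines and lines starting with '#' are skipped.
# Exact string match only; speakturbo's real matching rules may differ.
dir_in_allowlist() {
  local dir=$1 config=$2 line
  [ -f "$config" ] || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;
    esac
    if [ "$line" = "$dir" ]; then
      return 0
    fi
  done < "$config"
  return 1
}

# Example against a throwaway file rather than the real ~/.speakturbo/config.
demo_config=$(mktemp)
printf '%s\n' '# extra output directories' '/custom/path' > "$demo_config"
dir_in_allowlist /custom/path "$demo_config" && echo "allowed"
dir_in_allowlist /etc "$demo_config" || echo "not listed"
rm -f "$demo_config"
```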

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success (audio played/saved) |
| 1 | Error (daemon connection failed, invalid args) |
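
A calling script can branch on these codes. The following is a minimal sketch, assuming a text fallback is acceptable when speech fails; `say_or_print` and the `SPEAKTURBO_BIN` override are hypothetical and exist only so the example can be exercised without the real binary.

```bash
# Speak a message; fall back to printing it if speakturbo fails.
# SPEAKTURBO_BIN is a hypothetical override (not a speakturbo feature);
# it defaults to the real `speakturbo` command.
say_or_print() {
  local msg=$1
  if "${SPEAKTURBO_BIN:-speakturbo}" "$msg" -q; then
    return 0
  fi
  # Exit code 1 covers both daemon connection failures and invalid
  # arguments, so the code alone cannot tell them apart.
  printf 'speech unavailable, message was: %s\n' "$msg" >&2
  return 1
}
```

Because both failure modes share code 1, a health check like the one under Troubleshooting is the documented way to narrow down the cause.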

## When to Use

**Use speakturbo when:**
- You need instant audio feedback (~90ms)
- Speed matters more than voice variety
- Built-in voices are sufficient

**Use `speak` instead when:**
- You need custom voice cloning (Morgan Freeman, etc.)
  → `speak "text" --voice ~/.chatter/voices/morgan_freeman.wav`
- You need emotion tags like `[laugh]`, `[sigh]`
- Quality/variety matters more than speed

See the `speak` skill documentation for full usage.

## Troubleshooting

**No audio plays:**
```bash
# Check daemon is running
curl http://127.0.0.1:7125/health
# Expected: {"status":"ready","voices":["alba","marius",...]}

# Verify by saving to file and playing manually
speakturbo "test" -o /tmp/test.wav
afplay /tmp/test.wav  # macOS
aplay /tmp/test.wav   # Linux
```

**Daemon won't start:**
```bash
# Check port availability
lsof -i :7125

# Manually kill and restart
pkill -f "daemon_streaming"
speakturbo "test"  # Auto-restarts daemon
```

**First run is slow:**
This is expected. The daemon needs to load the ~100MB model into memory. Subsequent calls will be fast (~90ms).

## Daemon Management

The daemon auto-starts on first use and **auto-shuts down after 1 hour idle**.

```bash
# Check status
curl http://127.0.0.1:7125/health

# Manual stop
pkill -f "daemon_streaming"

# View logs
cat /tmp/speakturbo.log
```
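
For scripted checks, the `status` field can be pulled out of the health response without extra tools. This sketch assumes the response shape shown in the Troubleshooting example; `health_status` is a hypothetical helper, and any status value other than `"ready"` is a guess about what the daemon might report.

```bash
# Extract the "status" field from the /health JSON without requiring jq.
# The response shape is taken from the Troubleshooting example:
#   {"status":"ready","voices":["alba","marius",...]}
health_status() {
  printf '%s\n' "$1" |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example with a canned response; in practice the argument would come from
#   curl -fsS http://127.0.0.1:7125/health
health_status '{"status":"ready","voices":["alba","marius"]}'   # prints: ready
```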

## Comparison with speak

| Feature | speakturbo | speak |
|---------|------------|-------|
| Time to first sound | ~90ms | ~4-8s |
| Voice cloning | ❌ | ✅ |
| Emotion tags | ❌ | ✅ |
| Voices | 8 built-in | Custom wav files |
| Engine | pocket-tts | Chatterbox |


---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
```
     ███████╗██████╗ ███████╗ █████╗ ██╗  ██╗ ████████╗██╗   ██╗██████╗ ██████╗  ██████╗ 
     ██╔════╝██╔══██╗██╔════╝██╔══██╗██║ ██╔╝ ╚══██╔══╝██║   ██║██╔══██╗██╔══██╗██╔═══██╗
     ███████╗██████╔╝█████╗  ███████║█████╔╝     ██║   ██║   ██║██████╔╝██████╔╝██║   ██║
     ╚════██║██╔═══╝ ██╔══╝  ██╔══██║██╔═██╗     ██║   ██║   ██║██╔══██╗██╔══██╗██║   ██║
     ███████║██║     ███████╗██║  ██║██║  ██╗    ██║   ╚██████╔╝██║  ██║██████╔╝╚██████╔╝
     ╚══════╝╚═╝     ╚══════╝╚═╝  ╚═╝╚═╝  ╚═╝    ╚═╝    ╚═════╝ ╚═╝  ╚═╝╚═════╝  ╚═════╝ 
```

<h3 align="center">Talk to your Claude.</h3>

<p align="center">
  <a href="https://speakturbo-site.vercel.app"><img src="https://img.shields.io/badge/website-speakturbo-f97316.svg" alt="Website"></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a>
  <img src="https://img.shields.io/badge/latency-~90ms-brightgreen.svg" alt="Latency">
  <img src="https://img.shields.io/badge/platform-Apple%20Silicon-orange.svg" alt="Platform">
</p>

<p align="center">
  <strong>~90ms to first sound. Realistic. Local. Private. Fast.</strong>
</p>

<p align="center">
  <code>speakturbo "Hello world"</code> → <code>⚡ 92ms → ▶ 93ms → ✓ done</code>
</p>

---

## Install

**For AI Agents** (Claude Code, Cursor, Windsurf):
```bash
npx skills add EmZod/Speak-Turbo
```

**CLI only:**
```bash
pip install pocket-tts uvicorn fastapi
cd speakturbo-cli && cargo build --release
```

---

## Usage

```bash
speakturbo "Hello world"              # Play instantly
speakturbo "Hello" -o out.wav         # Save to file
speakturbo "Hello" -q                 # Quiet mode
speakturbo --list-voices              # Show voices
```

---

## Voices

```
alba      ██████████  Female (default)
marius    ██████████  Male
javert    ██████████  Male  
jean      ██████████  Male
fantine   ██████████  Female
cosette   ██████████  Female
eponine   ██████████  Female
azelma    ██████████  Female
```

---

## Performance

```
Time to first sound    ░░░░░░░░░░░░░░░░░░░░  ~90ms
First run (cold)       ████░░░░░░░░░░░░░░░░  2-5s  
Real-time factor       ████████████████░░░░  4x faster
```

---

## Architecture

```
                    ┌─────────────────┐
                    │   speakturbo    │
                    │   (Rust, 2.2MB) │
                    └────────┬────────┘
                             │ HTTP :7125
                             ▼
                    ┌─────────────────┐
                    │     daemon      │
                    │ (Python + MLX)  │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │  Audio Output   │
                    │    (rodio)      │
                    └─────────────────┘
```

---

## Troubleshooting

| Problem | Fix |
|---------|-----|
| No audio | `curl http://127.0.0.1:7125/health` |
| Daemon stuck | `pkill -f "daemon_streaming"` |
| Slow first run | Normal - model loading (2-5s) |

---

## See Also

Need voice cloning? Emotion tags? Try [**speak**](https://github.com/EmZod/speak).

---

<p align="center">
  <sub>MIT License · Built on <a href="https://github.com/kyutai-labs/pocket-tts">Pocket TTS</a></sub>
</p>

```

### _meta.json

```json
{
  "owner": "emzod",
  "slug": "speakturbo-tts",
  "displayName": "Speak Turbo - Talk to your Claude 90ms latency!",
  "latest": {
    "version": "1.0.7",
    "publishedAt": 1771673322379,
    "commit": "https://github.com/openclaw/skills/commit/fc4e5bcc1ac672f26307078830ea209d86433e62"
  },
  "history": []
}

```
