
dub-youtube-with-voiceai

Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,087.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: C (67.6).

Install command

npx @skill-hub/cli install openclaw-skills-dub-youtube-with-voiceai

Repository

openclaw/skills

Skill path: skills/gizmogremlin/dub-youtube-with-voiceai


Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install dub-youtube-with-voiceai into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding dub-youtube-with-voiceai to shared team environments
  • Use dub-youtube-with-voiceai for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: dub-youtube-with-voiceai
description: "Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts."
version: 0.1.2
env:
  - VOICE_AI_API_KEY
required_env:
  - VOICE_AI_API_KEY
credentials:
  - VOICE_AI_API_KEY
setup: "none — single file, runs directly with Node.js"
runtime: "node>=20"
optional_deps: "ffmpeg"
---

# Dub YouTube with Voice.ai

> This skill follows the [Agent Skills specification](https://agentskills.io/specification).

Turn any script into a **YouTube-ready voiceover** — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.

Built for YouTube creators who want studio-quality narration without the studio. Powered by [Voice.ai](https://voice.ai).

---

## When to use this skill

| Scenario | Why it fits |
|---|---|
| **YouTube long-form** | Full narration with chapter markers and captions |
| **YouTube Shorts** | Quick hooks with punchy delivery |
| **Course content** | Professional narration for educational videos |
| **Screen recordings** | Dub a screencast with clean AI voiceover |
| **Quick iteration** | Smart caching — edit one section, only that segment re-renders |
| **Batch production** | Same voice, consistent quality across every video |

---

## The one-command workflow

Have a script and a video? Dub it in one shot:

```bash
node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My YouTube Video" \
  --video ./my-recording.mp4 \
  --mux \
  --template youtube
```

This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:

- `out/my-youtube-video/muxed.mp4` — your video dubbed with the AI voiceover
- `out/my-youtube-video/master.wav` — the standalone audio
- `out/my-youtube-video/review.html` — listen and review each segment
- `out/my-youtube-video/chapters.txt` — paste directly into your YouTube description
- `out/my-youtube-video/captions.srt` — upload to YouTube as subtitles
- `out/my-youtube-video/description.txt` — ready-made YouTube description with chapters

Use `--sync pad` to fill the gap with silence when the voiceover is shorter than the video, or `--sync trim` to cut the voiceover down to the video's length when it runs longer.

---

## Requirements

- **Node.js 20+** — runtime (no npm install needed — the CLI is a single bundled file)
- **VOICE_AI_API_KEY** — set as an environment variable (the skill does not read `.env` files; see Configuration). Get a key at [voice.ai/dashboard](https://voice.ai/dashboard).
- **ffmpeg** (optional) — needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. The pipeline still produces individual segments, the review page, chapters, and captions without it.

---

## Configuration

Set `VOICE_AI_API_KEY` as an environment variable before running:

```bash
export VOICE_AI_API_KEY=your-key-here
```

The skill reads credentials only from the `VOICE_AI_API_KEY` environment variable. It does not read `.env` files or any other file to obtain credentials.

Use `--mock` on any command to run the full pipeline without an API key (produces placeholder audio).
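The credential rule above can be sketched in a few lines. This is a minimal illustration, not the CLI's actual code: `getApiKey` and its option names are hypothetical, and only the `VOICE_AI_API_KEY` variable and the `--mock` behavior come from this document.

```javascript
// Hypothetical helper: resolve the API key from the environment only.
// No .env parsing and no file access, matching the rule stated above.
function getApiKey(env = process.env, { mock = false } = {}) {
  // Mock mode needs no credentials at all.
  if (mock) return null;

  const key = (env.VOICE_AI_API_KEY || "").trim();
  if (!key) {
    throw new Error("VOICE_AI_API_KEY is not set; export it or pass --mock");
  }
  return key;
}
```

Failing early with a clear message avoids a confusing 401 from the API later in the build.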

---

## Commands

### `build` — Generate a YouTube voiceover from a script

```bash
node voiceai-vo.cjs build \
  --input <script.md or script.txt> \
  --voice <voice-alias-or-uuid> \
  --title "My YouTube Video" \
  [--template youtube] \
  [--video input.mp4 --mux --sync shortest] \
  [--force] [--mock]
```

**What it does:**

1. Reads the script and splits it into segments (by `##` headings for `.md`, or by sentence boundaries for `.txt`)
2. Optionally prepends/appends YouTube intro/outro segments
3. Renders each segment via Voice.ai TTS
4. Stitches a master audio file (if ffmpeg is available)
5. Generates YouTube chapters, SRT captions, a review page, and a ready-made description
6. Optionally dubs your video with the voiceover
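
Step 1 for `.md` scripts can be pictured as follows. This is a rough sketch under stated assumptions, not the bundled implementation: `splitByHeadings` is a hypothetical name, and treating a top-level `#` line as an unnarrated title and labelling text before the first heading "Intro" are both assumptions.

```javascript
// Hypothetical sketch: split a Markdown script into segments at "## " headings.
function splitByHeadings(markdown) {
  const segments = [];
  let current = null;

  for (const line of markdown.split(/\r?\n/)) {
    const heading = /^##\s+(.*)$/.exec(line);
    if (heading) {
      // A new "## " heading closes the previous segment and opens the next.
      if (current) segments.push(current);
      current = { title: heading[1].trim(), lines: [] };
    } else if (/^#\s/.test(line)) {
      // Assumption: a top-level "# " title is not narrated.
      continue;
    } else if (current) {
      current.lines.push(line);
    } else if (line.trim() !== "") {
      // Assumption: text before the first heading becomes an opening segment.
      current = { title: "Intro", lines: [line] };
    }
  }
  if (current) segments.push(current);

  return segments.map((seg, i) => ({
    index: i + 1,
    title: seg.title,
    text: seg.lines.join("\n").trim(),
  }));
}
```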

**Full options:**

| Option | Description |
|---|---|
| `-i, --input <path>` | Script file (.txt or .md) — **required** |
| `-v, --voice <id>` | Voice alias or UUID — **required** |
| `-t, --title <title>` | Video title (defaults to filename) |
| `--template youtube` | Auto-inject YouTube intro/outro |
| `--mode <mode>` | `headings` or `auto` (default: `headings` for .md, `auto` for .txt) |
| `--max-chars <n>` | Max characters per auto-chunk (default: 1500) |
| `--language <code>` | Language code (default: en) |
| `--video <path>` | Input video to dub |
| `--mux` | Enable video dubbing (requires --video) |
| `--sync <policy>` | `shortest`, `pad`, or `trim` (default: shortest) |
| `--force` | Re-render all segments (ignore cache) |
| `--mock` | Mock mode — no API calls, placeholder audio |
| `-o, --out <dir>` | Custom output directory |
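
To make `--mode auto` and `--max-chars` more concrete, here is one way sentence-boundary chunking could work. It is a simplified sketch: `chunkBySentences` is a hypothetical name, the sentence splitter is deliberately naive, and the real CLI may split differently.

```javascript
// Hypothetical sketch: pack whole sentences into chunks of at most maxChars.
function chunkBySentences(text, maxChars = 1500) {
  // Naive split: break after ".", "!" or "?" when followed by whitespace.
  const sentences = text.trim().split(/(?<=[.!?])\s+/).filter(Boolean);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars || current === "") {
      // Still fits, or this is the first sentence of a new chunk.
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A single sentence longer than `maxChars` is kept whole in this sketch rather than cut mid-sentence.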

### `replace-audio` — Dub an existing video

```bash
node voiceai-vo.cjs replace-audio \
  --video ./my-video.mp4 \
  --audio ./out/my-video/master.wav \
  [--out ./out/my-video/dubbed.mp4] \
  [--sync shortest|pad|trim]
```

Requires ffmpeg. If ffmpeg is not installed, the command generates helper shell and PowerShell scripts instead of muxing directly.

| Sync policy | Behavior |
|---|---|
| `shortest` (default) | Output ends when the shorter track ends |
| `pad` | Pad audio with silence to match video duration |
| `trim` | Trim audio to match video duration |

Video stream is copied without re-encoding (`-c:v copy`). Audio is encoded as AAC for YouTube compatibility.
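
The sync policies map naturally onto standard ffmpeg options. The sketch below shows one plausible mapping, not the arguments the CLI actually passes: `buildMuxArgs` and `videoDurationSec` are hypothetical, while `-c:v copy` and AAC audio come from the description above.

```javascript
// Hypothetical sketch: build ffmpeg arguments for replacing a video's audio track.
function buildMuxArgs({ video, audio, out, sync = "shortest", videoDurationSec }) {
  const args = [
    "-y",
    "-i", video,
    "-i", audio,
    "-map", "0:v:0", // picture from the first input
    "-map", "1:a:0", // sound from the second input
    "-c:v", "copy",  // no video re-encode
    "-c:a", "aac",
  ];

  switch (sync) {
    case "shortest":
      args.push("-shortest");
      break;
    case "pad":
      // apad appends silence indefinitely; -shortest then stops at the end of the video.
      args.push("-af", "apad", "-shortest");
      break;
    case "trim":
      // Assumption: the caller has already measured the video duration (e.g. with ffprobe).
      if (!(videoDurationSec > 0)) {
        throw new Error("trim needs the video duration in seconds");
      }
      args.push("-t", String(videoDurationSec));
      break;
    default:
      throw new Error(`Unknown sync policy: ${sync}`);
  }

  args.push(out);
  return args;
}
```

The resulting array could be handed to `child_process.spawn("ffmpeg", args)`, which avoids shell-quoting problems with paths that contain spaces.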

**Privacy:** Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.

### `voices` — List available voices

```bash
node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]
```

---

## Available voices

Use short aliases or full UUIDs with `--voice`:

| Alias    | Voice                | Gender | Best for YouTube                  |
|----------|----------------------|--------|-----------------------------------|
| `ellie`  | Ellie                | F      | Vlogs, lifestyle, social content  |
| `oliver` | Oliver               | M      | Tutorials, narration, explainers  |
| `lilith` | Lilith               | F      | ASMR, calm walkthroughs           |
| `smooth` | Smooth Calm Voice    | M      | Documentaries, long-form essays   |
| `corpse` | Corpse Husband       | M      | Gaming, entertainment             |
| `skadi`  | Skadi                | F      | Anime, character content          |
| `zhongli`| Zhongli              | M      | Gaming, dramatic intros           |
| `flora`  | Flora                | F      | Kids content, upbeat videos       |
| `chief`  | Master Chief         | M      | Gaming, action trailers           |

The `voices` command also returns any additional voices available on the API. The voice list is cached for 10 minutes.
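
Alias handling for `--voice` might look like the sketch below. It is illustrative only: `resolveVoice` is a hypothetical name, and the map holds just three of the aliases, with UUIDs copied from `references/VOICEAI_API.md`.

```javascript
// Subset of the alias table; UUIDs taken from references/VOICEAI_API.md.
const VOICE_ALIASES = {
  ellie: "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
  oliver: "f9e6a5eb-a7fd-4525-9e92-75125249c933",
  smooth: "dbb271df-db25-4225-abb0-5200ba1426bc",
};

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper: accept either a full UUID or a known alias.
function resolveVoice(value) {
  const key = String(value).trim().toLowerCase();
  if (UUID_RE.test(key)) return key;
  if (Object.hasOwn(VOICE_ALIASES, key)) return VOICE_ALIASES[key];
  throw new Error(`Unknown voice "${value}"; run the voices command to list options`);
}
```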

---

## Build outputs

After a build, the output directory contains everything you need to publish on YouTube:

```
out/<title-slug>/
  segments/           # Numbered WAV files (001-intro.wav, 002-section.wav, …)
  master.wav          # Stitched voiceover (requires ffmpeg)
  master.mp3          # MP3 for upload (requires ffmpeg)
  muxed.mp4           # Dubbed video (if --video --mux used)
  chapters.txt        # Paste into YouTube description
  captions.srt        # Upload as YouTube subtitles
  description.txt     # Ready-made YouTube description with chapters
  review.html         # Interactive review page with audio players
  manifest.json       # Build metadata: voice, template, segment list
  timeline.json       # Segment durations and start times
```
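
For orientation, `captions.srt` uses the SubRip layout: a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the caption text. The sketch below builds one cue per segment. The `start`, `duration`, and `text` field names are assumptions, since the exact shape of `timeline.json` is not documented here, and the real pipeline may emit finer-grained cues.

```javascript
// Format seconds as an SRT timestamp (HH:MM:SS,mmm).
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Hypothetical sketch: one cue per segment, cues separated by a blank line.
function toSrt(segments) {
  return segments
    .map((seg, i) => {
      const range = `${srtTime(seg.start)} --> ${srtTime(seg.start + seg.duration)}`;
      return `${i + 1}\n${range}\n${seg.text}\n`;
    })
    .join("\n");
}
```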

### YouTube workflow

1. Run the build command
2. Upload `muxed.mp4` (or your original video + `master.mp3` as audio)
3. Paste `chapters.txt` content into your YouTube description
4. Upload `captions.srt` as subtitles in YouTube Studio
5. Done — professional narration, chapters, and captions in minutes
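
The chapter list pasted in step 3 is a series of `timestamp title` lines starting at `0:00`. A minimal sketch of how such a list can be derived from segment durations follows. `buildChapters` and the `title` and `duration` field names are hypothetical, not the pipeline's actual data model.

```javascript
// Format seconds as a YouTube-style chapter timestamp (M:SS or H:MM:SS).
function chapterStamp(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = String(total % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${s}` : `${m}:${s}`;
}

// Hypothetical sketch: each chapter starts where the previous segment ended.
function buildChapters(segments) {
  let start = 0;
  const lines = [];
  for (const seg of segments) {
    lines.push(`${chapterStamp(start)} ${seg.title}`);
    start += seg.duration;
  }
  return lines.join("\n");
}
```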

---

## YouTube template

Use `--template youtube` to auto-inject a branded intro and outro:

| Segment | Source file |
|---|---|
| Intro (prepended) | `templates/youtube_intro.txt` |
| Outro (appended) | `templates/youtube_outro.txt` |

Edit the files in `templates/` to customize your channel's branding.

---

## Caching

Segments are cached by a hash of: `text content + voice ID + language`.

- Unchanged segments are **skipped** on rebuild — fast iteration
- Modified segments are **re-rendered** automatically
- Use `--force` to re-render everything
- Cache manifest is stored in `segments/.cache.json`
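
The cache rule can be illustrated with a content hash. This is a sketch under assumptions: SHA-256, the JSON encoding of the three fields, and the `file`-keyed manifest shape are illustrative choices, not the documented format of `segments/.cache.json`.

```javascript
const { createHash } = require("node:crypto");

// Hash the three inputs named above: text content, voice ID, and language.
function cacheKey({ text, voiceId, language }) {
  // JSON-encoding the tuple keeps field boundaries unambiguous.
  return createHash("sha256")
    .update(JSON.stringify([text, voiceId, language]))
    .digest("hex");
}

// Hypothetical sketch: pick the segments whose stored hash is missing or stale.
function segmentsToRender(segments, cache, force = false) {
  return segments.filter((seg) => force || cache[seg.file] !== cacheKey(seg));
}
```

Because the voice and language are part of the key, switching either one invalidates every segment, while editing one section invalidates only that section.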

---

## Multilingual dubbing

Voice.ai supports 11 languages — dub your YouTube videos for global audiences:

`en`, `es`, `fr`, `de`, `it`, `pt`, `pl`, `ru`, `nl`, `sv`, `ca`

```bash
node voiceai-vo.cjs build \
  --input script-spanish.md \
  --voice ellie \
  --title "Mi Video" \
  --language es \
  --video ./my-video.mp4 \
  --mux
```

The pipeline auto-selects the multilingual TTS model for non-English languages.
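
The model selection described above amounts to a small lookup. In this sketch the model IDs and language codes come from `references/VOICEAI_API.md`, while `selectModel` itself is a hypothetical helper, not the CLI's internal function.

```javascript
const SUPPORTED_LANGUAGES = ["en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca"];

// Hypothetical helper: English uses the default model, other languages the multilingual one.
function selectModel(language = "en") {
  const code = language.toLowerCase();
  if (!SUPPORTED_LANGUAGES.includes(code)) {
    throw new Error(`Unsupported language "${language}"`);
  }
  return code === "en"
    ? "voiceai-tts-v1-latest"
    : "voiceai-tts-multilingual-v1-latest";
}
```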

---

## Troubleshooting

| Issue | Solution |
|---|---|
| **ffmpeg missing** | Pipeline still works — you get segments, review page, chapters, captions. Install ffmpeg for stitching and video dubbing. |
| **Rate limits (429)** | Segments render sequentially, which stays under most limits. Wait and retry. |
| **Insufficient credits (402)** | Top up at [voice.ai/dashboard](https://voice.ai/dashboard). Segments that are already cached are not re-rendered on retry, so they do not consume credits again. |
| **Long scripts** | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| **Windows paths** | Wrap paths with spaces in quotes: `--input "C:\My Scripts\script.md"` |

See [`references/TROUBLESHOOTING.md`](references/TROUBLESHOOTING.md) for more.
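
If rate limits are a recurring problem, "wait and retry" can be automated with exponential backoff. The wrapper below is a generic sketch, not part of the shipped CLI: `withRetry`, `retryDelayMs`, and the convention that a failed call throws an error carrying a numeric `status` are all assumptions.

```javascript
// Delay doubles with each attempt: base, 2x base, 4x base, ...
function retryDelayMs(attempt, baseDelayMs) {
  return baseDelayMs * 2 ** attempt;
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retry only on HTTP 429, rethrow everything else at once.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      // Assumption: the caller attaches the HTTP status code to the error.
      const retryable = Boolean(err) && err.status === 429;
      if (!retryable || attempt >= retries) throw err;
      await sleep(retryDelayMs(attempt, baseDelayMs));
    }
  }
}
```

Errors such as 401 or 402 are deliberately not retried here, since waiting does not fix a bad key or an empty credit balance.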

---

## References

- [Agent Skills Specification](https://agentskills.io/specification)
- [Voice.ai](https://voice.ai)
- [`references/VOICEAI_API.md`](references/VOICEAI_API.md) — API endpoints, audio formats, models
- [`references/TROUBLESHOOTING.md`](references/TROUBLESHOOTING.md) — Common issues and fixes


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/TROUBLESHOOTING.md

```markdown
# Troubleshooting

Common issues and solutions for the voiceai-creator-voiceover-pipeline.

---

## FFmpeg not found

**Symptom:** Warning about skipping master stitch or video muxing.

**Solution:** Install ffmpeg for your platform:

```bash
# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg

# Windows (Chocolatey)
choco install ffmpeg

# Windows (winget)
winget install ffmpeg

# Windows (manual)
# Download from https://ffmpeg.org/download.html
# Add the bin/ folder to your PATH
```

After installing, verify:
```bash
ffmpeg -version
ffprobe -version
```

> **Note:** The pipeline still works without ffmpeg — you get individual segments, the review page, chapters, and captions. Only master stitching and video muxing require ffmpeg.

---

## Rate limits / API errors

**Symptom:** `Voice.ai TTS error 429` or similar.

**Solutions:**
- Wait a few minutes and retry
- Use `--mock` mode for pipeline testing
- For large scripts, the pipeline renders segments sequentially — this naturally stays under most rate limits
- If you hit limits frequently, consider adding retry logic in `src/api.ts` (a TODO for production use)

---

## Long scripts

**Symptom:** Script has many segments and takes a long time.

**Solutions:**
- Segments are cached by content hash — subsequent builds only re-render changed segments
- Use `--max-chars` to control segment size (larger = fewer API calls, but longer per-segment)
- Split very long scripts into separate build runs
- Use `--mock` mode for testing the pipeline before using real API credits

---

## Windows path quoting

**Symptom:** Paths with spaces cause errors.

**Solutions:**
```powershell
# PowerShell — wrap in quotes
node voiceai-vo.cjs build --input "C:\Users\Me\My Scripts\script.md" --voice oliver --mock

# Or use short paths / avoid spaces
node voiceai-vo.cjs build --input C:\scripts\video.md --voice oliver --mock
```

---

## "Real API not yet configured" error

**Symptom:** Error when running without `--mock` and without setting up real API endpoints.

**Explanation:** The current release uses placeholder API endpoints. The full pipeline works in `--mock` mode.

**Solutions:**
- Always use `--mock` flag until real API endpoints are configured
- See `references/VOICEAI_API.md` for what endpoints to fill in
- Contact the Voice.ai team for production API access

---

## Audio playback issues in review.html

**Symptom:** Audio players don't work in review.html.

**Solutions:**
- Open `review.html` directly in your browser (file:// protocol works)
- Make sure the `segments/` folder is next to `review.html`
- Some browsers block file:// audio — try a local server:
  ```bash
  cd out/my-project/
  python3 -m http.server 8000
  # Then open http://localhost:8000/review.html
  ```

---

## Segment caching

**How it works:**
- Each segment is hashed based on: text content + voice ID + language
- Hashes are stored in `segments/.cache.json`
- On rebuild, unchanged segments are skipped
- Use `--force` to regenerate everything

**To clear cache manually:**
```bash
rm out/my-project/segments/.cache.json
```

```

### references/VOICEAI_API.md

```markdown
# Voice.ai API Reference

Production API endpoints used by this pipeline.
Based on the [Voice.ai TTS SDK](https://github.com/gizmoGremlin/openclaw-skill-voice-ai-voices).

## Authentication

Bearer token in the `Authorization` header:

```
Authorization: Bearer <VOICE_AI_API_KEY>
```

Get your key at [voice.ai/dashboard](https://voice.ai/dashboard).

## Base URL

```
https://dev.voice.ai
```

Override via `VOICEAI_API_BASE` environment variable.

API path prefix: `/api/v1`

---

## Endpoints

### `GET /api/v1/tts/voices` — List available voices

**Query Parameters:**

| Param        | Type    | Default | Description                |
|--------------|---------|---------|----------------------------|
| `limit`      | integer | 10      | Max voices to return       |
| `offset`     | integer | 0       | Pagination offset          |
| `visibility` | string  | —       | `PUBLIC` or `PRIVATE`      |

**Response:**

```json
{
  "voices": [
    {
      "voice_id": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
      "name": "Ellie",
      "language": "en",
      "visibility": "PUBLIC",
      "status": "AVAILABLE"
    }
  ]
}
```

### `POST /api/v1/tts/speech` — Generate speech

**Request Body:**

```json
{
  "text": "Hello, world!",
  "voice_id": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
  "audio_format": "wav",
  "temperature": 1.0,
  "top_p": 0.8,
  "model": "voiceai-tts-v1-latest",
  "language": "en"
}
```

| Field          | Type   | Required | Default                     | Description                         |
|----------------|--------|----------|-----------------------------|-------------------------------------|
| `text`         | string | ✅       | —                           | Text to synthesize (max 5000 chars) |
| `voice_id`     | string | No       | built-in default            | Voice UUID or omit for default      |
| `audio_format` | string | No       | `mp3`                       | `mp3`, `wav`, `pcm`, etc.           |
| `temperature`  | number | No       | 1.0                         | Variation (0.0–2.0)                 |
| `top_p`        | number | No       | 0.8                         | Nucleus sampling (0.0–1.0)          |
| `model`        | string | No       | auto-selected by language   | `voiceai-tts-v1-latest` or `voiceai-tts-multilingual-v1-latest` |
| `language`     | string | No       | `en`                        | ISO 639-1 code                      |

**Response:** Binary audio data in the requested format.

### `POST /api/v1/tts/speech/stream` — Streaming speech

Same body as `/tts/speech`. Returns chunked transfer-encoded audio for low-latency playback.

---

## Audio Formats

| Format            | Description                |
|-------------------|----------------------------|
| `mp3`             | MP3 at 32kHz (default)     |
| `wav`             | WAV at 32kHz               |
| `pcm`             | Raw PCM 16-bit             |
| `mp3_44100_128`   | MP3 44.1kHz 128kbps        |
| `mp3_44100_192`   | MP3 44.1kHz 192kbps        |
| `wav_22050`       | WAV 22.05kHz               |
| `wav_24000`       | WAV 24kHz                  |
| `opus_48000_128`  | Opus 48kHz 128kbps         |

## Models

| Model ID                                    | Languages |
|---------------------------------------------|-----------|
| `voiceai-tts-v1-latest`                     | English   |
| `voiceai-tts-multilingual-v1-latest`        | 11 langs  |

Multilingual model supports: en, es, fr, de, it, pt, pl, ru, nl, sv, ca.

## Popular Voices

| Alias    | Voice ID (UUID)                              | Gender | Style                    |
|----------|----------------------------------------------|--------|--------------------------|
| `ellie`  | `d1bf0f33-8e0e-4fbf-acf8-45c3c6262513`      | F      | Youthful, vibrant vlogger|
| `oliver` | `f9e6a5eb-a7fd-4525-9e92-75125249c933`      | M      | Friendly British         |
| `lilith` | `4388040c-8812-42f4-a264-f457a6b2b5b9`      | F      | Soft, feminine           |
| `smooth` | `dbb271df-db25-4225-abb0-5200ba1426bc`      | M      | Deep, smooth narrator    |
| `corpse` | `72d2a864-b236-402e-a166-a838ccc2c273`      | M      | Deep, distinctive        |
| `skadi`  | `559d3b72-3e79-4f11-9b62-9ec702a6c057`      | F      | Anime character          |
| `zhongli`| `ed751d4d-e633-4bb0-8f5e-b5c8ddb04402`      | M      | Deep, authoritative      |
| `flora`  | `a931a6af-fb01-42f0-a8c0-bd14bc302bb1`      | F      | Cheerful, high pitch     |
| `chief`  | `bd35e4e6-6283-46b9-86b6-7cfa3dd409b9`      | M      | Heroic, commanding       |

The CLI accepts both aliases (`--voice ellie`) and full UUIDs (`--voice d1bf0f33-...`).

---

## Error Codes

| Code | Meaning            | Action                           |
|------|--------------------|----------------------------------|
| 401  | Invalid API key    | Check `VOICE_AI_API_KEY`         |
| 402  | Out of credits     | Top up at voice.ai/dashboard     |
| 422  | Validation error   | Check text length, voice_id      |
| 429  | Rate limited       | Wait and retry                   |

---

## Mock Mode

When `--mock` is passed, the pipeline runs end-to-end without any network calls:

- Voice listing returns the 9 popular voices above
- TTS returns generated WAV files with an audible test tone
- No API key required
- All output files (review.html, chapters, captions, etc.) are produced identically

```
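
As a rough illustration of the API reference above, the sketch below assembles a `POST /api/v1/tts/speech` request and calls it with Node's built-in `fetch`. The endpoint, base URL, body fields, and bearer header come from that reference. The function names, the `Content-Type` header, and the error shape are assumptions, and this is not the pipeline's own client code.

```javascript
// Hypothetical helper: build the URL and fetch options for a speech request.
function buildSpeechRequest({ text, voiceId, language = "en", apiKey, baseUrl = "https://dev.voice.ai" }) {
  if (!apiKey) throw new Error("VOICE_AI_API_KEY is not set");
  return {
    url: `${baseUrl}/api/v1/tts/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        // Assumption: the API expects a JSON content type for this body.
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text, voice_id: voiceId, audio_format: "wav", language }),
    },
  };
}

// Hypothetical sketch: send the request and return the binary audio as a Buffer.
async function synthesize(params) {
  const { url, init } = buildSpeechRequest(params);
  const res = await fetch(url, init);
  if (!res.ok) {
    const err = new Error(`Voice.ai TTS error ${res.status}`);
    err.status = res.status; // lets callers branch on 401, 402, 422, 429
    throw err;
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Keeping request construction separate from the network call makes the request easy to inspect without spending credits.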

### templates/youtube_intro.txt

```text
Hey there! Welcome to the video. Before we jump in, make sure you hit that subscribe button and turn on notifications — you won't want to miss what's coming up. Alright, let's get into it.
```

### templates/youtube_outro.txt

```text
And that's a wrap! If you found this valuable, drop a like and leave a comment letting me know your biggest takeaway. I read every single one. Don't forget to subscribe, and I'll see you in the next video. Peace!
```



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### README.md

```markdown
# Dub YouTube with Voice.ai

> Dub YouTube videos with AI voiceovers — chapters, captions, and audio replacement in one command.

**📖 Skill documentation: [SKILL.md](SKILL.md)**

## Quick start

Requires Node.js 20+. No install needed.

```bash
# Set your API key
export VOICE_AI_API_KEY=your-key-here

# Dub a YouTube video with AI voiceover
node voiceai-vo.cjs build \
  --input my-script.md \
  --voice oliver \
  --title "My YouTube Video" \
  --video ./my-recording.mp4 \
  --mux \
  --template youtube

# Or just build the voiceover (no video)
node voiceai-vo.cjs build --input examples/youtube_script.md --voice ellie --title "My Video"

# List available voices
node voiceai-vo.cjs voices

# Test without an API key
node voiceai-vo.cjs build --input examples/youtube_script.md --voice ellie --title "My Video" --mock
```

Get your API key at [voice.ai/dashboard](https://voice.ai/dashboard).

## What it produces

```
out/<title>/
  muxed.mp4        # Dubbed video (if --video --mux)
  segments/        # Numbered WAV files
  master.wav       # Stitched voiceover
  master.mp3       # MP3 for upload
  chapters.txt     # Paste into YouTube description
  captions.srt     # Upload as YouTube subtitles
  description.txt  # Ready-made YouTube description
  review.html      # Interactive review page
```

## Learn more

- **[SKILL.md](SKILL.md)** — Full skill documentation: commands, voices, outputs, configuration
- **[references/VOICEAI_API.md](references/VOICEAI_API.md)** — Voice.ai API endpoints and formats
- **[references/TROUBLESHOOTING.md](references/TROUBLESHOOTING.md)** — Common issues and fixes

---

*Powered by [Voice.ai](https://voice.ai) · Follows the [Agent Skills specification](https://agentskills.io/specification)*

```

### _meta.json

```json
{
  "owner": "gizmogremlin",
  "slug": "dub-youtube-with-voiceai",
  "displayName": "Dub YouTube with Voice.ai",
  "latest": {
    "version": "0.1.6",
    "publishedAt": 1770772190925,
    "commit": "https://github.com/openclaw/skills/commit/0d7f8fa3f4ca04471028f118a53762278ea276a8"
  },
  "history": []
}

```
