dub-youtube-with-voiceai
Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-dub-youtube-with-voiceai
Repository
Skill path: skills/gizmogremlin/dub-youtube-with-voiceai
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install dub-youtube-with-voiceai into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding dub-youtube-with-voiceai to shared team environments
- Use dub-youtube-with-voiceai for development workflows
Original source / Raw SKILL.md
---
name: dub-youtube-with-voiceai
description: "Dub YouTube videos with Voice.ai TTS. Turn scripts into publish-ready voiceovers with chapters, captions, and audio replacement for YouTube long-form and Shorts."
version: 0.1.2
env:
- VOICE_AI_API_KEY
required_env:
- VOICE_AI_API_KEY
credentials:
- VOICE_AI_API_KEY
setup: "none — single file, runs directly with Node.js"
runtime: "node>=20"
optional_deps: "ffmpeg"
---
# Dub YouTube with Voice.ai
> This skill follows the [Agent Skills specification](https://agentskills.io/specification).
Turn any script into a **YouTube-ready voiceover** — complete with numbered segments, a stitched master, chapter timestamps, SRT captions, and a review page. Drop the voiceover onto an existing video to dub it in one command.
Built for YouTube creators who want studio-quality narration without the studio. Powered by [Voice.ai](https://voice.ai).
---
## When to use this skill
| Scenario | Why it fits |
|---|---|
| **YouTube long-form** | Full narration with chapter markers and captions |
| **YouTube Shorts** | Quick hooks with punchy delivery |
| **Course content** | Professional narration for educational videos |
| **Screen recordings** | Dub a screencast with clean AI voiceover |
| **Quick iteration** | Smart caching — edit one section, only that segment re-renders |
| **Batch production** | Same voice, consistent quality across every video |
---
## The one-command workflow
Have a script and a video? Dub it in one shot:
```bash
node voiceai-vo.cjs build \
--input my-script.md \
--voice oliver \
--title "My YouTube Video" \
--video ./my-recording.mp4 \
--mux \
--template youtube
```
This renders the voiceover, stitches the master audio, and drops it onto your video — all in one command. Output:
- `out/my-youtube-video/muxed.mp4` — your video dubbed with the AI voiceover
- `out/my-youtube-video/master.wav` — the standalone audio
- `out/my-youtube-video/review.html` — listen and review each segment
- `out/my-youtube-video/chapters.txt` — paste directly into your YouTube description
- `out/my-youtube-video/captions.srt` — upload to YouTube as subtitles
- `out/my-youtube-video/description.txt` — ready-made YouTube description with chapters
The default sync policy is `shortest`. Use `--sync pad` to pad the audio with silence when it is shorter than the video, or `--sync trim` to trim the audio to the video's duration.
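The output paths listed above sit under a directory that appears to be derived from `--title` ("My YouTube Video" becomes `my-youtube-video`). The following is a minimal sketch of one way such a slug could be computed. The `slugify` helper and its exact rules are assumptions for illustration, not the CLI's confirmed behavior.

```javascript
// Hypothetical helper: the CLI's real slug rules are not documented here.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics into one hyphen
    .replace(/^-+|-+$/g, "");    // trim leading and trailing hyphens
}

console.log(`out/${slugify("My YouTube Video")}/master.wav`);
```

If your title contains characters outside `a-z` and `0-9`, check the actual directory name the build creates rather than relying on this sketch.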
---
## Requirements
- **Node.js 20+**: runtime (no npm install needed; the CLI is a single bundled file)
- **VOICE_AI_API_KEY**: set as an environment variable. Get a key at [voice.ai/dashboard](https://voice.ai/dashboard).
- **ffmpeg** (optional): needed for master stitching, MP3 encoding, loudness normalization, and video dubbing. Without it, the pipeline still produces individual segments, the review page, chapters, and captions.
---
## Configuration
Set `VOICE_AI_API_KEY` as an environment variable before running:
```bash
export VOICE_AI_API_KEY=your-key-here
```
The skill reads credentials only from this environment variable. It does not read `.env` files or any other file for credentials.
Use `--mock` on any command to run the full pipeline without an API key (produces placeholder audio).
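The credential rule above can be sketched as a small check: the key comes from the environment only, and `--mock` removes the need for it. The `resolveApiKey` helper and its error message are hypothetical and only illustrate the documented behavior.

```javascript
// Hypothetical helper mirroring the rule stated above; not the CLI's own code.
function resolveApiKey(env, mock) {
  if (mock) return null; // mock mode: no key needed, no API calls
  const key = (env.VOICE_AI_API_KEY || "").trim();
  if (!key) {
    throw new Error("VOICE_AI_API_KEY is not set; export it or pass --mock");
  }
  return key;
}

console.log(resolveApiKey({}, true)); // null
```

In a real script you would pass `process.env` as the first argument.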
---
## Commands
### `build` — Generate a YouTube voiceover from a script
```bash
node voiceai-vo.cjs build \
--input <script.md or script.txt> \
--voice <voice-alias-or-uuid> \
--title "My YouTube Video" \
[--template youtube] \
[--video input.mp4 --mux --sync shortest] \
[--force] [--mock]
```
**What it does:**
1. Reads the script and splits it into segments (by `##` headings for `.md`, or by sentence boundaries for `.txt`)
2. Optionally prepends/appends YouTube intro/outro segments
3. Renders each segment via Voice.ai TTS
4. Stitches a master audio file (if ffmpeg is available)
5. Generates YouTube chapters, SRT captions, a review page, and a ready-made description
6. Optionally dubs your video with the voiceover
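Step 1 above (splitting a `.md` script by `##` headings) could look roughly like the sketch below. It assumes each `## ` line starts a new segment and that any text before the first heading becomes an untitled segment. The real splitter may treat titles, deeper headings, or empty sections differently.

```javascript
// Assumption: "## " starts a segment; "#" and "###" lines are kept as body text.
function splitByHeadings(markdown) {
  const segments = [];
  let current = { title: "", lines: [] };
  for (const line of markdown.split(/\r?\n/)) {
    const match = /^##\s+(.*)$/.exec(line);
    if (match) {
      segments.push(current); // close the previous segment
      current = { title: match[1].trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  segments.push(current);
  return segments
    .map((s) => ({ title: s.title, text: s.lines.join("\n").trim() }))
    .filter((s) => s.text.length > 0); // drop segments with no narration text
}

console.log(splitByHeadings("## Hook\nWelcome back.\n## Setup\nInstall Node.js."));
```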
**Full options:**
| Option | Description |
|---|---|
| `-i, --input <path>` | Script file (.txt or .md) — **required** |
| `-v, --voice <id>` | Voice alias or UUID — **required** |
| `-t, --title <title>` | Video title (defaults to filename) |
| `--template youtube` | Auto-inject YouTube intro/outro |
| `--mode <mode>` | `headings` or `auto` (default: headings for .md) |
| `--max-chars <n>` | Max characters per auto-chunk (default: 1500) |
| `--language <code>` | Language code (default: en) |
| `--video <path>` | Input video to dub |
| `--mux` | Enable video dubbing (requires --video) |
| `--sync <policy>` | `shortest`, `pad`, or `trim` (default: shortest) |
| `--force` | Re-render all segments (ignore cache) |
| `--mock` | Mock mode — no API calls, placeholder audio |
| `-o, --out <dir>` | Custom output directory |
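For `auto` mode, the `--max-chars` limit in the table above suggests greedy packing of sentences into chunks. The sketch below shows one way to do that. The sentence-boundary rule (`.`, `!`, or `?` followed by whitespace) and the `chunkBySentences` helper are assumptions, not the CLI's documented algorithm.

```javascript
// Greedy sentence packing: add sentences to a chunk until the next one would
// exceed maxChars. A single sentence longer than maxChars is kept whole.
function chunkBySentences(text, maxChars = 1500) {
  const sentences = text.trim().split(/(?<=[.!?])\s+/);
  const chunks = [];
  let current = "";
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (candidate.length <= maxChars || !current) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkBySentences("One. Two. Three.", 9));
```

Larger chunks mean fewer requests but longer renders per segment, so the default of 1500 is a reasonable starting point.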
### `replace-audio` — Dub an existing video
```bash
node voiceai-vo.cjs replace-audio \
--video ./my-video.mp4 \
--audio ./out/my-video/master.wav \
[--out ./out/my-video/dubbed.mp4] \
[--sync shortest|pad|trim]
```
Requires ffmpeg. If ffmpeg is not installed, the command generates helper shell/PowerShell scripts instead of the dubbed video.
| Sync policy | Behavior |
|---|---|
| `shortest` (default) | Output ends when the shorter track ends |
| `pad` | Pad audio with silence to match video duration |
| `trim` | Trim audio to match video duration |
Video stream is copied without re-encoding (`-c:v copy`). Audio is encoded as AAC for YouTube compatibility.
**Privacy:** Video processing is entirely local. Only script text is sent to Voice.ai for TTS. Your video files never leave your machine.
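The stream-copy and AAC behavior described above maps onto standard ffmpeg options. The sketch below builds an argument list for each sync policy. The mapping from policy to flags (`-shortest`, `-af apad`, `-t`) is an assumption about what an equivalent command might look like. The CLI's actual ffmpeg invocation is not shown in this document, and `buildMuxArgs` is a hypothetical helper.

```javascript
// Assumed policy-to-flag mapping; all flags are standard ffmpeg options.
function buildMuxArgs({ video, audio, out, sync = "shortest", videoDurationSec }) {
  const args = [
    "-y", "-i", video, "-i", audio,
    "-map", "0:v:0", "-map", "1:a:0", // video from input 0, audio from input 1
    "-c:v", "copy", "-c:a", "aac",    // copy video stream, encode audio as AAC
  ];
  if (sync === "shortest") {
    args.push("-shortest"); // stop when the shorter stream ends
  } else if (sync === "pad") {
    args.push("-af", "apad", "-shortest"); // pad audio with silence, stop at video end
  } else if (sync === "trim") {
    if (!(videoDurationSec > 0)) throw new Error("trim needs videoDurationSec");
    args.push("-t", String(videoDurationSec)); // cap output at the video length
  } else {
    throw new Error(`unknown sync policy: ${sync}`);
  }
  args.push(out);
  return args;
}

console.log(["ffmpeg", ...buildMuxArgs({
  video: "my-video.mp4", audio: "master.wav", out: "dubbed.mp4", sync: "pad",
})].join(" "));
```

Returning an array rather than a single string keeps paths with spaces intact when the arguments are passed to a process spawner.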
### `voices` — List available voices
```bash
node voiceai-vo.cjs voices [--limit 20] [--query "deep"] [--mock]
```
---
## Available voices
Use short aliases or full UUIDs with `--voice`:
| Alias | Voice | Gender | Best for YouTube |
|----------|----------------------|--------|-----------------------------------|
| `ellie` | Ellie | F | Vlogs, lifestyle, social content |
| `oliver` | Oliver | M | Tutorials, narration, explainers |
| `lilith` | Lilith | F | ASMR, calm walkthroughs |
| `smooth` | Smooth Calm Voice | M | Documentaries, long-form essays |
| `corpse` | Corpse Husband | M | Gaming, entertainment |
| `skadi` | Skadi | F | Anime, character content |
| `zhongli`| Zhongli | M | Gaming, dramatic intros |
| `flora` | Flora | F | Kids content, upbeat videos |
| `chief` | Master Chief | M | Gaming, action trailers |
The `voices` command also returns any additional voices available on the API. The voice list is cached for 10 minutes.
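Because `--voice` accepts either an alias or a UUID, resolution can be sketched as a lookup with a UUID pass-through. The alias-to-UUID pairs below are copied from the Popular Voices table in `references/VOICEAI_API.md`. The `resolveVoice` helper itself is hypothetical.

```javascript
// Alias-to-UUID pairs from references/VOICEAI_API.md.
const VOICE_ALIASES = {
  ellie: "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
  oliver: "f9e6a5eb-a7fd-4525-9e92-75125249c933",
  lilith: "4388040c-8812-42f4-a264-f457a6b2b5b9",
  smooth: "dbb271df-db25-4225-abb0-5200ba1426bc",
  corpse: "72d2a864-b236-402e-a166-a838ccc2c273",
  skadi: "559d3b72-3e79-4f11-9b62-9ec702a6c057",
  zhongli: "ed751d4d-e633-4bb0-8f5e-b5c8ddb04402",
  flora: "a931a6af-fb01-42f0-a8c0-bd14bc302bb1",
  chief: "bd35e4e6-6283-46b9-86b6-7cfa3dd409b9",
};
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical resolver: UUIDs pass through, aliases are looked up.
function resolveVoice(input) {
  const value = input.trim();
  if (UUID_RE.test(value)) return value.toLowerCase();
  const alias = value.toLowerCase();
  if (!Object.hasOwn(VOICE_ALIASES, alias)) {
    throw new Error(`Unknown voice "${input}"; run the voices command to list options`);
  }
  return VOICE_ALIASES[alias];
}

console.log(resolveVoice("oliver"));
```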
---
## Build outputs
After a build, the output directory contains everything you need to publish on YouTube:
```
out/<title-slug>/
segments/ # Numbered WAV files (001-intro.wav, 002-section.wav, …)
master.wav # Stitched voiceover (requires ffmpeg)
master.mp3 # MP3 for upload (requires ffmpeg)
muxed.mp4 # Dubbed video (if --video --mux used)
chapters.txt # Paste into YouTube description
captions.srt # Upload as YouTube subtitles
description.txt # Ready-made YouTube description with chapters
review.html # Interactive review page with audio players
manifest.json # Build metadata: voice, template, segment list
timeline.json # Segment durations and start times
```
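To show how `captions.srt` relates to segment timing, here is a minimal sketch that turns a list of timed segments into SRT cues. The input shape (`text`, `startSec`, `durationSec`) is an assumption, since the schema of `timeline.json` is not documented here, and the real pipeline may split captions more finely than one cue per segment.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function srtTime(seconds) {
  const ms = Math.round(seconds * 1000);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Assumed segment shape: { text, startSec, durationSec }. One cue per segment.
function toSrt(segments) {
  return segments
    .map((seg, i) => {
      const end = seg.startSec + seg.durationSec;
      return `${i + 1}\n${srtTime(seg.startSec)} --> ${srtTime(end)}\n${seg.text}\n`;
    })
    .join("\n");
}

console.log(toSrt([
  { text: "Welcome to the video.", startSec: 0, durationSec: 2.4 },
  { text: "Let's get into it.", startSec: 2.4, durationSec: 1.8 },
]));
```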
### YouTube workflow
1. Run the build command
2. Upload `muxed.mp4` (or your original video + `master.mp3` as audio)
3. Paste `chapters.txt` content into your YouTube description
4. Upload `captions.srt` as subtitles in YouTube Studio
5. Done — professional narration, chapters, and captions in minutes
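For step 3, YouTube reads chapters from description lines of the form `M:SS Title`, with the first chapter starting at `0:00`. The sketch below formats such lines from segment start times. The input shape (`title`, `startSec`) is an assumption about what `timeline.json` provides, and `formatChapters` is a hypothetical helper rather than the CLI's own code.

```javascript
// Format seconds as M:SS, or H:MM:SS once the video passes one hour.
function chapterStamp(seconds) {
  const total = Math.floor(seconds);
  const h = Math.floor(total / 3600);
  const m = Math.floor((total % 3600) / 60);
  const s = String(total % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${s}` : `${m}:${s}`;
}

// Assumed segment shape: { title, startSec }.
function formatChapters(segments) {
  return segments
    .map((seg) => `${chapterStamp(seg.startSec)} ${seg.title}`)
    .join("\n");
}

console.log(formatChapters([
  { title: "Intro", startSec: 0 },
  { title: "Setup", startSec: 42.3 },
  { title: "Wrap-up", startSec: 185 },
]));
```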
---
## YouTube template
Use `--template youtube` to auto-inject a branded intro and outro:
| Segment | Source file |
|---|---|
| Intro (prepended) | `templates/youtube_intro.txt` |
| Outro (appended) | `templates/youtube_outro.txt` |
Edit the files in `templates/` to customize your channel's branding.
---
## Caching
Segments are cached by a hash of: `text content + voice ID + language`.
- Unchanged segments are **skipped** on rebuild — fast iteration
- Modified segments are **re-rendered** automatically
- Use `--force` to re-render everything
- Cache manifest is stored in `segments/.cache.json`
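The caching rule above can be sketched with Node's built-in `crypto` module. The choice of SHA-256, the JSON encoding of the three fields, and the manifest shape (file name mapped to hash) are assumptions for illustration. The document only states which inputs feed the hash.

```javascript
const { createHash } = require("node:crypto");

// Assumption: SHA-256 over a JSON-encoded [text, voiceId, language] tuple.
function cacheKey(text, voiceId, language) {
  return createHash("sha256")
    .update(JSON.stringify([text, voiceId, language]))
    .digest("hex");
}

// Assumed manifest shape: { "<segment file>": "<hash>" }.
function needsRender(segment, cache) {
  const key = cacheKey(segment.text, segment.voiceId, segment.language);
  return cache[segment.file] !== key;
}

const seg = { file: "001-intro.wav", text: "Hello.", voiceId: "oliver", language: "en" };
const cache = { "001-intro.wav": cacheKey("Hello.", "oliver", "en") };
console.log(needsRender(seg, cache));                          // false: unchanged
console.log(needsRender({ ...seg, text: "Hello!" }, cache));   // true: text edited
```

Encoding the fields as a JSON array avoids collisions that plain string concatenation could produce when one field's suffix matches another's prefix.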
---
## Multilingual dubbing
Voice.ai supports 11 languages — dub your YouTube videos for global audiences:
`en`, `es`, `fr`, `de`, `it`, `pt`, `pl`, `ru`, `nl`, `sv`, `ca`
```bash
node voiceai-vo.cjs build \
--input script-spanish.md \
--voice ellie \
--title "Mi Video" \
--language es \
--video ./my-video.mp4 \
--mux
```
The pipeline auto-selects the multilingual TTS model for non-English languages.
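The model selection described above amounts to a one-line rule. The language codes and model IDs in this sketch come from `references/VOICEAI_API.md`. The `selectModel` helper and its rejection of unlisted codes are assumptions.

```javascript
// Language codes listed for the multilingual model in references/VOICEAI_API.md.
const SUPPORTED = ["en", "es", "fr", "de", "it", "pt", "pl", "ru", "nl", "sv", "ca"];

// Hypothetical helper: English uses the English model, other listed codes
// use the multilingual model.
function selectModel(language = "en") {
  const code = language.toLowerCase();
  if (!SUPPORTED.includes(code)) {
    throw new Error(`Unsupported language: ${language}`);
  }
  return code === "en"
    ? "voiceai-tts-v1-latest"
    : "voiceai-tts-multilingual-v1-latest";
}

console.log(selectModel("es"));
```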
---
## Troubleshooting
| Issue | Solution |
|---|---|
| **ffmpeg missing** | Pipeline still works — you get segments, review page, chapters, captions. Install ffmpeg for stitching and video dubbing. |
| **Rate limits (429)** | Segments render sequentially, which usually keeps requests under the rate limit. If a 429 still occurs, wait a few minutes and retry. |
| **Insufficient credits (402)** | Top up at [voice.ai/dashboard](https://voice.ai/dashboard). Cached segments won't re-use credits on retry. |
| **Long scripts** | Caching makes rebuilds fast. Text over 490 chars per segment is automatically split across API calls. |
| **Windows paths** | Wrap paths with spaces in quotes: `--input "C:\My Scripts\script.md"` |
See [`references/TROUBLESHOOTING.md`](references/TROUBLESHOOTING.md) for more.
---
## References
- [Agent Skills Specification](https://agentskills.io/specification)
- [Voice.ai](https://voice.ai)
- [`references/VOICEAI_API.md`](references/VOICEAI_API.md) — API endpoints, audio formats, models
- [`references/TROUBLESHOOTING.md`](references/TROUBLESHOOTING.md) — Common issues and fixes
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/TROUBLESHOOTING.md
```markdown
# Troubleshooting
Common issues and solutions for the voiceai-creator-voiceover-pipeline.
---
## FFmpeg not found
**Symptom:** Warning about skipping master stitch or video muxing.
**Solution:** Install ffmpeg for your platform:
```bash
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt update && sudo apt install ffmpeg
# Windows (Chocolatey)
choco install ffmpeg
# Windows (winget)
winget install ffmpeg
# Windows (manual)
# Download from https://ffmpeg.org/download.html
# Add the bin/ folder to your PATH
```
After installing, verify:
```bash
ffmpeg -version
ffprobe -version
```
> **Note:** The pipeline still works without ffmpeg — you get individual segments, the review page, chapters, and captions. Only master stitching and video muxing require ffmpeg.
---
## Rate limits / API errors
**Symptom:** `Voice.ai TTS error 429` or similar.
**Solutions:**
- Wait a few minutes and retry
- Use `--mock` mode for pipeline testing
- For large scripts, the pipeline renders segments sequentially — this naturally stays under most rate limits
- If you hit limits frequently, consider adding retry logic in `src/api.ts` (a TODO for production use)
---
## Long scripts
**Symptom:** Script has many segments and takes a long time.
**Solutions:**
- Segments are cached by content hash — subsequent builds only re-render changed segments
- Use `--max-chars` to control segment size (larger = fewer API calls, but longer per-segment)
- Split very long scripts into separate build runs
- Use `--mock` mode for testing the pipeline before using real API credits
---
## Windows path quoting
**Symptom:** Paths with spaces cause errors.
**Solutions:**
```powershell
# PowerShell: wrap paths that contain spaces in quotes
node voiceai-vo.cjs build --input "C:\Users\Me\My Scripts\script.md" --voice oliver --mock
# Or use short paths / avoid spaces
node voiceai-vo.cjs build --input C:\scripts\video.md --voice oliver --mock
```
---
## "Real API not yet configured" error
**Symptom:** Error when running without `--mock` and without setting up real API endpoints.
**Explanation:** The current release uses placeholder API endpoints. The full pipeline works in `--mock` mode.
**Solutions:**
- Always use `--mock` flag until real API endpoints are configured
- See `references/VOICEAI_API.md` for what endpoints to fill in
- Contact the Voice.ai team for production API access
---
## Audio playback issues in review.html
**Symptom:** Audio players don't work in review.html.
**Solutions:**
- Open `review.html` directly in your browser (file:// protocol works)
- Make sure the `segments/` folder is next to `review.html`
- Some browsers block file:// audio — try a local server:
```bash
cd out/my-project/
python3 -m http.server 8000
# Then open http://localhost:8000/review.html
```
---
## Segment caching
**How it works:**
- Each segment is hashed based on: text content + voice ID + language
- Hashes are stored in `segments/.cache.json`
- On rebuild, unchanged segments are skipped
- Use `--force` to regenerate everything
**To clear cache manually:**
```bash
rm out/my-project/segments/.cache.json
```
```
### references/VOICEAI_API.md
```markdown
# Voice.ai API Reference
Production API endpoints used by this pipeline.
Based on the [Voice.ai TTS SDK](https://github.com/gizmoGremlin/openclaw-skill-voice-ai-voices).
## Authentication
Bearer token in the `Authorization` header:
```
Authorization: Bearer <VOICE_AI_API_KEY>
```
Get your key at [voice.ai/dashboard](https://voice.ai/dashboard).
## Base URL
```
https://dev.voice.ai
```
Override via `VOICEAI_API_BASE` environment variable.
API path prefix: `/api/v1`
---
## Endpoints
### `GET /api/v1/tts/voices` — List available voices
**Query Parameters:**
| Param | Type | Default | Description |
|--------------|---------|---------|----------------------------|
| `limit` | integer | 10 | Max voices to return |
| `offset` | integer | 0 | Pagination offset |
| `visibility` | string | — | `PUBLIC` or `PRIVATE` |
**Response:**
```json
{
"voices": [
{
"voice_id": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
"name": "Ellie",
"language": "en",
"visibility": "PUBLIC",
"status": "AVAILABLE"
}
]
}
```
### `POST /api/v1/tts/speech` — Generate speech
**Request Body:**
```json
{
"text": "Hello, world!",
"voice_id": "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
"audio_format": "wav",
"temperature": 1.0,
"top_p": 0.8,
"model": "voiceai-tts-v1-latest",
"language": "en"
}
```
| Field | Type | Required | Default | Description |
|----------------|--------|----------|-----------------------------|-------------------------------------|
| `text` | string | ✅ | — | Text to synthesize (max 5000 chars) |
| `voice_id` | string | No | built-in default | Voice UUID or omit for default |
| `audio_format` | string | No | `mp3` | `mp3`, `wav`, `pcm`, etc. |
| `temperature` | number | No | 1.0 | Variation (0.0–2.0) |
| `top_p` | number | No | 0.8 | Nucleus sampling (0.0–1.0) |
| `model` | string | No | auto-selected by language | `voiceai-tts-v1-latest` or `voiceai-tts-multilingual-v1-latest` |
| `language` | string | No | `en` | ISO 639-1 code |
**Response:** Binary audio data in the requested format.
### `POST /api/v1/tts/speech/stream` — Streaming speech
Same body as `/tts/speech`. Returns chunked transfer-encoded audio for low-latency playback.
---
## Audio Formats
| Format | Description |
|-------------------|----------------------------|
| `mp3` | MP3 at 32kHz (default) |
| `wav` | WAV at 32kHz |
| `pcm` | Raw PCM 16-bit |
| `mp3_44100_128` | MP3 44.1kHz 128kbps |
| `mp3_44100_192` | MP3 44.1kHz 192kbps |
| `wav_22050` | WAV 22.05kHz |
| `wav_24000` | WAV 24kHz |
| `opus_48000_128` | Opus 48kHz 128kbps |
## Models
| Model ID | Languages |
|---------------------------------------------|-----------|
| `voiceai-tts-v1-latest` | English |
| `voiceai-tts-multilingual-v1-latest` | 11 langs |
Multilingual model supports: en, es, fr, de, it, pt, pl, ru, nl, sv, ca.
## Popular Voices
| Alias | Voice ID (UUID) | Gender | Style |
|----------|----------------------------------------------|--------|--------------------------|
| `ellie` | `d1bf0f33-8e0e-4fbf-acf8-45c3c6262513` | F | Youthful, vibrant vlogger|
| `oliver` | `f9e6a5eb-a7fd-4525-9e92-75125249c933` | M | Friendly British |
| `lilith` | `4388040c-8812-42f4-a264-f457a6b2b5b9` | F | Soft, feminine |
| `smooth` | `dbb271df-db25-4225-abb0-5200ba1426bc` | M | Deep, smooth narrator |
| `corpse` | `72d2a864-b236-402e-a166-a838ccc2c273` | M | Deep, distinctive |
| `skadi` | `559d3b72-3e79-4f11-9b62-9ec702a6c057` | F | Anime character |
| `zhongli`| `ed751d4d-e633-4bb0-8f5e-b5c8ddb04402` | M | Deep, authoritative |
| `flora` | `a931a6af-fb01-42f0-a8c0-bd14bc302bb1` | F | Cheerful, high pitch |
| `chief` | `bd35e4e6-6283-46b9-86b6-7cfa3dd409b9` | M | Heroic, commanding |
The CLI accepts both aliases (`--voice ellie`) and full UUIDs (`--voice d1bf0f33-...`).
---
## Error Codes
| Code | Meaning | Action |
|------|--------------------|----------------------------------|
| 401 | Invalid API key | Check `VOICE_AI_API_KEY` |
| 402 | Out of credits | Top up at voice.ai/dashboard |
| 422 | Validation error | Check text length, voice_id |
| 429 | Rate limited | Wait and retry |
---
## Mock Mode
When `--mock` is passed, the pipeline runs end-to-end without any network calls:
- Voice listing returns the 9 popular voices above
- TTS returns generated WAV files with an audible test tone
- No API key required
- All output files (review.html, chapters, captions, etc.) are produced identically
```
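The `POST /api/v1/tts/speech` request documented in the reference above can be sketched as a small builder that assembles the URL, headers, and JSON body without sending anything. Field names, the bearer header, and the base URL follow that reference. The `buildSpeechRequest` helper and the `Content-Type` header are assumptions.

```javascript
// Builds, but does not send, a request for POST /api/v1/tts/speech.
function buildSpeechRequest({ text, voiceId, language = "en", apiKey,
                              baseUrl = "https://dev.voice.ai" }) {
  if (!text || text.length > 5000) {
    throw new Error("text must be 1-5000 characters"); // documented max length
  }
  return {
    url: `${baseUrl}/api/v1/tts/speech`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json", // assumed, since the body is JSON
      },
      body: JSON.stringify({
        text,
        voice_id: voiceId,
        audio_format: "wav",
        language,
      }),
    },
  };
}

const req = buildSpeechRequest({
  text: "Hello, world!",
  voiceId: "d1bf0f33-8e0e-4fbf-acf8-45c3c6262513",
  apiKey: "your-key-here",
});
console.log(req.url);
```

On Node.js 20+ the result can be passed to the built-in `fetch(req.url, req.init)`. The response body is binary audio, so read it with `arrayBuffer()` rather than `json()`.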
### templates/youtube_intro.txt
```text
Hey there! Welcome to the video. Before we jump in, make sure you hit that subscribe button and turn on notifications — you won't want to miss what's coming up. Alright, let's get into it.
```
### templates/youtube_outro.txt
```text
And that's a wrap! If you found this valuable, drop a like and leave a comment letting me know your biggest takeaway. I read every single one. Don't forget to subscribe, and I'll see you in the next video. Peace!
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### README.md
```markdown
# Dub YouTube with Voice.ai
> Dub YouTube videos with AI voiceovers — chapters, captions, and audio replacement in one command.
**📖 Skill documentation: [SKILL.md](SKILL.md)**
## Quick start
Requires Node.js 20+. No install needed.
```bash
# Set your API key
export VOICE_AI_API_KEY=your-key-here
# Dub a YouTube video with AI voiceover
node voiceai-vo.cjs build \
--input my-script.md \
--voice oliver \
--title "My YouTube Video" \
--video ./my-recording.mp4 \
--mux \
--template youtube
# Or just build the voiceover (no video)
node voiceai-vo.cjs build --input examples/youtube_script.md --voice ellie --title "My Video"
# List available voices
node voiceai-vo.cjs voices
# Test without an API key
node voiceai-vo.cjs build --input examples/youtube_script.md --voice ellie --title "My Video" --mock
```
Get your API key at [voice.ai/dashboard](https://voice.ai/dashboard).
## What it produces
```
out/<title>/
muxed.mp4 # Dubbed video (if --video --mux)
segments/ # Numbered WAV files
master.wav # Stitched voiceover
master.mp3 # MP3 for upload
chapters.txt # Paste into YouTube description
captions.srt # Upload as YouTube subtitles
description.txt # Ready-made YouTube description
review.html # Interactive review page
```
## Learn more
- **[SKILL.md](SKILL.md)** — Full skill documentation: commands, voices, outputs, configuration
- **[references/VOICEAI_API.md](references/VOICEAI_API.md)** — Voice.ai API endpoints and formats
- **[references/TROUBLESHOOTING.md](references/TROUBLESHOOTING.md)** — Common issues and fixes
---
*Powered by [Voice.ai](https://voice.ai) · Follows the [Agent Skills specification](https://agentskills.io/specification)*
```
### _meta.json
```json
{
"owner": "gizmogremlin",
"slug": "dub-youtube-with-voiceai",
"displayName": "Dub YouTube with Voice.ai",
"latest": {
"version": "0.1.6",
"publishedAt": 1770772190925,
"commit": "https://github.com/openclaw/skills/commit/0d7f8fa3f4ca04471028f118a53762278ea276a8"
},
"history": []
}
```