
transcribe-audio

Transcribes video audio using WhisperX with accurate timestamp preservation. Creates JSON transcripts with word-level timing. Designed for parallel execution in video analysis pipelines where timestamp alignment is critical.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 185.

Hot score: 97.

Updated: March 20, 2026.

Overall rating: A (8.0).

Composite score: 5.7.

Best-practice grade: S (96.0).

Install command

npx @skill-hub/cli install barefootford-buttercut-transcribe-audio

Tags: audio-transcription, whisperx, video-processing, timestamps, parallel-processing.

Repository

barefootford/buttercut

Skill path: .claude/skills/transcribe-audio

Best for

Primary workflow: Design Product.

Technical facets: Data / AI, Designer.

Target audience: Developers and researchers working with video analysis pipelines who need accurate audio transcripts with precise timing for synchronization with visual data.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: barefootford.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install transcribe-audio into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/barefootford/buttercut before adding transcribe-audio to shared team environments
  • Use transcribe-audio for AI/ML workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode.

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: transcribe-audio
description: Transcribes video audio using WhisperX, preserving original timestamps. Creates JSON transcript with word-level timing. Use when you need to generate audio transcripts for videos.
---

# Skill: Transcribe Audio

Transcribes video audio using WhisperX and creates clean JSON transcripts with word-level timing data.

## When to Use
- Videos need audio transcripts before visual analysis

## Critical Requirements

Use WhisperX, NOT standard Whisper. WhisperX preserves the original video timeline, including leading silence, so transcripts match actual video timestamps. Run WhisperX directly on the video file; don't extract the audio first. Transcribing the original file is what keeps the timestamps aligned with the video.

## Workflow

### 1. Read Language from Library File

Read the library's `library.yaml` to get the language code:

```yaml
# Library metadata
library_name: [library-name]
language: en  # Language code stored here
...
```
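
As a minimal sketch of this lookup, the Ruby below parses YAML text and returns the `language` value. The helper name `library_language` is hypothetical and not part of the skill; the only fields assumed are the two shown above.

```ruby
require "yaml"

# Hypothetical helper (not part of the skill): pull the language code
# out of the text of a library.yaml file.
def library_language(yaml_text)
  data = YAML.safe_load(yaml_text) || {}
  language = data["language"]
  # Requiring a String also catches unquoted codes such as `no`,
  # which the YAML parser reads as a boolean rather than text.
  unless language.is_a?(String) && !language.strip.empty?
    raise ArgumentError, "library.yaml has no usable language field"
  end
  language
end

sample = <<~YAML
  library_name: demo-library
  language: en  # Language code stored here
YAML

puts library_language(sample)
```

In practice the argument would be the contents of the library's `library.yaml`, read with `File.read`.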

### 2. Run WhisperX

```bash
whisperx "/full/path/to/video.mov" \
  --language en \
  --model medium \
  --compute_type float32 \
  --device cpu \
  --output_format json \
  --output_dir libraries/[library-name]/transcripts
```
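
If the command is launched from a script rather than typed by hand, building it as an argument array avoids shell quoting problems with paths that contain spaces. The sketch below uses only the flags shown above; the helper name `whisperx_command` and the `demo-library` name are illustrative assumptions.

```ruby
# Hypothetical helper (not part of the skill): build the WhisperX
# argument list for one video, using the flags from the command above.
def whisperx_command(video_path, language, library_name)
  [
    "whisperx", video_path,
    "--language", language,
    "--model", "medium",
    "--compute_type", "float32",
    "--device", "cpu",
    "--output_format", "json",
    "--output_dir", File.join("libraries", library_name, "transcripts")
  ]
end

command = whisperx_command("/full/path/to/video.mov", "en", "demo-library")
puts command.join(" ")

# To actually run it (requires WhisperX on the PATH):
#   system(*command, exception: true)
```

Passing the language as a parameter lets the value read in step 1 flow into the command instead of being hard-coded.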

### 3. Prepare Audio Transcript

After WhisperX completes, format the JSON with the skill's `prepare_audio_script.rb`:

```bash
ruby .claude/skills/transcribe-audio/prepare_audio_script.rb \
  libraries/[library-name]/transcripts/video_name.json \
  /full/path/to/original/video_name.mov
```

This script:
- Adds video source path as metadata
- Removes unnecessary fields to reduce file size
- Prettifies JSON
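
The script itself is not reproduced here. To illustrate the three steps, the following is a rough sketch of what such a preparation pass could look like, not the repository's implementation. It assumes WhisperX JSON with a `segments` list (each segment holding `words`) and a top-level `word_segments` list. The choice of which fields to drop and the `video_path` key name are assumptions made for the example.

```ruby
require "json"

# Illustrative only: NOT the repository's prepare_audio_script.rb.
# Assumed input shape: {"segments": [{"start", "end", "text", "words": [...]}],
#                       "word_segments": [...]}
def prepare_transcript(raw_json, video_path)
  data = JSON.parse(raw_json)

  segments = (data["segments"] || []).map do |segment|
    # Keep only the timing fields for each word (drops e.g. "score").
    words = (segment["words"] || []).map { |word| word.slice("word", "start", "end") }
    segment.slice("start", "end", "text").merge("words" => words)
  end

  # "video_path" is a hypothetical metadata key; "word_segments" is omitted.
  prepared = { "video_path" => video_path, "segments" => segments }
  JSON.pretty_generate(prepared)
end

raw = JSON.generate(
  {
    "segments" => [
      { "start" => 0.5, "end" => 1.2, "text" => " Hello world",
        "words" => [
          { "word" => "Hello", "start" => 0.5, "end" => 0.8, "score" => 0.93 },
          { "word" => "world", "start" => 0.9, "end" => 1.2, "score" => 0.88 }
        ] }
    ],
    "word_segments" => [{ "word" => "Hello", "start" => 0.5, "end" => 0.8 }]
  }
)

puts prepare_transcript(raw, "/full/path/to/original/video_name.mov")
```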

### 4. Return Success Response

After audio preparation completes, return this structured response to the parent agent:

```
✓ [video_filename.mov] transcribed successfully
  Audio transcript: libraries/[library-name]/transcripts/video_name.json
  Video path: /full/path/to/video_filename.mov
```

**DO NOT update library.yaml** - the parent agent will handle this to avoid race conditions when running multiple transcriptions in parallel.
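
A small sketch of assembling that response from the video path follows. It assumes the transcript file takes the video's base name with a `.json` extension, as in the paths above, and treats the square brackets in the template as placeholder markers rather than literal output. The helper name `success_response` is hypothetical.

```ruby
# Hypothetical helper (not part of the skill): format the success message
# that is handed back to the parent agent.
def success_response(video_path, library_name)
  filename = File.basename(video_path)
  transcript = File.join("libraries", library_name, "transcripts",
                         "#{File.basename(video_path, '.*')}.json")
  <<~TEXT
    ✓ #{filename} transcribed successfully
      Audio transcript: #{transcript}
      Video path: #{video_path}
  TEXT
end

puts success_response("/full/path/to/video_filename.mov", "demo-library")
```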

## Running in Parallel

This skill is designed to run inside a Task agent for parallel execution:
- Each agent handles ONE video file
- Multiple agents can run simultaneously
- Parent thread updates library.yaml sequentially after each agent completes
- No race conditions on shared YAML file
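
The pattern in the list above can be sketched with plain Ruby threads standing in for Task agents. Each worker returns a result and never touches shared state, and only the parent applies updates, one at a time. The `videos` key, the entry shape, and every name in this example are assumptions for illustration; the real layout of `library.yaml` is not shown in this document.

```ruby
require "yaml"

# Hypothetical result type handed back by each worker.
TranscriptionResult = Struct.new(:video_path, :transcript_path)

# Stand-in for one Task agent: handles exactly one video and
# never reads or writes library.yaml.
def transcribe_one(video_path, library_name)
  name = File.basename(video_path, ".*")
  transcript = File.join("libraries", library_name, "transcripts", "#{name}.json")
  TranscriptionResult.new(video_path, transcript)
end

# Parent-side update, applied to one result at a time.
# The "videos" list and its entry keys are assumed, not taken from the skill.
def record_transcript(library, result)
  library["videos"] ||= []
  library["videos"] << { "path" => result.video_path, "transcript" => result.transcript_path }
  library
end

library = { "library_name" => "demo-library", "language" => "en" }
videos = ["/videos/a.mov", "/videos/b.mov", "/videos/c.mov"]

# Workers run concurrently and only return values.
workers = videos.map { |path| Thread.new { transcribe_one(path, library["library_name"]) } }

# Only the parent mutates the library, sequentially, as each worker is joined.
workers.each { |worker| record_transcript(library, worker.value) }

puts library.to_yaml
```

Because a single writer serializes every change, no locking is needed around the shared file.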

## Next Step

After audio transcription, use the **analyze-video** skill to add visual descriptions and create the visual transcript.

## Installation

Ensure WhisperX is installed. Use the **setup** skill to verify dependencies.