
clip-hand-skill

Expert knowledge for AI video clipping — yt-dlp downloading, whisper transcription, SRT generation, and ffmpeg processing

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 14,931.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: B (80.4).

Install command

npx @skill-hub/cli install rightnow-ai-openfang-clip

Repository

RightNow-AI/openfang

Skill path: crates/openfang-hands/bundled/clip


Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: RightNow-AI.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install clip-hand-skill into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/RightNow-AI/openfang before adding clip-hand-skill to shared team environments
  • Use clip-hand-skill for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: clip-hand-skill
version: "2.0.0"
description: "Expert knowledge for AI video clipping — yt-dlp downloading, whisper transcription, SRT generation, and ffmpeg processing"
runtime: prompt_only
---

# Video Clipping Expert Knowledge

## Cross-Platform Notes

All tools (ffmpeg, ffprobe, yt-dlp, whisper) use **identical CLI flags** on Windows, macOS, and Linux. The differences are only in shell syntax:

| Feature | macOS / Linux | Windows (cmd.exe) |
|---------|---------------|-------------------|
| Suppress stderr | `2>/dev/null` | `2>NUL` |
| Filter output | `\| grep pattern` | `\| findstr pattern` |
| Delete files | `rm file1 file2` | `del file1 file2` |
| Null output device | `-f null -` | `-f null -` (same) |
| ffmpeg subtitle paths | `subtitles=clip.srt` | `subtitles=clip.srt` (relative OK, absolute needs `C\\:/path`) |

IMPORTANT: ffmpeg filter paths (`-vf "subtitles=..."`) always need forward slashes. On Windows with absolute paths, escape the colon: `subtitles=C\\:/Users/me/clip.srt`

Prefer using `file_write` tool for creating SRT/text files instead of shell echo/heredoc.

---

## yt-dlp Reference

### Download with Format Selection
```
# Best video up to 1080p + best audio, merged
yt-dlp -f "bv[height<=1080]+ba/b[height<=1080]" --restrict-filenames -o "source.%(ext)s" "URL"

# 720p max (smaller, faster)
yt-dlp -f "bv[height<=720]+ba/b[height<=720]" --restrict-filenames -o "source.%(ext)s" "URL"

# Audio only (for transcription-only workflows)
yt-dlp -x --audio-format wav --restrict-filenames -o "audio.%(ext)s" "URL"
```

### Metadata Inspection
```
# Get full metadata as JSON (duration, title, chapters, available subs)
yt-dlp --dump-json "URL"

# Key fields: duration, title, description, chapters, subtitles, automatic_captions
```
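
As a rough sketch of how the key fields listed above might be pulled out of the `--dump-json` output, the following Python helper takes the JSON text and returns a small summary. The function name `summarize_metadata` and the shape of the returned dict are illustrative assumptions, not part of yt-dlp.

```python
import json


def summarize_metadata(dump_json_text: str) -> dict:
    """Pick a few clipping-relevant fields from `yt-dlp --dump-json` output.

    Hypothetical helper: the summary layout is an assumption for illustration.
    """
    info = json.loads(dump_json_text)
    chapters = info.get("chapters") or []  # the field may be absent or null
    return {
        "title": info.get("title"),
        "duration": info.get("duration"),  # seconds
        "chapter_titles": [c.get("title") for c in chapters],
        "manual_sub_langs": sorted((info.get("subtitles") or {}).keys()),
        "auto_sub_langs": sorted((info.get("automatic_captions") or {}).keys()),
    }
```

If `manual_sub_langs` contains the wanted language, the manual subtitle download is usually the better starting point than auto-captions.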

### YouTube Auto-Subtitles
```
# Download auto-generated subtitles in json3 format (word-level timing)
yt-dlp --write-auto-subs --sub-lang en --sub-format json3 --skip-download --restrict-filenames -o "source" "URL"

# Download manual subtitles if available
yt-dlp --write-subs --sub-lang en --sub-format srt --skip-download --restrict-filenames -o "source" "URL"

# List available subtitle languages
yt-dlp --list-subs "URL"
```

### Useful Flags
- `--restrict-filenames` — safe ASCII filenames (no spaces/special chars) — important on all platforms
- `--no-playlist` — download single video even if URL is in a playlist
- `-o "template.%(ext)s"` — output template (%(ext)s auto-detects format)
- `--cookies-from-browser chrome` — use browser cookies for age-restricted content
- `--extract-audio` / `-x` — extract audio only
- `--audio-format wav` — convert audio to wav (for whisper)

---

## Whisper Transcription Reference

### Audio Extraction for Whisper
```
# Extract mono 16kHz WAV (whisper's preferred input format)
ffmpeg -i source.mp4 -vn -ar 16000 -ac 1 -y audio.wav
```

### Basic Transcription
```
# Standard transcription with word-level timestamps
whisper audio.wav --model small --output_format json --word_timestamps true --language en

# Faster alternative (compatible flags, up to 4x faster)
whisper-ctranslate2 audio.wav --model small --output_format json --word_timestamps true --language en
```

### Model Sizes
| Model | VRAM | Speed | Quality | Use When |
|-------|------|-------|---------|----------|
| tiny | ~1GB | Fastest | Rough | Quick previews, testing pipeline |
| base | ~1GB | Fast | OK | Short clips, clear speech |
| small | ~2GB | Good | Good | **Default — best balance** |
| medium | ~5GB | Slow | Better | Important content, accented speech |
| large-v3 | ~10GB | Slowest | Best | Final production, multiple languages |

Note: On macOS Apple Silicon, consider `mlx-whisper` as a faster native alternative.

### JSON Output Structure
```json
{
  "text": "full transcript text...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 4.52,
      "text": " Hello everyone, welcome back.",
      "words": [
        {"word": " Hello", "start": 0.0, "end": 0.32, "probability": 0.95},
        {"word": " everyone,", "start": 0.32, "end": 0.78, "probability": 0.91},
        {"word": " welcome", "start": 0.78, "end": 1.14, "probability": 0.98},
        {"word": " back.", "start": 1.14, "end": 1.52, "probability": 0.97}
      ]
    }
  ]
}
```
- `segments[].words[]` gives word-level timing when `--word_timestamps true`
- `probability` indicates confidence (< 0.5 = likely wrong)
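
A minimal sketch of how the `probability` field could be used to flag doubtful words before building captions. It assumes the transcript has already been loaded into a dict with the structure shown above; `low_confidence_words` is a hypothetical name.

```python
def low_confidence_words(transcript: dict, threshold: float = 0.5) -> list:
    """Return (word, start, end) tuples whose probability is below threshold.

    Hypothetical helper; expects the whisper JSON structure shown above.
    """
    flagged = []
    for segment in transcript.get("segments", []):
        # "words" is only present when word timestamps were requested
        for word in segment.get("words", []):
            if word.get("probability", 1.0) < threshold:
                flagged.append((word["word"].strip(), word["start"], word["end"]))
    return flagged
```

Flagged words are candidates for manual review or for re-running the segment with a larger model.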

---

## YouTube json3 Subtitle Parsing

### Format Structure
```json
{
  "events": [
    {
      "tStartMs": 1230,
      "dDurationMs": 5000,
      "segs": [
        {"utf8": "hello ", "tOffsetMs": 0},
        {"utf8": "world ", "tOffsetMs": 200},
        {"utf8": "how ", "tOffsetMs": 450},
        {"utf8": "are you", "tOffsetMs": 700}
      ]
    }
  ]
}
```

### Extracting Word Timing
For each event and each segment within it:
- `word_start_ms = event.tStartMs + seg.tOffsetMs`
- `word_start_secs = word_start_ms / 1000.0`
- `word_text = seg.utf8.trim()`

Events without `segs` are line breaks or formatting — skip them.
Events with `segs` containing only `"\n"` are newlines — skip them.
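
The extraction rules above can be sketched in Python as follows. The function name `json3_words` and the output dict keys are assumptions for illustration; a missing `tOffsetMs` is treated as 0, which is also an assumption.

```python
def json3_words(data: dict) -> list:
    """Flatten json3 events into a list of {"text", "start"} word entries."""
    words = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs:
            continue  # line break or formatting event
        base_ms = event.get("tStartMs", 0)
        for seg in segs:
            text = seg.get("utf8", "").strip()
            if not text:
                continue  # newline-only segment
            start_ms = base_ms + seg.get("tOffsetMs", 0)
            words.append({"text": text, "start": start_ms / 1000.0})
    return words
```

Note that json3 gives only start times per word; an end time can be approximated from the next word's start or from the event's `dDurationMs`.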

---

## SRT Generation from Transcript

### SRT Format
```
1
00:00:00,000 --> 00:00:02,500
First line of caption text

2
00:00:02,500 --> 00:00:05,100
Second line of caption text
```

### Rules for Building Good SRT
- Group words into subtitle entries of ~8-12 words (2-3 seconds per entry)
- Break at natural pause points (periods, commas, clause boundaries)
- Keep each displayed line under 42 characters for readability on mobile; split longer entries across two lines
- Adjust timestamps relative to clip start (subtract clip start time from all timestamps)
- Timestamp format: `HH:MM:SS,mmm` (comma separator, not dot)
- Each entry: index line, timestamp line, text line(s), blank line
- Use `file_write` tool to create the SRT file — works identically on all platforms
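
A minimal sketch of these rules in Python, assuming word entries shaped like whisper's `words[]` (`word`, `start`, `end` in source-video seconds). The names `srt_timestamp` and `build_srt` are hypothetical, and the grouping is deliberately simple: it breaks only at sentence-ending punctuation or a word limit, and does not wrap lines at 42 characters.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (comma separator)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(words: list, clip_start: float, max_words: int = 10) -> str:
    """Group words into entries and shift timestamps relative to clip_start."""
    groups, current = [], []
    for w in words:
        current.append(w)
        ends_sentence = w["word"].strip().endswith((".", "?", "!"))
        if len(current) >= max_words or ends_sentence:
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    entries = []
    for index, group in enumerate(groups, start=1):
        start = srt_timestamp(group[0]["start"] - clip_start)
        end = srt_timestamp(group[-1]["end"] - clip_start)
        text = " ".join(w["word"].strip() for w in group)
        entries.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(entries)  # blank line between entries
```

The returned string can be passed directly to `file_write` as the content of the `.srt` file.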

### Styled Captions with ASS Format
For animated/styled captions, use ASS subtitle format instead of SRT:
```
ffmpeg -i clip.mp4 -vf "subtitles=clip.ass:force_style='FontSize=22,FontName=Arial,Bold=1,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2,Shadow=1,Alignment=2,MarginV=40'" -c:a copy output.mp4
```

Key ASS style properties:
- `PrimaryColour=&H00FFFFFF` — white text (AABBGGRR format)
- `OutlineColour=&H00000000` — black outline
- `Outline=2` — outline thickness
- `Alignment=2` — bottom center
- `MarginV=40` — margin from bottom edge
- `FontSize=22` — good size for 1080x1920 vertical
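
Because the AABBGGRR byte order is easy to get backwards, a small conversion helper may be useful. This is a sketch; `ass_colour` is a hypothetical name, and it assumes the usual ASS convention where alpha `00` is fully opaque.

```python
def ass_colour(rgb_hex: str, alpha: int = 0) -> str:
    """Convert 'RRGGBB' or '#RRGGBB' to an ASS '&HAABBGGRR' colour string.

    alpha: 0 = opaque, 255 = fully transparent (assumed ASS convention).
    """
    rgb_hex = rgb_hex.lstrip("#")
    if len(rgb_hex) != 6:
        raise ValueError("expected a 6-digit RRGGBB hex colour")
    int(rgb_hex, 16)  # raises ValueError on non-hex input
    if not 0 <= alpha <= 255:
        raise ValueError("alpha must be in 0..255")
    rr, gg, bb = rgb_hex[0:2], rgb_hex[2:4], rgb_hex[4:6]
    return f"&H{alpha:02X}{bb}{gg}{rr}".upper()
```

For example, `ass_colour("FFFFFF")` yields the white `&H00FFFFFF` used above, and a yellow `#FFD400` becomes `&H0000D4FF`.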

---

## FFmpeg Video Processing

### Scene Detection
```
ffmpeg -i input.mp4 -filter:v "select='gt(scene,0.3)',showinfo" -f null - 2>&1
```
- Threshold 0.1 = very sensitive, 0.5 = only major cuts
- Parse `pts_time:` from showinfo output for timestamps
- On macOS/Linux pipe through `grep showinfo`, on Windows pipe through `findstr showinfo`
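
As a platform-independent alternative to `grep`/`findstr`, the captured output can be parsed in Python. This is a sketch that assumes the combined stderr/stdout text is already available as a string; `scene_times` is a hypothetical name.

```python
import re

# Matches the pts_time field that showinfo prints for each selected frame
PTS_TIME = re.compile(r"pts_time:\s*(\d+(?:\.\d+)?)")


def scene_times(ffmpeg_output: str) -> list:
    """Return scene-change timestamps (seconds) found in showinfo lines."""
    times = []
    for line in ffmpeg_output.splitlines():
        if "showinfo" not in line:
            continue  # skip progress and banner lines
        match = PTS_TIME.search(line)
        if match:
            times.append(float(match.group(1)))
    return times
```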

### Silence Detection
```
ffmpeg -i input.mp4 -af "silencedetect=noise=-30dB:d=1.5" -f null - 2>&1
```
- `d=1.5` = minimum 1.5 seconds of silence
- Look for `silence_start` and `silence_end` in output
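
A sketch of pairing `silence_start` and `silence_end` values into intervals, assuming the ffmpeg output has been captured as text. `silence_intervals` is a hypothetical name; a start with no matching end is ignored.

```python
import re

START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def silence_intervals(ffmpeg_output: str) -> list:
    """Return (start, end) tuples in seconds for each detected silence."""
    intervals = []
    pending_start = None
    for line in ffmpeg_output.splitlines():
        start_match = START.search(line)
        if start_match:
            pending_start = float(start_match.group(1))
            continue
        end_match = END.search(line)
        if end_match and pending_start is not None:
            intervals.append((pending_start, float(end_match.group(1))))
            pending_start = None
    return intervals
```

The gaps between consecutive intervals are the spoken stretches, which makes interval boundaries natural candidates for clip cut points.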

### Clip Extraction
```
# Re-encoded (accurate cuts)
ffmpeg -ss 00:01:30 -to 00:02:15 -i input.mp4 -c:v libx264 -c:a aac -preset fast -crf 23 -movflags +faststart -y clip.mp4

# Lossless copy (fast but may have keyframe alignment issues)
ffmpeg -ss 00:01:30 -to 00:02:15 -i input.mp4 -c copy -y clip.mp4
```
- `-ss` before `-i` = fast seek (recommended for extraction)
- `-to` = end timestamp, `-t` = duration

### Vertical Video (9:16 for Shorts/Reels/TikTok)
```
# Center crop (when source is 16:9)
ffmpeg -i input.mp4 -vf "crop=ih*9/16:ih:(iw-ih*9/16)/2:0,scale=1080:1920" -c:a copy output.mp4

# Scale with letterbox padding (preserves full frame)
ffmpeg -i input.mp4 -vf "scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2:black" -c:a copy output.mp4
```

### Caption Burn-in
```
# SRT subtitles with styling (use relative path or forward-slash absolute path)
ffmpeg -i input.mp4 -vf "subtitles=subs.srt:force_style='FontSize=22,FontName=Arial,PrimaryColour=&H00FFFFFF,OutlineColour=&H00000000,Outline=2,Alignment=2,MarginV=40'" -c:a copy output.mp4

# Simple text overlay
ffmpeg -i input.mp4 -vf "drawtext=text='Caption':fontsize=48:fontcolor=white:borderw=3:bordercolor=black:x=(w-text_w)/2:y=h-th-40" output.mp4
```
Windows path escaping: `subtitles=C\\:/Users/me/subs.srt` (double-backslash before colon)
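
The escaping rule can be applied programmatically. The following sketch mirrors the double-backslash convention stated in this document and handles only backslashes and colons; other filter special characters (quotes, commas, brackets) are out of scope. `subtitles_filter_path` is a hypothetical name.

```python
def subtitles_filter_path(path: str) -> str:
    """Return a path for use after 'subtitles=' following the rule above."""
    forward = path.replace("\\", "/")     # filter paths need forward slashes
    return forward.replace(":", "\\\\:")  # colon gets two literal backslashes
```

Relative paths such as `subs.srt` pass through unchanged, while `C:\Users\me\subs.srt` becomes `C\\:/Users/me/subs.srt`.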

### Thumbnail Generation
```
# At specific time (2 seconds in)
ffmpeg -i input.mp4 -ss 2 -frames:v 1 -q:v 2 -y thumb.jpg

# First keyframe (I-frame)
ffmpeg -i input.mp4 -vf "select='eq(pict_type,I)',scale=1280:720" -frames:v 1 thumb.jpg

# Contact sheet (one frame every 10 s, first 16 samples in a 4x4 grid)
ffmpeg -i input.mp4 -vf "fps=1/10,scale=320:-1,tile=4x4" -frames:v 1 contact.jpg
```

### Video Analysis
```
# Full metadata (JSON)
ffprobe -v quiet -print_format json -show_format -show_streams input.mp4

# Duration only
ffprobe -v error -show_entries format=duration -of csv=p=0 input.mp4

# Resolution
ffprobe -v error -select_streams v:0 -show_entries stream=width,height -of csv=p=0 input.mp4
```

## API-Based STT Reference

### Groq Whisper API
Fast cloud STT that runs whisper-large-v3 on Groq hardware. Free tier available.
```
curl -s -X POST "https://api.groq.com/openai/v1/audio/transcriptions" \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=whisper-large-v3" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=word" \
  -o transcript_raw.json
```
Response: `{"text": "...", "words": [{"word": "hello", "start": 0.0, "end": 0.32}]}`
- Max file size: 25MB. For longer audio, split with ffmpeg first.
- `timestamp_granularities[]=word` is required for word-level timing.
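
To decide where to split, the chunk length that fits under the limit can be estimated from the WAV parameters. This sketch assumes uncompressed 16-bit PCM (2 bytes per sample), a 44-byte header, and a conservative reading of 25MB as 25,000,000 bytes; `plan_chunks` is a hypothetical name, and it cuts at fixed offsets without regard to word boundaries.

```python
import math


def plan_chunks(duration_s, sample_rate=16000, channels=1,
                bytes_per_sample=2, limit_bytes=25_000_000, header_bytes=44):
    """Return (start, end) second offsets so each WAV chunk stays under limit."""
    bytes_per_second = sample_rate * channels * bytes_per_sample
    max_chunk_s = (limit_bytes - header_bytes) // bytes_per_second
    count = math.ceil(duration_s / max_chunk_s)
    return [(i * max_chunk_s, min(duration_s, (i + 1) * max_chunk_s))
            for i in range(count)]
```

With the defaults (mono 16kHz), each chunk can hold about 781 seconds. Each pair maps to `-ss start -to end` in an ffmpeg extraction, and word timestamps from later chunks need the chunk's start offset added back.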

### OpenAI Whisper API
```
curl -s -X POST "https://api.openai.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=whisper-1" \
  -F "response_format=verbose_json" \
  -F "timestamp_granularities[]=word" \
  -o transcript_raw.json
```
Response format same as Groq. Max 25MB.

### Deepgram Nova-2
```
curl -s -X POST "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true&utterances=true&punctuate=true" \
  -H "Authorization: Token $DEEPGRAM_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @audio.wav \
  -o transcript_raw.json
```
Response: `{"results": {"channels": [{"alternatives": [{"words": [{"word": "hello", "start": 0.0, "end": 0.32, "confidence": 0.99}]}]}]}}`
- Supports streaming, but for clips use batch mode.
- `smart_format=true` adds punctuation and casing.
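
Since the Groq/OpenAI and Deepgram responses nest their word lists differently, a small adapter can bring them to one shape. This sketch relies only on the response layouts shown above; `normalize_words` and the unified output format are assumptions for illustration.

```python
def normalize_words(response: dict) -> list:
    """Return [{"word", "start", "end", "confidence"}] from either API shape."""
    if "results" in response:
        # Deepgram: results.channels[0].alternatives[0].words
        raw = response["results"]["channels"][0]["alternatives"][0].get("words", [])
    else:
        # Groq / OpenAI verbose_json: top-level "words"
        raw = response.get("words", [])
    return [
        {
            "word": w["word"].strip(),
            "start": w["start"],
            "end": w["end"],
            "confidence": w.get("confidence"),  # None when not provided
        }
        for w in raw
    ]
```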

---

## TTS Reference

### Edge TTS (free, no API key needed)
```
# List available voices
edge-tts --list-voices

# Generate speech
edge-tts --text "Your caption text here" --voice en-US-AriaNeural --write-media tts_output.mp3

# Other good voices: en-US-GuyNeural, en-GB-SoniaNeural, en-AU-NatashaNeural
```
Install: `pip install edge-tts`

### OpenAI TTS
```
curl -s -X POST "https://api.openai.com/v1/audio/speech" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"tts-1","input":"Your text here","voice":"alloy"}' \
  --output tts_output.mp3
```
Voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
Models: `tts-1` (fast), `tts-1-hd` (quality)

### ElevenLabs
```
curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM" \
  -H "xi-api-key: $ELEVENLABS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text":"Your text here","model_id":"eleven_monolingual_v1"}' \
  --output tts_output.mp3
```
Voice ID `21m00Tcm4TlvDq8ikWAM` = Rachel (default). List voices: `GET /v1/voices`

### Audio Merging (TTS + Original)
```
# Mix TTS over original audio (original at 30% volume, TTS at 100%)
ffmpeg -i clip.mp4 -i tts.mp3 \
  -filter_complex "[0:a]volume=0.3[orig];[1:a]volume=1.0[tts];[orig][tts]amix=inputs=2:duration=first[out]" \
  -map 0:v -map "[out]" -c:v copy -c:a aac -y clip_voiced.mp4

# Replace audio entirely (no original audio)
ffmpeg -i clip.mp4 -i tts.mp3 -map 0:v -map 1:a -c:v copy -c:a aac -shortest -y clip_voiced.mp4
```

---

## Quality & Performance Tips

- Use `-preset ultrafast` for quick previews, `-preset slow` for final output
- Use `-crf 23` for good quality (18=high, 28=low, lower=bigger files)
- Add `-movflags +faststart` for web-friendly MP4
- Use `-threads 0` to auto-detect CPU cores
- Always use `-y` to overwrite without asking

---

## Telegram Bot API Reference

### sendVideo — Upload and send a video to a chat/channel
```
curl -s -X POST "https://api.telegram.org/bot<BOT_TOKEN>/sendVideo" \
  -F "chat_id=<CHAT_ID>" \
  -F "video=@clip_N_final.mp4" \
  -F "caption=Clip title here" \
  -F "parse_mode=HTML" \
  -F "supports_streaming=true"
```

### Parameters
| Parameter | Required | Description |
|-----------|----------|-------------|
| `chat_id` | Yes | Channel (`-100XXXXXXXXXX` or `@channelname`), group, or user numeric ID |
| `video` | Yes | `@filepath` for upload (max 50MB) or a Telegram `file_id` for re-send |
| `caption` | No | Text caption, up to 1024 characters |
| `parse_mode` | No | `HTML` or `MarkdownV2` for styled captions |
| `supports_streaming` | No | `true` enables progressive playback |

### Success Response
```json
{"ok": true, "result": {"message_id": 1234, "video": {"file_id": "BAACAgI...", "file_size": 5242880}}}
```

### Error Response
```json
{"ok": false, "error_code": 400, "description": "Bad Request: chat not found"}
```

### Common Errors
| Error Code | Description | Fix |
|------------|-------------|-----|
| 400 | Chat not found | Verify chat_id; bot must be added to the channel/group |
| 401 | Unauthorized | Bot token is invalid or revoked — regenerate via @BotFather |
| 413 | Request entity too large | File exceeds 50MB — re-encode: `ffmpeg -i input.mp4 -fs 49M -c:v libx264 -crf 28 -preset fast -c:a aac -y output.mp4` |
| 429 | Too many requests | Rate limited — wait the `retry_after` seconds from the response |
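
The table above can be turned into a small decision function over the raw response body. This is a sketch: `next_action` and its action labels are hypothetical, and it assumes the 429 wait time is reported under `parameters.retry_after` in the JSON body.

```python
import json


def next_action(response_text: str) -> tuple:
    """Map a sendVideo response body to an (action, detail) pair."""
    try:
        body = json.loads(response_text)
    except ValueError:
        return ("fail", "non-JSON response")
    if body.get("ok"):
        return ("done", body["result"]["message_id"])
    code = body.get("error_code")
    if code == 429:
        # Fall back to 1 second if no retry hint is present (assumption)
        wait = body.get("parameters", {}).get("retry_after", 1)
        return ("wait", wait)
    if code == 413:
        return ("shrink", None)  # re-encode below the size limit, then resend
    return ("fail", body.get("description"))
```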

### File Size Limit
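Telegram allows up to **50MB** for video uploads via Bot API. If a clip exceeds this, re-encode it at a higher CRF. Note that `-fs 49M` is a hard cap: ffmpeg stops writing once the output reaches 49MB, so a clip that is still too large ends early instead of being compressed further: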
```
ffmpeg -i clip_N_final.mp4 -fs 49M -c:v libx264 -crf 28 -preset fast -c:a aac -movflags +faststart -y clip_N_tg.mp4
```
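
One alternative to a hard size cap is to derive a video bitrate from the clip duration so that the whole clip fits. The sketch below is an assumption-laden estimate, not part of the original workflow: `target_video_kbps` is a hypothetical name, MB is read as 1,000,000 bytes, and 3% is reserved for container overhead.

```python
def target_video_kbps(duration_s: float, limit_mb: float = 49,
                      audio_kbps: int = 128, overhead: float = 0.03) -> int:
    """Estimate a video bitrate (kbit/s) that keeps the file under limit_mb."""
    usable_kbits = limit_mb * 8 * 1000 * (1 - overhead)
    video_kbps = usable_kbits / duration_s - audio_kbps
    if video_kbps <= 0:
        raise ValueError("clip is too long to fit at this audio bitrate")
    return int(video_kbps)
```

The result can be passed to ffmpeg as `-b:v <value>k` together with `-b:a 128k` in place of `-crf`; for a 120-second clip and the 49MB default this gives roughly 3040 kbit/s.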

---

## WhatsApp Business Cloud API Reference

### Two-Step Flow: Upload Media → Send Message

WhatsApp Cloud API requires uploading the video first to get a `media_id`, then sending a message referencing that ID.

### Step 1 — Upload Media
```
curl -s -X POST "https://graph.facebook.com/v21.0/<PHONE_NUMBER_ID>/media" \
  -H "Authorization: Bearer <ACCESS_TOKEN>" \
  -F "file=@clip_N_final.mp4" \
  -F "type=video/mp4" \
  -F "messaging_product=whatsapp"
```

Success response:
```json
{"id": "1234567890"}
```

### Step 2 — Send Video Message
```
curl -s -X POST "https://graph.facebook.com/v21.0/<PHONE_NUMBER_ID>/messages" \
  -H "Authorization: Bearer <ACCESS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "messaging_product": "whatsapp",
    "to": "<RECIPIENT_PHONE>",
    "type": "video",
    "video": {
      "id": "<MEDIA_ID>",
      "caption": "Clip title here"
    }
  }'
```

Success response:
```json
{"messaging_product": "whatsapp", "contacts": [{"wa_id": "14155551234"}], "messages": [{"id": "wamid.HBgL..."}]}
```

### File Size Limit
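WhatsApp allows up to **16MB** for video uploads. If a clip exceeds this, re-encode it at a higher CRF; `-fs 15M` stops the output at 15MB, so check that the result still has the full duration: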
```
ffmpeg -i clip_N_final.mp4 -fs 15M -c:v libx264 -crf 30 -preset fast -c:a aac -movflags +faststart -y clip_N_wa.mp4
```

### 24-Hour Messaging Window
WhatsApp requires the recipient to have messaged you within the last 24 hours (for non-template messages). If you get a "template required" error, either:
- Ask the recipient to send any message to the business number first
- Use a pre-approved message template instead of a free-form video message

### Common Errors
| Error Code | Description | Fix |
|------------|-------------|-----|
| 100 | Invalid parameter | Check phone_number_id and recipient format (no + prefix, no spaces) |
| 190 | Invalid/expired access token | Regenerate token in Meta Business Settings; temporary tokens expire in 24h |
| 131030 | Recipient not in allowed list | In test mode, add recipient to allowed numbers in Meta Developer Portal |
| 131047 | Re-engagement message / template required | Recipient hasn't messaged within 24h — use a template or ask them to message first |
| 131053 | Media upload failed | File too large or unsupported format — re-encode as MP4 under 16MB |
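
To avoid error 100 from a badly formatted recipient, the Step 2 payload can be built programmatically. This sketch only constructs the JSON body shown in Step 2; `normalize_recipient` and `video_message` are hypothetical names, and stripping every non-digit character is an assumption about acceptable input.

```python
import json
import re


def normalize_recipient(phone: str) -> str:
    """Strip '+', spaces, and punctuation, leaving digits only."""
    digits = re.sub(r"\D", "", phone)
    if not digits:
        raise ValueError("recipient phone number contains no digits")
    return digits


def video_message(recipient: str, media_id: str, caption: str = "") -> str:
    """Return the JSON body for the Step 2 /messages request."""
    video = {"id": media_id}
    if caption:
        video["caption"] = caption
    return json.dumps({
        "messaging_product": "whatsapp",
        "to": normalize_recipient(recipient),
        "type": "video",
        "video": video,
    })
```

The returned string can be supplied as the `-d` body of the Step 2 `curl` call.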