youtube-ai-digest
Browses AI-related YouTube videos from subscribed channels, fetches transcripts, generates summaries, and creates Markdown reports. Use when the user mentions YouTube AI videos, video summaries, channel subscriptions, or asks about recent AI content from YouTube creators.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install yizhiyanhua-ai-youtube-ai-digest
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Data / AI, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: yizhiyanhua-ai.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install youtube-ai-digest into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://www.skillhub.club/skills/yizhiyanhua-ai-youtube-ai-digest before adding youtube-ai-digest to shared team environments
- Use youtube-ai-digest for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: youtube-ai-digest
description: Browses AI-related YouTube videos from subscribed channels, fetches transcripts, generates summaries, and creates Markdown reports. Use when the user mentions YouTube AI videos, video summaries, channel subscriptions, or asks about recent AI content from YouTube creators.
---
# YouTube AI Digest
Browse subscribed YouTube channels for AI-related videos, extract transcripts, and generate structured Markdown reports.
## Prerequisites
- Python 3.9+
- yt-dlp (`pip install yt-dlp`)
## Quick Start
### 1. Fetch Recent Videos
```bash
python scripts/fetch_videos.py --days 7 --keyword AI
```
Output: `data/videos.json` with filtered video list.
### 2. Get Transcript
```bash
python scripts/get_transcript.py --video-id VIDEO_ID
```
Output: `data/transcript_{VIDEO_ID}.txt` and `.json`.
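The `.json` file holds a list of `{"start", "text"}` entries, while the `.txt` file carries `[MM:SS]`-prefixed lines. A minimal sketch of how one entry maps to the text format (mirroring the `format_transcript` helper shipped with the skill):

```python
def to_text_line(entry):
    """Render one transcript entry ({start, text}) as a timestamped line."""
    mins, secs = divmod(int(entry["start"]), 60)
    return f"[{mins:02d}:{secs:02d}] {entry['text']}"

print(to_text_line({"start": 135, "text": "Transformers revisited"}))
# [02:15] Transformers revisited
```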
### 3. Generate Report
```bash
python scripts/generate_report.py --video-id VIDEO_ID --summary "Your summary here"
```
Output: `data/output/{VIDEO_ID}/report.md` with thumbnail.
## Configuration
Edit `data/channels.json` to manage subscribed channels:
```json
{
"channels": [
{"name": "Two Minute Papers", "id": "UCbfYPyITQ-7l4upoX8nvctg"},
{"name": "AI Explained", "id": "UCNJ1Ymd5yFuUPtn21xtRbbw"}
]
}
```
Find channel IDs from YouTube channel URLs: `youtube.com/channel/{CHANNEL_ID}`.
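Current YouTube channel IDs are 24-character strings beginning with `UC`. A quick sanity check for `data/channels.json` entries (the `looks_like_channel_id` helper is hypothetical, not part of the skill):

```python
import json
import re

def looks_like_channel_id(channel_id):
    """Heuristic: channel IDs are 'UC' plus 22 URL-safe base64 characters."""
    return bool(re.fullmatch(r"UC[0-9A-Za-z_-]{22}", channel_id))

config = json.loads('{"channels": [{"name": "Two Minute Papers", "id": "UCbfYPyITQ-7l4upoX8nvctg"}]}')
for ch in config["channels"]:
    print(ch["name"], looks_like_channel_id(ch["id"]))
# Two Minute Papers True
```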
## Workflow
Copy this checklist to track progress:
```
Task Progress:
- [ ] Step 1: Fetch recent videos from channels
- [ ] Step 2: Review video list and select target
- [ ] Step 3: Get transcript for selected video
- [ ] Step 4: Analyze transcript and create summary
- [ ] Step 5: Generate Markdown report
```
**Step 1: Fetch recent videos**
Run `python scripts/fetch_videos.py --days 7` to get recent videos. (Note: the `--days` flag is currently reserved for future date filtering; the script caps each channel at its 10 most recent uploads.)
**Step 2: Review and select**
Check `data/videos.json` for available videos. Select one for analysis.
**Step 3: Get transcript**
Run `python scripts/get_transcript.py --video-id {ID}` to download subtitles.
**Step 4: Analyze and summarize**
Read the transcript file and create a concise summary covering:
- Main topics discussed
- Key insights and takeaways
- Notable timestamps
**Step 5: Generate report**
Run `python scripts/generate_report.py --video-id {ID} --summary "..."` to create the final Markdown report.
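The five steps above can be sketched as a single driver that shells out to the three scripts in order. This is a hypothetical wrapper, not shipped with the skill, and the summary in step 4 must still come from actually reading the transcript:

```python
import subprocess

def build_commands(video_id, days=7, summary="TODO: summarize transcript"):
    """Return the three script invocations for one digest run, in order."""
    return [
        ["python", "scripts/fetch_videos.py", "--days", str(days), "--keyword", "AI"],
        ["python", "scripts/get_transcript.py", "--video-id", video_id],
        ["python", "scripts/generate_report.py", "--video-id", video_id, "--summary", summary],
    ]

for cmd in build_commands("VIDEO_ID"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run the pipeline
```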
## Output Format
```markdown
# [Video Title]

## Video Info
- Channel: [Name]
- Published: [Date]
- Duration: [Length]
- Link: [URL]
## Summary
[AI-generated summary of content]
## Transcript
[Timestamped transcript excerpt]
```
## Scripts Reference
| Script | Purpose | Output |
|--------|---------|--------|
| `fetch_videos.py` | Fetch channel videos | `data/videos.json` |
| `get_transcript.py` | Download subtitles | `data/transcript_*.txt/json` |
| `generate_report.py` | Create Markdown report | `data/output/*/report.md` |
## Error Handling
**No transcript available**: Some videos lack subtitles. Check if auto-generated captions exist.
**Rate limiting**: Add delays between requests if fetching many channels.
**Network issues**: Retry with `--days 1` for fewer results.
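For rate limiting and transient network errors, one illustrative pattern (not present in the skill's scripts) is to retry each channel fetch with an exponential, capped backoff between attempts:

```python
import time

def backoff_delays(attempts, base=2.0, cap=30.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

def fetch_with_retry(fetch, channel_id, attempts=3):
    """Call fetch(channel_id), sleeping between failed attempts."""
    for delay in backoff_delays(attempts):
        videos = fetch(channel_id)
        if videos:
            return videos
        time.sleep(delay)  # back off before the next try
    return []
```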
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/fetch_videos.py
```python
#!/usr/bin/env python3
"""Fetch the latest videos from subscribed channels."""
import argparse
import json
import subprocess
from datetime import datetime
from pathlib import Path

DATA_DIR = Path(__file__).parent.parent / "data"
CHANNELS_FILE = DATA_DIR / "channels.json"
OUTPUT_FILE = DATA_DIR / "videos.json"


def load_channels():
    if not CHANNELS_FILE.exists():
        return []
    with open(CHANNELS_FILE) as f:
        return json.load(f).get("channels", [])


def fetch_channel_videos(channel_id, days=1):  # noqa: ARG001
    """Fetch channel videos via yt-dlp (the days argument is reserved for date filtering)."""
    cmd = [
        "yt-dlp", "--flat-playlist", "--dump-json",
        f"https://www.youtube.com/channel/{channel_id}/videos"
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        videos = []
        for line in result.stdout.strip().split('\n'):
            if not line:
                continue
            video = json.loads(line)
            videos.append({
                "id": video.get("id"),
                "title": video.get("title"),
                "url": f"https://www.youtube.com/watch?v={video.get('id')}",
                "channel_id": channel_id
            })
            if len(videos) >= 10:  # cap at 10 videos per channel
                break
        return videos
    except Exception as e:
        print(f"Error fetching {channel_id}: {e}")
        return []


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--days", type=int, default=1)
    parser.add_argument("--keyword", default="AI")
    args = parser.parse_args()
    channels = load_channels()
    if not channels:
        print("No channels configured. Edit data/channels.json")
        return
    all_videos = []
    for ch in channels:
        print(f"Fetching: {ch['name']}...")
        videos = fetch_channel_videos(ch["id"], args.days)
        for v in videos:
            v["channel_name"] = ch["name"]
        all_videos.extend(videos)
    # Keep only videos whose title contains the keyword (AI filter)
    keyword = args.keyword.lower()
    filtered = [v for v in all_videos if keyword in v.get("title", "").lower()]
    OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(OUTPUT_FILE, "w") as f:
        json.dump({"videos": filtered, "fetched_at": datetime.now().isoformat()}, f, indent=2, ensure_ascii=False)
    print(f"\nFound {len(filtered)} AI-related videos")
    for v in filtered:
        print(f"  - {v['title']} ({v['channel_name']})")


if __name__ == "__main__":
    main()
```
### scripts/get_transcript.py
```python
#!/usr/bin/env python3
"""Fetch video subtitles using yt-dlp."""
import argparse
import json
import subprocess
from pathlib import Path

DATA_DIR = Path(__file__).parent.parent / "data"


def get_transcript_ytdlp(video_id):
    """Download subtitles with yt-dlp and parse the first available track."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    output_template = str(DATA_DIR / f"sub_{video_id}")
    # Try to download manual and auto-generated subtitles
    cmd = [
        "yt-dlp", "--skip-download",
        "--write-auto-sub", "--write-sub",
        "--sub-lang", "en,zh",
        "--sub-format", "vtt",
        "-o", output_template,
        url
    ]
    subprocess.run(cmd, capture_output=True)
    # Look for the generated subtitle file
    for suffix in [".en.vtt", ".zh.vtt", ".en-orig.vtt"]:
        sub_file = DATA_DIR / f"sub_{video_id}{suffix}"
        if sub_file.exists():
            return parse_vtt(sub_file), suffix.split('.')[1]
    return None, None


def parse_vtt(vtt_file):
    """Parse a VTT subtitle file into (start, text) entries."""
    content = vtt_file.read_text(encoding="utf-8")
    lines = content.split('\n')
    transcript = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        # Find timestamp lines (00:00:00.000 --> 00:00:00.000)
        if '-->' in line:
            parts = line.split('-->')
            start_time = parts[0].strip()
            # Parse the start time into seconds
            time_parts = start_time.replace(',', '.').split(':')
            if len(time_parts) == 3:
                h, m, s = time_parts
                start_seconds = int(h) * 3600 + int(m) * 60 + float(s.split('.')[0])
            else:
                start_seconds = 0
            # Collect the cue text
            i += 1
            text_lines = []
            while i < len(lines) and lines[i].strip() and '-->' not in lines[i]:
                text = lines[i].strip()
                # Skip cue numbers and lines containing VTT tags
                if not text.isdigit() and '<' not in text:
                    text_lines.append(text)
                i += 1
            if text_lines:
                transcript.append({"start": start_seconds, "text": ' '.join(text_lines)})
        else:
            i += 1
    return transcript


def format_transcript(transcript):
    """Format transcript entries as timestamped text lines."""
    lines = []
    for entry in transcript:
        start = int(entry["start"])
        mins, secs = divmod(start, 60)
        text = entry["text"]
        lines.append(f"[{mins:02d}:{secs:02d}] {text}")
    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--video-id", required=True)
    parser.add_argument("--output", help="Output file path")  # currently unused
    args = parser.parse_args()
    print(f"Fetching subtitles: {args.video_id}")
    transcript, lang = get_transcript_ytdlp(args.video_id)
    if not transcript:
        print("No subtitles available")
        return
    formatted = format_transcript(transcript)
    print(f"Subtitle language: {lang}")
    print(f"Subtitle entries: {len(transcript)}")
    output_file = DATA_DIR / f"transcript_{args.video_id}.txt"
    output_file.write_text(formatted, encoding="utf-8")
    print(f"Saved to: {output_file}")
    # Also save as JSON
    json_file = DATA_DIR / f"transcript_{args.video_id}.json"
    with open(json_file, "w", encoding="utf-8") as f:
        json.dump(transcript, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```
### scripts/generate_report.py
```python
#!/usr/bin/env python3
"""Generate a Markdown report for a video."""
import argparse
import json
import subprocess
from pathlib import Path

DATA_DIR = Path(__file__).parent.parent / "data"
OUTPUT_DIR = DATA_DIR / "output"


def get_video_info(video_id):
    """Fetch video metadata via yt-dlp."""
    cmd = ["yt-dlp", "--dump-json", "--no-download", f"https://www.youtube.com/watch?v={video_id}"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return json.loads(result.stdout)
    except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as e:
        print(f"Error fetching video info: {e}")
        return {}


def download_thumbnail(video_id, output_path):
    """Download the video thumbnail."""
    cmd = ["yt-dlp", "--write-thumbnail", "--skip-download", "-o", str(output_path / "thumbnail"),
           f"https://www.youtube.com/watch?v={video_id}"]
    subprocess.run(cmd, capture_output=True)


def generate_markdown(video_id, info, transcript_file, screenshots=None, summary=None):
    """Assemble the Markdown report."""
    title = info.get("title", "Unknown")
    channel = info.get("channel", "Unknown")
    upload_date = info.get("upload_date", "")
    duration = info.get("duration_string", "")
    url = f"https://www.youtube.com/watch?v={video_id}"
    md = f"""# {title}

## Video Info
- Channel: {channel}
- Published: {upload_date}
- Duration: {duration}
- Link: {url}

## Summary
{summary or "[Use Claude to generate a summary from the transcript]"}
"""
    # Append a transcript excerpt
    if transcript_file and Path(transcript_file).exists():
        md += "## Transcript\n\n"
        md += "```\n"
        md += Path(transcript_file).read_text()[:3000]  # limit length
        md += "\n```\n\n"
    # Append screenshots
    if screenshots:
        md += "## Key Screenshots\n\n"
        for i, ss in enumerate(screenshots, 1):
            md += f"![Screenshot {i}]({ss})\n\n"
    return md


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--video-id", required=True)
    parser.add_argument("--output", default=str(OUTPUT_DIR))
    parser.add_argument("--summary", help="Summary text")
    args = parser.parse_args()
    output_dir = Path(args.output) / args.video_id
    output_dir.mkdir(parents=True, exist_ok=True)
    print(f"Fetching video info: {args.video_id}")
    info = get_video_info(args.video_id)
    print("Downloading thumbnail...")
    download_thumbnail(args.video_id, output_dir)
    transcript_file = DATA_DIR / f"transcript_{args.video_id}.txt"
    md = generate_markdown(args.video_id, info, transcript_file, summary=args.summary)
    report_file = output_dir / "report.md"
    report_file.write_text(md, encoding="utf-8")
    print(f"Report generated: {report_file}")


if __name__ == "__main__":
    main()
```
### data/channels.json
```json
{
"channels": [
{"name": "Two Minute Papers", "id": "UCbfYPyITQ-7l4upoX8nvctg"},
{"name": "Yannic Kilcher", "id": "UCZHmQk67mN31gbHey6BVyNw"},
{"name": "AI Explained", "id": "UCNJ1Ymd5yFuUPtn21xtRbbw"}
]
}
```