youtube-ai-digest
Browses AI-related YouTube videos from subscribed channels, fetches transcripts, generates summaries, and creates Markdown reports. Use when the user mentions YouTube AI videos, video summaries, channel subscriptions, or asks about recent AI content from YouTube creators.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install yizhiyanhua-ai-youtube-ai-digest
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Data / AI, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: yizhiyanhua-ai.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install youtube-ai-digest into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://www.skillhub.club/skills/yizhiyanhua-ai-youtube-ai-digest before adding youtube-ai-digest to shared team environments
- Use youtube-ai-digest for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: youtube-ai-digest
description: Browses AI-related YouTube videos from subscribed channels, fetches transcripts, generates summaries, and creates Markdown reports. Use when the user mentions YouTube AI videos, video summaries, channel subscriptions, or asks about recent AI content from YouTube creators.
---
# YouTube AI Digest
Browse subscribed YouTube channels for AI-related videos, extract transcripts, and generate structured Markdown reports.
## Prerequisites
- Python 3.9+
- yt-dlp (`pip install yt-dlp`)
## Quick Start
### 1. Fetch Recent Videos
```bash
python scripts/fetch_videos.py --days 7 --keyword AI
```
Output: `data/videos.json` with filtered video list.
### 2. Get Transcript
```bash
python scripts/get_transcript.py --video-id VIDEO_ID
```
Output: `data/transcript_{VIDEO_ID}.txt` and `.json`.
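The `.json` file holds a list of `{"start", "text"}` entries, while the `.txt` file carries `[MM:SS]`-prefixed lines. A minimal sketch of how one entry maps to the text format (mirroring the `format_transcript` helper shipped with the skill):

```python
def to_text_line(entry):
    """Render one transcript entry ({start, text}) as a timestamped line."""
    mins, secs = divmod(int(entry["start"]), 60)
    return f"[{mins:02d}:{secs:02d}] {entry['text']}"

print(to_text_line({"start": 135, "text": "Transformers revisited"}))
# [02:15] Transformers revisited
```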
### 3. Generate Report
```bash
python scripts/generate_report.py --video-id VIDEO_ID --summary "Your summary here"
```
Output: `data/output/{VIDEO_ID}/report.md` with thumbnail.
## Configuration
Edit `data/channels.json` to manage subscribed channels:
```json
{
"channels": [
{"name": "Two Minute Papers", "id": "UCbfYPyITQ-7l4upoX8nvctg"},
{"name": "AI Explained", "id": "UCNJ1Ymd5yFuUPtn21xtRbbw"}
]
}
```
Find channel IDs from YouTube channel URLs: `youtube.com/channel/{CHANNEL_ID}`.
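Current YouTube channel IDs are 24-character strings beginning with `UC`. A quick sanity check for `data/channels.json` entries (the `looks_like_channel_id` helper is hypothetical, not part of the skill):

```python
import json
import re

def looks_like_channel_id(channel_id):
    """Heuristic: channel IDs are 'UC' plus 22 URL-safe base64 characters."""
    return bool(re.fullmatch(r"UC[0-9A-Za-z_-]{22}", channel_id))

config = json.loads('{"channels": [{"name": "Two Minute Papers", "id": "UCbfYPyITQ-7l4upoX8nvctg"}]}')
for ch in config["channels"]:
    print(ch["name"], looks_like_channel_id(ch["id"]))
# Two Minute Papers True
```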
## Workflow
Copy this checklist to track progress:
```
Task Progress:
- [ ] Step 1: Fetch recent videos from channels
- [ ] Step 2: Review video list and select target
- [ ] Step 3: Get transcript for selected video
- [ ] Step 4: Analyze transcript and create summary
- [ ] Step 5: Generate Markdown report
```
**Step 1: Fetch recent videos**
Run `python scripts/fetch_videos.py --days 7` to get recent videos. (Note: the `--days` flag is currently reserved for future date filtering; the script caps each channel at its 10 most recent uploads.)
**Step 2: Review and select**
Check `data/videos.json` for available videos. Select one for analysis.
**Step 3: Get transcript**
Run `python scripts/get_transcript.py --video-id {ID}` to download subtitles.
**Step 4: Analyze and summarize**
Read the transcript file and create a concise summary covering:
- Main topics discussed
- Key insights and takeaways
- Notable timestamps
**Step 5: Generate report**
Run `python scripts/generate_report.py --video-id {ID} --summary "..."` to create the final Markdown report.
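The five steps above can be sketched as a single driver that shells out to the three scripts in order. This is a hypothetical wrapper, not shipped with the skill, and the summary in step 4 must still come from actually reading the transcript:

```python
import subprocess

def build_commands(video_id, days=7, summary="TODO: summarize transcript"):
    """Return the three script invocations for one digest run, in order."""
    return [
        ["python", "scripts/fetch_videos.py", "--days", str(days), "--keyword", "AI"],
        ["python", "scripts/get_transcript.py", "--video-id", video_id],
        ["python", "scripts/generate_report.py", "--video-id", video_id, "--summary", summary],
    ]

for cmd in build_commands("VIDEO_ID"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run the pipeline
```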
## Output Format
```markdown
# [Video Title]

## Video Info
- Channel: [Name]
- Published: [Date]
- Duration: [Length]
- Link: [URL]
## Summary
[AI-generated summary of content]
## Transcript
[Timestamped transcript excerpt]
```
## Scripts Reference
| Script | Purpose | Output |
|--------|---------|--------|
| `fetch_videos.py` | Fetch channel videos | `data/videos.json` |
| `get_transcript.py` | Download subtitles | `data/transcript_*.txt/json` |
| `generate_report.py` | Create Markdown report | `data/output/*/report.md` |
## Error Handling
**No transcript available**: Some videos lack subtitles. Check if auto-generated captions exist.
**Rate limiting**: Add delays between requests if fetching many channels.
**Network issues**: Retry with `--days 1` for fewer results.
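For rate limiting and transient network errors, one illustrative pattern (not present in the skill's scripts) is to retry each channel fetch with an exponential, capped backoff between attempts:

```python
import time

def backoff_delays(attempts, base=2.0, cap=30.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(base * (2 ** i), cap) for i in range(attempts)]

def fetch_with_retry(fetch, channel_id, attempts=3):
    """Call fetch(channel_id), sleeping between failed attempts."""
    for delay in backoff_delays(attempts):
        videos = fetch(channel_id)
        if videos:
            return videos
        time.sleep(delay)  # back off before the next try
    return []
```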
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### scripts/fetch_videos.py
```python
#!/usr/bin/env python3
"""Fetch the latest videos from subscribed channels."""
import argparse
import json
import subprocess
from datetime import datetime
from pathlib import Path

DATA_DIR = Path(__file__).parent.parent / "data"
CHANNELS_FILE = DATA_DIR / "channels.json"
OUTPUT_FILE = DATA_DIR / "videos.json"


def load_channels():
    if not CHANNELS_FILE.exists():
        return []
    with open(CHANNELS_FILE) as f:
        return json.load(f).get("channels", [])


def fetch_channel_videos(channel_id, days=1):  # noqa: ARG001
    """Fetch channel videos via yt-dlp (the days argument is reserved for date filtering)."""
    cmd = [
        "yt-dlp", "--flat-playlist", "--dump-json",
        f"https://www.youtube.com/channel/{channel_id}/videos"
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        videos = []
        for line in result.stdout.strip().split('\n'):
            if not line:
                continue
            video = json.loads(line)
            videos.append({
                "id": video.get("id"),
                "title": video.get("title"),
                "url": f"https://www.youtube.com/watch?v={video.get('id')}",
                "channel_id": channel_id
            })
            if len(videos) >= 10:  # cap at 10 videos per channel
                break
        return videos
    except Exception as e:
        print(f"Error fetching {channel_id}: {e}")
        return []


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--days", type=int, default=1)
    parser.add_argument("--keyword", default="AI")
    args = parser.parse_args()
    channels = load_channels()
    if not channels:
        print("No channels configured. Edit data/channels.json")
        return
    all_videos = []
    for ch in channels:
        print(f"Fetching: {ch['name']}...")
        videos = fetch_channel_videos(ch["id"], args.days)
        for v in videos:
            v["channel_name"] = ch["name"]
        all_videos.extend(videos)
    # Keep only videos whose title contains the keyword (AI filter)
    keyword = args.keyword.lower()
    filtered = [v for v in all_videos if keyword in v.get("title", "").lower()]
    OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)
    with open(OUTPUT_FILE, "w") as f:
        json.dump({"videos": filtered, "fetched_at": datetime.now().isoformat()}, f, indent=2, ensure_ascii=False)
    print(f"\nFound {len(filtered)} AI-related videos")
    for v in filtered:
        print(f"  - {v['title']} ({v['channel_name']})")


if __name__ == "__main__":
    main()
```
### scripts/get_transcript.py
```python
#!/usr/bin/env python3
"""Fetch video subtitles using yt-dlp."""
import argparse
import json
import subprocess
from pathlib import Path

DATA_DIR = Path(__file__).parent.parent / "data"


def get_transcript_ytdlp(video_id):
    """Download subtitles with yt-dlp and parse the first available track."""
    url = f"https://www.youtube.com/watch?v={video_id}"
    output_template = str(DATA_DIR / f"sub_{video_id}")
    # Try to download manual and auto-generated subtitles
    cmd = [
        "yt-dlp", "--skip-download",
        "--write-auto-sub", "--write-sub",
        "--sub-lang", "en,zh",
        "--sub-format", "vtt",
        "-o", output_template,
        url
    ]
    subprocess.run(cmd, capture_output=True)
    # Look for the generated subtitle file
    for suffix in [".en.vtt", ".zh.vtt", ".en-orig.vtt"]:
        sub_file = DATA_DIR / f"sub_{video_id}{suffix}"
        if sub_file.exists():
            return parse_vtt(sub_file), suffix.split('.')[1]
    return None, None


def parse_vtt(vtt_file):
    """Parse a VTT subtitle file into (start, text) entries."""
    content = vtt_file.read_text(encoding="utf-8")
    lines = content.split('\n')
    transcript = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        # Find timestamp lines (00:00:00.000 --> 00:00:00.000)
        if '-->' in line:
            parts = line.split('-->')
            start_time = parts[0].strip()
            # Parse the start time into seconds
            time_parts = start_time.replace(',', '.').split(':')
            if len(time_parts) == 3:
                h, m, s = time_parts
                start_seconds = int(h) * 3600 + int(m) * 60 + float(s.split('.')[0])
            else:
                start_seconds = 0
            # Collect the cue text
            i += 1
            text_lines = []
            while i < len(lines) and lines[i].strip() and '-->' not in lines[i]:
                text = lines[i].strip()
                # Skip cue numbers and lines containing VTT tags
                if not text.isdigit() and '<' not in text:
                    text_lines.append(text)
                i += 1
            if text_lines:
                transcript.append({"start": start_seconds, "text": ' '.join(text_lines)})
        else:
            i += 1
    return transcript


def format_transcript(transcript):
    """Format transcript entries as timestamped text lines."""
    lines = []
    for entry in transcript:
        start = int(entry["start"])
        mins, secs = divmod(start, 60)
        text = entry["text"]
        lines.append(f"[{mins:02d}:{secs:02d}] {text}")
    return "\n".join(lines)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--video-id", required=True)
    parser.add_argument("--output", help="Output file path")  # currently unused
    args = parser.parse_args()
    print(f"Fetching subtitles: {args.video_id}")
    transcript, lang = get_transcript_ytdlp(args.video_id)
    if not transcript:
        print("No subtitles available")
        return
    formatted = format_transcript(transcript)
    print(f"Subtitle language: {lang}")
    print(f"Subtitle entries: {len(transcript)}")
    output_file = DATA_DIR / f"transcript_{args.video_id}.txt"
    output_file.write_text(formatted, encoding="utf-8")
    print(f"Saved to: {output_file}")
    # Also save as JSON
    json_file = DATA_DIR / f"transcript_{args.video_id}.json"
    with open(json_file, "w", encoding="utf-8") as f:
        json.dump(transcript, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main()
```
### scripts/generate_report.py
```python
#!/usr/bin/env python3
"""Generate a Markdown report for a video."""
import argparse
import json
import subprocess
from pathlib import Path

DATA_DIR = Path(__file__).parent.parent / "data"
OUTPUT_DIR = DATA_DIR / "output"


def get_video_info(video_id):
    """Fetch video metadata via yt-dlp."""
    cmd = ["yt-dlp", "--dump-json", "--no-download", f"https://www.youtube.com/watch?v={video_id}"]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return json.loads(result.stdout)
    except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as e:
        print(f"Error fetching video info: {e}")
        return {}


def download_thumbnail(video_id, output_path):
    """Download the video thumbnail."""
    cmd = ["yt-dlp", "--write-thumbnail", "--skip-download", "-o", str(output_path / "thumbnail"),
           f"https://www.youtube.com/watch?v={video_id}"]
    subprocess.run(cmd, capture_output=True)


def generate_markdown(video_id, info, transcript_file, screenshots=None, summary=None):
    """Assemble the Markdown report."""
    title = info.get("title", "Unknown")
    channel = info.get("channel", "Unknown")
    upload_date = info.get("upload_date", "")
    duration = info.get("duration_string", "")
    url = f"https://www.youtube.com/watch?v={video_id}"
    md = f"""# {title}

## Video Info
- Channel: {channel}
- Published: {upload_date}
- Duration: {duration}
- Link: {url}

## Summary
{summary or "[Use Claude to generate a summary from the transcript]"}
"""
    # Append a transcript excerpt
    if transcript_file and Path(transcript_file).exists():
        md += "## Transcript\n\n"
        md += "```\n"
        md += Path(transcript_file).read_text()[:3000]  # limit length
        md += "\n```\n\n"
    # Append screenshots
    if screenshots:
        md += "## Key Screenshots\n\n"
        for i, ss in enumerate(screenshots, 1):
            md += f"![Screenshot {i}]({ss})\n\n"
    return md


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--video-id", required=True)
    parser.add_argument("--output", default=str(OUTPUT_DIR))
    parser.add_argument("--summary", help="Summary text")
    args = parser.parse_args()
    output_dir = Path(args.output) / args.video_id
    output_dir.mkdir(parents=True, exist_ok=True)
    print(f"Fetching video info: {args.video_id}")
    info = get_video_info(args.video_id)
    print("Downloading thumbnail...")
    download_thumbnail(args.video_id, output_dir)
    transcript_file = DATA_DIR / f"transcript_{args.video_id}.txt"
    md = generate_markdown(args.video_id, info, transcript_file, summary=args.summary)
    report_file = output_dir / "report.md"
    report_file.write_text(md, encoding="utf-8")
    print(f"Report generated: {report_file}")


if __name__ == "__main__":
    main()
```
### data/channels.json
```json
{
"channels": [
{"name": "Two Minute Papers", "id": "UCbfYPyITQ-7l4upoX8nvctg"},
{"name": "Yannic Kilcher", "id": "UCZHmQk67mN31gbHey6BVyNw"},
{"name": "AI Explained", "id": "UCNJ1Ymd5yFuUPtn21xtRbbw"}
]
}
```