gemini-video-analyzer
Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-a6-gemini-video-analyzer
Repository
Skill path: skills/aiwithabidi/a6-gemini-video-analyzer
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Frontend, Backend, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install gemini-video-analyzer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding gemini-video-analyzer to shared team environments
- Use gemini-video-analyzer for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: gemini-video-analyzer
description: |
  Native video analysis using Google Gemini API. Upload and analyze video files: describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.
homepage: https://www.agxntsix.ai
metadata:
  {
    "openclaw":
      {
        "emoji": "🎬",
        "requires": { "bins": ["python3", "curl"], "env": ["GOOGLE_AI_API_KEY"] },
        "primaryEnv": "GOOGLE_AI_API_KEY",
      },
  }
---
# Gemini Video Analyzer
Analyze videos natively using Google Gemini's multimodal API. No manual frame extraction is needed: Gemini samples the video at 1 frame per second and interprets those frames together with the audio, so it can follow motion, speech, and visual content.
## Quick Start
```bash
# Analyze a video with default prompt (full description)
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/analyze.py /path/to/video.mp4
# Ask a specific question
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/analyze.py /path/to/video.mp4 "What text is visible on screen?"
# Manage uploaded files
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/manage_files.py list
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/manage_files.py cleanup
```
## Supported Formats
MP4, AVI, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP — up to 2GB per file.
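A pre-flight check can catch unsupported files before any upload starts. The sketch below is not part of the skill: `check_video` and the constant names are hypothetical, the extension set mirrors the formats listed above, and "2GB" is assumed to mean 2 × 1024³ bytes, since the source does not say which unit it uses.

```python
import os

# Extensions mirror the "Supported Formats" list; the names are illustrative.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".avi", ".mov", ".mkv", ".webm",
    ".flv", ".mpeg", ".mpg", ".wmv", ".3gp",
}
# Assumption: "2GB" is read as 2 GiB.
MAX_UPLOAD_BYTES = 2 * 1024**3


def check_video(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    # size_bytes can be passed in directly; otherwise it is read from disk.
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes:,} bytes, over the {MAX_UPLOAD_BYTES:,}-byte limit"
        )
    return problems
```

Calling `check_video("/path/to/video.mp4")` before `analyze.py` would let a wrapper fail fast with a readable message instead of an HTTP error partway through an upload.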
## How It Works
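1. The video is uploaded to Google's Files API, which stores it temporarily and deletes it automatically after 48 hours.
2. Gemini samples the video at 1 frame per second and interprets motion, transitions, and audio context.
3. The model generates a response based on your prompt.
4. Because the model sees the sampled frames in sequence alongside the audio, this generally handles temporal content better than analyzing separately extracted frames.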
## Use Cases
| Task | Example Prompt |
|------|---------------|
| General description | *(default — no prompt needed)* |
| UI/text extraction | `"What text and UI elements are visible?"` |
| Tutorial summary | `"Summarize the steps shown in this tutorial"` |
| Bug report from video | `"Describe what went wrong in this screen recording"` |
| Meeting notes | `"Summarize the key points discussed"` |
| Content comparison | Upload 2 videos, ask for differences (the bundled `analyze.py` accepts one video per run) |
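The bundled `analyze.py` takes one video per run, so the comparison row needs a request that carries two files. A minimal sketch, assuming both videos are already uploaded and in the `ACTIVE` state: `build_comparison_payload` is a hypothetical helper, the `file_data` part shape is copied from `analyze.py`, and the example URIs are placeholders.

```python
import json


def build_comparison_payload(videos, prompt):
    """Build a generateContent body with one file_data part per video.

    `videos` is a list of (file_uri, mime_type) tuples for files that
    were already uploaded through the Files API.
    """
    if len(videos) < 2:
        raise ValueError("comparison needs at least two videos")
    parts = [
        {"file_data": {"mime_type": mime, "file_uri": uri}}
        for uri, mime in videos
    ]
    parts.append({"text": prompt})  # the question goes after the videos
    return {"contents": [{"parts": parts}]}


payload = build_comparison_payload(
    [
        ("https://example.invalid/files/a", "video/mp4"),  # placeholder URI
        ("https://example.invalid/files/b", "video/mp4"),  # placeholder URI
    ],
    "List the differences between these two recordings.",
)
body = json.dumps(payload).encode()  # bytes ready to POST to :generateContent
```

Both videos count against the same context window, so long recordings may need to be trimmed before they are compared.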
## Configuration
Set `GOOGLE_AI_API_KEY` in your environment, or in a `.env` file that your shell or runner loads (the scripts themselves read only the process environment). Get a free key at [aistudio.google.com](https://aistudio.google.com/apikey).
Default model: `gemini-2.5-flash` (fast, cheap, excellent vision). Override with `--model gemini-2.5-pro` for complex analysis.
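The bundled scripts read the key from `os.environ` only, so a `.env` file has to be loaded by whatever launches them. A minimal sketch of such a loader, assuming plain `KEY=value` lines: `parse_env_text` and `load_env_file` are hypothetical helpers, not part of the skill, and the loader deliberately leaves already-set variables untouched.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=value lines into a dict (comments and blanks skipped)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # ignore blank lines, comments, and malformed lines
        key, value = line.split("=", 1)
        # Strip whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Apply a .env file to os.environ without overriding existing variables."""
    if not os.path.isfile(path):
        return  # no .env file: rely on the existing environment
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env_text(fh.read()).items():
            os.environ.setdefault(key, value)
```

A wrapper would call `load_env_file()` once and then launch the scripts as subprocesses, which inherit the updated environment.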
## API Reference
See [references/gemini-files-api.md](references/gemini-files-api.md) for file upload limits, processing details, and advanced options.
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/gemini-files-api.md
```markdown
# Gemini Files API Reference
## Upload Limits
- Max file size: 2GB per video
- Project quota: 20GB total storage
- Storage duration: 48 hours (auto-deleted)
- Processing rate: 1 frame per second
## Supported Video Formats
MP4, AVI, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP
## Processing
- Videos are processed server-side at 1 FPS
- Small videos (<100MB) can be sent inline
- Larger videos use resumable upload via Files API
- Same file URI can be reused across multiple prompts (within 48h)
## Models
| Model | Context | Cost (in/out per 1M) | Best For |
|-------|---------|---------------------|----------|
| gemini-2.5-flash | 1M tokens | $0.30/$2.50 | Fast, cheap, daily use |
| gemini-2.5-pro | 1M tokens | $1.25/$10.00 | Complex analysis |
| gemini-3-flash-preview | 1M tokens | $0.50/$3.00 | Latest vision |
## Token Usage
- Video: ~258 tokens per second of content
- 1 minute video ≈ 15,480 tokens
- 1 hour video ≈ 928,800 tokens (fits in 1M context)
## Tips
- Reuse file URIs to avoid re-uploading the same video
- Use `manage_files.py cleanup` to free quota when done
- For batch analysis, upload all videos first, then query
```
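The token arithmetic in the reference above can be reproduced in a few lines. The sketch takes the ~258 tokens per second figure and the input prices from the Models table at face value; `estimate_video_tokens` and `estimate_input_cost` are hypothetical helpers, and the estimate ignores prompt text and output tokens.

```python
TOKENS_PER_SECOND = 258  # approximate figure from the reference above

# Input price in USD per 1M tokens, copied from the Models table.
INPUT_PRICE_PER_MILLION = {
    "gemini-2.5-flash": 0.30,
    "gemini-2.5-pro": 1.25,
    "gemini-3-flash-preview": 0.50,
}


def estimate_video_tokens(duration_seconds):
    """Approximate input tokens consumed by a video of the given length."""
    return int(duration_seconds * TOKENS_PER_SECOND)


def estimate_input_cost(duration_seconds, model="gemini-2.5-flash"):
    """Approximate input cost in USD for the video alone."""
    tokens = estimate_video_tokens(duration_seconds)
    return tokens / 1_000_000 * INPUT_PRICE_PER_MILLION[model]


print(estimate_video_tokens(60))    # 15480
print(estimate_video_tokens(3600))  # 928800
```

With these numbers, a one-hour video on `gemini-2.5-flash` comes to roughly $0.28 of input before any prompt or output tokens are counted.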
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
  "owner": "aiwithabidi",
  "slug": "a6-gemini-video-analyzer",
  "displayName": "Gemini Video Analyzer",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1771250411910,
    "commit": "https://github.com/openclaw/skills/commit/c5a173b2af3928a925ecde41a385f9a0e70f2099"
  },
  "history": []
}
```
### scripts/analyze.py
```python
#!/usr/bin/env python3
"""
Analyze video using Google Gemini API (native video understanding).
Uploads video to Gemini Files API, then queries the model.

Usage:
    python3 analyze.py /path/to/video.mp4 "What's happening?"
    python3 analyze.py /path/to/video.mp4  # default: full description
    python3 analyze.py /path/to/video.mp4 "prompt" --model gemini-2.5-pro
"""
import argparse
import json
import mimetypes
import os
import sys
import time
import urllib.error
import urllib.request

GOOGLE_API_KEY = os.environ.get("GOOGLE_AI_API_KEY", "")
DEFAULT_MODEL = "gemini-2.5-flash"
BASE_URL = "https://generativelanguage.googleapis.com"
DEFAULT_PROMPT = (
    "Describe what's happening in this video in detail. "
    "Include any text, UI elements, spoken words, or important visual information."
)
POLL_INTERVAL_SECONDS = 5
MAX_POLLS = 120  # 120 polls x 5 s = up to 10 minutes of processing


def upload_file(filepath):
    """Upload video to Gemini Files API (resumable upload)."""
    filesize = os.path.getsize(filepath)
    mime_type = mimetypes.guess_type(filepath)[0] or "video/mp4"
    display_name = os.path.basename(filepath)

    # Initiate resumable upload; the session URL comes back in a response header.
    headers = {
        "X-Goog-Upload-Protocol": "resumable",
        "X-Goog-Upload-Command": "start",
        "X-Goog-Upload-Header-Content-Length": str(filesize),
        "X-Goog-Upload-Header-Content-Type": mime_type,
        "Content-Type": "application/json",
    }
    metadata = json.dumps({"file": {"display_name": display_name}}).encode()
    req = urllib.request.Request(
        f"{BASE_URL}/upload/v1beta/files?key={GOOGLE_API_KEY}",
        data=metadata, headers=headers, method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        upload_url = resp.headers.get("X-Goog-Upload-URL")
    if not upload_url:
        raise RuntimeError("Failed to get upload URL")

    # Upload bytes in a single request (the whole file is read into memory).
    with open(filepath, "rb") as f:
        file_data = f.read()
    req2 = urllib.request.Request(
        upload_url, data=file_data,
        headers={
            "X-Goog-Upload-Offset": "0",
            "X-Goog-Upload-Command": "upload, finalize",
            "Content-Length": str(filesize),
        },
        method="PUT",
    )
    with urllib.request.urlopen(req2) as resp:
        result = json.loads(resp.read())

    file_info = result.get("file", {})
    file_uri = file_info.get("uri", "")
    file_name = file_info.get("name", "")
    state = file_info.get("state", "")
    print(f"[video] Uploaded: {display_name} ({filesize:,} bytes)", file=sys.stderr)
    print(f"[video] State: {state}", file=sys.stderr)

    # Wait for processing if needed
    if state == "PROCESSING":
        print("[video] Processing...", file=sys.stderr)
        for _ in range(MAX_POLLS):
            time.sleep(POLL_INTERVAL_SECONDS)
            check_req = urllib.request.Request(
                f"{BASE_URL}/v1beta/{file_name}?key={GOOGLE_API_KEY}"
            )
            with urllib.request.urlopen(check_req) as resp:
                status = json.loads(resp.read())
            state = status.get("state", "")
            if state == "ACTIVE":
                print("[video] Ready.", file=sys.stderr)
                break
            if state == "FAILED":
                raise RuntimeError(f"Processing failed: {json.dumps(status)}")
        else:
            # The loop ran out without reaching ACTIVE: fail rather than
            # query a file that is still being processed.
            raise TimeoutError(
                f"File still {state or 'unknown'} after "
                f"{MAX_POLLS * POLL_INTERVAL_SECONDS} seconds"
            )
    return file_uri, mime_type, file_name


def analyze(file_uri, mime_type, prompt, model=DEFAULT_MODEL):
    """Send video to Gemini for analysis."""
    payload = {
        "contents": [{
            "parts": [
                {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                {"text": prompt},
            ]
        }],
        "generationConfig": {"temperature": 0.4, "maxOutputTokens": 8192},
    }
    req = urllib.request.Request(
        f"{BASE_URL}/v1beta/models/{model}:generateContent?key={GOOGLE_API_KEY}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=180) as resp:
        result = json.loads(resp.read())
    candidates = result.get("candidates", [])
    if candidates:
        parts = candidates[0].get("content", {}).get("parts", [])
        return "\n".join(p.get("text", "") for p in parts if "text" in p)
    return f"No response. Raw: {json.dumps(result)}"


def main():
    parser = argparse.ArgumentParser(description="Analyze video with Gemini")
    parser.add_argument("video", help="Path to video file")
    parser.add_argument("prompt", nargs="?", default=DEFAULT_PROMPT, help="Question about the video")
    parser.add_argument("--model", default=DEFAULT_MODEL, help=f"Gemini model (default: {DEFAULT_MODEL})")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    if not GOOGLE_API_KEY:
        print("Error: Set GOOGLE_AI_API_KEY environment variable", file=sys.stderr)
        sys.exit(1)
    if not os.path.isfile(args.video):
        print(f"Error: File not found: {args.video}", file=sys.stderr)
        sys.exit(1)

    try:
        file_uri, mime_type, file_name = upload_file(args.video)
        result = analyze(file_uri, mime_type, args.prompt, args.model)
    except urllib.error.HTTPError as e:
        # Print the API's error body along with the status code.
        detail = e.read().decode(errors="replace")
        print(f"Error: HTTP {e.code}: {detail}", file=sys.stderr)
        sys.exit(1)
    except (OSError, RuntimeError) as e:
        # Covers network failures, timeouts, and processing errors.
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)

    if args.json:
        print(json.dumps({"model": args.model, "prompt": args.prompt, "response": result}))
    else:
        print(result)


if __name__ == "__main__":
    main()
```
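Both scripts put the API key in the URL query string, where it can leak into proxy logs or any output that prints the request URL. The Gemini API also accepts the key in an `x-goog-api-key` request header. A minimal sketch of that variant follows; `build_request` is a hypothetical helper and is not how the bundled script is written.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com"


def build_request(path, api_key, payload=None, method="GET"):
    """Build a urllib Request that sends the key as a header, not in the URL."""
    headers = {"x-goog-api-key": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode()
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


req = build_request(
    "/v1beta/models/gemini-2.5-flash:generateContent",
    api_key="PLACEHOLDER_KEY",  # placeholder; never hard-code a real key
    payload={"contents": [{"parts": [{"text": "hello"}]}]},
    method="POST",
)
```

The request object is only constructed here; passing it to `urllib.request.urlopen` would send it the same way the script's existing calls do.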
### scripts/manage_files.py
```python
#!/usr/bin/env python3
"""
Manage files in Google Gemini Files API.
List, delete, and clean up uploaded video files.

Usage:
    python3 manage_files.py list            # List all uploaded files
    python3 manage_files.py cleanup         # Delete all uploaded files
    python3 manage_files.py delete <name>   # Delete a specific file
"""
import json
import os
import sys
import urllib.parse
import urllib.request

GOOGLE_API_KEY = os.environ.get("GOOGLE_AI_API_KEY", "")
BASE_URL = "https://generativelanguage.googleapis.com"


def fetch_all_files():
    """Return every uploaded file, following nextPageToken across pages."""
    files, page_token = [], ""
    while True:
        # The list endpoint is paginated; without this loop only the
        # first page of files would be listed or cleaned up.
        params = {"key": GOOGLE_API_KEY, "pageSize": "100"}
        if page_token:
            params["pageToken"] = page_token
        url = f"{BASE_URL}/v1beta/files?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            data = json.loads(resp.read())
        files.extend(data.get("files", []))
        page_token = data.get("nextPageToken", "")
        if not page_token:
            return files


def list_files():
    files = fetch_all_files()
    if not files:
        print("No files uploaded.")
        return
    for f in files:
        size = int(f.get("sizeBytes", 0))
        print(f"  {f.get('name', '?'):40s} {f.get('displayName', '?'):30s} {size:>12,} bytes  {f.get('state', '?')}")
    print(f"\nTotal: {len(files)} files")


def delete_file(name):
    req = urllib.request.Request(
        f"{BASE_URL}/v1beta/{name}?key={GOOGLE_API_KEY}",
        method="DELETE",
    )
    with urllib.request.urlopen(req):
        pass  # the response body is not needed
    print(f"Deleted: {name}")


def cleanup():
    files = fetch_all_files()
    if not files:
        print("No files to clean up.")
        return
    deleted = 0
    for f in files:
        try:
            delete_file(f["name"])
            deleted += 1
        except Exception as e:
            print(f"Failed to delete {f['name']}: {e}")
    # Report only the deletions that actually succeeded.
    print(f"\nCleaned up {deleted} of {len(files)} files.")


if __name__ == "__main__":
    if not GOOGLE_API_KEY:
        print("Error: Set GOOGLE_AI_API_KEY", file=sys.stderr)
        sys.exit(1)
    cmd = sys.argv[1] if len(sys.argv) > 1 else "list"
    if cmd == "list":
        list_files()
    elif cmd == "cleanup":
        cleanup()
    elif cmd == "delete" and len(sys.argv) > 2:
        delete_file(sys.argv[2])
    else:
        print("Usage: manage_files.py [list|cleanup|delete <name>]", file=sys.stderr)
        sys.exit(1)
```
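Before running `cleanup`, it can help to see how much of the 20GB project quota the uploaded files occupy. The sketch below works on the list that the files endpoint returns, using the `sizeBytes` field that `manage_files.py` already reads; `quota_usage` is a hypothetical helper, the sample records are made up, and "20GB" is assumed to mean 20 × 1024³ bytes.

```python
PROJECT_QUOTA_BYTES = 20 * 1024**3  # assumption: "20GB" read as 20 GiB


def quota_usage(files, quota_bytes=PROJECT_QUOTA_BYTES):
    """Return (used_bytes, fraction_of_quota) for a list of file records."""
    used = sum(int(f.get("sizeBytes", 0)) for f in files)
    return used, used / quota_bytes


# Made-up records shaped like entries in the "files" array of the list response.
sample = [
    {"name": "files/abc", "sizeBytes": "1073741824", "state": "ACTIVE"},
    {"name": "files/def", "sizeBytes": "536870912", "state": "ACTIVE"},
]
used, fraction = quota_usage(sample)
print(f"{used:,} bytes used ({fraction:.1%} of quota)")
```

A wrapper could run `cleanup` only when the fraction crosses a chosen threshold, rather than deleting every file after each analysis and losing the ability to reuse file URIs.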