
gemini-video-analyzer

Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 3,129.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: A (92.0).

Install command

npx @skill-hub/cli install openclaw-skills-a6-gemini-video-analyzer

Repository

openclaw/skills

Skill path: skills/aiwithabidi/a6-gemini-video-analyzer

Best for

Primary workflow: Write Technical Docs.

Technical facets: Full Stack, Frontend, Backend, Tech Writer.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

  • Install gemini-video-analyzer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding gemini-video-analyzer to shared team environments
  • Use gemini-video-analyzer for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: gemini-video-analyzer
description: |
  Native video analysis using Google Gemini API. Upload and analyze video files — describe scenes, extract text/UI, answer questions about content, transcribe speech, identify objects and actions. Use when: (1) User sends a video file and wants it analyzed, (2) Video summarization or description needed, (3) Extracting text, UI elements, or information from screen recordings, (4) Answering questions about video content, (5) Comparing multiple videos, (6) Analyzing tutorials, demos, or walkthroughs.
homepage: https://www.agxntsix.ai
metadata:
  {
    "openclaw":
      {
        "emoji": "🎬",
        "requires": { "bins": ["python3", "curl"], "env": ["GOOGLE_AI_API_KEY"] },
        "primaryEnv": "GOOGLE_AI_API_KEY",
      },
  }
---

# Gemini Video Analyzer

Analyze videos natively using Google Gemini's multimodal API. No local frame extraction is needed: Gemini samples the video at 1 FPS and interprets motion, audio, and visuals together.

## Quick Start

```bash
# Analyze a video with default prompt (full description)
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/analyze.py /path/to/video.mp4

# Ask a specific question
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/analyze.py /path/to/video.mp4 "What text is visible on screen?"

# Manage uploaded files
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/manage_files.py list
GOOGLE_AI_API_KEY=$GOOGLE_AI_API_KEY python3 {baseDir}/scripts/manage_files.py cleanup
```

## Supported Formats

MP4, AVI, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP — up to 2GB per file.
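
A pre-upload check can catch an unsupported file before any bytes are sent. The sketch below assumes the extension list and 2GB ceiling quoted above. The helper name `check_video` is hypothetical and not part of the skill's scripts, and reading 2GB as `2 * 1024**3` bytes is also an assumption.

```python
import os

# Extensions taken from the list above. The 2GB limit is interpreted
# here as 2 * 1024**3 bytes, which is an assumption.
SUPPORTED_EXTENSIONS = {
    ".mp4", ".avi", ".mov", ".mkv", ".webm",
    ".flv", ".mpeg", ".mpg", ".wmv", ".3gp",
}
MAX_BYTES = 2 * 1024**3


def check_video(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks uploadable.

    check_video is a hypothetical helper, not part of the skill's scripts.
    Pass size_bytes to skip the filesystem lookup (useful for dry runs).
    """
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_BYTES:
        problems.append(
            f"file is {size_bytes:,} bytes, over the {MAX_BYTES:,} byte limit"
        )
    return problems
```

Calling `check_video("/path/to/video.mp4")` before `analyze.py` would let a wrapper fail fast with a readable message instead of an HTTP error from the upload.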

## How It Works

1. The video uploads to Google's Files API (temporary storage, auto-deleted after 48h)
2. Gemini samples the video at 1 frame/sec and interprets motion, transitions, and audio context
3. The model generates a response based on your prompt

Because the model receives the video as one timed sequence with audio, this tends to capture temporal content better than analyzing separately extracted frames.
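
Because the uploaded file stays available for 48 hours, one upload can serve several prompts. The sketch below builds one `generateContent` request per prompt against the same file URI, mirroring the payload shape used in `scripts/analyze.py`. The helper name `build_request`, the example URI, and the `YOUR_KEY` placeholder are hypothetical. Nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com"


def build_request(file_uri, mime_type, prompt, api_key, model="gemini-2.5-flash"):
    """Build (but do not send) a generateContent request for an uploaded file.

    build_request is a hypothetical helper; the payload shape follows analyze.py.
    """
    payload = {
        "contents": [{
            "parts": [
                {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                {"text": prompt},
            ]
        }]
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1beta/models/{model}:generateContent?key={api_key}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URI: substitute the one returned by the upload step.
FILE_URI = "https://generativelanguage.googleapis.com/v1beta/files/EXAMPLE"
PROMPTS = ["Summarize the steps shown.", "What text is visible on screen?"]

# One request per prompt, all pointing at the same uploaded file.
pending = [
    build_request(FILE_URI, "video/mp4", p, api_key="YOUR_KEY") for p in PROMPTS
]
# Each entry can then be sent with urllib.request.urlopen(req, timeout=180).
```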

## Use Cases

| Task | Example Prompt |
|------|---------------|
| General description | *(default — no prompt needed)* |
| UI/text extraction | `"What text and UI elements are visible?"` |
| Tutorial summary | `"Summarize the steps shown in this tutorial"` |
| Bug report from video | `"Describe what went wrong in this screen recording"` |
| Meeting notes | `"Summarize the key points discussed"` |
| Content comparison | Upload 2 videos, ask for differences |
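
The comparison row implies a request that carries more than one video. `scripts/analyze.py` accepts a single path, so the sketch below only shows how such a payload might be assembled: one `file_data` part per uploaded video, followed by the text prompt. The helper name `build_comparison_payload` and the example URIs are hypothetical.

```python
import json


def build_comparison_payload(videos, prompt):
    """Assemble a generateContent body that references several uploaded videos.

    videos: list of (file_uri, mime_type) tuples from earlier uploads.
    build_comparison_payload is a hypothetical helper, not part of the skill.
    """
    if len(videos) < 2:
        raise ValueError("a comparison needs at least two videos")
    # One file_data part per video, in the order they should be referred to.
    parts = [
        {"file_data": {"mime_type": mime, "file_uri": uri}}
        for uri, mime in videos
    ]
    parts.append({"text": prompt})
    return {"contents": [{"parts": parts}]}


payload = build_comparison_payload(
    [
        ("https://example.invalid/files/before", "video/mp4"),  # placeholder URI
        ("https://example.invalid/files/after", "video/mp4"),   # placeholder URI
    ],
    "List the differences between the first and second video.",
)
print(json.dumps(payload, indent=2))
```

Referring to the videos by position in the prompt ("first", "second") is one simple way to keep the model's answer unambiguous.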

## Configuration

Set `GOOGLE_AI_API_KEY` in your environment or `.env` file. Get a free key at [aistudio.google.com](https://aistudio.google.com/apikey).

Default model: `gemini-2.5-flash` (fast, cheap, excellent vision). Override with `--model gemini-2.5-pro` for complex analysis.
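
The bundled scripts read the key from `os.environ` only, so a `.env` file helps only if something loads it first. A minimal stdlib sketch of such a loader follows. `parse_env` and `load_env_file` are hypothetical helpers, and they handle only simple `KEY=value` lines rather than the full dotenv syntax.

```python
import os


def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments.

    parse_env is a hypothetical helper covering only the basic dotenv form.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path=".env"):
    """Copy keys from a .env file into os.environ without overriding existing ones."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            os.environ.setdefault(key, value)


sample = "# local secrets\nGOOGLE_AI_API_KEY='abc123'\n\nOTHER=1\n"
print(parse_env(sample))
```

Using `setdefault` means a key already exported in the shell takes precedence over the file.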

## API Reference

See [references/gemini-files-api.md](references/gemini-files-api.md) for file upload limits, processing details, and advanced options.


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/gemini-files-api.md

```markdown
# Gemini Files API Reference

## Upload Limits
- Max file size: 2GB per video
- Project quota: 20GB total storage
- Storage duration: 48 hours (auto-deleted)
- Processing rate: 1 frame per second

## Supported Video Formats
MP4, AVI, MOV, MKV, WebM, FLV, MPEG, MPG, WMV, 3GP

## Processing
- Videos are processed server-side at 1 FPS
- Small videos (<100MB) can be sent inline
- Larger videos use resumable upload via Files API
- Same file URI can be reused across multiple prompts (within 48h)

## Models
| Model | Context | Cost (in/out per 1M) | Best For |
|-------|---------|---------------------|----------|
| gemini-2.5-flash | 1M tokens | $0.30/$2.50 | Fast, cheap, daily use |
| gemini-2.5-pro | 1M tokens | $1.25/$10.00 | Complex analysis |
| gemini-3-flash-preview | 1M tokens | $0.50/$3.00 | Latest vision |

## Token Usage
- Video: ~258 tokens per second of content
- 1 minute video ≈ 15,480 tokens
- 1 hour video ≈ 928,800 tokens (fits in 1M context)

## Tips
- Reuse file URIs to avoid re-uploading the same video
- Use `manage_files.py cleanup` to free quota when done
- For batch analysis, upload all videos first, then query

```
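
The token figures in the reference file are plain multiplication, which makes it easy to estimate input size and cost before uploading. The sketch below uses the ~258 tokens per second rate and the gemini-2.5-flash input price from the tables above. The function names are hypothetical, and the estimate ignores prompt and output tokens.

```python
TOKENS_PER_SECOND = 258      # approximate rate quoted in the reference file
CONTEXT_LIMIT = 1_000_000    # 1M-token context from the Models table


def estimate_video_tokens(duration_seconds):
    """Approximate input tokens for a video of the given length."""
    return round(duration_seconds * TOKENS_PER_SECOND)


def estimate_input_cost(duration_seconds, usd_per_million_input=0.30):
    """Approximate input cost in USD.

    The default price is the gemini-2.5-flash input rate quoted above.
    """
    tokens = estimate_video_tokens(duration_seconds)
    return tokens * usd_per_million_input / 1_000_000


for label, seconds in [("1 minute", 60), ("1 hour", 3600)]:
    tokens = estimate_video_tokens(seconds)
    fits = tokens <= CONTEXT_LIMIT
    print(
        f"{label}: {tokens:,} tokens, fits in context: {fits}, "
        f"input cost ~${estimate_input_cost(seconds):.4f}"
    )
```

By this estimate an hour of video leaves about 71,200 tokens for the prompt under a 1M-token limit, so longer recordings would need to be split.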



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "aiwithabidi",
  "slug": "a6-gemini-video-analyzer",
  "displayName": "Gemini Video Analyzer",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1771250411910,
    "commit": "https://github.com/openclaw/skills/commit/c5a173b2af3928a925ecde41a385f9a0e70f2099"
  },
  "history": []
}

```

### scripts/analyze.py

```python
#!/usr/bin/env python3
"""
Analyze video using Google Gemini API (native video understanding).
Uploads video to Gemini Files API, then queries the model.

Usage:
    python3 analyze.py /path/to/video.mp4 "What's happening?"
    python3 analyze.py /path/to/video.mp4  # default: full description
    python3 analyze.py /path/to/video.mp4 "prompt" --model gemini-2.5-pro
"""
import sys, os, json, time, mimetypes, argparse
import urllib.request, urllib.error

GOOGLE_API_KEY = os.environ.get("GOOGLE_AI_API_KEY", "")
DEFAULT_MODEL = "gemini-2.5-flash"
BASE_URL = "https://generativelanguage.googleapis.com"

DEFAULT_PROMPT = (
    "Describe what's happening in this video in detail. "
    "Include any text, UI elements, spoken words, or important visual information."
)


def upload_file(filepath):
    """Upload video to Gemini Files API (resumable upload)."""
    filesize = os.path.getsize(filepath)
    mime_type = mimetypes.guess_type(filepath)[0] or "video/mp4"
    display_name = os.path.basename(filepath)

    # Initiate resumable upload
    headers = {
        "X-Goog-Upload-Protocol": "resumable",
        "X-Goog-Upload-Command": "start",
        "X-Goog-Upload-Header-Content-Length": str(filesize),
        "X-Goog-Upload-Header-Content-Type": mime_type,
        "Content-Type": "application/json",
    }
    metadata = json.dumps({"file": {"display_name": display_name}}).encode()

    req = urllib.request.Request(
        f"{BASE_URL}/upload/v1beta/files?key={GOOGLE_API_KEY}",
        data=metadata, headers=headers, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        upload_url = resp.headers.get("X-Goog-Upload-URL")

    if not upload_url:
        raise Exception("Failed to get upload URL")

    # Upload bytes. The open file object is passed as the request body so that
    # large videos are streamed from disk rather than read fully into memory.
    with open(filepath, "rb") as f:
        req2 = urllib.request.Request(
            upload_url, data=f,
            headers={
                "X-Goog-Upload-Offset": "0",
                "X-Goog-Upload-Command": "upload, finalize",
                "Content-Length": str(filesize),
            },
            method="PUT"
        )
        with urllib.request.urlopen(req2) as resp:
            result = json.loads(resp.read())

    file_uri = result.get("file", {}).get("uri", "")
    file_name = result.get("file", {}).get("name", "")
    state = result.get("file", {}).get("state", "")

    print(f"[video] Uploaded: {display_name} ({filesize:,} bytes)", file=sys.stderr)
    print(f"[video] State: {state}", file=sys.stderr)

    # Wait for processing if needed (poll every 5 s, up to 10 minutes)
    if state == "PROCESSING":
        print("[video] Processing...", file=sys.stderr)
        for _ in range(120):
            time.sleep(5)
            check_req = urllib.request.Request(
                f"{BASE_URL}/v1beta/{file_name}?key={GOOGLE_API_KEY}"
            )
            with urllib.request.urlopen(check_req) as resp:
                status = json.loads(resp.read())
            state = status.get("state", "")
            if state == "ACTIVE":
                print("[video] Ready.", file=sys.stderr)
                break
            elif state == "FAILED":
                raise Exception(f"Processing failed: {json.dumps(status)}")
        else:
            # The loop ended without the file becoming ACTIVE
            raise TimeoutError("Video processing did not finish within 10 minutes")

    return file_uri, mime_type, file_name


def analyze(file_uri, mime_type, prompt, model=DEFAULT_MODEL):
    """Send video to Gemini for analysis."""
    payload = {
        "contents": [{
            "parts": [
                {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                {"text": prompt}
            ]
        }],
        "generationConfig": {"temperature": 0.4, "maxOutputTokens": 8192}
    }

    req = urllib.request.Request(
        f"{BASE_URL}/v1beta/models/{model}:generateContent?key={GOOGLE_API_KEY}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST"
    )

    with urllib.request.urlopen(req, timeout=180) as resp:
        result = json.loads(resp.read())

    candidates = result.get("candidates", [])
    if candidates:
        parts = candidates[0].get("content", {}).get("parts", [])
        return "\n".join(p.get("text", "") for p in parts if "text" in p)

    return f"No response. Raw: {json.dumps(result)}"


def main():
    parser = argparse.ArgumentParser(description="Analyze video with Gemini")
    parser.add_argument("video", help="Path to video file")
    parser.add_argument("prompt", nargs="?", default=DEFAULT_PROMPT, help="Question about the video")
    parser.add_argument("--model", default=DEFAULT_MODEL, help=f"Gemini model (default: {DEFAULT_MODEL})")
    parser.add_argument("--json", action="store_true", help="Output as JSON")
    args = parser.parse_args()

    if not GOOGLE_API_KEY:
        print("Error: Set GOOGLE_AI_API_KEY environment variable", file=sys.stderr)
        sys.exit(1)

    if not os.path.exists(args.video):
        print(f"Error: File not found: {args.video}", file=sys.stderr)
        sys.exit(1)

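    try:
        file_uri, mime_type, file_name = upload_file(args.video)
        result = analyze(file_uri, mime_type, args.prompt, args.model)
    except urllib.error.HTTPError as e:
        # Show the API's error body instead of a bare traceback
        print(f"Error: HTTP {e.code}: {e.read().decode(errors='replace')}", file=sys.stderr)
        sys.exit(1)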

    if args.json:
        print(json.dumps({"model": args.model, "prompt": args.prompt, "response": result}))
    else:
        print(result)


if __name__ == "__main__":
    main()

```

### scripts/manage_files.py

```python
#!/usr/bin/env python3
"""
Manage files in Google Gemini Files API.
List, inspect, and clean up uploaded video files.

Usage:
    python3 manage_files.py list          # List all uploaded files
    python3 manage_files.py cleanup       # Delete all uploaded files
    python3 manage_files.py delete <name> # Delete a specific file
"""
import sys, os, json
import urllib.parse
import urllib.request

GOOGLE_API_KEY = os.environ.get("GOOGLE_AI_API_KEY", "")
BASE_URL = "https://generativelanguage.googleapis.com"


def fetch_files():
    """Return all uploaded files, following nextPageToken across pages."""
    files, page_token = [], ""
    while True:
        params = {"key": GOOGLE_API_KEY, "pageSize": 100}
        if page_token:
            params["pageToken"] = page_token
        req = urllib.request.Request(
            f"{BASE_URL}/v1beta/files?{urllib.parse.urlencode(params)}"
        )
        with urllib.request.urlopen(req) as resp:
            data = json.loads(resp.read())
        files.extend(data.get("files", []))
        page_token = data.get("nextPageToken", "")
        if not page_token:
            return files


def list_files():
    files = fetch_files()
    if not files:
        print("No files uploaded.")
        return
    for f in files:
        size = int(f.get("sizeBytes", 0))
        print(f"  {f.get('name', '?'):40s}  {f.get('displayName', '?'):30s}  {size:>12,} bytes  {f.get('state', '?')}")
    print(f"\nTotal: {len(files)} files")


def delete_file(name):
    req = urllib.request.Request(
        f"{BASE_URL}/v1beta/{name}?key={GOOGLE_API_KEY}",
        method="DELETE"
    )
    # Close the response explicitly; the body is not needed
    with urllib.request.urlopen(req):
        pass
    print(f"Deleted: {name}")


def cleanup():
    files = fetch_files()
    if not files:
        print("No files to clean up.")
        return
    deleted = 0
    for f in files:
        try:
            delete_file(f["name"])
            deleted += 1
        except Exception as e:
            print(f"Failed to delete {f['name']}: {e}")
    # Report only the deletions that succeeded
    print(f"\nCleaned up {deleted} of {len(files)} files.")


if __name__ == "__main__":
    if not GOOGLE_API_KEY:
        print("Error: Set GOOGLE_AI_API_KEY", file=sys.stderr)
        sys.exit(1)

    cmd = sys.argv[1] if len(sys.argv) > 1 else "list"
    if cmd == "list":
        list_files()
    elif cmd == "cleanup":
        cleanup()
    elif cmd == "delete" and len(sys.argv) > 2:
        delete_file(sys.argv[2])
    else:
        print("Usage: manage_files.py [list|cleanup|delete <name>]")

```