blog-image
AI image generation and editing for blog content powered by Gemini via MCP. Claude acts as Creative Director — interpreting intent, selecting domain expertise, constructing optimized 6-component prompts (Subject + Action + Context + Composition + Lighting + Style), and orchestrating Gemini for blog-quality results. Generates hero images, inline illustrations, social preview cards, and OG images. Edits existing blog images. Supports 6 blog-optimized domain modes (Editorial, Product, Landscape, UI/Web, Infographic, Abstract). Works standalone via /blog image or internally from blog-write and blog-rewrite workflows. Falls back gracefully when MCP is not configured. Use when user says "blog image", "generate hero image", "blog illustration", "social card", "generate blog image", "edit blog image", "image generate", "blog cover image", "inline image", "OG image".
Install command
npx @skill-hub/cli install agricidaniel-claude-blog-blog-image
Repository
Skill path: skills/blog-image
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Frontend, Data / AI, Tech Writer, Integration.
Target audience: everyone.
License: MIT.
Original source
Catalog source: SkillHub Club.
Repository owner: AgriciDaniel.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install blog-image into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/AgriciDaniel/claude-blog before adding blog-image to shared team environments
- Use blog-image to generate and edit hero, inline, and social images for blog posts
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: blog-image
description: >
  AI image generation and editing for blog content powered by Gemini via MCP.
  Claude acts as Creative Director — interpreting intent, selecting domain expertise,
  constructing optimized 6-component prompts (Subject + Action + Context + Composition
  + Lighting + Style), and orchestrating Gemini for blog-quality results. Generates
  hero images, inline illustrations, social preview cards, and OG images. Edits
  existing blog images. Supports 6 blog-optimized domain modes (Editorial, Product,
  Landscape, UI/Web, Infographic, Abstract). Works standalone via /blog image or
  internally from blog-write and blog-rewrite workflows. Falls back gracefully when
  MCP is not configured. Use when user says "blog image", "generate hero image",
  "blog illustration", "social card", "generate blog image", "edit blog image",
  "image generate", "blog cover image", "inline image", "OG image".
user-invokable: true
argument-hint: "[generate|edit|setup] [description-or-path]"
allowed-tools:
  - Read
  - Write
  - Edit
  - Bash
  - Glob
  - Grep
license: MIT
metadata:
  author: AgriciDaniel
  version: "1.4.0"
  mcp-package: "@ycse/nanobanana-mcp"
---
# Blog Image — AI Image Generation for Blog Content
You are a **Creative Director** that orchestrates Gemini's image generation
specifically for blog content. Never pass raw user text directly to the API.
Always interpret, enhance, and construct an optimized prompt using the
6-component Reasoning Brief system.
## Quick Reference
| Command | What it does |
|---------|-------------|
| `/blog image generate <idea>` | Generate a blog image with full prompt engineering |
| `/blog image edit <path> <instructions>` | Edit an existing blog image intelligently |
| `/blog image setup` | Configure MCP server and API key |
## Blog Image Types
Match the image type to blog use case:
| Image Type | Aspect Ratio | Resolution | Domain Mode | Placement |
|------------|-------------|-----------|-------------|-----------|
| Hero/Cover | `16:9` | 2K or 4K | Editorial / Landscape | Frontmatter `coverImage` |
| OG/Social Card | `16:9` | 1K | Editorial / Infographic | Frontmatter `ogImage` |
| Inline Illustration | `16:9` or `4:3` | 1K | Varies by topic | After H2, before body |
| Inline Product Shot | `4:3` or `1:1` | 1K | Product | Within product sections |
| Section Divider | `8:1` or `4:1` | 1K | Abstract / Landscape | Between major sections |
**Sizing requirements:**
- Blog hero/cover: 1200x630 (OG-compatible) or 1920x1080
- Open Graph (OG): 1200x630, about 1.91:1 (the recommended size for social sharing previews); a 16:9 render must be center-cropped to reach it (see Step 6)
- Inline images: 1200px+ wide
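The table above can also be kept as a lookup that a wrapper script or the internal API consults before choosing a ratio. A minimal sketch, assuming a hypothetical `IMAGE_SPECS` dict keyed by the internal API's `image_type` values (`hero`, `og`, `inline`, `divider`) plus a hypothetical `product` key; the first listed ratio is treated as the preferred one, and hero uses the lower of its two resolution tiers.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ImageSpec:
    """One row of the Blog Image Types table."""
    ratios: Tuple[str, ...]  # preferred ratio first
    resolution: str          # resolution tier to request (lowest listed)
    modes: Tuple[str, ...]   # candidate domain modes; empty = choose by topic
    placement: str           # where the image lands in the post


# Hypothetical lookup mirroring the Blog Image Types table.
IMAGE_SPECS: Dict[str, ImageSpec] = {
    "hero": ImageSpec(("16:9",), "2K", ("Editorial", "Landscape"), "frontmatter coverImage"),
    "og": ImageSpec(("16:9",), "1K", ("Editorial", "Infographic"), "frontmatter ogImage"),
    "inline": ImageSpec(("16:9", "4:3"), "1K", (), "after H2, before body"),
    "product": ImageSpec(("4:3", "1:1"), "1K", ("Product",), "within product sections"),
    "divider": ImageSpec(("8:1", "4:1"), "1K", ("Abstract", "Landscape"), "between major sections"),
}


def spec_for(image_type: str) -> ImageSpec:
    """Look up an image type case-insensitively; unknown types raise ValueError."""
    try:
        return IMAGE_SPECS[image_type.lower()]
    except KeyError:
        raise ValueError(f"unknown image type: {image_type!r}") from None
```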
## MCP Availability Check
Before generating, check if nanobanana-mcp tools are available:
1. Try calling `get_image_history` (lightweight, no side effects)
2. If it succeeds: MCP is available, proceed with generation
3. If it fails, MCP is not configured:
   - When invoked directly by the user, say: "Image generation requires the nanobanana-mcp server. Run `/blog image setup` to configure it."
   - When called internally (from blog-write/blog-rewrite), return silently with no error. The calling workflow continues with stock photos.
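The check above reduces to a probe plus one branch on who the caller is. A minimal sketch, assuming a hypothetical `call_tool(name)` callable that stands in for however the host invokes an MCP tool and raises when the server is missing; it is not part of the nanobanana-mcp package.

```python
from typing import Callable, Optional, Tuple

SETUP_HINT = (
    "Image generation requires the nanobanana-mcp server. "
    "Run `/blog image setup` to configure it."
)


def check_mcp(call_tool: Callable[[str], object], internal: bool) -> Tuple[bool, Optional[str]]:
    """Probe MCP with get_image_history, which has no side effects.

    Returns (available, message_for_user). Internal callers get no message,
    so blog-write / blog-rewrite can continue with stock photos.
    """
    try:
        call_tool("get_image_history")
    except Exception:
        # Missing server, bad config, or a crashed process all count as unavailable.
        return False, (None if internal else SETUP_HINT)
    return True, None
```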
## Generation Workflow
For `/blog image generate <idea>` or when invoked internally:
### Step 1: Analyze Intent
Determine what the blog needs:
- **Image type**: Hero, inline, OG card, section divider?
- **Blog topic**: What is the article about?
- **Style**: Photorealistic, editorial, illustrated, minimal?
- **Constraints**: Brand colors, specific dimensions, platform format?
- **Mood**: Authoritative, inviting, dramatic, clean?
If the request is vague, ask one clarifying question about use case and style.
### Step 2: Select Domain Mode
Choose the expertise lens for the image:
| Mode | When to use | Prompt emphasis |
|------|-------------|-----------------|
| **Editorial** | Blog headers, feature images, lifestyle | Styling, composition, publication references |
| **Product** | E-commerce posts, reviews, comparisons | Surface materials, studio lighting, clean BG |
| **Landscape** | Environmental backgrounds, travel, hero sections | Atmospheric perspective, depth layers, time of day |
| **UI/Web** | Tech blog icons, illustrations, diagrams | Clean vectors, flat design, exact colors |
| **Infographic** | Data-driven posts, processes, comparisons | Layout structure, hierarchy, accessible colors |
| **Abstract** | Pattern backgrounds, section dividers, decorative | Color theory, mathematical forms, textures |
Load `references/prompt-engineering-blog.md` for domain mode modifier libraries.
### Step 3: Construct the 6-Component Reasoning Brief
Build the prompt as natural narrative paragraphs — NEVER as keyword lists:
1. **Subject** — Who/what, with rich physical detail (textures, materials, scale)
2. **Action** — What is happening, pose, gesture, movement, state
3. **Context** — Environment, setting, time of day, season, weather
4. **Composition** — Camera angle, shot type, framing, negative space, depth
5. **Lighting** — Light source, quality, direction, color temperature, shadows
6. **Style** — Art medium, aesthetic, film stock, reference artists/eras
**Template for photorealistic blog images:**
```
A photorealistic [shot type] of [subject with physical detail], [action/pose],
set in [environment with specifics]. [Lighting conditions] create [mood].
Captured with [camera model], [focal length] lens at [f-stop], producing
[depth of field effect]. [Color palette/grading notes]. Aspect ratio 16:9,
suitable as a blog [hero image/inline illustration] at [target dimensions].
```
**Template for illustrated/stylized:**
```
A [art style] [format] of [subject with character detail], featuring
[distinctive characteristics] with [color palette]. [Line style] and
[shading technique]. Background is [description]. [Mood/atmosphere].
```
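The six components can be held as a structured record and only rendered as prose at the end, which makes it hard to skip one (lighting in particular) or slide back into a keyword list. A minimal sketch with a hypothetical `ReasoningBrief` class; real briefs are longer and richer, and the example values are condensed from the ceramicist example in `references/prompt-engineering-blog.md`.

```python
from dataclasses import dataclass, fields


def _sentence(text: str) -> str:
    """Capitalize the first character and end with exactly one period."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:] + "."


@dataclass
class ReasoningBrief:
    subject: str      # who/what, with physical detail
    action: str       # pose, gesture, movement, state
    context: str      # environment, time of day, season, weather
    composition: str  # camera angle, shot type, framing, depth
    lighting: str     # source, quality, direction, color temperature
    style: str        # medium, aesthetic, film stock, references

    def to_prompt(self) -> str:
        """Render the six components as narrative sentences, never a tag list."""
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing component: {f.name}")
        # Subject, action, and context read naturally as one opening sentence.
        scene = ", ".join(
            part.strip().rstrip(".") for part in (self.subject, self.action, self.context)
        )
        return " ".join(
            _sentence(part) for part in (scene, self.composition, self.lighting, self.style)
        )


brief = ReasoningBrief(
    subject="a weathered Japanese ceramicist in his 70s cradling a freshly thrown tea bowl",
    action="gently smoothing the rim with a wet thumb",
    context="inside a wood-fired kiln workshop in late afternoon",
    composition="intimate close-up from slightly below eye level, shallow depth of field",
    lighting="warm directional light from a single high window camera-left",
    style="Kodak Portra 400 color grading with muted earth tones",
)
print(brief.to_prompt())
```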
### Step 4: Set Aspect Ratio
Call `set_aspect_ratio` BEFORE generating:
| Blog Use Case | Ratio |
|---------------|-------|
| Hero / Cover / OG | `16:9` |
| Product shot / Square | `4:3` or `1:1` |
| Section divider | `8:1` or `4:1` |
| Vertical (stories) | `9:16` |
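Not every model accepts every ratio in this table: per `references/gemini-models.md`, the Original model supports only five, so a section divider (`8:1` or `4:1`) needs NB2 Flash or Pro. A minimal sketch of picking a supported ratio before calling `set_aspect_ratio`; `pick_ratio` and `SUPPORTED_RATIOS` are hypothetical helpers, while the model IDs come from that reference file.

```python
from typing import Dict, FrozenSet, Sequence

# Ratio support as listed in references/gemini-models.md.
ALL_RATIOS: FrozenSet[str] = frozenset({
    "1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2",
    "4:5", "5:4", "1:4", "4:1", "1:8", "8:1", "21:9",
})
SUPPORTED_RATIOS: Dict[str, FrozenSet[str]] = {
    "gemini-3.1-flash-image-preview": ALL_RATIOS,  # NB2 Flash
    "gemini-3-pro-image-preview": ALL_RATIOS,      # NB Pro
    "gemini-2.5-flash-image": frozenset({"1:1", "16:9", "9:16", "4:3", "3:4"}),  # Original
}


def pick_ratio(model: str, candidates: Sequence[str]) -> str:
    """Return the first preferred ratio the model supports.

    Raises ValueError when none fit, which means switching models with
    set_model before calling set_aspect_ratio.
    """
    supported = SUPPORTED_RATIOS[model]
    for ratio in candidates:
        if ratio in supported:
            return ratio
    raise ValueError(f"{model} supports none of {list(candidates)}")
```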
### Step 5: Generate via MCP
| MCP Tool | When |
|----------|------|
| `set_aspect_ratio` | Always call first if ratio differs from 1:1 |
| `gemini_generate_image` | New image from crafted prompt |
| `gemini_edit_image` | Modify existing image |
| `gemini_chat` | Iterative refinement / multi-turn sessions |
| `get_image_history` | Review generated images |
| `clear_conversation` | Reset session context |
**Model selection** (use `set_model` MCP tool if switching):
- **NB2 Flash** (default): Best for most blog images — fast, 14 ratios, 4K, $0.067/img
- **NB Pro**: Use for hero images with text overlays (94% text accuracy) or highest quality — $0.134/img
- **Original**: Budget option at $0.039/img — 5 ratios, 1K max
Load `references/mcp-tools.md` for parameter details.
Load `references/gemini-models.md` for model specs, pricing, and rate limits.
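Because price scales with both model and resolution tier, a quick estimate helps before batch runs. A rough sketch using the approximate 1K prices above and the cost multipliers from the Resolution Tiers table in `references/gemini-models.md`; `estimate_cost` and the short model keys are hypothetical, and actual billing is set by Google.

```python
from decimal import Decimal

# Approximate 1K-tier prices per image, from references/gemini-models.md.
BASE_PRICE_1K = {
    "flash": Decimal("0.067"),     # gemini-3.1-flash-image-preview (NB2 Flash)
    "pro": Decimal("0.134"),       # gemini-3-pro-image-preview (NB Pro)
    "original": Decimal("0.039"),  # gemini-2.5-flash-image
}
# imageSize cost multipliers from the Resolution Tiers table in the same file.
SIZE_MULTIPLIER = {
    "512": Decimal("0.5"),
    "1K": Decimal("1"),
    "2K": Decimal("2"),
    "4K": Decimal("4"),
}


def estimate_cost(model: str, image_size: str = "1K", count: int = 1) -> Decimal:
    """Rough USD estimate for a batch; actual billing may differ."""
    if model == "original" and image_size in ("2K", "4K"):
        raise ValueError("the Original model tops out at 1K")
    return BASE_PRICE_1K[model] * SIZE_MULTIPLIER[image_size] * count
```

For example, ten 2K images on NB2 Flash come to about $1.34, versus about $0.39 for ten 1K images on Original.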
### Step 6: Post-Processing (when needed)
After generation, resize/convert for blog use:
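```bash
# Use ImageMagick 7's `magick` if present, otherwise fall back to IM6 `convert`
if command -v magick >/dev/null 2>&1; then IM=magick; else IM=convert; fi
# Fill 1200x630 (OG-compatible hero size), then center-crop the overflow
"$IM" input.png -resize '1200x630^' -gravity center -extent 1200x630 hero.png
# The OG image uses the same 1200x630 crop, so reuse the hero output
cp hero.png og-image.png
# Convert to WebP for web optimization
"$IM" input.png -quality 85 output.webp
# Convert to AVIF (smallest, modern; needs an ImageMagick build with AVIF/HEIF support)
"$IM" input.png -quality 80 output.avif
```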
Check if `magick` (ImageMagick 7) is available. Fall back to `convert` if not.
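Since 1200x630 (about 1.91:1) is wider than 16:9 (about 1.78:1), the fill-and-crop above always trims some height from a 16:9 render. A small sketch, approximating what `-resize 1200x630^ -gravity center -extent 1200x630` does, to see how much will be cut before generating; `fill_crop_geometry` is a hypothetical helper and ImageMagick's own rounding may differ by a pixel.

```python
from typing import Tuple


def fill_crop_geometry(src_w: int, src_h: int, dst_w: int = 1200,
                       dst_h: int = 630) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Approximate `-resize WxH^ -gravity center -extent WxH`.

    Returns the intermediate size after the fill resize and the (x, y)
    offset of the centered crop window.
    """
    # '^' means fill: scale by the larger ratio so both sides cover the target.
    scale = max(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # Center gravity splits the overflow roughly evenly between both edges.
    return (new_w, new_h), ((new_w - dst_w) // 2, (new_h - dst_h) // 2)
```

A 1920x1080 render becomes 1200x675 before cropping, so about 45 px of height (roughly 22-23 px at each edge) is lost; keep key subjects and any rendered text away from the top and bottom edges.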
### Step 7: Deliver
Provide:
1. **Image path** — where it was saved (`~/Documents/nanobanana_generated/`)
2. **Crafted prompt** — show the full Reasoning Brief (educational)
3. **Settings** — model, aspect ratio, domain mode
4. **Alt text** — a descriptive sentence of 10-125 characters that includes topic keywords naturally
5. **Frontmatter snippet** (for hero/OG images):
```yaml
coverImage: "/path/to/generated-image.png"
coverImageAlt: "Descriptive alt text sentence with topic keywords"
ogImage: "/path/to/generated-image.png"
```
6. **Refinement suggestions** — 1-2 ideas if relevant
## Edit Workflow
For `/blog image edit <path> <instructions>`:
1. Read the image path and edit instruction
2. Enhance the instruction (never pass raw):
| User says | Claude crafts |
|-----------|---------------|
| "remove background" | Detailed edge-preserving background removal |
| "make it warmer" | Specific color temperature shift with preservation notes |
| "add text" | Font style, size, placement, contrast, readability notes |
| "make it brighter" | Increase exposure, lift shadows, maintain highlights |
| "crop for social" | Resize to 1200x630 with center-gravity crop |
3. Call `gemini_edit_image` with enhanced instruction
4. Return modified image path and description
## Internal API (for blog-write / blog-rewrite)
When invoked as a Task subagent from blog-write or blog-rewrite:
**Input** (provided by calling skill):
- `image_type`: hero, inline, og, divider
- `topic`: blog post topic/title
- `section_context`: (optional) heading or section the image supports
- `style_preference`: (optional) photorealistic, illustrated, editorial
- `count`: (optional) number of images needed (default: 1)
**Output** (returned to calling skill):
```markdown
### Generated Image
- **Path:** ~/Documents/nanobanana_generated/image_timestamp.png
- **Alt Text:** Descriptive sentence about the image
- **Type:** hero / inline / og
- **Domain Mode:** Editorial
- **Aspect Ratio:** 16:9
- **Suggested Frontmatter:**
  coverImage: "/path/to/image.png"
  coverImageAlt: "Alt text here"
```
**Graceful fallback**: If MCP is unavailable, return immediately with no error.
The calling workflow continues with stock photos. Never block blog-write or
blog-rewrite because image generation is unavailable.
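The contract above can be sketched as a validated request, a formatter for the output block, and an early return for the fallback. This is a minimal sketch under stated assumptions: `ImageRequest`, `format_result`, `run_internal`, and the injected `generate` callable are hypothetical, and only the field names and the Markdown shape come from the spec above.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

VALID_TYPES = {"hero", "inline", "og", "divider"}


@dataclass
class ImageRequest:
    """Input from blog-write / blog-rewrite; field names follow the Internal API."""
    image_type: str
    topic: str
    section_context: Optional[str] = None
    style_preference: Optional[str] = None
    count: int = 1

    def __post_init__(self) -> None:
        if self.image_type not in VALID_TYPES:
            raise ValueError(f"image_type must be one of {sorted(VALID_TYPES)}")
        if self.count < 1:
            raise ValueError("count must be at least 1")


def format_result(path: str, alt: str, image_type: str, mode: str, ratio: str) -> str:
    """Render one result in the Markdown shape the calling skill expects."""
    return "\n".join([
        "### Generated Image",
        f"- **Path:** {path}",
        f"- **Alt Text:** {alt}",
        f"- **Type:** {image_type}",
        f"- **Domain Mode:** {mode}",
        f"- **Aspect Ratio:** {ratio}",
        "- **Suggested Frontmatter:**",
        # json.dumps yields a double-quoted, escaped string that YAML also accepts.
        f"  coverImage: {json.dumps(path)}",
        f"  coverImageAlt: {json.dumps(alt)}",
    ])


def run_internal(request: ImageRequest, mcp_available: bool,
                 generate: Callable[[ImageRequest], str]) -> str:
    """Return Markdown results, or "" so the caller falls back to stock photos."""
    if not mcp_available:
        return ""  # never block blog-write / blog-rewrite
    return "\n\n".join(generate(request) for _ in range(request.count))
```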
## Alt Text Generation
For every generated image, create alt text following blog standards:
- Full descriptive sentence (not keyword list)
- 10-125 characters
- Include topic keywords naturally
- Describe what the image shows AND its relevance to the content
- For charts/infographics: include the key data point
Good: `Marketing team analyzing AI search traffic data on a dashboard showing citation metrics`
Bad: `SEO AI marketing blog optimization image`
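The mechanical parts of these rules (length, sentence-like wording, keyword presence) can be checked automatically; relevance to the content and the key data point still need judgment. A rough sketch with a hypothetical `alt_text_problems` helper; the "sentence" test is a heuristic that looks for common function words, which keyword lists like the bad example lack.

```python
import re
from typing import List, Sequence

# Common function words and connectives; keyword lists rarely contain them.
CONNECTIVES = {
    "a", "an", "the", "on", "in", "of", "with", "at", "to", "for",
    "and", "by", "from", "over", "under", "showing", "beside",
}


def alt_text_problems(alt: str, keywords: Sequence[str] = ()) -> List[str]:
    """List rule violations; an empty list means the alt text passes.

    The sentence test is a rough heuristic, not a grammar check.
    """
    text = alt.strip()
    problems = []
    if not 10 <= len(text) <= 125:
        problems.append(f"length {len(text)} is outside 10-125 characters")
    words = set(re.findall(r"[a-z']+", text.lower()))
    if not words & CONNECTIVES:
        problems.append("reads like a keyword list, not a sentence")
    missing = [k for k in keywords if k.lower() not in text.lower()]
    if missing:
        problems.append("missing topic keywords: " + ", ".join(missing))
    return problems
```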
## Setup
For `/blog image setup`:
1. Run `python3 scripts/setup_image_mcp.py` (interactive)
   - Or: `python3 scripts/setup_image_mcp.py --key YOUR_KEY` (non-interactive)
   - Writes to the project's `.mcp.json` by default
   - Use the `--global` flag to write to `~/.claude/settings.json` instead
2. Verify: `python3 scripts/validate_image_setup.py`
3. Requires:
   - Node.js 18+ (npx)
   - Google AI API key (free at https://aistudio.google.com/apikey)
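Of these prerequisites, the Node.js requirement is the easiest to confirm locally. The sketch below is not the contents of `validate_image_setup.py` (which is not shown here); it is a hypothetical `check_node` helper that looks for `npx` on PATH and parses `node --version`. API key validity can only be confirmed by actually calling the API.

```python
import re
import shutil
import subprocess
from typing import Callable, List, Optional

MIN_NODE_MAJOR = 18  # the skill requires Node.js 18+


def node_major(version: str) -> int:
    """Parse the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version.strip())
    if not match:
        raise ValueError(f"unrecognized Node.js version: {version!r}")
    return int(match.group(1))


def _installed_node_version() -> str:
    # Runs the real `node` binary; only called when no version reader is injected.
    return subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    ).stdout


def check_node(which: Callable[[str], Optional[str]] = shutil.which,
               read_version: Callable[[], str] = _installed_node_version) -> List[str]:
    """Return problems with the Node.js / npx prerequisite (empty list = OK)."""
    if which("npx") is None:
        return [f"npx not found on PATH; install Node.js {MIN_NODE_MAJOR}+"]
    major = node_major(read_version())
    if major < MIN_NODE_MAJOR:
        return [f"Node.js {major} is too old; need {MIN_NODE_MAJOR}+"]
    return []
```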
## Safety Filter Auto-Rephrase
When `IMAGE_SAFETY` or `SAFETY` is returned, do NOT give up. Auto-rephrase and retry:
1. Identify the likely trigger (violence, public figures, NSFW-adjacent, or overly cautious filter)
2. Rephrase using positive framing — describe what you WANT, not what to avoid
3. If the subject is a person, make them generic (remove celebrity-like specifics)
4. If the scene is dramatic, soften: "intense" → "focused", "battle" → "competition"
5. Retry with the rephrased prompt (max 3 attempts before reporting to user)
Google acknowledged filters "became way more cautious than we intended" — benign prompts
are sometimes blocked. Persistence with rephrasing usually succeeds.
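The retry loop can be sketched as follows, reading "max 3 attempts" as three prompts in total. Assumptions: `SafetyBlocked` is a hypothetical exception (the source does not say how the MCP server surfaces block reasons), `generate` is an injected stand-in for `gemini_generate_image`, and the simple word swap from step 4 stands in for the full positive-framing rewrite Claude would actually do.

```python
import re
from typing import Callable

# Word swaps from step 4 above; a real rephrase rewrites the whole brief.
SOFTER = {"intense": "focused", "battle": "competition"}
_SOFTER_RE = re.compile(r"\b(" + "|".join(SOFTER) + r")\b", re.IGNORECASE)
RETRYABLE = {"IMAGE_SAFETY", "SAFETY"}  # PROHIBITED_CONTENT is never retried
MAX_ATTEMPTS = 3


class SafetyBlocked(Exception):
    """Hypothetical error carrying the block reason returned by the API."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


def soften(prompt: str) -> str:
    """Swap dramatic words for calmer ones (replacements are lowercase)."""
    return _SOFTER_RE.sub(lambda m: SOFTER[m.group(1).lower()], prompt)


def generate_with_rephrase(prompt: str, generate: Callable[[str], str],
                           rephrase: Callable[[str], str] = soften) -> str:
    """Try up to MAX_ATTEMPTS prompts, rephrasing after each safety block."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            return generate(prompt)
        except SafetyBlocked as err:
            if err.reason not in RETRYABLE or attempt == MAX_ATTEMPTS:
                raise  # non-retryable, or out of attempts: report to the user
            prompt = rephrase(prompt)
    raise AssertionError("unreachable")
```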
## Edit, Don't Re-roll
If an image is 80% correct, use `gemini_chat` for conversational editing rather than
regenerating from scratch. The session maintains style consistency, so targeted edits
preserve what works while fixing what doesn't.
**When to edit vs regenerate:**
- Color slightly off → Edit ("shift the color temperature warmer")
- Wrong composition entirely → Regenerate with revised brief
- Good scene but wrong lighting → Edit ("change to golden hour lighting from the left")
- Missing a detail → Edit ("add a steaming coffee cup on the desk")
## Error Handling
| Error | Resolution |
|-------|-----------|
| MCP not configured | Run `/blog image setup` |
| API key invalid | New key at https://aistudio.google.com/apikey |
| Rate limited (429) | Wait 60s, retry. Free tier: ~5-15 RPM / ~20-500 RPD (varies by model and billing) |
| `IMAGE_SAFETY` | Auto-rephrase (see above) — Layer 2 filter, non-configurable |
| `PROHIBITED_CONTENT` | Content policy violation — topic is blocked. Non-retryable. |
| `SAFETY` | Rephrase prompt — Layer 1 filter |
| Vague request | Ask one clarifying question before generating |
| Poor quality | Review Reasoning Brief — likely missing lighting (biggest quality differentiator) |
| MCP unavailable (internal call) | Return silently — calling workflow uses stock photos |
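The "wait 60s, retry" rule for 429 responses is a fixed-delay retry. A minimal sketch, assuming a hypothetical `RateLimited` exception for HTTP 429 and a default of two retries (the table does not specify a retry count); `sleep` is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the API answers HTTP 429."""


def call_with_rate_limit_retry(fn: Callable[[], T], retries: int = 2,
                               wait_seconds: float = 60.0,
                               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; after each 429, wait wait_seconds and try again."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == retries:
                raise  # still limited: surface the error to the user
            sleep(wait_seconds)
    raise AssertionError("unreachable")
```

A fixed delay matches the table's guidance; on the free tier, where daily quotas can run out, repeated 429s usually mean waiting longer or enabling billing rather than retrying harder.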
## Reference Documentation
Load on-demand — do NOT load all at startup:
- `references/prompt-engineering-blog.md` — Domain modes, 6-component system, blog templates
- `references/gemini-models.md` — Model specs, rate limits, aspect ratios, pricing
- `references/mcp-tools.md` — MCP tool parameters and response formats
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/prompt-engineering-blog.md
```markdown
# Prompt Engineering Reference — Blog Image Generation
> Load on-demand when constructing complex prompts for blog images.
> Adapted from Claude Banana's prompt engineering system for blog-specific use cases.
> Aligned with Google's March 2026 "Ultimate Prompting Guide" for Gemini image generation.
## The 6-Component Reasoning Brief
Every image prompt should contain these components, written as natural
narrative paragraphs — NEVER as comma-separated keyword lists.
### 1. Subject
The main focus of the image. Describe with physical specificity.
**Good:** "A weathered Japanese ceramicist in his 70s, deep sun-etched
wrinkles mapping decades of kiln work, calloused hands cradling a
freshly thrown tea bowl with an irregular, organic rim"
**Bad:** "old man, ceramic, bowl"
### 2. Action
What is happening. Movement, pose, gesture, state of being.
**Good:** "leaning forward with intense concentration, gently smoothing
the rim with a wet thumb, a thin trail of slip running down his wrist"
**Bad:** "making pottery"
### 3. Context
Environment, setting, temporal and spatial details.
**Good:** "inside a traditional wood-fired anagama kiln workshop,
stacked shelves of drying pots visible in the soft background, late
afternoon light filtering through rice paper screens"
**Bad:** "workshop, afternoon"
### 4. Composition
Camera angle, shot type, framing, spatial relationships.
**Good:** "intimate close-up shot from slightly below eye level,
shallow depth of field isolating the hands and bowl against the
soft bokeh of the workshop behind"
**Bad:** "close up"
### 5. Lighting
Light source, quality, direction, temperature, shadows.
**Good:** "warm directional light from a single high window camera-left,
creating gentle Rembrandt lighting on the face with a soft triangle
of light on the shadow-side cheek, deep warm shadows in the workshop"
**Bad:** "natural lighting"
### 6. Style
Art medium, aesthetic reference, technical photographic details.
**Good:** "captured with a Sony A7R IV, 85mm f/1.4 GM lens, Kodak Portra
400 color grading with lifted shadows and muted earth tones, reminiscent
of Dorothea Lange's documentary portraiture"
**Bad:** "photorealistic, 8K, masterpiece"
## Blog Image Types
Map blog use cases to domain modes and aspect ratios:
| Image Type | Aspect Ratio | Domain Mode | Prompt Focus |
|------------|-------------|-------------|-------------|
| Hero/Cover | 16:9 | Editorial or Landscape | Wide composition, mood-setting, topic-relevant |
| OG/Social Card | 16:9 (1200x630) | Editorial or Infographic | Clean, readable at small sizes, topic icon |
| Inline Illustration | 16:9 or 4:3 | Varies by topic | Supports adjacent H2 content, contextual |
| Inline Product Shot | 4:3 or 1:1 | Product | Clean background, product focus |
| Section Divider | 8:1 or 4:1 | Abstract or Landscape | Wide strip, atmospheric, non-distracting |
## Blog-Specific Prompt Templates
### Hero/Cover Image
```
A [photorealistic/editorial] wide establishing shot of [topic-relevant scene],
[action or state that conveys the article's core message]. Set in [environment
with specifics that match blog topic]. [Wide, balanced composition with rule of
thirds]. [Dramatic or inviting lighting] creating [mood that matches article tone].
[Style reference appropriate to blog niche]. Aspect ratio 16:9, suitable as a
blog hero image at 1200x630 or 1920x1080.
```
### Inline Illustration
```
A [style] [shot type] of [specific element from the blog section], [illustrating
the concept of the adjacent heading]. [Contextual environment]. [Clear, well-lit
composition that works at medium size]. [Color palette complementing blog design].
```
### Social/OG Card Image
```
A [clean, high-contrast] [format] showing [key visual concept of the article],
[simplified for recognition at thumbnail size]. [Minimal background, strong focal
point]. [Bold lighting that reads well at small sizes]. Text-free, designed for
social sharing preview at 1200x630.
```
## Domain Mode Libraries (Blog-Relevant)
### Editorial Mode
Best for: Blog headers, feature images, lifestyle content, storytelling.
**Publication refs:** National Geographic, Kinfolk, The Atlantic, Wired
**Styling notes:** layered textures, clean compositions, atmospheric depth
**Locations:** contextual to blog topic — offices, workshops, nature, urban
**Mood:** authoritative, inviting, professional
### Product Mode
Best for: E-commerce blogs, product reviews, comparison articles, tech posts.
**Surfaces:** polished marble, brushed concrete, raw linen, acrylic riser, gradient sweep
**Lighting:** softbox diffused, hard key with fill card, rim separation, tent lighting
**Angles:** 45-degree hero, flat lay, three-quarter, straight-on
**Style refs:** Apple product photography, Aesop minimal, clean and modern
### Landscape Mode
Best for: Environmental backgrounds, travel blogs, atmospheric hero sections.
**Depth layers:** foreground interest, midground subject, background atmosphere
**Atmospherics:** fog, mist, haze, volumetric light rays, dust particles
**Time of day:** blue hour (pre-dawn), golden hour, magic hour (post-sunset)
**Weather:** dramatic storm clouds, clearing after rain, sun-dappled
### UI/Web Mode
Best for: Tech blog icons, feature illustrations, app screenshots, diagrams.
**Styles:** flat vector, isometric 3D, line art, glassmorphism, material design
**Colors:** specify exact hex or descriptive palette (e.g., "cool blues #2563EB to #1E40AF")
**Sizing:** design at 2x for retina, specify exact pixel dimensions needed
**Backgrounds:** transparent (request solid white then post-process), gradient, solid color
### Infographic Mode
Best for: Data-driven posts, process explanations, comparison visuals.
**Layout:** modular sections, clear visual hierarchy, bento grid, flow top-to-bottom
**Text:** use quotes for exact text, descriptive font style, specify size hierarchy
**Data viz:** bar charts, pie charts, flow diagrams, timelines, comparison tables
**Colors:** high-contrast, accessible palette, consistent with blog brand
### Abstract Mode
Best for: Pattern backgrounds, section dividers, decorative headers, mood pieces.
**Geometry:** fractals, voronoi tessellation, spirals, organic flow, crystalline
**Textures:** marble veining, fluid dynamics, smoke wisps, ink diffusion, watercolor bleed
**Color palettes:** analogous harmony, complementary clash, monochromatic gradient
**Styles:** generative art, procedural, macro photography of materials
## Search-Grounded Generation (NB2 Feature)
For blog images that need real-world accuracy (current products, real locations,
data-driven infographics), use Google Search grounding with this 3-part formula:
```
[Source/Search request] + [Analytical task] + [Visual translation]
```
**Example:** "Search for the top 5 AI coding tools by GitHub stars in 2026, analyze their relative popularity, then generate a clean infographic comparison chart in a modern dark theme."
Requires `googleSearch` tool enabled in the API call. MCP server handles this when available.
## Advanced Techniques
### Text-First Hack
For images with text, establish the concept conversationally FIRST ("I need a header with 'AI Search 2026'"), then generate. Always enclose text in quotation marks. The model anchors on text mentioned early in the conversation.
### Camera Hardware Naming
Naming real camera hardware steers the aesthetic more precisely: "Sony A7III, 85mm f/1.4 lens" pins down the bokeh more reliably than "shallow depth of field with bokeh".
### Character Consistency (Multi-turn)
Use `gemini_chat` and maintain descriptive anchors:
- First turn: Generate character with exhaustive physical description
- Following turns: Reference "the same character" + repeat 2-3 key identifiers
- Key identifiers: hair color/style, distinctive clothing, facial feature
### Text Rendering Tips
- Quote exact text: `with the text "OPEN DAILY" in bold condensed sans-serif`
- **25 characters or less** — practical limit for reliable rendering
- **2-3 distinct phrases max** — more text fragments degrade quality
- Describe font characteristics, not font names
- Specify placement: "centered at the top third", "along the bottom edge"
- High contrast: light text on dark, or vice versa
### Positive Framing (No Negative Prompts)
Gemini does NOT support negative prompts. Rephrase exclusions:
- Instead of "no blur" → "sharp, in-focus, tack-sharp detail"
- Instead of "no people" → "empty, deserted, uninhabited"
- Instead of "no text" → "clean, uncluttered, text-free"
## Common Prompt Mistakes
1. **Keyword stuffing** — "8K, masterpiece, best quality" adds nothing to Gemini
2. **Tag lists instead of prose** — Gemini wants narrative, not "red car, sunset, cinematic"
3. **Missing lighting** — Single biggest quality differentiator; always specify
4. **No composition direction** — Results in generic centered framing
5. **Ignoring aspect ratio** — Always call `set_aspect_ratio` before generating
6. **Overlong prompts** — Diminishing returns past ~200 words; be precise
7. **Text > 25 chars** — Rendering degrades; use text-first hack for accuracy
8. **Not iterating** — Use `gemini_chat` for refinement instead of re-generating
```
### references/mcp-tools.md
```markdown
# MCP Tools Reference — @ycse/nanobanana-mcp
> Package: `@ycse/nanobanana-mcp`
> GitHub: https://github.com/YCSE/nanobanana-mcp
## Tools
### gemini_generate_image
Generate an image from a text prompt.
**Parameters:**
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `prompt` | string | Yes | Text description of the image to generate |
**Returns:** Image data + file path (saved to `~/Documents/nanobanana_generated/`)
**Example usage in Claude Code:**
```
User: "Generate a sunset over mountains in watercolor style"
→ Claude calls gemini_generate_image with prompt
→ Returns image path and description
```
### gemini_edit_image
Edit an existing image with text instructions.
**Parameters:**
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `imagePath` | string | Yes | Path to the image file to edit |
| `prompt` | string | Yes | Edit instructions |
**Returns:** Modified image data + file path
**Example:**
```
User: "Remove the background from ~/Documents/photo.png"
→ Claude calls gemini_edit_image with path and instruction
```
### gemini_chat
Multi-turn visual conversation maintaining session context.
**Parameters:**
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `message` | string | Yes | Chat message (can reference previous images) |
**Returns:** Text response + optional image
**Key feature:** Session consistency — maintains style, characters, and context across turns. Great for iterative refinement.
### set_aspect_ratio
Configure the aspect ratio for subsequent image generations.
**Parameters:**
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `ratio` | string | Yes | Aspect ratio (e.g., "16:9", "1:1", "9:16") |
**Supported ratios:** 1:1, 16:9, 9:16, 4:3, 3:4, 2:3, 3:2, 4:5, 5:4, 1:4, 4:1, 1:8, 8:1, 21:9
### set_model
Switch the active Gemini model.
**Parameters:**
| Param | Type | Required | Description |
|-------|------|----------|-------------|
| `model` | string | Yes | Model identifier |
**Available models:**
- `gemini-3.1-flash-image-preview` (default, recommended)
- `gemini-2.5-flash-image` (stable fallback)
### get_image_history
Retrieve list of images generated in the current session.
**Parameters:** None
**Returns:** Array of image entries with paths and prompts
### clear_conversation
Reset session context and conversation history.
**Parameters:** None
**Returns:** Confirmation of reset
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| `GOOGLE_AI_API_KEY` | Yes | API key from https://aistudio.google.com/apikey |
| `NANOBANANA_MODEL` | No | Override default model (default: `gemini-3.1-flash-image-preview`) |
## Output Directory
All generated images are saved to: `~/Documents/nanobanana_generated/`
Images are named with timestamps for easy identification.
## Feature Availability via MCP
Support for some newer Gemini API features depends on the installed version of `@ycse/nanobanana-mcp`. Check the package version to confirm support:
| Feature | API Status | MCP Support |
|---------|-----------|-------------|
| `imageSize` (resolution control) | Available | Depends on package version |
| Thinking level (`thinkingConfig`) | Available | Depends on package version |
| Search grounding (`googleSearch`) | Available | Depends on package version |
| Image-only output (`responseModalities: ["IMAGE"]`) | Available | Depends on package version |
| Multi-image input (up to 14 refs) | Available | Via `gemini_chat` with image paths |
| All 14 aspect ratios | Available | Via `set_aspect_ratio` |
If a feature is not yet supported by the MCP package, you can still use it via direct API calls with `curl` or the Google AI SDK.
```
### references/gemini-models.md
```markdown
# Gemini Image Generation Models — Nano Banana
> Last updated: 2026-03-14
> Aligned with Google's March 2026 API state and pricing
## Available Models
### gemini-3.1-flash-image-preview (Recommended — Speed + Quality)
| Property | Value |
|----------|-------|
| **Model ID** | `gemini-3.1-flash-image-preview` |
| **Tier** | Nano Banana 2 (Flash) |
| **Speed** | Fast — optimized for high-volume use |
| **Aspect Ratios** | All 14 ratios (see table below) |
| **Max Resolution** | Up to 4096×4096 (4K tier) |
| **Features** | Google Search grounding (web + image), thinking levels, image-only output, extreme aspect ratios, 512px drafts |
| **Rate Limits (Free)** | ~5-15 RPM / ~20-500 RPD (preview model — more restrictive than stable) |
| **Output Tokens** | ~1,290 output tokens per image |
| **Cost (1K)** | ~$0.067/image |
| **Best For** | Most blog images, rapid iteration, batch generation |
### gemini-3-pro-image-preview (Highest Quality — Text + Detail)
| Property | Value |
|----------|-------|
| **Model ID** | `gemini-3-pro-image-preview` |
| **Tier** | Nano Banana Pro |
| **Speed** | Slower — uses reasoning before generating (generates interim images internally) |
| **Aspect Ratios** | All 14 ratios |
| **Max Resolution** | Up to 4096×4096 (4K tier) |
| **Features** | 94% text accuracy (quoted text), 14 reference images, C2PA Content Credentials |
| **Rate Limits (Free)** | ~5-10 RPM / ~20-100 RPD |
| **Output Tokens** | Higher (reasoning + generation) |
| **Cost (1K)** | ~$0.134/image (2× Flash) |
| **Best For** | Hero images with text overlays, highest quality final assets, branded content |
**Note:** The base text model `gemini-3-pro-preview` was deprecated March 9, 2026, but the **image variant** (`gemini-3-pro-image-preview`) remains active on AI Studio and Vertex AI.
### gemini-2.5-flash-image (Stable Fallback)
| Property | Value |
|----------|-------|
| **Model ID** | `gemini-2.5-flash-image` |
| **Tier** | Nano Banana Original (stable) |
| **Speed** | Fast |
| **Aspect Ratios** | 1:1, 16:9, 9:16, 4:3, 3:4 (5 only) |
| **Max Resolution** | Up to 1024×1024 (1K tier) |
| **Rate Limits (Free)** | ~10-15 RPM / ~500 RPD (stable — more generous than preview models) |
| **Cost (1K)** | ~$0.039/image |
| **Best For** | Budget-conscious workflows, proven quality, stable fallback |
## Deprecated Models (DO NOT USE)
### gemini-2.5-flash-image-preview
- **Status:** Shut down — use the stable `gemini-2.5-flash-image` variant
### gemini-2.0-flash-exp
- **Status:** Deprecated; scheduled to shut down June 1, 2026. Use `gemini-2.5-flash-image`
## Model Selection for Blog Content
| Blog Use Case | Recommended Model | Why |
|---------------|-------------------|-----|
| Quick draft / iteration | NB2 Flash (512px) | Fastest, cheapest, good enough for review |
| Standard blog images | NB2 Flash (1K-2K) | Best speed/quality ratio |
| Hero images with text | NB Pro | 94% text accuracy, reasoning mode |
| Final hero / OG at 4K | NB2 Flash or Pro (4K) | Both support 4K output |
| Budget batch generation | Original (2.5 Flash) | $0.039/img, proven quality |
## Aspect Ratios
All 14 supported ratios. Availability varies by model:
| Ratio | Orientation | Blog Use Cases | NB2 Flash | Pro | Original |
|-------|-------------|---------------|:---------:|:---:|:--------:|
| `1:1` | Square | Social posts, thumbnails | ✅ | ✅ | ✅ |
| `16:9` | Landscape | Blog headers, OG images | ✅ | ✅ | ✅ |
| `9:16` | Portrait | Stories, Reels, mobile | ✅ | ✅ | ✅ |
| `4:3` | Landscape | Product shots, inline | ✅ | ✅ | ✅ |
| `3:4` | Portrait | Book covers, portrait | ✅ | ✅ | ✅ |
| `2:3` | Portrait | Pinterest pins, posters | ✅ | ✅ | ❌ |
| `3:2` | Landscape | DSLR standard, prints | ✅ | ✅ | ❌ |
| `4:5` | Portrait | Instagram portrait | ✅ | ✅ | ❌ |
| `5:4` | Landscape | Large format | ✅ | ✅ | ❌ |
| `1:4` | Tall strip | Vertical banners | ✅ | ✅ | ❌ |
| `4:1` | Wide strip | Section dividers, headers | ✅ | ✅ | ❌ |
| `1:8` | Extreme tall | Narrow strips | ✅ | ✅ | ❌ |
| `8:1` | Extreme wide | Ultra-wide banners | ✅ | ✅ | ❌ |
| `21:9` | Ultra-wide | Cinematic headers | ✅ | ✅ | ❌ |
## Resolution Tiers
| `imageSize` | Pixel Range | Model Availability | Cost Multiplier | Blog Use |
|-------------|-------------|-------------------|:---------------:|----------|
| `512` | Up to 512×512 | All models | 0.5× | Drafts, quick iteration |
| `1K` (default) | Up to 1024×1024 | All models | 1× | Standard web/social |
| `2K` | Up to 2048×2048 | NB2 Flash, Pro | 2× | Quality inline images |
| `4K` | Up to 4096×4096 | NB2 Flash, Pro | 4× | Print, hero images, final assets |
**Notes:**
- Actual pixel dimensions depend on aspect ratio (e.g., 4K at 16:9 = 4096×2304)
- Default is `1K` if `imageSize` is not specified
- Known bug: `imageSize` is sometimes ignored when requests go through a LiteLLM proxy, and in image-to-image workflows
## Rate Limits
Google cut free-tier limits by ~92% in December 2025. Current structure:
| Tier | RPM | RPD | How to Get |
|------|-----|-----|-----------|
| Free | ~5-15 | ~20-500 | Default (API key only, no billing) |
| Tier 1 (Pay-as-you-go) | 150-300 | 1,500-10,000 | Enable billing on Google Cloud project |
| Tier 2 ($250+ spend) | 1,000+ | Unlimited | Cumulative $250+ API spend |
**Important:** Preview models (NB2, Pro) have more restrictive limits than stable models. Free tier for image generation may require billing to be enabled — some users report 0 IPM (images per minute) without billing.
## Pricing (March 2026)
| Model | Resolution | Cost per Image | Notes |
|-------|-----------|---------------|-------|
| NB2 Flash | 1K | ~$0.067 | Standard |
| NB2 Flash | 2K | ~$0.134 | 2× standard |
| NB2 Flash | 4K | ~$0.268 | 4× standard |
| Pro | 1K | ~$0.134 | 2× Flash |
| Pro | 4K | ~$0.536 | Premium quality |
| Original (2.5) | 1K | ~$0.039 | Budget option |
| Batch API | Any | 50% discount | Asynchronous, higher latency |
**Cost optimization:** Use 512px for drafts (cheapest), 1K for standard blog images, reserve 2K-4K for hero images and final assets.
## Multi-Image Input
| Feature | Limit | Notes |
|---------|-------|-------|
| Object references | Up to 6 | Style, composition, visual matching |
| Character references | Up to 5 | Assign names to preserve features |
| Total references | Up to 14 | Combined across types |
| Max input image size | 7 MB | Per image |
Useful for brand-consistent blog imagery: provide brand style references to maintain visual identity across generated images.
## Safety Filters — Dual Layer Architecture
### Layer 1: Input Filters (Configurable)
Standard harm category filtering via `safetySettings` API parameter. Covers hate speech, harassment, sexually explicit, and dangerous content.
### Layer 2: Output Filters (NON-CONFIGURABLE)
Server-side analysis of the **generated image itself**. Cannot be disabled through any API parameter.
- Returns `finishReason: "IMAGE_SAFETY"` (distinct from `"SAFETY"`)
- Known to be overly cautious — Google acknowledged "filters became way more cautious than we intended"
- Benign prompts like "dog" or "bowl of cereal" have been blocked
- Celebrity blocking tightened significantly with NB2
| `finishReason` | Meaning | Layer | Retryable? |
|----------------|---------|:-----:|:----------:|
| `STOP` | Successful generation | — | N/A |
| `IMAGE_SAFETY` | Output blocked by Layer 2 | 2 | Rephrase prompt |
| `PROHIBITED_CONTENT` | Content policy violation | 1 | No — topic blocked |
| `SAFETY` | General safety block | 1 | Rephrase prompt |
| `RECITATION` | Detected copyrighted content | 2 | Rephrase prompt |
**No workaround exists for Layer 2 blocks beyond rephrasing the prompt.**
## Content Credentials
- **SynthID watermarks** are always embedded (invisible, machine-readable). They survive rescaling, compression, and most edits, and cannot be disabled
- **C2PA Content Credentials** are embedded on Nano Banana Pro images from Gemini App, Vertex AI, and Google Ads
## Key Limitations
- No native transparent backgrounds (workaround: prompt green background, then chromakey removal)
- Text rendering quality varies — keep text under 25 characters for best results (Pro achieves 94% accuracy with quoted text)
- Safety filters may block benign prompts; use the auto-rephrase workflow
- Session context resets between Claude Code conversations
- `imageSize` and thinking level depend on MCP package version support
- No video generation (use Veo 3.1 for image-to-video workflows)
```
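The model-selection table can be encoded as a small lookup. This is a minimal sketch: the use-case keys, the `pick_model` name, and the choice of 2K for text-heavy heroes (following the cost note that reserves 2K-4K for hero images) are illustrative assumptions. `gemini-3.1-flash-image-preview` is taken as the NB2 Flash ID because it is the setup script's `DEFAULT_MODEL`. Whether the MCP package accepts a per-request model, rather than only the `NANOBANANA_MODEL` environment variable, is not confirmed here.

```python
from typing import Optional, Tuple

NB2_FLASH = "gemini-3.1-flash-image-preview"
NB_PRO = "gemini-3-pro-image-preview"
NB_ORIGINAL = "gemini-2.5-flash-image"

# Hypothetical use-case keys -> (model ID, imageSize), following the selection table
USE_CASE_DEFAULTS = {
    "draft": (NB2_FLASH, "512"),
    "standard": (NB2_FLASH, "1K"),
    "hero_with_text": (NB_PRO, "2K"),
    "final_4k": (NB2_FLASH, "4K"),
    "budget_batch": (NB_ORIGINAL, "1K"),
}

# Deprecated IDs ("DO NOT USE") redirected to the stable fallback model
DEPRECATED_REPLACEMENTS = {
    "gemini-2.5-flash-image-preview": NB_ORIGINAL,
    "gemini-2.0-flash-exp": NB_ORIGINAL,
}


def pick_model(use_case: str, override: Optional[str] = None) -> Tuple[str, str]:
    """Return (model_id, image_size) for a blog use case, honoring an optional model override."""
    model, size = USE_CASE_DEFAULTS[use_case]
    if override:
        # Never let a deprecated ID through; swap in the stable replacement.
        model = DEPRECATED_REPLACEMENTS.get(override, override)
    if model == NB_ORIGINAL and size in ("2K", "4K"):
        size = "1K"  # The Original model tops out at the 1K tier
    return model, size
```

For example, `pick_model("final_4k", override="gemini-2.0-flash-exp")` falls back to the stable Original model and clamps the size to 1K.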
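You can check the aspect-ratio and resolution tables in code to catch unsupported combinations before spending a request. The sketch below assumes the long edge of the output equals the tier maximum, matching the note's "4K at 16:9 = 4096×2304" example. Real outputs may differ by a few pixels. `estimate_dimensions` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, Tuple

ALL_RATIOS = frozenset({
    "1:1", "16:9", "9:16", "4:3", "3:4", "2:3", "3:2",
    "4:5", "5:4", "1:4", "4:1", "1:8", "8:1", "21:9",
})
ORIGINAL_RATIOS = frozenset({"1:1", "16:9", "9:16", "4:3", "3:4"})

# model ID -> (supported aspect ratios, supported imageSize tiers), per the tables above
MODEL_SUPPORT: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "gemini-3.1-flash-image-preview": (ALL_RATIOS, frozenset({"512", "1K", "2K", "4K"})),
    "gemini-3-pro-image-preview": (ALL_RATIOS, frozenset({"512", "1K", "2K", "4K"})),
    "gemini-2.5-flash-image": (ORIGINAL_RATIOS, frozenset({"512", "1K"})),
}

# Maximum edge length per tier (the "Pixel Range" column)
TIER_LONG_EDGE = {"512": 512, "1K": 1024, "2K": 2048, "4K": 4096}


def estimate_dimensions(model: str, aspect_ratio: str, image_size: str = "1K") -> Tuple[int, int]:
    """Validate the combination and return an approximate (width, height) in pixels."""
    ratios, sizes = MODEL_SUPPORT[model]
    if aspect_ratio not in ratios:
        raise ValueError(f"{model} does not support aspect ratio {aspect_ratio}")
    if image_size not in sizes:
        raise ValueError(f"{model} does not support imageSize {image_size}")
    w, h = (int(part) for part in aspect_ratio.split(":"))
    long_edge = TIER_LONG_EDGE[image_size]
    # Assumption: the longer side is pinned to the tier maximum
    if w >= h:
        return long_edge, round(long_edge * h / w)
    return round(long_edge * w / h), long_edge
```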
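Free-tier limits allow only a handful of requests per minute, so batch generation benefits from client-side pacing. The class below is a hypothetical sketch that spaces request starts at least `60 / rpm` seconds apart. It does not track the daily (RPD) quota, and it does not replace handling rate-limit errors returned by the API. The clock and sleep functions can be injected, so you can test the logic without waiting.

```python
import time
from typing import Callable, Optional


class RequestPacer:
    """Spaces request starts at least 60/rpm seconds apart (fixed-interval pacing)."""

    def __init__(
        self,
        rpm: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if rpm <= 0:
            raise ValueError("rpm must be positive")
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request may start, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

On the free tier, picking the low end of the range (for example `RequestPacer(rpm=5)`) and calling `wait()` before each generation request is the conservative choice.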
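The pricing and resolution tables combine into a simple estimator: a 1K base price per model, times the tier multiplier, times the image count, halved for the Batch API. This is a sketch built only on this reference's approximate March 2026 figures. `estimate_cost` is a hypothetical name, and actual billing may differ.

```python
# Approximate 1K prices (USD per image) from the "Pricing (March 2026)" table
BASE_PRICE_1K = {
    "gemini-3.1-flash-image-preview": 0.067,  # NB2 Flash
    "gemini-3-pro-image-preview": 0.134,      # NB Pro
    "gemini-2.5-flash-image": 0.039,          # Original
}

# "Cost Multiplier" column of the Resolution Tiers table
TIER_MULTIPLIER = {"512": 0.5, "1K": 1.0, "2K": 2.0, "4K": 4.0}

# 2K and 4K are listed for NB2 Flash and Pro only
HIGH_RES_MODELS = {"gemini-3.1-flash-image-preview", "gemini-3-pro-image-preview"}


def estimate_cost(model: str, image_size: str, count: int = 1, batch: bool = False) -> float:
    """Estimate USD cost for `count` images; `batch` applies the 50% Batch API discount."""
    if model not in BASE_PRICE_1K:
        raise ValueError(f"unknown model: {model}")
    if image_size not in TIER_MULTIPLIER:
        raise ValueError(f"unknown imageSize: {image_size}")
    if image_size in ("2K", "4K") and model not in HIGH_RES_MODELS:
        raise ValueError(f"{model} does not offer {image_size}")
    cost = BASE_PRICE_1K[model] * TIER_MULTIPLIER[image_size] * count
    return cost * 0.5 if batch else cost
```

Following the cost-optimization note, you could price a post's drafts at `"512"` and only the final hero at `"2K"` or `"4K"`.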
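The `finishReason` table implies a simple control loop:

- stop on `STOP`;
- rephrase and retry on `IMAGE_SAFETY`, `SAFETY`, or `RECITATION`;
- give up on `PROHIBITED_CONTENT`.

Below is a sketch of that loop. `generate` and `rephrase` are hypothetical callables standing in for the MCP tool call and Claude's own rewording. Treating any unlisted reason as non-retryable is an assumption.

```python
from typing import Callable, Tuple

# Reasons the table marks as "Rephrase prompt"
REPHRASABLE = {"IMAGE_SAFETY", "SAFETY", "RECITATION"}


def generate_with_rephrase(
    prompt: str,
    generate: Callable[[str], str],
    rephrase: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, str, int]:
    """Return (last_prompt, finish_reason, attempts).

    `generate` returns the finishReason for a prompt; `rephrase` rewrites a
    blocked prompt given the reason it was blocked.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    reason = generate(prompt)
    attempts = 1
    # STOP, PROHIBITED_CONTENT, and unknown reasons all exit immediately
    while reason in REPHRASABLE and attempts < max_attempts:
        prompt = rephrase(prompt, reason)
        reason = generate(prompt)
        attempts += 1
    return prompt, reason, attempts
```

Capping attempts matters because Layer 2 can keep blocking near-identical prompts; after the cap, surfacing the last `finishReason` to the user is more useful than looping.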
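The transparent-background workaround (generate on a green background, then chroma-key it out) reduces to a per-pixel distance test. Below is a minimal pure-Python sketch over `(r, g, b)` tuples. The pure-green key and the tolerance of 60 are assumptions to tune per image, since generated "green" backgrounds are rarely exactly `#00FF00`.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def chroma_key(pixels: Iterable[RGB], key: RGB = (0, 255, 0), tolerance: float = 60.0) -> List[RGBA]:
    """Return RGBA pixels; pixels within `tolerance` (Euclidean RGB distance) of `key` become transparent."""
    kr, kg, kb = key
    out: List[RGBA] = []
    for r, g, b in pixels:
        distance = ((r - kr) ** 2 + (g - kg) ** 2 + (b - kb) ** 2) ** 0.5
        alpha = 0 if distance <= tolerance else 255
        out.append((r, g, b, alpha))
    return out
```

The output is a hard 0/255 mask, so edges may keep a green fringe. Feathering alpha near the threshold, or desaturating green spill, are common refinements. With Pillow, you can write the resulting tuples into an `RGBA` image using `Image.putdata()`.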
### scripts/setup_image_mcp.py
```python
#!/usr/bin/env python3
"""
Setup script for nanobanana-mcp in claude-blog.
Configures @ycse/nanobanana-mcp in the project's .mcp.json (default)
or Claude Code's global settings.json (with --global flag).
Usage:
    python3 setup_image_mcp.py                 # Interactive (prompts for key)
    python3 setup_image_mcp.py --key YOUR_KEY  # Non-interactive
    python3 setup_image_mcp.py --check         # Verify existing setup
    python3 setup_image_mcp.py --remove        # Remove MCP config
    python3 setup_image_mcp.py --global        # Write to ~/.claude/settings.json
    python3 setup_image_mcp.py --help          # Show usage
"""
import json
import os
import sys
from pathlib import Path
from typing import Optional

MCP_NAME = "nanobanana-mcp"
MCP_PACKAGE = "@ycse/nanobanana-mcp"
DEFAULT_MODEL = "gemini-3.1-flash-image-preview"
GLOBAL_SETTINGS_PATH = Path.home() / ".claude" / "settings.json"


def find_project_mcp_json() -> Optional[Path]:
    """Find the project-level .mcp.json by looking for .claude-plugin/plugin.json.

    Walks up from this script's directory first, then from the current working
    directory (at most 10 levels each). Returns None if no plugin root is found.
    """
    for start in (Path(__file__).resolve().parent, Path.cwd()):
        current = start
        for _ in range(10):  # Max 10 levels up
            if (current / ".claude-plugin" / "plugin.json").exists():
                return current / ".mcp.json"
            parent = current.parent
            if parent == current:  # Reached the filesystem root
                break
            current = parent
    return None


def get_config_path(use_global: bool) -> Path:
    """Get the appropriate config file path."""
    if use_global:
        return GLOBAL_SETTINGS_PATH
    project_path = find_project_mcp_json()
    if project_path:
        return project_path
    print("Warning: Could not find project root (.claude-plugin/plugin.json).")
    print("Falling back to global settings.")
    return GLOBAL_SETTINGS_PATH
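

def load_config(path: Path) -> dict:
    """Load a JSON config file, returning {} if it does not exist."""
    if not path.exists():
        return {}
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except json.JSONDecodeError as e:
        # Fail with a readable message instead of a traceback on malformed JSON
        print(f"Error: {path} is not valid JSON ({e}). Fix or remove it and retry.")
        sys.exit(1)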


def save_config(path: Path, config: dict) -> None:
    """Save config file (creating parent directories as needed)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    print(f"Config saved to {path}")


def check_setup(use_global: bool) -> bool:
    """Check if MCP is already configured (project .mcp.json first, then global)."""
    paths_to_check = []
    if not use_global:
        project_path = find_project_mcp_json()
        if project_path:
            paths_to_check.append(("Project .mcp.json", project_path))
    paths_to_check.append(("Global settings.json", GLOBAL_SETTINGS_PATH))

    for label, path in paths_to_check:
        config = load_config(path)
        servers = config.get("mcpServers", {})
        if MCP_NAME in servers:
            env = servers[MCP_NAME].get("env", {})
            key = env.get("GOOGLE_AI_API_KEY", "")
            # Mask long keys; report short keys as set without echoing them
            if not key:
                masked = "(not set)"
            elif len(key) > 12:
                masked = key[:8] + "..." + key[-4:]
            else:
                masked = "(set, too short to mask)"
            print(f"MCP server '{MCP_NAME}' found in {label}.")
            print(f"  Path:    {path}")
            print(f"  Package: {MCP_PACKAGE}")
            print(f"  API Key: {masked}")
            print(f"  Model:   {env.get('NANOBANANA_MODEL', DEFAULT_MODEL)}")
            return True

    print(f"MCP server '{MCP_NAME}' is NOT configured.")
    return False


def remove_mcp(use_global: bool) -> None:
    """Remove MCP configuration."""
    path = get_config_path(use_global)
    config = load_config(path)
    servers = config.get("mcpServers", {})
    if MCP_NAME in servers:
        del servers[MCP_NAME]
        config["mcpServers"] = servers
        save_config(path, config)
        print(f"Removed '{MCP_NAME}' from {path}.")
    else:
        print(f"'{MCP_NAME}' not found in {path}.")


def setup_mcp(api_key: str, use_global: bool) -> None:
    """Configure the MCP server entry with the given API key."""
    if not api_key or not api_key.strip():
        print("Error: API key cannot be empty.")
        sys.exit(1)
    api_key = api_key.strip()

    path = get_config_path(use_global)
    config = load_config(path)
    if "mcpServers" not in config:
        config["mcpServers"] = {}

    config["mcpServers"][MCP_NAME] = {
        "command": "npx",
        "args": ["-y", MCP_PACKAGE],
        "env": {
            "GOOGLE_AI_API_KEY": api_key,
            "NANOBANANA_MODEL": DEFAULT_MODEL,
        },
    }
    save_config(path, config)

    print(f"\nMCP server '{MCP_NAME}' configured successfully!")
    print(f"  Package: {MCP_PACKAGE}")
    print(f"  Model:   {DEFAULT_MODEL}")
    print(f"  Config:  {path}")
    print("\nRestart Claude Code for changes to take effect.")
    print("Generated images will be saved to: ~/Documents/nanobanana_generated/")


def main() -> None:
    args = sys.argv[1:]
    use_global = "--global" in args

    if "--help" in args or "-h" in args:
        print("Usage: python3 setup_image_mcp.py [OPTIONS]")
        print()
        print("Options:")
        print("  --key KEY    Provide API key non-interactively")
        print("  --check      Verify existing setup")
        print("  --remove     Remove MCP configuration")
        print("  --global     Write to ~/.claude/settings.json (default: project .mcp.json)")
        print("  --help, -h   Show this help message")
        print()
        print("Get a free API key at: https://aistudio.google.com/apikey")
        sys.exit(0)

    if "--check" in args:
        check_setup(use_global)
        return

    if "--remove" in args:
        remove_mcp(use_global)
        return

    # Resolve the API key: --key flag, then GOOGLE_AI_API_KEY env var, then prompt
    api_key = None
    for i, arg in enumerate(args):
        if arg == "--key" and i + 1 < len(args):
            api_key = args[i + 1]
            break

    if not api_key:
        api_key = os.environ.get("GOOGLE_AI_API_KEY")

    if not api_key:
        print("claude-blog — Image Generation MCP Setup")
        print("=" * 45)
        print("\nGet your free API key at: https://aistudio.google.com/apikey")
        print()
        try:
            api_key = input("Enter your Google AI API key: ")
        except (EOFError, KeyboardInterrupt):
            print("\nError: No input received. Provide a key with --key or set GOOGLE_AI_API_KEY env var.")
            sys.exit(1)

    setup_mcp(api_key, use_global)


if __name__ == "__main__":
    main()
```
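`setup_mcp()` stores the literal API key in `.mcp.json`, a file that is often committed along with the project. The validator already accepts `${VAR}` placeholders, so one alternative is to write a placeholder and export the key in your shell instead. The sketch below builds such an entry. It assumes your Claude Code version expands `${VAR}` in `.mcp.json`. The function names are hypothetical and are not part of the script.

```python
import json

MCP_NAME = "nanobanana-mcp"
MCP_PACKAGE = "@ycse/nanobanana-mcp"
DEFAULT_MODEL = "gemini-3.1-flash-image-preview"


def build_placeholder_entry(env_var: str = "GOOGLE_AI_API_KEY") -> dict:
    """Build the same server entry as setup_mcp(), but with a ${VAR} placeholder for the key."""
    return {
        "command": "npx",
        "args": ["-y", MCP_PACKAGE],
        "env": {
            "GOOGLE_AI_API_KEY": "${" + env_var + "}",
            "NANOBANANA_MODEL": DEFAULT_MODEL,
        },
    }


def merge_entry(config: dict, entry: dict) -> dict:
    """Return a copy of `config` with the entry set under mcpServers, leaving other servers intact."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[MCP_NAME] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Preview what would be written to an empty .mcp.json
    print(json.dumps(merge_entry({}, build_placeholder_entry()), indent=2))
```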
### scripts/validate_image_setup.py
```python
#!/usr/bin/env python3
"""
Validate that nanobanana-mcp is properly configured for claude-blog.

Checks project .mcp.json first, then falls back to global ~/.claude/settings.json.

Checks:
    1. Config file has the MCP entry
    2. API key is present
    3. Node.js/npx is available
    4. Output directory exists or can be created

Usage:
    python3 validate_image_setup.py
"""
import json
import shutil
import sys
from pathlib import Path
from typing import Optional

MCP_NAME = "nanobanana-mcp"
OUTPUT_DIR = Path.home() / "Documents" / "nanobanana_generated"
GLOBAL_SETTINGS_PATH = Path.home() / ".claude" / "settings.json"


def find_project_mcp_json() -> Optional[Path]:
    """Find the project-level .mcp.json by looking for .claude-plugin/plugin.json.

    Walks up from this script's directory first, then from the current working
    directory (at most 10 levels each). Returns None if no plugin root is found.
    """
    for start in (Path(__file__).resolve().parent, Path.cwd()):
        current = start
        for _ in range(10):
            if (current / ".claude-plugin" / "plugin.json").exists():
                return current / ".mcp.json"
            parent = current.parent
            if parent == current:  # Reached the filesystem root
                break
            current = parent
    return None


def check(label: str, passed: bool, detail: str = "") -> bool:
    """Print a PASS/FAIL line for one check and return whether it passed."""
    status = "PASS" if passed else "FAIL"
    msg = f"  [{status}] {label}"
    if detail:
        msg += f" — {detail}"
    print(msg)
    return passed


def find_mcp_config() -> tuple:
    """Find MCP config in project or global settings. Returns (config_dict, path_label)."""
    # Try project .mcp.json first
    project_path = find_project_mcp_json()
    if project_path and project_path.exists():
        try:
            with open(project_path) as f:
                config = json.load(f)
            if MCP_NAME in config.get("mcpServers", {}):
                return config, f"project .mcp.json ({project_path})"
        except (json.JSONDecodeError, OSError):
            pass

    # Fallback to global settings
    if GLOBAL_SETTINGS_PATH.exists():
        try:
            with open(GLOBAL_SETTINGS_PATH) as f:
                config = json.load(f)
            if MCP_NAME in config.get("mcpServers", {}):
                return config, f"global settings ({GLOBAL_SETTINGS_PATH})"
        except (json.JSONDecodeError, OSError):
            pass

    return None, None


def main() -> int:
    print("claude-blog — Image Generation Setup Validation")
    print("=" * 48)
    results = []

    # 1-2. Find and load config
    config, config_label = find_mcp_config()
    if config is None:
        results.append(check(
            "MCP config found",
            False,
            "Not found in project .mcp.json or global settings.json",
        ))
        print("\nRun: python3 scripts/setup_image_mcp.py --key YOUR_KEY")
        return 1
    results.append(check("MCP config found", True, config_label))

    # 3. MCP entry exists (find_mcp_config() only returns configs that contain it)
    servers = config.get("mcpServers", {})
    has_mcp = MCP_NAME in servers
    results.append(check(f"MCP server '{MCP_NAME}' configured", has_mcp))

    if has_mcp:
        mcp = servers[MCP_NAME]

        # 4. Command is npx
        results.append(check(
            "Command is 'npx'",
            mcp.get("command") == "npx",
            mcp.get("command", "(missing)"),
        ))

        # 5. Package is correct
        args = mcp.get("args", [])
        has_pkg = "@ycse/nanobanana-mcp" in args
        results.append(check(
            "Package is @ycse/nanobanana-mcp",
            has_pkg,
            str(args),
        ))

        # 6. API key present. ${VAR} placeholders count as configured,
        #    but the variable must be exported for Claude Code to see it.
        env = mcp.get("env", {})
        key = env.get("GOOGLE_AI_API_KEY", "")
        is_placeholder = key.startswith("${") and key.endswith("}")
        if is_placeholder:
            results.append(check(
                "GOOGLE_AI_API_KEY is set",
                True,
                f"{key} (env var placeholder — ensure this variable is exported in your shell)",
            ))
        else:
            # Mask long keys; never echo a short key in full
            if not key:
                detail = "(empty)"
            elif len(key) > 12:
                detail = f"{key[:8]}...{key[-4:]}"
            else:
                detail = "(set, too short to mask)"
            results.append(check("GOOGLE_AI_API_KEY is set", bool(key), detail))

        # 7. Model configured (optional: without it the package uses its own default)
        model = env.get("NANOBANANA_MODEL", "")
        results.append(check(
            "NANOBANANA_MODEL is set",
            True,  # Always passes because NANOBANANA_MODEL is optional
            model or "(not set — package will use default model)",
        ))

    # 8. Node.js/npx available
    npx_path = shutil.which("npx")
    results.append(check(
        "npx is available in PATH",
        npx_path is not None,
        npx_path or "not found — install Node.js 18+",
    ))

    # 9. Output directory (created here if missing)
    if OUTPUT_DIR.exists():
        results.append(check("Output directory exists", True, str(OUTPUT_DIR)))
    else:
        try:
            OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
            results.append(check("Output directory created", True, str(OUTPUT_DIR)))
        except OSError as e:
            results.append(check("Output directory writable", False, str(e)))

    # Summary
    passed = sum(1 for r in results if r)
    total = len(results)
    print(f"\n{'=' * 48}")
    print(f"Results: {passed}/{total} checks passed")
    if passed == total:
        print("Status: Ready to generate blog images!")
        return 0
    print("Status: Some checks failed. Fix the issues above.")
    print("Setup: python3 scripts/setup_image_mcp.py --key YOUR_KEY")
    return 1


if __name__ == "__main__":
    sys.exit(main())
```
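The validator passes any `${VAR}` placeholder without checking that the variable is actually exported. A small resolver can close that gap. The sketch below uses a hypothetical `resolve_api_key` name. It can only see the validator's own environment, which may differ from the environment Claude Code launches the MCP server with.

```python
import os
import re
from typing import Mapping, Optional

# Matches the exact "${VAR}" form that validate_image_setup.py treats as a placeholder
_PLACEHOLDER = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_api_key(value: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the literal key, or the exported value of a ${VAR} placeholder ("" if unset)."""
    env = os.environ if environ is None else environ
    match = _PLACEHOLDER.match(value)
    if match is None:
        return value  # Not a placeholder: the config holds the key itself
    return env.get(match.group(1), "")
```

Inside `main()`, the placeholder branch could then pass `bool(resolve_api_key(key))` to `check()` instead of a hard-coded `True`.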