
Model Manager

Test, validate, and add new AI models to the eval suite. Use when user asks to add new models, test model access, check pricing, or update models.yml.

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source is included below.

Stars: 22
Hot score: 88
Updated: March 19, 2026
Overall rating: C (2.8)
Composite score: 2.8
Best-practice grade: C (57.6)

Install command

npx @skill-hub/cli install sunholo-data-ailang-model-manager
Tags: api, testing, automation, configuration

Repository

sunholo-data/ailang

Skill path: .claude/skills/model-manager

Test, validate, and add new AI models to the eval suite. Use when user asks to add new models, test model access, check pricing, or update models.yml.

Open repository

Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Backend, Data / AI, Testing.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: sunholo-data.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install Model Manager into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/sunholo-data/ailang before adding Model Manager to shared team environments
  • Use Model Manager for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: Model Manager
description: Test, validate, and add new AI models to the eval suite. Use when user asks to add new models, test model access, check pricing, or update models.yml.
---

# Model Manager

Test API access, validate configurations, and add new AI models to the AILANG eval suite.

## Quick Start

**Most common usage:**
```bash
# User says: "Can we add GPT-5.1 to the eval suite?"
# This skill will:
# 1. Test API access to GPT-5.1
# 2. Find the correct API model name
# 3. Look up pricing information
# 4. Update models.yml configuration
# 5. Run a test benchmark to verify
```

## When to Use This Skill

Invoke this skill when:
- User asks to "add a new model" to eval suite
- User mentions checking if a model is "accessible" or "available"
- User wants to "test API access" to a model
- User asks to "update models.yml" or "check pricing"
- User says "can we use [model name]?" for evaluations

## Available Scripts

### `scripts/test_model_access.sh <provider> <model-name>`
Test API access to a model and display authentication status.

**Usage:**
```bash
# Test OpenAI model
scripts/test_model_access.sh openai gpt-5.1

# Test Anthropic model
scripts/test_model_access.sh anthropic claude-sonnet-4-5-20250929

# Test Google Gemini via Vertex AI
scripts/test_model_access.sh google gemini-3-pro-preview-11-2025
```

**Output:**
```
Testing: openai/gpt-5.1
✓ OPENAI_API_KEY found
✓ API call successful
✓ Model: gpt-5.1-2025-11-13
✓ Tokens: 13 input, 10 output (10 reasoning)
Ready to add to models.yml
```

### `scripts/find_model_info.sh <model-keywords>`
Search for model information using web search and return API names + pricing.

**Usage:**
```bash
# Find GPT-5.1 info
scripts/find_model_info.sh "GPT-5.1 API model name pricing"

# Find Gemini 3 Pro info
scripts/find_model_info.sh "Gemini 3 Pro API documentation"
```

**Output:**
```
Searching for: GPT-5.1 API model name pricing
✓ Found API names:
  - gpt-5.1 (Thinking mode)
  - gpt-5.1-chat-latest (Instant mode)
✓ Pricing:
  Input: $1.25 per 1M tokens
  Output: $10.00 per 1M tokens
  Cached: $0.125 per 1M tokens
```

### `scripts/update_models_yml.sh <friendly-name> <api-name> <provider> <input-price> <output-price>`
Add a new model to the models.yml configuration.

**Usage:**
```bash
# Add GPT-5.1
scripts/update_models_yml.sh \
  gpt5-1 \
  "gpt-5.1" \
  openai \
  0.00125 \
  0.01
```

**Output:**
```
Adding model to models.yml:
  Friendly name: gpt5-1
  API name: gpt-5.1
  Provider: openai
  Pricing: $0.00125 / $0.01 per 1K tokens

✓ Updated models.yml
✓ Validated YAML syntax
✓ Ready to test
```

### `scripts/verify_vertex_model.sh <model-name>`
Check if a Gemini model is available in Vertex AI.

**Usage:**
```bash
# Check if Gemini 3 Pro is available
scripts/verify_vertex_model.sh gemini-3-pro-preview-11-2025
```

**Output:**
```
Checking Vertex AI for: gemini-3-pro-preview-11-2025
✓ GCP project: multivac-internal-prod
✓ Access token obtained
✗ Model not found (404)
Recommendation: Monitor for availability, check again in 1-2 weeks
```

### `scripts/run_test_benchmark.sh <model-name>`
Run a small test benchmark to verify the model works end-to-end.

**Usage:**
```bash
# Test GPT-5.1 with fizzbuzz benchmark
scripts/run_test_benchmark.sh gpt5-1
```

**Output:**
```
Running test benchmark: fizzbuzz
Model: gpt5-1
✓ Benchmark completed
✓ Result: PASS (100%)
✓ Tokens: 245 input, 89 output
✓ Cost: $0.002
Model is ready for production use
```

## Workflow

### 1. Test API Access

**First, verify you can call the model:**

```bash
# Use test_model_access.sh
scripts/test_model_access.sh openai gpt-5.1
```

**What to check:**
- API key is set (OPENAI_API_KEY, ANTHROPIC_API_KEY, or gcloud auth)
- API call succeeds (not 401/403/404)
- Model returns expected structure
- Token usage is reported

**For Gemini models:**
- Uses Vertex AI (not public API)
- Requires `gcloud auth application-default login`
- Check availability with `verify_vertex_model.sh`
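The checks above can be sketched as a small pre-flight helper. This is a hypothetical illustration, not part of the skill's scripts; the function name and signature are assumptions:

```python
import os
import shutil

def auth_ready(provider: str, env=None) -> bool:
    """Return True if credentials for the given provider appear to be configured."""
    if env is None:
        env = os.environ
    if provider == "openai":
        return bool(env.get("OPENAI_API_KEY"))
    if provider == "anthropic":
        return bool(env.get("ANTHROPIC_API_KEY"))
    if provider == "google":
        # Vertex AI uses gcloud Application Default Credentials, not an API key
        return shutil.which("gcloud") is not None
    raise ValueError(f"unknown provider: {provider}")
```

This only confirms credentials are present; `test_model_access.sh` still makes the actual API call to confirm the model responds.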

### 2. Find Model Information

**Search for official documentation:**

```bash
# Find API model name and pricing
scripts/find_model_info.sh "GPT-5.1 API documentation pricing"
```

**What to gather:**
- Exact API model name (e.g., `gpt-5.1` not `GPT-5.1`)
- Provider (openai, anthropic, google)
- Input price per 1K tokens
- Output price per 1K tokens
- Context limits (if relevant)
- Special features (adaptive reasoning, caching, etc.)

**Reference:** See [resources/provider_endpoints.md](resources/provider_endpoints.md)
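What step 2 should produce can be captured in one record before editing models.yml. The field names below mirror the script arguments; the dict itself is just an illustrative sketch:

```python
# Hypothetical record of the information gathered in step 2
model_info = {
    "friendly_name": "gpt5-1",
    "api_name": "gpt-5.1",        # exact lowercase API string
    "provider": "openai",
    "input_per_1k": 0.00125,      # $1.25 per 1M tokens / 1000
    "output_per_1k": 0.01,        # $10.00 per 1M tokens / 1000
}
```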

### 3. Update models.yml

**Add the model configuration:**

```bash
# Add to models.yml
scripts/update_models_yml.sh \
  <friendly-name> \
  <api-name> \
  <provider> \
  <input-per-1k> \
  <output-per-1k>
```

**Naming conventions:**
- Friendly name: `gpt5-1`, `claude-sonnet-4-5`, `gemini-3-pro`
- API name: Exact string for API calls
- Use hyphens, lowercase

**Also update:**
- Model suites (`benchmark_suite`, `extended_suite`, `dev_models`)
- Add notes about special features
- Document agent CLI support (if available)
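Based on the entry format that `update_models_yml.sh` appends, the resulting models.yml entry looks roughly like this (field values shown for GPT-5.1 as an example):

```yaml
  # GPT-5.1 with adaptive reasoning
  gpt5-1:
    api_name: "gpt-5.1"
    provider: "openai"
    description: "GPT-5.1 with adaptive reasoning"
    env_var: "OPENAI_API_KEY"
    agent_cli: null  # Agent CLI support not yet implemented
    pricing:
      input_per_1k: 0.00125
      output_per_1k: 0.01
    notes: |
      Test before adding to production suites.
```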

### 4. Run Test Benchmark

**Verify end-to-end:**

```bash
# Test with a simple benchmark
scripts/run_test_benchmark.sh <model-name>
```

**What to verify:**
- Benchmark completes successfully
- Results are reasonable (not garbage output)
- Token usage matches expectations
- Cost calculation works
- No errors in logs
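A quick way to sanity-check the reported cost against the per-1K pricing in models.yml (a sketch; the harness's own cost tracking is authoritative):

```python
def expected_cost(input_tokens, output_tokens, input_per_1k, output_per_1k):
    """Expected USD cost given models.yml per-1K pricing."""
    return (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k

# Token counts from the test benchmark output above, GPT-5.1 pricing
cost = expected_cost(245, 89, 0.00125, 0.01)  # ~0.0012
```

If the benchmark reports a cost far from this figure, suspect a per-1M vs per-1K conversion error.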

### 5. Document the Model

**Update relevant documentation:**
- Add model to this skill's resource guide
- Note any special parameters (e.g., `max_completion_tokens` for GPT-5.1)
- Document authentication requirements
- Add to teaching prompts if needed

### 6. Optional: Run Full Eval

**If model looks good:**

```bash
# Run small eval suite
ailang eval-suite --models <model-name> --benchmarks fizzbuzz,recursion_factorial

# Run full suite (expensive!)
make eval-baseline EVAL_VERSION=vX.Y.Z FULL=true
```

## Resources

### Provider Endpoints
See [resources/provider_endpoints.md](resources/provider_endpoints.md) for:
- API endpoint URLs for each provider
- Authentication methods
- How to test access manually
- Common errors and fixes

### Pricing Guide
See [resources/pricing_guide.md](resources/pricing_guide.md) for:
- How to find official pricing
- Price conversion (per 1M → per 1K)
- Cost calculation verification
- Caching and discounts

## Progressive Disclosure

This skill loads information progressively:

1. **Always loaded**: This SKILL.md file (workflow and script descriptions)
2. **Execute as needed**: Scripts in `scripts/` (testing, updating, verification)
3. **Load on demand**: Resources (detailed endpoint docs, pricing references)

## Notes

**Important:**
- Always test API access BEFORE updating models.yml
- Vertex AI (Gemini) requires gcloud auth, not API key
- GPT-5.1+ uses `max_completion_tokens` instead of `max_tokens`
- New models may not be available in all regions immediately
- Check for preview/beta status before adding to production suites
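The `max_completion_tokens` point is easy to get wrong, since the two providers diverge. The request bodies below restate the curl examples from the provider endpoints reference as plain dicts:

```python
# GPT-5.1+ replaces max_tokens with max_completion_tokens
openai_body = {
    "model": "gpt-5.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 10,   # NOT max_tokens
}

# Anthropic still uses max_tokens (and keys via the x-api-key header)
anthropic_body = {
    "model": "claude-sonnet-4-5-20250929",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 10,
}
```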

**Prerequisites:**
- API keys set in environment (OPENAI_API_KEY, ANTHROPIC_API_KEY)
- For Gemini: `gcloud` CLI installed and authenticated
- For Gemini: GCP project set (`gcloud config set project PROJECT_ID`)
- `curl`, `python3`, and `jq` available in PATH

**Files modified by this skill:**
- `internal/eval_harness/models.yml` - Model configurations
- (Optional) `prompts/vX.Y.Z.md` - Teaching prompts
- (Optional) `.claude/skills/model-manager/resources/` - Local model database


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### resources/provider_endpoints.md

```markdown
# Provider API Endpoints

Quick reference for testing AI model access across different providers.

## OpenAI

**Endpoint**: `https://api.openai.com/v1/chat/completions`

**Authentication**: Bearer token via `OPENAI_API_KEY` environment variable

**Test command**:
```bash
curl -s https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-5.1",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_completion_tokens": 10
  }'
```

**Important notes**:
- GPT-5.1+ uses `max_completion_tokens` instead of `max_tokens`
- Reasoning tokens reported separately in `completion_tokens_details.reasoning_tokens`
- Model name resolves to dated version (e.g., `gpt-5.1-2025-11-13`)

**Documentation**: https://platform.openai.com/docs/api-reference

---

## Anthropic

**Endpoint**: `https://api.anthropic.com/v1/messages`

**Authentication**: API key via `x-api-key` header (not Bearer token!)

**Test command**:
```bash
curl -s https://api.anthropic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-5-20250929",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 10
  }'
```

**Important notes**:
- API key goes in `x-api-key` header, not `Authorization`
- Requires `anthropic-version` header
- Model names include full date suffix (e.g., `claude-sonnet-4-5-20250929`)
- Token fields: `input_tokens`, `output_tokens` (not `prompt_tokens`/`completion_tokens`)

**Documentation**: https://docs.anthropic.com/en/api/getting-started

---

## Google Gemini (Vertex AI)

**Endpoint**: `https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{REGION}/publishers/google/models/{MODEL}:generateContent`

**Authentication**: OAuth2 via `gcloud` Application Default Credentials

**Setup**:
```bash
# Install gcloud
# https://cloud.google.com/sdk/docs/install

# Authenticate
gcloud auth application-default login

# Set project
gcloud config set project YOUR_PROJECT_ID
```

**Test command**:
```bash
# Get access token
ACCESS_TOKEN=$(gcloud auth application-default print-access-token)
PROJECT_ID=$(gcloud config get-value project)
REGION="us-central1"
MODEL="gemini-2.5-pro"

curl -s -X POST \
  "https://$REGION-aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/publishers/google/models/$MODEL:generateContent" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [{"text": "Hello"}]
    }]
  }'
```

**Important notes**:
- Uses Vertex AI, not public Gemini API
- No API key needed - uses `gcloud` authentication
- Token fields: `promptTokenCount`, `candidatesTokenCount`, `totalTokenCount`
- New models may not be available immediately (typically 1-2 weeks after announcement)
- Check availability: Look for 404 errors

**Documentation**: https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini

---

## Common Errors

### 401 Unauthorized
- **OpenAI**: Check `OPENAI_API_KEY` is set
- **Anthropic**: Check `ANTHROPIC_API_KEY` is set
- **Google**: Run `gcloud auth application-default login`

### 403 Forbidden
- **OpenAI**: API key may be invalid or quota exceeded
- **Anthropic**: API key invalid or account issue
- **Google**: Check GCP project permissions

### 404 Not Found
- **OpenAI**: Model name incorrect (check for typos)
- **Anthropic**: Model name incorrect (check date suffix)
- **Google**: Model not yet available in Vertex AI (check again in 1-2 weeks)

### 429 Rate Limit
- Wait and retry with exponential backoff
- Consider upgrading API tier (OpenAI)
- Check quota limits (Google Cloud Console)

---

## Pricing Endpoints

**OpenAI**: https://openai.com/api/pricing/
**Anthropic**: https://www.anthropic.com/pricing
**Google**: https://ai.google.dev/pricing

Convert pricing:
- **Per 1M tokens** → **Per 1K tokens**: Divide by 1000
- Example: $1.25 per 1M = $0.00125 per 1K

```

### resources/pricing_guide.md

```markdown
# Model Pricing Guide

How to find, verify, and convert pricing information for AI models.

## Finding Official Pricing

### OpenAI
**Pricing page**: https://openai.com/api/pricing/

**How to find**:
1. Go to OpenAI Platform pricing page
2. Find model in the list (e.g., "GPT-5.1")
3. Note input and output prices per 1M tokens
4. Check for cached token discounts

**Example** (GPT-5.1):
```
Input:  $1.25 per 1M tokens
Output: $10.00 per 1M tokens
Cached: $0.125 per 1M tokens (90% discount)
```

### Anthropic
**Pricing page**: https://www.anthropic.com/pricing

**How to find**:
1. Go to Anthropic pricing page
2. Find model family (e.g., "Claude Sonnet 4.5")
3. Note input and output prices per 1M tokens
4. Check for prompt caching discounts

**Example** (Claude Sonnet 4.5):
```
Input:  $3.00 per 1M tokens
Output: $15.00 per 1M tokens
```

### Google Gemini
**Pricing page**: https://ai.google.dev/pricing

**How to find**:
1. Go to Google AI pricing page
2. Find model (e.g., "Gemini 2.5 Pro")
3. Note context-dependent pricing (≤200k vs >200k)
4. Check for caching discounts

**Example** (Gemini 2.5 Pro):
```
Context ≤200k:
  Input:  $1.25 per 1M tokens
  Output: $10.00 per 1M tokens

Context >200k:
  Input:  $2.50 per 1M tokens
  Output: $15.00 per 1M tokens
```

---

## Price Conversion

**models.yml uses price per 1K tokens**, not per 1M.

**Conversion formula**:
```
Price per 1K = (Price per 1M) / 1000
```

**Examples**:

| Per 1M | Per 1K (models.yml) |
|--------|---------------------|
| $1.25  | 0.00125             |
| $10.00 | 0.01                |
| $0.25  | 0.00025             |
| $2.00  | 0.002               |

**Quick reference**:
```bash
# Calculate per 1K from per 1M
$ python3 -c "print(1.25 / 1000)"  # Input price
0.00125

$ python3 -c "print(10.0 / 1000)"  # Output price
0.01
```

---

## Cost Calculation Verification

After adding a model, verify cost calculation works:

**Test script**:
```bash
# Run test benchmark
.claude/skills/model-manager/scripts/run_test_benchmark.sh <model-name>

# Check output for cost
# Should show: "✓ Cost: $0.XXX"
```

**Manual verification**:
```python
# Example for GPT-5.1
input_tokens = 245
output_tokens = 89

input_cost = (input_tokens / 1000) * 0.00125   # $0.00031
output_cost = (output_tokens / 1000) * 0.01    # $0.00089
total_cost = input_cost + output_cost          # $0.00120
```

**Check against AILANG cost tracking**:
```bash
# Look for cost in benchmark result JSON
cat eval_results/.../fizzbuzz_ailang_<model>_*.json | jq '.cost'
```

---

## Context Caching

Some models offer reduced pricing for cached tokens.

### OpenAI (Prompt Caching)
**Applies to**: GPT-5, GPT-5.1
**Discount**: 90% (e.g., $1.25 → $0.125 per 1M)
**Retention**: Up to 24 hours (GPT-5.1), 5 minutes (GPT-5)
**Notes**: Automatic when prompt prefix is reused

### Anthropic (Prompt Caching)
**Applies to**: Claude Sonnet/Opus models
**Discount**: ~90% (check current pricing)
**Retention**: 5 minutes
**Notes**: Requires explicit cache control headers

### Google (Context Caching)
**Applies to**: Gemini 2.5 Pro, Gemini 2.5 Flash
**Discount**: 90% (e.g., $1.25 → $0.125 per 1M for ≤200k)
**Retention**: Configurable (up to 1 hour default)
**Notes**: Minimum 2048 tokens to enable caching

**Important**: AILANG eval harness does not yet track cached token costs separately. All costs use standard pricing.

---

## Special Pricing Features

### GPT-5.1 Adaptive Reasoning
- Reasoning tokens charged at output token rate
- Reported separately in `completion_tokens_details.reasoning_tokens`
- Can vary based on task complexity (adaptive)

### Gemini Context-Dependent Pricing
- Different rates for ≤200k vs >200k context
- models.yml uses ≤200k pricing by default
- Note >200k pricing in model notes section

### Anthropic Message Batching
- Discounts available for batch API (not used in eval harness)
- Standard pricing applies for streaming/synchronous calls

---

## Verifying Published Pricing

When adding a new model, cross-reference multiple sources:

1. **Official pricing page** (authoritative)
2. **API documentation** (may include pricing)
3. **Release announcement** (often mentions pricing)
4. **Community resources** (verify against official)

**Red flags**:
- ⚠️ Third-party sites may have outdated pricing
- ⚠️ Beta/preview pricing may change at GA
- ⚠️ Regional pricing variations (models.yml assumes US pricing)
- ⚠️ Volume discounts not reflected in published rates

**When in doubt**:
- Test with a small API call
- Check token usage and compare expected cost
- Monitor actual billing in provider dashboard

---

## Example: Adding GPT-5.1

**Step 1**: Find pricing
- Go to https://openai.com/api/pricing/
- Find "GPT-5.1": $1.25 input, $10.00 output per 1M

**Step 2**: Convert to per 1K
- Input: 1.25 / 1000 = 0.00125
- Output: 10.00 / 1000 = 0.01

**Step 3**: Add to models.yml
```bash
.claude/skills/model-manager/scripts/update_models_yml.sh \
  gpt5-1 \
  "gpt-5.1" \
  openai \
  0.00125 \
  0.01 \
  "GPT-5.1 with adaptive reasoning"
```

**Step 4**: Verify cost calculation
```bash
.claude/skills/model-manager/scripts/run_test_benchmark.sh gpt5-1

# Expected output:
# ✓ Cost: $0.002  (approximately, depends on token usage)
```

---

## Cost Optimization Tips

**For development** (use cheap models):
- `dev_models` suite: gpt5-mini, claude-haiku-4-5, gemini-2-5-flash
- ~5x cheaper than flagship models
- Good enough for quick iteration

**For benchmarks** (use balanced models):
- `benchmark_suite`: gpt5-1, claude-sonnet-4-5, gemini-2-5-pro
- Flagship models for quality
- Moderate cost

**For full evals** (be prepared for cost):
- `extended_suite`: All 6+ models
- Comprehensive coverage
- ~2-3x cost of benchmark suite

**Example costs** (approximate, for full eval suite):
```
dev_models (3 models, 50 benchmarks):    ~$5-10
benchmark_suite (3 models, 50 benchmarks): ~$15-25
extended_suite (6 models, 50 benchmarks):  ~$30-50
```

---

## Troubleshooting

### Cost shows $0.00
**Possible causes**:
- Model not in models.yml pricing config
- Token usage not reported by API
- Cost calculation bug

**Fix**:
1. Check model exists in `internal/eval_harness/models.yml`
2. Check pricing fields are set (`input_per_1k`, `output_per_1k`)
3. Verify token usage in API response

### Cost seems too high/low
**Possible causes**:
- Pricing conversion error (per 1M vs per 1K)
- Wrong model selected
- Context caching applied (reduces cost)

**Fix**:
1. Manually calculate expected cost
2. Compare with API response token usage
3. Check provider billing dashboard for actual charges

```

### scripts/test_model_access.sh

```bash
#!/usr/bin/env bash
# Test API access to a model
# Usage: test_model_access.sh <provider> <model-name>

set -euo pipefail

if [ $# -ne 2 ]; then
    echo "Usage: $0 <provider> <model-name>"
    echo ""
    echo "Providers: openai, anthropic, google"
    echo ""
    echo "Examples:"
    echo "  $0 openai gpt-5.1"
    echo "  $0 anthropic claude-sonnet-4-5-20250929"
    echo "  $0 google gemini-3-pro-preview-11-2025"
    exit 1
fi

PROVIDER="$1"
MODEL="$2"

echo "Testing: $PROVIDER/$MODEL"
echo ""

# Test based on provider
case "$PROVIDER" in
    openai)
        # Check API key
        if [ -z "${OPENAI_API_KEY:-}" ]; then
            echo "✗ OPENAI_API_KEY not set"
            exit 1
        fi
        echo "✓ OPENAI_API_KEY found"

        # Test API call
        RESPONSE=$(curl -s https://api.openai.com/v1/chat/completions \
          -H "Content-Type: application/json" \
          -H "Authorization: Bearer $OPENAI_API_KEY" \
          -d "{
            \"model\": \"$MODEL\",
            \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in exactly 3 words\"}],
            \"max_completion_tokens\": 10
          }")

        # Check for errors
        if echo "$RESPONSE" | grep -q '"error"'; then
            echo "✗ API call failed"
            echo "$RESPONSE" | python3 -m json.tool
            exit 1
        fi

        echo "✓ API call successful"

        # Extract info
        ACTUAL_MODEL=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['model'])")
        INPUT_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usage']['prompt_tokens'])")
        OUTPUT_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usage']['completion_tokens'])")

        echo "✓ Model: $ACTUAL_MODEL"
        echo "✓ Tokens: $INPUT_TOKENS input, $OUTPUT_TOKENS output"

        # Check for reasoning tokens (GPT-5.1+)
        REASONING_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usage'].get('completion_tokens_details', {}).get('reasoning_tokens', 0))" 2>/dev/null || echo "0")
        if [ "$REASONING_TOKENS" != "0" ]; then
            echo "✓ Reasoning tokens: $REASONING_TOKENS (adaptive reasoning enabled)"
        fi

        echo ""
        echo "✓ Ready to add to models.yml"
        ;;

    anthropic)
        # Check API key
        if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
            echo "✗ ANTHROPIC_API_KEY not set"
            exit 1
        fi
        echo "✓ ANTHROPIC_API_KEY found"

        # Test API call
        RESPONSE=$(curl -s https://api.anthropic.com/v1/messages \
          -H "Content-Type: application/json" \
          -H "x-api-key: $ANTHROPIC_API_KEY" \
          -H "anthropic-version: 2023-06-01" \
          -d "{
            \"model\": \"$MODEL\",
            \"messages\": [{\"role\": \"user\", \"content\": \"Say hello in exactly 3 words\"}],
            \"max_tokens\": 10
          }")

        # Check for errors
        if echo "$RESPONSE" | grep -q '"error"'; then
            echo "✗ API call failed"
            echo "$RESPONSE" | python3 -m json.tool
            exit 1
        fi

        echo "✓ API call successful"

        # Extract info
        ACTUAL_MODEL=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['model'])")
        INPUT_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usage']['input_tokens'])")
        OUTPUT_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usage']['output_tokens'])")

        echo "✓ Model: $ACTUAL_MODEL"
        echo "✓ Tokens: $INPUT_TOKENS input, $OUTPUT_TOKENS output"
        echo ""
        echo "✓ Ready to add to models.yml"
        ;;

    google)
        # Check gcloud auth
        if ! command -v gcloud &> /dev/null; then
            echo "✗ gcloud CLI not found"
            echo "Install: https://cloud.google.com/sdk/docs/install"
            exit 1
        fi
        echo "✓ gcloud CLI found"

        # Get access token
        ACCESS_TOKEN=$(gcloud auth application-default print-access-token 2>/dev/null || true)
        if [ -z "$ACCESS_TOKEN" ]; then
            echo "✗ Not authenticated"
            echo "Run: gcloud auth application-default login"
            exit 1
        fi
        echo "✓ Access token obtained"

        # Get project
        PROJECT_ID=$(gcloud config get-value project 2>/dev/null || true)
        if [ -z "$PROJECT_ID" ]; then
            echo "✗ No GCP project set"
            echo "Run: gcloud config set project PROJECT_ID"
            exit 1
        fi
        echo "✓ GCP project: $PROJECT_ID"

        # Test API call (Vertex AI)
        # Try global endpoint first (required for newer models like Gemini 3)
        LOCATION="global"
        URL="https://aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION/publishers/google/models/$MODEL:generateContent"

        echo "✓ Testing with global endpoint"

        RESPONSE=$(curl -s -X POST "$URL" \
          -H "Authorization: Bearer $ACCESS_TOKEN" \
          -H "Content-Type: application/json" \
          -d '{
            "contents": [{
              "role": "user",
              "parts": [{"text": "Say hello in exactly 3 words"}]
            }]
          }')

        # Check for errors
        if echo "$RESPONSE" | grep -q '"error"'; then
            ERROR_CODE=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['error']['code'])" 2>/dev/null || echo "unknown")
            echo "✗ API call failed (error code: $ERROR_CODE)"

            if [ "$ERROR_CODE" = "404" ]; then
                echo ""
                echo "Model not available in Vertex AI yet."
                echo "Recommendation: Check again in 1-2 weeks"
                echo "Note: Newer models (Gemini 3+) require global endpoint"
            else
                echo "$RESPONSE" | python3 -m json.tool
            fi
            exit 1
        fi

        echo "✓ API call successful"

        # Extract info
        INPUT_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usageMetadata']['promptTokenCount'])")
        OUTPUT_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usageMetadata']['candidatesTokenCount'])")

        echo "✓ Model: $MODEL"
        echo "✓ Tokens: $INPUT_TOKENS input, $OUTPUT_TOKENS output"

        # Check for reasoning tokens (Gemini 3+)
        REASONING_TOKENS=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['usageMetadata'].get('thoughtsTokenCount', 0))" 2>/dev/null || echo "0")
        if [ "$REASONING_TOKENS" != "0" ]; then
            echo "✓ Reasoning tokens: $REASONING_TOKENS (adaptive reasoning enabled)"
        fi

        echo ""
        echo "✓ Ready to add to models.yml"
        ;;

    *)
        echo "✗ Unknown provider: $PROVIDER"
        echo "Supported providers: openai, anthropic, google"
        exit 1
        ;;
esac

```

### scripts/find_model_info.sh

```bash
#!/usr/bin/env bash
# Helper script to remind Claude to use WebSearch for model information
# This is a placeholder - actual search is done by Claude using WebSearch tool

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Model Information Search"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "⚠️  This script is a placeholder for Claude's WebSearch tool."
echo ""
echo "When invoked by Claude, the workflow is:"
echo ""
echo "1. Use WebSearch tool to find:"
echo "   - Official API documentation"
echo "   - Exact API model names"
echo "   - Pricing information"
echo "   - Release announcements"
echo ""
echo "2. Look for:"
echo "   • API model name (e.g., 'gpt-5.1', not 'GPT-5.1')"
echo "   • Provider documentation (OpenAI Platform, Anthropic, Google AI)"
echo "   • Pricing per 1M tokens (convert to per 1K)"
echo "   • Context limits"
echo "   • Special features (caching, reasoning, etc.)"
echo ""
echo "3. Search queries to try:"
echo "   • '[Model Name] API documentation pricing'"
echo "   • '[Model Name] API model name'"
echo "   • '[Provider] pricing page 2025'"
echo ""
echo "4. Verify information from:"
echo "   ✓ platform.openai.com (OpenAI)"
echo "   ✓ docs.anthropic.com (Anthropic)"
echo "   ✓ ai.google.dev or cloud.google.com (Google)"
echo "   ✗ Avoid third-party aggregators (may be outdated)"
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo ""
echo "For Claude: Use the WebSearch tool now to find this information."

```

### scripts/update_models_yml.sh

```bash
#!/usr/bin/env bash
# Add a new model to models.yml configuration
# Usage: update_models_yml.sh <friendly-name> <api-name> <provider> <input-price> <output-price> [description]

set -euo pipefail

if [ $# -lt 5 ]; then
    echo "Usage: $0 <friendly-name> <api-name> <provider> <input-per-1k> <output-per-1k> [description]"
    echo ""
    echo "Arguments:"
    echo "  friendly-name : Short name for the model (e.g., gpt5-1)"
    echo "  api-name      : Exact API model name (e.g., gpt-5.1)"
    echo "  provider      : openai, anthropic, or google"
    echo "  input-per-1k  : Input price per 1K tokens (e.g., 0.00125)"
    echo "  output-per-1k : Output price per 1K tokens (e.g., 0.01)"
    echo "  description   : Optional description"
    echo ""
    echo "Examples:"
    echo "  $0 gpt5-1 'gpt-5.1' openai 0.00125 0.01 'GPT-5.1 with adaptive reasoning'"
    echo "  $0 gemini-3-pro 'gemini-3-pro-preview-11-2025' google 0.002 0.012"
    exit 1
fi

FRIENDLY_NAME="$1"
API_NAME="$2"
PROVIDER="$3"
INPUT_PRICE="$4"
OUTPUT_PRICE="$5"
DESCRIPTION="${6:-$FRIENDLY_NAME model}"

MODELS_YML="internal/eval_harness/models.yml"

if [ ! -f "$MODELS_YML" ]; then
    echo "✗ models.yml not found at: $MODELS_YML"
    echo "Are you in the AILANG project root?"
    exit 1
fi

echo "Adding model to models.yml:"
echo "  Friendly name: $FRIENDLY_NAME"
echo "  API name: $API_NAME"
echo "  Provider: $PROVIDER"
echo "  Pricing: \$$INPUT_PRICE / \$$OUTPUT_PRICE per 1K tokens"
echo "  Description: $DESCRIPTION"
echo ""

# Create backup
cp "$MODELS_YML" "$MODELS_YML.backup"
echo "✓ Created backup: $MODELS_YML.backup"

# Determine env var based on provider
case "$PROVIDER" in
    openai)
        ENV_VAR="OPENAI_API_KEY"
        ;;
    anthropic)
        ENV_VAR="ANTHROPIC_API_KEY"
        ;;
    google)
        ENV_VAR="GOOGLE_API_KEY"
        ;;
    *)
        echo "✗ Unknown provider: $PROVIDER"
        echo "Supported: openai, anthropic, google"
        exit 1
        ;;
esac

# Create model entry
# Note: This is a simple append - for production, you'd want YAML-aware editing
MODEL_ENTRY="
  # $DESCRIPTION
  $FRIENDLY_NAME:
    api_name: \"$API_NAME\"
    provider: \"$PROVIDER\"
    description: \"$DESCRIPTION\"
    env_var: \"$ENV_VAR\"
    agent_cli: null  # Agent CLI support not yet implemented
    pricing:
      input_per_1k: $INPUT_PRICE
      output_per_1k: $OUTPUT_PRICE
    notes: |
      Added: $(date +%Y-%m-%d)
      Test before adding to production suites.
"

# Check if model already exists
if grep -q "^  $FRIENDLY_NAME:" "$MODELS_YML"; then
    echo "⚠ Model '$FRIENDLY_NAME' already exists in models.yml"
    echo ""
    read -p "Overwrite? (y/N): " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        echo "Aborted."
        rm "$MODELS_YML.backup"
        exit 1
    fi

    # Remove existing entry (simple approach - stops at next model)
    sed -i.tmp "/^  $FRIENDLY_NAME:/,/^  [a-z]/{ /^  $FRIENDLY_NAME:/d; /^  [a-z]/!d; }" "$MODELS_YML"
    rm "$MODELS_YML.tmp"
fi

# Find insertion point (after last model, before "# Default model")
# sed's "i" command does not handle multi-line shell variables portably,
# so use awk to splice the entry in before the "default:" section
if grep -q "^# Default model for benchmarks" "$MODELS_YML"; then
    # Insert before "# Default model"
    awk -v entry="$MODEL_ENTRY" '
        /^# Default model for benchmarks/ && !done { print entry; done = 1 }
        { print }
    ' "$MODELS_YML" > "$MODELS_YML.tmp" && mv "$MODELS_YML.tmp" "$MODELS_YML"
else
    # Append to end of models section
    echo "$MODEL_ENTRY" >> "$MODELS_YML"
fi

echo "✓ Updated models.yml"

# Validate YAML syntax (if python3 available)
if command -v python3 &> /dev/null; then
    if python3 -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" "$MODELS_YML" 2>/dev/null; then
        echo "✓ YAML syntax valid"
    else
        echo "✗ YAML syntax error!"
        echo "Restoring backup..."
        mv "$MODELS_YML.backup" "$MODELS_YML"
        exit 1
    fi
else
    echo "⚠ python3 not available, skipping YAML validation"
fi

rm "$MODELS_YML.backup"

echo ""
echo "✓ Ready to test"
echo ""
echo "Next steps:"
echo "  1. Review changes: git diff $MODELS_YML"
echo "  2. Test model: .claude/skills/model-manager/scripts/run_test_benchmark.sh $FRIENDLY_NAME"
echo "  3. Add to suites (dev_models, extended_suite, benchmark_suite) if needed"

```
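
The sed-based duplicate removal above is brittle: it leaves the entry's comment line behind and depends on the next key's first character. The same idea can be sketched more readably with awk. The sample file, key names, and layout below are made up for illustration (entries as 2-space-indented keys under a top-level map); check them against the real models.yml.

```shell
#!/usr/bin/env bash
# Sketch: remove one model entry from a models.yml-style file with awk.
set -euo pipefail

SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
models:
  gpt5-1:
    api_name: "gpt-5.1"
    pricing:
      input_per_1k: 0.00125
  claude-x:
    api_name: "claude-x"
EOF

FRIENDLY_NAME="gpt5-1"

# A 2-space-indented key starts a new entry; skip lines only while we are
# inside the entry whose key matches exactly.
RESULT=$(awk -v name="$FRIENDLY_NAME" '
    /^  [^ ]/ { skip = ($0 == "  " name ":") }  # new model key line
    /^[^ ]/   { skip = 0 }                      # top-level key ends any entry
    !skip { print }
' "$SAMPLE")

echo "$RESULT"
rm -f "$SAMPLE"
```

Like the sed version, this is still line-oriented text surgery; for anything beyond simple appends, YAML-aware editing (e.g. a small Python helper) is the safer route.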

### scripts/verify_vertex_model.sh

```bash
#!/usr/bin/env bash
# Check if a Gemini model is available in Vertex AI
# Usage: verify_vertex_model.sh <model-name>

set -euo pipefail

if [ $# -ne 1 ]; then
    echo "Usage: $0 <model-name>"
    echo ""
    echo "Examples:"
    echo "  $0 gemini-3-pro-preview-11-2025"
    echo "  $0 gemini-2.5-pro"
    exit 1
fi

MODEL="$1"

echo "Checking Vertex AI for: $MODEL"
echo ""

# Check gcloud
if ! command -v gcloud &> /dev/null; then
    echo "✗ gcloud CLI not found"
    echo "Install: https://cloud.google.com/sdk/docs/install"
    exit 1
fi

# Get access token
ACCESS_TOKEN=$(gcloud auth application-default print-access-token 2>/dev/null || true)
if [ -z "$ACCESS_TOKEN" ]; then
    echo "✗ Not authenticated"
    echo "Run: gcloud auth application-default login"
    exit 1
fi
echo "✓ Access token obtained"

# Get project
PROJECT_ID=$(gcloud config get-value project 2>/dev/null || true)
if [ -z "$PROJECT_ID" ]; then
    echo "✗ No GCP project set"
    echo "Run: gcloud config set project PROJECT_ID"
    exit 1
fi
echo "✓ GCP project: $PROJECT_ID"

# Test API call (use global endpoint for newer models)
LOCATION="global"
URL="https://aiplatform.googleapis.com/v1/projects/$PROJECT_ID/locations/$LOCATION/publishers/google/models/$MODEL:generateContent"

RESPONSE=$(curl -s -X POST "$URL" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [
      {
        "role": "user",
        "parts": [
          { "text": "test" }
        ]
      }
    ]
  }')

# Check for errors
if echo "$RESPONSE" | grep -q '"error"'; then
    ERROR_CODE=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['error']['code'])" 2>/dev/null || echo "unknown")

    if [ "$ERROR_CODE" = "404" ]; then
        echo "✗ Model not found (404)"
        echo ""
        echo "Recommendation: Monitor for availability, check again in 1-2 weeks"
        echo "Note: New models are typically announced before Vertex AI rollout"
        echo "Note: Testing with global endpoint (required for Gemini 3+)"
        exit 1
    else
        echo "✗ Error: $ERROR_CODE"
        echo "$RESPONSE" | python3 -m json.tool
        exit 1
    fi
fi

echo "✓ Model is available in Vertex AI"
echo ""
echo "Ready to add to models.yml!"

```
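
On success, the reply text can be pulled out of the response for a quick sanity check. The sketch below uses a canned `RESPONSE` that mirrors the documented `candidates` → `content` → `parts` → `text` shape of a generateContent response; in the script above you would reuse the real `$RESPONSE` variable instead.

```shell
#!/usr/bin/env bash
# Sketch: extract the model's reply text from a generateContent response.
set -euo pipefail

# Canned response standing in for the real $RESPONSE from curl.
RESPONSE='{
  "candidates": [
    { "content": { "role": "model", "parts": [ { "text": "test reply" } ] } }
  ]
}'

TEXT=$(echo "$RESPONSE" | python3 -c "import sys, json; print(json.load(sys.stdin)['candidates'][0]['content']['parts'][0]['text'])")
echo "Model replied: $TEXT"
```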

### scripts/run_test_benchmark.sh

```bash
#!/usr/bin/env bash
# Run a small test benchmark to verify model works end-to-end
# Usage: run_test_benchmark.sh <model-name> [benchmark]

set -euo pipefail

if [ $# -lt 1 ]; then
    echo "Usage: $0 <model-name> [benchmark]"
    echo ""
    echo "Arguments:"
    echo "  model-name : Friendly model name from models.yml (e.g., gpt5-1)"
    echo "  benchmark  : Optional benchmark name (default: fizzbuzz)"
    echo ""
    echo "Examples:"
    echo "  $0 gpt5-1"
    echo "  $0 gemini-3-pro recursion_factorial"
    exit 1
fi

MODEL="$1"
BENCHMARK="${2:-fizzbuzz}"

echo "Running test benchmark: $BENCHMARK"
echo "Model: $MODEL"
echo ""

# Check if ailang is available
if ! command -v ailang &> /dev/null; then
    echo "✗ ailang command not found"
    echo "Run: make install"
    exit 1
fi

# Create temp directory for results
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

echo "✓ Running benchmark (output: $TEMP_DIR)..."
echo ""

# Run eval suite with single benchmark and model
if ailang eval-suite \
    --models "$MODEL" \
    --benchmarks "$BENCHMARK" \
    --output "$TEMP_DIR" \
    --timeout 30s; then

    echo ""
    echo "✓ Benchmark completed"

    # Check results (search recursively in subdirectories like standard/)
    RESULT_FILE=$(find "$TEMP_DIR" -name "${BENCHMARK}_ailang_${MODEL}_*.json" -type f 2>/dev/null | head -1)

    if [ -z "$RESULT_FILE" ]; then
        echo "⚠ No result file found for ailang"
        # Try Python result as fallback
        RESULT_FILE=$(find "$TEMP_DIR" -name "${BENCHMARK}_python_${MODEL}_*.json" -type f 2>/dev/null | head -1)
        if [ -z "$RESULT_FILE" ]; then
            echo "⚠ No result file found at all"
            echo "Check $TEMP_DIR for details"
            exit 1
        fi
        echo "Using Python result file instead"
    fi

    # Parse result - JSON uses stdout_ok (bool), not result (string)
    PASSED=$(python3 -c "import json; r = json.load(open('$RESULT_FILE')); print('true' if r.get('stdout_ok', False) else 'false')")
    INPUT_TOKENS=$(python3 -c "import json; r = json.load(open('$RESULT_FILE')); print(r.get('input_tokens', 0))")
    OUTPUT_TOKENS=$(python3 -c "import json; r = json.load(open('$RESULT_FILE')); print(r.get('output_tokens', 0))")
    COST=$(python3 -c "import json; r = json.load(open('$RESULT_FILE')); print(r.get('cost_usd', r.get('cost', 0)))")

    if [ "$PASSED" = "true" ]; then
        echo "✓ Result: PASS"
    else
        echo "✗ Result: FAIL"
        echo ""
        echo "Generated code:"
        python3 -c "import json; r = json.load(open('$RESULT_FILE')); print(r.get('code', r.get('generated_code', 'N/A')))"
        echo ""
        echo "Error:"
        python3 -c "import json; r = json.load(open('$RESULT_FILE')); print(r.get('error', r.get('stderr', 'N/A')))"
        exit 1
    fi

    echo "✓ Tokens: $INPUT_TOKENS input, $OUTPUT_TOKENS output"
    echo "✓ Cost: \$$COST"
    echo ""
    echo "✓ Model is ready for production use"

else
    echo "✗ Benchmark failed"
    echo "Check logs for details"
    exit 1
fi

```
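
If a result file reports tokens but no cost field, the cost can be estimated from the per-1K pricing in models.yml. A minimal sketch (the token counts and prices below are placeholder values, not real model pricing):

```shell
#!/usr/bin/env bash
# Sketch: estimate benchmark cost from token counts and per-1K pricing.
set -euo pipefail

# Placeholder inputs -- read the real values from the result JSON and
# from the model's input_per_1k / output_per_1k fields in models.yml.
INPUT_TOKENS=1200
OUTPUT_TOKENS=450
INPUT_PER_1K=0.00125
OUTPUT_PER_1K=0.01

# cost = (input_tokens / 1000) * input_per_1k + (output_tokens / 1000) * output_per_1k
COST=$(python3 -c "print(round($INPUT_TOKENS / 1000 * $INPUT_PER_1K + $OUTPUT_TOKENS / 1000 * $OUTPUT_PER_1K, 6))")
echo "Estimated cost: \$$COST"
```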
