openrouter
Expert OpenRouter API assistant for AI agents. Use when making API calls to OpenRouter's unified API for 400+ AI models. Covers chat completions, streaming, tool calling, structured outputs, web search, embeddings, multimodal inputs, model selection, routing, and error handling.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install dimitrigilbert-ai-skills-openrouter
Repository
Skill path: openrouter
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Backend, Data / AI.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: dimitrigilbert.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install openrouter into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/dimitrigilbert/ai-skills before adding openrouter to shared team environments
- Use openrouter for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: openrouter
description: Expert OpenRouter API assistant for AI agents. Use when making API calls to OpenRouter's unified API for 400+ AI models. Covers chat completions, streaming, tool calling, structured outputs, web search, embeddings, multimodal inputs, model selection, routing, and error handling.
---
# OpenRouter API for AI Agents
Expert guidance for AI agents integrating with OpenRouter API - unified access to 400+ models from 90+ providers.
**When to use this skill:**
- Making chat completions via OpenRouter API
- Selecting appropriate models and variants
- Implementing streaming responses
- Using tool/function calling
- Enforcing structured outputs
- Integrating web search
- Handling multimodal inputs (images, audio, video, PDFs)
- Managing model routing and fallbacks
- Handling errors and retries
- Optimizing cost and performance
---
## API Basics
### Making a Request
**Endpoint**: `POST https://openrouter.ai/api/v1/chat/completions`
**Headers** (required):
```typescript
{
  'Authorization': `Bearer ${apiKey}`,
  'Content-Type': 'application/json',
  // Optional: for app attribution
  'HTTP-Referer': 'https://your-app.com',
  'X-Title': 'Your App Name'
}
```
**Minimal request structure**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'Your prompt here' }
    ]
  })
});
```
### Response Structure
**Non-streaming response**:
```json
{
  "id": "gen-abc123",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 20,
    "total_tokens": 30
  },
  "model": "anthropic/claude-3.5-sonnet"
}
```
**Key fields**:
- `choices[0].message.content` - The assistant's response
- `choices[0].finish_reason` - Why generation stopped (stop, length, tool_calls, etc.)
- `usage` - Token counts and cost information
- `model` - Actual model used (may differ from requested)
### When to Use Streaming vs Non-Streaming
**Use streaming (`stream: true`)** when:
- Real-time responses needed (chat interfaces, interactive tools)
- Latency matters (user-facing applications)
- Large responses expected (long-form content)
- Want to show progressive output
**Use non-streaming** when:
- Processing in background (batch jobs, async tasks)
- Need complete response before processing
- Building API endpoints that return complete responses
- Response is short (few tokens)
**Streaming basics**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{ role: 'user', content: '...' }],
    stream: true
  })
});

const decoder = new TextDecoder();
for await (const chunk of response.body) {
  // Simplified: a chunk may end mid-line; buffer partial lines in production
  const text = decoder.decode(chunk, { stream: true });
  const lines = text.split('\n').filter(line => line.startsWith('data: '));
  for (const line of lines) {
    const data = line.slice(6); // Remove 'data: ' prefix
    if (data === '[DONE]') break;
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      // Accumulate or display content
    }
  }
}
```
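The loop above assumes each chunk contains whole `data:` lines; in practice a chunk can end mid-line. A minimal buffered parser (a sketch; the `makeSSEParser` helper name is my own) carries the partial tail over to the next chunk:

```typescript
// Buffered SSE parser: feed raw chunk text, get back complete parsed events.
// A partial trailing line is kept in the closure until the next call.
function makeSSEParser() {
  let buffer = '';
  return function parseSSEChunk(chunkText: string): any[] {
    buffer += chunkText;
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep the incomplete trailing line
    return lines
      .filter(line => line.startsWith('data: '))
      .map(line => line.slice(6))
      .filter(data => data !== '[DONE]')
      .map(data => JSON.parse(data));
  };
}
```

Usage: create one parser per stream (`const parse = makeSSEParser();`) and call it with each decoded chunk.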
---
## Model Selection
### Model Identifier Format
**Format**: `provider/model-name[:variant]`
Examples:
- `anthropic/claude-3.5-sonnet` - Specific model
- `openai/gpt-4o:online` - With web search enabled
- `google/gemini-2.0-flash:free` - Free tier variant
### Model Variants and When to Use Them
| Variant | Use When | Tradeoffs |
|---------|----------|-----------|
| `:free` | Cost is primary concern, testing, prototyping | Rate limits, lower quality models |
| `:online` | Need current information, real-time data | Higher cost, web search latency |
| `:extended` | Large context window needed | May be slower, higher cost |
| `:thinking` | Complex reasoning, multi-step problems | Higher token usage, slower |
| `:nitro` | Speed is critical | May have quality tradeoffs |
| `:exacto` | Need specific provider | No fallbacks, may be less available |
### Default Model Choices by Task
**General purpose**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Balanced quality, speed, cost
- Good for most tasks
**Coding**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Strong code generation and understanding
- Good reasoning
**Complex reasoning**: `anthropic/claude-opus-4:thinking` or `openai/o3`
- Deep reasoning capabilities
- Higher cost, slower
**Fast responses**: `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Minimal latency
- Good for real-time applications
**Cost-sensitive**: `google/gemini-2.0-flash:free` or `meta-llama/llama-3.1-70b:free`
- No cost with limits
- Good for high-volume, lower-complexity tasks
**Current information**: `anthropic/claude-3.5-sonnet:online` or `google/gemini-2.5-pro:online`
- Web search built-in
- Real-time data
**Large context**: `anthropic/claude-3.5-sonnet:extended` or `google/gemini-2.5-pro:extended`
- 200K+ context windows
- Document analysis, codebase understanding
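The recommendations above can be folded into a small lookup, shown here as a sketch (the task names and `pickModel` helper are my own; the model IDs mirror the list above):

```typescript
type Task = 'general' | 'coding' | 'reasoning' | 'fast' | 'cheap' | 'current' | 'long-context';

// Default model per task, mirroring the recommendations above
const DEFAULT_MODELS: Record<Task, string> = {
  general: 'anthropic/claude-3.5-sonnet',
  coding: 'anthropic/claude-3.5-sonnet',
  reasoning: 'anthropic/claude-opus-4:thinking',
  fast: 'google/gemini-2.0-flash',
  cheap: 'google/gemini-2.0-flash:free',
  current: 'anthropic/claude-3.5-sonnet:online',
  'long-context': 'anthropic/claude-3.5-sonnet:extended',
};

function pickModel(task: Task): string {
  return DEFAULT_MODELS[task];
}
```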
### Provider Routing Preferences
**Default behavior**: OpenRouter automatically selects best provider
**Explicit provider order**:
```typescript
{
  provider: {
    order: ['anthropic', 'openai', 'google'],
    allow_fallbacks: true,
    sort: 'price' // 'price', 'latency', or 'throughput'
  }
}
```
**When to set provider order**:
- Have preferred provider arrangements
- Need to optimize for specific metric (cost, speed)
- Want to exclude certain providers
- Have BYOK (Bring Your Own Key) for specific providers
### Model Fallbacks
**Automatic fallback** - try multiple models in order:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet',
    'openai/gpt-4o',
    'google/gemini-2.0-flash'
  ]
}
```
**When to use fallbacks**:
- High reliability required
- Multiple providers acceptable
- Want graceful degradation
- Avoid single point of failure
**Fallback behavior**:
- Tries first model
- Falls back to the next model on error (5xx, 429, timeout)
- Uses whichever succeeds
- Returns which model was used in `model` field
---
## Parameters You Need
### Core Parameters
**model** (string, optional)
- Which model to use
- Default: user's default model
- **Always specify for consistency**
**messages** (Message[], required)
- Conversation history
- Structure: `{ role: 'user'|'assistant'|'system', content: string | ContentPart[] }`
- For multimodal: content can be array of text and image_url parts
**stream** (boolean, default: false)
- Enable Server-Sent Events streaming
- Use for real-time responses
**temperature** (float, 0.0-2.0, default: 1.0)
- Controls randomness
- **0.0-0.3**: Deterministic, factual responses (code, precise answers)
- **0.4-0.7**: Balanced (general use)
- **0.8-1.2**: Creative (brainstorming, creative writing)
- **1.3-2.0**: Highly creative, unpredictable (experimental)
**max_tokens** (integer, optional)
- Maximum tokens to generate
- **Always set** to control cost and prevent runaway responses
- Typical: 100-500 for short, 1000-2000 for long responses
- Model limit: context_length - prompt_length
**top_p** (float, 0.0-1.0, default: 1.0)
- Nucleus sampling - limits to top probability mass
- **Use instead of temperature** when you want predictable diversity
- **0.9-0.95**: Common settings for quality
**top_k** (integer, 0+, default: 0/disabled)
- Limit to K most likely tokens
- **1**: Always most likely (deterministic)
- **40-50**: Balanced
- Not available for OpenAI models
### Sampling Strategy Guidelines
**For code generation**: `temperature: 0.1-0.3, top_p: 0.95`
**For factual responses**: `temperature: 0.0-0.2`
**For creative writing**: `temperature: 0.8-1.2`
**For brainstorming**: `temperature: 1.0-1.5`
**For chat**: `temperature: 0.6-0.8`
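These guidelines can be expressed as a preset helper, shown as a sketch (the `samplingFor` name is my own; values sit inside the ranges listed above):

```typescript
// Sampling presets per task, taken from the guideline ranges above
function samplingFor(task: 'code' | 'factual' | 'creative' | 'brainstorm' | 'chat') {
  switch (task) {
    case 'code':       return { temperature: 0.2, top_p: 0.95 };
    case 'factual':    return { temperature: 0.1 };
    case 'creative':   return { temperature: 1.0 };
    case 'brainstorm': return { temperature: 1.2 };
    case 'chat':       return { temperature: 0.7 };
  }
}
```

Spread the preset into the request body, e.g. `{ model, messages, ...samplingFor('code') }`.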
### Tool Calling Parameters
**tools** (Tool[], default: [])
- Available functions for model to call
- Structure:
```typescript
{
  type: 'function',
  function: {
    name: 'function_name',
    description: 'What it does',
    parameters: { /* JSON Schema */ }
  }
}
```
**tool_choice** (string | object, default: 'auto')
- Control when tools are called
- `'auto'`: Model decides (default)
- `'none'`: Never call tools
- `'required'`: Must call a tool
- `{ type: 'function', function: { name: 'specific_tool' } }`: Force specific tool
**parallel_tool_calls** (boolean, default: true)
- Allow multiple tools simultaneously
- Set `false` for sequential execution
**When to use tools**:
- Need to query external APIs (weather, search, database)
- Need to perform calculations or data processing
- Building agentic systems
- Need structured data extraction
### Structured Output Parameters
**response_format** (object, optional)
- Enforce specific output format
**JSON object mode**:
```typescript
{ type: 'json_object' }
```
- Model returns valid JSON
- Must also instruct model in system message
**JSON Schema mode** (strict):
```typescript
{
  type: 'json_schema',
  json_schema: {
    name: 'schema_name',
    strict: true,
    schema: { /* JSON Schema */ }
  }
}
```
- Model returns JSON matching exact schema
- **Use when structure is critical** (APIs, data processing)
**When to use structured outputs**:
- Need predictable response format
- Integrating with systems (APIs, databases)
- Data extraction
- Form filling
### Web Search Parameters
**Enable via model variant** (simplest):
```typescript
{ model: 'anthropic/claude-3.5-sonnet:online' }
```
**Enable via plugin**:
```typescript
{
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5
  }]
}
```
**When to use web search**:
- Need current information (news, prices, events)
- User asks about recent developments
- Need factual verification
- Topic requires real-time data
### Other Important Parameters
**user** (string, optional)
- Stable identifier for end-user
- **Set when you have user IDs**
- Helps with abuse detection and caching
**session_id** (string, optional)
- Group related requests
- **Set for conversation tracking**
- Improves caching and observability
**metadata** (Record<string, string>, optional)
- Custom metadata (max 16 key-value pairs)
- **Use for analytics and tracking**
- Keys: max 64 chars, Values: max 512 chars
**stop** (string | string[], optional)
- Stop sequences to halt generation
- Common: `['\n\n', '###', 'END']`
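The metadata limits above can be checked before sending, sketched here (the `validateMetadata` helper name is my own; the limits are the ones documented above):

```typescript
// Validate custom metadata against the documented limits:
// max 16 pairs, keys up to 64 chars, values up to 512 chars
function validateMetadata(metadata: Record<string, string>): string[] {
  const problems: string[] = [];
  const entries = Object.entries(metadata);
  if (entries.length > 16) problems.push(`too many pairs: ${entries.length} > 16`);
  for (const [key, value] of entries) {
    if (key.length > 64) problems.push(`key exceeds 64 chars: ${key.slice(0, 16)}...`);
    if (value.length > 512) problems.push(`value exceeds 512 chars for key: ${key}`);
  }
  return problems;
}
```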
---
## Handling Responses
### Non-Streaming Responses
Extract content:
```typescript
const response = await fetch(/* ... */);
const data = await response.json();
const content = data.choices[0].message.content;
const finishReason = data.choices[0].finish_reason;
const usage = data.usage;
```
Check for tool calls:
```typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  // Model wants to call tools
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    // Execute tool...
  }
}
```
### Streaming Responses
Process SSE stream:
```typescript
let fullContent = '';
let buffer = '';
const response = await fetch(/* ... */);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Buffer the stream so SSE lines split across chunks are reassembled
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // Keep any trailing partial line
  for (const line of lines.filter(l => l.startsWith('data: '))) {
    const data = line.slice(6);
    if (data === '[DONE]') continue;
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
    }
    // Handle usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
```
Handle streaming tool calls:
```typescript
// Tool-call fragments stream across multiple chunks; `chunks` below stands
// in for the parsed SSE events produced by the loop above
let currentToolCall = null;
let toolArgs = '';
for (const parsed of chunks) {
  const toolCallChunk = parsed.choices?.[0]?.delta?.tool_calls?.[0];
  if (toolCallChunk?.function?.name) {
    currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
  }
  if (toolCallChunk?.function?.arguments) {
    toolArgs += toolCallChunk.function.arguments;
  }
  if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
    // Tool call is complete; attach the accumulated argument string
    currentToolCall.arguments = toolArgs;
    // Execute tool...
  }
}
```
### Usage and Cost Tracking
```typescript
const { usage } = data;
console.log(`Prompt: ${usage.prompt_tokens}`);
console.log(`Completion: ${usage.completion_tokens}`);
console.log(`Total: ${usage.total_tokens}`);
// Cost (if available)
if (usage.cost) {
console.log(`Cost: $${usage.cost.toFixed(6)}`);
}
// Detailed breakdown
console.log(usage.prompt_tokens_details);
console.log(usage.completion_tokens_details);
```
---
## Error Handling
### Common HTTP Status Codes
**400 Bad Request**
- Invalid request format
- Missing required fields
- Parameter out of range
- **Fix**: Validate request structure and parameters
**401 Unauthorized**
- Missing or invalid API key
- **Fix**: Check API key format and permissions
**402 Payment Required**
- Insufficient credits
- **Fix**: Add credits to account
**403 Forbidden**
- Insufficient permissions
- Model not allowed
- **Fix**: Check guardrails, model access, API key permissions
**408 Request Timeout**
- Request took too long
- **Fix**: Reduce prompt length, use streaming, try simpler model
**429 Rate Limited**
- Too many requests
- **Fix**: Implement exponential backoff, reduce request rate
**502 Bad Gateway**
- Provider error
- **Fix**: Use model fallbacks, retry with different model
**503 Service Unavailable**
- Service overloaded
- **Fix**: Retry with backoff, use fallbacks
### Retry Strategy
**Exponential backoff**:
```typescript
async function requestWithRetry(url, options, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      if (response.ok) {
        return await response.json();
      }
      // Don't retry client errors other than 429; let the caller inspect them
      if (response.status !== 429 && response.status < 500) {
        return response;
      }
      lastError = new Error(`HTTP ${response.status}`);
    } catch (error) {
      lastError = error; // Network failure: retry
    }
    if (attempt < maxRetries - 1) {
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```
**Retryable status codes**: 408, 429, 502, 503
**Do not retry**: 400, 401, 402, 403
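The two lists reduce to a one-line guard, sketched here (the `isRetryable` helper name is my own):

```typescript
// Classify a status code per the retryable/non-retryable lists above
function isRetryable(status: number): boolean {
  return [408, 429, 502, 503].includes(status);
}
```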
### Graceful Degradation
**Use model fallbacks**:
```typescript
{
  models: [
    'anthropic/claude-3.5-sonnet', // Primary
    'openai/gpt-4o',               // Fallback 1
    'google/gemini-2.0-flash'      // Fallback 2
  ]
}
```
**Handle partial failures**:
- Log errors but continue
- Fall back to simpler features
- Use cached responses when available
- Provide degraded experience rather than failing completely
---
## Advanced Features
### When to Use Tool Calling
**Good use cases**:
- Querying external APIs (weather, stock prices, databases)
- Performing calculations or data processing
- Extracting structured data from unstructured text
- Building agentic systems with multiple steps
- When decisions require external information
**Implementation pattern**:
1. Define tools with clear descriptions and parameters
2. Send request with `tools` array
3. Check if `tool_calls` present in response
4. Execute tools with parsed arguments
5. Send tool results back in a new request
6. Repeat until model provides final answer
**See**: `references/ADVANCED_PATTERNS.md` for complete agentic loop implementation
### When to Use Structured Outputs
**Good use cases**:
- API responses (need specific schema)
- Data extraction (forms, documents)
- Configuration files (JSON, YAML)
- Database operations (structured queries)
- When downstream processing requires specific format
**Implementation pattern**:
1. Define JSON Schema for desired output
2. Set `response_format: { type: 'json_schema', json_schema: { ... } }`
3. Instruct model to produce JSON (system or user message)
4. Validate response against schema
5. Handle parsing errors gracefully
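Steps 4-5 can be sketched as a safe parse that fails soft instead of throwing (the `safeParseJSON` helper name is my own; for full schema validation, swap in a library such as Ajv):

```typescript
// Parse the model's JSON output without letting a malformed response crash the caller
function safeParseJSON<T>(content: string): { ok: true; value: T } | { ok: false; error: string } {
  try {
    return { ok: true, value: JSON.parse(content) as T };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```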
**Add response healing** for robustness:
```typescript
{
  response_format: { /* ... */ },
  plugins: [{ id: 'response-healing' }]
}
```
### When to Use Web Search
**Good use cases**:
- User asks about recent events, news, or current data
- Need verification of facts
- Questions with time-sensitive information
- Topic requires up-to-date information
- User explicitly requests current information
**Simple implementation** (variant):
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online'
}
```
**Advanced implementation** (plugin):
```typescript
{
  model: 'openrouter/auto',
  plugins: [{
    id: 'web',
    enabled: true,
    max_results: 5,
    engine: 'exa' // or 'native'
  }]
}
```
### When to Use Multimodal Inputs
**Images** (vision):
- OCR, image understanding, visual analysis
- Models: `openai/gpt-4o`, `anthropic/claude-3.5-sonnet`, `google/gemini-2.5-pro`
**Audio**:
- Speech-to-text, audio analysis
- Models with audio support
**Video**:
- Video understanding, frame analysis
- Models with video support
**PDFs**:
- Document parsing, content extraction
- Requires `file-parser` plugin
**Implementation**: See `references/ADVANCED_PATTERNS.md` for multimodal patterns
---
## Best Practices for AI
### Default Model Selection
**Start with**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
- Good balance of quality, speed, cost
- Strong at most tasks
- Wide compatibility
**Switch based on needs**:
- Need speed → `openai/gpt-4o-mini:nitro` or `google/gemini-2.0-flash`
- Complex reasoning → `anthropic/claude-opus-4:thinking`
- Need web search → `:online` variant
- Large context → `:extended` variant
- Cost-sensitive → `:free` variant
### Default Parameters
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [...],
  temperature: 0.6, // Balanced creativity
  max_tokens: 1000, // Reasonable length
  top_p: 0.95       // Common for quality
}
```
**Adjust based on task**:
- Code: `temperature: 0.2`
- Creative: `temperature: 1.0`
- Factual: `temperature: 0.0-0.3`
### When to Prefer Streaming
**Always prefer streaming when**:
- User-facing (chat, interactive tools)
- Response length unknown
- Want progressive feedback
- Latency matters
**Use non-streaming when**:
- Batch processing
- Need complete response before acting
- Building API endpoints
- Very short responses (< 50 tokens)
### When to Enable Specific Features
**Tools**: Enable when you need external data or actions
**Structured outputs**: Enable when response format matters
**Web search**: Enable when current information needed
**Streaming**: Enable for user-facing, real-time responses
**Model fallbacks**: Enable when reliability critical
**Provider routing**: Enable when you have preferences or constraints
### Cost Optimization Patterns
**Use free models for**:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations
**Use routing to optimize**:
```typescript
{
  provider: {
    order: ['openai', 'anthropic'],
    sort: 'price', // Optimize for cost
    allow_fallbacks: true
  }
}
```
**Set max_tokens** to prevent runaway responses
**Use caching** via `user` and `session_id` parameters
**Enable prompt caching** when supported
### Performance Optimization
**Reduce latency**:
- Use `:nitro` variants for speed
- Use streaming for perceived speed
- Set `user` ID for caching benefits
- Choose faster models (mini, flash) when quality allows
**Increase throughput**:
- Use provider routing with `sort: 'throughput'`
- Parallelize independent requests
- Use streaming to reduce wait time
**Optimize for specific metrics**:
```typescript
{
  provider: {
    sort: 'latency' // or 'price' or 'throughput'
  }
}
```
---
## Progressive Disclosure
For detailed reference information, consult:
### Parameters Reference
**File**: `references/PARAMETERS.md`
- Complete parameter reference (50+ parameters)
- Types, ranges, defaults
- Parameter support by model
- Usage examples
### Error Codes Reference
**File**: `references/ERROR_CODES.md`
- All HTTP status codes
- Error response structure
- Error metadata types
- Native finish reasons
- Retry strategies
### Model Selection Guide
**File**: `references/MODEL_SELECTION.md`
- Model families and capabilities
- Model variants explained
- Selection criteria by use case
- Model capability matrix
- Provider routing preferences
### Routing Strategies
**File**: `references/ROUTING_STRATEGIES.md`
- Model fallbacks configuration
- Provider selection patterns
- Auto router setup
- Routing by use case (cost, latency, quality)
### Advanced Patterns
**File**: `references/ADVANCED_PATTERNS.md`
- Tool calling with agentic loops
- Structured outputs implementation
- Web search integration
- Multimodal handling
- Streaming patterns
- Framework integrations
### Working Examples
**File**: `references/EXAMPLES.md`
- TypeScript patterns for common tasks
- Python examples
- cURL examples
- Advanced patterns
- Framework integration examples
### Ready-to-Use Templates
**Directory**: `templates/`
- `basic-request.ts` - Minimal working request
- `streaming-request.ts` - SSE streaming with cancellation
- `tool-calling.ts` - Complete agentic loop with tools
- `structured-output.ts` - JSON Schema enforcement
- `error-handling.ts` - Robust retry logic
---
## Quick Reference
### Minimal Request
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: 'Your prompt' }]
}
```
### With Streaming
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  stream: true
}
```
### With Tools
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'user', content: '...' }],
  tools: [{ type: 'function', function: { name, description, parameters } }],
  tool_choice: 'auto'
}
```
### With Structured Output
```typescript
{
  model: 'anthropic/claude-3.5-sonnet',
  messages: [{ role: 'system', content: 'Output JSON only...' }],
  response_format: { type: 'json_object' }
}
```
### With Web Search
```typescript
{
  model: 'anthropic/claude-3.5-sonnet:online',
  messages: [{ role: 'user', content: '...' }]
}
```
### With Model Fallbacks
```typescript
{
  models: ['anthropic/claude-3.5-sonnet', 'openai/gpt-4o'],
  messages: [{ role: 'user', content: '...' }]
}
```
---
**Remember**: OpenRouter is OpenAI-compatible. Use the OpenAI SDK with `baseURL: 'https://openrouter.ai/api/v1'` for a familiar experience.
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/ADVANCED_PATTERNS.md
```markdown
# Advanced Patterns
Comprehensive guide to advanced OpenRouter API patterns including tool calling, structured outputs, web search, streaming, multimodal handling, and framework integrations.
**Source**: https://openrouter.ai/docs/guides/features/
---
## Tool / Function Calling
### Overview
Three-step process for enabling LLMs to execute external functions.
**1. Inference Request**: Send tools in initial request
**2. Tool Execution**: Execute requested tools client-side
**3. Response with Results**: Send tool results back to model
### Step 1: Request with Tools
**Define tools**:
```typescript
const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: {
          type: 'string',
          description: 'City name'
        },
        unit: {
          type: 'string',
          enum: ['celsius', 'fahrenheit']
        }
      },
      required: ['location']
    }
  }
}, {
  type: 'function',
  function: {
    name: 'search_database',
    description: 'Search the database for records',
    parameters: {
      type: 'object',
      properties: {
        query: {
          type: 'string',
          description: 'Search query'
        },
        limit: {
          type: 'integer',
          description: 'Maximum results',
          default: 10
        }
      },
      required: ['query']
    }
  }
}];
```
**Make request**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      { role: 'user', content: 'What\'s the weather in San Francisco?' }
    ],
    tools: tools,
    tool_choice: 'auto'
  })
});
const data = await response.json();
```
### Step 2: Execute Tools
**Check for tool calls**:
```typescript
const toolCalls = data.choices[0].message.tool_calls;
if (toolCalls) {
  for (const toolCall of toolCalls) {
    const { name, arguments: args } = toolCall.function;
    const parsedArgs = JSON.parse(args);
    console.log('Calling tool:', name, parsedArgs);
    // Execute tool
    const result = await executeTool(name, parsedArgs);
    console.log('Tool result:', result);
  }
}
```
**Tool execution function**:
```typescript
async function executeTool(name, args) {
  switch (name) {
    case 'get_weather':
      return await getWeatherAPI(args.location, args.unit);
    case 'search_database':
      return await searchDatabase(args.query, args.limit);
    default:
      throw new Error(`Unknown tool: ${name}`);
  }
}
```
### Step 3: Send Results Back
**Add tool response to messages**:
```typescript
const messages = [
  { role: 'user', content: 'What\'s the weather in San Francisco?' },
  {
    role: 'assistant',
    content: null,
    tool_calls: toolCalls
  }
];
// Add tool results
for (const toolCall of toolCalls) {
  const result = await executeTool(toolCall.function.name, JSON.parse(toolCall.function.arguments));
  messages.push({
    role: 'tool',
    tool_call_id: toolCall.id,
    content: JSON.stringify(result)
  });
}
// Send final request
const finalResponse = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: messages,
    tools: tools
  })
});
const finalData = await finalResponse.json();
console.log('Final response:', finalData.choices[0].message.content);
```
### Agentic Loop Pattern
**Automatic multi-turn tool execution**:
```typescript
async function runAgenticLoop(initialPrompt, tools, maxIterations = 10) {
  let messages = [{ role: 'user', content: initialPrompt }];
  let iterations = 0;
  while (iterations < maxIterations) {
    iterations++;
    // Call LLM
    const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'anthropic/claude-3.5-sonnet',
        messages: messages,
        tools: tools,
        tool_choice: 'auto',
        parallel_tool_calls: true
      })
    });
    const data = await response.json();
    const assistantMessage = data.choices[0].message;
    // Add assistant message to history
    messages.push(assistantMessage);
    // Check if done (no tool calls)
    if (!assistantMessage.tool_calls) {
      console.log('Agentic loop complete:', assistantMessage.content);
      return assistantMessage.content;
    }
    // Execute all tools in parallel
    const toolPromises = assistantMessage.tool_calls.map(async (toolCall) => {
      const result = await executeTool(toolCall.function.name, JSON.parse(toolCall.function.arguments));
      return {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify(result)
      };
    });
    // Wait for all tools to complete
    const toolResults = await Promise.all(toolPromises);
    messages.push(...toolResults);
    console.log(`Iteration ${iterations} complete, ${toolResults.length} tools called`);
  }
  throw new Error('Agentic loop exceeded max iterations');
}
```
**Usage**:
```typescript
const tools = [/* tool definitions */];
const result = await runAgenticLoop(
  'Research the latest AI developments and summarize them',
  tools,
  10
);
```
### Tool Choice Control
**Auto** (default):
```typescript
{ tool_choice: 'auto' }
```
Model decides whether to call tools.
**None**:
```typescript
{ tool_choice: 'none' }
```
Never call tools, generate text only.
**Required**:
```typescript
{ tool_choice: 'required' }
```
Model must call at least one tool.
**Specific function**:
```typescript
{
  tool_choice: {
    type: 'function',
    function: { name: 'get_weather' }
  }
}
Force specific tool call.
### Parallel vs Sequential Tool Calls
**Parallel** (default, `parallel_tool_calls: true`):
```typescript
{
  tools: [tool1, tool2, tool3],
  parallel_tool_calls: true // Default
}
```
**Sequential** (`parallel_tool_calls: false`):
```typescript
{
  tools: [tool1, tool2, tool3],
  parallel_tool_calls: false
}
```
**When to use parallel**:
- Independent tools (no dependencies)
- Speed matters
- Tools don't have side effects
**When to use sequential**:
- Tools have dependencies
- Order matters
- Tools have side effects
---
## Structured Outputs
### JSON Object Mode
**Simple JSON enforcement**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [
      {
        role: 'system',
        content: 'Output valid JSON only. No other text.'
      },
      {
        role: 'user',
        content: 'Describe the weather in San Francisco'
      }
    ],
    response_format: { type: 'json_object' }
  })
});
const data = await response.json();
const weatherData = JSON.parse(data.choices[0].message.content);
```
### JSON Schema Mode (Strict)
**Define JSON Schema**:
```typescript
const weatherSchema = {
  type: 'object',
  properties: {
    location: {
      type: 'string',
      description: 'City name'
    },
    temperature: {
      type: 'number',
      description: 'Temperature in Celsius'
    },
    conditions: {
      type: 'string',
      description: 'Weather conditions'
    },
    humidity: {
      type: 'number',
      description: 'Humidity percentage'
    }
  },
  required: ['location', 'temperature', 'conditions', 'humidity'],
  additionalProperties: false
};
```
**Make request**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: { /* ... */ },
  body: JSON.stringify({
    model: 'anthropic/claude-3.5-sonnet',
    messages: [{
      role: 'user',
      content: 'What\'s the weather in San Francisco?'
    }],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'weather_report',
        strict: true,
        schema: weatherSchema
      }
    }
  })
});
const data = await response.json();
const weatherData = JSON.parse(data.choices[0].message.content);
// Validate against schema (validateSchema is a placeholder; use a library such as Ajv)
const isValid = validateSchema(weatherData, weatherSchema);
if (!isValid) {
  throw new Error('Invalid response schema');
}
```
### Response Healing
**Automatically repair malformed JSON**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{
role: 'user',
content: 'Extract key information...'
}],
response_format: { type: 'json_object' },
plugins: [{
id: 'response-healing' // Enable auto-repair
}]
})
});
const data = await response.json();
const content = data.choices[0].message.content;
// Parse JSON (healing repairs common errors, so this should parse cleanly)
const result = JSON.parse(content);
```
**Benefits**:
- Reduces parsing errors
- Fixes common JSON issues (missing quotes, trailing commas)
- Works with any model
---
## Web Search
### Simple :online Variant
**Easiest method**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet:online',
messages: [{
role: 'user',
content: 'What are the latest AI developments in 2026?'
}]
})
});
```
**Works with free models**:
```typescript
{
model: 'openai/gpt-oss-20b:free:online'
}
```
### Plugin Configuration
**Advanced web search**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'openrouter/auto',
plugins: [{
id: 'web',
enabled: true,
max_results: 5,
engine: 'exa' // or 'native'
}],
messages: [{
role: 'user',
content: 'What\'s happening in AI today?'
}]
})
});
```
### Search Engines
**Native**: Provider's built-in search
- OpenAI, Anthropic, Perplexity, xAI
**Exa**: Third-party search API
- All other providers
- $4 per 1000 results
**Force Native**:
```typescript
{ plugins: [{ id: 'web', engine: 'native' }] }
```
**Force Exa**:
```typescript
{ plugins: [{ id: 'web', engine: 'exa' }] }
```
### Handling Citations
**Response with citations**:
```json
{
"choices": [{
"message": {
"role": "assistant",
"content": "Latest AI developments include new models released in 2026. According to [OpenAI](https://openai.com), they launched GPT-4o...",
"annotations": [{
"type": "url_citation",
"url_citation": {
"url": "https://openai.com",
"title": "OpenAI Blog",
"start_index": 100,
"end_index": 107
}
}]
}
}]
}
```
**Extract citations**:
```typescript
const message = data.choices[0].message;
const content = message.content;
const annotations = message.annotations || [];
for (const annotation of annotations) {
if (annotation.type === 'url_citation') {
const citation = annotation.url_citation;
console.log('Source:', citation.url);
console.log('Title:', citation.title);
console.log('Position:', `${citation.start_index}-${citation.end_index}`);
}
}
```
### Search Context Size
**Configure via web_search_options**:
```typescript
{
web_search_options: {
search_context_size: 'high' // 'low' | 'medium' | 'high'
}
}
```
**Effects**:
- `low`: Minimal context, lowest cost
- `medium`: Moderate context (default)
- `high`: Extensive context, higher cost
**User location**:
```typescript
{
web_search_options: {
user_location: {
type: 'approximate',
city: 'San Francisco',
country: 'USA'
}
}
}
```
---
## Streaming
### Basic Streaming
**Enable streaming**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true
})
});
```
**Process SSE stream**:
```typescript
let fullContent = '';
let buffer = '';
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Buffer partial lines: an SSE event can be split across network chunks
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // Keep the last (possibly incomplete) line
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const data = line.slice(6); // Remove 'data: '
    if (data === '[DONE]') continue; // End-of-stream marker
    const parsed = JSON.parse(data);
    const content = parsed.choices?.[0]?.delta?.content;
    if (content) {
      fullContent += content;
      // Process incrementally...
      console.log(content);
    }
    // Usage in final chunk
    if (parsed.usage) {
      console.log('Usage:', parsed.usage);
    }
  }
}
console.log('Complete response:', fullContent);
```
### Streaming with Cancellation
**AbortController for cancellation**:
```typescript
const controller = new AbortController();
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: '...' }],
stream: true
}),
signal: controller.signal
});
// Process stream...
// Cancel stream
controller.abort();
```
**Handle cancellation**:
```typescript
try {
await processStream(response);
} catch (error) {
if (error.name === 'AbortError') {
console.log('Stream cancelled');
} else {
throw error;
}
}
```
### Streaming Tool Calls
**Tool calls stream across multiple chunks**:
```typescript
let currentToolCall = null;
let toolArgs = '';
let isToolStreaming = false;
for await (const chunk of stream) {
const parsed = JSON.parse(chunk);
const delta = parsed.choices?.[0]?.delta;
if (delta?.tool_calls) {
for (const toolCallChunk of delta.tool_calls) {
if (toolCallChunk.function?.name) {
currentToolCall = { id: toolCallChunk.id, ...toolCallChunk.function };
toolArgs = '';
isToolStreaming = true;
}
if (toolCallChunk.function?.arguments) {
toolArgs += toolCallChunk.function.arguments;
}
}
}
if (parsed.choices?.[0]?.finish_reason === 'tool_calls' && currentToolCall) {
isToolStreaming = false;
currentToolCall.arguments = toolArgs;
console.log('Complete tool call:', currentToolCall);
// Execute tool...
const result = await executeTool(currentToolCall.name, JSON.parse(currentToolCall.arguments));
// Send result back...
}
}
```
### Streaming with Usage in Every Chunk
**Enable usage tracking**:
```typescript
{
stream: true,
stream_options: {
include_usage: true // Include usage in every chunk
}
}
```
**Process usage**:
```typescript
for await (const chunk of stream) {
const parsed = JSON.parse(chunk);
// Content
const content = parsed.choices?.[0]?.delta?.content;
if (content) { /* ... */ }
// Usage (in every chunk)
if (parsed.usage) {
console.log('Running usage:', parsed.usage);
}
}
```
---
## Multimodal
### Image Input
**Vision model with image**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{
role: 'user',
content: [
{
type: 'text',
text: 'What\'s in this image?'
},
{
type: 'image_url',
image_url: {
url: 'https://example.com/image.jpg',
detail: 'high' // 'low' | 'auto' | 'high'
}
}
]
}]
})
});
```
**Base64 encoded image**:
```typescript
{
type: 'image_url',
image_url: {
url: 'data:image/jpeg;base64,/9j/4AAQSkZJRg...'
}
}
```
**Detail levels**:
- `'low'`: Fastest, lowest resolution
- `'auto'`: Balanced (default)
- `'high'`: Slowest, highest resolution
### Audio Input
**Audio-capable model**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [{
role: 'user',
content: [{
type: 'input_audio',
input_audio: {
data: 'base64_encoded_audio...',
format: 'mp3' // mp3, wav, m4a, etc.
}
}]
}]
})
});
```
### Video Input
**Video-capable model**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'openai/gpt-4o',
messages: [{
role: 'user',
content: [{
type: 'input_video',
video_url: {
url: 'https://example.com/video.mp4'
}
}]
}]
})
});
```
### PDF Input
**Parse PDF with file-parser plugin**:
```typescript
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
plugins: [{
id: 'file-parser',
enabled: true,
pdf: {
engine: 'mistral-ocr' // 'mistral-ocr' | 'pdf-text' | 'native'
}
}],
messages: [{
role: 'user',
content: [{
type: 'input_file',
file_id: 'file_abc123' // File ID from upload
}]
}]
})
});
```
**PDF engines**:
- `'mistral-ocr'`: OCR with Mistral
- `'pdf-text'`: Text extraction
- `'native'`: Provider native
---
## Framework Integrations
### OpenAI SDK
**Basic setup**:
```typescript
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: process.env.OPENROUTER_API_KEY,
defaultHeaders: {
'HTTP-Referer': 'https://your-app.com',
'X-Title': 'Your App'
}
});
const completion = await openai.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(completion.choices[0].message);
```
**Streaming with OpenAI SDK**:
```typescript
const stream = await openai.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Tell me a story' }],
stream: true
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
console.log(content);
}
}
```
**Tool calling with OpenAI SDK**:
```typescript
const response = await openai.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'What\'s the weather?' }],
tools: [/* tool definitions */],
tool_choice: 'auto'
});
const toolCalls = response.choices[0].message.tool_calls;
if (toolCalls) {
// Execute tools...
}
```
### @openrouter/sdk
**Official OpenRouter SDK**:
```typescript
import { OpenRouter } from '@openrouter/sdk';
const openRouter = new OpenRouter({
apiKey: process.env.OPENROUTER_API_KEY
});
const completion = await openRouter.chat.send({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(completion.choices[0].message);
```
**Streaming**:
```typescript
const stream = await openRouter.chat.send({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Hello!' }],
stream: true
});
for await (const chunk of stream) {
console.log(chunk.choices[0]?.delta?.content ?? '');
}
```
---
## Advanced Patterns
### Retry with Backoff
**Robust retry logic**:
```typescript
async function requestWithRetry(options, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    let response;
    try {
      response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${apiKey}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify(options)
      });
    } catch (error) {
      // Network failure: back off and retry
      lastError = error;
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      await new Promise(resolve => setTimeout(resolve, delay));
      continue;
    }
    if (response.ok) {
      return await response.json();
    }
    // Retry on rate limit or server errors
    if (response.status === 429 || response.status >= 500) {
      const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
      const jitter = Math.random() * 1000;
      await new Promise(resolve => setTimeout(resolve, delay + jitter));
      continue;
    }
    // Non-retryable client error (400, 401, 403, ...): surface it immediately
    throw new Error(`Request failed with status ${response.status}`);
  }
  throw lastError ?? new Error('Max retries exceeded');
}
```
### Batch Processing
**Process multiple requests in parallel**:
```typescript
async function batchProcess(prompts, model) {
const batchSize = 5; // Adjust based on rate limits
const results = [];
for (let i = 0; i < prompts.length; i += batchSize) {
const batch = prompts.slice(i, i + batchSize);
const batchPromises = batch.map(prompt =>
fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify({
model: model,
messages: [{ role: 'user', content: prompt }]
})
}).then(r => r.json())
);
const batchResults = await Promise.all(batchPromises);
results.push(...batchResults);
// Rate limiting delay if needed
if (i + batchSize < prompts.length) {
await new Promise(resolve => setTimeout(resolve, 100));
}
}
return results;
}
```
### Cost Tracking
**Track costs across requests**:
```typescript
let totalCost = 0;
async function trackCost(request) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: { /* ... */ },
body: JSON.stringify(request)
});
const data = await response.json();
if (data.usage?.cost) {
totalCost += data.usage.cost;
console.log(`Request cost: $${data.usage.cost.toFixed(6)}`);
console.log(`Total cost: $${totalCost.toFixed(6)}`);
}
return data;
}
```
---
## Quick Reference
### Tool Calling Pattern
1. Define tools
2. Request with tools
3. Check for tool_calls
4. Execute tools
5. Send results back
6. Repeat until final answer
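The six steps above can be sketched as a loop. `send` and `executeTool` are hypothetical callbacks you supply: `send` posts the messages to `/chat/completions` and returns the parsed assistant message, and `executeTool` dispatches to your local tool implementations.
```typescript
type ToolCall = { id: string; function: { name: string; arguments: string } };
type AssistantMessage = { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] };

async function runToolLoop(
  messages: any[],
  send: (messages: any[]) => Promise<AssistantMessage>,
  executeTool: (name: string, args: any) => Promise<unknown>,
  maxRounds = 5
): Promise<string> {
  for (let round = 0; round < maxRounds; round++) {
    // Steps 2-3: request a completion and check for tool calls
    const message = await send(messages);
    messages.push(message);
    if (!message.tool_calls || message.tool_calls.length === 0) {
      return message.content ?? ''; // Step 6: no more tool calls, final answer
    }
    // Steps 4-5: execute each tool and append its result as a `tool` message
    for (const call of message.tool_calls) {
      const result = await executeTool(call.function.name, JSON.parse(call.function.arguments));
      messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
    }
  }
  throw new Error('Tool loop exceeded maxRounds without a final answer');
}
```
The `maxRounds` cap guards against a model that keeps requesting tools indefinitely.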
### Structured Output Pattern
1. Define JSON Schema
2. Set response_format: { type: 'json_schema' }
3. Instruct model for JSON
4. Parse and validate response
5. Optionally add the response-healing plugin for robustness
### Web Search Pattern
1. Use :online variant (simplest)
2. Or use web plugin (advanced)
3. Handle citations in response
4. Configure search context as needed
### Streaming Pattern
1. Set stream: true
2. Read SSE stream
3. Parse each data: line
4. Extract delta content
5. Check for [DONE] marker
---
**Sources**:
- https://openrouter.ai/docs/guides/features/tool-calling.mdx
- https://openrouter.ai/docs/guides/features/structured-outputs.mdx
- https://openrouter.ai/docs/guides/features/plugins/web-search.mdx
- https://openrouter.ai/docs/api/reference/streaming.mdx
- https://openrouter.ai/docs/guides/overview/multimodal/images.mdx
```
### references/PARAMETERS.md
```markdown
# Parameters Reference
Complete reference for all OpenRouter API request parameters with types, ranges, defaults, and usage guidance.
**Source**: https://openrouter.ai/docs/api/reference/parameters.mdx
---
## Core Parameters
### model
- **Type**: `string`
- **Required**: No (uses user default if unspecified)
- **Description**: Model identifier to use
- **Format**: `provider/model-name[:variant]`
- **Examples**:
- `"anthropic/claude-3.5-sonnet"`
- `"openai/gpt-4o:online"`
- `"google/gemini-2.0-flash:free"`
- **Default**: User's default model
- **Guidance**: Always specify explicitly for consistency
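A minimal sketch of parsing the `provider/model-name[:variant]` format described above; the function name is illustrative, and variants can stack (e.g. `:free:online`):
```typescript
function parseModelId(id: string) {
  // Variants stack after the model name, e.g. 'openai/gpt-oss-20b:free:online'
  const [path, ...variants] = id.split(':');
  const slash = path.indexOf('/');
  return {
    provider: slash === -1 ? null : path.slice(0, slash),
    model: slash === -1 ? path : path.slice(slash + 1),
    variants
  };
}
```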
---
### messages
- **Type**: `Message[]`
- **Required**: Yes
- **Description**: Conversation history
**Message structure**:
```typescript
type Message = {
role: 'system' | 'user' | 'assistant';
content: string | ContentPart[];
  name?: string; // Optional; for non-OpenAI models it is prepended to the content
}
type ContentPart =
| { type: 'text'; text: string }
| { type: 'image_url'; image_url: { url: string; detail?: 'low' | 'auto' | 'high' } }
| { type: 'input_audio'; input_audio: { data: string; format: string } }
| { type: 'input_video'; video_url: { url: string } }
| { type: 'input_file'; file_id: string };
```
**Tool response message**:
```typescript
{
role: 'tool';
tool_call_id: string;
content: string; // JSON string of result
name?: string;
}
```
**Guidance**:
- Always start with system message for behavior guidance
- Include conversation history for context
- Use array of ContentPart for multimodal inputs
---
### stream
- **Type**: `boolean`
- **Required**: No
- **Default**: `false`
- **Description**: Enable Server-Sent Events (SSE) streaming
- **Effect**: Returns response chunks as they're generated
- **Response format**: SSE stream with `data: { ... }` lines
- **Guidance**: Use for real-time responses, user-facing applications
---
## Sampling Parameters
### temperature
- **Type**: `float`
- **Range**: `0.0` to `2.0`
- **Default**: `1.0`
- **Description**: Controls randomness in token selection
**Behavior**:
- `0.0`: Deterministic, always same output
- `0.1-0.3`: Low randomness (factual, precise)
- `0.4-0.7`: Balanced
- `0.8-1.2`: Higher creativity
- `1.3-2.0`: Highly creative, unpredictable
**Guidance**:
- Code generation: `0.1-0.3`
- Factual responses: `0.0-0.3`
- Chat: `0.6-0.8`
- Creative writing: `0.8-1.2`
- Brainstorming: `1.0-1.5`
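The guidance above can be encoded as a small lookup. The task labels, values, and function name here are illustrative starting points drawn from the ranges listed, not part of the API:
```typescript
type Task = 'code' | 'factual' | 'chat' | 'creative' | 'brainstorm';

const TEMPERATURE_BY_TASK: Record<Task, number> = {
  code: 0.2,       // precise, low randomness
  factual: 0.1,
  chat: 0.7,
  creative: 1.0,
  brainstorm: 1.3
};

function buildRequestBody(task: Task, prompt: string) {
  return {
    model: 'anthropic/claude-3.5-sonnet',
    temperature: TEMPERATURE_BY_TASK[task],
    messages: [{ role: 'user', content: prompt }]
  };
}
```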
---
### top_p
- **Type**: `float`
- **Range**: `0.0` to `1.0`
- **Default**: `1.0`
- **Description**: Nucleus sampling - limit to tokens whose probabilities sum to P
**Behavior**:
- `0.9`: Only top 90% of tokens by probability
- `0.95`: Only top 95% of tokens
- `1.0`: Consider all tokens (no limit)
**Guidance**:
- Use as alternative to temperature
- Common values: `0.9`, `0.95`
- Typically adjust only one of top_p and temperature, not both
---
### top_k
- **Type**: `integer`
- **Range**: `0` or above
- **Default**: `0` (disabled)
- **Description**: Limit to K most likely tokens at each step
**Behavior**:
- `1`: Always pick most likely token (deterministic)
- `10`: Consider top 10 tokens
- `50`: Consider top 50 tokens
- `0`: Consider all tokens (disabled)
**Guidance**:
- Not available for OpenAI models
- Good alternative to top_p for some models
- Lower values = more predictable
---
### frequency_penalty
- **Type**: `float`
- **Range**: `-2.0` to `2.0`
- **Default**: `0.0`
- **Description**: Penalize tokens based on frequency in input
**Behavior**:
- `0.0`: No effect
- `0.5-1.0`: Reduce repetition (positive)
- `-0.5 to -1.0`: Encourage repetition (negative)
- Scales with occurrence count
**Guidance**:
- Use to reduce word/phrase repetition
- Higher values may reduce coherence
- Combine with presence_penalty for best results
---
### presence_penalty
- **Type**: `float`
- **Range**: `-2.0` to `2.0`
- **Default**: `0.0`
- **Description**: Penalize tokens already used (regardless of frequency)
**Behavior**:
- `0.0`: No effect
- `0.5-1.0`: Encourage new topics, reduce repetition
- `-0.5 to -1.0`: Encourage staying on topic
- Does NOT scale with occurrence count
**Guidance**:
- Use to encourage topic diversity
- Good for exploration, brainstorming
- Combine with frequency_penalty
---
### repetition_penalty
- **Type**: `float`
- **Range**: `0.0` to `2.0`
- **Default**: `1.0`
- **Description**: Reduce token repetition from input
**Behavior**:
- `1.0`: No effect
- `1.2-1.5`: Reduce repetition
- Too high: May cause incoherence, run-on sentences
- Scales based on original token probability
**Guidance**:
- Alternative to frequency/presence penalties
- Available on non-OpenAI models
- Start with `1.1-1.2`
---
### min_p
- **Type**: `float`
- **Range**: `0.0` to `1.0`
- **Default**: `0.0`
- **Description**: Minimum probability relative to most likely token
**Behavior**:
- `0.1`: Only tokens at least 10% as probable as best token
- `0.5`: Only tokens at least 50% as probable
- `0.0`: No filtering
**Guidance**:
- Dynamic filtering based on confidence
- Adjusts automatically per token position
- Good alternative to top_p for some models
---
### top_a
- **Type**: `float`
- **Range**: `0.0` to `1.0`
- **Default**: `0.0`
- **Description**: Filter tokens with "sufficiently high" probability
**Behavior**:
- Similar to top_p but probability-based
- Lower: Narrower focus
- Higher: Broader consideration
- Adjusts dynamically based on max probability
**Guidance**:
- Good for creative writing
- Experimental parameter
- Works well with some open-source models
---
## Length Control Parameters
### max_tokens
- **Type**: `integer`
- **Range**: `1` to (context_length - prompt_length)
- **Default**: Model-dependent
- **Description**: Maximum tokens to generate
**Guidance**:
- **Always set** to control cost
- Prevents runaway responses
- Typical values:
- Short answers: 100-500
- Medium: 500-1000
- Long-form: 1000-2000
- Response stops at limit even if incomplete
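Because the response stops silently at the limit, it helps to detect truncation. In the OpenAI-compatible response format, `finish_reason` is `'length'` when the token limit was hit and `'stop'` when the model finished naturally:
```typescript
// Returns true when generation stopped because max_tokens was reached
function wasTruncated(choice: { finish_reason?: string }): boolean {
  return choice.finish_reason === 'length';
}
```
On truncation you might retry with a higher `max_tokens` or ask the model to continue.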
---
### max_completion_tokens
- **Type**: `integer`
- **Range**: `1` to model limit
- **Default**: Model-dependent
- **Description**: Maximum tokens in completion (excluding reasoning tokens)
**Guidance**:
- Use with reasoning models
- Separate reasoning tokens from output tokens
- Controls actual response length, not reasoning
---
### stop
- **Type**: `string | string[]`
- **Default**: `null`
- **Description**: Stop sequences to halt generation
**Behavior**:
- Stops when any sequence encountered
- Sequences not included in output
- Case-sensitive
**Common examples**:
```typescript
stop: ['\n\n', '###', 'END', '---']
```
**Guidance**:
- Use to control output structure
- Good for code blocks, lists
- Prevents unwanted continuations
---
## Output Format Parameters
### response_format
- **Type**: `ResponseFormat`
- **Default**: `null`
- **Description**: Enforce specific output format
**Text mode** (default):
```typescript
{ type: 'text' }
```
**JSON object mode**:
```typescript
{ type: 'json_object' }
```
- Model returns valid JSON
- Must also instruct model in system message
- Does NOT enforce schema
**JSON Schema mode** (strict):
```typescript
{
type: 'json_schema',
json_schema: {
name: 'schema_name',
strict: true,
schema: { /* JSON Schema */ }
}
}
```
- Enforces exact schema
- Model must return valid JSON matching schema
- Supported by: OpenAI, Anthropic, Google, most open-source
**Grammar mode**:
```typescript
{
type: 'grammar',
grammar: 'custom_grammar'
}
```
- Model-specific grammar
- Advanced use cases
**Python mode**:
```typescript
{ type: 'python' }
```
- For Python code generation
**Guidance**:
- Use `json_object` for simple JSON
- Use `json_schema` for structured data, APIs
- Add response healing plugin for robustness
- Model support varies - check capabilities
---
## Tool/Function Calling Parameters
### tools
- **Type**: `Tool[]`
- **Default**: `[]`
- **Description**: Available functions for model to call
**Structure**:
```typescript
type Tool = {
type: 'function';
function: {
name: string; // Function name
description?: string; // What it does
parameters: object; // JSON Schema for arguments
strict?: boolean; // Enforce schema
};
};
```
**Example**:
```typescript
{
tools: [{
type: 'function',
function: {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name'
},
unit: {
type: 'string',
enum: ['celsius', 'fahrenheit']
}
},
required: ['location']
}
}
}]
}
```
**Guidance**:
- Provide clear descriptions for good tool selection
- Use JSON Schema for parameter validation
- Check model supports tools parameter
- Find supporting models: `openrouter.ai/models?supported_parameters=tools`
---
### tool_choice
- **Type**: `'auto' | 'none' | 'required' | { type: 'function'; function: { name: string } }`
- **Default**: `'auto'`
- **Description**: Control when/if tools are called
**Options**:
**'auto'** (default):
- Model decides whether to call tools
- Good default for most cases
**'none'**:
- Never call tools
- Model generates text only
- Use when you don't want tools
**'required'**:
- Model must call at least one tool
- Forces tool use
- Good for agentic workflows
**Specific function**:
```typescript
{
type: 'function',
function: { name: 'specific_function' }
}
```
- Force specific tool call
- Use when you know which tool is needed
**Guidance**:
- Default to `'auto'`
- Use `'required'` for multi-step tasks
- Use specific function when context is clear
---
### parallel_tool_calls
- **Type**: `boolean`
- **Default**: `true`
- **Description**: Allow parallel function calls
**Behavior**:
- `true`: Model can call multiple tools simultaneously
- `false`: Tools called sequentially
**Guidance**:
- Keep `true` for efficiency
- Set `false` when tools have dependencies
- Parallel calls reduce latency
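When the model returns several `tool_calls` in one message, independent tools can run concurrently. A sketch, where `executeTool` is a hypothetical local dispatcher you supply:
```typescript
type ToolCall = { id: string; function: { name: string; arguments: string } };

async function executeToolCalls(
  toolCalls: ToolCall[],
  executeTool: (name: string, args: any) => Promise<unknown>
) {
  // Run every requested tool concurrently; results come back in request order
  return Promise.all(
    toolCalls.map(async call => ({
      role: 'tool' as const,
      tool_call_id: call.id,
      content: JSON.stringify(await executeTool(call.function.name, JSON.parse(call.function.arguments)))
    }))
  );
}
```
Append the returned `tool` messages to the conversation before the next completion request.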
---
## Reasoning Parameters
### reasoning
- **Type**: `object`
- **Default**: `null`
- **Description**: Configure model reasoning behavior
**Properties**:
**effort**:
- Type: `string`
- Options: `'xhigh' | 'high' | 'medium' | 'low' | 'minimal' | 'none'`
- Default: Model-dependent
- Description: Amount of computational effort
**summary**:
- Type: `string`
- Options: `'auto' | 'concise' | 'detailed'`
- Default: `'auto'`
- Description: Verbosity of reasoning summary
**Example**:
```typescript
{
reasoning: {
effort: 'high',
summary: 'detailed'
}
}
```
**Guidance**:
- Use with reasoning models (Claude Opus, OpenAI o1/o3)
- Higher effort = better reasoning, more cost
- Use `minimal` for simple tasks
---
### include_reasoning
- **Type**: `boolean`
- **Default**: `false`
- **Description**: Include reasoning in response
**Guidance**:
- Supported by reasoning-capable models
- Increases token usage and cost
- Use for debugging or transparency
---
## Probability Parameters
### logprobs
- **Type**: `boolean`
- **Default**: `false`
- **Description**: Return log probabilities for output tokens
**Guidance**:
- Requires model support
- Useful for debugging, analysis
- Increases response size
---
### top_logprobs
- **Type**: `integer`
- **Range**: `0` to `20`
- **Default**: `null`
- **Description**: Number of top log probs to return per token
**Requires**: `logprobs: true`
**Guidance**:
- Only available when logprobs enabled
- Used with top_k or top_p
- Good for understanding model confidence
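One way to use log probabilities for confidence analysis: convert each token's logprob back to a probability and average. This assumes the OpenAI-style `logprobs.content` array in the response; exact availability varies by model:
```typescript
// Mean per-token probability: 1.0 means the model was certain of every token
function meanTokenProbability(logprobs: { content: { logprob: number }[] }): number {
  const probs = logprobs.content.map(t => Math.exp(t.logprob));
  return probs.reduce((sum, p) => sum + p, 0) / probs.length;
}
```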
---
### logit_bias
- **Type**: `Record<number, number>`
- **Default**: `null`
- **Description**: Bias specific tokens
**Format**: `{ token_id: bias_value }`
**Range**: `-100` to `100`
**Effect**:
- `-100`: Ban token
- `-10 to -1`: Less likely
- `1 to 10`: More likely
- `100`: Force selection
**Example**:
```typescript
{
logit_bias: {
12345: -100, // Ban specific token
67890: 5 // Encourage token
}
}
```
**Guidance**:
- Token IDs depend on model's tokenizer
- Use for style control, preventing outputs
- Not available for all models
---
## Routing Parameters
### route
- **Type**: `'fallback' | 'sort' | null`
- **Default**: `null`
- **Description**: Routing strategy
**'fallback'**:
- Try models in order
- Use with `models` array
**'sort'**:
- Sort by provider preferences
- Use with `provider.sort`
**Guidance**:
- Use with models array for fallbacks
- Use with provider preferences for optimization
---
### models
- **Type**: `string[]`
- **Default**: `null`
- **Description**: Array of model IDs for automatic fallback
**Behavior**:
- Tries models in order
- Falls back to next on error
- Uses whichever model succeeds
**Example**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.0-flash'
]
}
```
**Guidance**:
- Use for high reliability
- Order by preference
- Include models from different providers
- Returns actual model used in response
---
### provider
- **Type**: `ProviderPreferences`
- **Default**: `null`
- **Description**: Provider routing preferences
**Properties**:
**order** (`string[]`):
- Preferred provider order
- Example: `['openai', 'anthropic', 'google']`
**allow_fallbacks** (`boolean`):
- Enable automatic provider fallbacks
- Default: `true`
**require_parameters** (`boolean`):
- Only use providers supporting all parameters
- Default: `false`
**data_collection** (`'allow' | 'deny'`):
- Control data retention
- Default: `'allow'`
**only** (`string[]`):
- Whitelist specific providers
- Example: `['openai', 'anthropic']`
**ignore** (`string[]`):
- Blacklist specific providers
- Example: `['openai']`
**quantizations** (`string[]`):
- Filter by quantization level
- Options: `'int4' | 'int8' | 'fp4' | 'fp6' | 'fp8' | 'fp16' | 'bf16' | 'fp32'`
**sort** (`'price' | 'throughput' | 'latency'`):
- Sort providers by metric
- Default: `null`
**max_price** (`object`):
- Maximum pricing thresholds
- Properties:
- `prompt`: Price per 1M prompt tokens
- `completion`: Price per 1M completion tokens
- `request`: Fixed price per request
**preferred_min_throughput** (`number`):
- Minimum tokens/second threshold
- Can be percentile object: `{ p50, p75, p90, p99 }`
**preferred_max_latency** (`number`):
- Maximum latency threshold in seconds
- Can be percentile object: `{ p50, p75, p90, p99 }`
**Example**:
```typescript
{
provider: {
order: ['openai', 'anthropic'],
allow_fallbacks: true,
data_collection: 'deny',
sort: 'price',
ignore: ['provider_to_exclude'],
max_price: {
prompt: 10, // $10 per 1M prompt tokens
completion: 30 // $30 per 1M completion tokens
}
}
}
```
**Guidance**:
- Use to optimize for cost, speed, or throughput
- Set allow_fallbacks: true for reliability
- Use sort to prioritize specific metric
- Set data_collection: 'deny' for Zero Data Retention
---
## Plugins
### plugins
- **Type**: `Plugin[]`
- **Default**: `[]`
- **Description**: Enable model plugins
**Available plugins**:
**Web Search** (`web`):
```typescript
{
id: 'web',
enabled: true,
max_results?: number, // Default: 5
engine?: 'native' | 'exa', // Default: native if available
search_prompt?: string
}
```
- Real-time web search
- Exa: $4 per 1000 results
- Native: Provider-specific pricing
**File Parser** (`file-parser`):
```typescript
{
id: 'file-parser',
enabled: true,
pdf?: {
engine?: 'mistral-ocr' | 'pdf-text' | 'native'
}
}
```
- Parse PDFs and documents
- OCR capabilities
**Response Healing** (`response-healing`):
```typescript
{
id: 'response-healing',
enabled: true
}
```
- Automatically repair malformed JSON
- Works with any model
**Auto Router** (`auto-router`):
```typescript
{
id: 'auto-router',
allowed_models?: string[] // e.g., ['openai/*', 'anthropic/*']
}
```
- Automatic model selection
- Intelligent routing
**Moderation** (`moderation`):
```typescript
{
id: 'moderation'
}
```
- Content moderation
- Safety filtering
**Example**:
```typescript
{
plugins: [
{
id: 'web',
enabled: true,
max_results: 5
},
{
id: 'response-healing'
}
]
}
```
**Guidance**:
- Use `:online` model variant for simple web search
- Use plugin for advanced configuration
- Add response-healing for structured outputs
- Use file-parser for PDF processing
---
## Metadata Parameters
### user
- **Type**: `string`
- **Default**: `null`
- **Description**: Stable identifier for end-user
**Purpose**:
- Abuse detection
- Request caching
- Analytics and reporting
**Constraints**:
- Max length: 128 characters
**Guidance**:
- Set when you have user IDs
- Helps with caching and rate limiting
- Not the same as API key
---
### session_id
- **Type**: `string`
- **Default**: `null`
- **Description**: Group related requests
**Purpose**:
- Observability and analytics
- Conversation tracking
- Cache optimization
**Constraints**:
- Max length: 128 characters
- Body value overrides header value
**Guidance**:
- Use for conversation tracking
- Set once per conversation
- Improves caching for related requests
---
### metadata
- **Type**: `Record<string, string>`
- **Default**: `null`
- **Description**: Custom metadata for request
**Constraints**:
- Max 16 key-value pairs
- Keys: Max 64 characters, no brackets
- Values: Max 512 characters
**Purpose**:
- Analytics and tracking
- Request categorization
- Debugging
**Example**:
```typescript
{
metadata: {
application: 'my-app',
version: '1.0.0',
feature: 'chat',
environment: 'production'
}
}
```
**Guidance**:
- Use for observability
- Keep keys consistent across requests
- Don't include sensitive data
---
## Transform Parameters
### transforms
- **Type**: `string[]`
- **Default**: `[]`
- **Description**: Message transformation pipeline
**Guidance**:
- Advanced feature
- See Message Transforms documentation
- Used for pre/post-processing
---
## Debug Parameters
### debug
- **Type**: `DebugOptions`
- **Default**: `null`
- **Description**: Debugging options (streaming only)
**Properties**:
**echo_upstream_body** (`boolean`):
- Return transformed request body
- Default: `false`
- **WARNING**: Do not use in production
- Only works with streaming
**Example**:
```typescript
{
stream: true,
debug: {
echo_upstream_body: true
}
}
```
**Guidance**:
- For debugging only
- Never use in production
- Increases response size and latency
---
## Web Search Options
### web_search_options
- **Type**: `object`
- **Default**: `null`
- **Description**: Configure web search behavior
**Properties**:
**search_context_size** (`'low' | 'medium' | 'high'`):
- Amount of search context
- Default: Model-dependent
- Effect: More context = higher cost
**user_location** (`object`):
- User location for search
- Properties:
- `type`: `'approximate'`
- `city`?: string
- `country`?: string
- `region`?: string
- `timezone`?: string
**Example**:
```typescript
{
web_search_options: {
search_context_size: 'high',
user_location: {
type: 'approximate',
city: 'San Francisco',
country: 'USA'
}
}
}
```
**Guidance**:
- Use with web search plugin or :online variant
- Higher context = better results, more cost
- Set user location for local search results
---
## Stream Options
### stream_options
- **Type**: `object`
- **Default**: `null`
- **Description**: Streaming configuration
**Properties**:
**include_usage** (`boolean`):
- Include usage in every chunk
- Default: `false`
- Note: Usage always in final chunk
**Example**:
```typescript
{
stream: true,
stream_options: {
include_usage: true
}
}
```
**Guidance**:
- Use for real-time usage tracking
- Adds small overhead to each chunk
- Final chunk always includes usage
---
## Prediction Parameter
### prediction
- **Type**: `object`
- **Default**: `null`
- **Description**: Provide predicted output to reduce latency
**Properties**:
**type**: Must be `'content'`
**content**: Predicted output text
**Example**:
```typescript
{
prediction: {
type: 'content',
content: 'Expected response...'
}
}
```
**Guidance**:
- Experimental feature
- Purpose: Latency optimization
- Requires good prediction to be effective
- Not widely supported
---
## Image Configuration
### image_config
- **Type**: `object`
- **Default**: `null`
- **Description**: Configure image generation
**Properties**: Model-specific
**Guidance**:
- For image-generation models
- See model documentation for details
---
### modalities
- **Type**: `string[]`
- **Default**: `null`
- **Options**: `['text'] | ['image'] | ['text', 'image']`
- **Description**: Request specific output modalities
**Guidance**:
- Only for models supporting multiple outputs
- Controls what model generates
- Most models only support text
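**Example** (illustrative only; a model without image output will reject or ignore this):
```typescript
{
  modalities: ['text', 'image'] // only honored by models that can emit images
}
```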
---
## Verbosity
### verbosity
- **Type**: `'low' | 'medium' | 'high'`
- **Default**: `'medium'`
- **Description**: Control response verbosity
**Behavior**:
- `'low'`: Concise responses
- `'medium'`: Balanced (default)
- `'high'`: Detailed, comprehensive
**Guidance**:
- Introduced by OpenAI
- Maps to Anthropic's `output_config.effort`
- Use to control output length indirectly
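**Example**:
```typescript
{
  verbosity: 'low' // shorter answers without changing the prompt itself
}
```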
---
## Parameter Support by Model
Not all models support all parameters. Check model's `supported_parameters` field:
### Common Parameters
- `temperature` - Widely supported
- `top_p` - Widely supported
- `top_k` - Most models except OpenAI
- `min_p`, `top_a` - Some open-source models
- `frequency_penalty`, `presence_penalty` - OpenAI models
- `repetition_penalty` - Non-OpenAI models
- `max_tokens` - All models
- `logit_bias` - OpenAI, some others
- `logprobs` - OpenAI, some others
- `seed` - Most models (determinism not guaranteed)
- `response_format` - Growing support
- `structured_outputs` - OpenAI, Anthropic, Google, most open-source
- `stop` - All models
- `tools` - Growing support
- `tool_choice` - With tools support
- `parallel_tool_calls` - With tools support
- `include_reasoning` - Reasoning models
- `reasoning` - Reasoning models
- `web_search_options` - With web search support
- `verbosity` - OpenAI, Anthropic
### Check Support
```bash
curl https://openrouter.ai/api/v1/models
# Filter by supported_parameters in response
```
Or check models page: `openrouter.ai/models?supported_parameters=tools`
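The filtering step can be done client-side. A sketch against a hypothetical slice of the `/models` response (the `supported_parameters` field name follows the docs; the model IDs here are made up):

```typescript
// Each entry in the /models response carries a supported_parameters array
type Model = { id: string; supported_parameters?: string[] };

// Return the IDs of models that support a given parameter
function modelsSupporting(models: Model[], param: string): string[] {
  return models
    .filter(m => m.supported_parameters?.includes(param))
    .map(m => m.id);
}

// Illustrative sample data, not real model metadata
const sample: Model[] = [
  { id: 'example/model-a', supported_parameters: ['temperature', 'tools'] },
  { id: 'example/model-b', supported_parameters: ['temperature'] }
];

modelsSupporting(sample, 'tools'); // returns ['example/model-a']
```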
---
## Parameter Quick Reference
| Category | Parameter | Type | Range/Options | Default | When to Use |
|----------|-----------|------|---------------|---------|-------------|
| Core | model | string | - | User default | Always specify |
| Core | messages | Message[] | - | Required | Every request |
| Core | stream | boolean | - | false | Real-time responses |
| Sampling | temperature | float | 0-2 | 1.0 | Control creativity |
| Sampling | top_p | float | 0-1 | 1.0 | Alternative to temp |
| Sampling | top_k | integer | 0+ | 0 (disabled) | Not OpenAI |
| Sampling | frequency_penalty | float | -2 to 2 | 0.0 | Reduce repetition |
| Sampling | presence_penalty | float | -2 to 2 | 0.0 | Encourage variety |
| Sampling | repetition_penalty | float | 0-2 | 1.0 | Non-OpenAI |
| Sampling | min_p | float | 0-1 | 0.0 | Alternative to top_p |
| Sampling | top_a | float | 0-1 | 0.0 | Creative writing |
| Length | max_tokens | integer | 1+ | Model dep. | Control cost |
| Length | stop | string/array | - | null | Control structure |
| Output | response_format | object | - | null | Structured data |
| Tools | tools | Tool[] | - | [] | External functions |
| Tools | tool_choice | string/object | - | 'auto' | Control tool use |
| Tools | parallel_tool_calls | boolean | - | true | Efficiency |
| Reasoning | reasoning | object | - | null | Reasoning models |
| Reasoning | include_reasoning | boolean | - | false | Transparency |
| Routing | route | string | fallback/sort | null | Strategy |
| Routing | models | string[] | - | null | Fallbacks |
| Routing | provider | object | - | null | Preferences |
| Plugins | plugins | Plugin[] | - | [] | Extend capabilities |
| Metadata | user | string | 128 chars | null | Abuse detection |
| Metadata | session_id | string | 128 chars | null | Tracking |
| Metadata | metadata | map | 16 pairs | null | Analytics |
| Debug | debug | object | - | null | Debugging only |
---
**Sources**:
- https://openrouter.ai/docs/api/reference/parameters.mdx
- https://openrouter.ai/docs/api/reference/overview.mdx
- https://openrouter.ai/openapi.json
```
### references/ERROR_CODES.md
```markdown
# Error Codes Reference
Complete guide to OpenRouter API error codes, response structure, and handling strategies.
**Source**: https://openrouter.ai/docs/api/reference/errors-and-debugging.mdx
---
## HTTP Status Codes
### 400 Bad Request
**Description**: Invalid request format or parameters
**Common causes**:
- Missing required fields
- Invalid parameter values (out of range, wrong type)
- Malformed request body
- Invalid JSON structure
- Parameter not supported by model
**Example error**:
```json
{
"error": {
"code": 400,
"message": "Invalid request: 'messages' is required"
}
}
```
**How to handle**:
1. Validate request structure before sending
2. Check all required fields are present
3. Verify parameter types and ranges
4. Check model supports all parameters used
5. **Do not retry** - fix the request
**Common 400 errors**:
- Missing `messages` field
- `temperature` outside 0-2 range
- `max_tokens` exceeds model context length
- Invalid model ID
- Malformed JSON
---
### 401 Unauthorized
**Description**: Missing or invalid API key
**Common causes**:
- No `Authorization` header
- Invalid API key format
- API key does not exist
- API key has been revoked
**Example error**:
```json
{
"error": {
"code": 401,
"message": "Invalid API key"
}
}
```
**How to handle**:
1. Verify API key is set correctly
2. Check format: `Authorization: Bearer YOUR_KEY`
3. Ensure key is valid and active
4. Check if key was revoked
5. **Do not retry** - fix authentication
**Debug steps**:
```typescript
// Verify key format
const apiKey = process.env.OPENROUTER_API_KEY;
if (!apiKey?.startsWith('sk-or-')) {
console.error('Invalid API key format');
}
// Verify header
const headers = {
'Authorization': `Bearer ${apiKey}`,
// ...
};
```
---
### 402 Payment Required
**Description**: Insufficient credits
**Common causes**:
- Account balance is zero or low
- Cost of request exceeds available credits
- Spending limits reached
**Example error**:
```json
{
"error": {
"code": 402,
"message": "Insufficient credits",
"metadata": {
"required": 0.00015,
"available": 0.00000,
"currency": "USD"
}
}
}
```
**How to handle**:
1. Check account balance
2. Add credits to account
3. Use cheaper models or :free variants
4. Set spending limits appropriately
5. **Retry after** adding credits
**Prevention**:
```typescript
// Use free models when credits low
const useFreeModel = balance < 0.01;
const model = useFreeModel
? 'google/gemini-2.0-flash:free'
: 'anthropic/claude-3.5-sonnet';
```
---
### 403 Forbidden
**Description**: Insufficient permissions or access denied
**Common causes**:
- Model not allowed (guardrails)
- API key lacks permissions
- Organization restrictions
- Model access not purchased
- Rate limit exceeded (some cases)
**Example error**:
```json
{
"error": {
"code": 403,
"message": "Model not allowed for this API key",
"metadata": {
"model": "anthropic/claude-opus-4",
"restriction": "guardrails"
}
}
}
```
**How to handle**:
1. Check guardrails settings
2. Verify API key permissions
3. Check if model requires additional access
4. Review organization settings
5. Use allowed models
**Debug steps**:
- Check API key settings in dashboard
- Verify guardrail configuration
- Check if model is in allowed list
- Try a different model
---
### 408 Request Timeout
**Description**: Request took too long to complete
**Common causes**:
- Very long prompts
- Complex reasoning tasks
- Provider latency
- Network issues
**Example error**:
```json
{
"error": {
"code": 408,
"message": "Request timeout after 60 seconds"
}
}
```
**How to handle**:
1. Reduce prompt length
2. Use streaming for real-time feedback
3. Try a faster model (nitro variant)
4. Reduce max_tokens
5. **Retry with** simpler request
**Prevention**:
```typescript
// Use streaming for long responses
{
stream: true,
max_tokens: 1000, // Limit output length
}
// Use faster model for quick responses
{
model: 'openai/gpt-4o-mini:nitro'
}
```
---
### 429 Rate Limited
**Description**: Too many requests
**Common causes**:
- Exceeded request rate limit
- Too many concurrent requests
- Model-specific rate limits
- API key rate limits
**Example error**:
```json
{
"error": {
"code": 429,
"message": "Rate limit exceeded",
"metadata": {
"limit": 60,
"remaining": 0,
"reset": "2026-01-30T12:00:00Z"
}
}
}
```
**How to handle**:
1. Implement exponential backoff
2. Reduce request rate
3. Use API key with higher limits
4. Implement request queuing
5. **Retry with** backoff
**Exponential backoff strategy**:
```typescript
async function requestWithBackoff(url, options) {
  const maxRetries = 5;
  const baseDelay = 1000; // 1 second
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      // Retry 429s with backoff; return the last response if retries run out
      if (response.status === 429 && attempt < maxRetries - 1) {
        const delay = baseDelay * Math.pow(2, attempt);
        const jitter = Math.random() * 1000;
        await new Promise(resolve => setTimeout(resolve, delay + jitter));
        continue;
      }
      return response;
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      await new Promise(resolve =>
        setTimeout(resolve, baseDelay * Math.pow(2, attempt))
      );
    }
  }
}
```
**Prevention**:
- Use model fallbacks to distribute load
- Implement request throttling
- Monitor usage and adjust rate
- Use batch APIs when available
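The throttling bullet can be sketched as a small in-process concurrency cap (the limit value and the task shape are illustrative; production systems may want a rate-based limiter instead):

```typescript
// Cap the number of in-flight requests; excess callers wait in a FIFO queue
function createLimiter(limit: number) {
  let active = 0;
  const queue: (() => void)[] = [];
  const release = () => {
    active--;
    queue.shift()?.(); // wake the next waiter, if any
  };
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= limit) {
      await new Promise<void>(resolve => queue.push(resolve));
    }
    active++;
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Usage: const limited = createLimiter(2); await limited(() => fetch(...));
```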
---
### 502 Bad Gateway
**Description**: Provider error or invalid response
**Common causes**:
- Provider returned error
- Provider timeout
- Invalid response from provider
- Provider service unavailable
**Example error**:
```json
{
"error": {
"code": 502,
"message": "Provider returned invalid response",
"metadata": {
"provider": "openai",
"native_error": "timeout"
}
}
}
```
**How to handle**:
1. Use model fallbacks
2. **Retry with** different model/provider
3. Check provider status
4. Implement graceful degradation
**Retry with fallback**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Primary
'openai/gpt-4o', // Fallback 1
'google/gemini-2.0-flash' // Fallback 2
]
}
```
---
### 503 Service Unavailable
**Description**: Service overloaded or temporarily unavailable
**Common causes**:
- High demand/overload
- Provider maintenance
- Temporary outage
- Capacity issues
**Example error**:
```json
{
"error": {
"code": 503,
"message": "Service temporarily unavailable",
"metadata": {
"retry_after": 30
}
}
}
```
**How to handle**:
1. **Retry with** exponential backoff
2. Use model fallbacks
3. Implement graceful degradation
4. Check status page if available
**Backoff with retry-after**:
```typescript
async function requestWithRetry(url, options) {
const response = await fetch(url, options);
if (response.status === 503) {
const retryAfter = response.headers.get('Retry-After');
const delay = retryAfter
? parseInt(retryAfter) * 1000
: 5000; // Default 5 seconds
await new Promise(resolve => setTimeout(resolve, delay));
return await fetch(url, options); // Retry
}
return response;
}
```
---
## Error Response Structure
All errors follow this format:
```typescript
type ErrorResponse = {
error: {
code: number; // HTTP status code
message: string; // Human-readable error message
metadata?: {
// Additional error-specific information
provider?: string;
model?: string;
native_error?: string;
limit?: number;
remaining?: number;
reset?: string;
required?: number;
available?: number;
restriction?: string;
retry_after?: number;
};
};
};
```
**Fields**:
- `code`: HTTP status code (400, 401, 402, 403, 408, 429, 502, 503)
- `message`: Description of error
- `metadata`: Additional context (varies by error type)
---
## Error Metadata Types
### Provider Metadata
```json
{
"metadata": {
"provider": "openai",
"native_error": "timeout"
}
}
```
**When**: Provider-specific errors (502, 503)
### Rate Limit Metadata
```json
{
"metadata": {
"limit": 60,
"remaining": 0,
"reset": "2026-01-30T12:00:00Z"
}
}
```
**When**: Rate limited (429)
### Credit Metadata
```json
{
"metadata": {
"required": 0.00015,
"available": 0.00000,
"currency": "USD"
}
}
```
**When**: Insufficient credits (402)
### Restriction Metadata
```json
{
"metadata": {
"model": "anthropic/claude-opus-4",
"restriction": "guardrails"
}
}
```
**When**: Access denied (403)
---
## Native Finish Reasons
Normalized finish reasons returned by OpenRouter:
| Finish Reason | Description | When Occurs |
|---------------|-------------|--------------|
| `stop` | Model naturally stopped | End of generation |
| `length` | Max tokens reached | Response truncated |
| `tool_calls` | Model wants to call tools | Tool calling response |
| `content_filter` | Content filtered | Safety/policy violation |
| `error` | Error occurred | Generation failed |
**Native finish reason**:
```json
{
"choices": [{
"finish_reason": "stop", // Normalized
"native_finish_reason": "stop" // Provider's original
}]
}
```
**Common native reasons by provider**:
- OpenAI: `stop`, `length`, `content_filter`, `function_call`
- Anthropic: `end_turn`, `max_tokens`, `stop_sequence`
- Google: `STOP`, `MAX_TOKENS`, `RECITATION`, `SAFETY`
- Mistral: `stop`, `length`, `error`
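Because OpenRouter normalizes these, callers can branch on the normalized value alone. A convenience sketch (the strings are the normalized reasons from the table above; the helper itself is not part of any SDK):

```typescript
// Map a normalized finish_reason to a human-readable description
function describeFinish(reason: string): string {
  switch (reason) {
    case 'stop': return 'completed normally';
    case 'length': return 'truncated at max_tokens';
    case 'tool_calls': return 'model requested tool calls';
    case 'content_filter': return 'content was filtered';
    case 'error': return 'generation failed';
    default: return `unknown finish reason: ${reason}`;
  }
}
```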
---
## Streaming Error Handling
### Pre-Stream Errors
Errors that occur before streaming starts:
**Format**: Standard HTTP error response
**Example**:
```json
{
"error": {
"code": 400,
"message": "Invalid request"
}
}
```
**How to handle**:
1. Parse first chunk as error check
2. If error, abort stream
3. Handle like non-streaming error
```typescript
// `stream: true` belongs in the JSON request body, not in fetch options.
// `headers` and `request` are assumed to be built elsewhere.
const response = await fetch(url, {
  method: 'POST',
  headers,
  body: JSON.stringify({ ...request, stream: true })
});
// Pre-stream errors arrive as a normal (non-2xx) JSON response
if (!response.ok) {
  const error = await response.json();
  throw new Error(error.error.message);
}
// Process stream normally...
```
---
### Mid-Stream Errors
Errors that occur during streaming:
**Format**: SSE event with error field
**Example**:
```
data: {"id":"gen-abc","object":"chat.completion.chunk","error":{"code":502,"message":"Provider disconnected"},"choices":[{"index":0,"delta":{"content":""},"finish_reason":"error"}]}
```
**How to handle**:
1. Check for `error` field in each chunk
2. If error present, handle gracefully
3. Partial content may still be usable
4. Decide whether to continue or abort
```typescript
// sseLines is assumed to be your own helper that splits the response body
// into individual SSE lines
for await (const line of sseLines(stream)) {
  if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
  const parsed = JSON.parse(line.slice('data: '.length));
  if (parsed.error) {
    console.error('Stream error:', parsed.error);
    // Decide: continue, retry, or abort
    if (parsed.error.code >= 500) {
      break; // server error - safe to retry from scratch
    }
  }
  // Process content...
  const content = parsed.choices?.[0]?.delta?.content;
}
```
---
## Retry Strategy
### Retryable Status Codes
**Should retry**:
- `408` - Request Timeout
- `429` - Rate Limited
- `502` - Bad Gateway
- `503` - Service Unavailable
**Should NOT retry**:
- `400` - Bad Request (fix request)
- `401` - Unauthorized (fix auth)
- `403` - Forbidden (fix permissions)
- `402` - Payment Required (add credits)
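The two lists collapse into a one-line predicate (treating all 5xx as retryable, matching the lists above; 402 is excluded because credits must be added first):

```typescript
// True for status codes that are safe to retry automatically
function isRetryable(status: number): boolean {
  return status === 408 || status === 429 || status >= 500;
}
```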
---
### Exponential Backoff Implementation
```typescript
async function fetchWithRetry(
url: string,
options: RequestInit,
maxRetries = 3
): Promise<Response> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
const response = await fetch(url, options);
      // Don't retry client errors (except 408 and 429)
      if (response.status >= 400 && response.status < 500 &&
          response.status !== 408 && response.status !== 429) {
        return response;
      }
      // Retry on timeout, rate limit, or server errors
      if (response.status === 408 || response.status === 429 ||
          response.status >= 500) {
if (attempt === maxRetries - 1) {
return response; // Final attempt, don't retry
}
// Exponential backoff with jitter
const baseDelay = 1000;
const delay = Math.min(
baseDelay * Math.pow(2, attempt),
10000 // Max 10 seconds
);
const jitter = Math.random() * 1000;
await new Promise(resolve =>
setTimeout(resolve, delay + jitter)
);
continue;
}
return response;
} catch (error) {
if (attempt === maxRetries - 1) throw error;
// Network error - retry with backoff
const delay = Math.min(
1000 * Math.pow(2, attempt),
10000
);
await new Promise(resolve =>
setTimeout(resolve, delay)
);
}
}
throw new Error('Max retries exceeded');
}
```
---
## Graceful Degradation
### When Errors Occur
**Options**:
1. **Use cached responses** - if available
2. **Fall back to simpler model** - cheaper, more available
3. **Disable advanced features** - tools, web search, streaming
4. **Provide degraded experience** - partial functionality
5. **Show user-friendly error** - explain the situation
### Example Degradation Strategy
```typescript
// `complete(body)` is assumed to be a thin wrapper around fetchWithRetry
// that supplies the endpoint URL, headers, and JSON serialization
async function requestWithDegradation(messages) {
  const strategies = [
    // Primary: full-featured model
    () => complete({
      model: 'anthropic/claude-3.5-sonnet',
      messages,
      stream: true,
      tools: [/* available tools */]
    }),
    // Fallback 1: cheaper model, no streaming
    () => complete({
      model: 'google/gemini-2.0-flash',
      messages,
      stream: false
    }),
    // Fallback 2: free model
    () => complete({
      model: 'google/gemini-2.0-flash:free',
      messages
    }),
    // Fallback 3: cached response
    () => getCachedResponse(messages)
  ];
  for (const strategy of strategies) {
    try {
      return await strategy();
    } catch (error) {
      console.warn('Strategy failed:', error.message);
    }
  }
  throw new Error('All strategies failed');
}
```
---
## Error Handling Best Practices
### Do's
✅ **Always validate requests** before sending
✅ **Implement exponential backoff** for retryable errors
✅ **Use model fallbacks** for reliability
✅ **Log errors with context** (model, parameters, metadata)
✅ **Implement graceful degradation**
✅ **Check error metadata** for additional context
✅ **Monitor error rates** and adjust strategies
✅ **Set timeouts** to prevent hanging
### Don'ts
❌ **Retry on client errors** (400, 401, 402, 403)
❌ **Ignore error metadata** - contains valuable info
❌ **Retry without backoff** - can overload systems
❌ **Retry indefinitely** - set max retries
❌ **Expose raw errors** to users - sanitize and explain
❌ **Cache error responses** - only cache successes
❌ **Use fixed delays** - use exponential backoff with jitter
---
## Error Handling Checklist
### Before Request
- [ ] Validate API key format
- [ ] Validate request structure
- [ ] Check parameter types and ranges
- [ ] Verify model ID is valid
- [ ] Set reasonable timeouts
### After Error
- [ ] Identify error code
- [ ] Check error metadata
- [ ] Determine if retryable
- [ ] Implement appropriate backoff
- [ ] Log with full context
- [ ] Inform user appropriately
### Prevention
- [ ] Use model fallbacks
- [ ] Implement request throttling
- [ ] Monitor credit balance
- [ ] Check model capabilities
- [ ] Use appropriate parameters
- [ ] Test with :free models first
---
## Quick Reference
| Status Code | Name | Retry? | Backoff? | Action |
|-------------|------|--------|----------|--------|
| 400 | Bad Request | No | No | Fix request |
| 401 | Unauthorized | No | No | Check API key |
| 402 | Payment Required | After credits | No | Add credits |
| 403 | Forbidden | No | No | Check permissions |
| 408 | Timeout | Yes | Yes | Simplify/retry |
| 429 | Rate Limited | Yes | Yes | Backoff |
| 502 | Bad Gateway | Yes | Yes | Retry with fallback |
| 503 | Service Unavailable | Yes | Yes | Backoff |
---
**Sources**:
- https://openrouter.ai/docs/api/reference/errors-and-debugging.mdx
- https://openrouter.ai/openapi.json
```
### references/MODEL_SELECTION.md
```markdown
# Model Selection Guide
Comprehensive guide for selecting appropriate OpenRouter models, variants, and providers for different use cases.
**Source**: https://openrouter.ai/models
---
## Model Identifier Format
**Format**: `provider/model-name[:variant]`
**Examples**:
- `anthropic/claude-3.5-sonnet` - Specific model
- `openai/gpt-4o:online` - Model with web search variant
- `google/gemini-2.0-flash:free` - Model with free tier variant
- `meta-llama/llama-3.1-70b:thinking` - Model with thinking variant
**Parts**:
- **Provider**: Organization or platform (anthropic, openai, google, etc.)
- **Model Name**: Specific model (claude-3.5-sonnet, gpt-4o, gemini-2.0-flash)
- **Variant** (optional): Modifier for behavior (:free, :online, :extended, :thinking, :nitro, :exacto)
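These parts can be pulled apart with a few string operations. A convenience sketch, not an official client API (assumes the one-slash, optional-colon format above):

```typescript
// Split 'provider/model-name[:variant]' into its three parts
function parseModelId(id: string) {
  const [prefix, variant] = id.split(':');
  const slash = prefix.indexOf('/');
  return {
    provider: prefix.slice(0, slash),
    model: prefix.slice(slash + 1),
    variant: variant ?? null
  };
}

parseModelId('openai/gpt-4o:online');
// returns { provider: 'openai', model: 'gpt-4o', variant: 'online' }
```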
---
## Model Families
### OpenAI Models
**GPT-4o** (`openai/gpt-4o`)
- **Strengths**: Balanced, strong reasoning, multimodal (vision, audio)
- **Best for**: General purpose, coding, analysis
- **Context**: 128K
- **Cost**: High tier
**GPT-4o-mini** (`openai/gpt-4o-mini`)
- **Strengths**: Fast, cost-effective, good quality
- **Best for**: High-volume, real-time, cost-sensitive
- **Context**: 128K
- **Cost**: Low tier
**GPT-4.1** (`openai/gpt-4.1`)
- **Strengths**: Excellent reasoning, analysis
- **Best for**: Complex reasoning, research
- **Context**: 128K
- **Cost**: Very high tier
**O1 / O3** (`openai/o1`, `openai/o3`)
- **Strengths**: Deep reasoning, chain-of-thought
- **Best for**: Math, logic puzzles, complex problems
- **Context**: 200K (extended)
- **Cost**: Premium tier
- **Note**: Reasoning models, slower but smarter
---
### Anthropic Models
**Claude 3.5 Sonnet** (`anthropic/claude-3.5-sonnet`)
- **Strengths**: Excellent balance, strong coding, creative writing
- **Best for**: Most tasks, coding, writing
- **Context**: 200K
- **Cost**: Medium tier
- **Recommendation**: Default model for most use cases
**Claude Opus 4** (`anthropic/claude-opus-4`)
- **Strengths**: Best reasoning, analysis, nuanced understanding
- **Best for**: Complex reasoning, research, detailed analysis
- **Context**: 200K
- **Cost**: High tier
**Claude Haiku 4** (`anthropic/claude-haiku-4`)
- **Strengths**: Fast, cost-effective, good quality
- **Best for**: High-volume, simple tasks, cost-sensitive
- **Context**: 200K
- **Cost**: Low tier
---
### Google Models
**Gemini 2.5 Pro** (`google/gemini-2.5-pro`)
- **Strengths**: Strong reasoning, multimodal, competitive with GPT-4/Claude
- **Best for**: General purpose, multimodal, cost-effective alternative
- **Context**: 1M-2M
- **Cost**: Medium tier
**Gemini 2.0 Flash** (`google/gemini-2.0-flash`)
- **Strengths**: Very fast, good quality, multimodal
- **Best for**: Speed-critical, high-volume, real-time
- **Context**: 1M
- **Cost**: Low tier
- **Recommendation**: Best for speed-sensitive applications
---
### xAI Models
**Grok-2** (`xai/grok-2`)
- **Strengths**: Strong reasoning, Twitter knowledge
- **Best for**: Current events, reasoning
- **Context**: 128K
- **Cost**: Medium tier
---
### Cohere Models
**Command R+** (`cohere/command-r-plus`)
- **Strengths**: Good reasoning, retrieval-augmented
- **Best for**: RAG applications, enterprise
- **Context**: 128K
- **Cost**: Medium tier
**Command R** (`cohere/command-r`)
- **Strengths**: Fast, efficient, good quality
- **Best for**: High-volume, production
- **Context**: 128K
- **Cost**: Low tier
---
### Meta Models (Llama)
**Llama 3.1 70B** (`meta-llama/llama-3.1-70b`)
- **Strengths**: Open-source, good quality, cost-effective
- **Best for**: Cost-sensitive, privacy, deployment
- **Context**: 128K
- **Cost**: Low tier
**Llama 3.1 405B** (`meta-llama/llama-3.1-405b`)
- **Strengths**: Large, strong reasoning
- **Best for**: Research, complex tasks
- **Context**: 128K
- **Cost**: Medium tier
---
### Mistral Models
**Mistral Large 2** (`mistral/mistral-large`)
- **Strengths**: Good reasoning, efficient
- **Best for**: General purpose, European languages
- **Context**: 128K
- **Cost**: Medium tier
**Mistral Nemo** (`mistral/mistral-nemo`)
- **Strengths**: Fast, cost-effective
- **Best for**: High-volume, speed-critical
- **Context**: 128K
- **Cost**: Low tier
---
### Qwen Models
**Qwen 2.5 72B** (`qwen/qwen-2.5-72b`)
- **Strengths**: Strong coding, good reasoning
- **Best for**: Coding, technical tasks
- **Context**: 128K
- **Cost**: Low tier
**Qwen 2.5 Coder** (`qwen/qwen-2.5-coder-32b`)
- **Strengths**: Specialized for coding
- **Best for**: Code generation, debugging
- **Context**: 32K
- **Cost**: Very low tier
---
### DeepSeek Models
**DeepSeek V3** (`deepseek/deepseek-chat`)
- **Strengths**: Strong reasoning, cost-effective
- **Best for**: Cost-sensitive general purpose
- **Context**: 128K
- **Cost**: Low tier
---
## Model Variants
### :free - Free Tier
**Description**: Free access to models with rate limits
**When to use**:
- Testing and prototyping
- Low-complexity tasks
- High-volume, low-value operations
- Development/evaluation
**Limits**:
- 200 requests/minute (base)
- 200 requests/day (no credits)
- 2000 requests/day (with $5+ in credits)
**Examples**:
- `google/gemini-2.0-flash:free`
- `openai/gpt-4o-mini:free`
- `meta-llama/llama-3.1-70b:free`
**Tradeoffs**:
- Pros: No cost, good for testing
- Cons: Rate limits, often older or smaller models
**Source**: https://openrouter.ai/docs/guides/routing/model-variants/free.mdx
---
### :online - Web Search Enabled
**Description**: Model with built-in web search capabilities
**When to use**:
- Need current information
- Questions about recent events
- Factual verification needed
- Real-time data required
- User explicitly asks for current info
**Examples**:
- `anthropic/claude-3.5-sonnet:online`
- `openai/gpt-4o:online`
- `google/gemini-2.5-pro:online`
**Cost**: Additional cost for web search queries
**Tradeoffs**:
- Pros: Real-time information, factual accuracy
- Cons: Higher cost, additional latency
**Source**: https://openrouter.ai/docs/guides/routing/model-variants/online.mdx
---
### :extended - Extended Context
**Description**: Model with larger context window
**When to use**:
- Processing large documents
- Codebase understanding
- Long conversations
- Multi-document analysis
- Need to maintain large context
**Examples**:
- `anthropic/claude-3.5-sonnet:extended` (200K+)
- `google/gemini-2.5-pro:extended` (1M-2M)
- `openai/o1:extended` (200K)
**Tradeoffs**:
- Pros: Handle much larger inputs
- Cons: May be slower, higher cost
**Source**: https://openrouter.ai/docs/guides/routing/model-variants/extended.mdx
---
### :thinking - Enhanced Reasoning
**Description**: Model with explicit chain-of-thought reasoning
**When to use**:
- Complex multi-step reasoning
- Mathematical problems
- Logic puzzles
- Decision trees
- Need transparent reasoning
**Examples**:
- `anthropic/claude-opus-4:thinking`
- `openai/o3:thinking`
- `deepseek/deepseek-r1:thinking`
**Cost**: Higher token usage (reasoning + response)
**Tradeoffs**:
- Pros: Better reasoning, transparent thought process
- Cons: Slower, higher cost, more tokens
**Source**: https://openrouter.ai/docs/guides/routing/model-variants/thinking.mdx
---
### :nitro - High Speed
**Description**: Optimized for low latency
**When to use**:
- Speed is critical
- Real-time applications
- Chat interfaces
- User-facing where every millisecond matters
- High-frequency interactions
**Examples**:
- `openai/gpt-4o:nitro`
- `google/gemini-2.0-flash:nitro`
- `anthropic/claude-3.5-sonnet:nitro`
**Tradeoffs**:
- Pros: Minimal latency, faster responses
- Cons: May have quality tradeoffs, higher cost
**Source**: https://openrouter.ai/docs/guides/routing/model-variants/nitro.mdx
---
### :exacto - Specific Provider
**Description**: Force specific provider
**When to use**:
- Need specific provider features
- Provider agreement/contract
- Regional compliance
- Provider-specific requirements
**Examples**:
- `openai/gpt-4o:exacto` (OpenAI only)
- `anthropic/claude-3.5-sonnet:exacto` (Anthropic only)
**Tradeoffs**:
- Pros: Guaranteed provider
- Cons: No fallbacks, potential availability issues
**Source**: https://openrouter.ai/docs/guides/routing/model-variants/exacto.mdx
---
## Model Selection by Use Case
### General Purpose Chat
**Recommended**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
**Why**:
- Balanced quality, speed, cost
- Strong at most conversational tasks
- Wide feature support (tools, streaming, multimodal)
**Alternatives**:
- Cost-sensitive: `google/gemini-2.0-flash` or `openai/gpt-4o-mini`
- Speed-critical: `google/gemini-2.0-flash:nitro`
- Need web search: `:online` variant
---
### Coding
**Recommended**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o`
**Why**:
- Strong code generation and understanding
- Good at debugging and explaining
- Supports tools (code execution)
**Alternatives**:
- Cost-effective: `qwen/qwen-2.5-coder-32b`
- Very high quality: `openai/o1` or `anthropic/claude-opus-4`
---
### Complex Reasoning
**Recommended**: `anthropic/claude-opus-4` or `openai/o1`
**Why**:
- Deep reasoning capabilities
- Chain-of-thought approach
- Handles multi-step problems
**Alternatives**:
- Transparent reasoning: `anthropic/claude-opus-4:thinking`
- Cost-sensitive: `openai/gpt-4.1`
---
### Creative Writing
**Recommended**: `anthropic/claude-3.5-sonnet` with `temperature: 0.8-1.2`
**Why**:
- Strong creative capabilities
- Nuanced language
- Good at style imitation
**Alternatives**:
- More creative: `openai/gpt-4o` with higher temperature
- Cost-sensitive: `meta-llama/llama-3.1-70b` with high temperature
---
### Factual/Informational
**Recommended**: `anthropic/claude-3.5-sonnet:online` or `google/gemini-2.5-pro:online`
**Why**:
- Web search for current information
- High factual accuracy
- Citations available
**Alternatives**:
- Cost-sensitive: `google/gemini-2.0-flash:online`
- Need verification: Use `:online` variant
---
### Summarization
**Recommended**: `anthropic/claude-3.5-sonnet` with `temperature: 0.2-0.4`
**Why**:
- Concise, accurate summaries
- Handles long documents with `:extended`
- Good extraction of key points
**Alternatives**:
- Long documents: `anthropic/claude-3.5-sonnet:extended`
- Cost-effective: `google/gemini-2.0-flash`
---
### Translation
**Recommended**: `google/gemini-2.5-pro` or `openai/gpt-4o`
**Why**:
- Strong multilingual capabilities
- Nuanced translations
- Context-aware
**Alternatives**:
- Cost-effective: `meta-llama/llama-3.1-70b`
- Specialized: `mistral/mistral-large` (European languages)
---
### Sentiment Analysis
**Recommended**: `anthropic/claude-3.5-sonnet` with structured outputs
**Why**:
- Consistent, accurate
- Can output structured JSON with sentiment labels
- Handles nuances
**Implementation**:
```typescript
{
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Analyze sentiment...' }],
response_format: {
type: 'json_schema',
json_schema: {
name: 'sentiment',
strict: true,
      schema: {
        type: 'object',
        properties: {
          sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] },
          confidence: { type: 'number' }
        },
        required: ['sentiment', 'confidence'],
        additionalProperties: false
      }
}
}
}
```
---
### RAG Applications
**Recommended**: `cohere/command-r-plus` or `anthropic/claude-3.5-sonnet`
**Why**:
- Strong at incorporating context
- Good at synthesis
- Cohere designed for RAG
**Parameters**:
```typescript
{
model: 'anthropic/claude-3.5-sonnet',
messages: [
{
role: 'system',
content: 'Use the provided context to answer questions...'
},
{
role: 'user',
content: `Context:\n${context}\n\nQuestion:\n${query}`
}
],
temperature: 0.2, // Lower for factual responses
max_tokens: 500
}
```
---
### Agentic Systems
**Recommended**: `anthropic/claude-3.5-sonnet` or `openai/gpt-4o` with tools
**Why**:
- Strong tool use capabilities
- Good at decision-making
- Efficient multi-turn interactions
**Setup**:
```typescript
{
model: 'anthropic/claude-3.5-sonnet',
messages: [...],
tools: [/* available tools */],
tool_choice: 'auto',
parallel_tool_calls: true
}
```
---
## Model Capability Matrix
### Context Length
| Model | Standard | Extended |
|-------|----------|----------|
| Claude 3.5 Sonnet | 200K | 200K+ |
| Claude Opus 4 | 200K | 200K+ |
| GPT-4o | 128K | 200K |
| GPT-4.1 | 128K | 200K |
| Gemini 2.5 Pro | 1M | 1M-2M |
| Gemini 2.0 Flash | 1M | 1M+ |
| Grok-2 | 128K | 128K+ |
| Llama 3.1 70B | 128K | 128K+ |
| Llama 3.1 405B | 128K | 128K+ |
| Mistral Large 2 | 128K | 128K+ |
| Qwen 2.5 72B | 128K | 128K+ |
---
### Feature Support
| Model | Tools | Streaming | Vision | Audio | Video | Structured Output |
|-------|-------|-----------|--------|-------|-------|-------------------|
| Claude 3.5 Sonnet | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Claude Opus 4 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-4o | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-4.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini 2.5 Pro | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini 2.0 Flash | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Grok-2 | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Llama 3.1 70B | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Mistral Large 2 | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Qwen 2.5 72B | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
**Note**: Check `supported_parameters` field for exact support
---
### Cost Tiers
**Very High**: OpenAI O1/O3, Claude Opus 4
**High**: OpenAI GPT-4.1
**Medium**: Claude 3.5 Sonnet, GPT-4o, Gemini 2.5 Pro
**Low**: GPT-4o-mini, Gemini 2.0 Flash, Mistral Nemo
**Very Low**: Llama 3.1 70B, Qwen 2.5 72B, DeepSeek
**Free tier available**: Add `:free` variant
---
## Provider Selection
### Default Behavior
OpenRouter automatically selects the best available provider for each model based on:
- Cost
- Performance
- Availability
### Explicit Provider Order
**When to use**:
- Have preferred provider
- Need specific provider features
- Regional compliance requirements
- BYOK (Bring Your Own Key) arrangements
**Configuration**:
```typescript
{
provider: {
order: ['anthropic', 'openai', 'google'],
allow_fallbacks: true,
sort: 'price' // or 'latency', 'throughput'
}
}
```
### Provider Sorting Options
**'price'**: Optimize for lowest cost
**'latency'**: Optimize for fastest response
**'throughput'**: Optimize for highest tokens/second
### Provider Characteristics
| Provider | Strengths | Best For |
|----------|-----------|----------|
| OpenAI | Balanced, reliable, tools | General purpose, enterprise |
| Anthropic | Reasoning, coding, writing | Complex tasks, quality |
| Google | Fast, multimodal, long context | Speed, documents |
| xAI | Current events, reasoning | Real-time, news |
| Cohere | RAG, enterprise | Enterprise search |
| Meta | Open-source, cost-effective | Privacy, deployment |
| Mistral | Efficient, European languages | EU compliance, efficiency |
---
## Model Selection Algorithm
```typescript
function selectModel(requirements) {
const {
task, // 'chat', 'coding', 'reasoning', etc.
priority, // 'quality', 'speed', 'cost'
needsCurrentInfo,
largeContext,
tools,
budget
} = requirements;
// Priority: Quality
if (priority === 'quality') {
if (task === 'reasoning') return 'anthropic/claude-opus-4';
if (task === 'coding') return 'anthropic/claude-3.5-sonnet';
return 'openai/gpt-4o';
}
// Priority: Speed
if (priority === 'speed') {
if (task === 'coding') return 'anthropic/claude-3.5-sonnet:nitro';
return 'google/gemini-2.0-flash:nitro';
}
// Priority: Cost
if (priority === 'cost') {
if (budget === 'free') return 'google/gemini-2.0-flash:free';
return 'google/gemini-2.0-flash';
}
// Balanced (default)
if (needsCurrentInfo) return 'anthropic/claude-3.5-sonnet:online';
if (largeContext) return 'anthropic/claude-3.5-sonnet:extended';
if (tools) return 'anthropic/claude-3.5-sonnet';
return 'anthropic/claude-3.5-sonnet'; // Default
}
```
---
## Best Practices
### Start with Balanced Model
**Default**: `anthropic/claude-3.5-sonnet`
**Why**:
- Strong performance across tasks
- Good cost-quality tradeoff
- Wide feature support
### Adjust Based on Feedback
**Monitor**:
- Response quality
- Latency
- Cost
- Error rates
**Iterate**:
- Upgrade to better model if quality insufficient
- Downgrade for speed if latency too high
- Switch to cheaper model if cost is concern
### Use Model Fallbacks
**Setup**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Primary
'openai/gpt-4o', // Fallback 1
'google/gemini-2.0-flash' // Fallback 2
]
}
```
**Benefits**:
- Automatic failover
- Higher reliability
- Graceful degradation
### Test with Free Models First
**Before production**:
```typescript
// Development/testing
const model = 'google/gemini-2.0-flash:free';
// Production
const model = 'anthropic/claude-3.5-sonnet';
```
### Check Model Capabilities
**Verify**:
```bash
curl https://openrouter.ai/api/v1/models
# Check `supported_parameters` field
```
**Or check**:
```
https://openrouter.ai/models?supported_parameters=tools
```
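Once fetched, the model list can be filtered in code. A sketch assuming the response is `{ data: [...] }` where each entry carries `id` and `supported_parameters`; verify the exact shape against the live API:

```typescript
// Filter a fetched model list down to models supporting a given parameter.
interface ModelInfo {
  id: string;
  supported_parameters?: string[];
}

function modelsSupporting(models: ModelInfo[], param: string): string[] {
  return models
    .filter(m => (m.supported_parameters ?? []).includes(param))
    .map(m => m.id);
}

// Usage with a live fetch:
// const { data } = await (await fetch('https://openrouter.ai/api/v1/models')).json();
// console.log(modelsSupporting(data, 'tools'));
```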
---
## Quick Reference
### Default Model by Task
| Task | Default | Alternative |
|------|---------|-------------|
| Chat | claude-3.5-sonnet | gpt-4o, gemini-2.5-pro |
| Coding | claude-3.5-sonnet | gpt-4o, qwen-2.5-coder |
| Reasoning | claude-opus-4 | o1, o3 |
| Creative | claude-3.5-sonnet | gpt-4o (higher temp) |
| Speed | gemini-2.0-flash:nitro | gpt-4o-mini:nitro |
| Cost | gemini-2.0-flash | gpt-4o-mini, llama-3.1-70b |
### Variant Selection
| Need | Variant | Example |
|------|---------|---------|
| No cost | :free | gpt-4o-mini:free |
| Current info | :online | claude-3.5-sonnet:online |
| Large context | :extended | claude-3.5-sonnet:extended |
| Deep reasoning | :thinking | claude-opus-4:thinking |
| Speed | :nitro | gpt-4o:nitro |
| Tool-call accuracy | :exacto | gpt-4o:exacto |
---
**Sources**:
- https://openrouter.ai/models
- https://openrouter.ai/docs/guides/routing/model-variants/free.mdx
- https://openrouter.ai/docs/guides/routing/model-variants/online.mdx
- https://openrouter.ai/docs/guides/routing/model-variants/extended.mdx
- https://openrouter.ai/docs/guides/routing/model-variants/thinking.mdx
- https://openrouter.ai/docs/guides/routing/model-variants/nitro.mdx
- https://openrouter.ai/docs/guides/routing/model-variants/exacto.mdx
```
### references/ROUTING_STRATEGIES.md
```markdown
# Routing Strategies
Complete guide to configuring intelligent routing with OpenRouter for model fallbacks, provider selection, and automatic optimization.
**Source**: https://openrouter.ai/docs/guides/routing/model-fallbacks.mdx
---
## Overview
OpenRouter provides powerful routing capabilities to optimize for:
- **Reliability**: Automatic failover
- **Cost**: Optimize for lowest price
- **Latency**: Optimize for fastest response
- **Throughput**: Optimize for highest capacity
---
## Model Fallbacks
### What are Model Fallbacks?
Automatic failover between multiple models. If the primary model fails (5xx, 429, timeout), OpenRouter automatically tries the next model in the list.
### Basic Configuration
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Primary
'openai/gpt-4o', // Fallback 1
'google/gemini-2.0-flash' // Fallback 2
],
messages: [{ role: 'user', content: '...' }]
}
```
### How It Works
1. **Try primary model** (`anthropic/claude-3.5-sonnet`)
2. **If error** (5xx, 429, timeout): Try next model
3. **Continue** until one succeeds or all fail
4. **Return response** from successful model
5. **Include** actual model used in `model` field
### When to Use Model Fallbacks
✅ **Use when**:
- High reliability required
- User-facing applications
- Critical business functions
- Multiple providers acceptable
- Want graceful degradation
❌ **Don't use when**:
- Need specific model for compliance
- Model behavior must be consistent
- Testing/model comparison
- Cost must be predictable
### Best Practices
**Order by preference**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Most preferred
'openai/gpt-4o', // Second choice
'google/gemini-2.0-flash' // Last resort
]
}
```
**Use different providers**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Anthropic
'openai/gpt-4o', // OpenAI
'google/gemini-2.0-flash' // Google
]
}
```
**Include cost options**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Quality
'google/gemini-2.0-flash', // Speed
'meta-llama/llama-3.1-70b:free' // Free
]
}
```
### Advanced Patterns
**Quality -> Speed -> Free**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet', // Best quality
'google/gemini-2.0-flash:nitro', // Fastest
'meta-llama/llama-3.1-70b:free' // Free
]
}
```
**By cost tier**:
```typescript
{
models: [
'google/gemini-2.0-flash', // Low cost
'anthropic/claude-3.5-sonnet', // Medium
'anthropic/claude-opus-4' // High quality
]
}
```
**For tools** (ensure all support tools):
```typescript
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.5-pro'
],
tools: [...]
}
```
### Response Handling
**Check actual model used**:
```typescript
const response = await fetch(/* ... */);
const data = await response.json();
console.log('Requested:', 'anthropic/claude-3.5-sonnet');
console.log('Actual:', data.model); // May be different!
```
**Handle partial fallbacks**:
- If fallback used, may have different behavior
- Response quality may vary
- Test with all fallback models
---
## Provider Selection
### What is Provider Selection?
Control which providers serve your requests. Set preferences for cost, latency, or throughput, and specify which providers to use or avoid.
### Basic Configuration
```typescript
{
provider: {
order: ['anthropic', 'openai', 'google'],
allow_fallbacks: true,
sort: 'price'
},
messages: [{ role: 'user', content: '...' }]
}
```
### Provider Preferences Properties
#### order (`string[]`)
Preferred provider order.
**Example**:
```typescript
{
provider: {
order: ['anthropic', 'openai', 'google']
}
}
```
**When to use**:
- Have preferred providers
- Need specific provider features
- Provider agreements/contracts
---
#### allow_fallbacks (`boolean`)
Enable automatic provider fallbacks.
**Default**: `true`
**Example**:
```typescript
{
provider: {
order: ['anthropic', 'openai'],
allow_fallbacks: true // Allow falling back from Anthropic to OpenAI
}
}
```
**When to use**:
- Always: `true` (default)
- Only: `false` when you strictly need first provider
---
#### require_parameters (`boolean`)
Only use providers that support all request parameters.
**Default**: `false`
**Example**:
```typescript
{
provider: {
require_parameters: true // Only use providers supporting tools, streaming, etc.
},
tools: [...],
stream: true
}
```
**When to use**:
- Using advanced features (tools, structured outputs)
- Need consistent behavior across providers
- Want to avoid parameter being ignored
---
#### data_collection (`'allow' | 'deny'`)
Control whether providers can retain data.
**Default**: `'allow'`
**Example**:
```typescript
{
provider: {
data_collection: 'deny' // Zero Data Retention
}
}
```
**When to use**:
- Privacy requirements
- Compliance (GDPR, HIPAA)
- Zero Data Retention (ZDR) policies
---
#### only (`string[]`)
Whitelist specific providers.
**Example**:
```typescript
{
provider: {
only: ['anthropic', 'openai'] // Only use these providers
}
}
```
**When to use**:
- Want to restrict to specific providers
- Regional requirements
- Provider agreements
---
#### ignore (`string[]`)
Blacklist specific providers.
**Example**:
```typescript
{
provider: {
ignore: ['openai'] // Never use OpenAI
}
}
```
**When to use**:
- Have issues with specific provider
- Exclude for cost/latency reasons
- Regional/compliance restrictions
---
#### quantizations (`string[]`)
Filter by model quantization level.
**Options**: `'int4' | 'int8' | 'fp4' | 'fp6' | 'fp8' | 'fp16' | 'bf16' | 'fp32'`
**Example**:
```typescript
{
provider: {
quantizations: ['fp16', 'bf16'] // Only 16-bit precision
}
}
```
**When to use**:
- Quality requirements (higher precision = better quality)
- Speed requirements (lower precision = faster)
- Cost requirements (lower precision = cheaper)
**Tradeoffs**:
- `fp32`: Best quality, slowest, most expensive
- `fp16`/`bf16`: Balanced
- `fp8`/`int8`: Faster, cheaper, slight quality loss
- `int4`/`fp4`: Fastest, cheapest, more quality loss
---
#### sort (`'price' | 'latency' | 'throughput'`)
Sort providers by metric.
**Example**:
```typescript
{
provider: {
sort: 'price' // Use cheapest provider
}
}
```
**Options**:
**'price'**: Lowest cost per token
- **Best for**: Cost optimization
- **Tradeoff**: May be slower
**'latency'**: Fastest first token time
- **Best for**: Real-time applications, chat
- **Tradeoff**: May be more expensive
**'throughput'**: Highest tokens/second
- **Best for**: Batch processing, long documents
- **Tradeoff**: May be more expensive
---
#### max_price (`object`)
Maximum pricing thresholds.
**Properties**:
- `prompt`: Price per 1M prompt tokens
- `completion`: Price per 1M completion tokens
- `request`: Fixed price per request
**Example**:
```typescript
{
provider: {
max_price: {
prompt: 10, // Max $10 per 1M prompt tokens
completion: 30 // Max $30 per 1M completion tokens
}
}
}
```
**When to use**:
- Budget constraints
- Cost predictability
- Avoid expensive providers
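A quick way to sanity-check `max_price` thresholds is to estimate per-request cost from expected token counts. A sketch with illustrative prices:

```typescript
// Estimate a request's cost from token counts and per-1M-token prices,
// matching the units used by max_price above. Prices are illustrative.
function estimateCost(
  promptTokens: number,
  completionTokens: number,
  pricePerMPrompt: number,     // $ per 1M prompt tokens
  pricePerMCompletion: number  // $ per 1M completion tokens
): number {
  return (promptTokens / 1_000_000) * pricePerMPrompt +
         (completionTokens / 1_000_000) * pricePerMCompletion;
}
```

For example, 2,000 prompt tokens and 500 completion tokens at $10/$30 per 1M tokens come to about $0.035 per request.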
---
#### preferred_min_throughput (`number`)
Minimum tokens/second threshold.
**Can be a percentile object** with keys such as `p50`, `p95`
**Example**:
```typescript
{
provider: {
preferred_min_throughput: {
p50: 50, // At least 50 tokens/s median
p95: 30 // At least 30 tokens/s 95th percentile
}
}
}
```
**When to use**:
- Need consistent speed
- Batch processing
- Long document processing
---
#### preferred_max_latency (`number`)
Maximum latency threshold in seconds.
**Can be a percentile object** with keys such as `p50`, `p95`
**Example**:
```typescript
{
provider: {
preferred_max_latency: {
p50: 2.0, // Median latency under 2 seconds
p95: 5.0 // 95th percentile under 5 seconds
}
}
}
```
**When to use**:
- Real-time applications
- Chat interfaces
- User-facing where latency matters
---
## Complete Provider Configuration Example
```typescript
{
model: 'anthropic/claude-3.5-sonnet',
provider: {
order: ['anthropic', 'openai', 'google'],
allow_fallbacks: true,
require_parameters: false,
data_collection: 'deny',
only: null,
ignore: ['provider_to_exclude'],
quantizations: ['fp16', 'bf16'],
sort: 'price',
max_price: {
prompt: 10,
completion: 30
},
preferred_min_throughput: {
p50: 50
},
preferred_max_latency: {
p95: 5.0
}
},
messages: [...]
}
```
---
## Routing Strategies by Use Case
### Cost Optimization
**Goal**: Minimize total cost
**Configuration**:
```typescript
{
provider: {
sort: 'price',
allow_fallbacks: true
},
models: [
'google/gemini-2.0-flash',
'meta-llama/llama-3.1-70b:free',
'anthropic/claude-3.5-sonnet'
]
}
```
**Additional tips**:
- Use `:free` variants when possible
- Set `max_price` thresholds
- Prefer quantized models (int8, fp8)
---
### Latency Optimization
**Goal**: Minimize response time for real-time apps
**Configuration**:
```typescript
{
provider: {
sort: 'latency',
preferred_max_latency: {
p50: 1.5,
p95: 3.0
}
},
models: [
'google/gemini-2.0-flash:nitro',
'openai/gpt-4o-mini:nitro',
'anthropic/claude-3.5-sonnet:nitro'
]
}
```
**Additional tips**:
- Use `:nitro` variants
- Prefer fast models (Flash, Mini)
- Use streaming for perceived speed
---
### Throughput Optimization
**Goal**: Maximize tokens/second for batch processing
**Configuration**:
```typescript
{
provider: {
sort: 'throughput',
preferred_min_throughput: {
p50: 100
}
},
models: [
'anthropic/claude-3.5-sonnet',
'google/gemini-2.5-pro'
]
}
```
**Additional tips**:
- Use larger models (better throughput)
- Parallelize requests
- Use non-streaming for efficiency
---
### Quality Optimization
**Goal**: Maximize response quality
**Configuration**:
```typescript
{
models: [
'anthropic/claude-opus-4',
'openai/o1',
'anthropic/claude-3.5-sonnet'
],
provider: {
require_parameters: true, // Ensure all features work
allow_fallbacks: true
}
}
```
**Additional tips**:
- Use best models (Opus, O1)
- Use `:thinking` variants for complex reasoning
- Enable all advanced features (tools, structured outputs)
---
### Reliability Optimization
**Goal**: Maximize availability and success rate
**Configuration**:
```typescript
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.5-pro',
'meta-llama/llama-3.1-70b'
],
provider: {
allow_fallbacks: true
}
}
```
**Additional tips**:
- Use 3-5 models in fallback list
- Include models from different providers
- Test with all fallback models
- Implement retry logic with exponential backoff
---
### Privacy Optimization
**Goal**: Zero Data Retention (ZDR)
**Configuration**:
```typescript
{
provider: {
data_collection: 'deny', // ZDR enabled
ignore: ['providers_that_retain_data']
},
user: 'user-123', // For abuse detection
metadata: {
privacy: 'zdr-enabled'
}
}
```
**Additional tips**:
- Check provider privacy policies
- Use BYOK for full control
- Review data retention settings in dashboard
---
## Auto Router
### What is Auto Router?
Automatic model selection based on request complexity, cost, and availability.
### Basic Configuration
```typescript
{
model: 'openrouter/auto',
messages: [{ role: 'user', content: '...' }]
}
```
**Behavior**:
- Automatically selects best model
- Considers cost and quality
- No model selection needed
### With Allowed Models
```typescript
{
model: 'openrouter/auto',
plugins: [{
id: 'auto-router',
allowed_models: ['openai/*', 'anthropic/*']
}],
messages: [{ role: 'user', content: '...' }]
}
```
**Allowed patterns**:
- `'*'` - All models
- `'openai/*'` - All OpenAI models
- `'anthropic/claude-*'` - All Claude models
- Specific models: `'openai/gpt-4o', 'anthropic/claude-3.5-sonnet'`
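The patterns above can also be checked client-side, e.g. to validate a config before sending it. A sketch; the simple `*` glob semantics here are an assumption for illustration, since OpenRouter performs this matching server-side:

```typescript
// Check whether a model id matches an allowed_models pattern
// such as 'openai/*' or 'anthropic/claude-*'.
function matchesPattern(modelId: string, pattern: string): boolean {
  if (pattern === '*') return true;
  // Escape regex metacharacters in the literal parts, then turn '*' into '.*'
  const regex = new RegExp(
    '^' +
    pattern
      .split('*')
      .map(p => p.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
      .join('.*') +
    '$'
  );
  return regex.test(modelId);
}

function isAllowed(modelId: string, allowed: string[]): boolean {
  return allowed.some(p => matchesPattern(modelId, p));
}
```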
### When to Use Auto Router
✅ **Use when**:
- Want automatic optimization
- Don't want to manage model selection
- Acceptable for variable model behavior
- Quick prototyping
❌ **Don't use when**:
- Need consistent model behavior
- Specific model requirements
- Compliance/regulatory needs
- Testing/comparison
---
## Combining Routing Strategies
### Model Fallbacks + Provider Selection
```typescript
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.0-flash'
],
provider: {
sort: 'price',
allow_fallbacks: true
}
}
```
**Behavior**:
- Try each model in order
- For each model, select cheapest provider
- Fall back to next model on error
### Provider Selection + Cost Thresholds
```typescript
{
provider: {
order: ['anthropic', 'openai', 'google'],
sort: 'price',
max_price: {
prompt: 10,
completion: 30
}
},
model: 'anthropic/claude-3.5-sonnet'
}
```
**Behavior**:
- Prefer Anthropic, then OpenAI, then Google
- Within each provider, select cheapest
- Reject providers exceeding max_price
### Model Fallbacks + Data Collection
```typescript
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o'
],
provider: {
data_collection: 'deny', // ZDR
allow_fallbacks: true
}
}
```
**Behavior**:
- Only use providers with ZDR enabled
- Fall back between models
- Maintain privacy across fallbacks
---
## Monitoring and Observability
### Track Routing Decisions
```typescript
const response = await fetch(/* ... */);
const data = await response.json();
// Log routing info
console.log({
model: data.model,
provider: data.provider,
usage: data.usage,
cost: data.usage?.cost
});
```
### Monitor Fallback Rates
```typescript
let primarySuccess = 0;
let fallbackUsed = 0;
function trackFallback(data) {
if (data.model === 'anthropic/claude-3.5-sonnet') {
primarySuccess++;
} else {
fallbackUsed++;
}
console.log({
primaryRate: primarySuccess / (primarySuccess + fallbackUsed),
fallbackRate: fallbackUsed / (primarySuccess + fallbackUsed)
});
}
```
### Monitor Provider Performance
```typescript
const providerStats = {};
function trackProvider(data) {
const provider = data.provider;
if (!providerStats[provider]) {
providerStats[provider] = { count: 0, totalLatency: 0, errors: 0 };
}
providerStats[provider].count++;
providerStats[provider].totalLatency += data.latency;
console.log('Provider stats:', providerStats);
}
```
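The accumulator above can be turned into per-provider averages. A sketch reusing the same field names; `latency` is assumed to be measured client-side in milliseconds:

```typescript
// Summarize the providerStats accumulator into average latency
// and error rate per provider.
interface ProviderStat { count: number; totalLatency: number; errors: number; }

function summarize(stats: Record<string, ProviderStat>) {
  return Object.fromEntries(
    Object.entries(stats).map(([provider, s]) => [
      provider,
      {
        requests: s.count,
        avgLatencyMs: s.count ? s.totalLatency / s.count : 0,
        errorRate: s.count ? s.errors / s.count : 0
      }
    ])
  );
}
```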
---
## Best Practices
### Always Allow Fallbacks
```typescript
{
provider: {
allow_fallbacks: true // Always true unless you have specific reason
}
}
```
### Use Model Fallbacks for Critical Applications
```typescript
{
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.5-pro'
]
}
```
### Set Appropriate Price Limits
```typescript
{
provider: {
max_price: {
prompt: 10, // Set based on budget
completion: 30
}
}
}
```
### Match Sorting to Use Case
- **Real-time chat**: `sort: 'latency'`
- **Batch processing**: `sort: 'throughput'`
- **Cost-sensitive**: `sort: 'price'`
- **General purpose**: No sorting (automatic)
### Use Data Collection: deny for Privacy
```typescript
{
provider: {
data_collection: 'deny' // ZDR by default
}
}
```
### Test Routing Configuration
**Before production**:
- Test with all models in fallback list
- Verify provider selection works
- Check cost estimates
- Monitor actual usage
---
## Quick Reference
### Routing Configuration
| Strategy | Configuration | Use Case |
|----------|---------------|-----------|
| Model fallbacks | `models: [...]` | Reliability |
| Provider order | `provider.order: [...]` | Preferred providers |
| Cost optimization | `provider.sort: 'price'` | Cost-sensitive |
| Latency optimization | `provider.sort: 'latency'` | Real-time apps |
| Throughput optimization | `provider.sort: 'throughput'` | Batch processing |
| ZDR | `provider.data_collection: 'deny'` | Privacy |
| Auto router | `model: 'openrouter/auto'` | Automatic selection |
### Common Patterns
```typescript
// Cost-optimized with fallbacks
{
models: ['gemini-2.0-flash', 'llama-3.1-70b:free'],
provider: { sort: 'price' }
}
// Fast with reliability
{
models: ['gpt-4o-mini:nitro', 'gemini-2.0-flash:nitro'],
provider: { sort: 'latency', allow_fallbacks: true }
}
// Privacy-focused
{
models: ['claude-3.5-sonnet', 'gpt-4o'],
provider: { data_collection: 'deny', allow_fallbacks: true }
}
```
---
**Sources**:
- https://openrouter.ai/docs/guides/routing/model-fallbacks.mdx
- https://openrouter.ai/docs/guides/routing/provider-selection.mdx
- https://openrouter.ai/docs/guides/routing/routers/auto-router.mdx
```
### references/EXAMPLES.md
```markdown
# Working Examples
Complete, working code examples for common OpenRouter API usage patterns in TypeScript and Python.
**Source**: https://openrouter.ai/docs/quickstart
---
## TypeScript Examples
### Basic Chat Completion
**Simple request**:
```typescript
const apiKey = process.env.OPENROUTER_API_KEY;
async function chatCompletion(userMessage: string) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [
{ role: 'user', content: userMessage }
],
temperature: 0.7,
max_tokens: 500
})
});
const data = await response.json();
const content = data.choices[0].message.content;
console.log('Response:', content);
return content;
}
// Usage
chatCompletion('What is the meaning of life?');
```
### Streaming Response
**Process SSE stream**:
```typescript
async function streamingChat(userMessage: string) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [
{ role: 'user', content: userMessage }
],
stream: true
})
});
if (!response.body) {
throw new Error('No response body');
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let fullContent = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
// NOTE: a chunk may end mid-line; production code should buffer partial lines across reads
const lines = chunk.split('\n').filter(line => line.startsWith('data: '));
for (const line of lines) {
const data = line.slice(6);
if (data === '[DONE]') break;
const parsed = JSON.parse(data);
const content = parsed.choices?.[0]?.delta?.content;
if (content) {
fullContent += content;
process.stdout.write(content); // Stream to console
}
if (parsed.usage) {
console.log('\nUsage:', parsed.usage);
}
}
}
console.log('\nComplete:', fullContent);
return fullContent;
}
// Usage
streamingChat('Tell me a short story');
```
### Tool Calling with Agentic Loop
**Complete tool calling example**:
```typescript
interface Tool {
name: string;
description: string;
parameters: object;
}
const tools: Tool[] = [{
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: {
type: 'string',
description: 'City name'
},
unit: {
type: 'string',
enum: ['celsius', 'fahrenheit']
}
},
required: ['location']
}
}, {
name: 'calculate',
description: 'Perform a calculation',
parameters: {
type: 'object',
properties: {
expression: {
type: 'string',
description: 'Mathematical expression to evaluate'
}
},
required: ['expression']
}
}];
async function executeTool(name: string, args: any) {
console.log(`Executing tool: ${name}`, args);
switch (name) {
case 'get_weather':
return { location: args.location, temperature: 22, conditions: 'Sunny' };
case 'calculate':
try {
// WARNING: eval is unsafe on untrusted input; demo only. Use a proper expression parser in production.
const result = eval(args.expression);
return { expression: args.expression, result };
} catch (error) {
return { error: 'Invalid expression' };
}
default:
throw new Error(`Unknown tool: ${name}`);
}
}
async function runAgent(userMessage: string, maxIterations = 5) {
let messages = [{ role: 'user', content: userMessage }];
for (let iteration = 0; iteration < maxIterations; iteration++) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: messages,
tools: tools.map(tool => ({
type: 'function',
function: tool
})),
tool_choice: 'auto',
parallel_tool_calls: true
})
});
const data = await response.json();
const assistantMessage = data.choices[0].message;
messages.push(assistantMessage);
if (!assistantMessage.tool_calls) {
console.log('Final answer:', assistantMessage.content);
return assistantMessage.content;
}
console.log(`Iteration ${iteration + 1}: ${assistantMessage.tool_calls.length} tools called`);
// Execute tools
const toolResults = await Promise.all(
assistantMessage.tool_calls.map(async (toolCall) => {
const { name, arguments: args } = toolCall.function;
const parsedArgs = JSON.parse(args);
const result = await executeTool(name, parsedArgs);
return {
role: 'tool',
tool_call_id: toolCall.id,
content: JSON.stringify(result)
};
})
);
messages.push(...toolResults);
}
throw new Error('Max iterations exceeded');
}
// Usage
runAgent('What is the weather in Tokyo and what is 15 + 27?');
```
### Structured Output
**JSON Schema enforcement**:
```typescript
interface WeatherData {
location: string;
temperature: number;
conditions: string;
humidity: number;
}
async function getStructuredWeather(location: string): Promise<WeatherData> {
const schema = {
type: 'object',
properties: {
location: { type: 'string' },
temperature: { type: 'number' },
conditions: { type: 'string' },
humidity: { type: 'number' }
},
required: ['location', 'temperature', 'conditions', 'humidity'],
additionalProperties: false
};
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{
role: 'user',
content: `What is the weather in ${location}? Respond with JSON.`
}],
response_format: {
type: 'json_schema',
json_schema: {
name: 'weather',
strict: true,
schema: schema
}
},
plugins: [{
id: 'response-healing'
}]
})
});
const data = await response.json();
const weatherData = JSON.parse(data.choices[0].message.content);
console.log('Weather data:', weatherData);
return weatherData;
}
// Usage
getStructuredWeather('San Francisco');
```
### Web Search Integration
**Using :online variant**:
```typescript
async function webSearchQuery(query: string) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet:online',
messages: [{
role: 'user',
content: query
}]
})
});
const data = await response.json();
const content = data.choices[0].message.content;
// Extract citations
const annotations = data.choices[0].message.annotations || [];
console.log('Response:', content);
console.log('Citations:', annotations);
return { content, annotations };
}
// Usage
webSearchQuery('What are the latest AI developments in 2026?');
```
### Image Understanding
**Multimodal with image**:
```typescript
async function analyzeImage(imageUrl: string) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'anthropic/claude-3.5-sonnet',
messages: [{
role: 'user',
content: [
{ type: 'text', text: 'Describe this image in detail.' },
{
type: 'image_url',
image_url: {
url: imageUrl,
detail: 'high'
}
}
]
}]
})
});
const data = await response.json();
const description = data.choices[0].message.content;
console.log('Image description:', description);
return description;
}
// Usage
analyzeImage('https://example.com/image.jpg');
```
### Model Fallbacks
**Automatic failover**:
```typescript
async function requestWithFallbacks(userMessage: string) {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
models: [
'anthropic/claude-3.5-sonnet',
'openai/gpt-4o',
'google/gemini-2.0-flash'
],
messages: [{ role: 'user', content: userMessage }]
})
});
const data = await response.json();
const actualModel = data.model;
const content = data.choices[0].message.content;
console.log(`Used model: ${actualModel}`);
console.log('Response:', content);
return { content, model: actualModel };
}
// Usage
requestWithFallbacks('Explain quantum computing');
```
### Error Handling with Retry
**Robust error handling**:
```typescript
async function requestWithRetry(
body: any,
maxRetries = 3
) {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify(body)
});
if (response.ok) {
return await response.json();
}
// Don't retry client errors (except 408)
if (response.status >= 400 && response.status < 500 &&
response.status !== 408) {
const error = await response.json();
throw new Error(error.error.message);
}
// Retry on rate limit or server errors
if (response.status === 429 || response.status >= 500) {
if (attempt === maxRetries - 1) {
const error = await response.json();
throw new Error(`Max retries: ${error.error.message}`);
}
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
const jitter = Math.random() * 1000;
console.log(`Retry ${attempt + 1} after ${delay}ms...`);
await new Promise(resolve => setTimeout(resolve, delay + jitter));
continue;
}
} catch (error: any) {
if (attempt === maxRetries - 1) throw error;
const delay = Math.min(1000 * Math.pow(2, attempt), 10000);
console.log(`Network error, retry ${attempt + 1} after ${delay}ms...`);
await new Promise(resolve => setTimeout(resolve, delay));
}
}
throw new Error('Max retries exceeded');
}
// Usage
const result = await requestWithRetry({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(result.choices[0].message.content);
```
### OpenAI SDK Integration
**Using OpenAI SDK**:
```typescript
import OpenAI from 'openai';
const openai = new OpenAI({
baseURL: 'https://openrouter.ai/api/v1',
apiKey: process.env.OPENROUTER_API_KEY,
defaultHeaders: {
'HTTP-Referer': 'https://your-app.com',
'X-Title': 'Your App'
}
});
async function chatWithOpenAISDK(message: string) {
const completion = await openai.chat.completions.create({
model: 'anthropic/claude-3.5-sonnet',
messages: [{ role: 'user', content: message }],
temperature: 0.7,
max_tokens: 500
});
console.log(completion.choices[0].message.content);
return completion.choices[0].message.content;
}
// Usage
chatWithOpenAISDK('What is the meaning of life?');
```
---
## Python Examples
### Basic Chat Completion
```python
import requests
import json
api_key = "your-openrouter-api-key"
def chat_completion(user_message):
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
data = {
"model": "anthropic/claude-3.5-sonnet",
"messages": [
{"role": "user", "content": user_message}
],
"temperature": 0.7,
"max_tokens": 500
}
response = requests.post(url, headers=headers, json=data)
result = response.json()
content = result["choices"][0]["message"]["content"]
print("Response:", content)
return content
# Usage
chat_completion("What is the meaning of life?")
```
### Streaming Response
```python
import requests
import json
def streaming_chat(user_message):
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
data = {
"model": "anthropic/claude-3.5-sonnet",
"messages": [
{"role": "user", "content": user_message}
],
"stream": True
}
response = requests.post(url, headers=headers, json=data, stream=True)
full_content = ""
    # decode_unicode=True yields str lines, so startswith("data: ") matches
    for line in response.iter_lines(decode_unicode=True):
if line.startswith("data: "):
data_str = line[6:]
if data_str == "[DONE]":
break
parsed = json.loads(data_str)
            # "choices" can be empty in the final usage-only chunk
            choices = parsed.get("choices") or [{}]
            content = choices[0].get("delta", {}).get("content")
if content:
full_content += content
print(content, end="", flush=True)
if "usage" in parsed:
print("\nUsage:", parsed["usage"])
print("\nComplete:", full_content)
return full_content
# Usage
streaming_chat("Tell me a short story")
```
### Tool Calling
```python
import requests
import json
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
}
]
def execute_tool(name, args):
print(f"Executing tool: {name}", args)
if name == "get_weather":
return {"location": args["location"], "temperature": 22, "conditions": "Sunny"}
raise Exception(f"Unknown tool: {name}")
def run_agent(user_message, max_iterations=5):
messages = [{"role": "user", "content": user_message}]
for iteration in range(max_iterations):
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
},
json={
"model": "anthropic/claude-3.5-sonnet",
"messages": messages,
"tools": tools,
"tool_choice": "auto",
"parallel_tool_calls": True
}
)
data = response.json()
assistant_message = data["choices"][0]["message"]
messages.append(assistant_message)
if "tool_calls" not in assistant_message:
print("Final answer:", assistant_message["content"])
return assistant_message["content"]
print(f"Iteration {iteration + 1}: {len(assistant_message['tool_calls'])} tools called")
# Execute tools
for tool_call in assistant_message["tool_calls"]:
name = tool_call["function"]["name"]
args = json.loads(tool_call["function"]["arguments"])
result = execute_tool(name, args)
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"content": json.dumps(result)
})
raise Exception("Max iterations exceeded")
# Usage
run_agent("What is the weather in Tokyo?")
```
### Structured Output
```python
import requests
import json
def get_structured_weather(location):
schema = {
"type": "object",
"properties": {
"location": {"type": "string"},
"temperature": {"type": "number"},
"conditions": {"type": "string"},
"humidity": {"type": "number"}
},
"required": ["location", "temperature", "conditions", "humidity"],
"additionalProperties": False
}
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
},
json={
"model": "anthropic/claude-3.5-sonnet",
"messages": [{
"role": "user",
"content": f"What is the weather in {location}? Respond with JSON."
}],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "weather",
"strict": True,
"schema": schema
}
},
"plugins": [{
"id": "response-healing"
}]
}
)
data = response.json()
weather_data = json.loads(data["choices"][0]["message"]["content"])
print("Weather data:", weather_data)
return weather_data
# Usage
get_structured_weather("San Francisco")
```
### OpenAI SDK (Python)
```python
from openai import OpenAI
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key=api_key
)
def chat_with_openai_sdk(message):
completion = client.chat.completions.create(
model="anthropic/claude-3.5-sonnet",
messages=[{"role": "user", "content": message}],
temperature=0.7,
max_tokens=500
)
print(completion.choices[0].message.content)
return completion.choices[0].message.content
# Usage
chat_with_openai_sdk("What is the meaning of life?")
```
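### Retry with Exponential Backoff
A Python counterpart to the TypeScript retry example above, sketched with `requests`. It retries only rate limits (429) and server errors (5xx), with the same 1s/2s/4s backoff capped at 10s plus up to 1s of jitter:

```python
import random
import time

import requests

api_key = "your-openrouter-api-key"

def backoff_delay(attempt, base=1.0, cap=10.0):
    # Exponential backoff (1s, 2s, 4s, ... capped at `cap`) plus up to 1s of jitter
    return min(base * (2 ** attempt), cap) + random.random()

def request_with_retry(body, max_retries=3):
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    for attempt in range(max_retries):
        try:
            response = requests.post(url, headers=headers, json=body, timeout=60)
        except requests.RequestException:
            # Network error: retry unless this was the last attempt
            if attempt == max_retries - 1:
                raise
            time.sleep(backoff_delay(attempt))
            continue
        if response.ok:
            return response.json()
        # Retry only rate limits (429) and server errors (5xx)
        if response.status_code == 429 or response.status_code >= 500:
            if attempt < max_retries - 1:
                time.sleep(backoff_delay(attempt))
                continue
        # Client error or retries exhausted: raise with the HTTP status
        response.raise_for_status()
    raise RuntimeError("Max retries exceeded")
```

Usage mirrors the TypeScript version: pass the same request body you would POST directly, e.g. `request_with_retry({"model": "anthropic/claude-3.5-sonnet", "messages": [{"role": "user", "content": "Hello!"}]})`.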
---
## cURL Examples
### Basic Request
```bash
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "anthropic/claude-3.5-sonnet",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
```
### Streaming
```bash
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "anthropic/claude-3.5-sonnet",
"messages": [
{"role": "user", "content": "Tell me a story"}
],
"stream": true
}'
```
### With Tools
```bash
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "anthropic/claude-3.5-sonnet",
"messages": [
{"role": "user", "content": "What'\''s the weather?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
}
],
"tool_choice": "auto"
}'
```
### With Web Search
```bash
curl https://openrouter.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-d '{
"model": "anthropic/claude-3.5-sonnet:online",
"messages": [
{"role": "user", "content": "What'\''s happening in AI today?"}
]
}'
```
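The `:online` model suffix above is shorthand for OpenRouter's web plugin. A sketch of the equivalent explicit request body in Python (the `max_results` option is assumed here; check the current plugin docs before relying on it):

```python
# Explicit form of the ":online" shorthand: attach the web plugin directly.
body = {
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
        {"role": "user", "content": "What's happening in AI today?"}
    ],
    # "max_results" caps how many search results are injected (assumed option)
    "plugins": [{"id": "web", "max_results": 5}]
}
# POST this body exactly like the other chat completion examples above.
```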
---
## Quick Reference
### Common Patterns
| Pattern | Key Parameters | When to Use |
|---------|----------------|--------------|
| Basic chat | model, messages, temperature, max_tokens | Simple requests |
| Streaming | stream: true | Real-time responses |
| Tool calling | tools, tool_choice, parallel_tool_calls | External functions |
| Structured output | response_format: { type: 'json_schema' } | API responses, data extraction |
| Web search | model: 'xxx:online' or plugins: [{ id: 'web' }] | Current information |
| Image input | content: [{ type: 'image_url', ... }] | Vision tasks |
| Model fallbacks | models: [...] | Reliability |
| Retry logic | Exponential backoff | Error handling |
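The image-input row above can be sketched as a Python request body. The URL is a hypothetical placeholder; the `image_url` part also accepts a base64 data URL:

```python
# Multimodal user message: one text part plus one image_url part.
# The URL below is a hypothetical placeholder; a base64 data URL
# ("data:image/png;base64,<...>") also works.
body = {
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"}
                }
            ]
        }
    ]
}
```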
---
**Sources**:
- https://openrouter.ai/docs/quickstart
- https://openrouter.ai/docs/api/reference/overview.mdx