Paid Models Integration
Expert guidance for integrating paid AI model APIs including Claude (Anthropic), OpenAI, Google Gemini, Groq, and others with Python applications
Packaged view
This page reorganizes the original catalog entry to put fit, installation, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install majiayu000-claude-skill-registry-data-paid-models-integration
Repository
Skill path: testing/paid-models-integration
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI, Integration.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: majiayu000.
This is a mirrored public skill entry; review the repository before installing it into production workflows.
What it helps with
- Install Paid Models Integration into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/majiayu000/claude-skill-registry-data before adding Paid Models Integration to shared team environments
- Use Paid Models Integration for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: Paid Models Integration
description: Expert guidance for integrating paid AI model APIs including Claude (Anthropic), OpenAI, Google Gemini, Groq, and others with Python applications
version: 1.0.0
---
# Paid AI Models Integration
Complete guide for integrating commercial AI model APIs with Python applications.
## Overview
This skill covers integration with major paid AI model providers:
- **Anthropic Claude** - Advanced reasoning and coding
- **OpenAI** - GPT-4o, GPT-4, GPT-3.5
- **Google Gemini** - Multimodal capabilities
- **Groq** - Fast inference (paid tier)
- **Amazon Bedrock** - AWS-hosted models
- **Azure OpenAI** - Enterprise OpenAI
## Anthropic Claude Integration
### Setup
```bash
# Install SDK
pip install anthropic
# Set API key
export ANTHROPIC_API_KEY='your-api-key'
```
### Basic Usage
```python
import anthropic
client = anthropic.Anthropic(
    api_key="your-api-key"  # optional; the SDK falls back to ANTHROPIC_API_KEY
)
# Simple message
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
messages=[
{"role": "user", "content": "Explain quantum computing"}
]
)
print(response.content[0].text)
```
### Advanced Features
```python
from anthropic import Anthropic
client = Anthropic()
# Streaming response
with client.messages.stream(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
messages=[{"role": "user", "content": "Write a story"}]
) as stream:
for text in stream.text_stream:
print(text, end="", flush=True)
# With system prompt
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=2048,
system="You are a Python expert. Provide clear, well-documented code.",
messages=[
{"role": "user", "content": "Write a binary search function"}
]
)
```
### Tool Use (Function Calling)
```python
import anthropic
client = anthropic.Anthropic()
# Define tools
tools = [
{
"name": "get_weather",
"description": "Get weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
]
# Request with tools
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
tools=tools,
messages=[
{"role": "user", "content": "What's the weather in Paris?"}
]
)
# Process tool calls
for content in response.content:
if content.type == "tool_use":
print(f"Tool: {content.name}")
print(f"Input: {content.input}")
        # Execute the tool yourself; get_weather stands in for your own function
        tool_result = get_weather(content.input["location"])
# Continue conversation with tool result
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
tools=tools,
messages=[
{"role": "user", "content": "What's the weather in Paris?"},
{"role": "assistant", "content": response.content},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": content.id,
"content": tool_result
}
]
}
]
)
```
### Vision (Image Analysis)
```python
import base64
import anthropic
client = anthropic.Anthropic()
# Read image
with open("image.jpg", "rb") as f:
image_data = base64.b64encode(f.read()).decode("utf-8")
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
messages=[
{
"role": "user",
"content": [
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": image_data
}
},
{
"type": "text",
"text": "Describe this image"
}
]
}
]
)
print(response.content[0].text)
```
### Prompt Caching (Cost Optimization)
```python
# Enable prompt caching for repeated content
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
system=[
{
"type": "text",
"text": "You are an expert in Python programming...",
"cache_control": {"type": "ephemeral"}
}
],
messages=[
{"role": "user", "content": "Explain decorators"}
]
)
# Check cache usage
print(f"Cache read tokens: {response.usage.cache_read_input_tokens}")
print(f"Cache creation tokens: {response.usage.cache_creation_input_tokens}")
```
## OpenAI Integration
### Setup
```bash
pip install openai
export OPENAI_API_KEY='your-api-key'
```
### Basic Usage
```python
from openai import OpenAI
client = OpenAI(api_key="your-api-key")
# Chat completion
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Explain async programming"}
],
temperature=0.7,
max_tokens=1000
)
print(response.choices[0].message.content)
```
### Streaming
```python
stream = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Write a poem"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
```
### Function Calling
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current stock price",
            "parameters": {
                "type": "object",
                "properties": {
                    "symbol": {
                        "type": "string",
                        "description": "Stock symbol (e.g., AAPL)"
                    }
                },
                "required": ["symbol"]
            }
        }
    }
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the price of Apple stock?"}],
    tools=tools,
    tool_choice="auto"  # the legacy functions/function_call parameters are deprecated
)
# Process the tool call
if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```
### Vision
```python
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/image.jpg"
}
}
]
}
]
)
```
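For local files, pass the image as a base64 data URL instead of a public link:
```python
import base64

with open("image.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"}
                }
            ]
        }
    ]
)
```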
### Embeddings
```python
response = client.embeddings.create(
model="text-embedding-3-large",
input="Your text here"
)
embedding = response.data[0].embedding
print(f"Embedding dimensions: {len(embedding)}")
```
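Embeddings are usually compared with cosine similarity; a minimal sketch (the sample texts are illustrative):
```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

resp = client.embeddings.create(
    model="text-embedding-3-large",
    input=["How do I reset my password?", "Password reset instructions"]
)
print(f"Similarity: {cosine_similarity(resp.data[0].embedding, resp.data[1].embedding):.3f}")
```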
### Assistants API
```python
# Create assistant
assistant = client.beta.assistants.create(
name="Math Tutor",
instructions="You are a math tutor. Help with math problems.",
model="gpt-4o",
tools=[{"type": "code_interpreter"}]
)
# Create thread
thread = client.beta.threads.create()
# Add message
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="Solve: 3x + 11 = 14"
)
# Run assistant
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id
)
# Wait for completion (also stop on terminal states such as "failed")
import time
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )
# Get messages
messages = client.beta.threads.messages.list(thread_id=thread.id)
for msg in messages.data:
print(f"{msg.role}: {msg.content[0].text.value}")
```
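Newer openai SDK releases also ship a polling helper that replaces the manual loop above (a sketch, assuming a sufficiently recent SDK):
```python
# Create the run and block until it reaches a terminal state
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)
print(run.status)  # "completed", "failed", "expired", ...
```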
## Google Gemini Integration
### Setup
```bash
pip install google-generativeai
export GOOGLE_API_KEY='your-api-key'
```
### Basic Usage
```python
import google.generativeai as genai
genai.configure(api_key="your-api-key")
# Create model
model = genai.GenerativeModel('gemini-2.0-flash-exp')
# Generate content
response = model.generate_content("Explain machine learning")
print(response.text)
```
### Streaming
```python
response = model.generate_content(
"Write a long story",
stream=True
)
for chunk in response:
print(chunk.text, end="", flush=True)
```
### Multimodal (Vision + Text)
```python
import PIL.Image
# Load image
image = PIL.Image.open('photo.jpg')
# Analyze image
model = genai.GenerativeModel('gemini-2.0-flash-exp')
response = model.generate_content([
"What's in this image?",
image
])
print(response.text)
```
### Function Calling
```python
def get_exchange_rate(currency_from: str, currency_to: str) -> dict:
"""Get exchange rate between currencies"""
return {"rate": 1.12, "from": currency_from, "to": currency_to}
model = genai.GenerativeModel(
'gemini-2.0-flash-exp',
tools=[get_exchange_rate]
)
response = model.generate_content(
"What's the exchange rate from USD to EUR?"
)
# Check for function calls
for part in response.parts:
if fn := part.function_call:
print(f"Calling: {fn.name}")
print(f"Args: {fn.args}")
```
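The SDK can also run declared Python functions for you inside a chat session, feeding results back automatically (assuming a `google-generativeai` release recent enough to support `enable_automatic_function_calling`):
```python
chat = model.start_chat(enable_automatic_function_calling=True)
response = chat.send_message("What's the exchange rate from USD to EUR?")
print(response.text)  # the SDK calls get_exchange_rate and returns the final answer
```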
## Groq Integration
### Setup
```bash
pip install groq
export GROQ_API_KEY='your-api-key'
```
### Basic Usage
```python
from groq import Groq
client = Groq(api_key="your-api-key")
# Fast inference
response = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Explain neural networks"}
],
temperature=0.7,
max_tokens=1000
)
print(response.choices[0].message.content)
```
### Streaming
```python
stream = client.chat.completions.create(
model="llama-3.3-70b-versatile",
messages=[{"role": "user", "content": "Tell me a story"}],
stream=True
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True)
```
## Amazon Bedrock Integration
### Setup
```bash
pip install boto3
# Configure AWS credentials
aws configure
```
### Basic Usage
```python
import boto3
import json
bedrock = boto3.client(
service_name='bedrock-runtime',
region_name='us-east-1'
)
# Claude on Bedrock
body = json.dumps({
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Explain cloud computing"}
]
})
response = bedrock.invoke_model(
modelId='anthropic.claude-3-sonnet-20240229-v1:0',
body=body
)
response_body = json.loads(response['body'].read())
print(response_body['content'][0]['text'])
```
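Recent boto3 versions also expose the provider-agnostic Converse API, which avoids hand-building provider-specific JSON bodies (a sketch, assuming your boto3 is new enough to include `converse`):
```python
response = bedrock.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[
        {"role": "user", "content": [{"text": "Explain cloud computing"}]}
    ],
    inferenceConfig={"maxTokens": 1024}
)
print(response["output"]["message"]["content"][0]["text"])
```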
## Azure OpenAI Integration
### Setup
```bash
pip install openai
export AZURE_OPENAI_API_KEY='your-api-key'
export AZURE_OPENAI_ENDPOINT='https://your-resource.openai.azure.com/'
```
### Basic Usage
```python
from openai import AzureOpenAI
client = AzureOpenAI(
api_key="your-api-key",
api_version="2024-02-15-preview",
azure_endpoint="https://your-resource.openai.azure.com/"
)
response = client.chat.completions.create(
model="gpt-4o", # Your deployment name
messages=[
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
```
## Unified Interface with Agno
### Multi-Provider Support
```python
from agno.agent import Agent
from agno.models.anthropic import Claude
from agno.models.openai import OpenAIChat
from agno.models.gemini import Gemini
from agno.models.groq import Groq
# Create agents with different providers
claude_agent = Agent(
model=Claude(id="claude-3-7-sonnet-latest"),
description="Claude-powered agent"
)
openai_agent = Agent(
model=OpenAIChat(id="gpt-4o"),
description="OpenAI-powered agent"
)
gemini_agent = Agent(
model=Gemini(id="gemini-2.0-flash-exp"),
description="Gemini-powered agent"
)
groq_agent = Agent(
model=Groq(id="llama-3.3-70b-versatile"),
description="Groq-powered agent"
)
# Use the same interface
for agent in [claude_agent, openai_agent, gemini_agent, groq_agent]:
response = agent.run("Explain AI")
print(f"{agent.description}: {response.content[:100]}...")
```
## Production Patterns
### Retry Logic with Exponential Backoff
```python
import time
from functools import wraps
from anthropic import Anthropic, RateLimitError, APIError
def retry_with_backoff(max_retries=3, initial_delay=1):
"""Retry decorator with exponential backoff"""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
delay = initial_delay
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
                except (RateLimitError, APIError) as e:
                    if attempt == max_retries - 1:
                        raise
                    print(f"{type(e).__name__}, retrying in {delay}s...")
                    time.sleep(delay)
                    delay *= 2
return wrapper
return decorator
@retry_with_backoff(max_retries=3)
def call_claude(prompt: str):
client = Anthropic()
response = client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
return response.content[0].text
```
### Rate Limiting
```python
import asyncio
from asyncio import Semaphore
from anthropic import AsyncAnthropic
class RateLimitedClient:
"""Rate-limited API client"""
def __init__(self, max_concurrent=5, requests_per_minute=60):
self.client = AsyncAnthropic()
self.semaphore = Semaphore(max_concurrent)
self.requests_per_minute = requests_per_minute
self.request_times = []
async def call(self, prompt: str):
"""Make rate-limited API call"""
async with self.semaphore:
# Wait if we've hit rate limit
await self._wait_if_needed()
response = await self.client.messages.create(
model="claude-3-7-sonnet-20250219",
max_tokens=1024,
messages=[{"role": "user", "content": prompt}]
)
self.request_times.append(asyncio.get_event_loop().time())
return response.content[0].text
async def _wait_if_needed(self):
"""Wait if rate limit would be exceeded"""
now = asyncio.get_event_loop().time()
minute_ago = now - 60
# Remove old requests
self.request_times = [t for t in self.request_times if t > minute_ago]
# Wait if needed
if len(self.request_times) >= self.requests_per_minute:
sleep_time = 60 - (now - self.request_times[0])
if sleep_time > 0:
await asyncio.sleep(sleep_time)
```
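A usage sketch, continuing from the class above (prompts are illustrative):
```python
async def main():
    client = RateLimitedClient(max_concurrent=5, requests_per_minute=60)
    prompts = ["Summarize TCP", "Summarize UDP", "Summarize QUIC"]
    # Fire all calls concurrently; the semaphore and window cap throughput
    results = await asyncio.gather(*(client.call(p) for p in prompts))
    for prompt, text in zip(prompts, results):
        print(f"{prompt}: {text[:80]}...")

asyncio.run(main())
```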
### Cost Tracking
```python
from dataclasses import dataclass
from typing import Dict
@dataclass
class TokenUsage:
"""Track token usage"""
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_creation_tokens: int = 0
class CostTracker:
"""Track API costs"""
# Pricing per 1M tokens (adjust based on current pricing)
PRICING = {
"claude-3-7-sonnet-20250219": {
"input": 3.00,
"output": 15.00,
"cache_write": 3.75,
"cache_read": 0.30
},
"gpt-4o": {
"input": 2.50,
"output": 10.00
}
}
def __init__(self):
self.usage: Dict[str, TokenUsage] = {}
def track(self, model: str, response):
"""Track usage from response"""
if model not in self.usage:
self.usage[model] = TokenUsage()
usage = self.usage[model]
        # Handle different response formats
        if hasattr(response, 'usage') and hasattr(response.usage, 'input_tokens'):
            # Anthropic format
            usage.input_tokens += response.usage.input_tokens
            usage.output_tokens += response.usage.output_tokens
            usage.cache_read_tokens += getattr(response.usage, 'cache_read_input_tokens', 0) or 0
            usage.cache_creation_tokens += getattr(response.usage, 'cache_creation_input_tokens', 0) or 0
        elif hasattr(response, 'usage') and hasattr(response.usage, 'prompt_tokens'):
            # OpenAI format
            usage.input_tokens += response.usage.prompt_tokens
            usage.output_tokens += response.usage.completion_tokens
def calculate_cost(self, model: str) -> float:
"""Calculate total cost for model"""
if model not in self.usage or model not in self.PRICING:
return 0.0
usage = self.usage[model]
pricing = self.PRICING[model]
cost = 0.0
cost += (usage.input_tokens / 1_000_000) * pricing["input"]
cost += (usage.output_tokens / 1_000_000) * pricing["output"]
if "cache_write" in pricing:
cost += (usage.cache_creation_tokens / 1_000_000) * pricing["cache_write"]
if "cache_read" in pricing:
cost += (usage.cache_read_tokens / 1_000_000) * pricing["cache_read"]
return cost
def report(self):
"""Generate cost report"""
total = 0.0
print("\n=== API Cost Report ===")
for model, usage in self.usage.items():
cost = self.calculate_cost(model)
total += cost
print(f"\n{model}:")
print(f" Input tokens: {usage.input_tokens:,}")
print(f" Output tokens: {usage.output_tokens:,}")
if usage.cache_read_tokens:
print(f" Cache read: {usage.cache_read_tokens:,}")
if usage.cache_creation_tokens:
print(f" Cache write: {usage.cache_creation_tokens:,}")
print(f" Cost: ${cost:.4f}")
print(f"\nTotal cost: ${total:.4f}")
return total
```
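Typical usage wires the tracker into each call site (a sketch reusing the Anthropic client pattern from earlier):
```python
from anthropic import Anthropic

tracker = CostTracker()
client = Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain generators"}]
)
tracker.track("claude-3-7-sonnet-20250219", response)
tracker.report()
```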
### Environment Configuration
```python
from anthropic import Anthropic
from pydantic_settings import BaseSettings, SettingsConfigDict
from typing import Optional
class AIConfig(BaseSettings):
"""AI API configuration"""
# API Keys
anthropic_api_key: Optional[str] = None
openai_api_key: Optional[str] = None
google_api_key: Optional[str] = None
groq_api_key: Optional[str] = None
# Default models
default_model: str = "claude-3-7-sonnet-20250219"
# Limits
max_tokens: int = 4096
temperature: float = 0.7
timeout: int = 30
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)
# Usage
config = AIConfig()
client = Anthropic(api_key=config.anthropic_api_key)
```
## Best Practices
1. **Use environment variables for API keys** - Never hardcode
2. **Implement retry logic** - Handle transient failures
3. **Track token usage and costs** - Monitor spending
4. **Use streaming for long responses** - Better UX
5. **Cache prompts when possible** - Reduce costs
6. **Set reasonable timeouts** - Prevent hanging requests (see the sketch after this list)
7. **Handle rate limits gracefully** - Exponential backoff
8. **Validate inputs** - Use Pydantic
9. **Log API calls** - Debug and audit
10. **Use prompt caching** - Save on repeated content
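As a concrete illustration of items 6 and 9, here is a minimal sketch of a logged, timeout-bounded Claude call (the logger name and prompt handling are illustrative; `timeout` is an `Anthropic()` constructor argument in seconds):
```python
import logging
from anthropic import Anthropic

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ai_client")

client = Anthropic(timeout=30.0)  # bound each request to 30 seconds

def logged_call(prompt: str) -> str:
    """Claude call with request/response logging for debugging and audit."""
    logger.info("Claude request: %d chars", len(prompt))
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    logger.info(
        "Claude response: %d input / %d output tokens",
        response.usage.input_tokens,
        response.usage.output_tokens,
    )
    return response.content[0].text
```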
## Model Selection Guide
### When to Use Claude
- Complex reasoning tasks
- Code generation and review
- Long context understanding (200K tokens)
- Safety-critical applications
- Document analysis
### When to Use GPT-4o
- General purpose tasks
- Function calling
- Vision tasks
- Fast responses for a flagship model (smaller variants like gpt-4o-mini are cheaper for simple tasks)
- Good balance of cost/performance
### When to Use Gemini
- Multimodal tasks (text + image + video)
- Cost-effective for simple tasks
- Long context (2M tokens with Gemini Pro)
- Fast inference
### When to Use Groq
- Ultra-fast inference needed
- Real-time applications
- Cost-effective for simple tasks
- Open source models
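One lightweight way to act on these guidelines is a routing table; the task categories and model mapping below are illustrative assumptions, not a definitive assignment:
```python
# Map task categories to the model families suggested above
MODEL_FOR_TASK = {
    "code_review": "claude-3-7-sonnet-20250219",   # complex reasoning, code
    "general": "gpt-4o",                           # balanced cost/performance
    "image_analysis": "gemini-2.0-flash-exp",      # multimodal
    "realtime_chat": "llama-3.3-70b-versatile",    # ultra-fast inference via Groq
}

def pick_model(task: str) -> str:
    """Return a model ID for a task category, defaulting to a general model."""
    return MODEL_FOR_TASK.get(task, "gpt-4o")
```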
## Resources
- Anthropic Docs: https://docs.anthropic.com/
- OpenAI Docs: https://platform.openai.com/docs
- Google AI Docs: https://ai.google.dev/docs
- Groq Docs: https://console.groq.com/docs
- AWS Bedrock: https://aws.amazon.com/bedrock/
- Azure OpenAI: https://azure.microsoft.com/en-us/products/ai-services/openai-service