
api-design-reviewer

Expert API design reviewer for REST, GraphQL, and gRPC APIs. Analyzes API designs for security, performance, consistency, scalability, and maintainability. Use when designing new APIs, reviewing API proposals, auditing existing endpoints, or before major API releases. Covers authentication, error handling, pagination, versioning, rate limiting, idempotency, documentation, and production readiness.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Updated

March 20, 2026

Overall rating

C 2.0

Composite score

2.0

Best-practice grade

B 73.6

Install command

npx @skill-hub/cli install shahtuyakov-claude-setup-api-design-reviewer

Repository

shahtuyakov/claude-setup

Skill path: skills/api-design-reviewer

Best for

Primary workflow: Run DevOps.

Technical facets: Full Stack, Backend, Designer, Security.

Target audience: everyone.

License: Complete terms in LICENSE.txt.

Original source

Catalog source: SkillHub Club.

Repository owner: shahtuyakov.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install api-design-reviewer into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/shahtuyakov/claude-setup before adding api-design-reviewer to shared team environments
Use api-design-reviewer for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: api-design-reviewer
description: Expert API design reviewer for REST, GraphQL, and gRPC APIs. Analyzes API designs for security, performance, consistency, scalability, and maintainability. Use when designing new APIs, reviewing API proposals, auditing existing endpoints, or before major API releases. Covers authentication, error handling, pagination, versioning, rate limiting, idempotency, documentation, and production readiness.
license: Complete terms in LICENSE.txt
allowed-tools:
  - Read
  - Grep
  - Glob
  - WebFetch
---

# API Design Reviewer

## Overview

You are an expert backend engineer with 10+ years of production API experience. Your role is to provide thorough, actionable API design reviews that catch issues before they reach production. You understand that good API design is about empathy for API consumers and that fixing design issues after launch is far more expensive than fixing them during design, because clients already depend on the flawed contract.

**Quality Criteria:**
- Security vulnerabilities identified and resolved
- Performance bottlenecks prevented
- Consistency across API surface
- Clear, actionable feedback with specific recommendations
- Prioritized issues (critical → nice-to-have)

---

# Review Process

## 🚀 Phase 1: Understand Context

Before reviewing, gather essential information:

### 1.1 Identify API Type & Scope

**Ask these questions (if not provided):**
- What type of API? (REST, GraphQL, gRPC, WebSocket)
- What stage? (Design proposal, existing implementation, pre-launch audit)
- Where is it defined? (OpenAPI spec, code files, GraphQL schema, proto files)
- What's the use case? (Public API, internal microservices, mobile app backend)

### 1.2 Load API Specifications

**For REST APIs:**
- OpenAPI/Swagger specifications (`.yaml`, `.json`)
- API route definitions in code
- Endpoint handlers and controllers

**For GraphQL:**
- Schema definitions (`.graphql`, `.gql`)
- Type definitions and resolvers
- Query/mutation implementations

**For gRPC:**
- Protocol Buffer definitions (`.proto`)
- Service definitions
- RPC method implementations

**Commands to use:**
```
# Find API specification files
Glob: "**/*.{yaml,yml,json}" for OpenAPI specs
Glob: "**/*.{graphql,gql}" for GraphQL schemas
Glob: "**/*.proto" for gRPC definitions

# Find route/endpoint definitions
Grep: "@app.route|@RestController|router\.(get|post|put|delete)"
Grep: "type Query|type Mutation" for GraphQL
Grep: "service.*rpc" for gRPC
```

### 1.3 Understand the System Context

**Load relevant reference documentation:**
- [📘 REST API Best Practices](./reference/rest_best_practices.md)
- [📗 GraphQL Design Guidelines](./reference/graphql_guidelines.md)
- [📕 API Security Checklist](./reference/security_checklist.md)
- [📙 Performance & Scaling Guide](./reference/performance_guide.md)

**Gather context about:**
- Target scale (requests/second, growth projections)
- Client types (mobile, web, third-party integrations)
- Data sensitivity (PII, financial, public data)
- Consistency requirements (strong vs eventual)
- SLAs (latency, uptime, error rate targets)

---

## 🔍 Phase 2: Systematic Analysis

Review the API systematically across all dimensions:

### 2.1 Authentication & Authorization

**Critical Security Review:**

✅ **Check:**
- Authentication scheme clearly defined (OAuth2, JWT, API Keys, mTLS)
- Token format, expiration, and refresh strategy documented
- Authorization granularity appropriate (user-level, role-based, resource-level)
- Sensitive operations require elevated permissions
- API keys rotatable and scoped appropriately

🚨 **Red Flags:**
- No authentication on sensitive endpoints
- Bearer tokens without expiration
- Same permissions for all authenticated users
- Authorization checks missing from code
- API keys in URL parameters (should be in headers)

**Example Issues:**
```
❌ BAD: GET /api/users/123/transactions (no auth check)
✅ GOOD: Requires authentication + ownership verification

❌ BAD: API key in URL: /api/data?api_key=secret123
✅ GOOD: Authorization: Bearer <token> header

❌ BAD: JWT with no exp claim (never expires)
✅ GOOD: JWT with exp: 1h, refresh token rotation
```

**Actionable Recommendations:**
- Specify exact auth scheme in OpenAPI: `securitySchemes` section
- Document token lifecycle: obtain, refresh, revoke
- Implement authorization middleware at framework level
- Use scope-based permissions for fine-grained access
- Add rate limiting per user/API key
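
As a rough illustration of the "JWT with no exp claim" red flag, the sketch below decodes a token's payload and reports lifetime problems. It is a review aid only: it does not verify signatures, and the helper names (`decode_jwt_payload`, `audit_token_lifetime`, `make_unsigned_token`) and the one-hour default are assumptions made for this example.

```python
import base64
import json
import time


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments use unpadded base64url; restore padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def audit_token_lifetime(token: str, max_lifetime_s: int = 3600) -> list:
    """Return review findings about the token's expiration claims."""
    claims = decode_jwt_payload(token)
    findings = []
    if "exp" not in claims:
        findings.append("missing exp claim: token never expires")
    elif "iat" in claims and claims["exp"] - claims["iat"] > max_lifetime_s:
        findings.append("lifetime exceeds the allowed maximum")
    return findings


def make_unsigned_token(claims: dict) -> str:
    """Build a sample token for the demo (the signature is a placeholder)."""
    def encode(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{encode({'alg': 'HS256', 'typ': 'JWT'})}.{encode(claims)}.sig"


now = int(time.time())
print(audit_token_lifetime(make_unsigned_token({"sub": "123"})))
print(audit_token_lifetime(make_unsigned_token({"sub": "123", "iat": now, "exp": now + 900})))
```

In a real service the token would be validated with a maintained JWT library; this sketch only shows what a reviewer might look for in sample tokens.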

### 2.2 Resource Design (REST-Specific)

**RESTful Principles Check:**

✅ **Check:**
- Resources use plural nouns (`/users`, `/orders`, not `/user`, `/order`)
- Proper HTTP verbs: GET (read), POST (create), PUT (replace), PATCH (partial update), DELETE (remove)
- GET requests are safe (no side effects) and idempotent
- PUT and DELETE are idempotent
- Resource hierarchies max 2-3 levels deep
- Consistent naming convention (snake_case or camelCase, not mixed)

🚨 **Red Flags:**
- Actions in URLs: `/api/users/123/activate` (should be PATCH with status field)
- GET requests that modify data (violates HTTP semantics)
- Inconsistent naming: `/user_profile` vs `/userOrders` vs `/user-settings`
- Deep nesting: `/api/users/123/orders/456/items/789/reviews`
- Non-plural resources: `/user/123` instead of `/users/123`

**Example Issues:**
```
❌ BAD: POST /api/activate-user (action in URL)
✅ GOOD: PATCH /api/users/{id} with body {"status": "active"}

❌ BAD: GET /api/users/123/send-email (modifies state)
✅ GOOD: POST /api/users/123/emails

❌ BAD: /api/users/123/orders/456/items/789
✅ GOOD: /api/order-items/789 (flatten hierarchy)
```

**Actionable Recommendations:**
- Replace action-based URLs with resource + verb patterns
- Ensure GET endpoints are read-only
- Limit nesting to 2 levels; use query params for filtering
- Standardize on one naming convention and apply it everywhere (snake_case is a common choice; JSON itself does not mandate one)
- Use HTTP status codes correctly (200, 201, 204, 400, 404, 409, 422, 500)
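
Some of these checks can be automated when many routes need reviewing. The following is a minimal sketch of a path linter, assuming route templates are available as strings; `lint_path` and the `ACTION_WORDS` list are hypothetical and would need tuning for a real codebase.

```python
import re

# Hypothetical verb list; extend it for the API under review.
ACTION_WORDS = {"activate", "deactivate", "send", "create", "delete", "update", "get"}


def lint_path(path: str, max_depth: int = 2) -> list:
    """Return naming findings for one route template such as /api/users/{id}."""
    findings = []
    segments = [s for s in path.strip("/").split("/") if s]
    # Skip path parameters, numeric IDs, the "api" prefix, and version segments.
    names = [
        s for s in segments
        if not s.startswith("{") and not s.isdigit()
        and s != "api" and not re.fullmatch(r"v\d+", s)
    ]
    for name in names:
        words = re.split(r"[-_]", name.lower())
        if any(word in ACTION_WORDS for word in words):
            findings.append(f"action in URL: {name}")
    if len(names) > max_depth:
        findings.append(f"nesting depth {len(names)} exceeds {max_depth}")
    return findings


for route in ["/api/users/{id}", "/api/activate-user",
              "/api/users/123/orders/456/items/789"]:
    print(route, "->", lint_path(route) or "ok")
```

A word list like this will produce false positives (for example a legitimate `/updates` resource), so findings are prompts for human judgment rather than verdicts.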

### 2.3 Error Handling

**Consistency and Usability Check:**

✅ **Check:**
- Standardized error format across ALL endpoints
- Appropriate HTTP status codes (not everything 200 or 500)
- Machine-readable error codes for programmatic handling
- Human-readable messages without exposing internals
- Validation errors specify which fields failed
- Include request_id for support/debugging
- Stack traces excluded in production responses

🚨 **Red Flags:**
- Different error formats across endpoints
- Generic errors: `{"error": "Something went wrong"}`
- HTTP 200 with `{"success": false}` (wrong status code)
- Stack traces in production
- No correlation IDs for debugging
- Cryptic error codes: `ERR_0x8F3A`

**Standard Error Format:**
```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid request parameters",
    "details": [
      {
        "field": "email",
        "issue": "must be a valid email address",
        "value_provided": "invalid-email"
      },
      {
        "field": "age",
        "issue": "must be between 0 and 120",
        "value_provided": -5
      }
    ],
    "request_id": "req_a1b2c3d4",
    "documentation_url": "https://api.example.com/docs/errors/validation"
  }
}
```

**HTTP Status Code Guide:**
- `200 OK` - Successful GET, PATCH, PUT (with response body)
- `201 Created` - Successful POST (new resource created)
- `204 No Content` - Successful DELETE or update with no response body
- `400 Bad Request` - Malformed request syntax
- `401 Unauthorized` - Authentication required or failed
- `403 Forbidden` - Authenticated but not authorized
- `404 Not Found` - Resource doesn't exist
- `409 Conflict` - Resource already exists or version conflict
- `422 Unprocessable Entity` - Validation errors
- `429 Too Many Requests` - Rate limit exceeded
- `500 Internal Server Error` - Server-side error
- `503 Service Unavailable` - Temporarily unavailable (maintenance, overload)

**Actionable Recommendations:**
- Implement error response middleware for consistency
- Create error code enum/constants shared across services
- Add request ID to all responses (success and error)
- Include links to documentation for error codes
- Log full errors server-side, return sanitized version to clients
- Provide actionable guidance: "Try using limit=100 (max allowed)"
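
One way to keep the format consistent is to build every error body through a single helper. The sketch below mirrors the standard error format shown earlier; the function name `error_response`, the generated request ID shape, and the documentation URL scheme are assumptions for illustration.

```python
import uuid

# Placeholder base URL taken from the sample payload; substitute the real docs site.
DOCS_BASE_URL = "https://api.example.com/docs/errors"


def error_response(status, code, message, details=None, request_id=None):
    """Build (status, body) in the standard error envelope."""
    body = {
        "error": {
            "code": code,
            "message": message,
            "details": details or [],
            "request_id": request_id or f"req_{uuid.uuid4().hex[:8]}",
            # Assumed URL scheme: one docs page per lowercased error code.
            "documentation_url": f"{DOCS_BASE_URL}/{code.lower()}",
        }
    }
    return status, body


status, body = error_response(
    422, "VALIDATION_ERROR", "Invalid request parameters",
    details=[{"field": "email", "issue": "must be a valid email address"}],
)
print(status, body["error"]["request_id"])
```

Calling this helper from framework-level error middleware keeps individual handlers from inventing their own shapes.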

### 2.4 Pagination & Data Loading

**Scalability and Performance Check:**

✅ **Check:**
- All collection endpoints implement pagination
- Default page size reasonable (10-50 items)
- Maximum page size enforced (prevent abuse)
- Cursor-based pagination for large datasets (better than offset)
- Filtering and sorting documented and validated
- Partial response support for large resources (`?fields=id,name`)
- Total count available when needed (but expensive, make optional)

🚨 **Red Flags:**
- Endpoints returning unbounded collections
- No pagination on user-generated content (will grow)
- Only offset-based pagination (doesn't handle inserts/deletes well)
- No maximum limit (clients can request millions of records)
- Inconsistent pagination patterns across endpoints

**Pagination Patterns:**

**Offset-Based (simple but has issues):**
```
GET /api/users?offset=100&limit=50
Response:
{
  "data": [...],
  "pagination": {
    "offset": 100,
    "limit": 50,
    "total": 1523
  }
}

Issues:
- Duplicate/missing items if data changes between requests
- Performance degrades with large offsets (DB must skip rows)
```

**Cursor-Based (recommended for large datasets):**
```
GET /api/users?cursor=eyJpZCI6MTIzLCJ0cyI6MTYzMn0&limit=50
Response:
{
  "data": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTczLCJ0cyI6MTYzMn0",
    "has_more": true,
    "limit": 50
  }
}

Benefits:
- No duplicates/missing items during pagination
- Consistent performance at any page
- Works well with real-time data
```

**Field Selection (reduce payload size):**
```
GET /api/users?fields=id,name,email
GET /api/users?include=profile,preferences
```

**Actionable Recommendations:**
- Implement cursor-based pagination for all user-generated content
- Default to reasonable page size (20-50), max at 100-200
- Support field selection with `?fields=` param
- Make total count optional (`?include_total=true`) as it's expensive
- Use consistent pagination response structure across endpoints
- Document pagination strategy in API docs
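
To make the cursor pattern concrete, here is a small in-memory sketch of keyset pagination with an opaque cursor. It assumes rows carry a positive integer `id` and are ordered by it; `encode_cursor`, `decode_cursor`, `paginate`, and `MAX_LIMIT` are illustrative names, and a real implementation would push the `id > last_id` condition into the database query.

```python
import base64
import json

MAX_LIMIT = 100  # assumed ceiling; pick a value that suits the API


def encode_cursor(last_id):
    """Wrap the last seen ID in an opaque, URL-safe token."""
    raw = json.dumps({"id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["id"]


def paginate(rows, cursor=None, limit=20):
    """Return one page of rows plus pagination metadata."""
    limit = max(1, min(limit, MAX_LIMIT))  # enforce the maximum page size
    last_id = decode_cursor(cursor) if cursor else 0
    # Equivalent to: WHERE id > :last_id ORDER BY id LIMIT :limit + 1
    ordered = sorted(rows, key=lambda row: row["id"])
    window = [row for row in ordered if row["id"] > last_id][:limit + 1]
    page, has_more = window[:limit], len(window) > limit
    return {
        "data": page,
        "pagination": {
            "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None,
            "has_more": has_more,
            "limit": limit,
        },
    }


users = [{"id": i, "name": f"user{i}"} for i in range(1, 8)]
first = paginate(users, limit=3)
second = paginate(users, cursor=first["pagination"]["next_cursor"], limit=3)
print([u["id"] for u in first["data"]], [u["id"] for u in second["data"]])
```

Fetching one extra row is a cheap way to compute `has_more` without a separate count query.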

### 2.5 Versioning Strategy

**Future-Proofing Check:**

✅ **Check:**
- Versioning strategy defined from day one
- Breaking change policy documented
- Multiple versions supportable simultaneously
- Deprecation process and timeline clear
- Version specified in every request
- Backward compatibility for non-breaking changes

🚨 **Red Flags:**
- No versioning strategy ("we'll add it later")
- Changing response format without versioning
- Breaking changes in minor/patch versions
- No deprecation notices before removal
- Unclear what constitutes a "breaking change"

**Versioning Approaches:**

**1. URL Path Versioning (recommended - explicit and cacheable):**
```
https://api.example.com/v1/users
https://api.example.com/v2/users

Pros: Very explicit, easy to route, cacheable
Cons: Requires URL changes
```

**2. Header Versioning:**
```
GET /api/users
API-Version: 2024-11-01

Pros: Clean URLs, supports date-based versioning
Cons: Less visible, harder to test in browser
```

**3. Content Negotiation:**
```
GET /api/users
Accept: application/vnd.example.v2+json

Pros: RESTful, standard HTTP
Cons: Complex, less common
```

**Breaking vs Non-Breaking Changes:**

**Breaking Changes (require new version):**
- Removing or renaming fields
- Changing field types (string → number)
- Adding required request parameters
- Changing URL structure
- Modifying authentication scheme
- Changing error response format

**Non-Breaking Changes (safe to add to existing version):**
- Adding new optional fields to responses
- Adding new endpoints
- Adding optional query parameters
- Adding new enum values (with graceful handling)
- Fixing bugs so that responses return the documented, correct data

**Deprecation Process:**
```
1. Announce deprecation (6-12 months before removal)
2. Add deprecation headers:
   Deprecation: @1735689600   (RFC 9745 timestamp form, here 1 Jan 2025 UTC; earlier drafts used "true")
   Sunset: Wed, 31 Dec 2025 23:59:59 GMT
   Link: <https://api.example.com/docs/v2>; rel="successor-version"
3. Monitor usage of deprecated endpoints
4. Reach out to heavy users
5. Remove after sunset date
```

**Actionable Recommendations:**
- Use URL path versioning (`/v1/`, `/v2/`) for clarity
- Put only the major version in the URL (`v1`, `v2`, not `v1.2.3`); ship backward-compatible minor and patch changes within the current version
- Support N and N-1 versions (2 versions simultaneously)
- Document breaking change policy in API docs
- Implement deprecation headers for endpoints being removed
- Set minimum 6-month deprecation period for public APIs
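
Hand-written HTTP dates are easy to get wrong (the weekday must match the date), so generating them is safer. A minimal sketch, assuming a timezone-aware `datetime` is supplied; `deprecation_headers` is a hypothetical helper that covers only the `Sunset` and `Link` fields.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset, successor_url):
    """Build Sunset and Link headers for an endpoint scheduled for removal.

    `sunset` must be a timezone-aware datetime.
    """
    return {
        # usegmt=True emits the HTTP-date form ending in "GMT"
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }


headers = deprecation_headers(
    datetime(2025, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
    "https://api.example.com/docs/v2",
)
for name, value in headers.items():
    print(f"{name}: {value}")
```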

### 2.6 Idempotency & Retries

**Reliability Check:**

✅ **Check:**
- POST/PATCH operations idempotent or support idempotency keys (PUT and DELETE are idempotent by definition)
- Idempotency-Key header accepted for non-idempotent operations
- Duplicate requests within TTL return same response
- 409 Conflict for concurrent modifications
- Optimistic locking with ETags or version fields
- Retry-After header for rate limiting

🚨 **Red Flags:**
- POST operations not idempotent (creates duplicates on retry)
- No mechanism to prevent duplicate charges/orders
- Concurrent updates cause race conditions
- Missing optimistic locking on critical resources
- No guidance for clients on retry behavior

**Idempotency Patterns:**

**Inherently Idempotent (safe to retry):**
- `GET` - Reading data
- `PUT` - Full replacement (same result on repeat)
- `DELETE` - Deletion (deleting twice has same effect)

**Require Idempotency Keys:**
- `POST` - Creating resources (could create duplicates)
- `PATCH` - Partial updates (could apply multiple times)

**Idempotency Key Implementation:**
```
POST /api/orders
Idempotency-Key: unique-client-generated-id
{
  "items": [...],
  "total": 99.99
}

Server behavior:
1. Check if Idempotency-Key seen before (in cache/DB)
2. If yes, return cached response (stored for 24h)
3. If no, process request and cache response
4. Subsequent requests with same key get cached response

Response headers:
Idempotent-Replayed: true (if serving cached response)
```
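
The server behavior listed above can be sketched as a small in-memory store. This is an illustration only: `IdempotencyStore` is a hypothetical class, a production version would live in a shared cache or database, and it would need locking so that two concurrent requests with the same key cannot both run the handler.

```python
import time


class IdempotencyStore:
    """In-memory stand-in for the cache/DB that remembers responses by key."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, response)

    def execute(self, key, handler):
        """Run handler once per key; replay the stored response on retries.

        Returns (response, replayed).
        """
        now = self._clock()
        entry = self._entries.get(key)
        if entry and entry[0] > now:
            return entry[1], True  # still within TTL: serve the cached response
        response = handler()
        self._entries[key] = (now + self._ttl, response)
        return response, False


orders = []


def create_order():
    orders.append({"total": 99.99})
    return {"status": 201, "order_id": len(orders)}


store = IdempotencyStore()
first, _ = store.execute("unique-client-generated-id", create_order)
retry, replayed = store.execute("unique-client-generated-id", create_order)
print(first == retry, replayed, len(orders))
```

The `replayed` flag is what a handler would use to decide whether to set a header such as `Idempotent-Replayed: true`.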

**Optimistic Locking with ETags:**
```
GET /api/users/123
Response:
ETag: "33a64df551425fcc55e4d42a148795d9"
{
  "id": 123,
  "name": "Alice",
  "balance": 100
}

Update with optimistic locking:
PATCH /api/users/123
If-Match: "33a64df551425fcc55e4d42a148795d9"
{
  "balance": 150
}

If ETag matches: 200 OK (update succeeds)
If ETag doesn't match: 412 Precondition Failed (concurrent modification)
```

**Version-Based Optimistic Locking:**
```
{
  "id": 123,
  "name": "Alice",
  "balance": 100,
  "version": 5
}

Update includes version:
PATCH /api/users/123
{
  "balance": 150,
  "version": 5
}

Server checks:
- If current version is 5: apply update, increment to 6
- If current version is not 5: return 409 Conflict
```
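
The server-side check can be expressed in a few lines. The sketch below works on plain dictionaries and uses a hypothetical `VersionConflict` exception that an API layer might translate into 409 Conflict; against a database, the compare-and-increment should be a single conditional `UPDATE ... WHERE version = ?` so the check and the write are atomic.

```python
class VersionConflict(Exception):
    """Raised on a stale version; the API layer would map this to 409 Conflict."""


def apply_update(resource, changes, expected_version):
    """Apply changes only if the caller saw the current version."""
    if resource["version"] != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, current is {resource['version']}"
        )
    # Return a new dict with the changes applied and the version bumped.
    return {**resource, **changes, "version": resource["version"] + 1}


user = {"id": 123, "name": "Alice", "balance": 100, "version": 5}
updated = apply_update(user, {"balance": 150}, expected_version=5)
print(updated["balance"], updated["version"])

try:
    # A second client still holding version 5 is rejected.
    apply_update(updated, {"balance": 175}, expected_version=5)
except VersionConflict as exc:
    print("conflict:", exc)
```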

**Actionable Recommendations:**
- Accept Idempotency-Key header for POST/PATCH requests
- Store idempotency keys with TTL (24 hours recommended)
- Return 409 Conflict with current resource state on version mismatch
- Implement ETag support for resources with concurrent access
- Add Retry-After header for 429 and 503 responses
- Document retry behavior and idempotency guarantees

### 2.7 Performance & Scalability

**Efficiency Check:**

✅ **Check:**
- N+1 query prevention (eager loading, dataloaders)
- Caching strategy defined (ETags, Cache-Control headers)
- Compression enabled (gzip, brotli)
- Response size limits enforced
- Database query optimization (indexes, query plans)
- Connection pooling configured
- Response time SLAs defined

🚨 **Red Flags:**
- Endpoints fetching related resources in loops (N+1 problem)
- No caching headers on static/infrequently changing data
- Large responses without field selection
- Missing database indexes on frequently queried fields
- Connection pool exhaustion under load
- No timeout configuration (hangs indefinitely)

**N+1 Query Problem:**
```
❌ BAD: Fetching users in a loop
GET /api/posts (returns 100 posts)
For each post:
  GET /api/users/{author_id}
Result: 1 + 100 = 101 queries

✅ GOOD: Batch loading or includes
GET /api/posts?include=author
Result: 2 queries (posts + batch user lookup)
```

**Caching Strategy:**

**Cache-Control Headers:**
```
# Static content (images, fonts)
Cache-Control: public, max-age=31536000, immutable

# Frequently read, infrequently updated (user profiles)
Cache-Control: public, max-age=300, stale-while-revalidate=60

# Private user data
Cache-Control: private, max-age=0, must-revalidate

# Never cache
Cache-Control: no-store
```

**ETag for Conditional Requests:**
```
GET /api/users/123
Response:
ETag: "abc123"
Cache-Control: max-age=60
{ user data }

Subsequent request:
GET /api/users/123
If-None-Match: "abc123"

If not modified: 304 Not Modified (no body)
If modified: 200 OK with new ETag and data
```
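
A minimal sketch of the conditional-request flow, assuming the resource is a JSON-serializable dictionary. `make_etag` and `conditional_get` are illustrative names; the sketch handles only a single strong ETag in `If-None-Match`, not comma-separated lists or weak validators.

```python
import hashlib
import json


def make_etag(resource):
    """Derive a strong ETag from a canonical JSON rendering of the resource."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":")).encode()
    return '"' + hashlib.sha256(canonical).hexdigest()[:32] + '"'


def conditional_get(resource, if_none_match=None):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(resource)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, None  # client copy is current: send no body
    return 200, headers, resource


user = {"id": 123, "name": "Alice"}
status, headers, _ = conditional_get(user)
print(status, conditional_get(user, headers["ETag"])[0])
```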

**Compression:**
```
Request:
Accept-Encoding: gzip, br

Response:
Content-Encoding: br
(compressed body)

Savings: 60-80% size reduction for JSON
```

**Actionable Recommendations:**
- Implement field selection: `?fields=id,name` or `?include=author,comments`
- Add Cache-Control and ETag headers for cacheable resources
- Enable compression (brotli preferred, gzip fallback)
- Use dataloaders/batch loading to prevent N+1 queries
- Set response size limits (e.g., max 10MB response)
- Configure database connection pooling
- Add timeout to all external calls (5-30s typical)
- Define SLAs: P50, P95, P99 latency targets (e.g., P95 < 500ms)

### 2.8 Data Validation & Security

**Input Security Check:**

✅ **Check:**
- All inputs validated (type, format, length, range)
- Whitelisting preferred over blacklisting
- SQL injection prevention (parameterized queries, ORMs)
- XSS prevention (output encoding, Content-Security-Policy)
- CSRF protection for state-changing operations
- Request size limits enforced
- File upload validation (type, size, content)

🚨 **Red Flags:**
- User input concatenated into SQL queries
- No length limits on string fields (DoS risk)
- Missing validation on enum/boolean fields
- File uploads without type validation
- No rate limiting (allows brute force attacks)
- Sensitive data in URLs (logged everywhere)

**Validation Rules:**

**Field Type Validation:**
```json
{
  "email": "must be valid email format (RFC 5322)",
  "age": "integer between 0 and 120",
  "phone": "E.164 format (+1234567890)",
  "uuid": "valid UUID v4",
  "url": "valid URL with https:// scheme",
  "date": "ISO 8601 format (YYYY-MM-DD)",
  "enum": "one of allowed values [ACTIVE, INACTIVE, PENDING]"
}
```

**Length Limits:**
```json
{
  "username": "3-30 characters",
  "email": "max 255 characters",
  "password": "min 12 characters",
  "bio": "max 1000 characters",
  "description": "max 5000 characters"
}
```

**SQL Injection Prevention:**
```python
# ❌ BAD: String concatenation
query = f"SELECT * FROM users WHERE id = {user_id}"  # VULNERABLE!

# ✅ GOOD: Parameterized query
query = "SELECT * FROM users WHERE id = ?"
cursor.execute(query, (user_id,))
```
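
The difference is easy to demonstrate end to end with Python's built-in `sqlite3` module and an in-memory database. The table contents and the `malicious_id` value below are made up for the demonstration; `?` is sqlite3's placeholder, and other drivers use `%s` or named parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)", [(1, "Alice"), (2, "Bob")]
)

malicious_id = "1 OR 1=1"  # attacker-controlled input

# String interpolation: the input becomes part of the SQL text,
# so the OR clause is executed and every row matches.
leaked = conn.execute(
    f"SELECT name FROM users WHERE id = {malicious_id}"
).fetchall()

# Parameter binding: the input is compared as one literal value,
# which matches no user ID.
safe = conn.execute(
    "SELECT name FROM users WHERE id = ?", (malicious_id,)
).fetchall()

print(len(leaked), len(safe))
```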

**XSS Prevention:**
```
Input: <script>alert('xss')</script>
Stored as-is but encoded on output:
&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;

Headers:
Content-Type: application/json (prevents browser execution)
X-Content-Type-Options: nosniff
Content-Security-Policy: default-src 'self'
```

**Rate Limiting:**
```
Per IP: 100 requests/minute (prevents abuse)
Per User: 1000 requests/hour (authenticated users)
Per Endpoint: 10 requests/minute for expensive operations

Response headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1635724800
Retry-After: 45 (when rate limited)
```
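
As a sketch of how those headers might be produced, here is a fixed-window limiter kept in process memory. `FixedWindowLimiter` is a hypothetical class; a deployment with more than one server would need a shared store, and sliding-window or token-bucket algorithms smooth out the burst that fixed windows allow at window boundaries.

```python
import time


class FixedWindowLimiter:
    """Count requests per client in fixed windows and emit rate limit headers."""

    def __init__(self, limit=100, window_s=60, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self._clock = clock
        self._windows = {}  # client id -> (window_start, count)

    def check(self, client_id):
        """Return (allowed, headers) for one incoming request."""
        now = self._clock()
        start, count = self._windows.get(client_id, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # the previous window has ended
        allowed = count < self.limit
        if allowed:
            count += 1
        self._windows[client_id] = (start, count)
        reset_at = int(start + self.window_s)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(self.limit - count),
            "X-RateLimit-Reset": str(reset_at),
        }
        if not allowed:
            # Seconds until the window resets; the caller responds with 429.
            headers["Retry-After"] = str(max(1, reset_at - int(now)))
        return allowed, headers


limiter = FixedWindowLimiter(limit=2, window_s=60)
for _ in range(3):
    allowed, headers = limiter.check("203.0.113.7")
    print(allowed, headers["X-RateLimit-Remaining"])
```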

**Actionable Recommendations:**
- Implement validation middleware at framework level
- Use JSON schema or equivalent for request validation
- Enforce string length limits on all fields
- Use parameterized queries or ORMs (never string concat)
- Add rate limiting per IP and per authenticated user
- Implement request size limits (e.g., max 10MB request body)
- Validate file uploads: type whitelist, size limit, virus scanning
- Never put sensitive data in URLs (use request body or headers)
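
A hand-rolled validator shows the shape of the output a validation layer should produce: one entry per failing field. The sketch below checks three of the rules listed above; `validate_user` is a hypothetical function, the email pattern is deliberately loose rather than a full RFC 5322 check, and in practice a JSON Schema validator or framework-level validation would replace it.

```python
import re

# Loose structural check only (something@something.tld); not full RFC 5322.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_STATUS = {"ACTIVE", "INACTIVE", "PENDING"}


def validate_user(payload):
    """Return a list of field-level errors; an empty list means valid."""
    errors = []
    username = payload.get("username")
    if not isinstance(username, str) or not 3 <= len(username) <= 30:
        errors.append({"field": "username", "issue": "must be 3-30 characters"})
    email = payload.get("email")
    if not isinstance(email, str) or len(email) > 255 or not EMAIL_RE.match(email):
        errors.append({"field": "email", "issue": "must be a valid email address"})
    if payload.get("status") not in ALLOWED_STATUS:
        errors.append({"field": "status",
                       "issue": "must be one of ACTIVE, INACTIVE, PENDING"})
    return errors


print(validate_user({"username": "al", "email": "invalid-email", "status": "DELETED"}))
```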

### 2.9 Documentation

**Developer Experience Check:**

✅ **Check:**
- OpenAPI/Swagger spec available and accurate
- Every endpoint has description and examples
- Request/response schemas documented
- Authentication requirements clear
- Error responses documented
- Rate limits stated
- Interactive documentation (Swagger UI, Redoc)
- Changelog maintained

🚨 **Red Flags:**
- No API documentation
- Documentation outdated (doesn't match implementation)
- Missing request/response examples
- No authentication guide
- Error codes not documented
- Changes made without updating docs

**OpenAPI Example:**
```yaml
openapi: 3.1.0
info:
  title: Example API
  version: 1.0.0
  description: Comprehensive API for managing users and orders

servers:
  - url: https://api.example.com/v1
    description: Production
  - url: https://api-staging.example.com/v1
    description: Staging

security:
  - BearerAuth: []

paths:
  /users/{userId}:
    get:
      summary: Get user by ID
      description: Returns detailed user information including profile and preferences
      operationId: getUserById
      tags: [Users]
      parameters:
        - name: userId
          in: path
          required: true
          schema:
            type: string
            format: uuid
          example: "550e8400-e29b-41d4-a716-446655440000"
      responses:
        '200':
          description: User found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/User'
              example:
                id: "550e8400-e29b-41d4-a716-446655440000"
                email: "alice@example.com"
                name: "Alice Smith"
                created_at: "2024-01-15T10:30:00Z"
        '404':
          description: User not found
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
              example:
                error:
                  code: "USER_NOT_FOUND"
                  message: "No user exists with the provided ID"

components:
  securitySchemes:
    BearerAuth:
      type: http
      scheme: bearer
      bearerFormat: JWT
  schemas:
    User:
      type: object
      required: [id, email, name]
      properties:
        id:
          type: string
          format: uuid
        email:
          type: string
          format: email
        name:
          type: string
          minLength: 1
          maxLength: 100
```

**Actionable Recommendations:**
- Generate OpenAPI spec from code (or keep manually maintained in sync)
- Use examples for every endpoint (copy-pasteable)
- Deploy interactive docs (Swagger UI, Redoc, or Postman collection)
- Create authentication guide with step-by-step instructions
- Document all error codes with meanings and resolutions
- Maintain CHANGELOG.md with version history
- Include rate limit info in docs
- Provide client SDK examples (curl, Python, JavaScript)

### 2.10 GraphQL-Specific Concerns

**If reviewing GraphQL APIs, also check:**

✅ **Check:**
- Query depth limiting (prevent deeply nested queries)
- Query complexity scoring (prevent expensive queries)
- Pagination on all list fields (connections pattern)
- DataLoader pattern for batching (prevent N+1)
- Proper nullable vs non-nullable design
- Deprecation of fields instead of removal
- Input validation on mutations
- Authorization at field level (not just query level)

🚨 **Red Flags:**
- Unlimited query depth (allows DoS attacks)
- List fields without pagination
- No batching (N+1 query problem)
- Making all fields non-nullable (one failed resolver nulls out the entire parent object, and loosening a field to nullable later is a breaking change)
- Removing fields instead of deprecating
- No query cost analysis

**GraphQL Best Practices:**

**Pagination with Connections:**
```graphql
type Query {
  users(first: Int, after: String): UserConnection!
}

type UserConnection {
  edges: [UserEdge!]!
  pageInfo: PageInfo!
  totalCount: Int
}

type UserEdge {
  node: User!
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}
```

**Query Depth Limiting:**
```graphql
# ❌ Dangerous: Unlimited depth
query DeepQuery {
  user {
    friends {
      friends {
        friends {
          # ... continue 100 levels deep
        }
      }
    }
  }
}

# ✅ Solution: Limit to max 5-7 levels
```
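
To show what a depth limit measures, the sketch below estimates nesting depth by counting braces in the query text. This is a deliberate simplification: it ignores fragments, treats input-object literals in arguments as nesting, and does not handle escaped quotes or block strings. `query_depth` and `enforce_depth` are hypothetical names, and a real server should compute depth from the parsed query AST, usually via its GraphQL library's validation rules.

```python
def query_depth(query):
    """Return the maximum brace nesting depth of a GraphQL query string."""
    depth = max_depth = 0
    in_string = False
    for ch in query:
        if ch == '"':
            in_string = not in_string  # skip braces inside string literals
        elif in_string:
            continue
        elif ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


def enforce_depth(query, max_depth=7):
    """Raise ValueError when the query nests deeper than allowed."""
    depth = query_depth(query)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


deep = "query { user { friends { friends { friends { name } } } } }"
print(query_depth(deep))
```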

**Field Deprecation:**
```graphql
type User {
  id: ID!
  email: String!
  # Old field - deprecated but not removed
  name: String @deprecated(reason: "Use firstName and lastName instead")
  firstName: String
  lastName: String
}
```

**DataLoader Pattern (prevents N+1):**
```javascript
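const DataLoader = require('dataloader')
// Batch-load all users in one query instead of one query per post.
// `User` is assumed to be a Sequelize-style model.
const userLoader = new DataLoader(async (userIds) => {
  const users = await User.findAll({ where: { id: userIds } })
  // DataLoader needs one result per key, in the same order as the keys
  const byId = new Map(users.map((user) => [user.id, user]))
  return userIds.map((id) => byId.get(id) ?? null)
})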
```

**Actionable Recommendations:**
- Implement query depth limit (5-7 levels)
- Use query complexity scoring (assign costs, enforce budget)
- Apply connections pattern for all lists
- Use DataLoader for related resources
- Make fields nullable by default (non-null only when guaranteed)
- Deprecate fields with @deprecated directive before removal
- Add field-level authorization
- Document schema with descriptions on all types and fields

---

## 📋 Phase 3: Generate Review Report

After systematic analysis, provide structured feedback:

### 3.1 Review Report Structure

**Executive Summary:**
- Overall assessment (Ready to launch / Needs work / Major concerns)
- Top 3 most critical issues
- Estimated remediation effort

**🔴 Critical Issues** (Must Fix - Security/Data Loss Risks)
- Security vulnerabilities
- Data loss potential
- Breaking production issues
- Compliance violations

**🟡 Important Recommendations** (Should Fix - Performance/UX/Consistency)
- Performance problems
- Inconsistent patterns
- Poor error handling
- Missing documentation

**🟢 Nice-to-Have Improvements** (Polish)
- Code organization
- Additional conveniences
- Future-proofing

**✅ Positive Observations**
- What's well-designed
- Good patterns to replicate

### 3.2 Prioritization Framework

**Priority P0 (Critical - Fix before launch):**
- Security vulnerabilities (SQL injection, XSS, auth bypass)
- Data loss or corruption risks
- Breaking changes without versioning
- Legal/compliance issues (GDPR, PII handling)

**Priority P1 (High - Fix in next sprint):**
- Performance issues (N+1 queries, missing indexes)
- Missing pagination on growing collections
- Inconsistent error handling
- Missing rate limiting
- Poor documentation

**Priority P2 (Medium - Fix in next quarter):**
- Code organization improvements
- Additional convenience features
- Enhanced monitoring/observability
- Extended test coverage

**Priority P3 (Low - Nice to have):**
- Cosmetic improvements
- Advanced features not yet needed
- Over-engineering prevention

### 3.3 Actionable Recommendations Format

Each recommendation should include:
- **What:** Specific issue identified
- **Why:** Impact if not fixed (security, performance, UX)
- **How:** Concrete solution with code example
- **Effort:** Time estimate (1 hour, 1 day, 1 week)
- **Priority:** P0, P1, P2, P3

**Example:**
```
🔴 Critical: Missing Authentication on Admin Endpoints

What: DELETE /api/users/{id} has no authentication check
Why: Anyone can delete user accounts - severe security vulnerability
How: Add @RequireAuth(role="admin") decorator:

  @app.delete("/api/users/{id}")
  @RequireAuth(role="admin")  # Add this
  async def delete_user(id: str, current_user: User):
      if not current_user.is_admin:
          raise ForbiddenError()
      # ... deletion logic

Effort: 30 minutes
Priority: P0 (block launch)
Reference: security_checklist.md line 45
```

---

## 🎯 Phase 4: Specialized Reviews

For specific API scenarios, apply specialized checks:

### 4.1 Public API Review

Additional considerations for public-facing APIs:

- **Documentation:** Must be excellent (often the first and only resource developers consult)
- **Stability:** Breaking changes extremely costly
- **Versioning:** Mandatory from day one
- **Rate Limiting:** Aggressive limits to prevent abuse
- **Security:** Assume malicious actors
- **SLA:** Formal uptime and latency guarantees
- **Support:** Clear communication channel for developers

### 4.2 Internal Microservices API Review

Additional considerations for service-to-service:

- **Performance:** Latency critical in request chain
- **Circuit Breakers:** Prevent cascade failures
- **Retries:** Exponential backoff with jitter
- **Timeouts:** Aggressive timeouts to fail fast
- **Observability:** Distributed tracing essential
- **Service Mesh:** Consider Istio/Linkerd patterns
- **Contract Testing:** Prevent breaking internal consumers
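
The retry guidance above can be sketched as follows, using the "full jitter" variant where each delay is drawn uniformly between zero and an exponentially growing ceiling. `call_with_retries` is a hypothetical helper, and retrying only on `ConnectionError` is an assumption; a real client would choose its retryable errors carefully and retry only idempotent operations.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_s=0.1, cap_s=10.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation, retrying ConnectionError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(cap_s, base_s * 2 ** attempt)
            sleep(rng() * ceiling)  # full jitter: uniform in [0, ceiling)


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("upstream unavailable")
    return "ok"


print(call_with_retries(flaky, sleep=lambda s: None), len(attempts))
```

Jitter matters because many clients retrying on the same schedule would otherwise hit a recovering service in synchronized waves.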

### 4.3 Mobile Backend API Review

Additional considerations for mobile apps:

- **Offline Support:** Design for intermittent connectivity
- **Bandwidth:** Minimize response sizes
- **Battery:** Reduce polling, use push notifications
- **Versioning:** Users don't update immediately
- **Graceful Degradation:** Maintain old API versions longer
- **Field Selection:** Mobile doesn't need all data
- **Compression:** Essential on mobile networks

### 4.4 Real-Time API Review (WebSocket/SSE)

Additional considerations for real-time APIs:

- **Connection Management:** Reconnection logic
- **Heartbeats:** Detect stale connections
- **Message Ordering:** Guarantee or document lack thereof
- **Backpressure:** Handle slow consumers
- **Authentication:** Token refresh during long connections
- **State Sync:** Handle missed messages on reconnect
- **Scaling:** Load balancing with sticky sessions

---

## 🛠️ Tools and Scripts

### Quick Review Scripts

**Find All Endpoints:**
```bash
# REST endpoints
grep -r "@app\.\(route\|get\|post\|put\|delete\|patch\)" --include="*.py" .
grep -r "@\(GetMapping\|PostMapping\|PutMapping\|DeleteMapping\)" --include="*.java" .
grep -r "router\.\(get\|post\|put\|delete\|patch\)" --include="*.js" --include="*.ts" .

# GraphQL type definitions
grep -r "type Query\|type Mutation" --include="*.graphql" --include="*.gql" .

# gRPC services
grep -r "service.*rpc" --include="*.proto" .
```

**Security Audit:**
```bash
# Find potential SQL injection
grep -r "execute.*+\|execute.*format\|execute.*%" --include="*.py" .

# Find hardcoded secrets
grep -r "password.*=.*['\"]" --include="*.py" --include="*.js" --include="*.java" .

# Find missing auth decorators: decorators sit on separate lines, so print the
# lines around each mutating route and check them for @RequireAuth by hand
grep -rn -B2 -A2 "@app\.\(post\|delete\)" --include="*.py" .
```

### Reference Documentation

When conducting reviews, reference these guides:

- **[📘 REST API Best Practices](./reference/rest_best_practices.md)** - Comprehensive REST design patterns
- **[📗 GraphQL Design Guidelines](./reference/graphql_guidelines.md)** - GraphQL-specific patterns
- **[📕 API Security Checklist](./reference/security_checklist.md)** - OWASP API Security Top 10
- **[📙 Performance & Scaling Guide](./reference/performance_guide.md)** - Caching, optimization, load handling
- **[📔 Review Checklist Template](./reference/review_checklist.md)** - Printable checklist

---

## 🧠 Common Patterns from 10 Years of Production APIs

### Pattern 1: Soft Deletes Over Hard Deletes
```
❌ DELETE /api/users/123 (removes from database)
✅ PATCH /api/users/123 {"deleted_at": "2024-11-01T10:30:00Z"}

Benefits:
- Audit trail preserved
- Undo operations possible
- Data recovery after mistakes
- Compliance (retain data for N days)
```
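
A minimal sketch of the server side of a soft delete, using the standard-library `sqlite3` module. The `users` table layout and the function names are assumptions for illustration; only the `deleted_at` column comes from the pattern above.

```python
import sqlite3
from datetime import datetime, timezone


def soft_delete_user(conn, user_id, now=None):
    """Mark a user as deleted; return False if absent or already deleted."""
    now = now or datetime.now(timezone.utc)
    cursor = conn.execute(
        "UPDATE users SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now.isoformat(), user_id),
    )
    return cursor.rowcount == 1


def active_users(conn):
    """Every read path has to filter out soft-deleted rows."""
    return conn.execute(
        "SELECT id, name FROM users WHERE deleted_at IS NULL ORDER BY id"
    ).fetchall()
```

The cost of the pattern is the second function: any query that forgets the `deleted_at IS NULL` filter will surface deleted records.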

### Pattern 2: Webhook Signature Verification
```
POST https://client.com/webhooks/orders
Headers:
  X-Webhook-Signature: sha256=abc123def456...
  X-Webhook-Timestamp: 1635724800
  X-Webhook-ID: wh_1234567890

Client verifies:
1. Signature matches HMAC-SHA256 of body + timestamp
2. Timestamp within 5 minutes (prevent replay attacks)
3. Webhook ID not seen before (idempotency)
```
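
The three client-side checks can be sketched as follows. The exact signing input is an assumption here (`"<timestamp>.<body>"`); the pattern only says the signature covers the body and timestamp, so match whatever the webhook sender documents. The function names and the in-memory `seen_ids` set are hypothetical.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # the 5-minute replay window from check 2


def sign(secret: bytes, timestamp: str, body: bytes) -> str:
    # Assumed signing scheme: HMAC-SHA256 over "<timestamp>.<body>".
    message = timestamp.encode() + b"." + body
    return "sha256=" + hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_webhook(secret, headers, body, seen_ids, now=None):
    """Return True only if all three checks pass."""
    now = time.time() if now is None else now
    timestamp = headers["X-Webhook-Timestamp"]
    expected = sign(secret, timestamp, body)
    # 1. Constant-time comparison avoids leaking the signature via timing.
    if not hmac.compare_digest(expected, headers["X-Webhook-Signature"]):
        return False
    # 2. Reject stale (or far-future) timestamps.
    if abs(now - int(timestamp)) > TOLERANCE_SECONDS:
        return False
    # 3. Reject replays of a delivery that was already processed.
    webhook_id = headers["X-Webhook-ID"]
    if webhook_id in seen_ids:
        return False
    seen_ids.add(webhook_id)
    return True
```

In production the set of seen IDs would live in a shared store with an expiry at least as long as the timestamp tolerance.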

### Pattern 3: Bulk Operations with Partial Success
```
POST /api/v1/users/bulk-create
{
  "users": [
    {"email": "alice@example.com", "name": "Alice"},
    {"email": "invalid-email", "name": "Bob"},  # Invalid
    {"email": "charlie@example.com", "name": "Charlie"}
  ],
  "options": {
    "fail_on_error": false,
    "return_errors": true
  }
}

Response:
{
  "successful": [
    {"index": 0, "id": "user_001"},
    {"index": 2, "id": "user_003"}
  ],
  "failed": [
    {
      "index": 1,
      "error": {
        "code": "INVALID_EMAIL",
        "message": "Email format is invalid",
        "field": "email"
      }
    }
  ],
  "summary": {
    "total": 3,
    "successful": 2,
    "failed": 1
  }
}
```
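
A sketch of how a handler might build that response shape. It covers only the `fail_on_error: false` behaviour; the loose email regex and the `user_NNN` ID scheme are placeholders, not part of the pattern.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose check


def bulk_create(users):
    """Validate each item independently and report per-index outcomes."""
    successful, failed = [], []
    for index, user in enumerate(users):
        if not EMAIL_RE.match(user.get("email", "")):
            failed.append({
                "index": index,
                "error": {
                    "code": "INVALID_EMAIL",
                    "message": "Email format is invalid",
                    "field": "email",
                },
            })
            continue
        # Hypothetical ID scheme; a real service would persist the record here.
        successful.append({"index": index, "id": f"user_{index + 1:03d}"})
    return {
        "successful": successful,
        "failed": failed,
        "summary": {
            "total": len(users),
            "successful": len(successful),
            "failed": len(failed),
        },
    }
```

Reporting the request index for every item lets the client match results back to its input without relying on ordering.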

### Pattern 4: Background Job Status Tracking
```
1. Initiate long-running operation:
   POST /api/v1/imports
   Response: 202 Accepted
   {
     "job_id": "job_abc123",
     "status": "queued",
     "status_url": "/api/v1/jobs/job_abc123"
   }

2. Poll for status:
   GET /api/v1/jobs/job_abc123
   Response:
   {
     "job_id": "job_abc123",
     "status": "processing",
     "progress": 45,
     "created_at": "2024-11-01T10:00:00Z",
     "started_at": "2024-11-01T10:00:05Z",
     "estimated_completion": "2024-11-01T10:05:00Z"
   }

3. Job complete:
   {
     "job_id": "job_abc123",
     "status": "completed",
     "progress": 100,
     "result": {
       "imported": 1523,
       "failed": 7,
       "result_url": "/api/v1/reports/import_abc123"
     }
   }
```
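
On the client side, step 2 might look like the polling loop below. `fetch_status` stands in for a GET on the job's `status_url`; the `"failed"` terminal state, the interval, and the timeout are assumptions that the pattern does not specify.

```python
import time


def wait_for_job(fetch_status, interval=1.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job reaches a terminal state."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()  # e.g. a GET on the job's status_url
        if job["status"] in ("completed", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job['job_id']} still {job['status']}")
        sleep(interval)
```

A bounded timeout keeps a stuck job from blocking the caller forever; clients with many jobs would likely add backoff between polls as well.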

### Pattern 5: Search with Facets
```
GET /api/v1/products/search?q=laptop&category=electronics

Response:
{
  "results": [...],
  "facets": {
    "brands": [
      {"value": "Dell", "count": 45},
      {"value": "HP", "count": 38},
      {"value": "Lenovo", "count": 32}
    ],
    "price_ranges": [
      {"min": 0, "max": 500, "count": 23},
      {"min": 500, "max": 1000, "count": 56},
      {"min": 1000, "max": 2000, "count": 34}
    ]
  },
  "total": 115
}
```
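
A small in-memory sketch of how the `facets` object could be computed from the matching products. Real search backends usually compute facets as aggregations inside the search engine; the `brand` and `price` field names and the half-open bucket convention are assumptions.

```python
from collections import Counter


def build_facets(products, price_buckets):
    """Count matching products per brand and per price bucket."""
    brand_counts = Counter(p["brand"] for p in products)
    brands = [{"value": b, "count": c} for b, c in brand_counts.most_common()]
    price_ranges = []
    for low, high in price_buckets:
        # Half-open [low, high) so a price on a boundary lands in one bucket.
        count = sum(1 for p in products if low <= p["price"] < high)
        price_ranges.append({"min": low, "max": high, "count": count})
    return {"brands": brands, "price_ranges": price_ranges}
```

Whichever boundary convention is chosen, document it: with ranges like 0-500 and 500-1000, clients need to know which bucket a price of exactly 500 falls into.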

---

## 🎓 Questions to Ask During Every Review

### Business Context
1. Who consumes this API? (Internal teams, partners, public developers)
2. What's the expected scale? (Requests/second, growth trajectory)
3. What are the SLA requirements? (Uptime, latency, error rate)
4. How critical is this API? (Can it be down for maintenance?)
5. What's the release timeline? (Weeks, months, urgent?)

### Technical Context
6. What type of data? (Public, PII, financial, healthcare)
7. What consistency requirements? (Strong consistency vs eventual consistency)
8. What clients? (Web, mobile, third-party, internal services)
9. What authentication? (Users, services, API keys, OAuth)
10. What dependencies? (Databases, external APIs, message queues)

### Change Management
11. Is this a new API or modification to existing?
12. Are there existing consumers? (Breaking changes impact)
13. How are API changes communicated?
14. What's the testing strategy?
15. How is the API monitored in production?

---

## 🚨 Red Flags - Stop and Escalate

These issues require immediate escalation:

### Security Red Flags
- ⛔ No authentication on sensitive endpoints
- ⛔ SQL queries built with string concatenation
- ⛔ Passwords stored in plaintext
- ⛔ API keys/tokens in URLs
- ⛔ Admin operations without authorization checks
- ⛔ File uploads without validation
- ⛔ CORS allowing all origins in production

### Data Loss Red Flags
- ⛔ DELETE operations without soft delete or confirmation
- ⛔ No backups or recovery strategy
- ⛔ Updates without optimistic locking on critical data
- ⛔ No transaction boundaries for multi-step operations
- ⛔ Cascading deletes without understanding impact

### Scalability Red Flags
- ⛔ Unbounded collections (no pagination)
- ⛔ N+1 queries in production code
- ⛔ No connection pooling
- ⛔ Synchronous processing of long-running tasks
- ⛔ No rate limiting on expensive operations
- ⛔ No caching strategy for hot data

### Operational Red Flags
- ⛔ No monitoring or alerting
- ⛔ No logging of critical operations
- ⛔ No way to trace requests across services
- ⛔ No circuit breakers for external dependencies
- ⛔ No runbooks for common issues
- ⛔ No rollback strategy

---

## 📚 Learning from Incidents

When reviewing APIs, think about these common production failures:

### Incident Type: Database Connection Pool Exhaustion
- **Symptom:** 500 errors, "too many connections"
- **Root Cause:** No connection pooling or pool too small
- **Prevention:** Configure connection pool, monitor pool utilization

### Incident Type: N+1 Query Performance Degradation
- **Symptom:** API becomes slower as data grows
- **Root Cause:** Fetching related resources in loops
- **Prevention:** Code review for N+1 patterns, query monitoring

### Incident Type: Unhandled Traffic Spike
- **Symptom:** API unresponsive during marketing campaign
- **Root Cause:** No auto-scaling, no rate limiting
- **Prevention:** Load testing, auto-scaling, rate limits

### Incident Type: Breaking API Change
- **Symptom:** Mobile app crashes for users who haven't updated
- **Root Cause:** Removed field from API response
- **Prevention:** API versioning, graceful degradation, gradual rollout

### Incident Type: Data Loss from Concurrent Updates
- **Symptom:** User changes getting lost or overwritten
- **Root Cause:** No optimistic locking
- **Prevention:** ETags or version fields, 409 Conflict responses
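
A version-field check can be sketched in a few lines. The in-memory `store` dict and the names below are hypothetical; in a database the same check is typically a single conditional `UPDATE ... WHERE version = ?` so that the comparison and the write are atomic.

```python
class VersionConflict(Exception):
    """Maps to an HTTP 409 Conflict response."""


def update_with_version(store, key, expected_version, changes):
    """Apply changes only if the caller saw the latest version."""
    current = store[key]
    if current["version"] != expected_version:
        raise VersionConflict(
            f"expected version {expected_version}, found {current['version']}"
        )
    # Bump the version so any other writer holding the old one is rejected.
    updated = {**current, **changes, "version": current["version"] + 1}
    store[key] = updated
    return updated
```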

---

## ✅ Final Checklist

Before approving any API for production:

- [ ] Authentication and authorization implemented and tested
- [ ] All inputs validated (type, format, length, range)
- [ ] Error handling consistent across all endpoints
- [ ] Rate limiting configured for all endpoints
- [ ] Pagination implemented on all collections
- [ ] Versioning strategy defined and implemented
- [ ] Idempotency keys supported for non-idempotent operations
- [ ] Caching headers configured appropriately
- [ ] Compression enabled (gzip/brotli)
- [ ] OpenAPI/GraphQL schema documentation complete
- [ ] Request/response examples provided
- [ ] Security review completed (OWASP API Top 10)
- [ ] Performance testing done at expected scale
- [ ] Monitoring and alerting configured
- [ ] Logging includes correlation IDs
- [ ] Circuit breakers on external dependencies
- [ ] Rollback strategy documented
- [ ] Runbooks created for common issues

---

## 💡 Review Philosophy

Remember these principles:

1. **Empathy First:** Design APIs for the developer experience you'd want
2. **Security by Default:** Secure first, convenience second
3. **Fail Fast:** Errors caught early are easier to fix
4. **Explicit Over Implicit:** Don't make consumers guess
5. **Consistency Matters:** Predictable patterns reduce cognitive load
6. **Document Everything:** Your future self will thank you
7. **Version from Day One:** Easier to maintain than to retrofit
8. **Monitor Everything:** You can't fix what you can't measure
9. **Think in Terms of Workflows:** Not just endpoints, but user journeys
10. **Learn from Failures:** Every production incident is a lesson

---

**Remember:** A few hours of thorough API design review can prevent weeks of migration pain, security incidents, and frustrated developers. Be thorough, be thoughtful, and prioritize long-term maintainability over short-term convenience.

---

## Referenced Files

> The following files are referenced in this skill and included for context.

### reference/rest_best_practices.md

```markdown
# REST API Best Practices

Comprehensive guide to designing RESTful APIs following industry standards and battle-tested patterns.

## Table of Contents
1. [Resource Naming](#resource-naming)
2. [HTTP Methods](#http-methods)
3. [Status Codes](#status-codes)
4. [Request & Response Design](#request--response-design)
5. [URL Design](#url-design)
6. [Versioning](#versioning)
7. [Filtering, Sorting, Pagination](#filtering-sorting-pagination)
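8. [HATEOAS](#hateoas)
9. [Caching](#caching)
10. [Security Headers](#security-headers)
11. [Complete Example](#complete-example)
12. [Summary](#summary)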

---

## Resource Naming

### Use Plural Nouns
Resources represent collections; use plural nouns consistently.

```
✅ GOOD:
GET /users
GET /users/123
GET /orders
GET /products

❌ BAD:
GET /user
GET /getUser/123
GET /order
GET /product
```

### Use Kebab-Case for URLs
```
✅ GOOD:
/api/user-profiles
/api/order-items

❌ BAD:
/api/userProfiles  (camelCase)
/api/user_profiles (snake_case)
```

### Hierarchy and Nesting
Limit nesting to 2 levels maximum. Deeper hierarchies make URLs unwieldy.

```
✅ GOOD:
/users/123/orders
/orders/456/items

✅ BETTER (when relationship is clear):
/order-items?order_id=456

❌ BAD (too deep):
/users/123/orders/456/items/789/reviews
```

### Use Descriptive Names
Resource names should be self-explanatory.

```
✅ GOOD:
/api/invoices
/api/customers
/api/subscriptions

❌ BAD:
/api/inv
/api/cust
/api/subs
```

---

## HTTP Methods

### Standard CRUD Operations

| Method | Action | Idempotent | Safe | Typical Status Codes |
|--------|--------|------------|------|---------------|
| GET | Read resource(s) | Yes | Yes | 200, 404 |
| POST | Create resource | No | No | 201, 400, 409 |
| PUT | Replace resource | Yes | No | 200, 201, 404 |
| PATCH | Update resource | No* | No | 200, 204, 404 |
| DELETE | Remove resource | Yes | No | 204, 404 |

*PATCH can be idempotent depending on implementation

### GET - Retrieve Resources
```http
GET /users/123
Response: 200 OK
{
  "id": 123,
  "name": "Alice",
  "email": "alice@example.com"
}
```

**Principles:**
- Never modify data
- Safe to retry
- Cacheable by default

### POST - Create New Resource
```http
POST /users
{
  "name": "Bob",
  "email": "bob@example.com"
}

Response: 201 Created
Location: /users/124
{
  "id": 124,
  "name": "Bob",
  "email": "bob@example.com",
  "created_at": "2024-11-01T10:30:00Z"
}
```

**Principles:**
- Returns 201 with Location header
- Include created resource in response
- Not idempotent (multiple POSTs create multiple resources)
- Use Idempotency-Key header to make it idempotent
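
A minimal sketch of how an `Idempotency-Key` can make POST safe to retry: the server remembers the response it produced for each key and replays it. The in-memory dict, the function name, and the starting ID are illustrative assumptions; a real service would use a shared store with an expiry.

```python
import itertools

_next_id = itertools.count(124)
_responses_by_key = {}  # in-memory stand-in for a shared store with a TTL


def create_user(payload, idempotency_key=None):
    """Return (status, body); a replay with a known key creates nothing new."""
    if idempotency_key is not None and idempotency_key in _responses_by_key:
        return _responses_by_key[idempotency_key]
    body = {"id": next(_next_id), **payload}
    response = (201, body)
    if idempotency_key is not None:
        _responses_by_key[idempotency_key] = response
    return response
```

Many implementations also compare the retried payload with the original and reject a key that is reused with a different body.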

### PUT - Full Resource Replacement
```http
PUT /users/123
{
  "name": "Alice Updated",
  "email": "alice@example.com"
}

Response: 200 OK
{
  "id": 123,
  "name": "Alice Updated",
  "email": "alice@example.com",
  "updated_at": "2024-11-01T11:00:00Z"
}
```

**Principles:**
- Replaces entire resource
- Idempotent (same request = same result)
- Can create resource if it doesn't exist (returns 201)
- Missing fields should be set to defaults/null

### PATCH - Partial Update
```http
PATCH /users/123
{
  "email": "alice.new@example.com"
}

Response: 200 OK
{
  "id": 123,
  "name": "Alice Updated",  # Unchanged
  "email": "alice.new@example.com",
  "updated_at": "2024-11-01T11:30:00Z"
}
```

**Principles:**
- Only updates provided fields
- Can be idempotent with proper design
- Use JSON Merge Patch (RFC 7396) for simple field updates like the one above, and JSON Patch (RFC 6902) for complex updates
- Return updated resource

### DELETE - Remove Resource
```http
DELETE /users/123

Response: 204 No Content
```

**Principles:**
- Idempotent (deleting twice has same effect)
- 204 No Content on success (no body)
- 404 if resource doesn't exist (debatable: some prefer 204)
- Consider soft deletes for audit trails

### Avoid Action-Based URLs
```
❌ BAD:
POST /api/users/123/activate
POST /api/orders/456/cancel
POST /api/sendEmail

✅ GOOD:
PATCH /api/users/123 {"status": "active"}
PATCH /api/orders/456 {"status": "cancelled"}
POST /api/emails
```

---

## Status Codes

### Success Codes (2xx)

| Code | Meaning | When to Use |
|------|---------|-------------|
| 200 OK | Success | GET, PATCH, PUT with response body |
| 201 Created | Resource created | POST creating new resource |
| 202 Accepted | Async processing | Long-running operations queued |
| 204 No Content | Success, no body | DELETE, PUT/PATCH with no response |

### Client Error Codes (4xx)

| Code | Meaning | When to Use |
|------|---------|-------------|
| 400 Bad Request | Malformed request | Invalid JSON, malformed parameters (use 415 for an unsupported content type) |
| 401 Unauthorized | Not authenticated | Missing/invalid auth token |
| 403 Forbidden | Not authorized | Valid auth but insufficient permissions |
| 404 Not Found | Resource missing | Resource ID doesn't exist |
| 405 Method Not Allowed | Wrong HTTP method | POST to read-only resource |
| 409 Conflict | Conflict with current state | Duplicate resource, version conflict |
| 422 Unprocessable Entity | Validation failed | Valid JSON but business rule violation |
| 429 Too Many Requests | Rate limited | Client exceeded rate limit |

### Server Error Codes (5xx)

| Code | Meaning | When to Use |
|------|---------|-------------|
| 500 Internal Server Error | Generic error | Unexpected server failure |
| 502 Bad Gateway | Upstream failure | Proxy/gateway error |
| 503 Service Unavailable | Temporarily unavailable | Maintenance, overload |
| 504 Gateway Timeout | Upstream timeout | Upstream service too slow |

### Status Code Examples

**Validation Error:**
```http
POST /users
{
  "name": "",
  "email": "invalid-email"
}

Response: 422 Unprocessable Entity
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Request validation failed",
    "details": [
      {"field": "name", "issue": "must not be empty"},
      {"field": "email", "issue": "must be valid email format"}
    ]
  }
}
```

**Resource Conflict:**
```http
POST /users
{
  "email": "alice@example.com"
}

Response: 409 Conflict
{
  "error": {
    "code": "EMAIL_ALREADY_EXISTS",
    "message": "A user with this email already exists",
    "conflicting_resource": "/users/123"
  }
}
```

**Rate Limit Exceeded:**
```http
GET /api/users

Response: 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1635724860

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Try again in 60 seconds."
  }
}
```

---

## Request & Response Design

### Content Negotiation
```http
Request:
Accept: application/json
Accept-Language: en-US
Accept-Encoding: gzip, br

Response:
Content-Type: application/json; charset=utf-8
Content-Language: en-US
Content-Encoding: br
```

### JSON Response Structure

**Single Resource:**
```json
{
  "id": 123,
  "name": "Alice",
  "email": "alice@example.com",
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-11-01T10:00:00Z"
}
```

**Collection:**
```json
{
  "data": [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}
  ],
  "pagination": {
    "total": 1523,
    "limit": 50,
    "offset": 100,
    "next": "/users?offset=150&limit=50",
    "previous": "/users?offset=50&limit=50"
  }
}
```

**Error Response:**
```json
{
  "error": {
    "code": "RESOURCE_NOT_FOUND",
    "message": "User with ID 999 not found",
    "request_id": "req_abc123def456",
    "timestamp": "2024-11-01T10:30:00Z",
    "path": "/api/users/999",
    "documentation_url": "https://api.example.com/docs/errors"
  }
}
```

### Field Naming Conventions

**Use snake_case for JSON (recommended):**
```json
{
  "user_id": 123,
  "first_name": "Alice",
  "created_at": "2024-01-01T00:00:00Z"
}
```

**OR camelCase (also acceptable, but be consistent):**
```json
{
  "userId": 123,
  "firstName": "Alice",
  "createdAt": "2024-01-01T00:00:00Z"
}
```

**Never mix conventions!**

### Timestamps
Always use ISO 8601 format with UTC timezone:
```json
{
  "created_at": "2024-11-01T10:30:00Z",
  "updated_at": "2024-11-01T15:45:30.123Z"
}
```

### Nulls vs Omission
Decide on a convention:

**Option 1: Include null values**
```json
{
  "id": 123,
  "name": "Alice",
  "middle_name": null,
  "phone": null
}
```

**Option 2: Omit null values (saves bandwidth)**
```json
{
  "id": 123,
  "name": "Alice"
}
```

Choose one and be consistent. Document the behavior.

---

## URL Design

### API Base URL
```
✅ GOOD:
https://api.example.com/v1/users
https://example.com/api/v1/users

❌ BAD:
https://example.com/api.php?type=users
https://example.com/services/userservice
```

### Path Parameters vs Query Parameters

**Path Parameters:** For resource identification
```
GET /users/123
GET /orders/456/items/789
```

**Query Parameters:** For filtering, sorting, pagination
```
GET /users?role=admin&status=active
GET /products?category=electronics&sort=price&order=asc
GET /orders?page=2&limit=50
```

### Complex Queries

**Filtering:**
```
GET /products?category=electronics&min_price=100&max_price=500
GET /users?created_after=2024-01-01&status=active
```

**Sorting:**
```
GET /products?sort=price          # Ascending
GET /products?sort=-price         # Descending
GET /products?sort=price,-rating  # Multiple fields
```

**Field Selection:**
```
GET /users?fields=id,name,email
GET /users?include=profile,preferences
GET /users?exclude=sensitive_data
```

**Full-Text Search:**
```
GET /products/search?q=laptop&category=electronics
GET /users/search?q=alice
```

---

## Versioning

### URL Path Versioning (Recommended)
```
https://api.example.com/v1/users
https://api.example.com/v2/users
```

**Pros:**
- Explicit and visible
- Easy to route and cache
- Browser-testable

**Cons:**
- URLs change between versions

### Header Versioning
```http
GET /api/users
API-Version: 2024-11-01

or

GET /api/users
Accept: application/vnd.example.v2+json
```

**Pros:**
- Clean URLs
- RESTful (content negotiation)

**Cons:**
- Less visible
- Harder to test manually

### Best Practices
- Version from day one
- Use major versions only (`v1`, `v2`, not `v1.2.3`)
- Support N and N-1 versions simultaneously
- Deprecate old versions with clear timeline
- Document breaking vs non-breaking changes

---

## Filtering, Sorting, Pagination

### Filtering

**Simple Filters:**
```
GET /products?category=electronics
GET /users?role=admin&status=active
GET /orders?customer_id=123
```

**Range Filters:**
```
GET /products?min_price=100&max_price=500
GET /users?created_after=2024-01-01&created_before=2024-12-31
GET /orders?total_gte=1000
```

**Advanced Filters (URL-encoded):**
```
GET /products?filter[category]=electronics&filter[price][gte]=100
```

### Sorting

**Single Field:**
```
GET /products?sort=price
GET /products?sort=-created_at  # Descending
```

**Multiple Fields:**
```
GET /products?sort=category,price
GET /products?sort=category,-price
```
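
A `sort` parameter in this style is straightforward to parse on the server. The sketch below is one possible approach; the allow-list contents and the function name are assumptions.

```python
ALLOWED_SORT_FIELDS = {"category", "price", "rating", "created_at"}  # assumed


def parse_sort(param):
    """Turn 'category,-price' into [('category', 'asc'), ('price', 'desc')]."""
    ordering = []
    for token in param.split(","):
        token = token.strip()
        if not token:
            continue
        descending = token.startswith("-")
        field = token[1:] if descending else token
        if field not in ALLOWED_SORT_FIELDS:
            # Rejecting unknown fields keeps user input out of ORDER BY clauses.
            raise ValueError(f"cannot sort by {field!r}")
        ordering.append((field, "desc" if descending else "asc"))
    return ordering
```

An unknown field would normally map to a 400 response rather than being silently ignored.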

### Pagination

**Offset-Based:**
```
GET /users?offset=100&limit=50

Response:
{
  "data": [...],
  "pagination": {
    "offset": 100,
    "limit": 50,
    "total": 1523,
    "next": "/users?offset=150&limit=50",
    "previous": "/users?offset=50&limit=50"
  }
}
```

**Cursor-Based (Recommended for large datasets):**
```
GET /users?cursor=eyJpZCI6MTIzfQ&limit=50

Response:
{
  "data": [...],
  "pagination": {
    "next_cursor": "eyJpZCI6MTczfQ",
    "has_more": true,
    "limit": 50
  }
}
```
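
The cursors in this example are unpadded base64url encodings of a small JSON object (`eyJpZCI6MTIzfQ` decodes to `{"id":123}`). A sketch of encoding, decoding, and keyset paging under that assumption, with hypothetical function names and an in-memory list standing in for the database:

```python
import base64
import json


def encode_cursor(position):
    raw = json.dumps(position, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor):
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def page_after(rows, cursor=None, limit=50):
    """rows must be sorted by id; returns (data, pagination)."""
    last_id = decode_cursor(cursor)["id"] if cursor else 0
    # Fetch one extra row to learn whether another page exists.
    window = [r for r in rows if r["id"] > last_id][: limit + 1]
    data = window[:limit]
    has_more = len(window) > limit
    next_cursor = encode_cursor({"id": data[-1]["id"]}) if has_more else None
    return data, {"next_cursor": next_cursor, "has_more": has_more, "limit": limit}
```

Clients should treat the cursor as opaque; encoding it only keeps the format free to change. Cursors that embed sensitive state would also need signing or encryption.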

**Page-Based:**
```
GET /users?page=3&per_page=50

Response:
{
  "data": [...],
  "pagination": {
    "page": 3,
    "per_page": 50,
    "total_pages": 31,
    "total_items": 1523
  }
}
```

---

## HATEOAS

Hypermedia as the Engine of Application State: Include links to related resources.

### Basic Example
```json
{
  "id": 123,
  "name": "Alice",
  "email": "alice@example.com",
  "_links": {
    "self": {"href": "/users/123"},
    "orders": {"href": "/users/123/orders"},
    "profile": {"href": "/users/123/profile"}
  }
}
```

### HAL Format
```json
{
  "id": 123,
  "name": "Alice",
  "_links": {
    "self": {"href": "/users/123"},
    "orders": {"href": "/users/123/orders"}
  },
  "_embedded": {
    "orders": [
      {
        "id": 456,
        "total": 99.99,
        "_links": {
          "self": {"href": "/orders/456"}
        }
      }
    ]
  }
}
```

### Benefits
- Self-documenting
- Clients can navigate API without hardcoding URLs
- Evolvable (URLs can change)

---

## Caching

### Cache-Control Headers
```http
# Immutable static content
Cache-Control: public, max-age=31536000, immutable

# Frequently accessed data
Cache-Control: public, max-age=300, stale-while-revalidate=60

# Private user data
Cache-Control: private, max-age=0, must-revalidate

# Never cache
Cache-Control: no-store
```

### ETags
```http
GET /users/123
Response:
ETag: "33a64df551425fcc55e4d42a148795d9"
Cache-Control: max-age=60

# Conditional request
GET /users/123
If-None-Match: "33a64df551425fcc55e4d42a148795d9"

# Not modified
Response: 304 Not Modified
```
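
The conditional request flow can be sketched as below. Deriving the ETag from a hash of the serialized resource is one common choice, not the only one, and the function names are hypothetical. The sketch handles a single ETag value only; a full implementation would also accept a comma-separated list and `*` in `If-None-Match`.

```python
import hashlib
import json


def make_etag(resource):
    # Validator derived from a canonical serialization of the body.
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return '"' + hashlib.sha256(canonical.encode()).hexdigest()[:32] + '"'


def conditional_get(resource, if_none_match=None):
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(resource)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, None  # client copy is current; send no body
    return 200, headers, resource
```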

---

## Security Headers

```http
# CORS
Access-Control-Allow-Origin: https://example.com
Access-Control-Allow-Methods: GET, POST, PUT, PATCH, DELETE
Access-Control-Allow-Headers: Authorization, Content-Type
Access-Control-Max-Age: 86400

# Security
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
Content-Security-Policy: default-src 'self'

# Rate Limiting
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1635724800
```

---

## Complete Example

```http
POST /api/v1/users HTTP/1.1
Host: api.example.com
Content-Type: application/json
Accept: application/json
Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
Idempotency-Key: unique-client-key-123

{
  "email": "alice@example.com",
  "name": "Alice Smith",
  "role": "user"
}

HTTP/1.1 201 Created
Content-Type: application/json; charset=utf-8
Location: /api/v1/users/123
ETag: "abc123def456"
X-Request-ID: req_xyz789
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99

{
  "id": 123,
  "email": "alice@example.com",
  "name": "Alice Smith",
  "role": "user",
  "status": "active",
  "created_at": "2024-11-01T10:30:00Z",
  "updated_at": "2024-11-01T10:30:00Z",
  "_links": {
    "self": {"href": "/api/v1/users/123"},
    "orders": {"href": "/api/v1/users/123/orders"},
    "profile": {"href": "/api/v1/users/123/profile"}
  }
}
```

---

## Summary

**Key Takeaways:**
1. Use plural nouns for resources
2. HTTP methods map to CRUD operations
3. Status codes communicate outcome precisely
4. Consistent JSON structure across API
5. Version from day one (URL path recommended)
6. Paginate all collections
7. Cache appropriately with ETags and Cache-Control
8. Include HATEOAS links for discoverability
9. Secure with authentication, rate limiting, CORS
10. Document everything with OpenAPI

Following these practices creates APIs that are intuitive, scalable, and maintainable.
```

### reference/graphql_guidelines.md

```markdown
# GraphQL API Design Guidelines

Comprehensive guide for designing production-ready GraphQL APIs.

## Table of Contents
1. [Schema Design](#schema-design)
2. [Query Design](#query-design)
3. [Mutation Design](#mutation-design)
4. [Security](#security)
5. [Performance](#performance)
6. [Error Handling](#error-handling)

---

## Schema Design

### Type Naming Conventions

```graphql
# ✅ GOOD: Clear, descriptive names
type User {
  id: ID!
  email: String!
  firstName: String
  lastName: String
  createdAt: DateTime!
}

type Order {
  id: ID!
  orderNumber: String!
  total: Money!
  status: OrderStatus!
}

enum OrderStatus {
  PENDING
  PROCESSING
  SHIPPED
  DELIVERED
  CANCELLED
}

# ❌ BAD: Abbreviations, unclear names
type Usr {
  id: ID!
  em: String!
  fn: String
}
```

### Nullable vs Non-Nullable Fields

**Rule of Thumb:**
- Fields nullable by default
- Non-null only when guaranteed to exist

```graphql
type User {
  # Non-null: Always exists
  id: ID!
  email: String!
  createdAt: DateTime!

  # Nullable: May not exist
  phoneNumber: String
  bio: String
  lastLoginAt: DateTime

  # Non-null list of non-null items
  roles: [Role!]!  # List always exists, items never null

  # Nullable list of nullable items
  preferences: [Preference]  # List and items can be null
}
```

**Why default to nullable?**
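- Schema evolution: A non-null field can never start returning null without breaking clients, while a nullable field can safely become non-null later
- Partial failures: Can return partial data with errors
- Flexibility: Easier to tighten later (nullable → non-null) than to relax (non-null → nullable)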

### Connections Pattern (Pagination)

**Use Relay Connection Specification:**

```graphql
type Query {
  users(
    first: Int
    after: String
    last: Int
    before: String
    filter: UserFilter
  ): UserConnection!
}

type UserConnection {
  edges: [UserEdge!]!
  pageInfo: PageInfo!
  totalCount: Int
}

type UserEdge {
  node: User!
  cursor: String!
}

type PageInfo {
  hasNextPage: Boolean!
  hasPreviousPage: Boolean!
  startCursor: String
  endCursor: String
}

input UserFilter {
  role: Role
  status: UserStatus
  createdAfter: DateTime
}
```

**Why Connections?**
- Cursor-based pagination (stable, performant)
- Supports bidirectional pagination
- Standardized across industry
- Works well with infinite scroll

### Input Types

```graphql
# Separate input types from output types
input CreateUserInput {
  email: String!
  firstName: String!
  lastName: String!
  password: String!
}

input UpdateUserInput {
  firstName: String
  lastName: String
  bio: String
  # email NOT included (can't be changed)
}

type CreateUserPayload {
  user: User
  errors: [Error!]
}
```

**Benefits:**
- Clear separation of concerns
- Different validation rules for create vs update
- Can evolve independently

---

## Query Design

### Field Arguments

```graphql
type Query {
  # Single resource by ID
  user(id: ID!): User

  # Collection with filtering
  users(
    filter: UserFilter
    sort: UserSort
    first: Int = 20
    after: String
  ): UserConnection!

  # Search
  searchUsers(
    query: String!
    limit: Int = 20
  ): [User!]!
}

input UserFilter {
  role: Role
  status: UserStatus
  createdAfter: DateTime
  createdBefore: DateTime
}

enum UserSort {
  CREATED_AT_ASC
  CREATED_AT_DESC
  NAME_ASC
  NAME_DESC
}
```

### Query Depth Limiting

**Problem: Deeply nested queries**
```graphql
query DangerouslyDeep {
  user {
    friends {
      friends {
        friends {
          friends {
            # ... 100 levels deep
          }
        }
      }
    }
  }
}
```

**Solution: Enforce depth limit**
```javascript
import depthLimit from 'graphql-depth-limit'

const server = new ApolloServer({
  typeDefs,
  resolvers,
  validationRules: [depthLimit(7)]  // Max 7 levels deep
})
```
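
To show what a depth rule measures, here is a language-neutral sketch that models a selection set as nested dicts, with leaf fields mapping to `None`. This is a simplified model for illustration, not how `graphql-depth-limit` is implemented, and its counting convention may differ from the library's.

```python
MAX_DEPTH = 7  # same limit as in the server configuration above


def selection_depth(selection):
    """Depth of a selection set modeled as nested dicts."""
    if not selection:
        return 0  # a leaf field (None) or an empty selection adds no depth
    return 1 + max(selection_depth(child) for child in selection.values())


def check_depth(selection, max_depth=MAX_DEPTH):
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth
```

The check runs on the parsed query before any resolver executes, which is why it is cheap compared with executing the query.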

### Query Complexity Analysis

**Assign costs to fields:**
```javascript
const typeCostMap = {
  User: {
    complexity: 1,
    fields: {
      orders: { multipliers: ['first'], complexity: 2 }
    }
  }
}

// Query cost: 1 + (50 * 2) = 101
query {
  user {              # Cost: 1
    orders(first: 50) {  # Cost: 50 * 2 = 100
      id
    }
  }
}

// Reject if cost > budget (e.g., 1000)
```
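
The cost arithmetic above can be reproduced with a short recursive function. The dict-based query model, the `FIELD_COSTS` table, and the budget check are assumptions made for this sketch; real complexity plugins define their own configuration formats.

```python
FIELD_COSTS = {"user": 1, "orders": 2}  # mirrors the cost map above
BUDGET = 1000


def field_cost(field):
    """Cost of a field: (own cost + children) times its 'first' multiplier."""
    count = field.get("first", 1)
    own = FIELD_COSTS.get(field["name"], 0)  # unlisted fields are free here
    children = sum(field_cost(child) for child in field.get("children", []))
    return count * (own + children)


def within_budget(field, budget=BUDGET):
    return field_cost(field) <= budget


query = {
    "name": "user",
    "children": [
        {"name": "orders", "first": 50, "children": [{"name": "id"}]},
    ],
}
```

For `query`, `orders` costs 50 × 2 = 100 and `user` adds 1, giving the same total of 101 as the worked example.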

---

## Mutation Design

### Mutation Naming

```graphql
# ✅ GOOD: Verb + Noun pattern
type Mutation {
  createUser(input: CreateUserInput!): CreateUserPayload!
  updateUser(id: ID!, input: UpdateUserInput!): UpdateUserPayload!
  deleteUser(id: ID!): DeleteUserPayload!

  publishPost(id: ID!): PublishPostPayload!
  sendEmail(input: SendEmailInput!): SendEmailPayload!
}

# ❌ BAD: Inconsistent naming
type Mutation {
  newUser(input: CreateUserInput!): User
  userUpdate(id: ID!, data: UpdateUserInput!): User
  removeUser(id: ID!): Boolean
}
```

### Mutation Payloads

**Always return payload type:**
```graphql
type CreateUserPayload {
  # The created resource
  user: User

  # Validation/business errors
  errors: [Error!]

  # Success indicator
  success: Boolean!

  # Client mutation ID (for optimistic updates)
  clientMutationId: String
}

type Error {
  message: String!
  field: String
  code: String!
}
```

**Why payload types?**
- Can return errors without throwing
- Extensible (add fields without breaking changes)
- Support client mutation IDs

### Mutation Example

```graphql
mutation CreateUser($input: CreateUserInput!) {
  createUser(input: $input) {
    user {
      id
      email
      firstName
    }
    errors {
      field
      message
      code
    }
    success
  }
}

# Variables
{
  "input": {
    "email": "alice@example.com",
    "firstName": "Alice",
    "lastName": "Smith",
    "password": "SecurePass123!"
  }
}

# Response (validation error)
{
  "data": {
    "createUser": {
      "user": null,
      "errors": [
        {
          "field": "email",
          "message": "Email already exists",
          "code": "EMAIL_DUPLICATE"
        }
      ],
      "success": false
    }
  }
}
```

---

## Security

### Authentication

```javascript
// Context with authenticated user
const server = new ApolloServer({
  context: async ({ req }) => {
    const token = req.headers.authorization?.replace('Bearer ', '')
    if (!token) return { user: null }

    try {
      const user = await verifyToken(token)
      return { user }
    } catch (error) {
      throw new AuthenticationError('Invalid token')
    }
  }
})
```

### Field-Level Authorization

```javascript
// Protect fields based on user permissions
const resolvers = {
  User: {
    // Public field
    name: (parent) => parent.name,

    // Authenticated only
    email: (parent, args, context) => {
      if (!context.user) {
        throw new AuthenticationError('Authentication required')
      }
      return parent.email
    },

    // Owner or admin only
    ssn: (parent, args, context) => {
      if (!context.user) {
        throw new AuthenticationError('Authentication required')
      }
      if (context.user.id !== parent.id && !context.user.isAdmin) {
        throw new ForbiddenError('Access denied')
      }
      return parent.ssn
    }
  }
}
```

### Query Authorization

```javascript
const resolvers = {
  Query: {
    user: async (parent, { id }, context) => {
      if (!context.user) {
        throw new AuthenticationError('Login required')
      }

      // Check ownership or admin before touching the database
      if (context.user.id !== id && !context.user.isAdmin) {
        throw new ForbiddenError('Access denied')
      }

      return User.findById(id)
    }
  }
}
```

### Preventing Information Disclosure

```graphql
# ❌ BAD: Reveals whether email exists
mutation Login($email: String!, $password: String!) {
  login(email: $email, password: $password) {
    token
    errors {
      message  # "Email not found" or "Incorrect password"
    }
  }
}

# ✅ GOOD: Generic error message
mutation Login($email: String!, $password: String!) {
  login(email: $email, password: $password) {
    token
    errors {
      message  # "Invalid email or password"
    }
  }
}
```

---

## Performance

### N+1 Query Problem

**Problem:**
```javascript
// ❌ BAD: N+1 queries
const resolvers = {
  Query: {
    posts: () => Post.findAll()
  },
  Post: {
    author: (post) => User.findById(post.authorId)  // Query for each post!
  }
}

// Fetches 100 posts: 1 query
// Then fetches author for each post: 100 queries
// Total: 101 queries
```

**Solution: DataLoader**
```javascript
// ✅ GOOD: Batched loading
import DataLoader from 'dataloader'

const createLoaders = () => ({
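  userLoader: new DataLoader(async (userIds) => {
    const users = await User.findAll({ where: { id: userIds } })
    // DataLoader expects results in the same order as the requested keys
    const usersById = new Map(users.map(user => [user.id, user]))
    return userIds.map(id => usersById.get(id) ?? null)
  })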
})

const resolvers = {
  Query: {
    posts: () => Post.findAll()
  },
  Post: {
    author: (post, args, context) => {
      return context.loaders.userLoader.load(post.authorId)
    }
  }
}

// Batches all user IDs into single query
// Total: 2 queries (posts + users)
```

### Resolver Optimization

```javascript
// Use field selection to optimize database queries
const resolvers = {
  Query: {
    user: async (parent, { id }, context, info) => {
      // Parse requested fields from GraphQL query
      // (getFieldsFromInfo is an application-defined helper that walks info.fieldNodes)
      const fields = getFieldsFromInfo(info)

      // Only join the relations the client actually asked for
      const query = User.findById(id)

      if (fields.includes('orders')) {
        query.include('orders')
      }
      if (fields.includes('profile')) {
        query.include('profile')
      }

      return query
    }
  }
}
```

### Caching

```javascript
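import { InMemoryLRUCache } from '@apollo/utils.keyvaluecache'
// Apollo Server 4 package name; Apollo Server 3 uses 'apollo-server-plugin-response-cache'
import responseCachePlugin from '@apollo/server-plugin-response-cache'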

const server = new ApolloServer({
  cache: new InMemoryLRUCache({
    maxSize: 100_000_000, // 100 MB
    ttl: 300 // 5 minutes
  }),
  plugins: [
    responseCachePlugin()
  ]
})

// Cache specific queries
const resolvers = {
  Query: {
    publicUsers: async (parent, args, context, info) => {
      // Cache for 5 minutes
      info.cacheControl.setCacheHint({ maxAge: 300, scope: 'PUBLIC' })
      return User.findAll({ where: { isPublic: true } })
    }
  }
}
```

### Persisted Queries

```javascript
// Client sends query hash instead of full query
// Server looks up full query from hash
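// Benefits:
// - Reduced bandwidth
// - Protection against malicious queries (only when the server rejects unregistered hashes)
// - Query allowlisting (requires a registered operation list; automatic persisted queries alone only save bandwidth)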

const server = new ApolloServer({
  persistedQueries: {
    cache: new InMemoryLRUCache()
  }
})
```

---

## Error Handling

### Error Types

```javascript
import { ApolloError, AuthenticationError, ForbiddenError, UserInputError } from 'apollo-server'

const resolvers = {
  Mutation: {
    createUser: async (parent, { input }, context) => {
      // Validation error
      if (!isValidEmail(input.email)) {
        throw new UserInputError('Invalid email format', {
          invalidArgs: ['email']
        })
      }

      // Authentication error
      if (!context.user) {
        throw new AuthenticationError('Login required')
      }

      // Authorization error
      if (!context.user.isAdmin) {
        throw new ForbiddenError('Admin access required')
      }

      // Business logic error
      const existing = await User.findByEmail(input.email)
      if (existing) {
        throw new ApolloError('Email already exists', 'EMAIL_DUPLICATE')
      }

      // Unexpected error
      try {
        return await User.create(input)
      } catch (error) {
        throw new ApolloError('Failed to create user', 'INTERNAL_ERROR')
      }
    }
  }
}
```

### Error Response Format

```json
{
  "errors": [
    {
      "message": "Email already exists",
      "extensions": {
        "code": "EMAIL_DUPLICATE",
        "field": "email",
        "timestamp": "2024-11-01T10:30:00Z"
      },
      "path": ["createUser"],
      "locations": [{ "line": 2, "column": 3 }]
    }
  ],
  "data": {
    "createUser": null
  }
}
```

### Partial Success

```graphql
mutation BulkCreateUsers($inputs: [CreateUserInput!]!) {
  bulkCreateUsers(inputs: $inputs) {
    successful {
      user {
        id
        email
      }
      index
    }
    failed {
      index
      errors {
        message
        code
      }
    }
  }
}
```

---

## Schema Evolution

### Deprecation

```graphql
type User {
  id: ID!

  # Deprecated field
  name: String @deprecated(reason: "Use firstName and lastName instead")

  # New fields
  firstName: String!
  lastName: String!
}
```

### Adding Fields (Non-Breaking)

```graphql
# Before
type User {
  id: ID!
  email: String!
}

# After (non-breaking change)
type User {
  id: ID!
  email: String!
  phoneNumber: String  # New optional field
}
```

### Breaking Changes (Require New Version)

```graphql
# ❌ BREAKING: Removing field
type User {
  id: ID!
  # email: String!  <- Removed
}

# ❌ BREAKING: Changing field type
type User {
  id: Int!  # Was ID!, now Int!
}

# ❌ BREAKING: Making field non-null
type User {
  phoneNumber: String!  # Was nullable
}

# ✅ SOLUTION: Deprecate old, add new
type User {
  id: ID!
  email: String! @deprecated(reason: "Use primaryEmail")
  primaryEmail: String!
}
```

---

## Best Practices Summary

**Schema Design:**
- [ ] Use clear, descriptive type and field names
- [ ] Default to nullable fields
- [ ] Use Connections pattern for pagination
- [ ] Separate input types from output types

**Security:**
- [ ] Implement query depth limiting (max 5-7 levels)
- [ ] Implement query complexity analysis
- [ ] Field-level authorization
- [ ] Rate limiting on mutations
- [ ] Validate all inputs

**Performance:**
- [ ] Use DataLoader for batched loading
- [ ] Optimize database queries based on field selection
- [ ] Implement caching for expensive queries
- [ ] Consider persisted queries for production

**Error Handling:**
- [ ] Use appropriate error types
- [ ] Include error codes for programmatic handling
- [ ] Don't expose sensitive information in errors
- [ ] Support partial success in bulk operations

**Evolution:**
- [ ] Deprecate fields before removing
- [ ] Avoid breaking changes when possible
- [ ] Version API if breaking changes necessary
- [ ] Maintain changelog

---

## Tools

**Schema Design:**
- GraphQL Inspector (schema diff, breaking change detection)
- GraphQL Voyager (schema visualization)

**Security:**
- graphql-armor (security middleware)
- graphql-depth-limit
- graphql-query-complexity
- GraphQL Shield (authorization layer)

**Performance:**
- DataLoader (batching and caching)
- Apollo Server (caching, tracing)

**Testing:**
- GraphQL Playground
- Altair GraphQL Client
- Apollo Studio

---

**Remember:** GraphQL gives clients great flexibility, but with that comes responsibility to secure, optimize, and maintain your API properly.
```

### reference/security_checklist.md

```markdown
# API Security Checklist

Comprehensive security checklist based on OWASP API Security Top 10 and production best practices.

## Table of Contents
1. [OWASP API Security Top 10](#owasp-api-security-top-10)
2. [Authentication](#authentication)
3. [Authorization](#authorization)
4. [Input Validation](#input-validation)
5. [Data Protection](#data-protection)
6. [Rate Limiting](#rate-limiting)
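7. [Monitoring & Logging](#monitoring--logging)
8. [Security Testing](#security-testing)
9. [Security Checklist Summary](#security-checklist-summary)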

---

## OWASP API Security Top 10

### API1:2023 - Broken Object Level Authorization (BOLA)

**Vulnerability:**
```python
# ❌ VULNERABLE: No authorization check
@app.get("/api/users/{user_id}/orders")
def get_user_orders(user_id: int):
    return db.query(Order).filter(Order.user_id == user_id).all()

# Attacker can access any user's orders by changing user_id
```

**Fix:**
```python
# ✅ SECURE: Verify ownership
@app.get("/api/users/{user_id}/orders")
def get_user_orders(user_id: int, current_user: User = Depends(get_current_user)):
    if current_user.id != user_id and not current_user.is_admin:
        raise HTTPException(status_code=403, detail="Access denied")
    return db.query(Order).filter(Order.user_id == user_id).all()
```

**Checklist:**
- [ ] Every resource access checks ownership or permissions
- [ ] Authorization happens on the server, never client-side
- [ ] Default deny (require explicit permission grants)
- [ ] Test with different users to ensure isolation

---

### API2:2023 - Broken Authentication

**Common Issues:**
- Weak password requirements
- No rate limiting on auth endpoints
- Tokens without expiration
- Predictable tokens (sequential, timestamp-based)
- Credentials in URLs

**Secure Authentication:**
```python
# Password requirements
MIN_PASSWORD_LENGTH = 12
REQUIRE_UPPERCASE = True
REQUIRE_LOWERCASE = True
REQUIRE_DIGITS = True
REQUIRE_SPECIAL_CHARS = True

# JWT configuration
JWT_EXPIRATION = 3600  # 1 hour
REFRESH_TOKEN_EXPIRATION = 2592000  # 30 days

# Token generation
import secrets
api_key = secrets.token_urlsafe(32)  # Cryptographically secure
```

**Checklist:**
- [ ] Passwords hashed with bcrypt/argon2 (never plaintext)
- [ ] Minimum password length (12+ characters)
- [ ] Rate limiting on login/signup endpoints
- [ ] JWT tokens have expiration (`exp` claim)
- [ ] Refresh token rotation implemented
- [ ] MFA supported for sensitive operations
- [ ] No credentials in URLs (use headers)
- [ ] Session timeout after inactivity

---

### API3:2023 - Broken Object Property Level Authorization

**Vulnerability: Mass Assignment**
```python
# ❌ VULNERABLE: User can set any field
@app.patch("/api/users/{id}")
def update_user(id: int, user_data: dict):
    user = db.query(User).get(id)
    for key, value in user_data.items():
        setattr(user, key, value)  # Dangerous!
    db.commit()

# Attacker sends: {"is_admin": true, "balance": 1000000}
```

**Fix:**
```python
# ✅ SECURE: Whitelist allowed fields
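from typing import Optional
from pydantic import BaseModel, EmailStr

class UserUpdateSchema(BaseModel):
    name: Optional[str] = None
    email: Optional[EmailStr] = None
    # is_admin NOT included (can't be set via API)

@app.patch("/api/users/{id}")
def update_user(id: int, user_data: UserUpdateSchema):
    user = db.query(User).get(id)
    # Apply only the fields the client actually sent, so omitted fields are not reset to None
    # (.dict() is the pydantic v1 API; pydantic v2 uses .model_dump())
    for key, value in user_data.dict(exclude_unset=True).items():
        setattr(user, key, value)
    db.commit()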
```

**Checklist:**
- [ ] Use DTOs/schemas to whitelist allowed fields
- [ ] Separate read vs write schemas
- [ ] Admin-only fields not settable by normal users
- [ ] Sensitive fields excluded from responses (passwords, tokens)
- [ ] Test: Try sending extra fields (should be ignored)

---

### API4:2023 - Unrestricted Resource Consumption

**Vulnerability: No Limits**
```python
# ❌ VULNERABLE: Unbounded query
@app.get("/api/users")
def get_users(limit: int = 100):  # User can request 1 million
    return db.query(User).limit(limit).all()
```

**Fix:**
```python
# ✅ SECURE: Enforce maximum
MAX_PAGE_SIZE = 100

@app.get("/api/users")
def get_users(limit: int = 20):
    if limit < 1 or limit > MAX_PAGE_SIZE:
        raise HTTPException(400, f"Limit must be between 1 and {MAX_PAGE_SIZE}")
    return db.query(User).limit(limit).all()
```

**Checklist:**
- [ ] Maximum page size enforced (50-100 typical)
- [ ] Request timeout configured (5-30s)
- [ ] Request body size limit (1-10MB)
- [ ] File upload size limit (10-100MB)
- [ ] Rate limiting per IP/user
- [ ] Expensive operations require authentication
- [ ] No unbounded collections

---

### API5:2023 - Broken Function Level Authorization

**Vulnerability: Missing Role Check**
```python
# ❌ VULNERABLE: Any authenticated user can delete
@app.delete("/api/users/{id}")
def delete_user(id: int, current_user: User = Depends(get_current_user)):
    db.query(User).filter(User.id == id).delete()
    db.commit()
```

**Fix:**
```python
# ✅ SECURE: Check admin role
@app.delete("/api/users/{id}")
def delete_user(id: int, current_user: User = Depends(get_current_user)):
    if not current_user.is_admin:
        raise HTTPException(403, "Admin access required")
    db.query(User).filter(User.id == id).delete()
    db.commit()
```

**Checklist:**
- [ ] Admin operations require admin role check
- [ ] Role/permission checks in every sensitive endpoint
- [ ] Least privilege principle (minimum permissions needed)
- [ ] Test with different user roles
- [ ] Authorization middleware at application level

---

### API6:2023 - Unrestricted Access to Sensitive Business Flows

**Vulnerability: No Anti-Automation**
```python
# ❌ VULNERABLE: Can be automated
@app.post("/api/tickets/purchase")
def purchase_ticket(ticket_id: int):
    # Bots can buy all tickets instantly
    return process_purchase(ticket_id)
```

**Fix:**
```python
# ✅ SECURE: Rate limiting + CAPTCHA
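from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

@app.post("/api/tickets/purchase")
@limiter.limit("3/minute")
# slowapi requires the endpoint to accept the request argument
def purchase_ticket(request: Request, ticket_id: int, captcha_token: str):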
    if not verify_captcha(captcha_token):
        raise HTTPException(400, "Invalid CAPTCHA")
    return process_purchase(ticket_id)
```

**Checklist:**
- [ ] CAPTCHA on sensitive operations (signup, purchase)
- [ ] Rate limiting on critical endpoints
- [ ] Anomaly detection for unusual patterns
- [ ] Device fingerprinting for fraud detection
- [ ] Require email/phone verification
- [ ] Implement waiting rooms for high-demand events

---

### API7:2023 - Server Side Request Forgery (SSRF)

**Vulnerability: Unvalidated URL**
```python
# ❌ VULNERABLE: Can access internal services
@app.post("/api/fetch")
def fetch_url(url: str):
    response = requests.get(url)  # Dangerous!
    return response.text

# Attacker sends: "http://localhost:6379/admin"
# or "http://169.254.169.254/latest/meta-data/" (AWS metadata)
```

**Fix:**
```python
# ✅ SECURE: Whitelist allowed domains
from urllib.parse import urlparse

ALLOWED_DOMAINS = ["example.com", "api.partner.com"]

@app.post("/api/fetch")
def fetch_url(url: str):
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise HTTPException(400, "Protocol not allowed")
    if parsed.hostname not in ALLOWED_DOMAINS:
        raise HTTPException(400, "Domain not allowed")
    if parsed.hostname in ["localhost", "127.0.0.1", "0.0.0.0"]:
        raise HTTPException(400, "Cannot access local resources")
    # Also block private IP ranges (10.x, 172.16-31.x, 192.168.x, 169.254.x)
    # Redirects are disabled so an allowed host cannot bounce the request elsewhere
    response = requests.get(url, timeout=5, allow_redirects=False)
    return response.text
```

**Checklist:**
- [ ] Whitelist allowed domains/protocols
- [ ] Block localhost and private IP ranges
- [ ] Disable redirects or validate redirect targets
- [ ] Use network segmentation (API can't access internal services)
- [ ] Timeout on external requests
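
The "block localhost and private IP ranges" item can be sketched with the standard-library `ipaddress` module. This is a minimal illustration, not a complete SSRF defense; the function name `is_public_address` is hypothetical, and it only inspects an IP literal that has already been resolved.

```python
import ipaddress


def is_public_address(ip_text: str) -> bool:
    """Return True only for addresses that look safe to call from the API host."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # Not an IP literal at all: reject rather than guess
    return not (
        ip.is_private          # 10.x, 172.16-31.x, 192.168.x, and similar
        or ip.is_loopback      # 127.0.0.0/8, ::1
        or ip.is_link_local    # 169.254.x (cloud metadata endpoints live here)
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified   # 0.0.0.0, ::
    )
```

In practice the hostname would first be resolved (for example with `socket.getaddrinfo`), every returned address checked, and the request sent to the vetted address so a second DNS lookup cannot return something different.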

---

### API8:2023 - Security Misconfiguration

**Common Misconfigurations:**
```python
# ❌ BAD: Development settings in production
DEBUG = True
CORS_ALLOW_ALL_ORIGINS = True
SSL_VERIFY = False
SECRET_KEY = "default-secret-key"

# ✅ GOOD: Production settings
DEBUG = False
CORS_ALLOWED_ORIGINS = ["https://example.com"]
SSL_VERIFY = True
SECRET_KEY = os.environ["SECRET_KEY"]  # From environment
```

**Checklist:**
- [ ] Debug mode disabled in production
- [ ] CORS properly configured (not `*`)
- [ ] HTTPS enforced (HSTS header)
- [ ] Security headers configured
- [ ] Default credentials changed
- [ ] Error messages don't expose internals
- [ ] Stack traces not sent to clients
- [ ] Unnecessary HTTP methods disabled

**Security Headers:**
```python
response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
response.headers["X-Content-Type-Options"] = "nosniff"
response.headers["X-Frame-Options"] = "DENY"
response.headers["Content-Security-Policy"] = "default-src 'self'"
response.headers["X-XSS-Protection"] = "0"  # Legacy browser XSS filter is deprecated; rely on CSP instead
```

---

### API9:2023 - Improper Inventory Management

**Checklist:**
- [ ] API documentation complete and up-to-date
- [ ] All endpoints documented (including deprecated)
- [ ] API versioning strategy in place
- [ ] Deprecated endpoints have sunset dates
- [ ] Non-production environments secured
- [ ] Test/staging APIs not accessible publicly
- [ ] API gateway/proxy for centralized control
- [ ] Inventory of all API endpoints maintained

**Tools:**
- OpenAPI/Swagger spec generation
- API gateway (Kong, Apigee, AWS API Gateway)
- Security scanning (OWASP ZAP, Burp Suite)

---

### API10:2023 - Unsafe Consumption of APIs

**Vulnerability: Trusting External Data**
```python
# ❌ VULNERABLE: No validation of external API data
@app.get("/api/weather")
def get_weather(city: str):
    external_data = requests.get(f"https://weather-api.com/data?city={city}").json()
    # Directly using external data without validation
    return external_data
```

**Fix:**
```python
# ✅ SECURE: Validate and sanitize
from pydantic import BaseModel, validator

class WeatherResponse(BaseModel):
    temperature: float
    humidity: int
    conditions: str

    @validator('temperature')
    def temp_must_be_reasonable(cls, v):
        if not -100 <= v <= 100:
            raise ValueError('Temperature out of range')
        return v

@app.get("/api/weather")
def get_weather(city: str):
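    response = requests.get(
        "https://weather-api.com/data",
        params={"city": city},
        timeout=5
    )
    response.raise_for_status()  # Fail fast on upstream errors instead of parsing an error body
    external_data = response.json()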
    # Validate before returning
    validated = WeatherResponse(**external_data)
    return validated
```

**Checklist:**
- [ ] Validate all external API responses
- [ ] Timeouts on external requests
- [ ] Certificate verification enabled
- [ ] Sanitize data before storing/displaying
- [ ] Rate limiting on external API calls
- [ ] Circuit breaker for unreliable services
- [ ] Don't trust external redirects
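
The "circuit breaker for unreliable services" item could look roughly like the sketch below. The class names, thresholds, and the injectable `clock` parameter are assumptions made for illustration; production services usually rely on a maintained library rather than a hand-rolled breaker.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are being rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("Upstream temporarily disabled")
            # Timeout elapsed: let one trial call through (half-open state)
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # Open (or re-open) the circuit
            raise
        # A success closes the circuit and resets the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

An external call would then be wrapped as `breaker.call(requests.get, url, timeout=5)`, with `CircuitOpenError` mapped to a 503 or a cached fallback.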

---

## Authentication

### Password Security

**Hashing:**
```python
import bcrypt

# Hashing password
password_hash = bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))

# Verifying password
is_valid = bcrypt.checkpw(password.encode(), stored_hash)
```

**Requirements:**
- [ ] Minimum 12 characters
- [ ] Complexity requirements (upper, lower, digit, special)
- [ ] No common passwords (use blocklist)
- [ ] No user info in password (name, email)
- [ ] Password history (can't reuse last 5)
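
A minimal sketch of how most of these requirements might be checked at signup. The function name `password_problems` and the tiny `COMMON_PASSWORDS` set are illustrative assumptions; a real blocklist would be loaded from a large breached-password list, and password history needs stored hashes, so it is left out here.

```python
import string

# Illustrative blocklist only; real deployments load a much larger list
COMMON_PASSWORDS = {"password12345", "qwerty1234567", "letmein123456"}


def password_problems(password: str, username: str, email: str, min_length: int = 12) -> list:
    """Return a list of problems; an empty list means the password passes."""
    problems = []
    lowered = password.lower()
    if len(password) < min_length:
        problems.append("too short")
    if not any(c.isupper() for c in password):
        problems.append("missing uppercase letter")
    if not any(c.islower() for c in password):
        problems.append("missing lowercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("missing digit")
    if not any(c in string.punctuation for c in password):
        problems.append("missing special character")
    if lowered in COMMON_PASSWORDS:
        problems.append("too common")
    # Reject passwords that embed the username or the local part of the email
    for fragment in (username, email.split("@")[0]):
        if len(fragment) >= 3 and fragment.lower() in lowered:
            problems.append("contains user info")
            break
    return problems
```

Returning every problem at once lets the API report all failures in a single validation response instead of one per attempt.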

### Token Security

**JWT Best Practices:**
```python
import secrets
from datetime import datetime, timedelta, timezone

import jwt

# Generate JWT
now = datetime.now(timezone.utc)  # Timezone-aware; datetime.utcnow() is deprecated
payload = {
    "user_id": 123,
    "exp": now + timedelta(hours=1),
    "iat": now,
    "jti": secrets.token_urlsafe(16)  # Unique token ID
}
token = jwt.encode(payload, SECRET_KEY, algorithm="HS256")

# Verify JWT
try:
    decoded = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
except jwt.ExpiredSignatureError:
    raise HTTPException(401, "Token expired")
except jwt.InvalidTokenError:
    raise HTTPException(401, "Invalid token")
```

**Checklist:**
- [ ] Short expiration (1 hour for access tokens)
- [ ] Refresh token rotation
- [ ] Token revocation mechanism
- [ ] Unique token ID (`jti` claim) for blacklisting
- [ ] Signed with strong algorithm (HS256, RS256)
- [ ] Secret key stored securely (not in code)
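
The revocation and `jti` items can be illustrated with an in-memory denylist. This is a sketch under stated assumptions: the `TokenDenylist` class is hypothetical, and a multi-instance deployment would keep the same data in a shared store such as Redis with a matching TTL.

```python
import time


class TokenDenylist:
    def __init__(self, clock=time.time):
        self._revoked = {}  # jti -> token expiry (Unix timestamp)
        self._clock = clock

    def revoke(self, jti: str, expires_at: float) -> None:
        """Remember a revoked token until it would have expired anyway."""
        self._revoked[jti] = expires_at

    def is_revoked(self, jti: str) -> bool:
        self._purge_expired()
        return jti in self._revoked

    def _purge_expired(self) -> None:
        # Expired tokens fail signature validation on their own, so they can be dropped
        now = self._clock()
        for jti in [j for j, exp in self._revoked.items() if exp <= now]:
            del self._revoked[jti]
```

After `jwt.decode` succeeds, the request handler would reject the token when `denylist.is_revoked(decoded["jti"])` is true; logout and password changes call `revoke` with the token's `exp`.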

---

## Authorization

### Role-Based Access Control (RBAC)

```python
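from enum import IntEnum
from functools import wraps

class Role(IntEnum):
    # Higher value = more privileges, so roles compare numerically
    # (comparing string values such as "user" < "admin" would order them alphabetically)
    USER = 1
    MODERATOR = 2
    ADMIN = 3

def require_role(required_role: Role):
    def decorator(func):
        @wraps(func)  # Preserve the signature so the framework can still inject arguments
        def wrapper(*args, current_user: User, **kwargs):
            if current_user.role < required_role:
                raise HTTPException(403, "Insufficient permissions")
            return func(*args, current_user=current_user, **kwargs)
        return wrapper
    return decorator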

@app.delete("/api/users/{id}")
@require_role(Role.ADMIN)
def delete_user(id: int, current_user: User):
    # Only admins can delete users
    pass
```

### Attribute-Based Access Control (ABAC)

```python
def can_edit_post(user: User, post: Post) -> bool:
    # Post author can edit
    if post.author_id == user.id:
        return True
    # Moderators can edit any post
    if user.role == Role.MODERATOR:
        return True
    # Admins can edit everything
    if user.role == Role.ADMIN:
        return True
    return False
```

**Checklist:**
- [ ] Authorization checks on every sensitive operation
- [ ] Principle of least privilege
- [ ] Separate read/write permissions
- [ ] Resource-level permissions (not just endpoint-level)
- [ ] Test with different user roles and scenarios

---

## Input Validation

### SQL Injection Prevention

```python
# ❌ NEVER: String concatenation
query = f"SELECT * FROM users WHERE id = {user_id}"  # VULNERABLE!

# ✅ ALWAYS: Parameterized queries
query = "SELECT * FROM users WHERE id = ?"
cursor.execute(query, (user_id,))

# ✅ OR: Use ORM
user = db.query(User).filter(User.id == user_id).first()
```

### Input Validation

```python
from pydantic import BaseModel, EmailStr, validator, constr

class UserCreate(BaseModel):
    email: EmailStr  # Validates email format
    username: constr(min_length=3, max_length=30, regex="^[a-zA-Z0-9_]+$")
    age: int

    @validator('age')
    def age_must_be_valid(cls, v):
        if not 0 <= v <= 150:
            raise ValueError('Age must be between 0 and 150')
        return v

    @validator('username')
    def username_no_profanity(cls, v):
        if contains_profanity(v):
            raise ValueError('Username contains inappropriate content')
        return v
```

**Validation Checklist:**
- [ ] Type validation (string, int, email, UUID)
- [ ] Length limits (min/max)
- [ ] Format validation (regex patterns)
- [ ] Range validation (min/max values)
- [ ] Enum validation (allowed values)
- [ ] Business rule validation
- [ ] Sanitization (remove/escape dangerous characters)

### File Upload Security

```python
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg', 'pdf'}
MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB

def validate_file_upload(file):
    # Check extension
    ext = file.filename.split('.')[-1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise HTTPException(400, "File type not allowed")

    # Check size
    file.seek(0, 2)  # Seek to end
    size = file.tell()
    file.seek(0)  # Reset
    if size > MAX_FILE_SIZE:
        raise HTTPException(400, "File too large")

    # Check content type (don't trust client)
    import magic
    mime = magic.from_buffer(file.read(2048), mime=True)
    file.seek(0)
    if mime not in ['image/png', 'image/jpeg', 'application/pdf']:
        raise HTTPException(400, "Invalid file content")

    # Virus scan (in production)
    # scan_result = antivirus.scan(file)

    return True
```

**File Upload Checklist:**
- [ ] File type whitelist (not blacklist)
- [ ] File size limit
- [ ] Content-type verification (check actual content)
- [ ] Virus scanning
- [ ] Store outside web root
- [ ] Randomize filenames (prevent overwrite)
- [ ] Serve files with correct Content-Type
- [ ] Set Content-Disposition: attachment for downloads
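
The "randomize filenames" and "store outside web root" items might be combined as in the sketch below. The helper name `storage_path_for` and the upload directory are assumptions for illustration; the point is that the client-supplied name contributes nothing except an allowlisted extension.

```python
import secrets
from pathlib import Path

ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "pdf"}


def storage_path_for(upload_dir: Path, original_filename: str) -> Path:
    """Build a storage path that ignores everything in the client filename but its extension."""
    extension = Path(original_filename).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("File type not allowed")
    # A random name prevents overwrites and path traversal via crafted filenames
    random_name = f"{secrets.token_hex(16)}.{extension}"
    return upload_dir / random_name
```

The original filename can still be stored in the database for display, and returned through `Content-Disposition` when the file is downloaded.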

---

## Data Protection

### Encryption

**At Rest:**
- [ ] Database encryption enabled
- [ ] Sensitive fields encrypted (SSN, credit cards)
- [ ] Encryption keys rotated regularly
- [ ] Keys stored in vault (AWS KMS, HashiCorp Vault)

**In Transit:**
- [ ] HTTPS enforced everywhere
- [ ] TLS 1.2+ only (disable SSL, TLS 1.0/1.1)
- [ ] Strong cipher suites
- [ ] Certificate pinning (mobile apps)

### Sensitive Data Handling

```python
# ❌ BAD: Sensitive data in logs/URLs
logger.info(f"User {user.email} logged in with password {password}")
url = f"/reset-password?token={reset_token}"

# ✅ GOOD: Redacted logs, tokens in body/headers
logger.info(f"User ***@{user.email.split('@')[1]} logged in")
# POST /reset-password with token in body

# ✅ GOOD: Exclude from responses
class UserResponse(BaseModel):
    id: int
    email: str
    name: str
    # password and password_hash are deliberately not declared,
    # so they can never be serialized into a response
```

**Sensitive Data Checklist:**
- [ ] PII identified and protected
- [ ] Passwords never stored in plaintext
- [ ] Credit cards tokenized (don't store)
- [ ] API keys/secrets in environment variables
- [ ] Secrets not in version control
- [ ] Sensitive data redacted from logs
- [ ] Data retention policy (delete old data)
- [ ] GDPR compliance (right to deletion)
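
Redacting sensitive data from logs can be enforced centrally rather than at each call site. The sketch below uses a standard `logging.Filter`; the `RedactingFilter` name and the key list in the pattern are assumptions, and it only catches `key=value` style text, not structured fields.

```python
import logging
import re

# Matches key=value pairs whose key suggests a secret (illustrative key list)
SENSITIVE = re.compile(r"(?i)\b(password|token|api_key|secret)=([^\s&]+)")


class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so %-style arguments are covered too
        message = record.getMessage()
        record.msg = SENSITIVE.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)
        record.args = ()
        return True  # Keep the record, just with secrets masked
```

Attaching it with `handler.addFilter(RedactingFilter())` applies the masking to every record that handler emits.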

---

## Rate Limiting

### Implementation

```python
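from fastapi import Request
from slowapi import Limiter
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

def get_user_key(request: Request) -> str:
    # Assumes auth middleware has stored the authenticated user on request.state
    return str(request.state.user.id)

# Per IP address (slowapi requires the endpoint to accept the request argument)
@app.get("/api/public")
@limiter.limit("100/minute")
def public_endpoint(request: Request):
    pass

# Per authenticated user
@app.get("/api/data")
@limiter.limit("1000/hour", key_func=get_user_key)
def data_endpoint(request: Request, current_user: User):
    pass

# Expensive operation
@app.post("/api/reports/generate")
@limiter.limit("5/hour")
def generate_report(request: Request):
    pass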
```

### Rate Limit Response

```http
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1635724860
Retry-After: 45

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Rate limit exceeded. Try again in 45 seconds."
  }
}
```

**Rate Limiting Checklist:**
- [ ] Per IP address limits (prevent abuse)
- [ ] Per user limits (authenticated)
- [ ] Different limits for different endpoints
- [ ] Stricter limits on auth endpoints (prevent brute force)
- [ ] Stricter limits on expensive operations
- [ ] Rate limit info in response headers
- [ ] Retry-After header when limited
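
To show how the response headers relate to the limiter state, here is a small fixed-window limiter. It is a sketch only: the `FixedWindowLimiter` class is hypothetical, state lives in process memory, and a fixed window allows short bursts at window boundaries that a sliding window or token bucket would smooth out.

```python
import time


class FixedWindowLimiter:
    def __init__(self, limit: int, window_seconds: int, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counts = {}  # key -> (window_start, request_count)

    def check(self, key: str):
        """Return (allowed, headers) for one request from the given key."""
        now = self.clock()
        window_start = int(now // self.window) * self.window
        start, count = self._counts.get(key, (window_start, 0))
        if start != window_start:
            start, count = window_start, 0  # A new window has begun
        reset_at = start + self.window
        allowed = count < self.limit
        if allowed:
            count += 1
        self._counts[key] = (start, count)
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - count, 0)),
            "X-RateLimit-Reset": str(int(reset_at)),
        }
        if not allowed:
            headers["Retry-After"] = str(max(int(reset_at - now), 1))
        return allowed, headers
```

When `allowed` is false the API would respond with 429 and the returned headers; otherwise the headers are attached to the normal response.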

---

## Monitoring & Logging

### Security Logging

```python
import logging
from datetime import datetime

logger = logging.getLogger(__name__)

# Log security events
def log_auth_failure(username: str, ip: str, reason: str):
    logger.warning(
        "Authentication failed",
        extra={
            "event": "auth_failure",
            "username": username,  # Don't log if PII concern
            "ip": ip,
            "reason": reason,
            "timestamp": datetime.utcnow().isoformat()
        }
    )

# Log suspicious activity
def log_suspicious_activity(user_id: int, action: str, details: dict):
    logger.warning(
        "Suspicious activity detected",
        extra={
            "event": "suspicious_activity",
            "user_id": user_id,
            "action": action,
            "details": details,
            "timestamp": datetime.utcnow().isoformat()
        }
    )
```

**Events to Log:**
- [ ] Authentication attempts (success/failure)
- [ ] Authorization failures
- [ ] Input validation failures
- [ ] Rate limit violations
- [ ] Suspicious patterns (rapid changes, unusual access)
- [ ] Admin actions
- [ ] Data access (especially sensitive data)
- [ ] Configuration changes

**What NOT to Log:**
- [ ] Passwords (even hashed)
- [ ] API keys/tokens
- [ ] Credit card numbers
- [ ] SSNs or other PII (unless required)

### Alerting

**Alert On:**
- [ ] Repeated authentication failures
- [ ] Privilege escalation attempts
- [ ] Unusual data access patterns
- [ ] Configuration changes
- [ ] Error rate spikes
- [ ] Latency increases
- [ ] Security scan attempts
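
The "repeated authentication failures" alert can be sketched as a sliding-window counter. The `AuthFailureMonitor` class, its threshold, and its window are illustrative assumptions; real systems usually compute this in the log pipeline or SIEM rather than in the API process.

```python
from collections import defaultdict, deque


class AuthFailureMonitor:
    def __init__(self, threshold: int = 5, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # key (IP or username) -> failure timestamps

    def record_failure(self, key: str, timestamp: float) -> bool:
        """Record one failure; return True when the key has reached the alert threshold."""
        events = self._failures[key]
        events.append(timestamp)
        # Drop failures that have fallen out of the sliding window
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

A login handler would call `record_failure(ip, time.time())` on each failed attempt and raise an alert, or temporarily block the key, when it returns `True`.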

---

## Security Testing

### Automated Scanning

**Tools:**
- OWASP ZAP (API security scanner)
- Burp Suite (web vulnerability scanner)
- Nikto (web server scanner)
- SQLMap (SQL injection testing)

### Manual Testing

**Test Cases:**
- [ ] Authentication bypass
- [ ] Authorization bypass (BOLA)
- [ ] SQL injection (all inputs)
- [ ] XSS (if returning HTML)
- [ ] SSRF (URL parameters)
- [ ] Mass assignment
- [ ] Rate limit enforcement
- [ ] Token expiration
- [ ] CORS misconfiguration
- [ ] Information disclosure

---

## Security Checklist Summary

**Critical (Block Launch):**
- [ ] All endpoints require authentication (except public ones)
- [ ] Authorization checks on all resources
- [ ] No SQL injection vulnerabilities
- [ ] Passwords hashed with bcrypt/argon2
- [ ] HTTPS enforced everywhere
- [ ] Rate limiting on auth endpoints
- [ ] Input validation on all endpoints
- [ ] Security headers configured

**High Priority (Fix Soon):**
- [ ] JWT tokens expire within 1 hour
- [ ] Rate limiting on all endpoints
- [ ] File upload validation
- [ ] CORS properly configured
- [ ] Error messages don't expose internals
- [ ] Sensitive data excluded from logs
- [ ] Monitoring and alerting configured

**Medium Priority:**
- [ ] Refresh token rotation
- [ ] Circuit breakers on external APIs
- [ ] Security logging comprehensive
- [ ] GDPR compliance (data retention)
- [ ] API documentation complete
- [ ] Deprecation strategy for old endpoints

---

**Remember:** Security is not a one-time task. Regularly audit your APIs, stay updated on vulnerabilities, and always assume attackers are probing for weaknesses.
```

### reference/performance_guide.md

```markdown
# API Performance & Scaling Guide

Guide to building fast, scalable APIs that handle production load.

## Table of Contents
1. [Database Optimization](#database-optimization)
2. [Caching Strategies](#caching-strategies)
3. [Response Optimization](#response-optimization)
4. [Scaling Patterns](#scaling-patterns)
5. [Monitoring](#monitoring)

---

## Database Optimization

### N+1 Query Problem

**The Problem:**
```python
# ❌ BAD: N+1 queries
@app.get("/api/posts")
def get_posts():
    posts = db.query(Post).all()  # 1 query
    for post in posts:
        post.author = db.query(User).get(post.author_id)  # N queries
    return posts

# For 100 posts: 1 + 100 = 101 database queries!
```

**Solution: Eager Loading**
```python
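# ✅ GOOD: 1 query total (posts JOIN users)
from sqlalchemy.orm import joinedload

@app.get("/api/posts")
def get_posts():
    posts = db.query(Post).options(
        joinedload(Post.author)  # Eager load author via JOIN
    ).all()
    return posts

# Total: 1 query; selectinload(Post.author) would instead issue 2 (posts, then users)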
```

**Solution: Batch Loading**
```python
# ✅ GOOD: Batch load related resources
@app.get("/api/posts")
def get_posts():
    posts = db.query(Post).all()  # 1 query
    author_ids = [p.author_id for p in posts]
    authors = db.query(User).filter(User.id.in_(author_ids)).all()  # 1 query
    author_map = {a.id: a for a in authors}

    for post in posts:
        post.author = author_map[post.author_id]

    return posts
```

### Database Indexes

```sql
-- ❌ Slow query without index
SELECT * FROM users WHERE email = 'user@example.com';
-- Full table scan: O(n)

-- ✅ Fast with index
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';
-- Index lookup: O(log n)
```

**Indexing Strategy:**
```sql
-- Primary key (automatic in most DBs)
CREATE TABLE users (
  id SERIAL PRIMARY KEY
);

-- Foreign keys (for joins)
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Frequently filtered columns
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_created_at ON users(created_at);

-- Composite index for multi-column queries
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Partial index (smaller, faster)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
```

**When to Index:**
- [ ] Primary keys (auto)
- [ ] Foreign keys (joins)
- [ ] Columns in WHERE clauses
- [ ] Columns in ORDER BY
- [ ] Columns in GROUP BY

**When NOT to Index:**
- [ ] Small tables (< 1000 rows)
- [ ] Columns with low cardinality (e.g., boolean)
- [ ] Frequently updated columns (index overhead)

### Query Optimization

```python
# ❌ BAD: Fetching all columns
users = db.query(User).all()

# ✅ GOOD: Select only needed columns
users = db.query(User.id, User.name, User.email).all()

# ❌ BAD: Loading entire collection
all_orders = user.orders  # Loads all orders into memory

# ✅ GOOD: Paginate (assumes the relationship uses lazy="dynamic" so it returns a query)
recent_orders = user.orders.order_by(Order.created_at.desc()).limit(10)

# ❌ BAD: Counting with full query
count = len(db.query(User).all())

# ✅ GOOD: Use COUNT (func is imported with: from sqlalchemy import func)
count = db.query(func.count(User.id)).scalar()
```

### Connection Pooling

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# ✅ GOOD: Connection pool configuration
engine = create_engine(
    "postgresql://user:pass@localhost/db",
    poolclass=QueuePool,
    pool_size=20,          # Normal connections
    max_overflow=10,       # Extra connections under load
    pool_timeout=30,       # Wait up to 30s for connection
    pool_recycle=3600,     # Recycle connections every hour
    pool_pre_ping=True     # Test connection before using
)
```

---

## Caching Strategies

### HTTP Caching

**ETag (Entity Tag):**
```python
from hashlib import md5

@app.get("/api/users/{id}")
def get_user(id: int, request: Request):
    user = db.query(User).get(id)
    user_json = user.to_json()

    # Generate ETag from content (ETag values are quoted strings per the HTTP spec)
    etag = f'"{md5(user_json.encode()).hexdigest()}"'

    # Check If-None-Match header
    if request.headers.get("If-None-Match") == etag:
        return Response(status_code=304)  # Not Modified

    return Response(
        content=user_json,
        media_type="application/json",
        headers={
            "ETag": etag,
            "Cache-Control": "max-age=60"
        }
    )
```

**Cache-Control Headers:**
```python
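from fastapi.responses import JSONResponse

@app.get("/api/products")
def get_products():
    products = db.query(Product).all()
    return JSONResponse(
        # ORM objects must be serialized first (to_dict as used elsewhere in this guide)
        content=[p.to_dict() for p in products],
        headers={
            # Cache for 5 minutes
            "Cache-Control": "public, max-age=300",
            # Or: private, no-cache, no-store, must-revalidate
        }
    )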
```

### Application-Level Caching

**In-Memory Cache (Redis):**
```python
import redis
import json

cache = redis.Redis(host='localhost', port=6379, decode_responses=True)

@app.get("/api/users/{id}")
def get_user(id: int):
    # Try cache first
    cache_key = f"user:{id}"
    cached = cache.get(cache_key)

    if cached:
        return json.loads(cached)

    # Cache miss: Query database
    user = db.query(User).get(id)
    user_dict = user.to_dict()

    # Store in cache (TTL: 5 minutes)
    cache.setex(cache_key, 300, json.dumps(user_dict))

    return user_dict
```

**Cache Invalidation:**
```python
@app.patch("/api/users/{id}")
def update_user(id: int, data: UserUpdate):
    # Update database
    user = db.query(User).get(id)
    user.name = data.name
    db.commit()

    # Invalidate cache
    cache.delete(f"user:{id}")

    return user
```

**Cache-Aside Pattern:**
```python
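def get_user_with_cache(user_id: int):
    # 1. Check cache
    cached = cache.get(f"user:{user_id}")
    if cached:
        # Cached values are JSON strings; decode so hits and misses return the same type
        return json.loads(cached)

    # 2. Cache miss: Load from DB
    user = db.query(User).get(user_id)
    if user is None:
        return None

    # 3. Store in cache (TTL: 5 minutes)
    user_dict = user.to_dict()
    cache.setex(f"user:{user_id}", 300, json.dumps(user_dict))

    return user_dict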
```

### Cache TTL Guidelines

| Strategy | When to Use | TTL |
|----------|-------------|-----|
| **No Cache** | Sensitive data, real-time | 0 |
| **Short Cache** | User profiles, dashboards | 1-5 min |
| **Medium Cache** | Product listings, blog posts | 10-30 min |
| **Long Cache** | Static content, reference data | 1-24 hours |
| **Immutable** | Versioned assets (JS, CSS, images) | 1 year |
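
As a rough illustration, the strategies in the table can be mapped to `Cache-Control` header values. The strategy keys, the helper name `cache_control_for`, and the specific `max-age` values are assumptions chosen from within each TTL range, not fixed recommendations.

```python
# Hypothetical mapping from the strategies above to Cache-Control values.
# The max-age numbers are example picks from within each TTL range.
CACHE_POLICIES = {
    "none": "no-store",                                   # sensitive or real-time data
    "short": "private, max-age=300",                      # 5 minutes, per-user data
    "medium": "public, max-age=1800",                     # 30 minutes
    "long": "public, max-age=86400",                      # 24 hours
    "immutable": "public, max-age=31536000, immutable",   # 1 year, versioned assets
}

def cache_control_for(strategy: str) -> str:
    """Return a Cache-Control value; unknown strategies fall back to no-store."""
    return CACHE_POLICIES.get(strategy, "no-store")

print(cache_control_for("medium"))
```

Falling back to `no-store` for unknown strategies is a deliberately conservative choice in this sketch: a typo results in no caching rather than accidental caching of sensitive data.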

---

## Response Optimization

### Compression

```python
from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1000)

# Skips responses smaller than minimum_size (in bytes); larger ones are compressed
# Typical savings: 60-80% for JSON
```

**Compression Comparison:**
```
Original JSON: 100 KB
Gzip:          20 KB (80% reduction)
Brotli:        15 KB (85% reduction)
```
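
The figures above are illustrative, and real ratios depend heavily on the payload. A quick way to measure them for your own responses is sketched below using the standard-library `gzip` module. The sample payload is hypothetical, and Brotli is omitted because it requires a third-party package.

```python
import gzip
import json

# Hypothetical payload: a list of similar records, as a list endpoint might return
payload = json.dumps(
    [{"id": i, "name": f"user{i}", "email": f"user{i}@example.com"} for i in range(1000)]
).encode("utf-8")

compressed = gzip.compress(payload)
reduction = 1 - len(compressed) / len(payload)

print(f"original: {len(payload)} bytes")
print(f"gzip:     {len(compressed)} bytes ({reduction:.0%} reduction)")
```

Repetitive JSON like this sample usually compresses well; small or already-compressed payloads (images, archives) gain little.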

### Pagination

**Cursor-Based (Recommended):**
```python
@app.get("/api/users")
def get_users(cursor: Optional[str] = None, limit: int = 50):
    if limit > 100:
        raise HTTPException(400, "Maximum limit is 100")

    query = db.query(User).order_by(User.id)

    if cursor:
        # Decode cursor (base64-encoded last ID)
        last_id = decode_cursor(cursor)
        query = query.filter(User.id > last_id)

    users = query.limit(limit + 1).all()

    has_more = len(users) > limit
    if has_more:
        users = users[:limit]

    next_cursor = encode_cursor(users[-1].id) if has_more else None

    return {
        "data": users,
        "pagination": {
            "next_cursor": next_cursor,
            "has_more": has_more
        }
    }
```
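
The example above calls `encode_cursor` and `decode_cursor` without defining them. One possible implementation is sketched below, assuming the cursor only needs to carry the last seen integer ID. The JSON-in-base64 layout is an assumption, not a required format.

```python
import base64
import json

def encode_cursor(last_id: int) -> str:
    """Pack the last seen ID into an opaque, URL-safe token."""
    raw = json.dumps({"id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_cursor(cursor: str) -> int:
    """Unpack a cursor; raise ValueError if it is malformed."""
    try:
        raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
        return int(json.loads(raw)["id"])
    except (ValueError, KeyError, TypeError) as exc:
        # Covers bad base64, bad JSON, and unexpected payload shapes
        raise ValueError("Invalid cursor") from exc

token = encode_cursor(42)
print(token, decode_cursor(token))
```

In an endpoint, a `ValueError` from `decode_cursor` would typically be translated into a 400 response. Base64 only makes the cursor opaque, not tamper-proof, so the decoded value should still be treated as untrusted input.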

### Field Selection (Sparse Fieldsets)

```python
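# Allowlist of selectable columns; hasattr() alone would also expose
# sensitive attributes such as password hashes
ALLOWED_FIELDS = {"id", "name", "email"}

@app.get("/api/users")
def get_users(fields: Optional[str] = None):
    query = db.query(User)

    if fields:
        # Parse: ?fields=id,name,email
        requested_fields = [f.strip() for f in fields.split(',')]
        unknown = set(requested_fields) - ALLOWED_FIELDS
        if unknown:
            raise HTTPException(400, f"Unknown fields: {sorted(unknown)}")
        # Select only requested columns
        columns = [getattr(User, f) for f in requested_fields]
        query = db.query(*columns)

    return query.all()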

# Request: GET /api/users?fields=id,name
# Response size: 1 KB (vs 10 KB for full object)
```

### Response Streaming

```python
from fastapi.responses import StreamingResponse

@app.get("/api/large-report")
def get_large_report():
    def generate():
        for chunk in generate_report_chunks():
            yield json.dumps(chunk) + "\n"

    return StreamingResponse(
        generate(),
        media_type="application/x-ndjson"
    )
```

---

## Scaling Patterns

### Horizontal Scaling

**Load Balancing:**
```
       ┌─────────────┐
       │Load Balancer│
       └──────┬──────┘
              │
      ┌───────┼───────┐
      │       │       │
   ┌──▼─┐  ┌──▼─┐  ┌──▼─┐
   │API1│  │API2│  │API3│
   └────┘  └────┘  └────┘
```

**Session Handling:**
- Prefer stateless auth (JWT, not server sessions) so any instance can serve any request
- If server sessions are required, store them in Redis (shared across instances)
- Last resort: sticky sessions at load balancer level

### Caching Layer

```
┌──────┐     ┌───┐     ┌─────┐
│Client│────▶│API│────▶│Redis│  (Cache)
└──────┘     └─┬─┘     └─────┘
               │ on cache miss
               ▼
           ┌────────┐
           │Database│
           └────────┘
```

### CDN for Static Assets

```
GET /static/app.js
┌──────┐     ┌─────┐     ┌────────┐
│Client│────▶│ CDN │────▶│ Origin │
└──────┘     └─────┘     └────────┘
             (Edge Cache)
```

### Read Replicas

```
                  ┌─────────┐
         Writes──▶│ Primary │
                  └────┬────┘
                       │ Replication
              ┌────────┼────────┐
              │        │        │
         ┌────▼───┬────▼───┬────▼───┐
Reads───▶│Replica1│Replica2│Replica3│
         └────────┴────────┴────────┘
```

### Database Sharding

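```python
# Range-based sharding by the first letter of the user ID:
#   Users A-M: shard1
#   Users N-Z (and any other first character): shard2
# shard1 / shard2 are placeholders for real connection handles.

def get_shard(user_id):
    first_letter = user_id[0].upper()
    if 'A' <= first_letter <= 'M':
        return shard1
    else:
        return shard2
```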
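
Range-based sharding by first letter tends to produce uneven shards, because IDs are rarely spread evenly across the alphabet. A common alternative, shown here as a sketch and not part of the example above, is hash-based sharding. `NUM_SHARDS` and `get_shard_index` are hypothetical names.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

def get_shard_index(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    # hashlib gives the same result in every process, unlike built-in hash(),
    # which is randomized per interpreter run for strings
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for uid in ("alice", "bob", "zoe"):
    print(uid, "->", get_shard_index(uid))
```

With plain modulo hashing, changing the shard count remaps most keys. Consistent hashing is the usual technique when shards need to be added or removed without a full reshuffle.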

---

## Rate Limiting

### Token Bucket Algorithm

```python
import time
from collections import defaultdict

class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens per second
        self.buckets = defaultdict(lambda: {
            'tokens': capacity,
            'last_refill': time.time()
        })

    def allow_request(self, key: str) -> bool:
        bucket = self.buckets[key]
        now = time.time()

        # Refill tokens based on time elapsed
        elapsed = now - bucket['last_refill']
        bucket['tokens'] = min(
            self.capacity,
            bucket['tokens'] + elapsed * self.refill_rate
        )
        bucket['last_refill'] = now

        # Check if request allowed
        if bucket['tokens'] >= 1:
            bucket['tokens'] -= 1
            return True
        return False

# Burst of up to 100 requests, then a sustained 100 requests per minute
limiter = TokenBucket(capacity=100, refill_rate=100/60)

@app.get("/api/data")
def get_data(request: Request):
    client_ip = request.client.host
    if not limiter.allow_request(client_ip):
        raise HTTPException(429, "Rate limit exceeded")
    return {"data": "..."}
```

### Redis-Based Rate Limiting

```python
def rate_limit(key: str, limit: int, window: int) -> bool:
    """
    key: Unique identifier (user ID, IP address)
    limit: Max requests
    window: Time window in seconds
    """
    # INCR is atomic, so concurrent requests cannot both pass a stale check
    current = cache.incr(key)

    if current == 1:
        # First request in this window: start the expiry clock once,
        # instead of resetting it on every request
        cache.expire(key, window)

    return current <= limit

@app.get("/api/data")
def get_data(current_user: User):
    # 1000 requests per hour per user
    if not rate_limit(f"rate:{current_user.id}", 1000, 3600):
        raise HTTPException(429, "Rate limit exceeded")
    return {"data": "..."}
```

---

## Monitoring

### Key Metrics

**Latency:**
- P50 (median)
- P95 (95th percentile)
- P99 (99th percentile)

```python
import time

from prometheus_client import Histogram

request_duration = Histogram(
    'api_request_duration_seconds',
    'API request duration',
    ['method', 'endpoint']
)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    start = time.time()
    response = await call_next(request)
    duration = time.time() - start

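    # Note: raw paths such as /api/users/123 create one label value per ID;
    # prefer the route template (e.g. /api/users/{id}) to keep cardinality bounded
    request_duration.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(duration)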

    return response
```

**Throughput:**
- Requests per second
- Requests per minute

**Error Rate:**
- 4xx errors (client errors)
- 5xx errors (server errors)

**Resource Usage:**
- CPU utilization
- Memory usage
- Database connections
- Cache hit rate

### Alerting Thresholds

| Metric | Warning | Critical |
|--------|---------|----------|
| P95 Latency | > 500ms | > 1000ms |
| Error Rate | > 1% | > 5% |
| CPU Usage | > 70% | > 90% |
| Memory Usage | > 80% | > 95% |
| Cache Hit Rate | < 80% | < 50% |
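
The thresholds in the table can be encoded as data and evaluated by a small helper. This is a minimal sketch: the metric keys and the `alert_level` function are hypothetical names, and in practice such rules usually live in the alerting system itself.

```python
# Thresholds from the table above: (warning, critical, higher_is_worse)
THRESHOLDS = {
    "p95_latency_ms": (500, 1000, True),
    "error_rate_pct": (1, 5, True),
    "cpu_pct": (70, 90, True),
    "memory_pct": (80, 95, True),
    "cache_hit_rate_pct": (80, 50, False),  # for hit rate, lower is worse
}

def alert_level(metric: str, value: float) -> str:
    """Classify a metric reading as 'ok', 'warning', or 'critical'."""
    warning, critical, higher_is_worse = THRESHOLDS[metric]
    if higher_is_worse:
        if value > critical:
            return "critical"
        if value > warning:
            return "warning"
    else:
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
    return "ok"

print(alert_level("p95_latency_ms", 600))
print(alert_level("cache_hit_rate_pct", 40))
```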

---

## Performance Checklist

**Database:**
- [ ] Indexes on foreign keys and frequently queried columns
- [ ] N+1 queries eliminated (use eager loading)
- [ ] Connection pooling configured
- [ ] Queries optimized (use EXPLAIN)
- [ ] Appropriate use of transactions

**Caching:**
- [ ] HTTP caching headers (ETag, Cache-Control)
- [ ] Application-level caching (Redis, Memcached)
- [ ] Cache invalidation strategy defined
- [ ] Cache hit rate monitored

**API Design:**
- [ ] Pagination on all collections (max 100 items)
- [ ] Field selection supported (?fields=id,name)
- [ ] Compression enabled (gzip, brotli)
- [ ] Response size limits enforced

**Scaling:**
- [ ] Stateless design (horizontal scaling ready)
- [ ] Rate limiting per user/IP
- [ ] CDN for static assets
- [ ] Read replicas for read-heavy workloads

**Monitoring:**
- [ ] Latency tracked (P50, P95, P99)
- [ ] Error rates monitored
- [ ] Resource usage dashboards
- [ ] Alerts configured for anomalies

---

## Load Testing

**Tools:**
- k6 (JavaScript-based, great for APIs)
- Apache JMeter (GUI-based, feature-rich)
- Gatling (Scala-based, enterprise-grade)
- Locust (Python-based, distributed)

**k6 Example:**
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Ramp up to 100 users
    { duration: '5m', target: 100 },  // Stay at 100 users
    { duration: '2m', target: 200 },  // Ramp up to 200 users
    { duration: '5m', target: 200 },  // Stay at 200 users
    { duration: '2m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests < 500ms
    http_req_failed: ['rate<0.01'],   // Error rate < 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });
  sleep(1);
}
```

---

**Remember:** Premature optimization is the root of all evil. Measure first, optimize second. Focus on bottlenecks that actually impact user experience.
```

### reference/review_checklist.md

```markdown
# API Design Review Checklist

Use this checklist when conducting API design reviews. Check off items as you review them.

---

## Pre-Review Context

- [ ] API type identified (REST, GraphQL, gRPC, WebSocket)
- [ ] Review scope defined (new API, changes, pre-launch audit)
- [ ] API specifications located and loaded
- [ ] Business context understood (use case, scale, SLAs)
- [ ] Target clients identified (web, mobile, third-party)

---

## Authentication & Authorization

### Authentication
- [ ] Authentication scheme clearly defined (OAuth2, JWT, API Keys, mTLS)
- [ ] Token format and structure documented
- [ ] Token expiration configured (≤ 1 hour for access tokens)
- [ ] Refresh token strategy implemented
- [ ] No credentials in URLs (use headers/body)
- [ ] Rate limiting on auth endpoints (prevent brute force)
- [ ] Multi-factor authentication supported (for sensitive operations)
- [ ] Password requirements enforced (min 12 chars, complexity)
- [ ] Passwords hashed with bcrypt/argon2 (never plaintext)

### Authorization
- [ ] Authorization checks on ALL sensitive endpoints
- [ ] Ownership verification for user resources
- [ ] Role-based access control (RBAC) implemented
- [ ] Principle of least privilege applied
- [ ] Admin operations require admin role verification
- [ ] Authorization happens server-side (never client-side)
- [ ] Default deny policy (explicit grants required)

**Priority:** P0 (Critical)

---

## Resource Design (REST)

- [ ] Resources use plural nouns (`/users`, not `/user`)
- [ ] HTTP verbs used correctly (GET, POST, PUT, PATCH, DELETE)
- [ ] GET requests are safe (no side effects) and idempotent
- [ ] PUT and DELETE are idempotent
- [ ] No action-based URLs (use resource + HTTP method, e.g. `DELETE /users/{id}`, not `/deleteUser`)
- [ ] Resource hierarchy limited to 2 levels
- [ ] Consistent naming convention (snake_case or camelCase)
- [ ] Query parameters for filtering, path parameters for IDs

**Priority:** P1 (High)

---

## Error Handling

- [ ] Standardized error format across ALL endpoints
- [ ] Appropriate HTTP status codes used consistently
- [ ] Machine-readable error codes for programmatic handling
- [ ] Human-readable messages without exposing internals
- [ ] Validation errors specify which fields failed
- [ ] Request ID included in all responses
- [ ] Stack traces excluded from production responses
- [ ] Error documentation available

**Standard Error Format:**
```json
{
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Invalid request parameters",
    "details": [...],
    "request_id": "req_abc123",
    "documentation_url": "https://api.example.com/docs/errors"
  }
}
```

**Priority:** P0 (Critical)
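
A single helper that builds every error body is one way to keep the format consistent across endpoints. The sketch below follows the standard error format shown above. The `build_error` name, the shape of each `details` entry, and the generated request ID scheme are assumptions.

```python
import uuid
from typing import Any, Dict, List, Optional

DOCS_URL = "https://api.example.com/docs/errors"  # placeholder URL from the format above

def build_error(
    code: str,
    message: str,
    details: Optional[List[Dict[str, Any]]] = None,
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an error body in the standard format."""
    return {
        "error": {
            "code": code,
            "message": message,
            "details": details or [],
            # Generate an ID only if the caller did not pass one through
            "request_id": request_id or f"req_{uuid.uuid4().hex[:12]}",
            "documentation_url": DOCS_URL,
        }
    }

body = build_error(
    "VALIDATION_ERROR",
    "Invalid request parameters",
    # Hypothetical detail shape: one entry per failed field
    details=[{"field": "email", "issue": "must be a valid email address"}],
    request_id="req_abc123",
)
print(body)
```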

---

## Pagination & Data Loading

- [ ] ALL collection endpoints implement pagination
- [ ] Default page size reasonable (10-50 items)
- [ ] Maximum page size enforced (e.g., 100-200 items)
- [ ] Cursor-based pagination for large/growing datasets
- [ ] Pagination response format consistent
- [ ] Filtering parameters documented and validated
- [ ] Sorting parameters validated
- [ ] Field selection supported (`?fields=id,name`)
- [ ] Total count optional (expensive query)

**Priority:** P0 (Critical) - Unbounded collections can crash API

---

## Versioning

- [ ] Versioning strategy defined and documented
- [ ] Version specified in every request (URL or header)
- [ ] Breaking vs non-breaking changes policy documented
- [ ] Deprecation timeline and process clear
- [ ] Multiple versions supportable simultaneously
- [ ] Support for N and N-1 versions planned
- [ ] Deprecation headers used (`Deprecation`, `Sunset`)

**Recommended:** URL path versioning (`/v1/`, `/v2/`)

**Priority:** P0 (Critical) - Essential from day one

---

## Idempotency & Retries

- [ ] POST/PATCH support idempotency keys (PUT and DELETE should already be idempotent)
- [ ] Idempotency-Key header accepted
- [ ] Duplicate requests return cached response (within TTL)
- [ ] Optimistic locking with ETags or version fields
- [ ] 409 Conflict for concurrent modifications
- [ ] Retry-After header for 429/503 responses
- [ ] Idempotency behavior documented

**Priority:** P1 (High) - Prevents duplicate charges/orders
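
The "duplicate requests return cached response" item can be sketched as a small store keyed by the `Idempotency-Key` value. This is an in-memory illustration only: `IdempotencyStore` and `run_once` are hypothetical names, the sketch is not safe for concurrent requests, and multiple API instances would need a shared store such as Redis.

```python
import time
from typing import Any, Callable, Dict, Tuple

class IdempotencyStore:
    """Remember the response for each idempotency key until its TTL expires."""

    def __init__(self, ttl_seconds: float = 86400):
        self.ttl = ttl_seconds
        self._responses: Dict[str, Tuple[float, Any]] = {}

    def run_once(self, key: str, handler: Callable[[], Any]) -> Any:
        now = time.monotonic()
        cached = self._responses.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            # Duplicate request within the TTL: replay the stored response
            return cached[1]
        response = handler()
        self._responses[key] = (now, response)
        return response

calls = []

def create_order():
    calls.append(1)
    return {"order_id": len(calls)}

store = IdempotencyStore(ttl_seconds=60)
first = store.run_once("key-1", create_order)
second = store.run_once("key-1", create_order)  # replayed, handler not called again
print(first, second, len(calls))
```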

---

## Performance & Scalability

### Database
- [ ] N+1 queries prevented (eager loading, dataloaders)
- [ ] Database indexes on foreign keys and filtered columns
- [ ] Connection pooling configured
- [ ] Query optimization verified (use EXPLAIN)

### Caching
- [ ] Cache-Control headers configured appropriately
- [ ] ETag support for conditional requests
- [ ] Application-level caching strategy defined
- [ ] Cache invalidation strategy documented

### Response Optimization
- [ ] Compression enabled (gzip, brotli)
- [ ] Response size limits enforced
- [ ] Field selection supported
- [ ] Large responses paginated
- [ ] Timeout configured on external calls

**Priority:** P1 (High)

---

## Data Validation & Security

### Input Validation
- [ ] All inputs validated (type, format, length, range)
- [ ] SQL injection prevention (parameterized queries/ORMs)
- [ ] XSS prevention (output encoding, CSP headers)
- [ ] Field length limits enforced
- [ ] Type validation on all fields
- [ ] Enum values validated
- [ ] Request size limits enforced (e.g., max 10MB)

### File Uploads
- [ ] File type whitelist (not blacklist)
- [ ] File size limits enforced
- [ ] Content-type verification (check actual content)
- [ ] Virus scanning (production)
- [ ] Files stored outside web root
- [ ] Randomized filenames
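
Several of the file upload checks can be combined into one validation step. The sketch below assumes a small allowlist of extensions and compares the leading bytes of the content against well-known file signatures. `safe_upload_name`, the allowlist, and the 10 MB limit are illustrative choices, and a signature check is not a substitute for virus scanning.

```python
import secrets
from pathlib import PurePath

MAX_SIZE = 10 * 1024 * 1024  # example limit: 10 MB

# Allowlisted extensions and the signature each file must start with
SIGNATURES = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".jpg": b"\xff\xd8\xff",
    ".jpeg": b"\xff\xd8\xff",
    ".pdf": b"%PDF-",
}

def safe_upload_name(filename: str, content: bytes) -> str:
    """Validate an upload and return a randomized filename to store it under."""
    ext = PurePath(filename).suffix.lower()
    if ext not in SIGNATURES:
        raise ValueError(f"File type not allowed: {ext or '(none)'}")
    if len(content) > MAX_SIZE:
        raise ValueError("File too large")
    if not content.startswith(SIGNATURES[ext]):
        # The declared extension does not match the actual content
        raise ValueError("File content does not match its extension")
    # Never reuse the client-supplied name on disk
    return secrets.token_hex(16) + ext

print(safe_upload_name("photo.PNG", b"\x89PNG\r\n\x1a\n" + b"\x00" * 10))
```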

### Sensitive Data
- [ ] No sensitive data in URLs
- [ ] Passwords never in plaintext
- [ ] API keys/tokens in environment variables
- [ ] Sensitive fields excluded from responses
- [ ] PII redacted from logs
- [ ] HTTPS enforced everywhere

**Priority:** P0 (Critical) - Security vulnerabilities

---

## Rate Limiting

- [ ] Rate limiting implemented per IP
- [ ] Rate limiting per authenticated user
- [ ] Different limits for different endpoints
- [ ] Stricter limits on auth endpoints
- [ ] Stricter limits on expensive operations
- [ ] Rate limit headers in responses
- [ ] Retry-After header when rate limited

**Headers:**
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1635724800
Retry-After: 45
```

**Priority:** P1 (High) - Prevents abuse
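
The headers above can be produced by one helper so every endpoint reports limits the same way. In this sketch, `rate_limit_headers` is a hypothetical function and the caller is assumed to know the window's reset time as a Unix timestamp.

```python
from typing import Dict

def rate_limit_headers(limit: int, remaining: int, reset_epoch: int, now_epoch: int) -> Dict[str, str]:
    """Build rate limit headers; add Retry-After only when the client is limited."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if remaining <= 0:
        # Seconds until the window resets, never negative
        headers["Retry-After"] = str(max(reset_epoch - now_epoch, 0))
    return headers

# A limited client, 45 seconds before the window resets
print(rate_limit_headers(100, 0, reset_epoch=1635724800, now_epoch=1635724755))
```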

---

## Documentation

- [ ] OpenAPI/Swagger spec available and accurate
- [ ] Every endpoint has description
- [ ] Request/response examples for each endpoint
- [ ] Authentication requirements clearly stated
- [ ] Error responses documented with codes
- [ ] Rate limits documented
- [ ] Interactive documentation available (Swagger UI, Redoc)
- [ ] Changelog maintained for API changes
- [ ] Migration guides for breaking changes

**Priority:** P1 (High) - Essential for API consumers

---

## GraphQL-Specific

(Skip if not GraphQL)

- [ ] Query depth limiting implemented (max 5-7 levels)
- [ ] Query complexity scoring implemented
- [ ] Pagination on ALL list fields (connections pattern)
- [ ] DataLoader pattern for batching
- [ ] Proper nullable vs non-nullable field design
- [ ] Field deprecation instead of removal
- [ ] Input validation on mutations
- [ ] Field-level authorization

**Priority:** P0 (Critical) - Prevents DoS
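
To illustrate what depth limiting measures, the sketch below estimates nesting depth by counting braces in the query text. This is a deliberate simplification: it ignores braces inside string literals and input objects and does not expand fragments. Production servers normally enforce the limit on the parsed query AST instead. `query_depth` and `MAX_DEPTH` are hypothetical names.

```python
MAX_DEPTH = 7  # upper end of the 5-7 level range suggested above

def query_depth(query: str) -> int:
    """Estimate selection-set nesting depth by tracking brace nesting."""
    depth = 0
    max_depth = 0
    for ch in query:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth

def is_allowed(query: str) -> bool:
    return query_depth(query) <= MAX_DEPTH

shallow = "{ user { posts { title } } }"
deep = "{ a " * 8 + "}" * 8  # eight nested selection sets

print(query_depth(shallow), is_allowed(shallow))
print(query_depth(deep), is_allowed(deep))
```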

---

## Security (OWASP API Top 10)

- [ ] Broken Object Level Authorization (BOLA) prevented
- [ ] Broken Authentication protected
- [ ] Broken Object Property Level Authorization (mass assignment) prevented
- [ ] Unrestricted Resource Consumption limited
- [ ] Broken Function Level Authorization prevented
- [ ] Unrestricted Access to Sensitive Business Flows protected
- [ ] Server Side Request Forgery (SSRF) prevented
- [ ] Security Misconfiguration addressed
- [ ] Improper Inventory Management handled
- [ ] Unsafe Consumption of APIs protected

**Priority:** P0 (Critical)

---

## Monitoring & Logging

### Logging
- [ ] Authentication attempts logged
- [ ] Authorization failures logged
- [ ] Security events logged
- [ ] Request IDs in all logs
- [ ] Correlation IDs for distributed tracing
- [ ] Sensitive data NOT logged (passwords, tokens)

### Monitoring
- [ ] Latency tracked (P50, P95, P99)
- [ ] Error rates monitored (4xx, 5xx)
- [ ] Throughput monitored (requests/second)
- [ ] Resource usage tracked (CPU, memory, connections)
- [ ] Cache hit rate monitored
- [ ] Alerts configured for anomalies

### Observability
- [ ] Distributed tracing implemented
- [ ] Health check endpoint available
- [ ] Metrics endpoint exposed (Prometheus)
- [ ] Dashboards created

**Priority:** P1 (High) - Can't debug what you can't see

---

## Testing

- [ ] Unit tests for business logic
- [ ] Integration tests for database operations
- [ ] API contract tests
- [ ] Security tests (injection, XSS, auth bypass)
- [ ] Load tests at expected scale
- [ ] Different user role testing
- [ ] Error condition testing
- [ ] Edge case testing

**Priority:** P1 (High)

---

## Production Readiness

- [ ] Load testing completed at 2x expected traffic
- [ ] Disaster recovery plan documented
- [ ] Rollback procedure tested
- [ ] Circuit breakers on external dependencies
- [ ] Graceful degradation strategy
- [ ] Database migrations tested
- [ ] Backup and restore tested
- [ ] Runbooks created for common issues
- [ ] On-call rotation defined
- [ ] Incident response process documented

**Priority:** P0 (Critical) - Before launch

---

## Special API Types

### Public API (Additional)
- [ ] Excellent documentation (primary support channel)
- [ ] Formal SLA defined
- [ ] Aggressive rate limiting
- [ ] Security hardened (assume malicious actors)
- [ ] Developer support channel available

### Mobile Backend (Additional)
- [ ] Response sizes minimized (bandwidth)
- [ ] Offline support considered
- [ ] Push notifications instead of polling
- [ ] Graceful degradation for old app versions
- [ ] Field selection mandatory

### Microservices (Additional)
- [ ] Circuit breakers implemented
- [ ] Retry logic with exponential backoff
- [ ] Aggressive timeouts (fail fast)
- [ ] Service mesh considerations
- [ ] Contract testing between services
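
Retry logic with exponential backoff can be sketched as follows. The function name, the default delays, and the use of full jitter (sleeping a random time between zero and the current backoff) are assumptions for illustration. Real services usually retry only on errors known to be transient and only for idempotent calls.

```python
import random
import time

def retry_with_backoff(func, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call func, retrying on exceptions with exponentially growing, jittered delays."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Backoff doubles each attempt, capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries out so clients do not retry in lockstep
            sleep(random.uniform(0, delay))

attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

# sleep is injected so this demonstration does not actually wait
result = retry_with_backoff(flaky, sleep=lambda seconds: None)
print(result, attempts["n"])
```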

---

## Review Sign-off

### Critical Issues (P0)
- [ ] All P0 issues identified
- [ ] All P0 issues resolved or have mitigation plan
- [ ] No security vulnerabilities remain

### Important Issues (P1)
- [ ] All P1 issues documented
- [ ] Remediation timeline established

### Summary
- [ ] Overall assessment recorded
- [ ] Top 3 issues highlighted
- [ ] Recommendations prioritized
- [ ] Follow-up review scheduled (if needed)

---

## Approval

- [ ] **Ready for Launch** - All critical issues resolved
- [ ] **Ready with Minor Issues** - Can launch with P2/P3 issues
- [ ] **Needs Work** - P1 issues must be addressed
- [ ] **Major Concerns** - P0 issues block launch

**Reviewer:** _______________
**Date:** _______________
**Next Review:** _______________

---

## Notes

(Add any additional observations, concerns, or recommendations)

```