optimizing-sql
Optimize SQL query performance through EXPLAIN analysis, indexing strategies, and query rewriting for PostgreSQL, MySQL, and SQL Server. Use when debugging slow queries, analyzing execution plans, or improving database performance.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install ancoleman-ai-design-components-optimizing-sql
Repository
Skill path: skills/optimizing-sql
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack, Backend.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: ancoleman.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install optimizing-sql into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/ancoleman/ai-design-components before adding optimizing-sql to shared team environments
- Use optimizing-sql for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: optimizing-sql
description: Optimize SQL query performance through EXPLAIN analysis, indexing strategies, and query rewriting for PostgreSQL, MySQL, and SQL Server. Use when debugging slow queries, analyzing execution plans, or improving database performance.
---

# SQL Optimization

Provide tactical guidance for optimizing SQL query performance across PostgreSQL, MySQL, and SQL Server through execution plan analysis, strategic indexing, and query rewriting.

## When to Use This Skill

Trigger this skill when encountering:

- Slow query performance or database timeouts
- Analyzing EXPLAIN plans or execution plans
- Determining index requirements
- Rewriting inefficient queries
- Identifying query anti-patterns (N+1, SELECT *, correlated subqueries)
- Database-specific optimization needs (PostgreSQL, MySQL, SQL Server)

## Core Optimization Workflow

### Step 1: Analyze Query Performance

Run execution plan analysis to identify bottlenecks:

**PostgreSQL:**

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
```

**MySQL:**

```sql
EXPLAIN FORMAT=JSON SELECT * FROM products WHERE category_id = 5;
```

**SQL Server:** Use SQL Server Management Studio: Display Estimated Execution Plan (Ctrl+L)

**Key Metrics to Monitor:**

- **Cost**: Estimated resource consumption
- **Rows**: Number of rows processed (estimated vs actual)
- **Scan Type**: Sequential scan vs index scan
- **Execution Time**: Actual time spent on operation

For detailed execution plan interpretation, see `references/explain-guide.md`.
### Step 2: Identify Optimization Opportunities

**Common Red Flags:**

| Indicator | Problem | Solution |
|-----------|---------|----------|
| Seq Scan / Table Scan | Full table scan on large table | Add index on filter columns |
| High row count | Processing excessive rows | Add WHERE filter or index |
| Nested Loop with large outer table | Inefficient join algorithm | Index join columns |
| Correlated subquery | Subquery executes per row | Rewrite as JOIN or EXISTS |
| Sort operation on large result set | Expensive sorting | Add index matching ORDER BY |

For scan type interpretation, see `references/scan-types.md`.

### Step 3: Apply Indexing Strategies

**Index Decision Framework:**

```
Is column used in WHERE, JOIN, ORDER BY, or GROUP BY?
├─ YES → Is column selective (many unique values)?
│   ├─ YES → Is table frequently queried?
│   │   ├─ YES → ADD INDEX
│   │   └─ NO → Consider based on query frequency
│   └─ NO (low selectivity) → Skip index
└─ NO → Skip index
```

**Index Types by Use Case:**

**PostgreSQL:**

- **B-tree** (default): General-purpose, supports <, ≤, =, ≥, >, BETWEEN, IN
- **Hash**: Equality comparisons only (=)
- **GIN**: Full-text search, JSONB, arrays
- **GiST**: Spatial data, geometric types
- **BRIN**: Very large tables with naturally ordered data

**MySQL:**

- **B-tree** (default): General-purpose index
- **Full-text**: Text search on VARCHAR/TEXT columns
- **Spatial**: Spatial data types

**SQL Server:**

- **Clustered**: Table data sorted by index (one per table)
- **Non-clustered**: Separate index structure (multiple allowed)

For comprehensive indexing guidance, see `references/indexing-decisions.md` and `references/index-types.md`.

### Step 4: Design Composite Indexes

For queries filtering on multiple columns, use composite indexes:

**Column Order Matters:**

1. **Equality filters first** (most selective)
2. **Additional equality filters** (by selectivity)
3. **Range filters or ORDER BY** (last)

**Example:**

```sql
-- Query pattern
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
ORDER BY created_at DESC
LIMIT 10;

-- Optimal composite index
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC);
```

For composite index design patterns, see `references/composite-indexes.md`.

### Step 5: Rewrite Inefficient Queries

**Common Anti-Patterns to Avoid:**

**1. SELECT * (Over-fetching)**

```sql
-- ❌ Bad: Fetches all columns
SELECT * FROM users WHERE id = 1;

-- ✅ Good: Fetch only needed columns
SELECT id, name, email FROM users WHERE id = 1;
```

**2. N+1 Queries**

```sql
-- ❌ Bad: 1 + N queries
SELECT * FROM users LIMIT 100;
-- Then in loop: SELECT * FROM posts WHERE user_id = ?;

-- ✅ Good: Single JOIN
SELECT users.*, posts.id AS post_id, posts.title
FROM users
LEFT JOIN posts ON users.id = posts.user_id;
```

**3. Non-Sargable Queries** (functions on indexed columns)

```sql
-- ❌ Bad: Function prevents index usage
SELECT * FROM orders WHERE YEAR(created_at) = 2025;

-- ✅ Good: Sargable range condition
SELECT * FROM orders
WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';
```

**4. Correlated Subqueries**

```sql
-- ❌ Bad: Subquery executes per row
SELECT name,
  (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id)
FROM users;

-- ✅ Good: JOIN with GROUP BY
SELECT users.name, COUNT(orders.id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;
```

For complete anti-pattern reference, see `references/anti-patterns.md`. For efficient query patterns, see `references/efficient-patterns.md`.

## Quick Reference Tables

### Index Selection Guide

| Query Pattern | Index Type | Example |
|--------------|------------|---------|
| `WHERE column = value` | Single-column B-tree | `CREATE INDEX ON table (column)` |
| `WHERE col1 = ? AND col2 = ?` | Composite B-tree | `CREATE INDEX ON table (col1, col2)` |
| `WHERE text_col LIKE '%word%'` | Full-text (GIN/Full-text) | `CREATE INDEX ON table USING GIN (to_tsvector('english', text_col))` |
| `WHERE geom && box` | Spatial (GiST) | `CREATE INDEX ON table USING GIST (geom)` |
| `WHERE json_col @> '{"key":"value"}'` | JSONB (GIN) | `CREATE INDEX ON table USING GIN (json_col)` |

### Join Optimization Checklist

- [ ] Index foreign key columns on both sides of JOIN
- [ ] Order joins starting with the table returning the fewest rows
- [ ] Use INNER JOIN when possible (more efficient than OUTER JOIN)
- [ ] Avoid joining more than 5 tables (break into CTEs or subqueries)
- [ ] Consider denormalization for frequently joined tables in read-heavy systems

### Execution Plan Performance Targets

| Scan Type | Performance | When Acceptable |
|-----------|-------------|-----------------|
| Index-Only Scan | Best | Always preferred |
| Index Scan | Excellent | Small-medium result sets |
| Bitmap Heap Scan | Good | Medium result sets (PostgreSQL) |
| Sequential Scan | Poor | Only for small tables (<1000 rows) or full table queries |
| Table Scan | Poor | Only for small tables or unavoidable full scans |

## Database-Specific Optimizations

### PostgreSQL-Specific Features

**Partial Indexes** (index a subset of rows):

```sql
CREATE INDEX idx_active_users_login
ON users (last_login)
WHERE status = 'active';
```

**Expression Indexes** (index computed values):

```sql
CREATE INDEX idx_users_email_lower
ON users (LOWER(email));
```

**Covering Indexes** (avoid heap access):

```sql
CREATE INDEX idx_users_email_covering
ON users (email) INCLUDE (id, name);
```

For comprehensive PostgreSQL optimization, see `references/postgresql.md`.
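The "is the column selective?" question in the Step 3 decision framework can be estimated directly before creating an index. A minimal sketch (PostgreSQL syntax; the `users.status` column is illustrative, not from this skill's examples):

```sql
-- Ratio of distinct values to total rows for a candidate index column.
-- Values near 1.0 = highly selective (good B-tree candidate);
-- values near 0 = few distinct values (an index rarely helps).
SELECT COUNT(DISTINCT status)::numeric / NULLIF(COUNT(*), 0) AS selectivity
FROM users;
```

As a rough heuristic, columns with very low selectivity (e.g. a boolean flag) are better served by a partial index on the rare value than by a plain B-tree.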
### MySQL-Specific Features

**Index Hints** (override optimizer):

```sql
SELECT * FROM orders USE INDEX (idx_orders_customer)
WHERE customer_id = 123;
```

**Storage Engine Selection:**

- **InnoDB** (default): Transactional, row-level locks, clustered primary key
- **MyISAM**: Faster reads, no transactions, table-level locks

For comprehensive MySQL optimization, see `references/mysql.md`.

### SQL Server-Specific Features

**Query Store** (track query performance over time):

```sql
ALTER DATABASE YourDatabase SET QUERY_STORE = ON;
```

**Execution Plan Warnings:**

- Look for yellow exclamation marks in graphical execution plans
- Thick arrows indicate high row counts

For comprehensive SQL Server optimization, see `references/sqlserver.md`.

## Advanced Optimization Techniques

### Common Table Expressions (CTEs)

Break complex queries into readable, maintainable parts:

```sql
WITH active_customers AS (
  SELECT id, name FROM customers WHERE status = 'active'
),
recent_orders AS (
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY customer_id
)
SELECT ac.name, COALESCE(ro.order_count, 0) AS orders
FROM active_customers ac
LEFT JOIN recent_orders ro ON ac.id = ro.customer_id;
```

### EXISTS vs IN for Subqueries

Use EXISTS for better performance with large datasets:

```sql
-- ✅ Good: EXISTS stops at first match
SELECT * FROM users
WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id);

-- ❌ Less efficient: IN builds full list
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders);
```

### Denormalization Decision Framework

Consider denormalization when:

- Query joins 3+ tables frequently
- Data is relatively static (infrequent updates)
- Read performance is critical
- Write overhead is acceptable

**Denormalization Strategies:**

1. **Duplicate columns**: Copy foreign key data into main table
2. **Summary tables**: Pre-aggregate data
3. **Materialized views**: Database-maintained denormalized views
4. **Application caching**: Redis/Memcached for frequently accessed data

## Optimization Workflow Example

**Scenario:** API endpoint taking 2 seconds to load

**Step 1: Identify Slow Query**

Use APM/observability tools to identify the database query causing the delay.

**Step 2: Run EXPLAIN ANALYZE**

```sql
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
```

**Step 3: Analyze Output**

```
Seq Scan on orders (cost=0.00..2500.00 rows=10)
  Filter: (customer_id = 123)
  Rows Removed by Filter: 99990
```

**Problem**: Sequential scan filtering out 99,990 rows

**Step 4: Add Composite Index**

```sql
CREATE INDEX idx_orders_customer_created
ON orders (customer_id, created_at DESC);
```

**Step 5: Verify Improvement**

```sql
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
```

```
Index Scan using idx_orders_customer_created (cost=0.42..12.44 rows=10)
  Index Cond: (customer_id = 123)
```

**Result**: 200x faster (2000ms → 10ms)

## Monitoring and Maintenance

**Regular Optimization Tasks:**

- Review slow query logs weekly
- Update database statistics regularly (ANALYZE in PostgreSQL, UPDATE STATISTICS in SQL Server)
- Monitor index usage (drop unused indexes)
- Archive old data to keep tables manageable
- Review execution plans for critical queries quarterly

**PostgreSQL Statistics Update:**

```sql
ANALYZE table_name;
```

**MySQL Statistics Update:**

```sql
ANALYZE TABLE table_name;
```

**SQL Server Statistics Update:**

```sql
UPDATE STATISTICS table_name;
```

## Related Skills

- **databases-relational**: Schema design and database fundamentals
- **observability**: Performance monitoring and slow query detection
- **api-patterns**: API-level optimization (pagination, caching)
- **performance-engineering**: Application performance profiling

## Additional Resources

For comprehensive documentation, reference these files:

- `references/explain-guide.md` - Detailed EXPLAIN plan interpretation
- `references/scan-types.md` - Scan type meanings and performance implications
- `references/indexing-decisions.md` - When and how to add indexes
- `references/index-types.md` - Database-specific index types
- `references/composite-indexes.md` - Multi-column index design
- `references/anti-patterns.md` - Common anti-patterns with solutions
- `references/efficient-patterns.md` - Efficient query patterns
- `references/postgresql.md` - PostgreSQL-specific optimizations
- `references/mysql.md` - MySQL-specific optimizations
- `references/sqlserver.md` - SQL Server-specific optimizations

For working SQL examples, see the `examples/` directory.

---

## Referenced Files

> The following files are referenced in this skill and included for context.

### references/explain-guide.md

````markdown
# EXPLAIN Analysis Guide

Comprehensive guide to interpreting execution plans across PostgreSQL, MySQL, and SQL Server.

## Table of Contents

1. [PostgreSQL EXPLAIN](#postgresql-explain)
2. [MySQL EXPLAIN](#mysql-explain)
3. [SQL Server Execution Plans](#sql-server-execution-plans)
4. [Key Metrics to Monitor](#key-metrics-to-monitor)
5. [Common Patterns and Solutions](#common-patterns-and-solutions)

## PostgreSQL EXPLAIN

### Basic EXPLAIN Syntax

**EXPLAIN** - Show query plan without execution:

```sql
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
```

**EXPLAIN ANALYZE** - Execute query and show actual timing:

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
```

**EXPLAIN (ANALYZE, BUFFERS)** - Include buffer usage:

```sql
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM users WHERE email = '[email protected]';
```

### Reading PostgreSQL Output

**Example Output:**

```
Seq Scan on users (cost=0.00..1500.00 rows=1 width=100) (actual time=50.123..50.124 rows=1 loops=1)
  Filter: (email = '[email protected]'::text)
  Rows Removed by Filter: 99999
Planning Time: 0.100 ms
Execution Time: 50.150 ms
```

**Key Components:**

- **Operation**: `Seq Scan` (scan type)
- **Table**: `users` (table being scanned)
- **cost=0.00..1500.00**: Estimated cost range (startup..total)
- **rows=1**: Estimated rows returned
- **width=100**: Average row size in bytes
- **actual time=50.123..50.124**: Actual time range (milliseconds)
- **rows=1**: Actual rows returned
- **loops=1**: Number of times operation executed

**Cost Interpretation:**

- Cost is in arbitrary units (not milliseconds)
- Compare relative costs between plans
- Lower cost = better (usually)

### PostgreSQL Scan Types

**Sequential Scan (Seq Scan):**

- Reads entire table from disk
- No index used
- Acceptable for small tables or full table queries
- **Red flag** for large tables with a WHERE clause

**Index Scan:**

- Direct index traversal
- Excellent for small result sets
- Accesses heap table to retrieve full rows

**Index-Only Scan:**

- All data retrieved from index (no heap access)
- Best performance
- Requires covering index with all needed columns

**Bitmap Heap Scan:**

- Two-step process: identify rows in index → fetch from heap
- Efficient for medium result sets
- Combines multiple index scans

**Nested Loop Join:**

- Iterate outer table, look up inner table per row
- Good when outer table is small
- Requires index on inner table join column

**Hash Join:**

- Build hash table of inner table, probe with outer
- Good for large tables
- Memory-intensive

**Merge Join:**

- Sort both tables, merge sorted results
- Good for pre-sorted data
- Expensive if sorting required

### PostgreSQL Optimization Examples

**Example 1: Sequential Scan → Index Scan**

**Before:**

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
```

```
Seq Scan on users (cost=0.00..1500.00 rows=1) (actual time=50.123..50.124 rows=1)
  Filter: (email = '[email protected]')
  Rows Removed by Filter: 99999
```

**Optimization:**

```sql
CREATE INDEX idx_users_email ON users (email);
```

**After:**

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';
```

```
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1) (actual time=0.025..0.026 rows=1)
  Index Cond: (email = '[email protected]')
```

**Result:** 1000x faster (50ms → 0.05ms)

## MySQL EXPLAIN

### Basic EXPLAIN Syntax

**Standard EXPLAIN:**

```sql
EXPLAIN SELECT * FROM products WHERE category_id = 5 AND price > 100;
```

**JSON Format** (detailed output):

```sql
EXPLAIN FORMAT=JSON SELECT * FROM products WHERE category_id = 5;
```

### Reading MySQL Output

**Example Output:**

```
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
| id | select_type | table    | type | possible_keys | key  | key_len | ref  | rows  | Extra       |
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
|  1 | SIMPLE      | products | ALL  | NULL          | NULL | NULL    | NULL | 50000 | Using where |
+----+-------------+----------+------+---------------+------+---------+------+-------+-------------+
```

**Key Columns:**

- **id**: Query identifier
- **select_type**: SIMPLE, PRIMARY, SUBQUERY, DERIVED, UNION
- **table**: Table being accessed
- **type**: Access type (performance indicator)
- **possible_keys**: Indexes MySQL could use
- **key**: Index actually used (NULL = no index)
- **key_len**: Bytes of index used
- **ref**: Columns/constants compared to index
- **rows**: Estimated rows examined
- **Extra**: Additional information

### MySQL Access Types (type column)

**Best to Worst Performance:**

1. **system** - Single-row table (best)
2. **const** - Primary key or unique index lookup
3. **eq_ref** - One row from table for each previous row combination
4. **ref** - Non-unique index lookup
5. **range** - Index range scan (BETWEEN, >, <, IN)
6. **index** - Full index scan
7. **ALL** - Full table scan (worst)

**Target:** Achieve `const`, `eq_ref`, `ref`, or `range` types.

**Red Flag:** `ALL` (full table scan) on large tables.

### MySQL Extra Column Meanings

- **Using index**: Index-only scan (excellent)
- **Using where**: Filtering rows after retrieval
- **Using temporary**: Temporary table created (expensive)
- **Using filesort**: Sorting required (expensive for large result sets)
- **Using join buffer**: Join buffer used (index missing on join column)

### MySQL Optimization Examples

**Example 1: ALL → range**

**Before:**

```sql
EXPLAIN SELECT * FROM products WHERE category_id = 5 AND price > 100;
```

```
type: ALL, possible_keys: NULL, key: NULL, rows: 50000
```

**Optimization:**

```sql
CREATE INDEX idx_products_category_price ON products (category_id, price);
```

**After:**

```sql
EXPLAIN SELECT * FROM products WHERE category_id = 5 AND price > 100;
```

```
type: range, key: idx_products_category_price, rows: 150
```

**Result:** Rows examined reduced from 50,000 to 150.
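After creating an index like this, it can help to confirm that the index is actually registered before trusting the new plan. A minimal sketch (standard MySQL statements; table and index names match the example above):

```sql
-- List the table's indexes; the new composite index should appear
-- with Key_name = 'idx_products_category_price'
SHOW INDEX FROM products;

-- Re-run EXPLAIN and verify the type column changed from ALL to range
EXPLAIN SELECT * FROM products WHERE category_id = 5 AND price > 100;
```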
## SQL Server Execution Plans

### Accessing Execution Plans

**Estimated Execution Plan** (Ctrl+L in SSMS):

```sql
-- Right-click query → Display Estimated Execution Plan
SELECT * FROM Sales.Orders WHERE CustomerID = 123;
```

**Actual Execution Plan** (Ctrl+M, then execute):

```sql
-- Query → Include Actual Execution Plan → Execute
SELECT * FROM Sales.Orders WHERE CustomerID = 123;
```

### Reading SQL Server Execution Plans

**Graphical Execution Plan Components:**

- **Operations**: Boxes representing operations (Scan, Seek, Join)
- **Arrows**: Data flow direction (right to left)
- **Thickness**: Relative row count (thick = many rows)
- **Warnings**: Yellow exclamation marks (missing indexes, implicit conversions)
- **Cost %**: Percentage of total query cost

**Read Direction:** Right to left, top to bottom.

### SQL Server Scan Types

**Table Scan:**

- Reads entire table
- No index available
- Red flag for large tables

**Clustered Index Scan:**

- Reads entire clustered index (full table)
- Similar to table scan

**Index Seek:**

- Direct index lookup
- Excellent performance
- Target for WHERE, JOIN conditions

**Index Scan:**

- Reads entire index
- Better than table scan if index is smaller
- Still inefficient for large indexes

**Key Lookup:**

- Additional lookup to retrieve non-indexed columns
- Indicates covering index opportunity

### SQL Server Optimization Examples

**Example 1: Identify Missing Index**

**Execution Plan Warning:**

```
Missing Index (Impact 95%)
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [dbo].[Orders] ([CustomerID])
```

**Action:**

```sql
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID);
```

**Example 2: Query Store Analysis**

Find top 10 expensive queries:

```sql
SELECT TOP 10
  q.query_id,
  qt.query_sql_text,
  rs.avg_duration / 1000.0 AS avg_duration_ms,
  rs.avg_logical_io_reads,
  rs.count_executions
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;
```

## Key Metrics to Monitor

### Cross-Database Metrics

| Metric | Good | Warning | Critical |
|--------|------|---------|----------|
| Rows examined vs returned | <10x | 10-100x | >100x |
| Execution time | <10ms | 10-100ms | >100ms |
| Index usage | Present | Partial | None |
| Sort operations | None | Small dataset | Large dataset |

### PostgreSQL-Specific Metrics

- **Buffer hits vs reads**: High hit ratio indicates good cache usage
- **Planning time**: Should typically be <1ms
- **Execution time**: Target <100ms for user-facing queries

### MySQL-Specific Metrics

- **Handler_read_rnd_next**: High value indicates full table scans
- **Created_tmp_tables**: Temporary table creation count
- **Sort_scan**: Number of sorts requiring a table scan

### SQL Server-Specific Metrics

- **Logical reads**: Pages read from cache
- **Physical reads**: Pages read from disk (minimize)
- **CPU time**: CPU milliseconds consumed

## Common Patterns and Solutions

### Pattern 1: High Row Count with Low Results

**Symptom:**

```
Seq Scan on table (cost=0.00..10000.00 rows=100000)
  Filter: (column = value)
  Rows Removed by Filter: 99999
```

**Solution:** Add index on filter column

```sql
CREATE INDEX idx_table_column ON table (column);
```

### Pattern 2: Nested Loop with Large Outer Table

**Symptom:**

```
Nested Loop (cost=0.00..50000.00 rows=10000)
  -> Seq Scan on large_table (rows=10000)
  -> Index Scan on small_table
```

**Solutions:**

1. Add index on large_table filter columns (reduce outer rows)
2. Reorder joins (start with smaller result set)
3. Force a hash join if appropriate (PostgreSQL: `SET enable_nestloop = off`)

### Pattern 3: Sort Operation on Large Result Set

**Symptom:**

```
Sort (cost=5000.00..6000.00 rows=100000)
  Sort Key: created_at DESC
  -> Seq Scan on orders
```

**Solution:** Create index matching ORDER BY

```sql
CREATE INDEX idx_orders_created ON orders (created_at DESC);
```

### Pattern 4: Multiple OR Conditions

**Symptom:**

```sql
SELECT * FROM users
WHERE status = 'active' OR status = 'pending' OR status = 'verified';
```

```
Seq Scan on users
  Filter: ((status = 'active') OR (status = 'pending') OR (status = 'verified'))
```

**Solution:** Use IN or UNION ALL

```sql
-- Better: Use IN
SELECT * FROM users WHERE status IN ('active', 'pending', 'verified');

-- Or: Use UNION ALL with separate indexes
SELECT * FROM users WHERE status = 'active'
UNION ALL
SELECT * FROM users WHERE status = 'pending'
UNION ALL
SELECT * FROM users WHERE status = 'verified';
```

## Quick Reference: EXPLAIN Command Comparison

| Feature | PostgreSQL | MySQL | SQL Server |
|---------|-----------|-------|------------|
| Basic plan | `EXPLAIN` | `EXPLAIN` | Ctrl+L (SSMS) |
| Execute + timing | `EXPLAIN ANALYZE` | `EXPLAIN ANALYZE` (8.0.18+) | Ctrl+M, execute |
| Detailed output | `EXPLAIN (ANALYZE, BUFFERS, VERBOSE)` | `EXPLAIN FORMAT=JSON` | Execution Plan XML |
| Cost shown | Yes (arbitrary units) | Yes (in FORMAT=JSON output) | Yes (percentage) |
| Actual rows | With ANALYZE | With ANALYZE (8.0.18+) | With actual plan |
| Index recommendations | No | No | Yes (warnings) |

## Best Practices

1. **Always run EXPLAIN before optimizing** - Understand the problem before solving it
2. **Compare before/after plans** - Verify optimizations work
3. **Use the ANALYZE variant when possible** - Actual timing beats estimates
4. **Check row estimates vs actuals** - Large discrepancies indicate outdated statistics
5. **Update statistics regularly** - Run ANALYZE/UPDATE STATISTICS weekly
6. **Monitor production queries** - Enable slow query log or Query Store
7. **Archive execution plans** - Track performance changes over time
````

### references/scan-types.md

````markdown
# Scan Types Interpretation

Detailed guide to understanding scan types and join algorithms across PostgreSQL, MySQL, and SQL Server.

## PostgreSQL Scan Types

### Sequential Scan (Seq Scan)

**Description:** Reads entire table sequentially from disk.

**When Used:**

- No suitable index available
- Small tables (optimizer determines full scan is cheaper)
- Query needs majority of table rows

**Performance:**

- Poor for large tables with selective WHERE clause
- Acceptable for small tables (<1000 rows)
- Acceptable when retrieving a large percentage of rows

**Example:**

```sql
EXPLAIN SELECT * FROM users WHERE last_login < '2020-01-01';
```

```
Seq Scan on users (cost=0.00..1500.00 rows=50000)
  Filter: (last_login < '2020-01-01')
```

**Optimization:** Add index on `last_login`

### Index Scan

**Description:** Traverses index to find matching rows, then fetches full rows from heap table.

**When Used:**

- Index available on WHERE/JOIN columns
- Small to medium result sets
- Random access pattern acceptable

**Performance:**

- Excellent for selective queries (<5% of table)
- Better than Seq Scan for most filtered queries

**Example:**

```sql
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
```

```
Index Scan using idx_users_email on users (cost=0.42..8.44 rows=1)
  Index Cond: (email = '[email protected]')
```

**Key Indicator:** `Index Cond` shows the index is being used.

### Index-Only Scan

**Description:** Retrieves all data from index without accessing heap table.
**When Used:** - Covering index contains all needed columns - Index includes WHERE and SELECT columns **Performance:** - Best possible performance - No heap access required - Minimal I/O **Example:** ```sql CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (id, name); EXPLAIN SELECT id, name FROM users WHERE email = '[email protected]'; ``` ``` Index Only Scan using idx_users_email_covering on users (cost=0.42..4.44 rows=1) Index Cond: (email = '[email protected]') ``` **Optimization Goal:** Design covering indexes for frequently-run queries. ### Bitmap Heap Scan **Description:** Two-step process: 1. Bitmap Index Scan: Create bitmap of matching row locations 2. Bitmap Heap Scan: Fetch rows from heap using bitmap **When Used:** - Medium-sized result sets (5-50% of table) - Multiple index conditions combined with OR - PostgreSQL optimizer determines it's more efficient than Index Scan **Performance:** - More efficient than Index Scan for medium result sets - Allows combining multiple indexes **Example:** ```sql EXPLAIN SELECT * FROM orders WHERE status = 'pending' AND priority = 'high'; ``` ``` Bitmap Heap Scan on orders (cost=20.00..500.00 rows=1000) Recheck Cond: ((status = 'pending') AND (priority = 'high')) -> Bitmap Index Scan on idx_orders_status (cost=0.00..10.00 rows=2000) Index Cond: (status = 'pending') -> Bitmap Index Scan on idx_orders_priority (cost=0.00..10.00 rows=1500) Index Cond: (priority = 'high') ``` **Key Indicator:** Multiple bitmap index scans combined. ## PostgreSQL Join Algorithms ### Nested Loop Join **Description:** For each row in outer table, scan inner table for matches. 
**When Used:** - Small outer table - Index on inner table join column - Selective join conditions **Performance:** - Excellent when outer table is small (<100 rows) - Poor when outer table is large **Example:** ```sql EXPLAIN SELECT * FROM small_table INNER JOIN large_table ON small_table.id = large_table.small_id; ``` ``` Nested Loop (cost=0.42..100.00 rows=10) -> Seq Scan on small_table (cost=0.00..1.10 rows=10) -> Index Scan using idx_large_small_id on large_table (cost=0.42..9.89 rows=1) Index Cond: (small_id = small_table.id) ``` **Optimization:** Ensure inner table has index on join column. ### Hash Join **Description:** 1. Build hash table of inner table 2. Probe hash table with outer table rows **When Used:** - Large tables being joined - Equality join conditions (=) - No suitable indexes **Performance:** - Excellent for large tables - Memory-intensive (hash table must fit in work_mem) - Single pass through each table **Example:** ```sql EXPLAIN SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.id; ``` ``` Hash Join (cost=500.00..5000.00 rows=10000) Hash Cond: (orders.customer_id = customers.id) -> Seq Scan on orders (cost=0.00..1000.00 rows=10000) -> Hash (cost=250.00..250.00 rows=5000) -> Seq Scan on customers (cost=0.00..250.00 rows=5000) ``` **Tuning:** Increase `work_mem` if hash join spills to disk. ### Merge Join **Description:** 1. Sort both tables by join key 2. 
Merge sorted results **When Used:** - Both tables already sorted by join key - Merge join cheaper than hash join - Equality join conditions **Performance:** - Excellent when data pre-sorted - Expensive if sorting required - Efficient for very large tables (no memory constraints) **Example:** ```sql EXPLAIN SELECT * FROM table1 INNER JOIN table2 ON table1.sorted_col = table2.sorted_col; ``` ``` Merge Join (cost=1000.00..2000.00 rows=10000) Merge Cond: (table1.sorted_col = table2.sorted_col) -> Index Scan using idx_table1_sorted on table1 (cost=0.00..500.00 rows=10000) -> Index Scan using idx_table2_sorted on table2 (cost=0.00..500.00 rows=10000) ``` **Optimization:** Add indexes matching join columns for pre-sorted data. ## MySQL Access Types ### ALL (Table Scan) **Description:** Full table scan, reads every row. **When Used:** - No suitable index - Small tables - Query retrieves most rows **Performance:** - Worst performance for large tables - Acceptable for small tables (<1000 rows) **Example:** ```sql EXPLAIN SELECT * FROM products WHERE description LIKE '%widget%'; ``` ``` type: ALL, rows: 50000 ``` **Optimization:** Add index or use full-text search. ### const **Description:** Single row lookup via primary key or unique index. **When Used:** - WHERE clause on primary key or unique index with constant value - Comparison with constant value **Performance:** - Best possible performance - Single row access **Example:** ```sql EXPLAIN SELECT * FROM users WHERE id = 123; ``` ``` type: const, key: PRIMARY, rows: 1 ``` **Key Indicator:** `type: const`, `rows: 1` ### eq_ref **Description:** One row from table for each combination of rows from previous tables. 
**When Used:** - JOIN on primary key or unique index - All parts of index used **Performance:** - Excellent for joins - One row per outer table row **Example:** ```sql EXPLAIN SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.id; ``` ``` type: eq_ref, key: PRIMARY, ref: orders.customer_id, rows: 1 ``` ### ref **Description:** Non-unique index lookup. **When Used:** - Index used but not unique - Multiple rows may match **Performance:** - Good performance - Efficient for moderately selective queries **Example:** ```sql EXPLAIN SELECT * FROM orders WHERE customer_id = 123; ``` ``` type: ref, key: idx_orders_customer, ref: const, rows: 10 ``` ### range **Description:** Index range scan using BETWEEN, <, >, IN, LIKE. **When Used:** - Range conditions on indexed columns - IN with small list - LIKE with prefix (not wildcard at start) **Performance:** - Good performance - Better than full table scan **Example:** ```sql EXPLAIN SELECT * FROM orders WHERE created_at > '2025-01-01'; ``` ``` type: range, key: idx_orders_created, rows: 1000 ``` ### index **Description:** Full index scan (reads entire index). **When Used:** - All needed columns in index (covering index) - Scanning entire index cheaper than table scan **Performance:** - Better than ALL if index smaller than table - Still inefficient for large indexes **Example:** ```sql EXPLAIN SELECT id FROM users; ``` ``` type: index, key: PRIMARY, rows: 100000 ``` ## SQL Server Scan Types ### Table Scan **Description:** Reads entire heap table (no clustered index). **When Used:** - Table has no clustered index - No suitable non-clustered index **Performance:** - Worst performance - Entire table read from disk **Optimization:** Add clustered index or non-clustered index. ### Clustered Index Scan **Description:** Reads entire clustered index (reads all table data). 
**When Used:** - Query needs most/all rows - No filtering or non-sargable WHERE clause **Performance:** - Similar to table scan - Entire table scanned **Example:** ```sql -- Execution plan shows "Clustered Index Scan" SELECT * FROM Orders WHERE YEAR(OrderDate) = 2025; ``` **Optimization:** Make WHERE clause sargable or add non-clustered index. ### Index Seek **Description:** Efficient index traversal to specific rows. **When Used:** - WHERE clause uses indexed columns - Sargable query conditions **Performance:** - Best performance - Minimal I/O **Example:** ```sql -- Execution plan shows "Index Seek" SELECT * FROM Orders WHERE OrderID = 12345; ``` **Key Indicator:** Seek operation in graphical plan. ### Index Scan **Description:** Reads entire index (not entire table). **When Used:** - Covering index contains all needed columns - Non-sargable conditions **Performance:** - Better than table scan if index smaller - Not as efficient as Index Seek **Example:** ```sql -- Execution plan shows "Index Scan" on non-clustered index SELECT CustomerID FROM Orders; ``` ### Key Lookup (RID Lookup / Bookmark Lookup) **Description:** After index seek, additional lookup to retrieve non-indexed columns. **When Used:** - Non-clustered index seek finds rows - Additional columns needed not in index **Performance:** - Additional I/O per row - Indicates covering index opportunity **Example:** ```sql -- Index on CustomerID, but SELECT includes other columns SELECT * FROM Orders WHERE CustomerID = 123; ``` ``` Index Seek (idx_Orders_CustomerID) -> Key Lookup (retrieve remaining columns) ``` **Optimization:** Create covering index with INCLUDE clause: ```sql CREATE NONCLUSTERED INDEX idx_Orders_CustomerID_Covering ON Orders (CustomerID) INCLUDE (OrderDate, TotalAmount); ``` ## Performance Hierarchy ### PostgreSQL (Best to Worst) 1. Index-Only Scan 2. Index Scan 3. Bitmap Heap Scan 4. Sequential Scan ### MySQL (Best to Worst) 1. system 2. const 3. eq_ref 4. ref 5. range 6. index 7. 
ALL

### SQL Server (Best to Worst)

1. Index Seek (non-clustered)
2. Clustered Index Seek
3. Index Scan (covering index)
4. Clustered Index Scan
5. Table Scan

## Decision Matrix: When to Add Index

| Scan Type | Rows Examined | Rows Returned | Action |
|-----------|---------------|---------------|--------|
| Seq Scan / ALL | >10,000 | <1% | ADD INDEX |
| Seq Scan / ALL | >10,000 | 1-10% | ADD INDEX |
| Seq Scan / ALL | >10,000 | >50% | OK (full scan appropriate) |
| Index Scan | Any | <5% | GOOD |
| Bitmap Heap Scan | Any | 5-50% | GOOD (PostgreSQL optimization) |
| Hash Join | Large tables | Any | Consider indexes on join columns |

## Common Optimization Patterns

### Pattern 1: Seq Scan → Index Scan

**Symptom:**
```
Seq Scan on table (cost=0.00..10000.00 rows=1)
  Filter: (column = value)
  Rows Removed by Filter: 999999
```

**Solution:**
```sql
CREATE INDEX idx_table_column ON table (column);
```

### Pattern 2: Index Scan → Index-Only Scan

**Symptom:**
```
Index Scan using idx_table_column on table (cost=0.42..100.00 rows=10)
  Index Cond: (column = value)
```

**Solution:** Create a covering index:
```sql
CREATE INDEX idx_table_column_covering ON table (column) INCLUDE (other_column1, other_column2);
```

### Pattern 3: Nested Loop → Hash Join

**Symptom:**
```
Nested Loop (cost=0.00..100000.00 rows=10000)
  -> Seq Scan on large_outer (rows=10000)
  -> Index Scan on large_inner
```

**Solution:** Add an index on the outer table's filter columns to reduce rows, or let the optimizer choose a hash join.
### Pattern 4: Multiple Index Scans → Composite Index **Symptom:** ``` Bitmap Heap Scan on table -> Bitmap Index Scan on idx_col1 -> Bitmap Index Scan on idx_col2 ``` **Solution:** Create composite index ```sql CREATE INDEX idx_table_col1_col2 ON table (col1, col2); ``` ## Quick Reference ### Scan Type Performance | Database | Best | Good | Acceptable | Poor | |----------|------|------|------------|------| | PostgreSQL | Index-Only Scan | Index Scan, Bitmap Heap Scan | - | Seq Scan | | MySQL | const, eq_ref | ref, range | index | ALL | | SQL Server | Index Seek | Index Scan (covering) | Clustered Index Scan | Table Scan | ### When Each Scan Type is OK - **Sequential/Full Table Scan**: Small tables (<1000 rows), or query needs >50% of rows - **Index Scan**: Selective queries (<10% of rows), index available - **Index-Only Scan**: Always preferred when possible (covering index) - **Bitmap Heap Scan**: Medium result sets (PostgreSQL optimization) - **Nested Loop**: Small outer table with index on inner join column - **Hash Join**: Large tables, equality joins, no suitable indexes ``` ### references/indexing-decisions.md ```markdown # Indexing Decisions Guide Comprehensive framework for deciding when and how to add indexes to optimize SQL query performance. ## Table of Contents 1. [Index Decision Framework](#index-decision-framework) 2. [Index Selection Criteria](#index-selection-criteria) 3. [Single-Column vs Composite Indexes](#single-column-vs-composite-indexes) 4. [Covering Indexes](#covering-indexes) 5. [When NOT to Add Indexes](#when-not-to-add-indexes) 6. [Index Maintenance](#index-maintenance) ## Index Decision Framework ### Primary Decision Tree ``` Is column used in WHERE, JOIN, ORDER BY, or GROUP BY? ├─ YES → Is column selective (many unique values)? │ ├─ YES → Is table frequently queried? │ │ ├─ YES → Is table write-heavy? 
│ │ │ ├─ YES → Balance read vs write performance │ │ │ └─ NO → ADD INDEX ✅ │ │ └─ NO → Consider based on query frequency │ └─ NO (low selectivity) → Skip index ❌ │ Exception: Partial index for specific subset └─ NO → Skip index ❌ ``` ### Selectivity Assessment **High Selectivity (Good for Indexes):** - Primary keys (100% unique) - Email addresses (usually unique) - UUIDs (unique) - User IDs (unique per user) - Timestamps (high variety) **Low Selectivity (Poor for Indexes):** - Boolean fields (true/false) - Status fields with few values (active/inactive) - Gender fields (limited values) - Small enum fields (<10 values) **Selectivity Formula:** ```sql -- PostgreSQL SELECT COUNT(DISTINCT column_name)::float / COUNT(*)::float AS selectivity FROM table_name; ``` **Guideline:** Selectivity > 0.1 (10% unique) = candidate for index ### Query Frequency Assessment **High Frequency (Prioritize Indexing):** - User-facing queries (every page load) - API endpoints hit frequently - Real-time dashboards - Background jobs running every minute **Medium Frequency (Evaluate Trade-offs):** - Admin dashboards - Reporting queries (daily/weekly) - Batch processes (hourly) **Low Frequency (Deprioritize Indexing):** - Ad-hoc queries - One-time data migrations - Infrequent reports (monthly/quarterly) ## Index Selection Criteria ### Criteria 1: Column Usage Patterns **WHERE Clause:** ```sql SELECT * FROM orders WHERE customer_id = 123; ``` **Decision:** Index `customer_id` (equality filter) **JOIN Conditions:** ```sql SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.id; ``` **Decision:** Index `orders.customer_id` and `customers.id` (foreign key + primary key) **ORDER BY:** ```sql SELECT * FROM orders ORDER BY created_at DESC LIMIT 10; ``` **Decision:** Index `created_at DESC` (sort optimization) **GROUP BY:** ```sql SELECT status, COUNT(*) FROM orders GROUP BY status; ``` **Decision:** Index `status` (grouping optimization) ### Criteria 2: Table Size | Table Size | 
Index Threshold | Reasoning | |-----------|----------------|-----------| | <1,000 rows | Skip indexes | Query planner may prefer seq scan | | 1,000-10,000 rows | Selective indexes | Index beneficial for selective queries | | 10,000-1M rows | Most queries | Indexes critical for performance | | >1M rows | All frequent queries | Index essential, consider partitioning | ### Criteria 3: Write vs Read Ratio **Read-Heavy Tables (>90% reads):** - Aggressive indexing strategy - Multiple indexes acceptable - Covering indexes beneficial **Balanced Tables (50-90% reads):** - Moderate indexing - Focus on most frequent queries - Limit to 3-5 indexes per table **Write-Heavy Tables (>50% writes):** - Minimal indexing - Only index critical queries - Consider batching writes **Write Performance Impact:** ``` Each additional index: - INSERT: +5-10% overhead per index - UPDATE: +5-10% overhead per affected index - DELETE: +5-10% overhead per index ``` ### Criteria 4: Data Type Considerations **Good for Indexing:** - Integer types (INT, BIGINT) - UUID/GUID - Timestamps (DATE, TIMESTAMP) - Short strings (VARCHAR(100)) **Acceptable for Indexing:** - Medium strings (VARCHAR(255)) - DECIMAL/NUMERIC **Poor for Indexing:** - Large TEXT/BLOB columns - Very long strings (>1000 chars) - JSON (use specialized indexes: GIN for PostgreSQL, generated columns for MySQL) **Exception:** Full-text indexes for TEXT columns ## Single-Column vs Composite Indexes ### When to Use Single-Column Index **Use Case 1: Single Filter** ```sql SELECT * FROM users WHERE email = '[email protected]'; ``` **Index:** ```sql CREATE INDEX idx_users_email ON users (email); ``` **Use Case 2: Simple JOIN** ```sql SELECT * FROM orders INNER JOIN customers ON orders.customer_id = customers.id; ``` **Index:** ```sql CREATE INDEX idx_orders_customer ON orders (customer_id); ``` ### When to Use Composite Index **Use Case 1: Multiple Equality Filters** ```sql SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped'; ``` 
**Index:** ```sql CREATE INDEX idx_orders_customer_status ON orders (customer_id, status); ``` **Use Case 2: Filter + Sort** ```sql SELECT * FROM orders WHERE customer_id = 123 ORDER BY created_at DESC; ``` **Index:** ```sql CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC); ``` **Use Case 3: Filter + Group** ```sql SELECT customer_id, status, COUNT(*) FROM orders WHERE created_at > '2025-01-01' GROUP BY customer_id, status; ``` **Index:** ```sql CREATE INDEX idx_orders_created_customer_status ON orders (created_at, customer_id, status); ``` ### Composite Index Column Order **Rule of Thumb:** 1. **Equality filters** (most selective first) 2. **Range filters** (if any) 3. **ORDER BY columns** (matching sort direction) **Example:** ```sql SELECT * FROM orders WHERE customer_id = 123 -- Equality (put first) AND status IN ('shipped', 'pending') -- Equality (put second) AND total > 100 -- Range (put third) ORDER BY created_at DESC; -- Sort (put last) ``` **Optimal Index:** ```sql CREATE INDEX idx_orders_customer_status_total_created ON orders (customer_id, status, total, created_at DESC); ``` **Left-Prefix Rule:** Composite index on (A, B, C) can be used for: - WHERE A = ? - WHERE A = ? AND B = ? - WHERE A = ? AND B = ? AND C = ? But NOT for: - WHERE B = ? (skips leading column) - WHERE C = ? (skips leading columns) ## Covering Indexes ### What is a Covering Index? Index that contains ALL columns needed by query (no heap table access required). 
**Benefits:** - Fastest possible performance (Index-Only Scan) - Reduced I/O (no heap access) - Better cache utilization ### Creating Covering Indexes **PostgreSQL INCLUDE Clause:** ```sql CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (id, name, created_at); ``` **MySQL Composite Index:** ```sql -- MySQL doesn't have INCLUDE, add columns to index CREATE INDEX idx_users_email_id_name ON users (email, id, name); ``` **SQL Server INCLUDE Clause:** ```sql CREATE NONCLUSTERED INDEX IX_Users_Email_Covering ON Users (Email) INCLUDE (ID, Name, CreatedAt); ``` ### When to Use Covering Indexes **Use Case 1: Frequent Query with Specific Columns** ```sql -- Query runs 1000x per second SELECT id, name FROM users WHERE email = '[email protected]'; ``` **Covering Index:** ```sql CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (id, name); ``` **Use Case 2: API Endpoint Returning Specific Fields** ```sql -- API: GET /api/orders?customer_id=123 (returns id, total, status) SELECT id, total, status FROM orders WHERE customer_id = 123; ``` **Covering Index:** ```sql CREATE INDEX idx_orders_customer_covering ON orders (customer_id) INCLUDE (id, total, status); ``` ### Covering Index Trade-offs **Benefits:** - Dramatic performance improvement (Index-Only Scan) - No heap access = less I/O **Costs:** - Larger index size - Slower writes (more data to update) - More storage required **Guideline:** Use covering indexes for critical queries (user-facing, high-frequency). ## When NOT to Add Indexes ### Anti-Pattern 1: Indexing Low-Selectivity Columns **Bad:** ```sql CREATE INDEX idx_users_is_active ON users (is_active); -- ❌ Boolean ``` **Why:** Only 2 values (true/false), index scan often worse than seq scan. 
**Exception:** Partial index for minority case ```sql -- If 99% inactive, 1% active CREATE INDEX idx_users_active ON users (id) WHERE is_active = true; ``` ### Anti-Pattern 2: Indexing Small Tables **Bad:** ```sql CREATE INDEX idx_config_key ON config (key); -- ❌ Table has 50 rows ``` **Why:** Small tables fit in memory, seq scan faster than index overhead. **Guideline:** Skip indexes for tables <1000 rows (unless foreign keys). ### Anti-Pattern 3: Over-Indexing **Bad:** ```sql CREATE INDEX idx_users_email ON users (email); CREATE INDEX idx_users_email_name ON users (email, name); -- ❌ Redundant CREATE INDEX idx_users_email_name_id ON users (email, name, id); -- ❌ Redundant ``` **Why:** Multiple overlapping indexes waste space and slow writes. **Fix:** Use single covering index ```sql CREATE INDEX idx_users_email_covering ON users (email) INCLUDE (name, id); ``` ### Anti-Pattern 4: Indexing Write-Heavy Columns **Bad:** ```sql CREATE INDEX idx_pageviews_timestamp ON pageviews (timestamp); -- ❌ High-insert table ``` **Why:** Each insert updates index, slowing down write-heavy operations. **Alternative:** Partition table by time range, index within partitions. 
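The write overhead behind Anti-Pattern 4 is easy to observe directly. A rough sketch using SQLite as a stand-in — table and column names are illustrative, and absolute timings vary by machine, so treat this as a demonstration rather than a benchmark:

```python
import sqlite3
import time

# Anti-Pattern 4 illustrated: every secondary index must be updated on INSERT,
# so write-heavy tables pay per-index overhead. SQLite stand-in; names are
# illustrative and timings vary by machine.
def insert_time(index_columns):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pageviews (id INTEGER PRIMARY KEY, ts INTEGER, url TEXT, uid INTEGER)"
    )
    for col in index_columns:
        conn.execute(f"CREATE INDEX idx_pageviews_{col} ON pageviews ({col})")
    rows = [(i, f"/page/{i % 50}", i % 1000) for i in range(20_000)]
    start = time.perf_counter()
    with conn:  # single transaction, so only index maintenance differs
        conn.executemany("INSERT INTO pageviews (ts, url, uid) VALUES (?, ?, ?)", rows)
    return time.perf_counter() - start

t0 = insert_time([])                    # heap only
t3 = insert_time(["ts", "url", "uid"])  # three secondary indexes to maintain
print(f"0 indexes: {t0:.4f}s  3 indexes: {t3:.4f}s")
```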
### Anti-Pattern 5: Indexing Calculated Columns (Without Expression Index)

**Bad:**
```sql
-- Query uses function
SELECT * FROM users WHERE LOWER(email) = '[email protected]';

-- Regular index won't help
CREATE INDEX idx_users_email ON users (email); -- ❌ Not used
```

**Fix:** Expression index (PostgreSQL, SQL Server)
```sql
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
```

## Index Maintenance

### Monitoring Index Usage

**PostgreSQL:**
```sql
-- Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast_%'
ORDER BY idx_scan;
```

**MySQL:**
```sql
-- Requires the sys schema (MySQL 5.7+)
SELECT * FROM sys.schema_unused_indexes;
```

**SQL Server:**
```sql
-- Find unused indexes (an index with no usage-stats row has never been used,
-- hence the COALESCE on the LEFT JOIN columns)
SELECT OBJECT_NAME(i.object_id) AS TableName, i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE COALESCE(s.user_seeks, 0) = 0
  AND COALESCE(s.user_scans, 0) = 0
  AND COALESCE(s.user_lookups, 0) = 0
  AND i.type_desc = 'NONCLUSTERED';
```

### Removing Unused Indexes

**Process:**
1. Identify unused indexes (0 scans over 7+ days)
2. Verify index not used for constraints
3. Drop index during low-traffic period
4. Monitor for performance regressions

**Drop Command:**
```sql
-- MySQL / SQL Server
DROP INDEX idx_unused_index ON table_name;

-- PostgreSQL
DROP INDEX idx_unused_index;
```

### Updating Index Statistics

**PostgreSQL:**
```sql
-- Analyze entire database
ANALYZE;

-- Analyze specific table
ANALYZE table_name;
```

**MySQL:**
```sql
-- Analyze table
ANALYZE TABLE table_name;
```

**SQL Server:**
```sql
-- Update statistics
UPDATE STATISTICS table_name;

-- Update statistics for specific index
UPDATE STATISTICS table_name index_name;
```

**Schedule:** Weekly for active tables, monthly for stable tables.
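What "updating statistics" actually produces can be sketched with SQLite's `ANALYZE`, standing in for PostgreSQL's `ANALYZE`, MySQL's `ANALYZE TABLE`, and SQL Server's `UPDATE STATISTICS`; names and data below are illustrative:

```python
import sqlite3

# Planner-statistics sketch: SQLite's ANALYZE writes per-index row estimates
# into sqlite_stat1, which the planner consults when costing plans.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i % 10,) for i in range(1000)])
conn.execute("CREATE INDEX idx_t_a ON t (a)")
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
# stat holds "<total rows> <avg rows per distinct key>"
print(stats)
```

Stale or missing estimates of this kind are exactly what leads planners to pick sequential scans over available indexes, which is why statistics refreshes belong on a schedule.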
### Rebuilding Fragmented Indexes **PostgreSQL:** ```sql -- Rebuild index (locks table) REINDEX INDEX idx_name; -- Rebuild concurrently (no lock) REINDEX INDEX CONCURRENTLY idx_name; ``` **MySQL:** ```sql -- Optimize table (rebuilds indexes) OPTIMIZE TABLE table_name; ``` **SQL Server:** ```sql -- Rebuild index ALTER INDEX index_name ON table_name REBUILD; -- Reorganize index (online, less intrusive) ALTER INDEX index_name ON table_name REORGANIZE; ``` **When to Rebuild:** - Fragmentation >30% - After large batch deletes - Query performance degrades over time ## Index Design Patterns ### Pattern 1: Foreign Key Indexing **Rule:** Always index foreign keys. **Example:** ```sql CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT, FOREIGN KEY (customer_id) REFERENCES customers(id) ); -- Always add this index CREATE INDEX idx_orders_customer ON orders (customer_id); ``` **Why:** Enables efficient joins and cascading deletes. ### Pattern 2: Status + Timestamp Indexing **Use Case:** Queries filtering by status and ordering by time. ```sql SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at DESC LIMIT 10; ``` **Index:** ```sql CREATE INDEX idx_orders_status_created ON orders (status, created_at DESC); ``` ### Pattern 3: Multi-Tenant Indexing **Use Case:** SaaS application with tenant_id in most queries. ```sql SELECT * FROM documents WHERE tenant_id = 123 AND status = 'active' ORDER BY updated_at DESC; ``` **Index:** ```sql CREATE INDEX idx_documents_tenant_status_updated ON documents (tenant_id, status, updated_at DESC); ``` **Rule:** Always lead with tenant_id in multi-tenant applications. ### Pattern 4: Partial Index for Subset **Use Case:** Query only active records (minority of data). 
```sql SELECT * FROM users WHERE status = 'active'; ``` **Full Index (Inefficient):** ```sql CREATE INDEX idx_users_status ON users (status); -- Large index ``` **Partial Index (Efficient - PostgreSQL):** ```sql CREATE INDEX idx_users_active ON users (id) WHERE status = 'active'; -- Smaller, faster ``` ### Pattern 5: Expression Index for Case-Insensitive Search **Use Case:** Case-insensitive email lookup. ```sql SELECT * FROM users WHERE LOWER(email) = '[email protected]'; ``` **Expression Index (PostgreSQL, SQL Server):** ```sql CREATE INDEX idx_users_email_lower ON users (LOWER(email)); ``` **MySQL Alternative (Generated Column):** ```sql ALTER TABLE users ADD COLUMN email_lower VARCHAR(255) GENERATED ALWAYS AS (LOWER(email)) STORED; CREATE INDEX idx_users_email_lower ON users (email_lower); ``` ## Quick Reference ### Index Decision Checklist - [ ] Column used in WHERE, JOIN, ORDER BY, or GROUP BY? - [ ] High selectivity (>10% unique values)? - [ ] Table size >1,000 rows? - [ ] Query frequency high (user-facing or frequent)? - [ ] Write-to-read ratio acceptable (<50% writes)? - [ ] No overlapping composite indexes already exist? - [ ] Index size acceptable (<20% of table size)? 
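The selectivity item in the checklist above can be computed mechanically with the `COUNT(DISTINCT ...) / COUNT(*)` formula shown earlier. A sketch using SQLite with illustrative data:

```python
import sqlite3

# Selectivity = distinct values / total rows. SQLite stand-in; the users
# table and its data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_active INTEGER)")
conn.executemany(
    "INSERT INTO users (email, is_active) VALUES (?, ?)",
    [(f"user{i}@example.com", i % 2) for i in range(1000)],
)

def selectivity(column):
    n_distinct, n_rows = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM users"
    ).fetchone()
    return n_distinct / n_rows

print(selectivity("email"))      # 1.0   -> strong index candidate
print(selectivity("is_active"))  # 0.002 -> skip (boolean, 2 distinct values)
```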
### Index Type Selection

| Use Case | PostgreSQL | MySQL | SQL Server |
|----------|-----------|-------|------------|
| General queries | B-tree | B-tree | Non-clustered |
| Equality only | Hash | B-tree | Non-clustered |
| Full-text search | GIN | Full-text | Full-text |
| Spatial data | GiST | Spatial | Spatial |
| JSONB / JSON | GIN | Generated column + index | Computed column + index |
| Large sequential table | BRIN | B-tree with partitions | Clustered columnstore |

### Index Maintenance Schedule

| Task | Frequency | Purpose |
|------|-----------|---------|
| Update statistics | Weekly | Optimize query plans |
| Review unused indexes | Monthly | Remove waste |
| Rebuild fragmented indexes | Quarterly | Fix fragmentation |
| Analyze query performance | Weekly | Identify missing indexes |
```

### references/index-types.md

```markdown
# Index Types by Database

Quick reference guide to index types across PostgreSQL, MySQL, and SQL Server.

## PostgreSQL Index Types

| Index Type | Use Case | Operators Supported | Creation |
|-----------|----------|-------------------|----------|
| **B-tree** | General-purpose (default) | <, ≤, =, ≥, >, BETWEEN, IN, IS NULL | `CREATE INDEX ON table (column)` |
| **Hash** | Equality only | = | `CREATE INDEX ON table USING HASH (column)` |
| **GIN** | Full-text, JSONB, arrays | @>, ?, ?&, ?\|, @@ | `CREATE INDEX ON table USING GIN (column)` |
| **GiST** | Spatial, ranges, full-text | &&, <->, @>, <<, &< | `CREATE INDEX ON table USING GIST (column)` |
| **SP-GiST** | Non-balanced trees, partitioned | Depends on operator class | `CREATE INDEX ON table USING SPGIST (column)` |
| **BRIN** | Large sequential tables | <, ≤, =, ≥, > | `CREATE INDEX ON table USING BRIN (column)` |
| **Bloom** | Multi-column equality (bloom extension) | = (multiple columns) | `CREATE INDEX ON table USING BLOOM (col1, col2, ...)` |

**Recommendations:**
- **Default:** B-tree (99% of use cases)
- **Full-text:** GIN on `tsvector`
- **JSONB:** GIN
- **Spatial:** GiST
- **Time-series >100GB:** BRIN

## MySQL Index Types

| Index Type | Use Case | Storage Engines | Creation |
|-----------|----------|----------------|----------|
| **B-tree** | General-purpose (default) | InnoDB, MyISAM | `CREATE INDEX ON table (column)` |
| **Hash** | Equality only | MEMORY engine only | `CREATE INDEX ON table (column) USING HASH` |
| **Full-text** | Text search | InnoDB (5.6+), MyISAM | `CREATE FULLTEXT INDEX ON table (column)` |
| **Spatial** | Geometric data | InnoDB (5.7+), MyISAM | `CREATE SPATIAL INDEX ON table (column)` |

**Recommendations:**
- **Default:** B-tree
- **Text search:** Full-text index
- **Spatial:** Spatial index

## SQL Server Index Types

| Index Type | Use Case | Clustered | Creation |
|-----------|----------|-----------|----------|
| **Clustered** | Primary table organization | Yes (1 per table) | `CREATE CLUSTERED INDEX ON table (column)` |
| **Non-Clustered** | Secondary lookups | No (multiple per table) | `CREATE NONCLUSTERED INDEX ON table (column)` |
| **Covering (INCLUDE)** | Index-only scans | No | `CREATE INDEX ON table (col) INCLUDE (col2, ...)` |
| **Filtered** | Partial index | No | `CREATE INDEX ON table (col) WHERE condition` |
| **Columnstore** | Analytics/DW | Yes or No | `CREATE COLUMNSTORE INDEX ON table` |
| **Full-text** | Text search | No | `CREATE FULLTEXT INDEX ON table (column)` |
| **Spatial** | Geometric data | No | `CREATE SPATIAL INDEX ON table (column)` |
| **XML** | XML data | No | `CREATE XML INDEX ON table (column)` |

**Recommendations:**
- **Primary key:** Clustered index (default)
- **Foreign keys:** Non-clustered index
- **Frequent queries:** Covering index
- **Analytics:** Columnstore index

## Cross-Database Index Comparison

### General-Purpose Indexes

| Database | Name | Notes |
|----------|------|-------|
| PostgreSQL | B-tree | Default, most common |
| MySQL | B-tree | InnoDB uses clustered primary key |
| SQL Server | Non-Clustered | Separate from table data |

### Full-Text Search

| Database | Implementation | Query Syntax |
|----------|---------------|--------------| | PostgreSQL | GIN + tsvector | `to_tsvector() @@ to_tsquery()` | | MySQL | Full-text index | `MATCH() AGAINST()` | | SQL Server | Full-text index | `CONTAINS()`, `FREETEXT()` | ### Partial/Filtered Indexes | Database | Support | Syntax | |----------|---------|--------| | PostgreSQL | Yes (Partial Index) | `CREATE INDEX ... WHERE condition` | | MySQL | No | Use generated columns as workaround | | SQL Server | Yes (Filtered Index) | `CREATE INDEX ... WHERE condition` | ### Expression/Computed Indexes | Database | Support | Syntax | |----------|---------|--------| | PostgreSQL | Yes (Expression Index) | `CREATE INDEX ON table (LOWER(column))` | | MySQL | Via Generated Columns | `ADD COLUMN ... GENERATED ... + INDEX` | | SQL Server | Via Computed Columns | `ADD COLUMN ... AS expression + INDEX` | ### Covering Indexes | Database | Implementation | Syntax | |----------|---------------|--------| | PostgreSQL | INCLUDE clause | `CREATE INDEX ON t (col) INCLUDE (col2, col3)` | | MySQL | Add columns to index | `CREATE INDEX ON t (col, col2, col3)` | | SQL Server | INCLUDE clause | `CREATE INDEX ON t (col) INCLUDE (col2, col3)` | ## Index Selection Decision Tree ``` What type of query? 
├─ Equality (column = value) │ ├─ PostgreSQL → B-tree or Hash │ ├─ MySQL → B-tree │ └─ SQL Server → Non-clustered │ ├─ Range (column > value, BETWEEN) │ ├─ PostgreSQL → B-tree │ ├─ MySQL → B-tree │ └─ SQL Server → Non-clustered │ ├─ Full-text search │ ├─ PostgreSQL → GIN (tsvector) │ ├─ MySQL → Full-text │ └─ SQL Server → Full-text │ ├─ JSON queries │ ├─ PostgreSQL → GIN (JSONB) │ ├─ MySQL → Generated column + B-tree │ └─ SQL Server → JSON index (2016+) │ ├─ Spatial queries │ ├─ PostgreSQL → GiST (PostGIS) │ ├─ MySQL → Spatial │ └─ SQL Server → Spatial │ └─ Large time-series table ├─ PostgreSQL → BRIN ├─ MySQL → Partitioning + B-tree └─ SQL Server → Partitioning + Columnstore ``` ``` ### references/composite-indexes.md ```markdown # Composite Index Design Guide to designing multi-column composite indexes for optimal query performance. ## Column Order Rules ### Rule 1: Equality Filters First (Most Selective) **Query Pattern:** ```sql SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped'; ``` **Optimal Index:** ```sql CREATE INDEX idx_orders_customer_status ON orders (customer_id, status); ``` **Why:** Most selective filters first reduce index tree traversal. ### Rule 2: Range Filters After Equality **Query Pattern:** ```sql SELECT * FROM orders WHERE customer_id = 123 AND created_at > '2025-01-01'; ``` **Optimal Index:** ```sql CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at); ``` **Why:** Equality narrows down to specific branch, then range scan within. ### Rule 3: ORDER BY Columns Last **Query Pattern:** ```sql SELECT * FROM orders WHERE customer_id = 123 ORDER BY created_at DESC LIMIT 10; ``` **Optimal Index:** ```sql CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at DESC); ``` **Why:** Pre-sorted results, no separate sort operation needed. ## Left-Prefix Rule **Composite Index:** `(A, B, C)` **Can Be Used For:** - `WHERE A = ?` - `WHERE A = ? AND B = ?` - `WHERE A = ? AND B = ? 
AND C = ?` - `WHERE A = ? ORDER BY B` **Cannot Be Used For:** - `WHERE B = ?` (skips leading column A) - `WHERE C = ?` (skips leading columns A, B) - `WHERE B = ? AND C = ?` (skips leading column A) **Example:** ```sql -- Index: (customer_id, status, created_at) -- ✅ Uses index (customer_id) SELECT * FROM orders WHERE customer_id = 123; -- ✅ Uses index (customer_id, status) SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending'; -- ✅ Uses full index SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending' AND created_at > '2025-01-01'; -- ❌ Cannot use index (skips customer_id) SELECT * FROM orders WHERE status = 'pending'; ``` ## Common Patterns ### Pattern 1: Multi-Tenant Application **Query:** ```sql SELECT * FROM documents WHERE tenant_id = 123 AND status = 'active' ORDER BY updated_at DESC; ``` **Index:** ```sql CREATE INDEX idx_documents_tenant_status_updated ON documents (tenant_id, status, updated_at DESC); ``` **Why:** tenant_id first (always filtered), status second, updated_at for sorting. ### Pattern 2: Status + Time Range **Query:** ```sql SELECT * FROM tasks WHERE status = 'pending' AND due_date < NOW(); ``` **Index:** ```sql CREATE INDEX idx_tasks_status_due ON tasks (status, due_date); ``` ### Pattern 3: Multiple Equality + Range **Query:** ```sql SELECT * FROM products WHERE category_id = 5 AND brand_id = 10 AND price > 100; ``` **Index:** ```sql CREATE INDEX idx_products_category_brand_price ON products (category_id, brand_id, price); ``` **Column Order:** Most selective first, range last. 
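The left-prefix rule behind these patterns can be verified directly. A sketch using SQLite as a stand-in engine (table, index, and data are illustrative): a composite index on `(customer_id, status)` serves a filter on the leading column, but not one that skips it.

```python
import sqlite3

# Left-prefix rule check: the composite index serves the leading-column
# filter; a status-only filter cannot use it and falls back to a full scan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "status TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 100, ("pending", "shipped")[i % 2], float(i)) for i in range(1000)],
)
conn.execute("CREATE INDEX idx_orders_cust_status ON orders (customer_id, status)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the human-readable step in the last column
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

leading = plan("SELECT * FROM orders WHERE customer_id = 7")      # uses the index
skipped = plan("SELECT * FROM orders WHERE status = 'pending'")   # full scan
print(leading)
print(skipped)
```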
### Pattern 4: JOIN + Filter **Query:** ```sql SELECT * FROM order_items INNER JOIN orders ON order_items.order_id = orders.id WHERE orders.customer_id = 123; ``` **Indexes:** ```sql -- Index on order_items for JOIN CREATE INDEX idx_order_items_order ON order_items (order_id); -- Composite index on orders CREATE INDEX idx_orders_customer ON orders (customer_id); ``` ## Selectivity Ordering ### High Selectivity → Low Selectivity **Table:** 1,000,000 orders - `customer_id`: 10,000 unique values (high selectivity) - `status`: 5 unique values (low selectivity) **Query:** ```sql SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending'; ``` **Optimal:** ```sql CREATE INDEX idx_orders_customer_status ON orders (customer_id, status); ``` **Why:** - `customer_id = 123` narrows to ~100 rows - `status = 'pending'` filters within those 100 rows - Much better than filtering 200,000 pending orders for customer_id **Suboptimal:** ```sql CREATE INDEX idx_orders_status_customer ON orders (status, customer_id); ``` - `status = 'pending'` starts with 200,000 rows - Then filters for customer_id - Larger initial scan ## Index vs Query Mismatch ### Mismatch Example **Index:** ```sql CREATE INDEX idx_orders_customer_status ON orders (customer_id, status); ``` **Query:** ```sql -- ❌ Query uses status first (skips customer_id) SELECT * FROM orders WHERE status = 'pending'; ``` **Solution:** Create separate index for status-only queries: ```sql CREATE INDEX idx_orders_status ON orders (status); ``` ### Partial Index Solution (PostgreSQL) Instead of full index on low-selectivity column: ```sql -- ✅ Partial index for specific status CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending'; ``` ## Including Non-Indexed Columns ### PostgreSQL INCLUDE Clause **Query:** ```sql SELECT customer_id, status, total, created_at FROM orders WHERE customer_id = 123; ``` **Covering Index:** ```sql CREATE INDEX idx_orders_customer_covering ON orders (customer_id) INCLUDE 
(status, total, created_at); ``` **Benefit:** Index-Only Scan (no heap access). ### MySQL Approach **Composite Index with Extra Columns:** ```sql CREATE INDEX idx_orders_customer_covering ON orders (customer_id, status, total, created_at); ``` **Trade-off:** Larger index, but enables covering scans. ### SQL Server INCLUDE Clause **Covering Index:** ```sql CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID) INCLUDE (Status, Total, CreatedAt); ``` ## Multi-Column vs Multiple Indexes ### Single Composite Index **Index:** ```sql CREATE INDEX idx_orders_customer_status ON orders (customer_id, status); ``` **Queries Supported:** - `WHERE customer_id = ?` - `WHERE customer_id = ? AND status = ?` **Queries NOT Supported:** - `WHERE status = ?` (skips leading column) ### Multiple Single-Column Indexes **Indexes:** ```sql CREATE INDEX idx_orders_customer ON orders (customer_id); CREATE INDEX idx_orders_status ON orders (status); ``` **Queries Supported:** - `WHERE customer_id = ?` (uses idx_orders_customer) - `WHERE status = ?` (uses idx_orders_status) - `WHERE customer_id = ? 
AND status = ?` (bitmap scan in PostgreSQL, index merge in MySQL) **Trade-off:** - More indexes = slower writes - More flexible for varied query patterns ### Recommendation **High-frequency query patterns:** Composite index **Varied query patterns:** Multiple indexes or partial indexes ## Testing Composite Indexes ### Before Creating Index ```sql EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending' ORDER BY created_at DESC; ``` **Look for:** - Sequential Scan / Table Scan - High row counts - Sort operation ### After Creating Index ```sql CREATE INDEX idx_orders_customer_status_created ON orders (customer_id, status, created_at DESC); EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending' ORDER BY created_at DESC; ``` **Look for:** - Index Scan / Index Seek - Low row counts - No sort operation (pre-sorted by index) ## Quick Reference ### Column Order Priority 1. **Equality filters** (most selective first) 2. **Additional equality filters** (by selectivity) 3. **Range filters** (after all equality) 4. **ORDER BY columns** (matching sort direction) 5. **GROUP BY columns** (if not already covered) ### Common Mistakes **❌ Wrong order:** ```sql CREATE INDEX ON orders (created_at, customer_id); -- Query: WHERE customer_id = ? ORDER BY created_at -- Inefficient: Large scan on created_at first ``` **✅ Correct order:** ```sql CREATE INDEX ON orders (customer_id, created_at); -- Query: WHERE customer_id = ? ORDER BY created_at -- Efficient: Narrow by customer_id, then sorted by created_at ``` ### Index Size Considerations **Each additional column:** - Increases index size by column width - Slows down write operations slightly - Enables more query patterns **Balance:** - 2-4 columns typical - 5+ columns rare (diminishing returns) ``` ### references/anti-patterns.md ```markdown # SQL Anti-Patterns Common SQL performance anti-patterns with explanations, impact analysis, and solutions. ## Table of Contents 1. 
[SELECT * Anti-Pattern](#select--anti-pattern) 2. [N+1 Query Problem](#n1-query-problem) 3. [Missing Indexes on Foreign Keys](#missing-indexes-on-foreign-keys) 4. [Non-Sargable Queries](#non-sargable-queries) 5. [Implicit Type Conversion](#implicit-type-conversion) 6. [Correlated Subqueries](#correlated-subqueries) 7. [Unnecessary DISTINCT](#unnecessary-distinct) 8. [OR vs IN Performance](#or-vs-in-performance) 9. [NOT IN with NULL Values](#not-in-with-null-values) 10. [Wildcard at Start of LIKE](#wildcard-at-start-of-like) ## SELECT * Anti-Pattern ### Problem Description Fetching all columns when only subset needed. **Anti-Pattern:** ```sql -- ❌ Bad: Fetches all 50 columns SELECT * FROM users WHERE id = 1; ``` **Impact:** - Increased I/O (reading unnecessary data from disk) - Higher network transfer (sending unnecessary data) - More memory usage (larger result sets) - Slower query execution - Breaks application when schema changes **Solution:** ```sql -- ✅ Good: Fetch only needed columns SELECT id, name, email, created_at FROM users WHERE id = 1; ``` ### When SELECT * is Acceptable **Exception 1: Small tables with few columns** ```sql SELECT * FROM settings; -- OK: 3-5 columns, small table ``` **Exception 2: Exploratory queries (development only)** ```sql -- OK during development/debugging SELECT * FROM users LIMIT 5; ``` **Exception 3: All columns genuinely needed** ```sql -- OK if truly need every column SELECT * FROM user_profiles WHERE user_id = 123; ``` ### Performance Comparison **Test Case:** Users table with 50 columns, 100,000 rows ```sql -- SELECT * : 250ms, 500MB transferred SELECT * FROM users WHERE status = 'active'; -- Specific columns: 50ms, 50MB transferred SELECT id, name, email FROM users WHERE status = 'active'; ``` **Result:** 5x performance improvement ## N+1 Query Problem ### Problem Description Executing 1 query to fetch parent records, then N queries (one per parent) to fetch related records. 
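The round-trip arithmetic behind this anti-pattern can be demonstrated with Python's bundled sqlite3 module standing in for a networked database (schema and row counts here are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
                 [(i, f"post{i}") for i in range(1, 101)])

# N+1: one query for the parents, then one query per parent
queries = 1
users = conn.execute("SELECT id FROM users").fetchall()
for (uid,) in users:
    conn.execute("SELECT * FROM posts WHERE user_id = ?", (uid,)).fetchall()
    queries += 1
print(queries)  # 101

# Batched: two queries total, regardless of how many parents there are
ids = [uid for (uid,) in users]
placeholders = ",".join("?" * len(ids))
posts = conn.execute(
    f"SELECT * FROM posts WHERE user_id IN ({placeholders})", ids).fetchall()
print(2, len(posts))  # 2 100
```

The loop issues 101 statements for 100 users; the batched version stays at 2 no matter how many users there are, which is why the cost grows linearly with N only in the first form.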
**Anti-Pattern:** ```sql -- ❌ Bad: 1 + N queries -- Query 1: Fetch users SELECT * FROM users LIMIT 100; -- Query 2-101: For each user, fetch posts (executed 100 times in application loop) SELECT * FROM posts WHERE user_id = ?; ``` **Impact:** - 101 database round trips instead of 1 - Network latency multiplied by N - Database connection overhead × N - Poor scalability (linear growth with N) **Solution 1: Single JOIN** ```sql -- ✅ Good: Single query with JOIN SELECT users.id, users.name, posts.id AS post_id, posts.title, posts.content FROM users LEFT JOIN posts ON users.id = posts.user_id WHERE users.id IN (1, 2, 3, ...); ``` **Solution 2: Separate Queries with IN Clause** ```sql -- ✅ Also good: 2 queries instead of N+1 -- Query 1: Fetch users SELECT * FROM users LIMIT 100; -- Query 2: Fetch all posts for these users SELECT * FROM posts WHERE user_id IN (1, 2, 3, ..., 100); ``` ### N+1 Detection **Rails ActiveRecord Example:** ```ruby # ❌ N+1 Problem users = User.limit(100) users.each do |user| puts user.posts.count # Triggers N queries end # ✅ Fixed with eager loading users = User.includes(:posts).limit(100) users.each do |user| puts user.posts.size # Uses preloaded data (count would still issue a query) end ``` **Django ORM Example:** ```python # ❌ N+1 Problem users = User.objects.all()[:100] for user in users: print(user.posts.count()) # Triggers N queries # ✅ Fixed with prefetch_related users = User.objects.prefetch_related('posts').all()[:100] for user in users: print(len(user.posts.all())) # Uses the prefetched cache (.count() would still query) ``` ### Performance Comparison **Test Case:** 100 users, average 5 posts each ```sql -- N+1 approach: 101 queries, ~1000ms -- JOIN approach: 1 query, ~50ms ``` **Result:** 20x performance improvement ## Missing Indexes on Foreign Keys ### Problem Description Foreign key columns without indexes cause slow joins and cascading operations. **Anti-Pattern:** ```sql -- ❌ Bad: No index on foreign key CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT, -- No index!
total DECIMAL(10,2), FOREIGN KEY (customer_id) REFERENCES customers(id) ); ``` **Impact:** - Slow joins (sequential scan on orders table) - Slow cascading deletes (must scan entire orders table) - Slow cascading updates - Poor performance for queries filtering by customer_id **Solution:** ```sql -- ✅ Good: Index on foreign key CREATE TABLE orders ( id INT PRIMARY KEY, customer_id INT, total DECIMAL(10,2), FOREIGN KEY (customer_id) REFERENCES customers(id) ); CREATE INDEX idx_orders_customer ON orders (customer_id); ``` ### Why This Matters **Query Without Index:** ```sql EXPLAIN SELECT * FROM orders WHERE customer_id = 123; ``` ``` Seq Scan on orders (cost=0.00..10000.00 rows=100) Filter: (customer_id = 123) ``` **100,000 rows scanned** **Query With Index:** ```sql EXPLAIN SELECT * FROM orders WHERE customer_id = 123; ``` ``` Index Scan using idx_orders_customer on orders (cost=0.42..12.44 rows=100) Index Cond: (customer_id = 123) ``` **100 rows accessed directly** ### Cascading Delete Performance **Without Index:** ```sql DELETE FROM customers WHERE id = 123; -- Scans entire orders table to find matching rows -- Time: 500ms for 1 million row orders table ``` **With Index:** ```sql DELETE FROM customers WHERE id = 123; -- Uses index to find matching rows directly -- Time: 5ms ``` **Result:** 100x faster cascading deletes ## Non-Sargable Queries ### Problem Description **Sargable:** Search ARGument ABLE - conditions that can use indexes. **Non-sargable:** Functions or operations on indexed columns prevent index usage. 
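Whether a predicate is sargable can be read directly off a query plan; a sqlite3 sketch (SQLite's plan wording differs from PostgreSQL/MySQL/SQL Server, but the scan-versus-index-search distinction is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE INDEX idx_orders_created ON orders (created_at);
""")

def plan(sql):
    # Join the "detail" column of each EXPLAIN QUERY PLAN row
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Non-sargable: the function call hides the indexed column from the planner
p1 = plan("SELECT * FROM orders WHERE strftime('%Y', created_at) = '2025'")
# Sargable: bare column compared against a range
p2 = plan("SELECT * FROM orders WHERE created_at >= '2025-01-01'"
          " AND created_at < '2026-01-01'")
print(p1)  # reports a full scan of orders
print(p2)  # reports a search using idx_orders_created
```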
### Anti-Pattern 1: Function on Indexed Column **Anti-Pattern:** ```sql -- ❌ Bad: Function prevents index usage SELECT * FROM orders WHERE YEAR(created_at) = 2025; ``` **Why It Fails:** - Index on `created_at` cannot be used - Database must evaluate YEAR() for every row - Results in sequential scan **Solution:** ```sql -- ✅ Good: Sargable range condition SELECT * FROM orders WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'; ``` ### Anti-Pattern 2: Arithmetic on Indexed Column **Anti-Pattern:** ```sql -- ❌ Bad: Arithmetic prevents index usage SELECT * FROM products WHERE price * 1.1 > 100; ``` **Solution:** ```sql -- ✅ Good: Move arithmetic to other side SELECT * FROM products WHERE price > 100 / 1.1; ``` ### Anti-Pattern 3: String Concatenation **Anti-Pattern:** ```sql -- ❌ Bad: Concatenation prevents index usage SELECT * FROM users WHERE first_name || ' ' || last_name = 'John Doe'; ``` **Solution 1: Separate Conditions** ```sql -- ✅ Better: Use separate conditions SELECT * FROM users WHERE first_name = 'John' AND last_name = 'Doe'; ``` **Solution 2: Computed Column (SQL Server, MySQL)** ```sql -- SQL Server: computed column (the + concatenation is SQL Server syntax) ALTER TABLE users ADD full_name AS (first_name + ' ' + last_name); -- MySQL: generated column equivalent ALTER TABLE users ADD full_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED; CREATE INDEX idx_users_full_name ON users (full_name); ``` ### Anti-Pattern 4: LIKE with Leading Wildcard **Anti-Pattern:** ```sql -- ❌ Bad: Leading wildcard prevents index usage SELECT * FROM users WHERE email LIKE '%@example.com'; ``` **Why It Fails:** - Cannot use B-tree index for prefix search - Must scan entire table **Solution 1: Trailing Wildcard (if applicable)** ```sql -- ✅ Good: Trailing wildcard can use index SELECT * FROM users WHERE email LIKE 'john%'; ``` **Solution 2: Full-Text Index** ```sql -- PostgreSQL: Use trigram index (requires the pg_trgm extension) CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX idx_users_email_trgm ON users USING GIN (email gin_trgm_ops); SELECT * FROM users WHERE email LIKE '%@example.com'; -- MySQL: Full-text index CREATE FULLTEXT INDEX idx_users_email_fulltext ON users (email); SELECT
* FROM users WHERE MATCH(email) AGAINST('example.com'); ``` ### Sargable vs Non-Sargable Examples | Non-Sargable (Bad) | Sargable (Good) | |-------------------|----------------| | `WHERE YEAR(date) = 2025` | `WHERE date >= '2025-01-01' AND date < '2026-01-01'` | | `WHERE LOWER(email) = '[email protected]'` | Use expression index on `LOWER(email)` | | `WHERE price * 1.1 > 100` | `WHERE price > 100 / 1.1` | | `WHERE column + 10 = 50` | `WHERE column = 40` | | `WHERE email LIKE '%@example.com'` | `WHERE email LIKE 'john%'` or use full-text | ## Implicit Type Conversion ### Problem Description Comparing different data types forces type conversion, preventing index usage. **Anti-Pattern:** ```sql -- ❌ Bad: user_id is INT, '123' is VARCHAR SELECT * FROM users WHERE user_id = '123'; ``` **Why It Fails:** - Database converts every row's user_id to string - Index on user_id cannot be used efficiently - Sequential scan likely **EXPLAIN Output:** ``` Seq Scan on users (cost=0.00..1500.00 rows=1) Filter: ((user_id)::text = '123'::text) ``` **Solution:** ```sql -- ✅ Good: Matching types SELECT * FROM users WHERE user_id = 123; ``` **EXPLAIN Output:** ``` Index Scan using idx_users_id on users (cost=0.42..8.44 rows=1) Index Cond: (user_id = 123) ``` ### Common Type Mismatch Scenarios **Scenario 1: String to Number** ```sql -- ❌ Bad SELECT * FROM orders WHERE order_id = '12345'; -- order_id is INT -- ✅ Good SELECT * FROM orders WHERE order_id = 12345; ``` **Scenario 2: Date String Comparison** ```sql -- ❌ Bad: String comparison on DATE column SELECT * FROM events WHERE event_date = '2025-01-01'; -- Implicit conversion -- ✅ Good: Explicit DATE type SELECT * FROM events WHERE event_date = DATE '2025-01-01'; ``` **Scenario 3: UUID Format** ```sql -- PostgreSQL: UUID column -- ❌ Bad SELECT * FROM records WHERE uuid_column = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'; -- ✅ Good SELECT * FROM records WHERE uuid_column = 'a1b2c3d4-e5f6-7890-abcd-ef1234567890'::uuid; ``` ## Correlated 
Subqueries ### Problem Description Subquery executes once per row in outer query, resulting in poor performance. **Anti-Pattern:** ```sql -- ❌ Bad: Correlated subquery SELECT name, (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id) AS order_count FROM users; ``` **Why It Fails:** - Subquery executes once per user - For 10,000 users → 10,000 subquery executions - Even with indexes, overhead is significant **Solution 1: JOIN with GROUP BY** ```sql -- ✅ Good: Single query with JOIN SELECT users.name, COUNT(orders.id) AS order_count FROM users LEFT JOIN orders ON users.id = orders.user_id GROUP BY users.id, users.name; ``` **Solution 2: Lateral Join (PostgreSQL)** ```sql -- ✅ Good: Lateral join for complex subqueries SELECT users.name, recent_orders.order_count FROM users LEFT JOIN LATERAL ( SELECT COUNT(*) AS order_count FROM orders WHERE orders.user_id = users.id AND orders.created_at > NOW() - INTERVAL '30 days' ) recent_orders ON true; ``` ### Performance Comparison **Test Case:** 10,000 users, 100,000 orders ```sql -- Correlated subquery: ~5000ms -- JOIN with GROUP BY: ~100ms ``` **Result:** 50x performance improvement ## Unnecessary DISTINCT ### Problem Description Using DISTINCT when data is already unique adds expensive deduplication step. 
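As a sanity check, a sqlite3 sketch (illustrative table) confirms that DISTINCT on an already-unique column returns exactly the same rows and only adds a deduplication step:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, "active" if i % 2 else "inactive") for i in range(1, 11)])

plain = conn.execute("SELECT id FROM users WHERE status = 'active'").fetchall()
deduped = conn.execute(
    "SELECT DISTINCT id FROM users WHERE status = 'active'").fetchall()
# id is the primary key, so DISTINCT cannot remove anything
print(sorted(plain) == sorted(deduped), len(plain))  # True 5
```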
**Anti-Pattern:** ```sql -- ❌ Bad: DISTINCT on primary key (already unique) SELECT DISTINCT id FROM users WHERE status = 'active'; ``` **Why It Fails:** - DISTINCT requires sorting or hashing - Unnecessary overhead when uniqueness guaranteed - Wasted CPU and memory **Solution:** ```sql -- ✅ Good: Remove DISTINCT (id is unique) SELECT id FROM users WHERE status = 'active'; ``` ### When DISTINCT is Necessary **Necessary Use Case:** ```sql -- ✅ Good: DISTINCT needed for duplicate customer_ids SELECT DISTINCT customer_id FROM orders WHERE status = 'completed'; ``` **Alternative (often better):** ```sql -- ✅ Better: Use GROUP BY (can add aggregations) SELECT customer_id FROM orders WHERE status = 'completed' GROUP BY customer_id; ``` ### DISTINCT in JOINs **Anti-Pattern:** ```sql -- ❌ Bad: DISTINCT to fix JOIN duplication SELECT DISTINCT users.id, users.name FROM users INNER JOIN orders ON users.id = orders.user_id; ``` **Why It's Wrong:** - DISTINCT is band-aid for incorrect JOIN - Expensive deduplication **Solution:** ```sql -- ✅ Good: Use EXISTS or LEFT JOIN properly SELECT id, name FROM users WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id); -- Or if you need aggregation: SELECT users.id, users.name, COUNT(orders.id) AS order_count FROM users INNER JOIN orders ON users.id = orders.user_id GROUP BY users.id, users.name; ``` ## OR vs IN Performance ### Problem Description Multiple OR conditions can be slower than IN clause or UNION ALL. 
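The two forms are semantically identical, which a quick sqlite3 sketch (hypothetical users table) verifies before any performance tuning:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")
statuses = ["active", "pending", "verified", "trial", "banned"]
conn.executemany("INSERT INTO users (status) VALUES (?)",
                 [(statuses[i % 5],) for i in range(20)])

with_or = conn.execute(
    "SELECT id FROM users WHERE status = 'active' OR status = 'pending'"
    " OR status = 'verified' OR status = 'trial'").fetchall()
with_in = conn.execute(
    "SELECT id FROM users WHERE status IN ('active', 'pending', 'verified', 'trial')"
).fetchall()
# Same rows either way; IN is just the simpler predicate for the planner
print(sorted(with_or) == sorted(with_in), len(with_in))  # True 16
```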
**Anti-Pattern:** ```sql -- ❌ Suboptimal: Multiple OR conditions SELECT * FROM users WHERE status = 'active' OR status = 'pending' OR status = 'verified' OR status = 'trial'; ``` **Why It's Suboptimal:** - May not use index efficiently - Query planner may choose sequential scan **Solution 1: Use IN** ```sql -- ✅ Better: IN clause SELECT * FROM users WHERE status IN ('active', 'pending', 'verified', 'trial'); ``` **Solution 2: Use UNION ALL (if separate indexes exist)** ```sql -- ✅ Alternative: UNION ALL with separate index scans -- (safe here: a row has exactly one status, so the branches cannot overlap) SELECT * FROM users WHERE status = 'active' UNION ALL SELECT * FROM users WHERE status = 'pending' UNION ALL SELECT * FROM users WHERE status = 'verified' UNION ALL SELECT * FROM users WHERE status = 'trial'; ``` ### OR on Different Columns **Anti-Pattern:** ```sql -- ❌ Bad: OR on different columns SELECT * FROM users WHERE email = '[email protected]' OR phone = '555-1234'; ``` **Why It Fails:** - Cannot use indexes effectively - Often results in sequential scan **Solution: UNION** ```sql -- ✅ Good: UNION allows index usage on both columns -- (UNION, not UNION ALL: a row matching both conditions would otherwise appear twice) SELECT * FROM users WHERE email = '[email protected]' UNION SELECT * FROM users WHERE phone = '555-1234'; ``` ## NOT IN with NULL Values ### Problem Description NOT IN with NULL values returns unexpected results (empty set). **Anti-Pattern:** ```sql -- ❌ Bad: NOT IN with potential NULLs SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM deleted_users); -- Returns ZERO rows if any user_id is NULL!
``` **Why It Fails:** - SQL three-valued logic (TRUE, FALSE, NULL) - NULL in list makes entire NOT IN return NULL - NULL is not TRUE, so row is excluded **Solution 1: NOT EXISTS** ```sql -- ✅ Good: NOT EXISTS handles NULLs correctly SELECT * FROM users WHERE NOT EXISTS ( SELECT 1 FROM deleted_users WHERE deleted_users.user_id = users.id ); ``` **Solution 2: Filter NULLs in Subquery** ```sql -- ✅ Also good: Explicitly exclude NULLs SELECT * FROM users WHERE id NOT IN ( SELECT user_id FROM deleted_users WHERE user_id IS NOT NULL ); ``` ### Performance Comparison ```sql -- NOT IN: May be slower, fails with NULLs -- NOT EXISTS: Usually faster, handles NULLs correctly ``` ## Wildcard at Start of LIKE ### Problem Description Leading wildcard in LIKE prevents index usage. **Anti-Pattern:** ```sql -- ❌ Bad: Leading wildcard SELECT * FROM products WHERE name LIKE '%widget%'; ``` **Why It Fails:** - B-tree index cannot be used - Must scan entire table - Evaluate LIKE for every row **Solution 1: Trailing Wildcard (if applicable)** ```sql -- ✅ Good: Trailing wildcard can use index SELECT * FROM products WHERE name LIKE 'super%'; ``` **Solution 2: Full-Text Search** ```sql -- PostgreSQL: Full-text search with GIN index CREATE INDEX idx_products_name_fts ON products USING GIN (to_tsvector('english', name)); SELECT * FROM products WHERE to_tsvector('english', name) @@ to_tsquery('english', 'widget'); -- MySQL: Full-text index CREATE FULLTEXT INDEX idx_products_name_fulltext ON products (name); SELECT * FROM products WHERE MATCH(name) AGAINST('widget' IN NATURAL LANGUAGE MODE); ``` **Solution 3: Trigram Index (PostgreSQL)** ```sql -- PostgreSQL: Trigram index for LIKE patterns CREATE EXTENSION IF NOT EXISTS pg_trgm; CREATE INDEX idx_products_name_trgm ON products USING GIN (name gin_trgm_ops); -- Now LIKE '%widget%' can use index SELECT * FROM products WHERE name LIKE '%widget%'; ``` ## Quick Reference: Anti-Patterns Summary | Anti-Pattern | Impact | Solution | 
|-------------|--------|----------| | SELECT * | Over-fetching, wasted I/O | SELECT specific columns | | N+1 queries | Network latency × N | JOIN or IN clause | | Missing FK indexes | Slow joins, cascades | Index all foreign keys | | Functions on columns | No index usage | Sargable conditions or expression index | | Type mismatch | Implicit conversion | Match data types | | Correlated subquery | Subquery per row | JOIN with GROUP BY | | Unnecessary DISTINCT | Expensive dedup | Remove if uniqueness guaranteed | | Multiple ORs | Poor index usage | IN clause or UNION ALL | | NOT IN with NULLs | Unexpected results | NOT EXISTS | | Leading wildcard LIKE | Full table scan | Full-text or trigram index | ## Anti-Pattern Detection Queries ### PostgreSQL: Find SELECT * Queries ```sql -- Enable query logging ALTER DATABASE yourdb SET log_statement = 'all'; -- Analyze logs for SELECT * patterns -- (requires log analysis tool or grep on log files) ``` ### PostgreSQL: Find Missing Foreign Key Indexes ```sql SELECT c.conrelid::regclass AS table_name, a.attname AS column_name, c.conname AS constraint_name FROM pg_constraint c JOIN pg_attribute a ON a.attnum = ANY(c.conkey) AND a.attrelid = c.conrelid WHERE c.contype = 'f' AND NOT EXISTS ( SELECT 1 FROM pg_index i WHERE i.indrelid = c.conrelid AND a.attnum = ANY(i.indkey) ); ``` ### SQL Server: Find Implicit Conversions ```sql -- Check execution plans for warnings -- Look for "CONVERT_IMPLICIT" warnings in graphical plans ``` ``` ### references/efficient-patterns.md ```markdown # Efficient Query Patterns Collection of proven efficient SQL patterns for optimal query performance. ## Table of Contents 1. [Existence Checks](#existence-checks) 2. [Pagination](#pagination) 3. [Aggregation Patterns](#aggregation-patterns) 4. [Union Operations](#union-operations) 5. [Window Functions](#window-functions) 6. [Batch Operations](#batch-operations) ## Existence Checks ### Pattern: EXISTS vs COUNT **Use Case:** Check if related records exist. 
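For reference, the EXISTS form of this check can be exercised end-to-end with sqlite3 (sample data; the semi-join returns each user at most once):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "ada"), (2, "bob"), (3, "cam")])
conn.executemany("INSERT INTO orders (user_id) VALUES (?)", [(1,), (1,), (3,)])

# Semi-join: each user appears at most once, no matter how many orders match
rows = conn.execute("""
    SELECT name FROM users u
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id)
    ORDER BY name
""").fetchall()
print(rows)  # [('ada',), ('cam',)]
```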
**Inefficient:** ```sql -- ❌ Bad: Counts all matching rows SELECT * FROM users WHERE (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id) > 0; ``` **Efficient:** ```sql -- ✅ Good: Stops at first match SELECT * FROM users WHERE EXISTS (SELECT 1 FROM orders WHERE orders.user_id = users.id); ``` **Why EXISTS is Better:** - Stops execution at first matching row - No need to count all rows - Better performance for large result sets **Performance Comparison:** ``` COUNT(*) with 1000 matches: ~100ms EXISTS with 1000 matches: ~5ms (stops at first match) ``` ### Pattern: EXISTS vs IN **Use Case:** Filter by values in subquery. **Less Efficient:** ```sql -- ❌ Suboptimal: Builds full list SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE total > 1000); ``` **More Efficient:** ```sql -- ✅ Better: Can stop early with semi-join SELECT * FROM users WHERE EXISTS ( SELECT 1 FROM orders WHERE orders.user_id = users.id AND orders.total > 1000 ); ``` **When to Use Each:** - **EXISTS**: When checking existence or correlated conditions - **IN**: When list is small and static (e.g., `IN (1, 2, 3)`) ## Pagination ### Pattern: Efficient LIMIT/OFFSET **Inefficient for Deep Pagination:** ```sql -- ❌ Bad: Offset skips rows but database still processes them SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000; -- Skip 10,000 rows ``` **Why It's Slow:** - Database processes all 10,020 rows - Sorts 10,020 rows - Then discards first 10,000 - Performance degrades linearly with offset **Efficient Keyset Pagination:** ```sql -- ✅ Good: Use last seen value as cursor SELECT * FROM posts WHERE created_at < '2025-01-15 10:30:00' -- Last seen timestamp ORDER BY created_at DESC LIMIT 20; ``` **Why It's Fast:** - Index scan starts at cursor position - No offset processing - Constant performance regardless of page depth **Implementation Pattern:** ```sql -- Page 1 SELECT id, title, created_at FROM posts ORDER BY created_at DESC, id DESC LIMIT 20; -- Page 2 
(last_created_at = '2025-01-15 10:30:00', last_id = 12345) SELECT id, title, created_at FROM posts WHERE (created_at < '2025-01-15 10:30:00') OR (created_at = '2025-01-15 10:30:00' AND id < 12345) ORDER BY created_at DESC, id DESC LIMIT 20; ``` **Required Index:** ```sql CREATE INDEX idx_posts_created_id ON posts (created_at DESC, id DESC); ``` ### Pattern: Cursor-Based Pagination **API Response Format:** ```json { "data": [...], "cursor": { "next": "eyJjcmVhdGVkX2F0IjoiMjAyNS0wMS0xNSIsImlkIjoxMjM0NX0=", "has_more": true } } ``` **Cursor Encoding:** ```python import base64 import json # Encode cursor cursor_data = {"created_at": "2025-01-15 10:30:00", "id": 12345} cursor = base64.b64encode(json.dumps(cursor_data).encode()).decode() # Decode cursor cursor_data = json.loads(base64.b64decode(cursor).decode()) ``` ## Aggregation Patterns ### Pattern: Conditional Aggregation **Use Case:** Multiple aggregations with different conditions. **Inefficient (Multiple Subqueries):** ```sql -- ❌ Bad: Multiple scans of orders table SELECT c.id, c.name, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS total_orders, (SELECT COUNT(*) FROM orders WHERE customer_id = c.id AND status = 'completed') AS completed_orders, (SELECT SUM(total) FROM orders WHERE customer_id = c.id AND status = 'completed') AS revenue FROM customers c; ``` **Efficient (Single Query with CASE):** ```sql -- ✅ Good: Single scan with conditional aggregation SELECT c.id, c.name, COUNT(o.id) AS total_orders, COUNT(CASE WHEN o.status = 'completed' THEN 1 END) AS completed_orders, SUM(CASE WHEN o.status = 'completed' THEN o.total ELSE 0 END) AS revenue FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name; ``` **Performance Comparison:** ``` Multiple subqueries: 3 table scans, ~300ms Conditional aggregation: 1 table scan, ~100ms ``` ### Pattern: Filtered Aggregation (PostgreSQL) **PostgreSQL FILTER Clause:** ```sql -- ✅ PostgreSQL-specific: FILTER clause (more readable) SELECT 
c.id, c.name, COUNT(o.id) AS total_orders, COUNT(o.id) FILTER (WHERE o.status = 'completed') AS completed_orders, SUM(o.total) FILTER (WHERE o.status = 'completed') AS revenue FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name; ``` ## Union Operations ### Pattern: UNION ALL vs UNION **Inefficient:** ```sql -- ❌ Bad: UNION removes duplicates (expensive) SELECT id, name FROM active_users UNION SELECT id, name FROM trial_users; ``` **Why UNION is Expensive:** - Sorts both result sets - Performs deduplication - Additional memory and CPU overhead **Efficient:** ```sql -- ✅ Good: UNION ALL (no deduplication) SELECT id, name FROM active_users UNION ALL SELECT id, name FROM trial_users; ``` **When to Use Each:** - **UNION ALL**: When duplicates acceptable or datasets don't overlap (99% of cases) - **UNION**: Only when duplicates must be removed and datasets may overlap **Performance Comparison:** ``` UNION with 100k rows: ~500ms (sorting + dedup) UNION ALL with 100k rows: ~50ms (no overhead) ``` ### Pattern: Efficient Set Operations **Use Case:** Combine results from partitioned tables. ```sql -- ✅ Good: UNION ALL for partitioned data SELECT * FROM orders_2024 WHERE status = 'pending' UNION ALL SELECT * FROM orders_2025 WHERE status = 'pending'; ``` ## Window Functions ### Pattern: Row Numbering for Deduplication **Use Case:** Get first/last record per group. 
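The ROW_NUMBER pattern for this use case runs unchanged on SQLite 3.25+ (bundled with recent Python), so the "newest row per group" result can be verified locally on sample data:

```python
import sqlite3  # window functions require SQLite 3.25+

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products"
             " (id INTEGER PRIMARY KEY, category_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, 1, "2025-01-01"), (2, 1, "2025-02-01"),
    (3, 2, "2025-01-15"), (4, 2, "2025-03-01"),
])

# Newest product per category: rank rows inside each partition, keep rank 1
rows = conn.execute("""
    WITH ranked AS (
        SELECT id, category_id,
               ROW_NUMBER() OVER (PARTITION BY category_id
                                  ORDER BY created_at DESC) AS rn
        FROM products
    )
    SELECT category_id, id FROM ranked WHERE rn = 1 ORDER BY category_id
""").fetchall()
print(rows)  # [(1, 2), (2, 4)]
```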
**Inefficient (Correlated Subquery):** ```sql -- ❌ Bad: Correlated subquery SELECT * FROM products p WHERE p.created_at = ( SELECT MAX(created_at) FROM products WHERE category_id = p.category_id ); ``` **Efficient (Window Function):** ```sql -- ✅ Good: Window function with CTE WITH ranked_products AS ( SELECT *, ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY created_at DESC) AS rn FROM products ) SELECT * FROM ranked_products WHERE rn = 1; ``` **Performance Comparison:** ``` Correlated subquery: ~2000ms (N subqueries) Window function: ~200ms (single scan) ``` ### Pattern: Running Totals **Use Case:** Calculate cumulative sum. **Inefficient (Correlated Subquery):** ```sql -- ❌ Bad: Subquery per row SELECT order_date, total, (SELECT SUM(total) FROM orders o2 WHERE o2.order_date <= orders.order_date) AS running_total FROM orders; ``` **Efficient (Window Function):** ```sql -- ✅ Good: Window function SELECT order_date, total, SUM(total) OVER (ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total FROM orders; ``` ### Pattern: Rank Within Group **Use Case:** Find top N items per category. ```sql -- ✅ Efficient: Top 3 products per category by sales WITH ranked_products AS ( SELECT category_id, product_name, sales, RANK() OVER (PARTITION BY category_id ORDER BY sales DESC) AS rank FROM products ) SELECT * FROM ranked_products WHERE rank <= 3; ``` ## Batch Operations ### Pattern: Bulk INSERT **Inefficient:** ```sql -- ❌ Bad: Individual INSERTs (1000 round trips) INSERT INTO users (name, email) VALUES ('User1', '[email protected]'); INSERT INTO users (name, email) VALUES ('User2', '[email protected]'); -- ... 998 more ``` **Efficient:** ```sql -- ✅ Good: Bulk INSERT (1 round trip) INSERT INTO users (name, email) VALUES ('User1', '[email protected]'), ('User2', '[email protected]'), ('User3', '[email protected]'), -- ... 
up to 1000 rows ('User1000', '[email protected]'); ``` **Performance Comparison:** ``` Individual INSERTs: ~5000ms Bulk INSERT: ~50ms ``` **Batch Size Recommendations:** - **PostgreSQL**: 1000-5000 rows per INSERT - **MySQL**: 1000 rows per INSERT (max_allowed_packet limit) - **SQL Server**: 1000 rows per INSERT ### Pattern: Bulk UPDATE with CASE **Use Case:** Update multiple rows with different values. **Inefficient:** ```sql -- ❌ Bad: Multiple UPDATE statements UPDATE products SET price = 19.99 WHERE id = 1; UPDATE products SET price = 29.99 WHERE id = 2; UPDATE products SET price = 39.99 WHERE id = 3; ``` **Efficient:** ```sql -- ✅ Good: Single UPDATE with CASE UPDATE products SET price = CASE id WHEN 1 THEN 19.99 WHEN 2 THEN 29.99 WHEN 3 THEN 39.99 END WHERE id IN (1, 2, 3); ``` ### Pattern: Upsert (INSERT or UPDATE) **PostgreSQL (ON CONFLICT):** ```sql -- ✅ Efficient upsert INSERT INTO user_stats (user_id, login_count, last_login) VALUES (123, 1, NOW()) ON CONFLICT (user_id) DO UPDATE SET login_count = user_stats.login_count + 1, last_login = NOW(); ``` **MySQL (ON DUPLICATE KEY UPDATE):** ```sql -- ✅ Efficient upsert INSERT INTO user_stats (user_id, login_count, last_login) VALUES (123, 1, NOW()) ON DUPLICATE KEY UPDATE login_count = login_count + 1, last_login = NOW(); ``` **SQL Server (MERGE):** ```sql -- ✅ Efficient upsert MERGE INTO user_stats AS target USING (SELECT 123 AS user_id, 1 AS login_count, GETDATE() AS last_login) AS source ON target.user_id = source.user_id WHEN MATCHED THEN UPDATE SET login_count = target.login_count + 1, last_login = GETDATE() WHEN NOT MATCHED THEN INSERT (user_id, login_count, last_login) VALUES (source.user_id, source.login_count, source.last_login); ``` ## Common Table Expressions (CTEs) ### Pattern: Readable Complex Queries **Inefficient (Nested Subqueries):** ```sql -- ❌ Hard to read and maintain SELECT * FROM ( SELECT * FROM ( SELECT user_id, SUM(total) as revenue FROM orders WHERE status = 'completed' GROUP BY user_id 
) AS user_revenue WHERE revenue > 1000 ) AS high_value_users INNER JOIN users ON users.id = high_value_users.user_id; ``` **Efficient (CTEs):** ```sql -- ✅ Readable and maintainable WITH user_revenue AS ( SELECT user_id, SUM(total) as revenue FROM orders WHERE status = 'completed' GROUP BY user_id ), high_value_users AS ( SELECT * FROM user_revenue WHERE revenue > 1000 ) SELECT users.*, high_value_users.revenue FROM high_value_users INNER JOIN users ON users.id = high_value_users.user_id; ``` **Benefits:** - Better readability - Easier debugging (can SELECT from CTEs individually) - Query optimizer can optimize entire query ### Pattern: Recursive CTEs **Use Case:** Hierarchical data (org charts, nested categories). ```sql -- ✅ Recursive CTE for org chart WITH RECURSIVE org_chart AS ( -- Base case: top-level managers SELECT id, name, manager_id, 1 AS level FROM employees WHERE manager_id IS NULL UNION ALL -- Recursive case: employees reporting to previous level SELECT e.id, e.name, e.manager_id, oc.level + 1 FROM employees e INNER JOIN org_chart oc ON e.manager_id = oc.id ) SELECT * FROM org_chart ORDER BY level, name; ``` ## Index-Friendly Patterns ### Pattern: Prefix Matching **Efficient:** ```sql -- ✅ Can use B-tree index SELECT * FROM users WHERE email LIKE 'john%'; ``` **Index:** ```sql CREATE INDEX idx_users_email ON users (email); ``` ### Pattern: Range Queries **Efficient:** ```sql -- ✅ Can use B-tree index SELECT * FROM orders WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01'; ``` **Index:** ```sql CREATE INDEX idx_orders_created ON orders (created_at); ``` ### Pattern: Composite Filters **Efficient:** ```sql -- ✅ Uses composite index efficiently SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending' ORDER BY created_at DESC LIMIT 10; ``` **Optimal Index:** ```sql CREATE INDEX idx_orders_customer_status_created ON orders (customer_id, status, created_at DESC); ``` ## Quick Reference ### Existence Checks - Use `EXISTS` instead of 
`COUNT(*) > 0` - Use `NOT EXISTS` instead of `NOT IN` (handles NULLs) ### Pagination - Use keyset/cursor pagination instead of OFFSET for deep pagination - Index columns in ORDER BY and WHERE clauses ### Aggregation - Use conditional aggregation (CASE in aggregate) instead of multiple subqueries - PostgreSQL: Use FILTER clause for readability ### Unions - Use `UNION ALL` by default (no deduplication overhead) - Only use `UNION` when duplicates must be removed ### Window Functions - Use window functions instead of correlated subqueries for ranking/running totals - More efficient and more readable ### Batch Operations - Bulk INSERT/UPDATE instead of row-by-row operations - Use upsert operations (ON CONFLICT, ON DUPLICATE KEY, MERGE) ### CTEs - Use CTEs for complex queries (better readability) - PostgreSQL 12+: CTEs are inline-optimized by default ``` ### references/postgresql.md ```markdown # PostgreSQL-Specific Optimizations PostgreSQL-specific features, index types, and optimization techniques. ## PostgreSQL Index Types ### B-tree (Default) **Use Case:** General-purpose index for most queries. **Supported Operators:** <, ≤, =, ≥, >, BETWEEN, IN, IS NULL, IS NOT NULL **Creation:** ```sql CREATE INDEX idx_users_email ON users (email); -- Or explicitly: CREATE INDEX idx_users_email ON users USING BTREE (email); ``` ### Hash **Use Case:** Equality comparisons only (=). **Benefits:** Faster than B-tree for equality, smaller index size. **Limitations:** Only supports =, no range queries. **Creation:** ```sql CREATE INDEX idx_users_email_hash ON users USING HASH (email); ``` **When to Use:** High-frequency equality lookups only. ### GIN (Generalized Inverted Index) **Use Case:** Full-text search, JSONB, arrays, composite types. 
**Supported Types:** - JSONB documents - Arrays - Full-text search (tsvector) - hstore **Full-Text Search:** ```sql -- Create GIN index on tsvector CREATE INDEX idx_articles_content_fts ON articles USING GIN (to_tsvector('english', content)); -- Query SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('english', 'postgres & optimization'); ``` **JSONB:** ```sql -- Create GIN index on JSONB column CREATE INDEX idx_users_metadata ON users USING GIN (metadata); -- Query with containment SELECT * FROM users WHERE metadata @> '{"premium": true}'; -- Query with existence SELECT * FROM users WHERE metadata ? 'premium'; ``` **Arrays:** ```sql -- Create GIN index on array column CREATE INDEX idx_products_tags ON products USING GIN (tags); -- Query with array overlap SELECT * FROM products WHERE tags && ARRAY['electronics', 'sale']; -- Query with array contains SELECT * FROM products WHERE tags @> ARRAY['electronics']; ``` ### GiST (Generalized Search Tree) **Use Case:** Spatial data, ranges, full-text search, geometric types. **Geometric Types:** ```sql -- Create GiST index on geometry column CREATE INDEX idx_locations_point ON locations USING GIST (point); -- Nearest-neighbor query: the distance operator in ORDER BY uses the GiST index -- (a bare "point <-> ... < 10" filter in WHERE would not) SELECT * FROM locations ORDER BY point <-> point(40.7128, -74.0060) LIMIT 10; ``` **Range Types:** ```sql -- Create GiST index on date range CREATE INDEX idx_bookings_dates ON bookings USING GIST (date_range); -- Query with range overlap SELECT * FROM bookings WHERE date_range && '[2025-01-01,2025-01-31)'::daterange; ``` ### BRIN (Block Range Index) **Use Case:** Very large tables (100GB+) with naturally ordered data.
**Benefits:**
- Tiny index size (can be ~1000x smaller than B-tree)
- Very fast to build
- Minimal maintenance overhead

**Limitations:**
- Only efficient for ordered data (timestamps, sequential IDs)
- Less precise than B-tree (may scan extra blocks)

**Creation:**
```sql
-- BRIN index on timestamp column
CREATE INDEX idx_logs_created_brin ON logs USING BRIN (created_at);
```

**When to Use:**
- Tables >100GB
- Naturally ordered data (append-only logs, time-series)
- Insert-heavy workloads

**Performance:**
```
B-tree index on 1TB table: ~50GB index size
BRIN index on 1TB table:   ~50MB index size
```

## PostgreSQL-Specific Features

### Partial Indexes

**Use Case:** Index only the subset of rows matching a condition.

**Benefits:**
- Smaller index size
- Faster index scans
- Lower maintenance overhead

**Example 1: Index Active Users Only**
```sql
CREATE INDEX idx_users_active ON users (last_login)
WHERE status = 'active';
```

**Example 2: Index Recent Orders**
```sql
-- The predicate must be immutable, so NOW() is not allowed here;
-- use a fixed cutoff and recreate the index periodically to slide the window
CREATE INDEX idx_orders_recent ON orders (created_at)
WHERE created_at > '2025-01-01';
```

**When to Use:**
- Queries frequently filter on the same condition
- A minority of rows match the condition (<20%)

### Expression Indexes

**Use Case:** Index computed values or function results.

**Example 1: Case-Insensitive Search**
```sql
CREATE INDEX idx_users_email_lower ON users (LOWER(email));

-- Query must use the same expression
SELECT * FROM users WHERE LOWER(email) = '[email protected]';
```

**Example 2: Date Truncation**
```sql
-- The cast is immutable for timestamp without time zone
CREATE INDEX idx_orders_created_date ON orders ((created_at::date));

-- Query for a specific date with the same expression
SELECT * FROM orders WHERE created_at::date = '2025-01-15';
```

**Example 3: JSONB Path**
```sql
CREATE INDEX idx_users_premium ON users ((metadata->>'premium'));

SELECT * FROM users WHERE metadata->>'premium' = 'true';
```

### Covering Indexes (INCLUDE Clause)

**Use Case:** Include non-indexed columns in the index for Index-Only Scans.
**Syntax:**
```sql
CREATE INDEX idx_users_email_covering ON users (email)
INCLUDE (id, name, created_at);
```

**Query Benefits:**
```sql
-- This query uses an Index-Only Scan (no heap access)
SELECT id, name, created_at
FROM users
WHERE email = '[email protected]';
```

**Performance:**
```
Without INCLUDE: Index Scan + Heap Fetch = 10ms
With INCLUDE:    Index-Only Scan        =  1ms
```

### Concurrent Index Creation

**Use Case:** Create indexes on production tables without blocking writes.

**Standard Index Creation (Locks Table):**
```sql
CREATE INDEX idx_users_email ON users (email);
-- Blocks all writes until complete
```

**Concurrent Index Creation (No Lock):**
```sql
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
-- Allows concurrent writes, takes longer
```

**Trade-offs:**
- Slower to build (multiple table scans)
- No table lock (production-safe)
- Can fail mid-build (index left behind, marked INVALID)

**Cleanup Failed Index:**
```sql
-- List invalid indexes (a failed CREATE INDEX CONCURRENTLY keeps its given name)
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the invalid index, then retry
DROP INDEX CONCURRENTLY idx_users_email;
```

## PostgreSQL Execution Plan Analysis

### EXPLAIN ANALYZE with Buffers

**Basic EXPLAIN ANALYZE:**
```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 123;
```

**With Buffer Statistics:**
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 123;
```

**Output:**
```
Index Scan using idx_orders_customer on orders
  (cost=0.42..12.44 rows=10 width=200)
  (actual time=0.025..0.035 rows=10 loops=1)
  Index Cond: (customer_id = 123)
  Buffers: shared hit=5
Planning Time: 0.150 ms
Execution Time: 0.050 ms
```

**Buffer Analysis:**
- **shared hit**: pages read from cache (good)
- **shared read**: pages read from disk (slow)
- **shared dirtied**: pages modified
- **shared written**: pages written to disk

**Goal:** Maximize "shared hit", minimize "shared read".
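As a complement to per-query buffer counts, the cumulative counters in `pg_statio_user_tables` give a table-level view of cache behavior. A sketch of a hit-ratio check (the `LIMIT` and ordering are illustrative):

```sql
-- Tables doing the most disk reads, with their cache hit ratio
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       ROUND(heap_blks_hit::numeric
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
```

A consistently low hit ratio on a hot table suggests shared_buffers is too small or the working set does not fit in cache.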
### EXPLAIN Options

```sql
-- Verbose output (column list, table aliases)
EXPLAIN (VERBOSE) SELECT ...;

-- Show estimated costs without executing
EXPLAIN SELECT ...;

-- Execute and show actual timing
EXPLAIN ANALYZE SELECT ...;

-- Show buffer usage
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

-- Per-node timing (on by default with ANALYZE; TIMING OFF reduces overhead)
EXPLAIN (ANALYZE, TIMING) SELECT ...;

-- Machine-readable JSON format
EXPLAIN (ANALYZE, FORMAT JSON) SELECT ...;
```

## PostgreSQL Query Optimization

### Parallel Query Execution

**Enable Parallel Queries:**
```sql
-- Check parallel workers setting
SHOW max_parallel_workers_per_gather;

-- Set parallel workers for session
SET max_parallel_workers_per_gather = 4;
```

**Parallel Seq Scan:**
```sql
EXPLAIN SELECT COUNT(*) FROM large_table;
```
```
Finalize Aggregate (cost=xxx rows=1)
  -> Gather (cost=xxx rows=4)
     Workers Planned: 4
     -> Partial Aggregate
        -> Parallel Seq Scan on large_table
```

**Force Parallel Execution:**
```sql
-- Lower the cost thresholds so parallel plans are chosen
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
```

### Common Table Expressions (CTEs)

**Inlined CTEs (Default in PostgreSQL 12+):**
```sql
-- CTE is inlined into the main query (not materialized)
WITH recent_orders AS (
  SELECT * FROM orders
  WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE total > 100;
```

**Force Materialization:**
```sql
-- Explicitly materialize the CTE
WITH recent_orders AS MATERIALIZED (
  SELECT * FROM orders
  WHERE created_at > NOW() - INTERVAL '7 days'
)
SELECT * FROM recent_orders WHERE total > 100;
```

**When to Materialize:**
- CTE is referenced multiple times in the main query
- CTE is expensive and produces a small result set

### LATERAL Joins

**Use Case:** Correlated subqueries with better performance.
**Without LATERAL (Correlated Subquery):**
```sql
SELECT u.name,
       (SELECT COUNT(*) FROM orders WHERE orders.user_id = u.id) AS order_count
FROM users u;
```

**With LATERAL:**
```sql
SELECT u.name, o.order_count
FROM users u
LEFT JOIN LATERAL (
  SELECT COUNT(*) AS order_count
  FROM orders
  WHERE orders.user_id = u.id
) o ON true;
```

**Advanced LATERAL (Top N per Group):**
```sql
-- Get the 3 most recent orders per customer
SELECT c.name, o.order_date, o.total
FROM customers c
LEFT JOIN LATERAL (
  SELECT order_date, total
  FROM orders
  WHERE customer_id = c.id
  ORDER BY order_date DESC
  LIMIT 3
) o ON true;
```

## PostgreSQL Configuration Tuning

### Memory Settings

**shared_buffers:**
```sql
-- 25% of system RAM (typical recommendation); requires a server restart
ALTER SYSTEM SET shared_buffers = '4GB';
```

**work_mem:**
```sql
-- Memory per sort/hash operation
-- Set per session for complex queries
SET work_mem = '256MB';
```

**effective_cache_size:**
```sql
-- Planner's estimate of the OS file cache
-- 50-75% of system RAM
ALTER SYSTEM SET effective_cache_size = '12GB';
```

### Query Planning

**random_page_cost:**
```sql
-- Lower for SSDs (default 4.0)
ALTER SYSTEM SET random_page_cost = 1.1;
```

**effective_io_concurrency:**
```sql
-- Number of concurrent I/O operations
-- Higher for SSDs (default 1)
ALTER SYSTEM SET effective_io_concurrency = 200;
```

### Statistics

**default_statistics_target:**
```sql
-- Increase for better query planning (default 100)
ALTER SYSTEM SET default_statistics_target = 500;

-- Per-column statistics
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
```

## PostgreSQL Monitoring

### Find Slow Queries

**Enable slow query logging:**
```sql
-- Log queries slower than 100ms
ALTER SYSTEM SET log_min_duration_statement = 100;
SELECT pg_reload_conf();
```

### Index Usage Statistics

**Find unused indexes:**
```sql
SELECT schemaname, tablename, indexname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname
NOT LIKE 'pg_toast_%'
ORDER BY pg_relation_size(indexrelid) DESC;
```

**Find most used indexes:**
```sql
SELECT schemaname, tablename, indexname,
       idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 20;
```

### Table Statistics

**Update statistics:**
```sql
-- Analyze entire database
ANALYZE;

-- Analyze specific table
ANALYZE orders;

-- Verbose output
ANALYZE VERBOSE orders;
```

**View table statistics:**
```sql
SELECT schemaname, tablename,
       n_live_tup AS live_rows,
       n_dead_tup AS dead_rows,
       last_vacuum, last_autovacuum,
       last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```

## Quick Reference

### Index Type Selection

| Use Case | Index Type | Creation |
|----------|-----------|----------|
| General queries | B-tree | `CREATE INDEX ON table (column)` |
| Equality only | Hash | `CREATE INDEX ON table USING HASH (column)` |
| Full-text search | GIN | `CREATE INDEX ON table USING GIN (to_tsvector('english', column))` |
| JSONB | GIN | `CREATE INDEX ON table USING GIN (jsonb_column)` |
| Arrays | GIN | `CREATE INDEX ON table USING GIN (array_column)` |
| Spatial data | GiST | `CREATE INDEX ON table USING GIST (geometry_column)` |
| Large ordered tables | BRIN | `CREATE INDEX ON table USING BRIN (timestamp_column)` |

### PostgreSQL-Specific Optimizations

- Use **partial indexes** for frequently-filtered subsets
- Use **expression indexes** for computed values
- Use the **INCLUDE** clause for covering indexes
- Use **LATERAL** for efficient correlated subqueries
- Use **BRIN** for very large time-series tables
- Create indexes **CONCURRENTLY** on production tables
```

### references/mysql.md

```markdown
# MySQL-Specific Optimizations

MySQL-specific features, storage engines, and optimization techniques.
## MySQL Storage Engines

### InnoDB (Default)

**Characteristics:**
- ACID-compliant transactions
- Row-level locking
- Crash recovery
- Foreign key support
- Clustered primary key

**Use Case:** Default choice for most applications.

**Creation:**
```sql
CREATE TABLE orders (
  id INT PRIMARY KEY,
  customer_id INT,
  total DECIMAL(10,2)
) ENGINE=InnoDB;
```

**Clustered Index:**
- Table data is stored in primary key order
- Fast primary key lookups
- Choose a small primary key (INT vs BIGINT vs UUID)

**Recommendation:** Use an AUTO_INCREMENT INT or BIGINT primary key.

### MyISAM

**Characteristics:**
- No transactions
- Table-level locking
- Faster reads (no MVCC overhead)
- No foreign keys
- No crash recovery

**Use Case:** Read-heavy tables with no writes (archives, logs).

**Creation:**
```sql
CREATE TABLE archive_logs (
  id INT PRIMARY KEY,
  message TEXT
) ENGINE=MyISAM;
```

**Warning:** Legacy engine; avoid for new applications.

## MySQL Index Types

### B-tree Index (Default)

**Use Case:** General-purpose index.

**Creation:**
```sql
CREATE INDEX idx_users_email ON users (email);
```

**Prefix Indexes for Long Strings:**
```sql
-- Index the first 10 characters
CREATE INDEX idx_articles_title ON articles (title(10));
```

**Composite Indexes:**
```sql
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
```

### Full-Text Index

**Use Case:** Text search on VARCHAR/TEXT columns.
**Creation:**
```sql
-- Add full-text index
CREATE FULLTEXT INDEX idx_articles_content ON articles (content);

-- Or in the table definition
CREATE TABLE articles (
  id INT PRIMARY KEY,
  title VARCHAR(255),
  content TEXT,
  FULLTEXT (title, content)
) ENGINE=InnoDB;
```

**Query Syntax:**
```sql
-- Natural language search
SELECT * FROM articles
WHERE MATCH(content) AGAINST('mysql optimization');

-- Boolean mode (AND/OR/NOT)
SELECT * FROM articles
WHERE MATCH(content) AGAINST('+mysql +optimization -postgres' IN BOOLEAN MODE);

-- Query expansion (finds related terms)
SELECT * FROM articles
WHERE MATCH(content) AGAINST('database' WITH QUERY EXPANSION);
```

### Spatial Index

**Use Case:** Geometric data (points, polygons).

**Creation:**
```sql
CREATE TABLE locations (
  id INT PRIMARY KEY,
  name VARCHAR(255),
  coordinates POINT NOT NULL,
  SPATIAL INDEX(coordinates)
) ENGINE=InnoDB;
```

**Query:**
```sql
-- Find locations within a bounding box
SELECT name FROM locations
WHERE MBRContains(
  ST_GeomFromText('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'),
  coordinates
);
```

## MySQL EXPLAIN Analysis

### EXPLAIN Output Format

**Basic EXPLAIN:**
```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```

**Output:**
```
+----+-------------+--------+------+-----------------+------+---------+-------+------+-------------+
| id | select_type | table  | type | possible_keys   | key  | key_len | ref   | rows | Extra       |
+----+-------------+--------+------+-----------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | orders | ref  | idx_orders_cust | idx  | 4       | const |  150 | Using where |
+----+-------------+--------+------+-----------------+------+---------+-------+------+-------------+
```

### EXPLAIN FORMAT=JSON

**Detailed JSON Output:**
```sql
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE customer_id = 123;
```

**Benefits:**
- More detailed information
- Nested structure for complex queries
- Cost estimates
- Filtering statistics

### Access Types (type column)

**Performance Ranking
(Best to Worst):**

1. **system** - Single-row table
2. **const** - Primary key/unique lookup with a constant
3. **eq_ref** - One row per previous-table row (unique index join)
4. **ref** - Non-unique index lookup
5. **range** - Index range scan (BETWEEN, >, <, IN)
6. **index** - Full index scan
7. **ALL** - Full table scan

**Target:** const, eq_ref, ref, or range.

### Extra Column Meanings

**Good:**
- **Using index** - Index-only scan (covering index)
- **Using index condition** - Index condition pushdown (ICP)

**Acceptable:**
- **Using where** - WHERE clause filtering after retrieval

**Warning:**
- **Using temporary** - Temporary table created
- **Using filesort** - Sorting required (not index-based)

**Bad:**
- **Using join buffer** - Join without an index (add one!)

## MySQL Index Hints

### USE INDEX

**Suggest Index:**
```sql
SELECT * FROM orders USE INDEX (idx_orders_customer)
WHERE customer_id = 123 AND created_at > '2025-01-01';
```

**When to Use:** The optimizer chooses the wrong index.

### FORCE INDEX

**Force Index Usage:**
```sql
SELECT * FROM orders FORCE INDEX (idx_orders_customer)
WHERE customer_id = 123;
```

**When to Use:** Must use a specific index (rare).

### IGNORE INDEX

**Prevent Index Usage:**
```sql
SELECT * FROM orders IGNORE INDEX (idx_orders_status)
WHERE customer_id = 123 AND status = 'pending';
```

**When to Use:** Force a full table scan or a different index.

### Optimizer Hints (MySQL 8.0+)

**JOIN Order Hints:**
```sql
SELECT /*+ JOIN_ORDER(orders, customers) */
  orders.*, customers.name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.id;
```

**Index Hints:**
```sql
SELECT /*+ INDEX(orders idx_orders_customer) */ *
FROM orders WHERE customer_id = 123;
```

**Subquery Hints:**
```sql
-- Place the hint inside the subquery block it applies to
SELECT * FROM users
WHERE id IN (SELECT /*+ SUBQUERY(MATERIALIZATION) */ user_id
             FROM orders WHERE total > 1000);
```

## MySQL-Specific Optimizations

### Generated Columns (MySQL 5.7+)

**Use Case:** Index computed values.
**Virtual Generated Column:**
```sql
ALTER TABLE users
ADD COLUMN email_lower VARCHAR(255)
GENERATED ALWAYS AS (LOWER(email)) VIRTUAL;

-- Index the generated column
CREATE INDEX idx_users_email_lower ON users (email_lower);
```

**Stored Generated Column:**
```sql
ALTER TABLE users
ADD COLUMN full_name VARCHAR(510)
GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED;

CREATE INDEX idx_users_full_name ON users (full_name);
```

**Difference:**
- **VIRTUAL**: computed on read (no storage overhead)
- **STORED**: computed on write (faster reads, storage overhead)

### JSON Indexes

**Create Generated Column for JSON Path:**
```sql
-- Extract JSON field
ALTER TABLE users
ADD COLUMN premium_status VARCHAR(10)
GENERATED ALWAYS AS (metadata->>'$.premium') STORED;

-- Index the generated column
CREATE INDEX idx_users_premium ON users (premium_status);
```

**Query:**
```sql
SELECT * FROM users WHERE premium_status = 'true';
```

### Index Condition Pushdown (ICP)

**MySQL 5.6+ Feature:** Push WHERE conditions down to the storage engine.

**Without ICP:**
- The storage engine returns all rows matching the first index column
- The MySQL server filters the remaining conditions

**With ICP:**
- The storage engine filters on all index columns
- Fewer rows are returned to the MySQL server

**Check if Enabled:**
```sql
SHOW VARIABLES LIKE 'optimizer_switch';
-- Look for index_condition_pushdown=on
```

**EXPLAIN Indicator:**
```
Extra: Using index condition
```

### Multi-Range Read (MRR)

**MySQL 5.6+ Feature:** Optimize range scans by sorting row IDs before fetching.

**Benefits:**
- Sequential I/O instead of random I/O
- Fewer page fetches

**Enable:**
```sql
SET optimizer_switch='mrr=on,mrr_cost_based=off';
```

## MySQL Configuration Tuning

### Buffer Pool Size (InnoDB)

**Recommendation:** 70-80% of system RAM on a dedicated database server.
```sql
-- Check current setting
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Set in my.cnf / my.ini
[mysqld]
innodb_buffer_pool_size = 8G
```

### Query Cache (Deprecated)

**Warning:** The query cache was removed in MySQL 8.0.

**MySQL 5.7 and Earlier:**
```sql
-- Check query cache status
SHOW VARIABLES LIKE 'query_cache%';

-- Disable query cache (recommended for modern apps)
query_cache_type = 0
query_cache_size = 0
```

**Replacement:** Application-level caching (Redis, Memcached).

### Join Buffer Size

**Used for:** Joins without indexes.

```sql
-- Check current setting
SHOW VARIABLES LIKE 'join_buffer_size';

-- Set per session
SET SESSION join_buffer_size = 8388608; -- 8MB
```

### Sort Buffer Size

**Used for:** ORDER BY, GROUP BY operations.

```sql
-- Check current setting
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Set per session
SET SESSION sort_buffer_size = 2097152; -- 2MB
```

## MySQL Monitoring

### Slow Query Log

**Enable:**
```sql
-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.5; -- Log queries > 500ms
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';
```

**Analyze Slow Queries:**
```bash
# mysqldumpslow - Parse the slow query log
mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log
# -s t: Sort by time
# -t 10: Top 10 queries
```

### Performance Schema

**Enable:**
```sql
-- Check if enabled
SHOW VARIABLES LIKE 'performance_schema';

-- Enable in my.cnf
[mysqld]
performance_schema = ON
```

**Query Statistics:**
```sql
-- Top 10 slowest queries
SELECT
  DIGEST_TEXT,
  COUNT_STAR,
  AVG_TIMER_WAIT / 1000000000 AS avg_ms,
  SUM_TIMER_WAIT / 1000000000 AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```

**Index Usage:**
```sql
-- Tables without a primary key
SELECT TABLE_SCHEMA, TABLE_NAME
FROM information_schema.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')
  AND ENGINE = 'InnoDB'
  AND TABLE_TYPE = 'BASE TABLE'
  AND
NOT EXISTS (
    SELECT 1 FROM information_schema.STATISTICS
    WHERE TABLE_SCHEMA = TABLES.TABLE_SCHEMA
      AND TABLE_NAME = TABLES.TABLE_NAME
      AND INDEX_NAME = 'PRIMARY'
  );
```

### Table Statistics

**Update Statistics:**
```sql
-- Analyze table
ANALYZE TABLE orders;

-- Optimize table (rebuilds, updates stats)
OPTIMIZE TABLE orders;
```

**View Statistics:**
```sql
-- Table sizes
SELECT
  TABLE_SCHEMA,
  TABLE_NAME,
  ROUND(DATA_LENGTH / 1024 / 1024, 2) AS data_mb,
  ROUND(INDEX_LENGTH / 1024 / 1024, 2) AS index_mb,
  ROUND((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024, 2) AS total_mb,
  TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY (DATA_LENGTH + INDEX_LENGTH) DESC;
```

## MySQL Best Practices

### Primary Key Selection

**Recommended:**
```sql
-- Auto-increment integer (clustered-index friendly)
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  ...
) ENGINE=InnoDB;
```

**Avoid:**
```sql
-- UUID as primary key (poor clustered index performance)
CREATE TABLE orders (
  id CHAR(36) PRIMARY KEY DEFAULT (UUID()), -- ❌ Fragmentation
  ...
) ENGINE=InnoDB;
```

**UUID Alternative (MySQL 8.0+):**
```sql
-- Use UUID_TO_BIN with byte reordering for better clustering
CREATE TABLE orders (
  id BINARY(16) PRIMARY KEY DEFAULT (UUID_TO_BIN(UUID(), 1)),
  ...
) ENGINE=InnoDB;
```

### Composite Index Order

**Rule:** Equality filters → range filters → ORDER BY columns.

```sql
-- Query pattern
SELECT * FROM orders
WHERE customer_id = 123
  AND status IN ('pending', 'processing')
  AND created_at > '2025-01-01'
ORDER BY created_at DESC;

-- Optimal index
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at);
```

### Avoid SELECT *

**Bad:**
```sql
SELECT * FROM users WHERE id = 123;
```

**Good:**
```sql
SELECT id, name, email FROM users WHERE id = 123;
```

### Use LIMIT for Large Result Sets

**Always limit:**
```sql
-- Good: limits results
SELECT * FROM orders ORDER BY created_at DESC LIMIT 100;

-- Bad: no limit on a large table
SELECT * FROM orders ORDER BY created_at DESC;
```

## Quick Reference

### Index Type Selection

| Use Case | Index Type | Creation |
|----------|-----------|----------|
| General queries | B-tree | `CREATE INDEX ON table (column)` |
| Long strings | B-tree prefix | `CREATE INDEX ON table (column(10))` |
| Full-text search | Full-text | `CREATE FULLTEXT INDEX ON table (column)` |
| Spatial data | Spatial | `CREATE SPATIAL INDEX ON table (column)` |

### Storage Engine Selection

| Requirement | Engine | Notes |
|------------|--------|-------|
| Transactions | InnoDB | Default, recommended |
| Read-only archive | MyISAM | Legacy, avoid |
| In-memory | MEMORY | Temporary tables only |

### Configuration Priorities

1. **innodb_buffer_pool_size**: 70-80% of RAM
2. **innodb_log_file_size**: 256MB-1GB
3. **max_connections**: based on workload (100-200 typical)
4. **innodb_flush_log_at_trx_commit**: 2 for better performance (1 for durability)
```

### references/sqlserver.md

```markdown
# SQL Server-Specific Optimizations

SQL Server-specific features, execution plans, and optimization techniques.
## SQL Server Index Types

### Clustered Index

**Characteristics:**
- Table data physically sorted by the index key
- One per table
- The primary key creates a clustered index by default

**Creation:**
```sql
-- Explicit clustered index
CREATE CLUSTERED INDEX IX_Orders_OrderDate ON Orders (OrderDate);

-- Primary key with clustered index (default)
CREATE TABLE Orders (
  OrderID INT PRIMARY KEY CLUSTERED,
  ...
);
```

**Benefits:**
- Fast range queries on the index key
- Fast ORDER BY on index columns

**Considerations:**
- Choose a narrow, unique, sequential key (avoid GUIDs)
- Use INT IDENTITY or BIGINT IDENTITY

### Non-Clustered Index

**Characteristics:**
- Separate structure from the table data
- Multiple non-clustered indexes per table
- Includes a pointer to the clustered index key

**Creation:**
```sql
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON Orders (CustomerID);
```

### Covering Index with INCLUDE

**Use Case:** Include non-indexed columns for Index-Only Scans.

**Syntax:**
```sql
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount, Status);
```

**Query Benefits:**
```sql
-- Uses the covering index (no Key Lookup)
SELECT OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
```

### Filtered Index

**Use Case:** Index a subset of rows (similar to a PostgreSQL partial index).

**Creation:**
```sql
-- Index only active orders
CREATE NONCLUSTERED INDEX IX_Orders_Active
ON Orders (OrderDate)
WHERE Status = 'Active';

-- Index only recent orders
CREATE NONCLUSTERED INDEX IX_Orders_Recent
ON Orders (OrderDate)
WHERE OrderDate >= '2025-01-01';
```

**Benefits:**
- Smaller index size
- Faster maintenance
- More efficient for common filtered queries

### Columnstore Index

**Use Case:** Data warehouse, analytics queries.
**Clustered Columnstore:**
```sql
-- Convert the entire table to columnstore
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Sales ON Sales;
```

**Non-Clustered Columnstore:**
```sql
-- Add a columnstore index alongside the rowstore
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales
ON Sales (ProductID, OrderDate, Quantity, Amount);
```

**Benefits:**
- ~10x compression
- 10-100x faster aggregation queries
- Batch mode execution

**When to Use:**
- Large fact tables (millions of rows)
- Read-heavy analytical queries
- Data warehouse scenarios

## SQL Server Execution Plans

### Accessing Execution Plans

**Estimated Execution Plan (Ctrl+L):**
- No query execution
- Estimated costs and row counts
- Quick analysis

**Actual Execution Plan (Ctrl+M, then execute):**
- The query executes
- Actual row counts and timing
- More accurate for optimization

### Reading Execution Plans

**Graphical Plan Direction:** Right to left, top to bottom.

**Operation Icons:**
- **Clustered Index Scan** - Full table scan (expensive)
- **Clustered Index Seek** - Efficient lookup (good)
- **Index Seek** - Non-clustered index lookup (good)
- **Index Scan** - Full index scan (less efficient)
- **Table Scan** - Full heap scan (worst)
- **Key Lookup** - Additional lookup for non-indexed columns
- **Nested Loops** - Join algorithm
- **Hash Match** - Join or aggregation

**Arrow Thickness:** Relative row count (thicker = more rows).
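Plans can also be captured from plain T-SQL, without the SSMS shortcuts, which is useful when scripting analysis. A sketch (the query is illustrative; `SET SHOWPLAN_XML` must be the only statement in its batch, hence the `GO` separators):

```sql
-- Estimated plan: compile only, do not execute
SET SHOWPLAN_XML ON;
GO
SELECT OrderDate, TotalAmount FROM Orders WHERE CustomerID = 123;
GO
SET SHOWPLAN_XML OFF;
GO

-- Actual plan: execute the query and return the runtime plan as XML
SET STATISTICS XML ON;
SELECT OrderDate, TotalAmount FROM Orders WHERE CustomerID = 123;
SET STATISTICS XML OFF;
```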
**Warnings (Yellow Exclamation Mark):**
- Missing index suggestion
- Implicit type conversion
- Excessive memory grant
- Spills to tempdb

### Missing Index Suggestions

**Execution Plan Recommendation:**
```xml
<MissingIndexes>
  <MissingIndexGroup Impact="95.5">
    <MissingIndex Database="YourDB" Schema="dbo" Table="Orders">
      <ColumnGroup Usage="EQUALITY">
        <Column Name="CustomerID" />
      </ColumnGroup>
      <ColumnGroup Usage="INCLUDE">
        <Column Name="OrderDate" />
        <Column Name="TotalAmount" />
      </ColumnGroup>
    </MissingIndex>
  </MissingIndexGroup>
</MissingIndexes>
```

**Create Suggested Index:**
```sql
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount);
```

## SQL Server Query Store

### Enable Query Store

**Database-Level Setting:**
```sql
ALTER DATABASE YourDatabase
SET QUERY_STORE = ON (
  OPERATION_MODE = READ_WRITE,
  DATA_FLUSH_INTERVAL_SECONDS = 900,
  INTERVAL_LENGTH_MINUTES = 60,
  MAX_STORAGE_SIZE_MB = 1000,
  QUERY_CAPTURE_MODE = AUTO
);
```

### Find Expensive Queries

**Top Queries by Duration:**
```sql
SELECT TOP 10
  q.query_id,
  qt.query_sql_text,
  rs.avg_duration / 1000.0 AS avg_duration_ms,
  rs.avg_logical_io_reads,
  rs.avg_physical_io_reads,
  rs.count_executions,
  rs.last_execution_time
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
WHERE rs.last_execution_time > DATEADD(DAY, -7, GETDATE())
ORDER BY rs.avg_duration DESC;
```

**Top Queries by CPU:**
```sql
SELECT TOP 10
  qt.query_sql_text,
  rs.avg_cpu_time / 1000.0 AS avg_cpu_ms,
  rs.count_executions,
  rs.avg_duration / 1000.0 AS avg_duration_ms
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
ORDER BY
rs.avg_cpu_time DESC;
```

### Query Performance Comparison

**Compare Plans Over Time:**
```sql
-- Track query performance regression
SELECT
  q.query_id,
  qt.query_sql_text,
  p.plan_id,
  rs.avg_duration / 1000.0 AS avg_duration_ms,
  rs.first_execution_time,
  rs.last_execution_time
FROM sys.query_store_query q
INNER JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan p ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
WHERE q.query_id = 123 -- Specific query
ORDER BY rs.first_execution_time;
```

## SQL Server-Specific Optimizations

### Statistics Management

**Update Statistics:**
```sql
-- Update all statistics for a table
UPDATE STATISTICS Orders;

-- Update specific index statistics
UPDATE STATISTICS Orders IX_Orders_CustomerID;

-- Update with a full scan (more accurate)
UPDATE STATISTICS Orders WITH FULLSCAN;
```

**Auto-Update Statistics:**
```sql
-- Check the setting
SELECT name, is_auto_update_stats_on
FROM sys.databases
WHERE name = 'YourDatabase';

-- Enable auto-update
ALTER DATABASE YourDatabase SET AUTO_UPDATE_STATISTICS ON;
```

### Index Fragmentation

**Check Fragmentation:**
```sql
SELECT
  OBJECT_NAME(ips.object_id) AS TableName,
  i.name AS IndexName,
  ips.index_type_desc,
  ips.avg_fragmentation_in_percent,
  ips.page_count
FROM sys.dm_db_index_physical_stats(
  DB_ID(), NULL, NULL, NULL, 'LIMITED'
) ips
INNER JOIN sys.indexes i
  ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
  AND ips.page_count > 1000
ORDER BY ips.avg_fragmentation_in_percent DESC;
```

**Rebuild vs Reorganize:**
```sql
-- Rebuild index (offline, faster, more thorough)
ALTER INDEX IX_Orders_CustomerID ON Orders REBUILD;

-- Rebuild online (SQL Server Enterprise)
ALTER INDEX IX_Orders_CustomerID ON Orders REBUILD WITH (ONLINE = ON);

-- Reorganize index (online, slower, less thorough)
ALTER INDEX IX_Orders_CustomerID ON Orders REORGANIZE;
```

**Guidelines:**
- **Fragmentation <10%**: No action needed
- **Fragmentation 10-30%**: REORGANIZE
- **Fragmentation >30%**: REBUILD

### Partitioning

**Partition Function:**
```sql
-- Create partition function (monthly partitions)
CREATE PARTITION FUNCTION PF_Orders_Monthly (DATE)
AS RANGE RIGHT FOR VALUES
('2025-01-01', '2025-02-01', '2025-03-01', ..., '2025-12-01');
```

**Partition Scheme:**
```sql
-- Create partition scheme
CREATE PARTITION SCHEME PS_Orders_Monthly
AS PARTITION PF_Orders_Monthly ALL TO ([PRIMARY]);
```

**Partitioned Table:**
```sql
-- Create partitioned table
CREATE TABLE Orders (
  OrderID INT IDENTITY(1,1),
  OrderDate DATE,
  CustomerID INT,
  TotalAmount DECIMAL(10,2),
  ...
) ON PS_Orders_Monthly (OrderDate);
```

**Benefits:**
- Partition elimination (scan only the relevant partitions)
- Partition switching (fast archive/purge)
- Parallel query execution per partition

### Computed Columns

**Persisted Computed Column:**
```sql
ALTER TABLE Users
ADD FullName AS (FirstName + ' ' + LastName) PERSISTED;

-- Index the computed column
CREATE INDEX IX_Users_FullName ON Users (FullName);
```

**Non-Persisted Computed Column:**
```sql
ALTER TABLE Users
ADD EmailDomain AS (SUBSTRING(Email, CHARINDEX('@', Email) + 1, LEN(Email)));
-- Computed on read, no storage overhead
```

## SQL Server Configuration

### Max Degree of Parallelism (MAXDOP)

**Check Current Setting:**
```sql
SELECT value_in_use
FROM sys.configurations
WHERE name = 'max degree of parallelism';
```

**Set MAXDOP:**
```sql
-- Limit to 4 cores per query
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```

**Recommendation:**
- Small servers (<8 cores): MAXDOP = 4
- Large servers (>8 cores): MAXDOP = 8
- OLTP workloads: lower MAXDOP (2-4)
- Analytics workloads: higher MAXDOP (8-16)

### Cost Threshold for Parallelism

**Set Threshold:**
```sql
-- Only parallelize queries with an estimated cost > 50
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```

**Default:** 5 (too low for most systems)

**Recommended:**
50-100

### Memory Configuration

**Max Server Memory:**
```sql
-- Reserve memory for the OS and other apps
-- Example: 32GB server, allocate 28GB to SQL Server
EXEC sp_configure 'max server memory (MB)', 28672;
RECONFIGURE;
```

**Recommendation:**
- Leave 4GB for the OS on small servers
- Leave 10-20% for the OS on large servers

## SQL Server Monitoring

### DMVs for Query Performance

**Most Expensive Queries (CPU):**
```sql
SELECT TOP 10
  SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
    ((CASE qs.statement_end_offset
        WHEN -1 THEN DATALENGTH(qt.text)
        ELSE qs.statement_end_offset
      END - qs.statement_start_offset)/2)+1) AS query_text,
  qs.execution_count,
  qs.total_worker_time / 1000 AS total_cpu_ms,
  qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
  qs.last_execution_time
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY qs.total_worker_time DESC;
```

**Missing Indexes:**
```sql
SELECT
  mid.database_id,
  mid.object_id,
  OBJECT_NAME(mid.object_id, mid.database_id) AS table_name,
  migs.avg_user_impact,
  migs.user_seeks,
  mid.equality_columns,
  mid.inequality_columns,
  mid.included_columns
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig
  ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs
  ON mig.index_group_handle = migs.group_handle
WHERE migs.avg_user_impact > 50
ORDER BY migs.avg_user_impact DESC;
```

### Blocking and Waits

**Current Blocking:**
```sql
SELECT
  blocking.session_id AS blocking_session,
  blocked.session_id AS blocked_session,
  blocked_text.text AS blocked_query,
  blocking_text.text AS blocking_query
FROM sys.dm_exec_requests blocked
INNER JOIN sys.dm_exec_requests blocking
  ON blocked.blocking_session_id = blocking.session_id
CROSS APPLY sys.dm_exec_sql_text(blocked.sql_handle) blocked_text
CROSS APPLY sys.dm_exec_sql_text(blocking.sql_handle) blocking_text;
```

## Quick Reference

### Index Types

| Index Type | Use Case | Creation |
|-----------|----------|----------|
| Clustered | Primary key, range queries | `CREATE CLUSTERED INDEX` |
| Non-Clustered | Secondary lookups | `CREATE NONCLUSTERED INDEX` |
| Covering | Index-only scans | `CREATE INDEX ... INCLUDE (...)` |
| Filtered | Partial index | `CREATE INDEX ... WHERE ...` |
| Columnstore | Analytics, DW | `CREATE COLUMNSTORE INDEX` |

### Execution Plan Operations

| Operation | Performance | Action |
|-----------|-------------|--------|
| Clustered Index Seek | Excellent | Keep |
| Index Seek | Excellent | Keep |
| Index Scan | Fair | Consider a covering index |
| Clustered Index Scan | Poor | Add an index (acceptable for small tables) |
| Table Scan | Worst | Add an index |
| Key Lookup | Moderate | Consider a covering index |

### Configuration Priorities

1. **Max Server Memory**: leave 4GB+ for the OS
2. **MAXDOP**: 4-8 for most workloads
3. **Cost Threshold for Parallelism**: 50-100
4. **Query Store**: enable for all databases
```