postgresql
PostgreSQL best practices: multi-tenancy with RLS, schema design, Alembic migrations, async SQLAlchemy, and query optimization.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install itechmeat-llm-code-postgresql
Repository
Skill path: skills/postgresql
Open repository
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack, Backend, Designer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: itechmeat.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install postgresql into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/itechmeat/llm-code before adding postgresql to shared team environments
- Use postgresql for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: postgresql
description: "PostgreSQL best practices: multi-tenancy with RLS, schema design, Alembic migrations, async SQLAlchemy, and query optimization."
version: "18.1"
release_date: "2025-11-13"
---
# PostgreSQL
## When to use
- Designing or changing multi-tenant tables with Row-Level Security (RLS)
- Debugging tenant isolation issues
- Adding/changing Alembic migrations for schema, RLS policies, or indexes
- Writing tests that validate RLS isolation
- Configuring PostgreSQL authentication, replication, or tuning
## RLS Multi-tenancy Pattern
### Non-negotiables
- **RLS context is mandatory** for any tenant-scoped query
- **Context must be set inside the same transaction** as the queries
- **No fallbacks** for tenant ID (fail fast if missing)
- **Async-only** DB access when using async frameworks
### Setting RLS Context
RLS works only if the current transaction has the context set:
```sql
SET LOCAL app.current_tenant_id = '<tenant_uuid>';
```
This statement must run before the first tenant-scoped query in that transaction.
### Common Failure Modes
- Setting `SET LOCAL ...` after the first `select()`
- Setting the context in one session, then querying in another
- Running queries outside the expected transaction scope
### Typical RLS Policy
```sql
ALTER TABLE some_table ENABLE ROW LEVEL SECURITY;
CREATE POLICY some_table_tenant_isolation
  ON some_table
  USING (tenant_id = current_setting('app.current_tenant_id', true)::uuid);
```
## Multi-tenant Table Checklist
- Tenant ID column is **UUID**
- FK to tenants table with `ON DELETE CASCADE`
- Indexes aligned with access patterns (usually tenant_id first)
- PostgreSQL does **not** auto-index FK columns — add explicit indexes
- UNIQUE allows multiple NULLs unless using `NULLS NOT DISTINCT` (PG15+)
- RLS is enabled and policies exist
- Application code sets RLS context at transaction start
## Alembic Migrations Checklist
1. Add/modify schema (columns, constraints, FKs)
2. Create/update indexes
3. Enable RLS and create/adjust policies
4. Add verification (tests) for isolation
5. Provide a real downgrade (no stubs)
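The checklist can be sketched as an Alembic-style revision. Here `execute` stands in for Alembic's `op.execute()`, and the `documents` table and index/policy names are hypothetical:

```python
def upgrade(execute) -> None:
    """Checklist steps 1-3 for a hypothetical tenant-scoped table."""
    # 1. Schema: tenant column with FK and cascade
    execute("ALTER TABLE documents ADD COLUMN tenant_id UUID NOT NULL "
            "REFERENCES tenants(id) ON DELETE CASCADE")
    # 2. Index aligned with access patterns (tenant_id first)
    execute("CREATE INDEX documents_tenant_idx "
            "ON documents (tenant_id, created_at)")
    # 3. RLS: enable, then create the isolation policy
    execute("ALTER TABLE documents ENABLE ROW LEVEL SECURITY")
    execute("CREATE POLICY documents_tenant_isolation ON documents "
            "USING (tenant_id = "
            "current_setting('app.current_tenant_id', true)::uuid)")


def downgrade(execute) -> None:
    """Step 5: a real downgrade, mirroring upgrade() in reverse order."""
    execute("DROP POLICY documents_tenant_isolation ON documents")
    execute("ALTER TABLE documents DISABLE ROW LEVEL SECURITY")
    execute("DROP INDEX documents_tenant_idx")
    execute("ALTER TABLE documents DROP COLUMN tenant_id")
```

Step 4 (isolation tests) lives in the test suite, not the migration itself.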
## RLS Isolation Testing Recipe
Goal:
- Data for tenant A is visible to tenant A
- Data for tenant A is NOT visible to tenant B
Canonical flow:
1. Setup data through an **admin session** (RLS bypass) for tenant A and B
2. Assert via an **RLS session**:
- set context to tenant A → sees only tenant A data
- set context to tenant B → does not see tenant A data
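The canonical flow can be exercised end-to-end against a toy stand-in. A real test would use an admin engine and an RLS-enabled engine against PostgreSQL; `FakeDB` below only simulates the filtering, so the shape of the assertions is the point:

```python
class FakeDB:
    """Toy stand-in for a PostgreSQL database with RLS enabled."""

    def __init__(self):
        self.rows = []  # (tenant_id, payload)

    def admin_insert(self, tenant_id, payload):
        # Admin session: bypasses RLS, used only for test setup.
        self.rows.append((tenant_id, payload))

    def rls_select(self, current_tenant_id):
        # RLS session: rows are filtered by the transaction's tenant context.
        if current_tenant_id is None:
            raise RuntimeError("RLS context not set -- fail fast")
        return [p for t, p in self.rows if t == current_tenant_id]


# 1. Setup through the admin session for tenants A and B
db = FakeDB()
db.admin_insert("tenant-a", "doc-a")
db.admin_insert("tenant-b", "doc-b")
# 2. Assert via the RLS session
assert db.rls_select("tenant-a") == ["doc-a"]    # A sees only A's data
assert "doc-a" not in db.rls_select("tenant-b")  # B never sees A's data
```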
## Destructive Operations Safety
Hard rules:
- Never run `DELETE` without a narrow `WHERE` targeting specific data
- Never run `TRUNCATE`/`DROP` without explicit confirmation
Pre-flight before destructive actions:
1. Confirm exact target (tables / IDs / date range)
2. Run a `SELECT`/row count first and show results
3. Ask for final confirmation, then execute
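The pre-flight steps can be sketched in code, with SQLite standing in for PostgreSQL (same flow; the `messages` table and `bot_id` target are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, bot_id TEXT)")
for bot in ("a", "a", "b"):
    conn.execute("INSERT INTO messages (bot_id) VALUES (?)", (bot,))

target = ("a",)
# 1-2. Confirm the exact target with a SELECT count and show it first
(to_delete,) = conn.execute(
    "SELECT count(*) FROM messages WHERE bot_id = ?", target).fetchone()
print(f"About to delete {to_delete} row(s) for bot_id={target[0]!r}")
# 3. Only after explicit confirmation: same narrow WHERE, then verify
deleted = conn.execute(
    "DELETE FROM messages WHERE bot_id = ?", target).rowcount
assert deleted == to_delete  # exactly what was previewed, nothing more
```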
## References
### Schema & Design
- [table-design.md](references/table-design.md) — Data types, constraints, indexing, partitioning, JSONB, safe schema evolution
- [charset-encoding.md](references/charset-encoding.md) — Character sets, encoding, collation, ICU, locale settings
### Authentication
- [authentication.md](references/authentication.md) — pg_hba.conf, SCRAM-SHA-256, md5, peer, cert, LDAP, GSSAPI
- [authentication-oauth.md](references/authentication-oauth.md) — OAuth 2.0 (PostgreSQL 18+), SASL OAUTHBEARER, validators
- [user-management.md](references/user-management.md) — CREATE/ALTER/DROP ROLE, membership, GRANT/REVOKE, predefined roles
### Runtime Configuration
- [connection-settings.md](references/connection-settings.md) — listen_addresses, max_connections, SSL, TCP keepalives
- [query-tuning.md](references/query-tuning.md) — Planner settings, work_mem, parallel query, cost constants
- [replication.md](references/replication.md) — Streaming replication, WAL, synchronous commit, logical replication
- [vacuum.md](references/vacuum.md) — Autovacuum, vacuum cost model, freeze ages, per-table tuning
- [error-handling.md](references/error-handling.md) — exit_on_error, restart_after_crash, data_sync_retry
### Internals
- [internals.md](references/internals.md) — Query processing pipeline, parser/rewriter/planner/executor, system catalogs, wire protocol, access methods
- [protocol.md](references/protocol.md) — Wire protocol v3.2: message format, startup, auth, query, COPY, replication
## See also
- [sql-expert](../sql-expert/SKILL.md) — Query patterns, EXPLAIN workflow, optimization
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### references/table-design.md
```markdown
# PostgreSQL Table Design
## Scope
Use this reference when designing or refactoring PostgreSQL tables: data types, constraints, indexes, partitioning, JSONB.
For RLS/multi-tenancy rules, see the main [SKILL.md](../SKILL.md).
## Core Rules
- Prefer `snake_case` identifiers. Avoid quoted/mixed-case.
- Define keys deliberately:
- Multi-tenant: `bot_id` MUST be `UUID`
- Surrogate keys: `BIGINT GENERATED ALWAYS AS IDENTITY`
- Normalize first (3NF). Denormalize only after measuring read bottlenecks.
- Use `NOT NULL` whenever domain requires it; use `DEFAULT` for common values.
- Create indexes for access paths you actually query.
## PostgreSQL Gotchas
- **FK indexes are not automatic**: PostgreSQL does not auto-index foreign keys. Add explicit indexes.
- **UNIQUE + NULL**: UNIQUE allows multiple NULLs. Use `UNIQUE NULLS NOT DISTINCT (...)` (PG15+) to allow at most one NULL.
- **Identity gaps are normal**: rollbacks and concurrency create gaps; don't try to "fix" them.
- **MVCC bloat**: updates/deletes leave dead tuples; vacuum handles them. Avoid wide hot-row churn.
## Data Types (Recommended Defaults)
| Use Case | Type | Notes |
|----------|------|-------|
| Time | `TIMESTAMPTZ` | Avoid `TIMESTAMP` without timezone |
| Money | `NUMERIC(p,s)` | Never float; avoid `money` |
| Strings | `TEXT` | Use `CHECK (length(col) <= n)` if max needed |
| External IDs | `UUID` | Bot-scoped / externally-visible |
| Internal PKs | `BIGINT GENERATED ALWAYS AS IDENTITY` | |
| JSON | `JSONB` | Prefer over `JSON` |
### Avoid
| Type | Why |
|------|-----|
| `serial` | Use identity |
| `timestamp` | Without timezone is ambiguous |
| `char(n)` / `varchar(n)` | Use `text` + `check` |
| `money` | Locale-dependent, imprecise |
## Constraints
### Primary Key
Implies `UNIQUE` + `NOT NULL`, creates B-tree index automatically.
### Foreign Key
Always specify `ON DELETE` / `ON UPDATE`:
```sql
REFERENCES parent(id) ON DELETE CASCADE
REFERENCES parent(id) ON DELETE SET NULL
REFERENCES parent(id) ON DELETE RESTRICT
```
**Add index on FK column** — PostgreSQL doesn't do this automatically.
### CHECK
NULL passes checks; combine with `NOT NULL` when needed:
```sql
status TEXT NOT NULL CHECK (status IN ('active', 'inactive'))
```
### UNIQUE
```sql
-- Standard (multiple NULLs allowed)
UNIQUE (email)
-- Single NULL only (PG15+)
UNIQUE NULLS NOT DISTINCT (email)
```
## Indexing Patterns
| Type | Use Case |
|------|----------|
| B-tree | Equality and range queries (default) |
| GIN | JSONB, arrays, full-text search |
| GiST | Ranges, exclusion constraints |
| BRIN | Huge, naturally ordered tables |
### Composite Indexes
Order matters; lead with columns used in equality predicates (often the most selective):
```sql
CREATE INDEX ON orders (customer_id, created_at);
-- Supports: WHERE customer_id = ? AND created_at > ?
-- Supports: WHERE customer_id = ?
-- Does NOT support: WHERE created_at > ?
```
### Covering Indexes (INCLUDE)
Enable index-only scans:
```sql
CREATE INDEX ON orders (customer_id) INCLUDE (total, status);
```
### Partial Indexes
Hot subsets:
```sql
CREATE INDEX ON orders (created_at) WHERE status = 'pending';
```
### Expression Indexes
```sql
CREATE INDEX ON users (LOWER(email));
```
### Multi-tenant Indexing
For tenant-scoped queries, indexes typically start with `bot_id`:
```sql
CREATE INDEX ON messages (bot_id, created_at);
CREATE INDEX ON messages (bot_id, user_id);
```
## JSONB Guidance
GIN index for containment queries:
```sql
CREATE INDEX ON tbl USING GIN (attrs);
-- Query
SELECT * FROM tbl WHERE attrs @> '{"type": "premium"}';
```
Extract frequently queried scalar:
```sql
ALTER TABLE tbl
  ADD COLUMN theme TEXT GENERATED ALWAYS AS (attrs->>'theme') STORED;
CREATE INDEX ON tbl (theme);
```
## Partitioning
Use only for very large tables with consistent partition key filtering (often time).
```sql
CREATE TABLE events (
  id BIGINT GENERATED ALWAYS AS IDENTITY,
  created_at TIMESTAMPTZ NOT NULL,
  data JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
  FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
```
**Note**: Unique constraints (including the primary key) must include the partition key.
## Safe Schema Evolution
### Adding Columns
```sql
-- 1. Add nullable
ALTER TABLE tbl ADD COLUMN new_col TEXT;
-- 2. Backfill in batches
UPDATE tbl SET new_col = 'default' WHERE id BETWEEN 1 AND 10000;
-- Repeat...
-- 3. Add NOT NULL
ALTER TABLE tbl ALTER COLUMN new_col SET NOT NULL;
```
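Step 2's batch loop is driven from application code. A sketch with SQLite standing in for PostgreSQL (table and batch size are illustrative; against PostgreSQL, commit between batches to keep transactions and lock durations short):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, new_col TEXT)")
for _ in range(25):
    conn.execute("INSERT INTO tbl (new_col) VALUES (NULL)")

BATCH = 10
while True:
    # Backfill one small batch of still-NULL rows, then commit.
    cur = conn.execute(
        "UPDATE tbl SET new_col = 'default' WHERE id IN "
        "(SELECT id FROM tbl WHERE new_col IS NULL LIMIT ?)", (BATCH,))
    conn.commit()
    if cur.rowcount == 0:
        break  # nothing left to backfill; now safe to SET NOT NULL
```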
### Adding Indexes
```sql
-- Non-blocking (cannot run in transaction)
CREATE INDEX CONCURRENTLY ON tbl (col);
```
### Avoid on Large Tables
Volatile defaults force a full table rewrite. Since PostgreSQL 11, a non-volatile default such as `now()` is stored as metadata with no rewrite; volatile expressions like `clock_timestamp()` or `gen_random_uuid()` still rewrite the entire table:
```sql
-- BAD: volatile default rewrites the entire table
ALTER TABLE tbl ADD COLUMN created_at TIMESTAMPTZ DEFAULT clock_timestamp();
-- GOOD: add nullable, backfill in batches, then add the default
ALTER TABLE tbl ADD COLUMN created_at TIMESTAMPTZ;
UPDATE tbl SET created_at = now() WHERE created_at IS NULL;
ALTER TABLE tbl ALTER COLUMN created_at SET DEFAULT now();
ALTER TABLE tbl ALTER COLUMN created_at SET NOT NULL;
```
## Example: Tenant-Scoped Table
```sql
CREATE TABLE messages (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  bot_id UUID NOT NULL REFERENCES bots(id) ON DELETE CASCADE,
  user_id BIGINT NOT NULL,
  content TEXT NOT NULL,
  metadata JSONB,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- FK index (manual)
CREATE INDEX ON messages (bot_id);
-- Access pattern indexes
CREATE INDEX ON messages (bot_id, user_id);
CREATE INDEX ON messages (bot_id, created_at);
-- Optional: JSONB if queried
CREATE INDEX ON messages USING GIN (metadata);
```
## See Also
- [Main PostgreSQL SKILL](../SKILL.md) — RLS, multi-tenancy, Alembic
- [authentication.md](authentication.md) — Auth methods
- [sql-expert skill](../../sql-expert/SKILL.md) — Query patterns, EXPLAIN
```
### references/charset-encoding.md
```markdown
# PostgreSQL Character Sets, Encoding & Collation
Character encoding, locale settings, and collation rules for text storage and comparison.
## Core Concepts
| Concept | Description |
|---------|-------------|
| **Encoding** | How characters are stored (UTF8, LATIN1, etc.) |
| **Locale** | Regional settings (LC_COLLATE, LC_CTYPE) |
| **Collation** | Rules for sorting and comparing strings |
| **Provider** | Collation implementation (libc, ICU, builtin) |
## Database Encoding
### Set at Cluster Creation
```bash
# Set default encoding for cluster
initdb -E UTF8
initdb -E EUC_JP
initdb --locale=en_US.UTF-8
```
### Set at Database Creation
```bash
# Command line
createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean
```
```sql
-- SQL
CREATE DATABASE music
LOCALE 'sv_SE.utf8'
TEMPLATE template0;
CREATE DATABASE music2
LOCALE 'sv_SE.iso885915'
ENCODING LATIN9
TEMPLATE template0;
CREATE DATABASE korean
ENCODING 'EUC_KR'
LC_COLLATE='ko_KR.euckr'
LC_CTYPE='ko_KR.euckr'
TEMPLATE template0;
```
**Important**: Always use `TEMPLATE template0` when specifying custom encoding/locale.
### Common Encodings
| Encoding | Description | Language | Server? | Bytes/Char |
|----------|-------------|----------|---------|------------|
| `UTF8` | Unicode, 8-bit | all | Yes | 1-4 |
| `SQL_ASCII` | unspecified | any | Yes | 1 |
| `EUC_JP` | Extended UNIX Code-JP | Japanese | Yes | 1-3 |
| `EUC_CN` | Extended UNIX Code-CN | Simplified Chinese | Yes | 1-3 |
| `EUC_KR` | Extended UNIX Code-KR | Korean | Yes | 1-3 |
| `LATIN1` | ISO 8859-1 | Western European | Yes | 1 |
| `LATIN2` | ISO 8859-2 | Central European | Yes | 1 |
| `LATIN9` | ISO 8859-15 | Western European | Yes | 1 |
| `WIN1251` | Windows CP1251 | Cyrillic | Yes | 1 |
| `WIN1252` | Windows CP1252 | Western European | Yes | 1 |
| `SJIS` | Shift JIS | Japanese | No (client only) | 1-2 |
| `BIG5` | Big Five | Traditional Chinese | No (client only) | 1-2 |
| `GBK` | Extended National Standard | Simplified Chinese | No (client only) | 1-2 |
**Warning**: `SQL_ASCII` performs no encoding validation. Data with invalid encoding may be stored, causing issues later. Avoid for new databases.
### Available Client Conversions
| Server Encoding | Available Client Encodings |
|-----------------|---------------------------|
| `UTF8` | All supported encodings |
| `LATIN1` | LATIN1, MULE_INTERNAL, UTF8 |
| `LATIN2` | LATIN2, MULE_INTERNAL, UTF8, WIN1250 |
| `EUC_JP` | EUC_JP, MULE_INTERNAL, SJIS, UTF8 |
| `EUC_KR` | EUC_KR, MULE_INTERNAL, UTF8 |
| `WIN1251` | WIN1251, ISO_8859_5, KOI8R, MULE_INTERNAL, UTF8, WIN866 |
### Query Encoding
```sql
SHOW server_encoding;
SHOW client_encoding;
```
## Client Encoding
### Set Client Encoding
```sql
SET CLIENT_ENCODING TO 'UTF8';
SET NAMES 'UTF8'; -- Standard SQL equivalent
SHOW client_encoding;
RESET client_encoding;
```
```bash
# psql
\encoding UTF8
\encoding -- Show current
```
```c
// libpq
int PQsetClientEncoding(PGconn *conn, const char *encoding);
int enc = PQclientEncoding(conn);
```
### Encoding Conversion
```sql
-- Convert bytea between encodings
SELECT convert('\x746578745f696e5f75746638', 'UTF8', 'LATIN1');
-- Decode bytea to text
SELECT convert_from('\x746578745f696e5f75746638', 'UTF8');
-- Encode text to bytea
SELECT convert_to('some_text', 'UTF8');
```
### Encoding Functions
```sql
-- Encoding name to internal ID
SELECT pg_char_to_encoding('UTF8'); -- Returns 6
-- Internal ID to name
SELECT pg_encoding_to_char(6); -- Returns 'UTF8'
-- Current client encoding
SELECT pg_client_encoding();
```
## Locale Settings
### Cluster-Level (initdb)
```bash
initdb --locale=en_US.UTF-8
initdb --locale-provider=icu --icu-locale=en
# Mix locale categories
initdb --locale=fr_CA --lc-monetary=en_US
```
### Database-Level
```sql
CREATE DATABASE mydb
LOCALE 'en_US.UTF-8'
LOCALE_PROVIDER = icu
ICU_LOCALE = 'en-US'
TEMPLATE template0;
```
### Locale Categories
| Category | Purpose | Fixed at DB creation? |
|----------|---------|----------------------|
| `LC_COLLATE` | String sort order | **Yes** |
| `LC_CTYPE` | Character classification | **Yes** |
| `LC_MESSAGES` | Message language | No (runtime) |
| `LC_MONETARY` | Currency formatting | No (runtime) |
| `LC_NUMERIC` | Number formatting | No (runtime) |
| `LC_TIME` | Date/time formatting | No (runtime) |
**Important**: `LC_COLLATE` and `LC_CTYPE` affect index sort order and **cannot be changed** after database creation.
### Locale Behavior Impact
Locale settings affect:
- Sort order in `ORDER BY` on text
- `upper()`, `lower()`, `initcap()` functions
- Pattern matching (`LIKE`, `SIMILAR TO`, regex)
- `to_char()` family functions
- Index usage with `LIKE` clauses
**Performance**: Using locales other than `C` or `POSIX` slows character handling and may prevent index usage with `LIKE`. Use locales only when needed.
### Special Locales
| Locale | Behavior |
|--------|----------|
| `C` | Byte-order sorting, ASCII-only character classes |
| `POSIX` | Same as `C` |
| `C.UTF-8` | UTF-8 aware, Unicode character classes (builtin provider) |
## Collation Providers
### Comparison
| Provider | Source | Advantages | Limitations |
|----------|--------|------------|-------------|
| `builtin` | PostgreSQL | No dependencies, consistent | Only C, C.UTF-8, PG_UNICODE_FAST |
| `icu` | ICU library | Consistent across platforms, feature-rich | Requires ICU library |
| `libc` | OS locales | No extra dependencies | Platform-dependent behavior |
### builtin Provider
Available only for specific locales:
| Locale | Database Encoding | Behavior |
|--------|-------------------|----------|
| `C` | Any | Byte-order sorting, basic ASCII |
| `C.UTF-8` | UTF8 only | Unicode-aware, "POSIX Compatible" semantics |
| `PG_UNICODE_FAST` | UTF8 only | Unicode-aware, "Standard" semantics, full case mapping |
```sql
-- Using builtin provider
CREATE DATABASE mydb
LOCALE_PROVIDER = builtin
BUILTIN_LOCALE = 'C.UTF-8'
TEMPLATE template0;
```
### icu Provider
ICU provides consistent behavior across platforms. Results may vary between ICU library versions.
```sql
-- ICU locale with database
CREATE DATABASE mydb
LOCALE_PROVIDER = icu
ICU_LOCALE = 'en-US'
TEMPLATE template0;
```
### libc Provider
Uses operating system locales. **Same locale name may behave differently on different platforms.**
```bash
# List available libc locales
locale -a
```
### Creating Collations
#### libc Collations
```sql
CREATE COLLATION german (provider = libc, locale = 'de_DE');
CREATE COLLATION french (locale = 'fr_FR.utf8'); -- libc default
```
#### ICU Collations
```sql
CREATE COLLATION german (provider = icu, locale = 'de-DE');
CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');
```
ICU accepts BCP 47 language tags (e.g., `de-DE`) or libc-style names (converted automatically):
```sql
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
-- NOTICE: using standard form "de-DE" for locale "de_DE.utf8"
```
### Copy Existing Collations
```sql
CREATE COLLATION german FROM "de_DE";
CREATE COLLATION french FROM "fr-x-icu";
```
## ICU Collation Features
### Case Sensitivity
```sql
-- Case-insensitive (level 2 = ignore case)
CREATE COLLATION case_insensitive (
  provider = icu,
  deterministic = false,
  locale = 'und-u-ks-level2'
);
SELECT 'aB' = 'Ab' COLLATE case_insensitive; -- true
```
### Accent Insensitivity
```sql
-- Ignore accents and case (level 1)
CREATE COLLATION ignore_accent_case (
  provider = icu,
  deterministic = false,
  locale = 'und-u-ks-level1'
);
SELECT 'Å' = 'A' COLLATE ignore_accent_case; -- true
```
### Numeric Sorting
```sql
-- Treat digits numerically
CREATE COLLATION num_ignore_punct (
  provider = icu,
  deterministic = false,
  locale = 'und-u-ka-shifted-kn'
);
SELECT 'id-45' < 'id-123' COLLATE num_ignore_punct; -- true
```
### Phone Book Order
```sql
CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');
```
### Case Order
```sql
-- Uppercase first
CREATE COLLATION upper_first (provider = icu, locale = 'und-u-kf-upper');
SELECT 'B' < 'b' COLLATE upper_first; -- true
```
### Custom Tailoring Rules
```sql
CREATE COLLATION custom (provider = icu, locale = 'und', rules = '&V << w <<< W');
```
### ICU Collation Keys
| Key | Description | Example |
|-----|-------------|---------|
| `co` | Collation type | `de-u-co-phonebk` |
| `ks` | Comparison strength | `und-u-ks-level2` |
| `kn` | Numeric ordering | `en-u-kn` |
| `kf` | Case first | `und-u-kf-upper` |
| `ka` | Alternate handling | `und-u-ka-shifted` |
| `kr` | Reorder scripts | `en-u-kr-grek-latn` |
### Strength Levels
| Level | Ignores | Example |
|-------|---------|---------|
| `level1` (primary) | Case, accents | `e = E = é` |
| `level2` (secondary) | Case | `e = E ≠ é` |
| `level3` (tertiary) | Nothing | `e ≠ E ≠ é` |
| `level4` (quaternary) | Punctuation | Fine distinctions |
| `identic` | Nothing | Byte-level |
```sql
CREATE COLLATION level3 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level3');
CREATE COLLATION level4 (provider = icu, deterministic = false, locale = 'und-u-ka-shifted-ks-level4');
SELECT 'x-y' = 'x_y' COLLATE level3; -- true (punctuation ignored)
SELECT 'x-y' = 'x_y' COLLATE level4; -- false
```
## Nondeterministic Collations
Required for case/accent insensitive comparisons:
```sql
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
```
### Deterministic vs Nondeterministic
| Type | Comparison | Use Case |
|------|------------|----------|
| **Deterministic** | Byte-by-byte, 'x' ≠ 'X' | Default, indexes work normally |
| **Nondeterministic** | Semantic, 'x' = 'X' possible | Case/accent insensitive |
### Limitations of Nondeterministic Collations
- ❌ `LIKE` / `SIMILAR TO` pattern matching not supported
- ❌ Hash indexes may not work correctly
- ❌ Hash joins may produce incorrect results
- ❌ `DISTINCT` may behave unexpectedly
- ⚠️ B-tree indexes work but with semantic equality
```sql
-- This will ERROR with nondeterministic collation
SELECT * FROM t WHERE name LIKE 'foo%' COLLATE case_insensitive;
-- ERROR: nondeterministic collations are not supported for LIKE
```
## Using Collations
### In Column Definitions
```sql
CREATE TABLE test1 (
  a text COLLATE "de_DE",
  b text COLLATE "es_ES"
);
```
### In Queries
```sql
SELECT * FROM test1 ORDER BY a COLLATE "C";
SELECT a < 'foo' COLLATE "de_DE" FROM test1;
```
### In Indexes
```sql
-- Default collation index
CREATE INDEX test1c_content_index ON test1c (content);
-- Specific collation index
CREATE INDEX test1c_content_y_index ON test1c (content COLLATE "y");
```
### Conflicting Collations
```sql
-- ERROR: conflicting collations
SELECT a < b FROM test1; -- a has de_DE, b has es_ES
-- Fix: explicit collation
SELECT a COLLATE "de_DE" < b COLLATE "de_DE" FROM test1;
```
## Managing Collations
### List Collations
```sql
SELECT * FROM pg_collation;
-- psql
\dO
```
### Alter Collation
```sql
ALTER COLLATION german RENAME TO german_de;
ALTER COLLATION german OWNER TO admin;
ALTER COLLATION german SET SCHEMA myschema;
-- Refresh after OS upgrade
ALTER COLLATION "de_DE" REFRESH VERSION;
```
### Drop Collation
```sql
DROP COLLATION german;
DROP COLLATION IF EXISTS german CASCADE;
```
### Collation Version Warnings
After OS/ICU upgrades, collation versions may mismatch:
```sql
WARNING: collation "xx-x-icu" has version mismatch
DETAIL: The collation in the database was created using version 1.2.3.4,
but the operating system provides version 2.3.4.5.
```
Fix by rebuilding indexes and refreshing:
```sql
REINDEX DATABASE mydb;
ALTER DATABASE mydb REFRESH COLLATION VERSION;
```
## Character Encoding Conversion
### Create Conversion
```sql
CREATE CONVERSION myconv FOR 'UTF8' TO 'LATIN1' FROM myfunc;
CREATE DEFAULT CONVERSION myconv FOR 'UTF8' TO 'LATIN1' FROM myfunc;
```
### Drop Conversion
```sql
DROP CONVERSION myconv;
```
## Unicode Functions
### Normalize Unicode
```sql
-- Normalize to NFC form
SELECT normalize(U&'\0061\0308bc', NFC); -- ä + bc
-- Check normalization
SELECT U&'\0061\0308bc' IS NFD NORMALIZED; -- true
```
### Unicode Escapes
```sql
-- Evaluate Unicode escapes
SELECT unistr('d\0061t\+000061'); -- 'data'
SELECT unistr('d\u0061t\U00000061'); -- 'data'
```
### ASCII Conversion
```sql
SELECT to_ascii('Karél'); -- 'Karel' (removes accents)
```
## Text Search Locale Integration
### Create Language-Specific Config
```sql
CREATE TEXT SEARCH CONFIGURATION fr (COPY = french);
-- Add unaccent for accent-insensitive search
ALTER TEXT SEARCH CONFIGURATION fr
  ALTER MAPPING FOR hword, hword_part, word
  WITH unaccent, french_stem;
SELECT to_tsvector('fr', 'Hôtels de la Mer');
-- 'hotel':1 'mer':4
```
### Dictionary for Specific Language
```sql
CREATE TEXT SEARCH DICTIONARY my_russian (
  template = snowball,
  language = russian,
  stopwords = myrussian
);
```
## Quick Reference
### Locale Selection Scope
From broadest to narrowest (each overrides the previous):
1. **OS environment** → defaults for `initdb`
2. **initdb options** → cluster-wide defaults
3. **CREATE DATABASE** → per-database settings
4. **Column COLLATE** → per-column collation
5. **Query COLLATE** → per-expression collation
### Encoding Best Practices
- Use UTF8 for new databases (universal support)
- Match client and server encoding when possible
- Use `TEMPLATE template0` for custom encoding
- Avoid SQL_ASCII (no validation)
- For maximum performance with ASCII data, use `C` locale
### Collation Best Practices
- Use ICU for cross-platform consistency
- Use `builtin` provider for minimal dependencies
- Create explicit collations rather than relying on OS locales
- Use deterministic collations for indexed columns with LIKE
- Test collation behavior after OS/ICU upgrades
- Avoid mixing stripped and non-stripped locale names
### Common Issues
| Issue | Cause | Solution |
|-------|-------|----------|
| Encoding mismatch | Client/server encoding differ | Set client_encoding |
| Collation not found | OS locale not installed | Install locale or use ICU |
| Version mismatch | OS/ICU upgraded | REINDEX + REFRESH VERSION |
| Conflicting collations | Mixed column collations | Explicit COLLATE clause |
## See Also
- [PostgreSQL Character Set Support](https://www.postgresql.org/docs/current/multibyte.html)
- [PostgreSQL Collation Support](https://www.postgresql.org/docs/current/collation.html)
- [PostgreSQL Locale Support](https://www.postgresql.org/docs/current/locale.html)
- [ICU Collation Specification](https://unicode.org/reports/tr35/tr35-collation.html)
```
### references/authentication.md
```markdown
# PostgreSQL Authentication
## Overview
PostgreSQL authentication is controlled via `pg_hba.conf` (Host-Based Authentication). Rules are evaluated top-to-bottom; first match wins.
## Authentication Methods Summary
| Method | Security | Use Case |
|--------|----------|----------|
| `trust` | None | Local dev only |
| `reject` | — | Explicit deny |
| `peer` | OS-level | Local Unix sockets |
| `ident` | OS-level | Remote (RFC 1413) |
| `scram-sha-256` | Strong | Remote (recommended) |
| `md5` | Weak | Legacy |
| `password` | Very weak | Never use |
| `cert` | Strong | mTLS |
| `ldap` | Delegated | Enterprise |
| `radius` | Delegated | Enterprise |
| `gssapi` | Delegated | Kerberos/AD |
| `sspi` | Delegated | Windows AD |
| `pam` | Delegated | PAM modules |
| `oauth` | Delegated | Cloud-native (PG18+) |
## pg_hba.conf Format
### Record Structure
```text
TYPE DATABASE USER ADDRESS METHOD [OPTIONS]
```
### Connection Types
| Type | Description |
|------|-------------|
| `local` | Unix socket connections |
| `host` | TCP/IP (SSL or plain) |
| `hostssl` | TCP/IP with SSL required |
| `hostnossl` | TCP/IP without SSL |
| `hostgssenc` | TCP/IP with GSSAPI encryption |
| `hostnogssenc` | TCP/IP without GSSAPI encryption |
### Database Field
| Value | Meaning |
|-------|---------|
| `all` | All databases |
| `sameuser` | Database matching role name |
| `samerole` | User is member of role with DB name |
| `replication` | Replication connections |
| `dbname` | Specific database |
| `db1,db2` | Multiple databases |
| `@file.txt` | Read from file |
### User Field
| Value | Meaning |
|-------|---------|
| `all` | All users |
| `username` | Specific user |
| `+groupname` | Members of group |
| `user1,user2` | Multiple users |
| `@file.txt` | Read from file |
### Address Field
| Format | Example |
|--------|---------|
| CIDR | `192.168.1.0/24` |
| IP/mask | `192.168.1.0 255.255.255.0` |
| Hostname | `host.example.com` |
| Domain suffix | `.example.com` |
| `all` | Any address |
| `samehost` | Server's own IPs |
| `samenet` | Server's subnets |
### Include Directives
```text
include auth.conf
include_if_exists optional.conf
include_dir conf.d
```
### Example pg_hba.conf
```text
# TYPE    DATABASE     USER       ADDRESS         METHOD
# Local connections
local     all          postgres                   peer
local     all          all                        peer
# IPv4 local connections
host      all          all        127.0.0.1/32    scram-sha-256
# IPv6 local connections
host      all          all        ::1/128         scram-sha-256
# Internal network
host      all          all        10.0.0.0/8      scram-sha-256
# Remote (SSL required)
hostssl   all          all        0.0.0.0/0       scram-sha-256
# Replication
host      replication  repl_user  10.0.0.0/8      scram-sha-256
# Block specific user
host      all          baduser    0.0.0.0/0       reject
```
---
## Password Authentication
### SCRAM-SHA-256 (Recommended)
Most secure password-based method:
- Salted Challenge Response
- No plaintext transmission
- Server stores only verifier (non-recoverable)
- Channel binding with SSL
**Setup:**
```sql
-- Set password_encryption via SQL (or edit postgresql.conf directly)
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();
```
```text
# pg_hba.conf
host all all 0.0.0.0/0 scram-sha-256
```
### MD5 (Legacy)
- Weaker, vulnerable to replay attacks
- Auto-upgrades to SCRAM if password stored as SCRAM
**Migration to SCRAM:**
```sql
-- 1. Set encryption
ALTER SYSTEM SET password_encryption = 'scram-sha-256';
SELECT pg_reload_conf();
-- 2. Reset passwords
ALTER USER myuser PASSWORD 'new_password';
-- 3. Update pg_hba.conf: md5 -> scram-sha-256
-- 4. Reload
SELECT pg_reload_conf();
```
**Check encoding:**
```sql
SELECT rolname,
       CASE WHEN rolpassword LIKE 'SCRAM%' THEN 'scram-sha-256'
            WHEN rolpassword LIKE 'md5%' THEN 'md5'
            ELSE 'unknown'
       END AS encoding
FROM pg_authid
WHERE rolpassword IS NOT NULL;
```
### password (Never Use)
Transmits password in cleartext. Never use.
---
## OS-Level Authentication
### peer (Local Only)
Maps OS user to PostgreSQL role via Unix socket:
```text
local all all peer
local all all peer map=mymap
```
### ident (Remote)
Uses RFC 1413 ident server on client machine. Rarely used; requires client ident daemon.
```text
host all all 192.168.0.0/24 ident map=mymap
```
### User Mapping (pg_ident.conf)
Maps external usernames to PostgreSQL roles:
```text
# MAPNAME  SYSTEM-USER  PG-USER
mymap      admin        postgres
mymap      /^(.*)$      \1
```
---
## Certificate Authentication
Requires SSL with client certificate:
```text
hostssl all all 0.0.0.0/0 cert
hostssl all all 0.0.0.0/0 cert clientcert=verify-full
```
Options:
- `clientcert=verify-ca` — Verify CA only
- `clientcert=verify-full` — Verify CA + hostname
Map certificate CN:
```text
# pg_ident.conf
certmap /CN=(.*)@corp\.com$ \1
```
---
## Enterprise Authentication (Brief)
### LDAP
```text
host all all 0.0.0.0/0 ldap ldapserver=ldap.example.com ldapbasedn="dc=example,dc=com" ldapsearchattribute=uid
```
Two modes:
- **Simple bind**: `ldapprefix` + username + `ldapsuffix`
- **Search+bind**: Search for DN, then bind
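A simple-bind example to complement the search+bind line above (server name and DN suffix are placeholders):
```text
host all all 0.0.0.0/0 ldap ldapserver=ldap.example.com ldapprefix="cn=" ldapsuffix=", dc=example, dc=com"
```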
### RADIUS
```text
host all all 0.0.0.0/0 radius radiusservers="radius.example.com" radiussecrets="sharedsecret"
```
### GSSAPI (Kerberos)
```text
host all all 0.0.0.0/0 gss include_realm=0 krb_realm=EXAMPLE.COM
```
Requires Kerberos infrastructure.
### SSPI (Windows)
```text
host all all 0.0.0.0/0 sspi include_realm=0
```
Windows-specific, similar to GSSAPI.
### PAM
```text
host all all 0.0.0.0/0 pam pamservice=postgresql
```
Delegates to PAM subsystem.
---
## Special Methods
### trust
No authentication. **Never in production.**
```text
local all all trust
```
### reject
Explicitly deny connections:
```text
host all baduser 0.0.0.0/0 reject
```
---
## Security Best Practices
### Production Checklist
- [ ] `scram-sha-256` for all password auth
- [ ] `hostssl` for remote connections
- [ ] No `trust` authentication
- [ ] Restrict by IP/CIDR
- [ ] `password_encryption = 'scram-sha-256'`
- [ ] `log_connections = on`
- [ ] Audit pg_hba.conf changes
### Reload Configuration
```sql
SELECT pg_reload_conf();
```
Or:
```bash
pg_ctl reload -D $PGDATA
```
**Note:** Since PostgreSQL 10, SSL certificate and key changes are also picked up on reload; older versions require a restart.
### Test Authentication
```bash
psql -h hostname -U user -d dbname
```
```sql
SELECT current_user, session_user, inet_server_addr();
```
---
## See Also
- [authentication-oauth.md](authentication-oauth.md) — OAuth 2.0 (PG18+)
- [PostgreSQL Auth Docs](https://www.postgresql.org/docs/current/client-authentication.html)
```
### references/authentication-oauth.md
```markdown
# PostgreSQL OAuth 2.0 Authentication (PostgreSQL 18+)
> Consolidated from PostgreSQL documentation Chapters 20.15 (OAuth Authorization/Authentication) and 50 (OAuth Validator Modules).
OAuth 2.0 authentication allows PostgreSQL to delegate authentication to external identity providers (IdP).
## Requirements
- PostgreSQL 18+ (built with `--with-openssl --with-libcurl`)
- OAuth 2.0 / OpenID Connect identity provider
- Server-side validator module (must be implemented separately; PostgreSQL does not ship with any default implementations)
- Compatible client (libpq with OAUTHBEARER support)
---
## Core Concepts
### Terminology (from PostgreSQL 18 docs)
| Term | Description |
|------|-------------|
| **Resource Owner (End User)** | The user who owns protected resources and can grant access. When using psql to connect with OAuth, you are the resource owner/end user. |
| **Client** | Applications using libpq (like psql) that access protected resources using access tokens. |
| **Resource Server** | The PostgreSQL cluster being connected to. |
| **Provider** | The organization/vendor developing and administering OAuth authorization servers and clients. Different providers have different implementation details; clients of one provider are not generally compatible with servers of another. |
| **Authorization Server** | Issues access tokens to clients after the authenticated resource owner gives approval. PostgreSQL does **not** provide this; it's the OAuth provider's responsibility. |
| **Issuer** | An HTTPS URL identifying the authorization server, providing a trusted "namespace" for OAuth clients. Allows a single authorization server to talk to clients of mutually untrusting entities using separate issuers. |
**Note from docs**: For small deployments, there may not be a meaningful distinction between "provider", "authorization server", and "issuer". For more complicated setups, a provider may rent out multiple issuer identifiers to separate tenants, then provide multiple authorization servers, possibly with different supported feature sets.
### Bearer Tokens
PostgreSQL supports bearer tokens (RFC 6750). The format is implementation-specific and chosen by each authorization server.
---
## Server Configuration
### postgresql.conf
```text
# Load validator modules
oauth_validator_libraries = 'my_validator'
```
**Parameter Details** (`oauth_validator_libraries`):
- If only one validator library is provided, it will be used by default for any OAuth connections
- If multiple validators are configured, **all** `oauth` HBA entries must explicitly set a `validator`
- If set to empty string (the default), OAuth connections will be refused
- This parameter can only be set in `postgresql.conf`
- Validator modules must be implemented/obtained separately; PostgreSQL does not ship with any default implementations
### pg_hba.conf
```text
# TYPE DATABASE USER ADDRESS METHOD OPTIONS
host all myuser ::1/128 oauth issuer=https://idp.example.com scope="openid profile" validator=my_validator
# With pg_ident.conf mapping
host all myuser 0.0.0.0/0 oauth issuer=https://idp.example.com scope="openid" validator=my_validator map=oauth_map
# With delegated mapping (validator handles identity)
host all myuser 0.0.0.0/0 oauth issuer=https://idp.example.com scope="openid" validator=my_validator delegate_ident_mapping=1
```
### OAuth HBA Options (from official docs)
| Option | Required | Description |
|--------|----------|-------------|
| `issuer` | **Yes** | HTTPS URL: either the **exact** issuer identifier from the authorization server's discovery document, or a well-known URI pointing directly to that document. |
| `scope` | **Yes** | Space-separated list of OAuth scopes needed to both authorize the client and authenticate the user. Appropriate values are determined by the authorization server and the OAuth validator module. |
| `validator` | Conditional | Must exactly match one of the libraries in `oauth_validator_libraries`. Required if multiple validators configured; optional otherwise. |
| `map` | No | Allows mapping between OAuth identity provider and database user names. See Section 20.2 for details. If not specified, user name from token must exactly match requested role. **Incompatible with `delegate_ident_mapping`.** |
| `delegate_ident_mapping` | No | **Advanced option, not for common use.** When `1`: standard user mapping with pg_ident.conf is skipped. Validator takes full responsibility for mapping end user identities to database roles. Connection proceeds if validator authorizes, regardless of authentication status. **Incompatible with `map`.** |
### Issuer Discovery URL Construction
From the docs:
- When a client connects, a URL for the discovery document is constructed using the issuer identifier
- By default: path `/.well-known/openid-configuration` is appended to the issuer identifier
- If issuer already contains a `/.well-known/` path segment, that URL is provided to the client as-is
**⚠️ Warning (from docs)**: The OAuth client in libpq requires the server's issuer setting to **exactly** match the issuer identifier which is provided in the discovery document, which must in turn match the client's `oauth_issuer` setting. **No variations in case or formatting are permitted.**
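The construction rule can be sketched as follows (a simplified model with a hypothetical function name; the server applies the exact-match requirement described above):

```python
WELL_KNOWN = "/.well-known/openid-configuration"

def discovery_url(issuer):
    """Build the discovery-document URL from an issuer identifier:
    append the default well-known path, unless the issuer already
    contains a /.well-known/ segment (then it is used as-is)."""
    if "/.well-known/" in issuer:
        return issuer
    return issuer + WELL_KNOWN

print(discovery_url("https://idp.example.com"))
# https://idp.example.com/.well-known/openid-configuration
print(discovery_url("https://idp.example.com/.well-known/oauth-authorization-server"))
# unchanged: already points at a discovery document
```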
### pg_ident.conf (User Mapping)
```text
# MAPNAME SYSTEM-USERNAME PG-USERNAME
oauth_map abcdef_user_id myuser
oauth_map /^(.*)@corp\.com$ \1
```
The `SYSTEM-USERNAME` comes from the validator's returned authenticated ID (`authn_id`).
---
## Validator Module (Chapter 50)
### Overview
PostgreSQL provides infrastructure for creating custom modules to perform server-side validation of OAuth bearer tokens. Because OAuth implementations vary widely, and bearer token validation is heavily dependent on the issuing party, **the server cannot check the token itself**; validator modules provide the integration layer between the server and the OAuth provider in use.
**⚠️ Warning (from docs)**: Since a misbehaving validator might let unauthorized users into the database, correct implementation is crucial for server safety.
### Validator Responsibilities (from Section 50.1.1)
Implementations generally need to perform three separate actions:
#### 1. Validate the Token
The validator must ensure that the presented token is a valid Bearer token for use in client authentication.
**Offline Validation**:
- Typically validate signature with IdP's public keys
- Must verify issuer, audience, validity period
- Follow provider's instructions exactly
- Cannot catch revoked tokens (use short TTL)
- Much faster than online validation
**Online Validation** (Token Introspection):
- Present token to provider's introspection endpoint
- Allows central revocation
- Requires network call per authentication (must complete within `authentication_timeout`)
- Provider may not provide introspection endpoints for external resource servers
If the token cannot be validated, the module should immediately fail.
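The offline checks above can be sketched in Python. This is a deliberately incomplete model: it decodes the claims *without* verifying the signature, which a real validator must do first against the IdP's published keys; the demo token and claim values are fabricated.

```python
import base64
import json
import time

def decode_jwt_claims(token):
    """Decode a JWT payload WITHOUT signature verification --
    for illustration only; never trust unverified claims."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)   # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def check_claims(claims, issuer, audience, now=None):
    """The offline checks listed above: issuer, audience, validity period."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    aud_ok = audience == aud or (isinstance(aud, list) and audience in aud)
    return claims.get("iss") == issuer and aud_ok and now < claims.get("exp", 0)

def _b64url(obj):   # helper to fabricate a demo token
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).decode().rstrip("=")

token = ".".join([
    _b64url({"alg": "RS256"}),
    _b64url({"iss": "https://idp.example.com", "aud": "postgres",
             "exp": 9999999999, "sub": "alice"}),
    "fake-signature",
])
claims = decode_jwt_claims(token)
print(check_claims(claims, "https://idp.example.com", "postgres"))  # True
```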
#### 2. Authorize the Client
Ensure the end user has given the client permission to access the server on their behalf:
- Check scopes assigned to the token
- Scopes must cover database access for current HBA parameters
**Purpose** (from docs): Prevent the OAuth client from obtaining a token under false pretenses. If the validator requires tokens to carry database access scopes, the provider should prompt the user to grant that access during the flow, giving them the opportunity to refuse if the client shouldn't be using their credentials for database access.
Even if authorization fails, the module may choose to continue pulling authentication information from the token for auditing and debugging.
#### 3. Authenticate the End User
Determine a user identifier for the token and return it to the server:
- Either ask provider for this information or extract from token
- Server makes final authorization decision using HBA configuration
- Identifier available via `system_user` and recorded in server logs if `log_connections` enabled
Different providers record different claims. Use claims that providers document as trustworthy for authorization (e.g., don't use changeable display names).
Anonymous/pseudonymous login is possible with `delegate_ident_mapping`.
### General Coding Guidelines (from Section 50.1.2)
#### Token Confidentiality
- Modules should **not** write tokens (or pieces of tokens) into the server log
- True even if module considers the token invalid (attacker confusion scenario)
- Network transmissions (e.g., introspection) must authenticate peer and ensure strong transport security
#### Logging
- Use standard extension logging facilities
- The rules for which log entries reach the client differ during the authentication phase
- Log verification problems at `COMMERROR` level and return normally
- **Do not** use `ERROR`/`FATAL` to unwind the stack (avoids leaking info to unauthenticated clients)
#### Interruptibility
- Modules must remain interruptible by signals
- Server needs to handle authentication timeouts and shutdown signals from pg_ctl
- Use `WaitLatchOrSocket()`, `WaitEventSetWait()` instead of blocking socket calls
- Long-running loops should periodically call `CHECK_FOR_INTERRUPTS()`
- Failure may result in unresponsive backend sessions
#### Testing
Negative testing should be considered **mandatory**. It's trivial to design a module that lets authorized users in; the whole point is to keep unauthorized users out.
#### Documentation
Validator implementations should document:
- Contents and format of authenticated ID reported for each end user (email? org ID? UUID?)
- Whether safe to use in `delegate_ident_mapping=1` mode
- Additional configuration required for that mode
### Usermap Delegation (Section 50.1.3)
With `delegate_ident_mapping`, the validator bypasses username mapping entirely:
- Validator assumes responsibility for authorizing user connections
- May use token scopes or equivalent to decide if user can connect under desired role
- User identifier still recorded but doesn't determine connection authorization
**Anonymous Access**: As long as module reports authorized, login continues even without user identifier. This enables anonymous/pseudonymous access where provider authenticates but doesn't provide user-identifying info to server.
**⚠️ Warning (from docs)**: Usermap delegation provides most architectural flexibility, but turns the validator module into a single point of failure for connection authorization. Use with caution.
**Note on Device Authorization** (from docs): The Device Authorization client flow supported by libpq does not usually meet the "trusted client" bar, since it's designed for use by public/untrusted clients.
### Module Initialization (Section 50.2)
Validator modules are dynamically loaded from shared libraries listed in `oauth_validator_libraries`. Modules are loaded on demand when requested from a login in progress. Normal library search path is used.
```c
/* The module must provide this exported function */
typedef const OAuthValidatorCallbacks *(*OAuthValidatorModuleInit)(void);
/* Entry point name */
const OAuthValidatorCallbacks *_PG_oauth_validator_module_init(void);
/* Callbacks structure */
typedef struct OAuthValidatorCallbacks
{
uint32 magic; /* must be set to PG_OAUTH_VALIDATOR_MAGIC */
ValidatorStartupCB startup_cb; /* optional */
ValidatorShutdownCB shutdown_cb; /* optional */
ValidatorValidateCB validate_cb; /* REQUIRED */
} OAuthValidatorCallbacks;
```
- The returned pointer must have server lifetime (typically a `static const` variable at global scope)
- Only `validate_cb` is required; others are optional
### OAuth Validator Callbacks (Section 50.3)
#### Startup Callback (50.3.1)
```c
typedef void (*ValidatorStartupCB)(ValidatorModuleState *state);
```
- Executed immediately after module is loaded
- Used to initialize local state or perform additional setup
- Can store state in `state->private_data`
#### Validate Callback (50.3.2) — **REQUIRED**
```c
typedef bool (*ValidatorValidateCB)(
const ValidatorModuleState *state,
const char *token, /* Bearer token to validate */
const char *role, /* Requested PostgreSQL role */
ValidatorModuleResult *result
);
typedef struct ValidatorModuleResult {
bool authorized; /* true = allow connection */
char *authn_id; /* Authenticated user identifier (palloc'd) */
} ValidatorModuleResult;
```
- Called during OAuth exchange when user attempts to authenticate
- Connection proceeds only if `result->authorized` is set to `true`
- Module must ensure token carries sufficient permissions for user to log in under requested role
#### Shutdown Callback (50.3.3)
```c
typedef void (*ValidatorShutdownCB)(ValidatorModuleState *state);
```
- Executed when backend process associated with connection exits
- Responsible for freeing allocated state to prevent resource leaks
### Minimal Validator Example
```c
#include "postgres.h"
#include "libpq/oauth.h"
static void my_startup(ValidatorModuleState *state) {
/* Initialize: load signing keys, set up caches, etc. */
/* Can store data in state->private_data */
}
static bool my_validate(const ValidatorModuleState *state,
const char *token, const char *role,
ValidatorModuleResult *result) {
/*
* 1. Validate token (offline signature check or online introspection)
* 2. Authorize client (check scopes cover database access)
* 3. Authenticate end user (extract user identifier from token)
* 4. Set result fields
*/
result->authorized = true; /* or false */
result->authn_id = pstrdup("username_from_token");
return true; /* false = internal error */
}
static void my_shutdown(ValidatorModuleState *state) {
/* Free any resources allocated during lifetime */
}
/* Module entry point — must return pointer with server lifetime */
const OAuthValidatorCallbacks *
_PG_oauth_validator_module_init(void) {
    static const OAuthValidatorCallbacks callbacks = {
.magic = PG_OAUTH_VALIDATOR_MAGIC,
.startup_cb = my_startup,
.validate_cb = my_validate,
.shutdown_cb = my_shutdown
};
return &callbacks;
}
```
---
## Client Configuration
### Connection Parameters (from libpq docs)
| Parameter | Required | Description |
|-----------|----------|-------------|
| `oauth_issuer` | **Yes** | HTTPS URL of trusted issuer to contact if server requests OAuth token. Must **exactly** match server's `issuer` setting. |
| `oauth_client_id` | **Yes** | OAuth 2.0 client identifier issued by authorization server. Required if server requests OAuth token and no custom hook installed. |
| `oauth_client_secret` | Conditional | Client password for contacting authorization server. Required for "confidential" clients; "public" clients generally don't use one. Provider-dependent. |
| `oauth_scope` | No | Space-separated scope list. **Advanced**: If set, server-requested scope list is ignored. If client scopes don't include server-required scopes, connection will fail. Empty list behavior is provider-dependent. |
### Security Warning (from docs)
> Issuers are highly privileged during the OAuth connection handshake. As a rule of thumb, if you would not trust the operator of a URL to handle access to your servers, or to impersonate you directly, that URL should not be trusted as an `oauth_issuer`.
### Well-Known Endpoints Supported
libpq supports setting `oauth_issuer` to these well-known URIs directly:
- `/.well-known/openid-configuration`
- `/.well-known/oauth-authorization-server`
In this case, if the server asks for a different URL, the connection fails; however, a custom OAuth flow can use the known configuration to speed up the handshake with cached tokens. (It is recommended to also set `oauth_scope`, since the client won't ask the server for the correct scope.)
### Discovery Document Validation
From docs: As part of standard authentication handshake, libpq asks server for discovery document URL. Server must provide URL directly constructed from `oauth_issuer` components, and this value must **exactly** match issuer identifier in discovery document itself, or connection fails. This prevents "mix-up attacks" on OAuth clients.
### psql / libpq
Connection string:
```bash
psql "postgres://myuser@host:5432/mydb?oauth_issuer=https://idp.example.com&oauth_client_id=my-app"
```
With client secret (confidential client):
```bash
psql "dbname=mydb oauth_issuer=https://idp.example.com oauth_client_id=my-app oauth_client_secret=secret"
```
### Device Authorization Flow
libpq implements RFC 8628 Device Authorization Grant by default (requires `--with-libcurl` build):
```bash
$ psql "dbname=postgres oauth_issuer=https://example.com oauth_client_id=..."
Visit https://example.com/device and enter the code: ABCD-EFGH
```
**Flow**:
1. libpq requests device code from IdP
2. Displays `verification_uri` and `user_code`
3. User visits URL in browser, enters code, authenticates
4. libpq polls token endpoint until authorization complete
5. Sends bearer token to PostgreSQL
**⚠️ Note**: Built-in Device Authorization flow is not supported on Windows. Use custom OAuth hooks instead.
### Custom OAuth Hooks (libpq C API)
For custom token acquisition (e.g., browser-based flows, cached tokens):
```c
/* Hook callback signature */
int hook_fn(PGauthData type, PGconn *conn, void *data);
/* Set custom hook */
void PQsetAuthDataHook(PQauthDataHook_type hook);
/* Get current hook */
PQauthDataHook_type PQgetAuthDataHook(void);
```
#### Hook Types
| Type | Data | Purpose |
|------|------|---------|
| `PQAUTHDATA_OAUTH_BEARER_TOKEN` | `PGoauthBearerRequest*` | Custom token acquisition |
| `PQAUTHDATA_PROMPT_OAUTH_DEVICE` | `PGpromptOAuthDevice*` | Custom device flow prompt |
#### Token Request Structure
```c
typedef struct PGoauthBearerRequest {
/* Inputs (read-only) */
const char *openid_configuration; /* OIDC discovery URL */
const char *scope; /* Required scope(s) */
/* Outputs */
PostgresPollingStatusType (*async)(PGconn *conn,
struct PGoauthBearerRequest *request,
SOCKTYPE *altsock);
void (*cleanup)(PGconn *conn, struct PGoauthBearerRequest *request);
char *token; /* Set this to acquired bearer token */
void *user; /* Hook-defined data */
} PGoauthBearerRequest;
```
#### Device Prompt Structure
```c
typedef struct PGpromptOAuthDevice {
const char *verification_uri; /* URL to visit */
const char *user_code; /* Code to enter */
const char *verification_uri_complete; /* Combined URI+code (for QR) */
int expires_in; /* Seconds until code expires */
} PGpromptOAuthDevice;
```
#### Custom Hook Example
```c
static int my_oauth_hook(PGauthData type, PGconn *conn, void *data) {
    if (type == PQAUTHDATA_OAUTH_BEARER_TOKEN) {
        PGoauthBearerRequest *req = (PGoauthBearerRequest *)data;
        /* Return a cached/pre-obtained token */
        const char *tok = getenv("OAUTH_TOKEN");
        if (tok == NULL)
            return 0; /* No cached token: fall back to default handler */
        req->token = strdup(tok);
        return 1; /* Success */
    }
    return 0; /* Use default handler */
}
/* Install hook before connecting */
PQsetAuthDataHook(my_oauth_hook);
conn = PQconnectdb("dbname=mydb oauth_issuer=...");
```
---
## SASL OAUTHBEARER Protocol
### Message Flow
```text
Client Server
| |
|<---- AuthenticationSASL -----------| "OAUTHBEARER" mechanism offered
| |
|----- SASLInitialResponse --------->| Token or discovery request
| |
|<---- AuthenticationSASLContinue ---| Discovery info (if requested)
| |
|----- SASLResponse ---------------->| Bearer token
| |
|<---- AuthenticationSASLFinal ------| Success outcome
|<---- AuthenticationOk -------------|
```
### Initial Response Formats (RFC 7628)
**Discovery request** (empty kvpairs):
```text
n,,\x01\x01
```
**Token submission**:
```text
n,,\x01auth=Bearer <token>\x01\x01
```
Where:
- `n,,` — No channel binding, no authorization identity
- `\x01` — Key-value pair separator
- `auth=Bearer <token>` — The bearer token
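The two formats can be generated with a small helper (`initial_response` is a hypothetical name; the wire format follows RFC 7628 as shown above):

```python
SEP = "\x01"   # RFC 7628 key-value pair separator

def initial_response(token=None):
    """Build the SASL OAUTHBEARER client initial response.
    With no token, the 'empty' message triggers server-side discovery."""
    gs2 = "n,,"   # no channel binding, no authorization identity
    if token is None:
        return (gs2 + SEP + SEP).encode()
    return (gs2 + SEP + "auth=Bearer " + token + SEP + SEP).encode()

print(initial_response(None))       # b'n,,\x01\x01'
print(initial_response("tok123"))   # b'n,,\x01auth=Bearer tok123\x01\x01'
```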
---
## Common IdP Configurations
### Keycloak
```text
# pg_hba.conf
host all all 0.0.0.0/0 oauth \
issuer=https://keycloak.example.com/realms/myrealm \
scope="openid profile" \
validator=jwt_validator
```
### Azure AD / Entra ID
```text
host all all 0.0.0.0/0 oauth \
issuer=https://login.microsoftonline.com/<tenant-id>/v2.0 \
scope="api://<app-id>/.default" \
validator=azure_validator
```
### Auth0
```text
host all all 0.0.0.0/0 oauth \
issuer=https://<tenant>.auth0.com/ \
scope="openid profile" \
validator=auth0_validator
```
### Google
```text
host all all 0.0.0.0/0 oauth \
issuer=https://accounts.google.com \
scope="openid email" \
validator=google_validator
```
### Okta
```text
host all all 0.0.0.0/0 oauth \
issuer=https://<domain>.okta.com \
scope="openid profile" \
validator=okta_validator
```
---
## Best Practices
### Security
- [ ] Use HTTPS for issuer URL
- [ ] Validate token scopes in validator
- [ ] Set appropriate `authentication_timeout`
- [ ] Use `delegate_ident_mapping` only with trusted, carefully-implemented validators
- [ ] Require SSL (`hostssl`) for OAuth connections
- [ ] Never log tokens or token fragments (even for invalid tokens)
- [ ] Use strong transport security for introspection calls
### Token Management
- [ ] For offline validation: use short token TTL (5-15 min) since revocation cannot be checked
- [ ] For online validation: ensure network calls complete within `authentication_timeout`
- [ ] Implement token refresh in connection pools
- [ ] Handle `AuthenticationSASLContinue` for re-auth
### Validator Implementation
- [ ] Implement comprehensive negative testing
- [ ] Document authenticated ID format (email? UUID? org ID?)
- [ ] Document `delegate_ident_mapping` safety and requirements
- [ ] Use `COMMERROR` level for validation problems, not `ERROR`/`FATAL`
- [ ] Ensure interruptibility (`CHECK_FOR_INTERRUPTS()`, non-blocking I/O)
---
## Troubleshooting
### Common Errors
| Error | Cause | Fix |
|-------|-------|-----|
| `issuer mismatch` | Server/client/discovery issuer URLs differ | Ensure **exact** match (case-sensitive, no trailing slash variations) |
| `validator not found` | Module not in `oauth_validator_libraries` | Add to postgresql.conf |
| `token expired` | Token lifetime exceeded | Implement refresh logic; use shorter TTL |
| `SCRAM selected` | OAuth not configured for connection | Check pg_hba.conf rules order |
| Unresponsive backend | Validator not handling interrupts | Use non-blocking I/O, call `CHECK_FOR_INTERRUPTS()` |
### Debug
```bash
# libpq debug (WARNING: logs sensitive data)
PGOAUTHDEBUG=UNSAFE psql "..."
# Server side: set log_connections = on in postgresql.conf
```
---
## See Also
- [authentication.md](authentication.md) — General authentication methods (pg_hba.conf)
- [user-management.md](user-management.md) — Role management after authentication
- [protocol.md](protocol.md) — SASL authentication protocol details
### RFCs
- [RFC 6749 - OAuth 2.0](https://tools.ietf.org/html/rfc6749)
- [RFC 6750 - Bearer Tokens](https://tools.ietf.org/html/rfc6750)
- [RFC 7628 - SASL OAUTHBEARER](https://tools.ietf.org/html/rfc7628)
- [RFC 8628 - Device Authorization Grant](https://tools.ietf.org/html/rfc8628)
### PostgreSQL Documentation
- [Chapter 20.15: OAuth Authorization/Authentication](https://www.postgresql.org/docs/current/auth-oauth.html)
- [Chapter 50: OAuth Validator Modules](https://www.postgresql.org/docs/current/oauth-validators.html)
- [libpq Connection Parameters](https://www.postgresql.org/docs/current/libpq-connect.html)
```
### references/user-management.md
```markdown
# PostgreSQL Database Roles & User Management
Role-based access control in PostgreSQL: create, alter, drop roles, membership, and privilege inheritance.
## Core Concepts
- **Role** — unified concept for users and groups
- **User** — role with LOGIN privilege
- **Group** — role without LOGIN, used for privilege grouping
## Creating Roles
### Basic Syntax
```sql
CREATE ROLE name [ WITH option ... ];
CREATE USER name [ WITH option ... ]; -- Implies LOGIN
CREATE GROUP name [ WITH option ... ]; -- Alias for CREATE ROLE
```
### Common Patterns
```sql
-- Application user with password
CREATE ROLE app_user LOGIN PASSWORD 'secure_password';
-- Read-only user
CREATE ROLE readonly_user LOGIN PASSWORD 'pass' NOSUPERUSER NOCREATEDB;
-- Admin with role management
CREATE ROLE admin_user LOGIN PASSWORD 'pass' CREATEROLE CREATEDB;
-- Service account for replication
CREATE ROLE repl_user REPLICATION LOGIN PASSWORD 'pass';
-- Password with expiration
CREATE ROLE temp_user LOGIN PASSWORD 'pass' VALID UNTIL '2025-12-31 23:59:59';
```
## Role Attributes
| Attribute | Effect |
|-----------|--------|
| `LOGIN` / `NOLOGIN` | Can connect to database |
| `SUPERUSER` / `NOSUPERUSER` | Bypass all permission checks |
| `CREATEDB` / `NOCREATEDB` | Can create databases |
| `CREATEROLE` / `NOCREATEROLE` | Can create/alter/drop roles |
| `REPLICATION` / `NOREPLICATION` | Can initiate streaming replication |
| `BYPASSRLS` / `NOBYPASSRLS` | Bypass row-level security |
| `INHERIT` / `NOINHERIT` | Inherit privileges from member roles |
| `CONNECTION LIMIT n` | Max concurrent connections (-1 = unlimited) |
| `PASSWORD 'pass'` | Set encrypted password |
| `PASSWORD NULL` | Remove password |
| `VALID UNTIL 'timestamp'` | Password expiration |
### Create Role Examples
```sql
-- Superuser (dangerous!)
CREATE ROLE dba SUPERUSER LOGIN PASSWORD 'pass';
-- App user bypassing RLS (for admin tools)
CREATE ROLE admin_api BYPASSRLS LOGIN PASSWORD 'pass';
-- Limited connections
CREATE ROLE batch_user LOGIN CONNECTION LIMIT 5;
-- No inheritance (explicit SET ROLE required)
CREATE ROLE strict_user NOINHERIT LOGIN;
```
## Altering Roles
```sql
-- Change password
ALTER ROLE app_user PASSWORD 'new_password';
-- Remove password
ALTER ROLE app_user PASSWORD NULL;
-- Add privileges
ALTER ROLE app_user CREATEDB CREATEROLE;
-- Remove privileges
ALTER ROLE app_user NOCREATEDB NOCREATEROLE;
-- Rename role
ALTER ROLE old_name RENAME TO new_name;
-- Set connection limit
ALTER ROLE app_user CONNECTION LIMIT 100;
-- Set role-specific defaults
ALTER ROLE analytics_user SET work_mem = '256MB';
ALTER ROLE analytics_user SET statement_timeout = '5min';
-- Reset to system default
ALTER ROLE analytics_user RESET work_mem;
ALTER ROLE analytics_user RESET ALL;
```
## Role Membership (Groups)
### Grant Membership
```sql
-- Basic membership
GRANT admin_group TO app_user;
-- Multiple roles
GRANT readers, writers TO app_user;
-- With options
GRANT admin_group TO app_user WITH ADMIN OPTION; -- Can grant to others
GRANT admin_group TO app_user WITH INHERIT TRUE; -- Inherit privileges
GRANT admin_group TO app_user WITH SET FALSE; -- Cannot SET ROLE to it
```
### Membership Options
| Option | Effect |
|--------|--------|
| `ADMIN` | Can grant/revoke this role to others |
| `INHERIT` | Automatically inherit role's privileges |
| `SET` | Can use `SET ROLE` to assume this role |
### Revoke Membership
```sql
REVOKE admin_group FROM app_user;
REVOKE ADMIN OPTION FOR admin_group FROM app_user;
```
### Switching Roles
```sql
SET ROLE admin; -- Switch to admin role
SET ROLE NONE; -- Reset to login role
RESET ROLE; -- Same as SET ROLE NONE
```
## Dropping Roles
### Pre-Drop Checklist
Before dropping, handle owned objects:
```sql
-- 1. Reassign owned objects to another role
REASSIGN OWNED BY doomed_role TO successor_role;
-- 2. Drop remaining objects and revoke privileges
DROP OWNED BY doomed_role;
-- 3. Repeat in each database
-- 4. Then drop
DROP ROLE doomed_role;
```
### Drop Commands
```sql
DROP ROLE role_name;
DROP ROLE IF EXISTS role_name;
DROP USER username; -- Alias
DROP GROUP groupname; -- Alias
```
## Object Privileges (GRANT/REVOKE)
### Table Privileges
```sql
-- Grant specific
GRANT SELECT, INSERT ON table_name TO app_user;
GRANT UPDATE (column1, column2) ON table_name TO app_user;
-- Grant all
GRANT ALL PRIVILEGES ON table_name TO admin_user;
-- Grant to all roles
GRANT SELECT ON public_data TO PUBLIC;
-- With grant option
GRANT SELECT ON table_name TO lead_user WITH GRANT OPTION;
-- All tables in schema
GRANT SELECT ON ALL TABLES IN SCHEMA myschema TO readonly_user;
```
### Revoke Privileges
```sql
REVOKE INSERT ON table_name FROM app_user;
REVOKE ALL PRIVILEGES ON table_name FROM PUBLIC;
REVOKE GRANT OPTION FOR SELECT ON table_name FROM lead_user;
```
### Schema Privileges
```sql
GRANT USAGE ON SCHEMA app_schema TO app_user;
GRANT CREATE ON SCHEMA app_schema TO admin_user;
GRANT ALL ON SCHEMA app_schema TO owner_role;
-- Remove public access
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
```
### Function Privileges
```sql
GRANT EXECUTE ON FUNCTION my_func(integer) TO app_user;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA myschema TO admin;
```
## Default Privileges
Set privileges for future objects:
```sql
-- Grant SELECT on all future tables in schema
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema
GRANT SELECT ON TABLES TO readonly_user;
-- Grant INSERT to webuser
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema
GRANT INSERT ON TABLES TO webuser;
-- For objects created by specific role
ALTER DEFAULT PRIVILEGES FOR ROLE creator_role IN SCHEMA myschema
GRANT SELECT ON TABLES TO reader_role;
-- Revoke default privileges
ALTER DEFAULT PRIVILEGES IN SCHEMA myschema
REVOKE SELECT ON TABLES FROM readonly_user;
```
## Security Best Practices
### Principle of Least Privilege
```sql
-- Create minimal role
CREATE ROLE app_user LOGIN PASSWORD 'pass'
NOSUPERUSER NOCREATEDB NOCREATEROLE NOREPLICATION;
-- Grant only needed privileges
GRANT USAGE ON SCHEMA app TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON app.orders TO app_user;
GRANT USAGE, SELECT ON SEQUENCE app.orders_id_seq TO app_user;
```
### SECURITY DEFINER Functions
```sql
-- Safe pattern: revoke PUBLIC, grant specific
BEGIN;
CREATE FUNCTION admin_operation() RETURNS void
SECURITY DEFINER
SET search_path = pg_catalog, public
AS $$ ... $$;
REVOKE ALL ON FUNCTION admin_operation() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION admin_operation() TO admin_role;
COMMIT;
```
### Password Management
```sql
-- Use SCRAM-SHA-256 (check pg_hba.conf)
ALTER ROLE app_user PASSWORD 'strong_password_here';
-- Temporary access (VALID UNTIL takes a timestamp literal, not an expression)
ALTER ROLE temp_user VALID UNTIL '2026-01-01 00:00:00';
-- Revoke access immediately
ALTER ROLE compromised_user NOLOGIN;
```
## Predefined Roles (PG14+)
| Role | Purpose |
|------|---------|
| `pg_read_all_data` | SELECT on all tables |
| `pg_write_all_data` | INSERT/UPDATE/DELETE on all tables |
| `pg_read_all_settings` | Read all config parameters |
| `pg_read_all_stats` | Read all statistics views |
| `pg_stat_scan_tables` | Execute functions scanning tables for stats |
| `pg_monitor` | Read various monitoring views |
| `pg_signal_backend` | Send signals to other backends |
| `pg_checkpoint` | Execute CHECKPOINT (PG15+) |
```sql
GRANT pg_read_all_data TO readonly_admin;
GRANT pg_monitor TO monitoring_user;
```
## Querying Role Information
```sql
-- List all roles
SELECT rolname, rolsuper, rolcanlogin, rolcreaterole
FROM pg_roles;
-- Check role membership
SELECT r.rolname AS role, m.rolname AS member
FROM pg_auth_members am
JOIN pg_roles r ON am.roleid = r.oid
JOIN pg_roles m ON am.member = m.oid;
-- Check privileges
SELECT pg_has_role('user', 'role', 'MEMBER');
SELECT pg_has_role('user', 'role', 'USAGE');
```
## Command-Line Tools
```bash
# Create role
createuser -P app_user
# Create superuser
createuser -s admin_user
# Drop role
dropuser app_user
# List roles
psql -c "\\du"
```
## Quick Reference
### Role Lifecycle
```sql
-- 1. Create
CREATE ROLE app_user LOGIN PASSWORD 'pass';
-- 2. Grant membership/privileges
GRANT app_group TO app_user;
GRANT SELECT ON schema.table TO app_user;
-- 3. Maintain
ALTER ROLE app_user PASSWORD 'new_pass';
-- 4. Remove (cleanup first!)
REASSIGN OWNED BY app_user TO admin;
DROP OWNED BY app_user;
DROP ROLE app_user;
```
### Common Mistakes
| Mistake | Fix |
|---------|-----|
| `DROP ROLE` with owned objects | Run `REASSIGN OWNED` + `DROP OWNED` first |
| No `USAGE` on schema | Grant `USAGE ON SCHEMA` before table grants |
| PUBLIC has CREATE on public schema | `REVOKE CREATE ON SCHEMA public FROM PUBLIC` |
| Weak password hashing (MD5) | Set `password_encryption = 'scram-sha-256'` (default in PG14+) |
## See Also
- [authentication.md](authentication.md) — pg_hba.conf and auth methods
- [PostgreSQL Role Attributes](https://www.postgresql.org/docs/current/role-attributes.html)
- [PostgreSQL GRANT](https://www.postgresql.org/docs/current/sql-grant.html)
```
### references/connection-settings.md
```markdown
# PostgreSQL Connection Settings
Runtime configuration parameters for connections and authentication.
## Connection Settings
### listen_addresses
```sql
-- Default: 'localhost' (local connections only)
listen_addresses = '*' -- All interfaces
listen_addresses = '0.0.0.0' -- All IPv4
listen_addresses = '192.168.1.10' -- Specific IP
listen_addresses = 'localhost,192.168.1.10' -- Multiple
```
**Note:** Requires restart. Use `pg_hba.conf` for fine-grained access control.
### port
```sql
port = 5432 -- Default
```
### max_connections
Maximum concurrent connections:
```sql
max_connections = 100 -- Default
```
**Sizing:**
- Each connection uses ~5-10MB RAM
- Formula: `max_connections < (RAM - shared_buffers) / per_connection_memory`
- Use connection pooling (PgBouncer) for high-connection workloads
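The sizing formula above can be sketched as a quick back-of-the-envelope calculation. All figures here are illustrative assumptions (a 16 GB server, 4 GB `shared_buffers`, ~10 MB per backend), not measured values:

```python
# Rough upper bound on max_connections from the sizing formula above.
# Inputs are illustrative assumptions, not measurements.

def max_connections_estimate(total_ram_mb: int, shared_buffers_mb: int,
                             per_connection_mb: int = 10) -> int:
    """RAM left after shared_buffers, divided by per-backend overhead."""
    return (total_ram_mb - shared_buffers_mb) // per_connection_mb

# 16 GB server, shared_buffers = 4GB, ~10 MB per backend:
print(max_connections_estimate(16384, 4096))  # → 1228
```

In practice you would set `max_connections` far below this ceiling and put PgBouncer in front, since each idle backend still holds its overhead.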
### superuser_reserved_connections
Reserved connections for superusers when `max_connections` is reached:
```sql
superuser_reserved_connections = 3 -- Default
```
### reserved_connections (PG16+)
Reserved connections for roles with `pg_use_reserved_connections`:
```sql
reserved_connections = 0 -- Default
```
### Unix Socket Settings
```sql
unix_socket_directories = '/var/run/postgresql' -- Default varies
unix_socket_group = ''
unix_socket_permissions = 0777 -- Default
```
## TCP Settings
### TCP Keepalives
Detect dead connections:
```sql
tcp_keepalives_idle = 0 -- Use OS default (seconds)
tcp_keepalives_interval = 0 -- Use OS default
tcp_keepalives_count = 0 -- Use OS default
```
**Recommended for cloud/firewalls:**
```sql
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 6
```
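With these values, the worst-case time to notice a dead peer is the idle delay plus one probe interval per allowed failure — simple arithmetic worth checking before picking numbers:

```python
# Worst-case dead-peer detection time with the keepalive values above:
# first probe fires after `idle` seconds, then `count` probes spaced
# `interval` seconds apart must all go unanswered.
idle, interval, count = 60, 10, 6
detection_seconds = idle + interval * count
print(detection_seconds)  # → 120
```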
### tcp_user_timeout (PG12+)
```sql
tcp_user_timeout = 0 -- Milliseconds, 0 = OS default
```
### client_connection_check_interval (PG14+)
```sql
client_connection_check_interval = 0 -- Milliseconds
-- 0 = disabled (default)
```
## Authentication Timeouts
### authentication_timeout
Time limit for authentication:
```sql
authentication_timeout = 1min -- Default
```
### password_encryption
```sql
password_encryption = 'scram-sha-256' -- Recommended (default in PG14+)
password_encryption = 'md5' -- Legacy
```
## SSL Settings
### ssl
```sql
ssl = off -- Default
ssl = on -- Enable SSL
```
### Certificate Files
```sql
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ca_file = '' -- For client cert verification
ssl_crl_file = '' -- Certificate revocation list
ssl_crl_dir = '' -- CRL directory (PG16+)
```
### SSL Protocols and Ciphers
```sql
ssl_min_protocol_version = 'TLSv1.2' -- Default (PG12+)
ssl_max_protocol_version = '' -- Empty = no maximum
ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL'
ssl_prefer_server_ciphers = on
```
### ssl_passphrase_command
For encrypted private keys:
```sql
ssl_passphrase_command = ''
ssl_passphrase_command_supports_reload = off
```
## Application Settings
### application_name
Set by client, visible in `pg_stat_activity`:
```sql
application_name = '' -- Default
```
### update_process_title
```sql
update_process_title = on -- Show query in process title
```
## Quick Reference
### Production Defaults
```sql
# postgresql.conf
listen_addresses = '*'
max_connections = 100
superuser_reserved_connections = 3
# TCP keepalives (recommended)
tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 6
# Authentication
authentication_timeout = 1min
password_encryption = 'scram-sha-256'
# SSL (if enabled)
ssl = on
ssl_min_protocol_version = 'TLSv1.2'
```
### Memory per Connection
| Component | Typical Size |
|-----------|--------------|
| Base overhead | ~5-10 MB |
| work_mem (per sort/hash) | configurable |
| temp_buffers | 8 MB default |
### Connection Limits
Check current connections:
```sql
SELECT count(*) FROM pg_stat_activity;
SELECT * FROM pg_stat_activity WHERE state = 'active';
```
Set per-role limits:
```sql
ALTER ROLE myuser CONNECTION LIMIT 10;
ALTER DATABASE mydb CONNECTION LIMIT 50;
```
## Parameter Context
| Parameter | Requires |
|-----------|----------|
| `listen_addresses` | Restart |
| `port` | Restart |
| `max_connections` | Restart |
| `ssl` | Restart |
| `ssl_*` files | Reload (SIGHUP) |
| `tcp_keepalives_*` | Reload |
| `authentication_timeout` | Reload |
## See Also
- [authentication.md](authentication.md) — pg_hba.conf and auth methods
- [query-tuning.md](query-tuning.md) — Query planner settings
- [PostgreSQL Docs](https://www.postgresql.org/docs/current/runtime-config-connection.html)
```
### references/query-tuning.md
```markdown
# PostgreSQL Query Tuning Settings
Runtime parameters affecting query planning and execution.
## Planner Method Switches
Enable/disable specific plan types for debugging or forcing behavior:
```sql
enable_seqscan = on -- Sequential scans
enable_indexscan = on -- Index scans
enable_indexonlyscan = on -- Index-only scans
enable_bitmapscan = on -- Bitmap scans
enable_tidscan = on -- TID scans
enable_sort = on -- Explicit sorts
enable_hashagg = on -- Hash aggregation
enable_hashjoin = on -- Hash joins
enable_mergejoin = on -- Merge joins
enable_nestloop = on -- Nested loop joins
enable_material = on -- Materialization
enable_memoize = on -- Memoize (PG14+)
enable_partitionwise_join = off
enable_partitionwise_aggregate = off
enable_parallel_hash = on
enable_parallel_append = on
enable_partition_pruning = on
```
**Tip:** Use `SET enable_seqscan = off` temporarily to force index usage for debugging, but don't leave disabled in production.
## Cost Constants
### Page Costs
```sql
seq_page_cost = 1.0 -- Sequential page fetch cost
random_page_cost = 4.0 -- Random page fetch cost
```
**SSD tuning:** Lower `random_page_cost` for SSDs:
```sql
random_page_cost = 1.1 -- SSD: almost same as sequential
```
### CPU Costs
```sql
cpu_tuple_cost = 0.01 -- Processing one row
cpu_index_tuple_cost = 0.005 -- Processing one index entry
cpu_operator_cost = 0.0025 -- Processing one operator
```
### effective_cache_size
Hint to planner about available OS cache:
```sql
effective_cache_size = 4GB -- Default varies
```
**Sizing:** Set to ~50-75% of total RAM (includes OS cache + shared_buffers):
```sql
-- 16GB RAM system
effective_cache_size = 12GB
```
### effective_io_concurrency
For SSDs and RAID arrays:
```sql
effective_io_concurrency = 1 -- HDD default
effective_io_concurrency = 200 -- SSD recommended
```
## Memory Settings
### work_mem
Memory per sort/hash operation (per query, per operation):
```sql
work_mem = 4MB -- Default
```
**Warning:** A single query with multiple sorts can use `N × work_mem`. Be conservative.
**Sizing guidance:**
- `(RAM - shared_buffers) / max_connections / 4`
- Increase for complex analytical queries
- Set per-session for specific workloads
```sql
SET work_mem = '256MB'; -- For complex query
```
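The conservative starting-point formula above can be sketched the same way. The inputs (16 GB RAM, 4 GB `shared_buffers`, 100 connections) are illustrative assumptions; tune against the real workload:

```python
# Sketch of the conservative work_mem starting point:
# (RAM - shared_buffers) / max_connections / 4, all in MB.

def work_mem_start_mb(total_ram_mb: int, shared_buffers_mb: int,
                      max_connections: int) -> int:
    return (total_ram_mb - shared_buffers_mb) // max_connections // 4

# 16 GB RAM, shared_buffers = 4GB, max_connections = 100:
print(work_mem_start_mb(16384, 4096, 100))  # → 30
```

The division by 4 reflects the warning above: one query can run several sorts and hashes at once, each taking up to `work_mem`.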
### maintenance_work_mem
Memory for maintenance operations (VACUUM, CREATE INDEX):
```sql
maintenance_work_mem = 64MB -- Default
maintenance_work_mem = 1GB -- For large indexes
```
### hash_mem_multiplier (PG13+)
```sql
hash_mem_multiplier = 2.0 -- Hash can use 2x work_mem
```
## Parallel Query Settings
### Workers
```sql
max_parallel_workers_per_gather = 2 -- Max workers per Gather node
max_parallel_workers = 8 -- Total parallel workers
max_parallel_maintenance_workers = 2 -- For CREATE INDEX
parallel_leader_participation = on -- Leader also does work
```
### Parallel Thresholds
```sql
min_parallel_table_scan_size = 8MB -- Min table size for parallel
min_parallel_index_scan_size = 512kB -- Min index size for parallel
```
### Parallel Costs
```sql
parallel_tuple_cost = 0.1 -- Cost per tuple passed to leader
parallel_setup_cost = 1000 -- Cost to launch worker
```
## Join and Aggregation Limits
### Join Planning Limits
```sql
from_collapse_limit = 8 -- Max FROM items before explicit join order
join_collapse_limit = 8 -- Max items in explicit JOIN before fixing order
```
**Note:** For complex queries with many joins, increasing may improve plans but increases planning time.
### GEQO (Genetic Query Optimizer)
For queries with many tables:
```sql
geqo = on
geqo_threshold = 12 -- Use GEQO for 12+ FROM items
geqo_effort = 5 -- 1-10, higher = better plans, slower
```
## Statistics and Estimation
### Statistics Target
Controls histogram detail:
```sql
default_statistics_target = 100 -- Default
```
Per-column override:
```sql
ALTER TABLE t ALTER COLUMN c SET STATISTICS 1000;
ANALYZE t;
```
### Planner Estimates
```sql
cursor_tuple_fraction = 0.1 -- Expected fraction of cursor rows fetched
```
## Plan Caching
### plan_cache_mode (PG12+)
```sql
plan_cache_mode = auto -- Default
plan_cache_mode = force_custom_plan
plan_cache_mode = force_generic_plan
```
### JIT Compilation (PG11+)
```sql
jit = on -- Enable JIT
jit_above_cost = 100000 -- Min cost to use JIT
jit_inline_above_cost = 500000
jit_optimize_above_cost = 500000
```
## Quick Reference
### Production Defaults (SSD)
```sql
# Storage costs (SSD)
random_page_cost = 1.1
effective_io_concurrency = 200
# Memory
effective_cache_size = 12GB # ~75% of RAM
work_mem = 64MB # Adjust based on workload
maintenance_work_mem = 1GB
# Parallelism
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_parallel_maintenance_workers = 4
```
### Per-Session Tuning
```sql
-- For complex analytical query
SET work_mem = '256MB';
SET enable_seqscan = off; -- Debug only
-- Reset
RESET work_mem;
RESET enable_seqscan;
```
### EXPLAIN Analysis
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
```
Look for:
- Seq Scan on large tables (may need index)
- High `actual time` vs `estimated`
- Sort spilling to disk (`Sort Method: external merge`)
## Parameter Context
| Parameter | Requires |
|-----------|----------|
| `max_parallel_workers` | Restart |
| `effective_cache_size` | Reload |
| `work_mem` | Session/Reload |
| `random_page_cost` | Session/Reload |
| `enable_*` | Session/Reload |
## See Also
- [table-design.md](table-design.md) — Indexing patterns
- [PostgreSQL EXPLAIN docs](https://www.postgresql.org/docs/current/using-explain.html)
- [PostgreSQL Query Planning docs](https://www.postgresql.org/docs/current/runtime-config-query.html)
```
### references/replication.md
```markdown
# PostgreSQL Replication Settings
Runtime configuration for streaming replication and standbys.
## Primary Server Settings
### WAL Level
```sql
wal_level = replica -- Default (PG10+), required for replication
wal_level = minimal -- No replication support
wal_level = logical -- Required for logical replication
```
### WAL Senders
```sql
max_wal_senders = 10 -- Max concurrent replication connections
```
**Sizing:** number of standbys, plus one for `pg_basebackup`, plus some headroom
### WAL Retention
```sql
wal_keep_size = 0 -- MB of WAL to retain (PG13+)
max_slot_wal_keep_size = -1 -- Max WAL size per slot (-1 = unlimited)
```
### Replication Slots
```sql
max_replication_slots = 10 -- Max slots
```
**Warning:** Slots prevent WAL removal. Monitor for inactive slots.
### Synchronous Replication
```sql
synchronous_commit = on -- Default: wait for local flush (and standby flush when synchronous standbys are configured)
synchronous_commit = remote_write -- Wait for standby OS write
synchronous_commit = remote_apply -- Wait for standby to apply
synchronous_commit = off -- Async (risks losing recent commits)
synchronous_standby_names = 'standby1,standby2' -- Priority list
synchronous_standby_names = 'FIRST 2 (s1,s2,s3)' -- Quorum
synchronous_standby_names = 'ANY 2 (s1,s2,s3)' -- Any 2 of 3
```
## Standby Server Settings
### primary_conninfo
Connection string to primary:
```sql
primary_conninfo = 'host=primary port=5432 user=repl_user password=secret application_name=standby1'
```
### primary_slot_name
Replication slot on primary:
```sql
primary_slot_name = 'standby1_slot'
```
### Hot Standby
```sql
hot_standby = on -- Allow read queries on standby
hot_standby_feedback = off -- Send standby position to primary
```
### Conflict Handling
```sql
max_standby_streaming_delay = 30s -- Max replay delay before canceling queries
max_standby_archive_delay = 30s -- Max archive replay delay
```
### Recovery Target (Point-in-Time)
Set at most one of the targets (time, xid, lsn, or name):
```sql
recovery_target_time = '2024-01-15 10:30:00'
recovery_target_xid = '12345'
recovery_target_lsn = '0/1000000'
recovery_target_name = 'backup_point'
recovery_target_inclusive = true
recovery_target_timeline = 'latest'
recovery_target_action = 'pause' -- pause, promote, shutdown
```
## Archive Settings
### WAL Archiving
```sql
archive_mode = on
archive_command = 'cp %p /archive/%f'
archive_timeout = 0 -- Force archive every N seconds (0 = disabled)
```
### Archive Recovery
```sql
restore_command = 'cp /archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /archive %r'
recovery_end_command = ''
```
## Logical Replication
### Publisher
```sql
wal_level = logical -- Required
max_replication_slots = 10
max_wal_senders = 10
```
```sql
CREATE PUBLICATION mypub FOR TABLE t1, t2;
CREATE PUBLICATION allpub FOR ALL TABLES;
```
### Subscriber
```sql
CREATE SUBSCRIPTION mysub
CONNECTION 'host=publisher dbname=mydb user=repl'
PUBLICATION mypub;
```
### Logical Decoding
```sql
max_logical_replication_workers = 4
max_sync_workers_per_subscription = 2
```
## Quick Reference
### Streaming Replication Setup
**Primary postgresql.conf:**
```sql
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
synchronous_commit = on
```
**Primary pg_hba.conf:**
```text
host replication repl_user standby_ip/32 scram-sha-256
```
**Standby postgresql.conf:**
```sql
primary_conninfo = 'host=primary_ip user=repl_user password=...'
primary_slot_name = 'standby1_slot'
hot_standby = on
```
### Create Replication Slot
```sql
-- Physical
SELECT pg_create_physical_replication_slot('standby1_slot');
-- Logical
SELECT pg_create_logical_replication_slot('mysub', 'pgoutput');
```
### Monitor Replication
```sql
-- On primary
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
-- Lag calculation
SELECT client_addr,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS send_lag,
pg_wal_lsn_diff(sent_lsn, flush_lsn) AS flush_lag,
pg_wal_lsn_diff(flush_lsn, replay_lsn) AS replay_lag
FROM pg_stat_replication;
```
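`pg_wal_lsn_diff()` is plain subtraction over WAL positions: an LSN such as `0/3000148` is a 64-bit byte offset written as hex high/low 32-bit halves. A client-side sketch of the same arithmetic, handy when monitoring tools hand you raw LSN strings:

```python
# An LSN string 'hi/lo' encodes a 64-bit WAL byte position;
# pg_wal_lsn_diff(a, b) is simply the subtraction below.

def lsn_to_int(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lsn_diff(a: str, b: str) -> int:
    """Bytes of WAL between two positions (a - b)."""
    return lsn_to_int(a) - lsn_to_int(b)

print(lsn_diff("0/3000148", "0/3000000"))  # → 328
```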
### Promote Standby
```sql
SELECT pg_promote();
```
Or:
```bash
pg_ctl promote -D $PGDATA
```
## Parameter Context
| Parameter | Requires |
|-----------|----------|
| `wal_level` | Restart |
| `max_wal_senders` | Restart |
| `max_replication_slots` | Restart |
| `synchronous_commit` | Reload |
| `synchronous_standby_names` | Reload |
| `primary_conninfo` | Reload |
| `hot_standby` | Restart |
## See Also
- [PostgreSQL Streaming Replication docs](https://www.postgresql.org/docs/current/warm-standby.html)
- [PostgreSQL Logical Replication docs](https://www.postgresql.org/docs/current/logical-replication.html)
- [Replication Runtime Config](https://www.postgresql.org/docs/current/runtime-config-replication.html)
```
### references/vacuum.md
```markdown
# PostgreSQL Vacuum Settings
Runtime parameters for manual VACUUM and autovacuum.
## Why Vacuum Matters
VACUUM is essential for PostgreSQL because:
- Reclaims space from dead tuples (deleted/updated rows)
- Updates visibility map for index-only scans
- Prevents transaction ID wraparound
- Updates planner statistics (ANALYZE)
## Cost-Based Vacuum Throttling
VACUUM uses a cost model to limit I/O impact on running queries.
### Cost Settings
```sql
vacuum_cost_delay = 0 -- Delay in ms when cost limit reached (0 = disabled)
vacuum_cost_limit = 200 -- Accumulated cost before sleeping
vacuum_cost_page_hit = 1 -- Cost of page found in shared buffers
vacuum_cost_page_miss = 2 -- Cost of page read from OS cache
vacuum_cost_page_dirty = 20 -- Cost of dirtying a clean page
```
**Tip:** For dedicated maintenance windows, set `vacuum_cost_delay = 0` to run full speed.
## Freeze Settings
Prevent transaction ID wraparound:
```sql
vacuum_freeze_min_age = 50000000 -- Min XID age before freezing
vacuum_freeze_table_age = 150000000 -- Whole-table scan threshold
vacuum_multixact_freeze_min_age = 5000000
vacuum_multixact_freeze_table_age = 150000000
vacuum_failsafe_age = 1600000000 -- Emergency freeze threshold (PG14+)
```
**Warning:** If `vacuum_failsafe_age` is reached, autovacuum becomes aggressive and ignores cost limits.
## Autovacuum Settings
### Enable/Workers
```sql
autovacuum = on -- Enable autovacuum
autovacuum_max_workers = 3 -- Max concurrent workers
```
### Timing
```sql
autovacuum_naptime = 1min -- Time between launcher runs
```
### Thresholds
When to trigger VACUUM:
```sql
autovacuum_vacuum_threshold = 50 -- Min dead tuples
autovacuum_vacuum_scale_factor = 0.2 -- Fraction of table size
autovacuum_vacuum_insert_threshold = 1000 -- Inserts (PG13+)
autovacuum_vacuum_insert_scale_factor = 0.2
```
**Formula:** `threshold + scale_factor * table_size`
With defaults, a 10,000-row table triggers at: `50 + 0.2 * 10000 = 2050` dead tuples.
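The trigger formula is worth evaluating for your actual table sizes — it also shows why the defaults are far too lax for large tables (a sketch; row counts are illustrative):

```python
# Dead-tuple count that triggers autovacuum, per the formula above:
# threshold + scale_factor * table_size.

def vacuum_trigger(table_rows: int, threshold: int = 50,
                   scale_factor: float = 0.2) -> int:
    return int(threshold + scale_factor * table_rows)

print(vacuum_trigger(10_000))          # defaults → 2050
print(vacuum_trigger(10_000_000))      # defaults → 2000050 (2M dead rows!)
print(vacuum_trigger(10_000_000, scale_factor=0.01))  # tuned → 100050
```

The 10M-row case is why per-table `autovacuum_vacuum_scale_factor` overrides matter: with the default 0.2, two million rows must die before vacuum runs.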
### ANALYZE Thresholds
```sql
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
```
### Autovacuum Cost Limits
```sql
autovacuum_vacuum_cost_delay = 2ms -- Delay between cost limit hits
autovacuum_vacuum_cost_limit = -1 -- -1 = use vacuum_cost_limit
```
**Note:** Cost limit is shared among all autovacuum workers.
## Per-Table Settings
Override autovacuum settings for specific tables:
```sql
ALTER TABLE hot_table SET (
autovacuum_vacuum_threshold = 100,
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_cost_delay = 0,
autovacuum_analyze_threshold = 100
);
-- Disable autovacuum for specific table (rarely needed)
ALTER TABLE archive_table SET (autovacuum_enabled = false);
```
## Quick Reference
### Production Defaults (OLTP)
```sql
# Increase workers for high-update workloads
autovacuum_max_workers = 5
# More aggressive thresholds
autovacuum_vacuum_scale_factor = 0.05
autovacuum_vacuum_threshold = 50
autovacuum_analyze_scale_factor = 0.02
autovacuum_analyze_threshold = 50
# Faster vacuum (if I/O permits)
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 1000
```
### Large Tables
For tables with millions of rows:
```sql
ALTER TABLE large_table SET (
autovacuum_vacuum_scale_factor = 0.01, -- 1% instead of 20%
autovacuum_analyze_scale_factor = 0.005
);
```
### Monitor Vacuum Progress
```sql
-- Current vacuum operations
SELECT * FROM pg_stat_progress_vacuum;
-- Last vacuum times
SELECT relname, last_vacuum, last_autovacuum,
last_analyze, last_autoanalyze
FROM pg_stat_user_tables;
-- Dead tuples count
SELECT relname, n_dead_tup, n_live_tup,
n_dead_tup::float / NULLIF(n_live_tup, 0) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
-- Transaction ID age
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC;
```
### Manual Vacuum
```sql
-- Standard vacuum (concurrent)
VACUUM tablename;
-- Vacuum with analyze
VACUUM ANALYZE tablename;
-- Full vacuum (exclusive lock, rewrites table)
VACUUM FULL tablename;
-- Freeze all rows
VACUUM FREEZE tablename;
-- Verbose output
VACUUM VERBOSE tablename;
```
## Troubleshooting
### Autovacuum Not Running
Check:
```sql
SHOW autovacuum; -- Is it enabled?
SELECT * FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';
```
### Dead Tuples Growing
Possible causes:
- Long-running transactions blocking vacuum
- Thresholds too high for table size
- Autovacuum cost limits too aggressive
Check bloat:
```sql
SELECT relname, n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
```
### Transaction ID Wraparound Warning
If you see "database is not accepting commands to avoid wraparound":
```sql
-- Check age
SELECT datname, age(datfrozenxid) FROM pg_database;
-- Emergency vacuum
VACUUM FREEZE;
```
## Parameter Context
| Parameter | Requires |
|-----------|----------|
| `autovacuum` | Reload |
| `autovacuum_max_workers` | Restart |
| `autovacuum_naptime` | Reload |
| `autovacuum_vacuum_*` | Reload |
| `vacuum_cost_*` | Session/Reload |
| `vacuum_freeze_*` | Session/Reload |
## See Also
- [PostgreSQL Vacuum docs](https://www.postgresql.org/docs/current/routine-vacuuming.html)
- [Runtime Config Vacuum](https://www.postgresql.org/docs/current/runtime-config-vacuum.html)
```
### references/error-handling.md
```markdown
# PostgreSQL Error Handling Settings
Runtime parameters controlling error behavior and crash recovery.
## Error Termination
### exit_on_error
Terminate session on any error:
```sql
exit_on_error = off -- Default
```
When `on`, any ERROR (not just FATAL) terminates the session. Useful for scripts that should abort on first failure.
**Client-side alternative** — psql's `ON_ERROR_STOP` variable (independent of `exit_on_error`):
```bash
psql -v ON_ERROR_STOP=1 -f script.sql # psql variable
```
Or enable the server parameter for the session:
```sql
SET exit_on_error = on;
```
## Crash Recovery
### restart_after_crash
Auto-restart after backend crash:
```sql
restart_after_crash = on -- Default
```
Set to `off` when:
- Using external cluster management (Patroni, Pacemaker)
- Clusterware needs to control failover decisions
### data_sync_retry
Retry fsync failures:
```sql
data_sync_retry = off -- Default
```
**Warning:** Keep the default `off`. When `on`, PostgreSQL retries failed fsync calls, which can mask serious storage problems and risk silent data loss.
Set to `on` only if your storage guarantees data isn't lost on fsync failure (rare).
### recovery_init_sync_method (PG14+)
How to sync data directory before crash recovery:
```sql
recovery_init_sync_method = fsync -- Default: recursive fsync
recovery_init_sync_method = syncfs -- Faster on some filesystems
```
`syncfs` is faster but less portable. Use `fsync` for safety.
## Memory Errors
### shared_memory_size_in_huge_pages (PG15+)
Report shared memory allocation:
```sql
SHOW shared_memory_size_in_huge_pages;
```
## Logging (Related)
For error logging configuration, see the logging settings:
```sql
log_min_messages = warning -- Min severity to log
log_min_error_statement = error -- Log statement on this severity
log_error_verbosity = default -- terse, default, verbose
```
## Quick Reference
### Default Settings
```sql
exit_on_error = off -- Don't exit on ERROR
restart_after_crash = on -- Auto-restart postmaster
data_sync_retry = off -- Don't retry fsync
recovery_init_sync_method = fsync
```
### With Cluster Management
When using Patroni, pg_auto_failover, or similar:
```sql
restart_after_crash = off -- Let clusterware handle restart
```
### Script Execution
For scripts that should stop on first error:
```bash
# psql approach
psql -v ON_ERROR_STOP=1 -c "SELECT 1/0;" -c "SELECT 'never runs';"
# In script, check $? or use set -e
```
## Parameter Context
| Parameter | Requires |
|-----------|----------|
| `exit_on_error` | Session |
| `restart_after_crash` | Reload |
| `data_sync_retry` | Restart |
| `recovery_init_sync_method` | Restart |
## See Also
- [PostgreSQL Error Handling docs](https://www.postgresql.org/docs/current/runtime-config-error-handling.html)
```
### references/internals.md
```markdown
# PostgreSQL Internals Reference
High-level overview of PostgreSQL internal architecture.
## Query Processing Pipeline
```
SQL Query → Parser → Rewriter → Planner → Executor → Results
```
| Stage | Input | Output | Purpose |
|-------|-------|--------|---------|
| Parser | SQL text | Parse tree | Syntax validation, tokenization |
| Transformation | Parse tree | Query tree | Semantic analysis, type resolution |
| Rewriter | Query tree | Query tree(s) | Rule/view expansion |
| Planner | Query tree | Plan tree | Cost-based optimization |
| Executor | Plan tree | Tuples | Demand-driven execution |
## Connection Model
PostgreSQL uses **process-per-user** model:
- Postmaster listens on TCP port (default 5432)
- Each client connection gets a dedicated backend process
- Backends communicate via shared memory + semaphores
## Parser Stage
Two-phase architecture:
1. **Lexer/Parser** (scan.l + gram.y): Pure syntax, no database access
2. **Transformation** (analyze.c): Semantic analysis, requires catalog lookups
## Planner/Optimizer
Cost-based optimization with:
- **Path generation**: Sequential scans, index scans, bitmap scans
- **Join planning**: Nested loop, merge join, hash join
- **GEQO**: Genetic algorithm for queries with many tables (12+ FROM items by default, per `geqo_threshold`)
### Cost Parameters
| Parameter | Default | Purpose |
|-----------|---------|---------|
| `seq_page_cost` | 1.0 | Sequential page read |
| `random_page_cost` | 4.0 | Random page read |
| `cpu_tuple_cost` | 0.01 | Per-tuple processing |
## Executor
Demand-driven ("volcano") model:
- Plan tree of nodes (Seq Scan, Index Scan, Hash Join, etc.)
- Each node returns one tuple at a time
- Parent nodes call children recursively
## System Catalogs
Metadata stored in regular PostgreSQL tables:
| Catalog | Purpose |
|---------|---------|
| `pg_class` | Tables, indexes, sequences, views |
| `pg_attribute` | Table columns |
| `pg_type` | Data types |
| `pg_proc` | Functions/procedures |
| `pg_index` | Index information |
**Warning**: Direct modification can corrupt the database. Use DDL commands.
## Index Access Methods
| Type | Use Case |
|------|----------|
| B-tree | Equality and range queries (default) |
| GIN | JSONB, arrays, full-text search |
| GiST | Geometric types, ranges |
| SP-GiST | Space-partitioned data |
| BRIN | Large naturally-ordered tables |
| Hash | Equality only |
## Extension Points
| Extension Type | Purpose |
|----------------|---------|
| **Index AM** | Custom index types |
| **Table AM** | Custom storage engines |
| **FDW** | Foreign data wrappers |
| **PL Handler** | Procedural languages |
| **Custom WAL** | Extension logging |
## Error Reporting
```c
ereport(ERROR,
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
errmsg("invalid value: %d", value),
errhint("Value must be positive.")));
```
| Level | Behavior |
|-------|----------|
| `DEBUG1-5` | Debug messages |
| `LOG` | Server log only |
| `WARNING` | Warning, continues |
| `ERROR` | Abort transaction |
| `FATAL` | Abort session |
| `PANIC` | Crash recovery |
## Key Source Directories
| Directory | Contents |
|-----------|----------|
| `src/backend/parser/` | Parser and transformation |
| `src/backend/optimizer/` | Query planner |
| `src/backend/executor/` | Query executor |
| `src/backend/access/` | Access methods |
| `src/backend/catalog/` | System catalogs |
## See Also
- [protocol.md](protocol.md) — Wire protocol details
- [PostgreSQL Internals Docs](https://www.postgresql.org/docs/current/internals.html)
```
### references/protocol.md
```markdown
# PostgreSQL Wire Protocol Reference
Overview of PostgreSQL frontend/backend protocol (version 3.2).
## Protocol Overview
| Property | Value |
|----------|-------|
| Version | 3.2 (PostgreSQL 18+), backward compatible to 3.0 |
| Transport | TCP/IP, Unix domain sockets |
| Default Port | 5432 |
| Byte Order | Big-endian (network order) |
## Message Format
```
┌──────────┬────────────────┬─────────────────────────────────┐
│ Type │ Length │ Payload │
│ (1B) │ (4B) │ (Length - 4) │
└──────────┴────────────────┴─────────────────────────────────┘
```
Length includes itself (4 bytes) but NOT the type byte.
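That framing rule can be sketched directly. Below, a simple-protocol Query (`'Q'`) message is framed: the big-endian int32 length counts itself plus the payload but not the type byte, and Query's payload is a NUL-terminated SQL string:

```python
import struct

# Frame a protocol-3 message: 1-byte type, then a big-endian int32
# length covering itself (4 bytes) and the payload, NOT the type byte.

def frame(msg_type: bytes, payload: bytes) -> bytes:
    return msg_type + struct.pack(">I", 4 + len(payload)) + payload

# Simple-query message: payload is the SQL text, NUL-terminated.
msg = frame(b"Q", b"SELECT 1\x00")
print(len(msg))  # → 14: 1 type byte + 4 length bytes + 9 payload bytes
```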
## Connection Startup
```
Client Server
│── StartupMessage ─────────────────►│
│◄── Authentication* ────────────────│
│── Password/SASL* ─────────────────►│
│◄── AuthenticationOk ───────────────│
│◄── ParameterStatus* ───────────────│
│◄── BackendKeyData ─────────────────│
│◄── ReadyForQuery ──────────────────│
```
### ReadyForQuery Status
| Status | Meaning |
|--------|---------|
| `I` | Idle (not in transaction) |
| `T` | In transaction block |
| `E` | Failed transaction |
## Simple Query Protocol
```
Client Server
│── Query ('Q') ─────────────────────►│
│◄── RowDescription ('T') ───────────│
│◄── DataRow ('D') × N ──────────────│
│◄── CommandComplete ('C') ──────────│
│◄── ReadyForQuery ('Z') ────────────│
```
## Extended Query Protocol
Supports prepared statements and pipelining:
```
Client Server
│── Parse ('P') ─────────────────────►│ (prepare)
│◄── ParseComplete ('1') ────────────│
│── Bind ('B') ──────────────────────►│ (bind params)
│◄── BindComplete ('2') ─────────────│
│── Execute ('E') ───────────────────►│ (run)
│◄── DataRow ('D')... ───────────────│
│◄── CommandComplete ('C') ──────────│
│── Sync ('S') ──────────────────────►│ (end pipeline)
│◄── ReadyForQuery ('Z') ────────────│
```
## Authentication Methods
| Subtype | Method |
|---------|--------|
| 0 | AuthenticationOk |
| 3 | CleartextPassword |
| 5 | MD5Password |
| 10 | SASL (SCRAM-SHA-256) |
| 11 | SASLContinue |
| 12 | SASLFinal |
## COPY Protocol
**COPY TO (Server → Client):**
```
CopyOutResponse → CopyData × N → CopyDone → CommandComplete
```
**COPY FROM (Client → Server):**
```
CopyInResponse → CopyData × N → CopyDone/CopyFail → CommandComplete
```
## Cancellation
Cancel requests use a **new connection**:
1. Client opens new connection
2. Sends CancelRequest with PID + secret key (from BackendKeyData)
3. Connection closed immediately (no response)
## SSL/TLS
```
Client Server
│── SSLRequest ──────────────────────►│
│◄── 'S' or 'N' ─────────────────────│
│ [TLS handshake if 'S'] ──────────│
│── StartupMessage ──────────────────►│
```
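The SSLRequest itself is a startup-format packet with no type byte: an int32 length of 8 followed by the magic request code 80877103 (`1234 << 16 | 5679`). A sketch of the exact bytes sent:

```python
import struct

# SSLRequest packet: int32 length (8), then magic code 80877103.
# The server replies with a single byte: 'S' (start TLS) or 'N'.
ssl_request = struct.pack(">II", 8, (1234 << 16) | 5679)
print(ssl_request.hex())  # → 0000000804d2162f
```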
## Error Response Fields
| Field | Meaning |
|-------|---------|
| `S` | Severity (ERROR, FATAL, WARNING) |
| `C` | SQLSTATE code |
| `M` | Message |
| `D` | Detail |
| `H` | Hint |
| `P` | Position in query |
## Common Message Types
### Frontend → Backend
| Type | Name | Purpose |
|------|------|---------|
| `Q` | Query | Simple query |
| `P` | Parse | Prepare statement |
| `B` | Bind | Bind parameters |
| `E` | Execute | Execute portal |
| `S` | Sync | End pipeline |
| `X` | Terminate | Close connection |
### Backend → Frontend
| Type | Name | Purpose |
|------|------|---------|
| `R` | Authentication | Auth request/response |
| `T` | RowDescription | Column metadata |
| `D` | DataRow | Row data |
| `C` | CommandComplete | Query done |
| `Z` | ReadyForQuery | Ready for command |
| `E` | ErrorResponse | Error |
## See Also
- [internals.md](internals.md) — Query processing overview
- [authentication.md](authentication.md) — pg_hba.conf
- [PostgreSQL Protocol Docs](https://www.postgresql.org/docs/current/protocol.html)
```