Design Doc Creator
Create AILANG design documents in the correct format and location. Use when user asks to create a design doc, plan a feature, or document a design. Handles both planned/ and implemented/ docs with proper structure.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install sunholo-data-ailang-design-doc-creator
Repository
Skill path: .claude/skills/design-doc-creator
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Tech Writer, Designer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: sunholo-data.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install Design Doc Creator into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/sunholo-data/ailang before adding Design Doc Creator to shared team environments
- Use Design Doc Creator for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: Design Doc Creator
description: Create AILANG design documents in the correct format and location. Use when user asks to create a design doc, plan a feature, or document a design. Handles both planned/ and implemented/ docs with proper structure.
---
# Design Doc Creator
Create well-structured design documents for AILANG features following the project's conventions.
## Quick Start
**Most common usage:**
```bash
# User says: "Create a design doc for better error messages"
# This skill will:
# 1. AUTO-SEARCH for related design docs (Ollama neural embeddings)
# 2. Show matches from implemented/ and planned/ directories
# 3. Auto-populate "Related Documents" section in template
# 4. Proceed automatically (no confirmation needed)
# 5. Create design_docs/planned/better-error-messages.md
# 6. Fill template with proper structure
```
**Automatic Related Doc Search (v0.6.3+):**
When you run the create script, it automatically:
1. Converts doc name to search query (e.g., `m-dx2-better-errors` → `"better errors"`)
2. Runs **both** SimHash (instant) and Neural (better quality) searches
3. Shows results from both methods so you can compare
4. Merges unique results (neural preferred) for the template
5. Auto-populates the "Related Documents" section
```bash
$ .claude/skills/design-doc-creator/scripts/create_planned_doc.sh m-semantic-caching
🔍 Searching for related design docs...
Implemented docs matching "semantic caching":
[SimHash - instant]
1. design_docs/implemented/v0_4_0/monomorphization.md (1.00)
2. design_docs/implemented/v0_3_18/M-DX4-SPRINT-PLAN.md (0.95)
[Neural - semantic matching]
1. design_docs/implemented/v0_6_0/m-doc-sem-lazy-embeddings.md (0.45)
2. design_docs/implemented/v0_6_0/semantic-caching-complete.md (0.42)
Planned docs matching "semantic caching":
[SimHash - instant]
1. design_docs/planned/v0_7_0/M-REPL1_persistent_bindings.md (1.00)
[Neural - semantic matching]
1. design_docs/planned/v0_7_0/semantic-caching-future.md (0.50)
ℹ Related docs found above - review them after creation if needed.
```
**Why dual search?** SimHash is instant but keyword-dependent. Neural finds semantically related docs even when keywords don't match. You see both so you can judge which results are more relevant.
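As a rough illustration of that pipeline, the name-to-query conversion and the neural-preferred merge could look like the sketch below (hypothetical helpers; the actual script's logic may differ):

```python
import re

def doc_name_to_query(name: str) -> str:
    # Strip milestone prefixes like "m-" or "m-dx2-" and turn hyphens
    # into spaces, e.g. "m-dx2-better-errors" -> "better errors".
    name = re.sub(r'^m(-[a-z]+\d+)?-', '', name)
    return name.replace('-', ' ')

def merge_results(simhash: list[str], neural: list[str], limit: int = 3) -> list[str]:
    # Keep unique paths, listing neural matches first (neural preferred).
    merged: list[str] = []
    for path in neural + simhash:
        if path not in merged:
            merged.append(path)
    return merged[:limit]
```

For example, `doc_name_to_query("m-semantic-caching")` yields `"semantic caching"`, matching the search shown above.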
## When to Use This Skill
Invoke this skill when:
- User asks to "create a design doc" or "write a design doc"
- User says "plan a feature" or "design a feature"
- User mentions "document the design" or "create a spec"
- Before starting implementation of a new feature
- After completing a feature (to move to implemented/)
## Coordinator Integration
**When invoked by the AILANG Coordinator** (detected by GitHub issue reference in the prompt), you MUST output this marker at the end of your response:
```
DESIGN_DOC_PATH: design_docs/planned/vX_Y/design-doc-name.md
```
**Why?** The coordinator uses this marker to:
1. Read the design doc content for GitHub comments
2. Track artifacts across pipeline stages
3. Provide visibility to humans reviewing the issue
**Example completion:**
```
## Design Document Created
I've created the design document...
**DESIGN_DOC_PATH**: `design_docs/planned/v0_6_3/m-feature-design.md`
```
## Available Scripts
### `scripts/create_planned_doc.sh <doc-name> [version]`
Create a new design document in `design_docs/planned/`.
**Version Auto-Detection:**
The script automatically detects the current AILANG version from `CHANGELOG.md` and suggests the next version folder. This prevents accidentally placing docs in wrong version folders.
**Usage:**
```bash
# See current version and suggested target
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh
# Output: Current AILANG version: v0.5.6
# Suggested next version: v0_5_7
# Create doc in planned/ root (no version)
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh m-dx2-better-errors
# Create doc in next version folder (recommended)
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh reflection-system v0_5_7
```
**What it does:**
- **Searches for related docs** using `ailang docs search --neural` (Ollama embeddings)
- Shows top 3 matches from both `implemented/` and `planned/`
- Auto-populates "Related Documents" section with clickable links
- Detects current version from CHANGELOG.md
- Suggests next patch version for targeting
- Creates design doc from template
- Places in correct directory (planned/ or planned/VERSION/)
- Fills in creation date
- Shows version context in output
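The version auto-detection step can be sketched as follows, assuming CHANGELOG.md leads with a heading like `## [v0.5.6]` (an assumption; the real script may parse it differently):

```python
import re

def suggest_next_version(changelog_text: str) -> tuple[str, str]:
    # Find the first semver tag in the changelog and suggest the
    # next patch version as an underscore-named folder.
    m = re.search(r'v(\d+)\.(\d+)\.(\d+)', changelog_text)
    if not m:
        raise ValueError("no version heading found in CHANGELOG")
    major, minor, patch = map(int, m.groups())
    current = f"v{major}.{minor}.{patch}"
    next_folder = f"v{major}_{minor}_{patch + 1}"  # folder names use underscores
    return current, next_folder
```

Given `## [v0.5.6] - ...`, this returns `("v0.5.6", "v0_5_7")`, matching the script output shown below.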
### `scripts/move_to_implemented.sh <doc-name> <version>`
Move a design document from planned/ to implemented/ after completion.
**Usage:**
```bash
.claude/skills/design-doc-creator/scripts/move_to_implemented.sh m-dx1-developer-experience v0_3_10
```
**What it does:**
- Finds doc in planned/
- Copies to implemented/VERSION/
- Updates status to "Implemented"
- Updates last modified date
- Provides template for implementation report
- Keeps original until you verify and commit
## Workflow
### Creating a Planned Design Doc
**1. Gather Requirements**
Ask user:
- What feature are you designing?
- What version is this targeted for? (e.g., v0.4.0)
- What priority? (P0/P1/P2)
- Estimated effort? (e.g., 2 days, 1 week)
- Any dependencies on other features?
**⚠️ CRITICAL: Audit for Systemic Issues FIRST**
**Before writing a design doc for a bug fix, ALWAYS ask: "Is this part of a larger pattern?"**
**The Anti-Pattern (incremental special-casing):**
```
v1: Add feature for case A
v2: Bug! Add special case for B
v3: Bug! Add special case for C
v4: Bug! Add special case for D
...forever patching
```
**The Pattern to Follow (unified solutions):**
```
v1: Bug report for case B
BEFORE writing design doc:
1. Search for similar code paths
2. Check if A, C, D have same gap
3. Design ONE fix covering ALL cases
v2: Unified fix - no future bugs in this area
```
**Concrete Example (M-CODEGEN-UNIFIED-SLICE-CONVERTERS, Dec 2025):**
```
Bug reported: [SolarPlanet] return type panics
❌ Quick fix design doc: Add ConvertToSolarPlanetSlice
(Will need ConvertToAnotherRecordSlice later...)
✅ Systemic design doc: Audit ALL slice types
Found: []float64 ALSO broken!
Found: []*ADTType partially broken!
One unified fix covers all 3 gaps.
```
**Analysis Checklist (do BEFORE writing design doc):**
- [ ] Is this a one-off or part of a pattern?
- [ ] Search codebase for similar code paths
- [ ] Check if other types/cases have the same gap
- [ ] Look at git history - has this area been patched repeatedly?
- [ ] Design fix to cover ALL cases, not just the reported one
**🔍 Use `ailang docs search` to Check Existing Work:**
Before creating a design doc, search for existing implementations. **Always use `--neural` for best results** - it finds semantically related docs even when keywords don't match exactly.
```bash
# RECOMMENDED: Neural search (default for script, best quality)
ailang docs search --stream implemented --neural "feature description"
ailang docs search --stream planned --neural "feature description"
# Fallback: SimHash search (instant, but keyword-dependent)
ailang docs search --stream implemented "exact keywords"
# Search with JSON output for programmatic use
ailang docs search --stream implemented --neural --json "keywords"
```
**Why neural search is better:**
| Query | SimHash Result | Neural Result |
|-------|---------------|---------------|
| "semantic caching" | Irrelevant docs (keyword mismatch) | `semantic-caching-future.md` (exact match) |
| "type inference" | Docs with "type" in name | Conceptually related type system docs |
**Example workflow:**
```bash
# Before creating "lazy embeddings" design doc:
$ ailang docs search --stream implemented --neural "embedding cache"
🔍 Neural search: "embedding cache"
SimHash candidates: 200 (from 298 total docs)
Embeddings: 0 computed, 200 reused (model: ollama:embeddinggemma)
1. design_docs/implemented/v0_6_0/m-doc-sem-lazy-embeddings.md (0.45)
# ⚠️ Already implemented - review existing doc first
$ ailang docs search --stream planned --neural "lazy embedding"
🔍 Neural search: "lazy embedding"
No matching documents found.
# ✅ Safe to create - not planned yet
```
**Key flags:**
- `--stream implemented` - Only search implemented/ directory
- `--stream planned` - Only search planned/ directory
- `--neural` - **Recommended** - semantic embeddings find conceptually similar docs
- `--limit N` - Maximum results to return (default: 10)
- `--json` - JSON output for scripting
**Performance:** Neural search is ~11s on first run (computing embeddings), then ~0.2s cached. Worth the wait for better results.
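That first-run/cached-run gap is what a content-hash keyed cache produces. A minimal sketch of the idea (not the actual `ailang` implementation): unchanged docs hit the cache, so only new or edited docs pay the embedding cost on later runs.

```python
import hashlib

class EmbeddingCache:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}      # content hash -> embedding vector
        self.computed = 0    # embeddings computed this session
        self.reused = 0      # cache hits this session

    def get(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.store:
            self.reused += 1
        else:
            self.store[key] = self.embed_fn(text)
            self.computed += 1
        return self.store[key]
```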
**Warning Signs of Fragmented Design:**
- Multiple maps tracking similar things (`adtSliceTypes`, `recordTypes`...)
- Switch statements with growing case lists
- Functions named `handleX`, `handleY`, `handleZ` instead of unified `handle`
- Bug fixes that add `|| specialCase` conditions
**When these signs appear:** Expand scope of design doc to unify the system.
**⚠️ IMPORTANT: Axiom Compliance Check**
**Every feature must align with AILANG's 12 Design Axioms.**
The axioms are the non-negotiable principles that define AILANG's semantics. Any feature that violates an axiom must justify why the axiom itself should change.
**Canonical Reference:** [Design Axioms](/docs/references/axioms) on the website
### The 12 Axioms (Quick Reference)
| # | Axiom | Core Principle |
|---|-------|----------------|
| 1 | **Determinism** | Execution must be replayable; nondeterminism is explicit, typed, localized |
| 2 | **Replayability** | Traces are inspectable, serializable, and suitable for verification |
| 3 | **Effects Are Legible** | Side effects are explicit, typed, and machine-readable |
| 4 | **Explicit Authority** | No implicit access; capabilities are declared and constrained |
| 5 | **Bounded Verification** | Local, automatable checks; not whole-program proofs |
| 6 | **Safe Concurrency** | Parallelism must not introduce scheduling-dependent meaning |
| 7 | **Machines First** | Semantic compression, decidable structure, toolability |
| 8 | **Syntax Is Liability** | Only add syntax that reduces ambiguity or improves analysis |
| 9 | **Cost Is Meaning** | Resource implications visible in types and to tools |
| 10 | **Composability** | Features must compose without semantic blind spots |
| 11 | **Failure Is Data** | Errors are structured, typed, inspectable, composable |
| 12 | **System Boundary** | Crossing between intent/execution, authority/action is explicit |
### Axiom Scoring Matrix
**Score the proposed feature against each applicable axiom:**
| Axiom | −1 (violates) | 0 (neutral) | +1 (strengthens) |
|-------|---------------|-------------|------------------|
| **A1: Determinism** | adds implicit nondeterminism | – | increases reproducibility |
| **A2: Replayability** | breaks trace reconstruction | – | improves audit/replay |
| **A3: Effect Legibility** | hides side effects | – | makes effects more visible |
| **A4: Explicit Authority** | grants ambient access | – | enforces capability bounds |
| **A5: Bounded Verification** | requires global reasoning | – | enables local checks |
| **A6: Safe Concurrency** | introduces races | – | preserves determinism |
| **A7: Machines First** | optimizes for human prose | – | improves machine analysis |
| **A8: Minimal Syntax** | adds syntactic surface | – | reduces ambiguity |
| **A9: Cost Visibility** | hides resource costs | – | makes costs explicit |
| **A10: Composability** | breaks when combined | – | composes cleanly |
| **A11: Structured Failure** | uses unstructured exceptions | – | typed error handling |
| **A12: System Boundary** | implicit boundary crossing | – | explicit transitions |
### Decision Rules
**Hard violations (automatic rejection):**
- Any score of **−1** on Axioms 1, 3, 4, or 7 → **REJECT** (core invariants)
**Soft scoring:**
- Net score across all axioms **≥ +2**: Move forward to draft
- Net score **0 to +1**: Needs stronger justification
- Net score **< 0**: Reject or redesign
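These decision rules are mechanical enough to express directly. A small sketch, taking a map of axiom number to −1/0/+1 scores:

```python
CORE_AXIOMS = {1, 3, 4, 7}  # a -1 here is an automatic rejection

def axiom_decision(scores: dict[int, int]) -> str:
    # Hard violations on core axioms reject regardless of net score.
    if any(scores.get(a, 0) == -1 for a in CORE_AXIOMS):
        return "reject (core axiom violated)"
    net = sum(scores.values())
    if net >= 2:
        return "accept"
    if net >= 0:
        return "needs justification"
    return "reject or redesign"
```

Scoring "Entry-module Prelude" as A7:+1, A8:+1, A1:+1 gives net +3 and an accept, while a −1 on A1 short-circuits to rejection.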
### Scoring Examples
| Feature | Axiom Scores | Net | Decision |
|---------|--------------|-----|----------|
| ✅ Entry-module Prelude | A7:+1, A8:+1, A1:+1 | **+3** | Accept |
| ✅ Auto-cap inference | A3:+1, A8:+1, A4:0 | **+2** | Accept |
| ✅ Structured error traces | A2:+1, A11:+1, A7:+1 | **+3** | Accept |
| ❌ Global mutable state | A1:−1, A4:−1, A6:−1 | **−3** | Reject (violates A1) |
| ❌ Implicit DB connections | A3:−1, A4:−1, A12:−1 | **−3** | Reject (violates A3, A4) |
| ⚠️ Optional type annotations | A7:0, A5:+1, A8:−1 | **0** | Needs justification |
### Philosophical Grounding
AILANG treats computation as **navigation through a fixed semantic structure**, not creation of new possibilities.
Key implications:
- **Effects are permissions**, not actions
- **Traces are primary artifacts**, not debugging aids
- **Cost is physical reality**, not incidental overhead
See [Philosophical Foundations](/docs/references/philosophical-foundations) for the full motivation.
### What AILANG Deliberately Excludes
These are design choices, not limitations:
- ❌ **LSP/IDE servers** → AIs use CLI/API, not text editors
- ❌ **Multiple syntaxes** → one canonical way per concept
- ❌ **Implicit behaviors** → all effects are explicit
- ❌ **Human concurrency patterns** → CSP channels → static task graphs
- ❌ **Familiar syntax** → if it adds entropy, reject it
### Reference Documents
- [Design Axioms](/docs/references/axioms) - The 12 non-negotiable principles
- [Philosophical Foundations](/docs/references/philosophical-foundations) - Block-universe determinism
- [Design Lineage](/docs/references/design-lineage) - What we adopted/rejected and why
- [VISION_BENCHMARKS.md](../../../benchmarks/VISION_BENCHMARKS.md) - Vision goals and success metrics
**2. Choose Document Name**
**Naming conventions:**
- Use lowercase with hyphens: `feature-name.md`
- For milestone features: `m-XXX-feature-name.md` (e.g., `m-dx2-better-errors.md`)
- Be specific and descriptive
- Avoid generic names like `improvements.md`
**3. Run Create Script**
```bash
# If version is known (most cases)
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh feature-name v0_4_0
# If version not decided yet
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh feature-name
```
**4. Read Related Docs Found by Search**
**IMPORTANT:** Before filling in the template, read the top 2-3 related docs found by the search. This ensures your design:
- Builds on existing patterns and conventions
- Avoids duplicating work already done
- References relevant prior art
- Identifies potential conflicts early
```bash
# The script outputs related docs like:
# Implemented docs matching "feature name":
# [Neural - semantic matching]
# 1. design_docs/implemented/v0_6_0/similar-feature.md (0.45)
# 2. design_docs/implemented/v0_5_0/related-work.md (0.38)
# READ these docs before proceeding:
# - Look at their structure and patterns
# - Note any design decisions that apply
# - Check for overlap with your feature
# - Reference them in your "Related Documents" section
```
**What to look for in related docs:**
- Architecture patterns used
- Testing strategies employed
- Edge cases already handled
- Implementation trade-offs documented
- Success/failure metrics to compare against
**5. Customize the Template**
The script creates a comprehensive template. Fill in:
**Header section:**
- Feature name (replace `[Feature Name]`)
- Status: Leave as "Planned"
- Target: Version number (e.g., v0.4.0)
- Priority: P0 (High), P1 (Medium), or P2 (Low)
- Estimated: Time estimate (e.g., "3 days", "1 week")
- Dependencies: List prerequisite features or "None"
**Problem Statement:**
- Describe current pain points
- Include metrics if available (e.g., "takes 7.5 hours")
- Explain who is affected and how
**Goals:**
- Primary goal: One-sentence main objective
- Success metrics: 3-5 measurable outcomes
**Solution Design:**
- Overview: High-level approach
- Architecture: Technical design
- Implementation plan: Break into phases with tasks
- Files to modify: List new/changed files with LOC estimates
**Examples:**
- Show before/after code or workflows
- Make examples concrete and runnable
**Success Criteria:**
- Checkboxes for acceptance tests
- Include "All tests passing" and "Documentation updated"
**Timeline:**
- Week-by-week breakdown
- Realistic estimates (2x your initial guess!)
**6. Review and Commit**
```bash
git add design_docs/planned/feature-name.md
git commit -m "Add design doc for feature-name"
```
### Moving to Implemented
**When to move:**
- Feature is complete and shipped
- Tests are passing
- Documentation is updated
- Version is tagged/released
**1. Run Move Script**
```bash
.claude/skills/design-doc-creator/scripts/move_to_implemented.sh feature-name v0_3_14
```
**2. Add Implementation Report**
The script provides a template. Add:
**What Was Built:**
- Summary of actual implementation
- Any deviations from plan
**Code Locations:**
- New files created (with LOC)
- Modified files (with +/- LOC)
**Test Coverage:**
- Number of tests
- Coverage percentage
- Test file locations
**Metrics:**
- Before/after comparison table
- Show improvements achieved
**Known Limitations:**
- What's not yet implemented
- Edge cases not handled
- Performance limitations
**3. Update design_docs/README.md**
Add entry under appropriate version:
```markdown
### v0.3.14 - Feature Name (October 2024)
- Brief description of what shipped
- Key improvements
- [CHANGELOG](../CHANGELOG.md#v0314)
```
**4. Commit Changes**
```bash
git add design_docs/implemented/v0_3_14/feature-name.md design_docs/README.md
git commit -m "Move feature-name design doc to implemented (v0.3.14)"
git rm design_docs/planned/feature-name.md
git commit -m "Remove feature-name from planned (moved to implemented)"
```
## Design Doc Structure
See [resources/design_doc_structure.md](resources/design_doc_structure.md) for:
- Complete template breakdown
- Section-by-section guide
- Best practices for each section
- Common mistakes to avoid
## Best Practices
### 1. Be Specific
**Good:**
```markdown
**Primary Goal:** Reduce builtin development time from 7.5h to 2.5h (-67%)
```
**Bad:**
```markdown
**Primary Goal:** Make development easier
```
### 2. Include Metrics
**Good:**
```markdown
**Current State:**
- Development time: 7.5 hours per builtin
- Files to edit: 4
- Type construction: 35 lines
```
**Bad:**
```markdown
**Current State:**
- Development takes a long time
```
### 3. Break Into Phases
**Good:**
```markdown
**Phase 1: Core Registry** (~4 hours)
- [ ] Create BuiltinSpec struct
- [ ] Implement validation
- [ ] Add unit tests
**Phase 2: Type Builder** (~3 hours)
- [ ] Create DSL methods
- [ ] Add fluent API
- [ ] Test with existing builtins
```
**Bad:**
```markdown
**Implementation:**
- Build everything
```
### 4. Link to Examples
**Good:**
```markdown
See existing M-DX1 implementation:
- design_docs/implemented/v0_3_10/M-DX1_developer_experience.md
- internal/builtins/spec.go
```
**Bad:**
```markdown
Similar to other features
```
### 5. Realistic Estimates
**Rule of thumb:**
- 2x your initial estimate (things always take longer)
- Add buffer for testing and documentation
- Include time for unexpected issues
**Good:**
```markdown
**Estimated**: 3 days (6h implementation + 4h testing + 2h docs + buffer)
```
**Bad:**
```markdown
**Estimated**: Should be quick, maybe 2 hours
```
## Document Locations
```
design_docs/
├── planned/              # Future features
│   ├── feature.md        # Unversioned (version TBD)
│   └── v0_4_0/           # Targeted for v0.4.0
│       └── feature.md
└── implemented/          # Completed features
    ├── v0_3_10/          # Shipped in v0.3.10
    │   └── feature.md
    └── v0_3_14/          # Shipped in v0.3.14
        └── feature.md
```
**Version folder naming:**
- Use underscores: `v0_3_14` not `v0.3.14`
- Match CHANGELOG.md tags exactly
- Create folder when first doc needs it
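The tag-to-folder conversion is simple enough to check mechanically. A minimal sketch (the validation regex is an assumption, not part of the skill's scripts):

```python
import re

def version_folder(tag: str) -> str:
    # Convert a release tag like "v0.3.14" into the underscore
    # folder name "v0_3_14" used under design_docs/.
    if not re.fullmatch(r'v\d+\.\d+\.\d+', tag):
        raise ValueError(f"expected a tag like v0.3.14, got {tag!r}")
    return tag.replace('.', '_')
```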
## Progressive Disclosure
This skill loads information progressively:
1. **Always loaded**: This SKILL.md file (workflow overview)
2. **Execute as needed**: Scripts create/move docs
3. **Load on demand**: `resources/design_doc_structure.md` (detailed guide)
## Coordinator Integration (v0.6.2+)
The design-doc-creator skill integrates with the AILANG Coordinator for automated workflows.
### Autonomous Workflow
When configured in `~/.ailang/config.yaml`, the design-doc-creator agent can:
1. Automatically receive GitHub issues via `github_sync`
2. Create design docs from issue descriptions
3. Hand off to sprint-planner on completion
```yaml
coordinator:
agents:
- id: design-doc-creator
inbox: design-doc-creator
workspace: /path/to/ailang
capabilities: [research, docs]
trigger_on_complete: [sprint-planner]
auto_approve_handoffs: false
session_continuity: true
github_sync:
enabled: true
interval_secs: 300
target_inbox: design-doc-creator
```
### Sending Tasks to design-doc-creator
```bash
# Via CLI
ailang messages send design-doc-creator "Create design doc for semantic caching" \
--title "Feature: Semantic Caching" --from "user"
# The coordinator will:
# 1. Pick up the message
# 2. Create a git worktree
# 3. Execute the design-doc-creator workflow
# 4. Create approval request for human review
# 5. On approval, trigger sprint-planner with handoff message
```
### Handoff Message to sprint-planner
On completion, design-doc-creator sends:
```json
{
"type": "design_doc_ready",
"correlation_id": "task-123",
"design_doc_path": "design_docs/planned/v0_6_3/m-semantic-caching.md",
"session_id": "claude-session-abc",
"discovery": "Key findings from GitHub issue analysis"
}
```
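A consumer like sprint-planner would presumably validate this payload before acting on it. A minimal sketch, assuming only the fields shown above are required (the coordinator's actual schema check may differ):

```python
import json

REQUIRED_FIELDS = {"type", "correlation_id", "design_doc_path"}

def validate_handoff(raw: str) -> dict:
    # Parse a handoff message and confirm the expected fields exist.
    msg = json.loads(raw)
    missing = REQUIRED_FIELDS - msg.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if msg["type"] != "design_doc_ready":
        raise ValueError(f"unexpected message type: {msg['type']}")
    return msg
```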
### Human-in-the-Loop
With `auto_approve_handoffs: false`:
1. Design doc is created in worktree
2. Approval request shows git diff in dashboard
3. Human reviews design doc quality
4. Approve → Merges to main, triggers sprint-planner
5. Reject → Worktree preserved for manual fixes
## Notes
- All design docs should follow the template structure
- Update CHANGELOG.md when features ship (separate from design doc)
- Link design docs from README.md under version history
- Keep design docs focused - split large features into multiple docs
- Use M-XXX naming for milestone/major features
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### ../../../benchmarks/VISION_BENCHMARKS.md
```markdown
# Vision-Aligned Benchmarks
This document explains how benchmarks test AILANG's vision goals and what metrics track progress toward those goals.
## Vision Goals → Benchmark Mapping
### 1. Explicit State Management (vs Python's Implicit State)
**Vision Problem**: Python's global state makes AI reasoning difficult. Same input → different output.
**Benchmark**: `explicit_state_threading.yml`
**What it tests**:
- ✅ **Behavioral**: Functions return (newState, result) tuples
- ✅ **Observable**: Correct state threading produces correct outputs
- 🔮 **Structural** (future): No global variables used, all state is explicit
**Current eval**: Checks stdout matches expected state progression.
**Enhanced eval**: Could parse AST to verify no global state mutations.
---
### 2. Deterministic Generation (One Canonical Form)
**Vision Problem**: Python has 5+ ways to transform a list. AI picks inconsistently.
**Benchmark**: `deterministic_list_transform.yml`
**What it tests**:
- ✅ **Behavioral**: Correct transformation output
- 🔮 **Consistency** (future): Same prompt → same code structure across runs
**Current eval**: Checks stdout for correct transformed list.
**Enhanced eval**:
- Run same benchmark 10 times with same model/seed
- Measure: `hash(generated_code)` consistency
- Target: ≥95% identical code structure
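The consistency measure described above reduces to hashing each generated sample and taking the share of the most common hash. A sketch:

```python
import hashlib

def consistency_rate(code_samples: list[str]) -> float:
    # Fraction of runs whose generated code hashes to the modal value.
    hashes = [hashlib.sha256(c.encode()).hexdigest() for c in code_samples]
    most_common = max(set(hashes), key=hashes.count)
    return hashes.count(most_common) / len(hashes)
```

Ten runs where nine samples are byte-identical would score 0.9, just under the ≥95% target.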
---
### 3. Effect System Safety (!: IO, FS, Net)
**Benchmarks**:
- `effect_tracking_io_fs.yml` - Basic effect declarations
- `effect_pure_separation.yml` - Pure/effectful separation
- `effect_composition.yml` - Effect propagation
**What they test**:
- ✅ **Behavioral**: Correct execution with effects
- 🔮 **Type-level** (future): Effect signatures match actual behavior
**Current eval**: Checks stdout and file creation.
**Enhanced eval**:
```bash
# Parse generated code to verify:
ailang check-effects generated.ail --verify-signatures
# Returns:
# ✅ computeSum: pure (no effects)
# ✅ printSum: !: IO (correct)
# ✅ processUser: !: IO,FS (correct)
# ❌ readFile: declared pure but does FS (ERROR)
```
**Metrics**:
- **Effect Safety Rate**: % of functions with correct effect declarations
- Target: ≥94% (from vision doc)
---
### 4. Total Functions (Exhaustive Pattern Matching)
**Benchmarks**:
- `exhaustive_pattern_matching.yml`
- `no_runtime_crashes_option.yml`
**What they test**:
- ✅ **Behavioral**: All cases handled, no crashes
- 🔮 **Compile-time** (future): Exhaustiveness verified by type checker
**Current eval**: Checks stdout for all expected outputs.
**Enhanced eval**:
```bash
# Compile generated code, verify exhaustiveness warnings:
ailang compile generated.ail --check-exhaustiveness
# Should report: "Pattern match on Status is exhaustive"
```
**Metrics**:
- **Totality Rate**: % of pattern matches that are exhaustive
- **Crash Rate**: % of runs that crash (should be 0%)
---
### 5. Type Safety (vs Python's Runtime Errors)
**Benchmark**: `type_safe_record_access.yml`
**What it tests**:
- ✅ **Behavioral**: Correct nested field access
- 🔮 **Compile-time** (future): Invalid field access rejected by compiler
**Current eval**: Checks stdout for correct value.
**Enhanced eval**:
```bash
# Test with intentionally broken code:
# Replace: user.profile.city
# With: user.profile.country (doesn't exist)
ailang compile broken.ail
# Should fail with: "Field 'country' not found in Profile record"
```
**Metrics**:
- **Type Correctness Rate**: % of generated code that type-checks
- Target: ≥98% (from vision doc)
---
### 6. Referential Transparency
**Benchmark**: `referential_transparency.yml`
**What it tests**:
- ✅ **Behavioral**: Same input → same output (3 calls produce identical results)
- 🔮 **Purity** (future): No side effects in pure functions
**Current eval**: Checks that all three calls return same value.
**Enhanced eval**:
```bash
# Run same code 100 times, verify:
# - Same inputs always produce same outputs
# - No randomness, no time dependencies
# - Hash of output is consistent
```
**Metrics**:
- **Determinism Rate**: % of runs with identical outputs
- Target: ≥95% (from vision doc)
---
### 7. Canonical Code Structure
**Benchmark**: `canonical_normalization.yml`
**What it tests**:
- ✅ **Behavioral**: Correct functional composition
- 🔮 **Structural** (future): Code normalizes to identical form
**Current eval**: Checks stdout for correct result.
**Enhanced eval** (v0.4+):
```bash
# Generate code 10 times, normalize all variants:
for i in {1..10}; do
ailang eval --benchmark canonical_normalization --model gpt5
ailang normalize generated_$i.ail > normalized_$i.ail
done
# Check: all normalized versions are identical
diff normalized_*.ail
# Should show: 0 differences
```
**Metrics**:
- **Normalizer Lift**: % improvement in consistency after normalization
- Target: +20pp (from vision doc)
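Normalizer lift is then just the consistency delta in percentage points. A sketch:

```python
def normalizer_lift(before: float, after: float) -> float:
    # Lift in percentage points: consistency after normalization
    # minus consistency before, e.g. 0.70 -> 0.90 is +20pp.
    return round((after - before) * 100, 1)
```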
---
### 8. Immutable Data Structures
**Benchmark**: `immutable_data_structures.yml`
**What it tests**:
- ✅ **Behavioral**: Updates create new records, originals unchanged
- 🔮 **Immutability** (future): No in-place mutations
**Current eval**: Verifies all three records print correctly.
**Enhanced eval**:
```python
import re

def detect_mutations(code: str) -> list[str]:
    # Flags in-place assignment like: record.field = value (mutation!)
    # Allows immutable updates like: new_record = {record | field: value}
    return re.findall(r'\w+\.\w+\s*=(?!=)', code)
```
---
## Current Evaluation Capabilities
### ✅ What Works Today (v0.3.14)
**Output Validation**:
```bash
ailang eval-suite --benchmark explicit_state_threading
# ✅ Compares stdout to expected_stdout
# ✅ Measures tokens, duration, cost
# ✅ Tracks success/failure per model
```
**Metrics Available**:
- Final success rate (compile + correct output)
- Zero-shot success (first attempt)
- Repair success rate (after error feedback)
- Token efficiency (AILANG vs Python)
### 🔮 Future Enhancements (v0.4+)
**Structural Validation**:
```bash
ailang eval-suite --validate-structure
# Check:
# - Effect declarations match usage
# - Pattern matches are exhaustive
# - No global state mutations
# - Type safety (all field accesses valid)
```
**Determinism Tracking**:
```bash
ailang eval-suite --runs=10 --measure-determinism
# Run each benchmark 10x with same seed
# Report: code hash consistency, output consistency
```
**Normalization Testing**:
```bash
ailang eval-suite --test-normalization
# Generate code → normalize → compare
# Measure: % of semantically equivalent code that normalizes identically
```
---
## Recommended Eval Enhancements
### Phase 1: Static Analysis (v0.3.15)
Add validation hooks to check generated code properties:
```go
// internal/eval_harness/validators.go
type StructuralValidator interface {
ValidateEffects(code string) (bool, []string)
ValidateExhaustiveness(code string) (bool, []string)
ValidateImmutability(code string) (bool, []string)
}
```
Usage:
```bash
ailang eval-suite --validate=effects,exhaustiveness,immutability
```
### Phase 2: Multi-Run Consistency (v0.3.16)
```bash
ailang eval-suite --determinism-runs=10
# Generates:
# - determinism_report.json (per-benchmark consistency scores)
# - code_variance.md (shows variations in generated code)
```
### Phase 3: Normalization Baseline (v0.4.0)
```bash
ailang eval-baseline --with-normalization
# For each benchmark:
# 1. Generate code (all models)
# 2. Normalize each variant
# 3. Measure semantic equivalence
# 4. Report: normalizer lift (+Xpp improvement)
```
---
## Metrics Tracking
### Current Dashboard (`docs/static/benchmarks/latest.json`)
```json
{
"aggregates": {
"finalSuccess": 0.64,
"zeroShotSuccess": 0.59,
"repairSuccessRate": 0.14
}
}
```
### Enhanced Dashboard (Future)
```json
{
"aggregates": {
"finalSuccess": 0.64,
"typeCorrectness": 0.89, // NEW: % passing type check
"effectSafety": 0.94, // NEW: % with correct effects
"determinismRate": 0.91, // NEW: consistent outputs
"normalizerLift": 0.20 // NEW: consistency improvement
},
"visionAlignment": {
"explicitState": 0.85, // explicit_state_threading success
"canonicalForm": 0.78, // deterministic_list_transform success
"effectTracking": 0.72, // effect_* benchmarks avg
"totalFunctions": 0.88, // exhaustive_* benchmarks avg
"typeSafety": 0.89, // type_safe_* benchmarks avg
"referentialTransparency": 0.91 // referential_transparency success
}
}
```
---
## Running Vision Benchmarks
### Test All Vision Goals
```bash
# Run all vision-aligned benchmarks:
ailang eval-suite \
--benchmarks explicit_state_threading,deterministic_list_transform,effect_tracking_io_fs,effect_pure_separation,effect_composition,exhaustive_pattern_matching,type_safe_record_access,referential_transparency,canonical_normalization,no_runtime_crashes_option,immutable_data_structures \
--models gpt5,claude-sonnet-4-5,gemini-2-5-pro
# Generate vision-specific report:
ailang eval-report eval_results/vision_baseline v0.3.14 --format=vision
```
### Compare AILANG vs Python on Vision Goals
```bash
# Run with both languages:
ailang eval-suite --benchmarks explicit_state_threading --langs python,ailang
# Expected results:
# Python: May use global state (non-idiomatic but works)
# AILANG: Forces explicit state threading (only way to do it)
```
---
## Success Criteria
Based on vision doc targets:
| Metric | Target | Benchmark | Status |
|--------|--------|-----------|--------|
| **Compile Success** | ≥67% | All benchmarks | ⏳ Measuring |
| **Type Correctness** | ≥98% | type_safe_* | 🔮 Need validation |
| **Effect Safety** | ≥94% | effect_* | 🔮 Need validation |
| **Determinism Rate** | ≥95% | referential_transparency | 🔮 Need multi-run |
| **Model Pass Rate** | ≥70% | All benchmarks | ⏳ Measuring |
| **Normalizer Lift** | +20pp | canonical_normalization | 🔮 Need v0.4 |
**Next Steps**:
1. ✅ Run current eval suite on new vision benchmarks
2. 🔮 Add structural validators (v0.3.15)
3. 🔮 Implement multi-run determinism testing (v0.3.16)
4. 🔮 Add normalization baseline (v0.4.0)
---
## Interpreting Results
### High Success = Vision Goal Achieved
If a vision benchmark has >90% success:
- ✅ AI models understand the concept
- ✅ AILANG syntax supports it well
- ✅ Teaching prompt is effective
### Low Success = Opportunity for Improvement
If a vision benchmark has <50% success:
- 🔧 May need syntax improvements
- 🔧 May need better teaching examples
- 🔧 May need tooling support (error messages, suggestions)
- 📊 Still valuable: shows gap between vision and current AI capability
### Python vs AILANG Comparison
**Metrics to watch**:
1. **Token Efficiency**: AILANG should use fewer tokens (more concise)
2. **First-Attempt Success**: AILANG may be lower initially (unfamiliar syntax)
3. **Type Safety**: AILANG should catch errors at compile time (Python: runtime)
4. **Determinism**: AILANG code should be more consistent across runs
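Metric 1 (token efficiency) can be sketched as relative savings, assuming per-language token counts are recorded in the eval results (the helper below is hypothetical):

```python
def token_efficiency(ailang_tokens, python_tokens):
    """Relative savings: positive means AILANG used fewer tokens than Python."""
    return round(1 - ailang_tokens / python_tokens, 2)

print(token_efficiency(120, 160))  # AILANG used 25% fewer tokens -> 0.25
```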
---
## Adding New Vision Benchmarks
Template:
```yaml
id: vision_goal_name
description: "Brief explanation of vision goal being tested"
languages: ["python", "ailang"]
entrypoint: "main"
caps: ["IO"] # or ["IO", "FS", "Net"] as needed
difficulty: "easy|medium|hard"
expected_gain: "low|medium|high|very_high"
task_prompt: |
Clearly describe the task.
Emphasize the vision property being tested:
- Explicit state
- Effect tracking
- Exhaustiveness
- Type safety
- Referential transparency
Output only the code, no explanations.
expected_stdout: |
Expected output
```
**Guidelines**:
- Focus on observable behavior (stdout)
- Highlight properties that differentiate AILANG from Python
- Make success/failure unambiguous
- Include edge cases that test totality
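The guidelines above can be enforced mechanically. A hypothetical validator sketch using the field names from the template (the rule set is an assumption, not the harness's actual checks):

```python
REQUIRED = {"id", "description", "languages", "entrypoint", "caps",
            "difficulty", "task_prompt", "expected_stdout"}
ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}

def validate_benchmark(spec):
    """Return a list of problems; an empty list means the spec looks valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - spec.keys())]
    if spec.get("difficulty") not in ALLOWED_DIFFICULTY:
        problems.append("difficulty must be easy|medium|hard")
    return problems

spec = {"id": "vision_goal_name", "description": "...",
        "languages": ["python", "ailang"], "entrypoint": "main",
        "caps": ["IO"], "difficulty": "medium",
        "task_prompt": "...", "expected_stdout": "..."}
print(validate_benchmark(spec))  # -> []
```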
```
### resources/design_doc_structure.md
```markdown
# Design Document Structure Guide
Complete reference for AILANG design documents.
## Template Breakdown
### Header Section
```markdown
# [Feature Name]
**Status**: Planned | Implemented
**Target**: v0.4.0
**Priority**: P0 (High) | P1 (Medium) | P2 (Low)
**Estimated**: 3 days
**Dependencies**: None | Feature X, Feature Y
```
**Purpose**: Quick metadata for understanding scope and priority.
**Tips:**
- Use title case for feature name
- Status starts as "Planned", changes to "Implemented" when done
- Target should be specific version (v0.X.Y)
- Priority determines scheduling (P0 = next release, P1 = soon, P2 = eventually)
- Estimated should be realistic (2x initial guess)
- Dependencies help with scheduling
### Problem Statement
```markdown
## Problem Statement
[What problem does this solve? Why is it needed?]
**Current State:**
- Pain point 1 (with metrics)
- Pain point 2 (with data)
- Pain point 3 (with impact)
**Impact:**
- Who is affected? (developers, users, AI models)
- How significant? (blocker, annoyance, nice-to-have)
```
**Purpose**: Justify why this feature matters.
**Tips:**
- Include real metrics (hours, lines of code, error rates)
- Be specific about pain points
- Quantify impact when possible
- Link to issues or discussions if available
**Good example:**
```markdown
Adding a builtin function currently takes **7.5 hours** due to:
- Scattered registration across 4 files (2+ hours debugging)
- Verbose type construction (1+ hour trial-and-error)
- Poor error messages (1+ hour debugging)
**Impact:** Slows language evolution, hard for new contributors.
```
**Bad example:**
```markdown
Development is slow and hard.
```
### Goals
```markdown
## Goals
**Primary Goal:** [One sentence main objective with measurable outcome]
**Success Metrics:**
- Metric 1 (e.g., reduce time from X to Y)
- Metric 2 (e.g., improve coverage from X% to Y%)
- Metric 3 (e.g., reduce files from X to Y)
```
**Purpose**: Define what success looks like.
**Tips:**
- Primary goal should be measurable
- Success metrics should be objective
- Use before/after comparisons
- Include 3-5 metrics maximum
**Good example:**
```markdown
**Primary Goal:** Reduce builtin development time from 7.5h to 2.5h (-67%)
**Success Metrics:**
- Files touched: 4 → 1 (-75%)
- Type construction LOC: 35 → 10 (-71%)
- Registration errors caught at compile/startup
```
**Bad example:**
```markdown
**Primary Goal:** Make things better
**Success Metrics:**
- Easier to use
- Faster
```
### Solution Design
```markdown
## Solution Design
### Overview
[High-level description in 2-3 paragraphs]
### Architecture
[Technical approach, component breakdown]
**Components:**
1. **Component 1**: Description + responsibility
2. **Component 2**: Description + responsibility
3. **Component 3**: Description + responsibility
### Implementation Plan
**Phase 1: Foundation** (~X hours)
- [ ] Task 1 with clear deliverable
- [ ] Task 2 with clear deliverable
- [ ] Task 3 with clear deliverable
**Phase 2: Integration** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
**Phase 3: Polish** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
```
**Purpose**: Explain how you'll solve the problem.
**Tips:**
- Overview should be understandable by non-experts
- Break into 3-5 phases maximum
- Each task should have a clear definition of done
- Estimate each phase separately
- Use checkboxes for tracking
**Good example:**
```markdown
### Overview
Create a central builtin registry where all builtins are registered once with:
- Function implementation
- Type signature (using builder DSL)
- Arity and effect metadata
- Validation on startup
### Architecture
**Components:**
1. **BuiltinSpec**: Struct holding all builtin metadata
2. **Registry**: Map of name → spec with validation
3. **Type Builder**: Fluent DSL for type construction
4. **Validator**: Startup checks for consistency
### Implementation Plan
**Phase 1: Registry Core** (~4 hours)
- [ ] Create BuiltinSpec struct
- [ ] Implement Register() function
- [ ] Add validation logic
- [ ] Write unit tests (100% coverage)
```
### Files to Modify/Create
```markdown
### Files to Modify/Create
**New files:**
- `internal/builtins/spec.go` (~300 LOC) - BuiltinSpec struct and registry
- `internal/builtins/validator.go` (~150 LOC) - Validation logic
- `internal/types/builder.go` (~200 LOC) - Type builder DSL
**Modified files:**
- `internal/runtime/builtins.go` (+50/-100 LOC) - Use registry instead of manual registration
- `internal/link/builtin_module.go` (+20/-200 LOC) - Read types from registry
```
**Purpose**: Help estimate scope and identify affected areas.
**Tips:**
- List all files that will change
- Estimate LOC for new files
- Show +/- LOC for modified files
- Include purpose for each file
- Group by package
### Examples
```markdown
## Examples
### Example 1: Adding a String Builtin
**Before (scattered across 4 files):**
```go
// internal/effects/string.go
func strReverse(...) { ... }
// internal/runtime/builtins.go
case "_str_reverse": result = effects.strReverse(...)
// internal/builtins/registry.go
"_str_reverse": {NumArgs: 1, IsPure: true}
// internal/link/builtin_module.go
"_str_reverse": &types.TFunc2{...} // 35 lines
```
**After (single file):**
```go
// internal/builtins/register.go
RegisterEffectBuiltin(BuiltinSpec{
Name: "_str_reverse",
NumArgs: 1,
IsPure: true,
Type: func() types.Type {
T := types.NewBuilder()
return T.Func(T.String()).Returns(T.String())
},
Impl: strReverseImpl,
})
```
```
**Purpose**: Show concrete impact of the change.
**Tips:**
- Use real code examples
- Show before/after comparison
- Include 2-3 different use cases
- Make examples runnable
- Highlight key improvements
### Success Criteria
```markdown
## Success Criteria
- [ ] All builtins migrated to new registry
- [ ] `ailang doctor builtins` validates registration
- [ ] Type Builder DSL reduces construction from 35 → 10 LOC
- [ ] Development time measured at <3 hours for new builtin
- [ ] All tests passing (90%+ coverage on new code)
- [ ] Documentation updated (CLAUDE.md, ADDING_BUILTINS.md)
- [ ] Examples working (`make verify-examples`)
```
**Purpose**: Define what "done" means.
**Tips:**
- Use checkboxes
- Make criteria objective and testable
- Include test coverage requirements
- Always include "All tests passing"
- Always include "Documentation updated"
- Include performance/metric targets
### Testing Strategy
```markdown
## Testing Strategy
**Unit tests:**
- Registry validation (invalid specs rejected)
- Type Builder DSL (all type combinations)
- Each builtin impl (hermetic, mocked effects)
**Integration tests:**
- End-to-end builtin calls from REPL
- Cross-module builtin usage
- Error propagation
**Manual testing:**
- Add new builtin in <3 hours
- Run `ailang doctor builtins`
- Check REPL `:type` output
```
**Purpose**: Ensure quality and catch regressions.
**Tips:**
- Break into unit/integration/manual
- List what each test level covers
- Include acceptance tests
- Specify coverage targets
- Mention test files to create
### Non-Goals
```markdown
## Non-Goals
**Not in this feature:**
- Auto-generated documentation from specs - [Deferred to M-DX2]
- Hot-reload builtins for development - [Out of scope]
- Performance benchmarks - [Not critical]
- CI checks for legacy patterns - [Nice-to-have]
```
**Purpose**: Set boundaries and manage expectations.
**Tips:**
- List what you're NOT doing
- Explain why (deferred, out of scope, not valuable)
- Link to future work if applicable
- Prevents scope creep
### Timeline
```markdown
## Timeline
**Week 1** (12 hours):
- Phase 1: Registry core (4h)
- Phase 2: Type Builder (3h)
- Phase 3: Validation (2h)
- Testing + Polish (3h)
**Week 2** (8 hours):
- Migrate 10 builtins (5h)
- Documentation (2h)
- Final testing (1h)
**Total: ~20 hours across 2 weeks**
```
**Purpose**: Realistic scheduling and resource planning.
**Tips:**
- Break by week or phase
- Include buffer time
- Account for testing and docs
- 2x your initial estimates
- Show cumulative total
### Risks & Mitigations
```markdown
## Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|-----------|
| Migration breaks existing builtins | High | Incremental migration, test after each builtin |
| Type Builder too complex | Medium | Start simple, add features as needed |
| Performance regression | Low | Benchmark before/after, registry is cached |
```
**Purpose**: Identify potential issues early.
**Tips:**
- List 3-5 main risks
- Rate impact (High/Medium/Low)
- Include concrete mitigation plan
- Update as risks change
### Axiom Compliance
```markdown
## Axiom Compliance
**Canonical reference:** [Design Axioms](/docs/references/axioms)
### Scoring
| Axiom | Score | Justification |
|-------|-------|---------------|
| A1: Determinism | +1 | Enables reproducible X |
| A2: Replayability | 0 | No impact |
| A3: Effect Legibility | +1 | Makes Y effects explicit |
| A4: Explicit Authority | 0 | No capability changes |
| A5: Bounded Verification | +1 | Enables local Z checks |
| A6: Safe Concurrency | 0 | No concurrency impact |
| A7: Machines First | +1 | Reduces token cost for AI |
| A8: Minimal Syntax | 0 | No new syntax |
| A9: Cost Visibility | 0 | No resource changes |
| A10: Composability | +1 | Composes with existing W |
| A11: Structured Failure | 0 | No error handling changes |
| A12: System Boundary | 0 | No boundary changes |
**Net Score: +5** ✅ Proceed to implementation
### Hard Violation Check
- [ ] A1 (Determinism): No implicit nondeterminism introduced
- [ ] A3 (Effects): No hidden side effects
- [ ] A4 (Authority): No ambient access granted
- [ ] A7 (Machines First): Not optimizing for human convenience over machine analysis
```
**Purpose**: Verify feature aligns with AILANG's core principles.
**Tips:**
- Score EVERY axiom (use 0 for no impact)
- Justify non-zero scores with specific reasoning
- Hard violations (-1 on A1, A3, A4, A7) require redesign
- Net score ≥ +2 to proceed
- Include the hard violation checklist
**Decision thresholds:**
- **≥ +2**: Proceed to draft
- **0 to +1**: Needs stronger justification
- **< 0**: Reject or redesign
- **Any -1 on A1, A3, A4, A7**: Automatic rejection
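The decision thresholds can be expressed as a small helper (a sketch using the axiom IDs and thresholds from this guide):

```python
HARD_AXIOMS = {"A1", "A3", "A4", "A7"}  # any -1 here is an automatic rejection

def axiom_decision(scores):
    """scores: dict of axiom id -> -1/0/+1. Returns the scheduling decision."""
    if any(scores.get(a, 0) == -1 for a in HARD_AXIOMS):
        return "reject (hard violation)"
    net = sum(scores.values())
    if net >= 2:
        return "proceed"
    if net >= 0:
        return "needs stronger justification"
    return "reject or redesign"

print(axiom_decision({"A1": 1, "A3": 1, "A5": 1, "A7": 1, "A10": 1}))  # net +5 -> proceed
```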
### References
```markdown
## References
- **Motivation**: `design_docs/planned/easier-ailang-dev.md`
- **Prior art**: Ruby DSL for types, Rust procedural macros
- **Related issues**: #42, #58
- **Discussion**: Slack thread (Oct 10, 2024)
- **Axiom reference**: [Design Axioms](/docs/references/axioms)
```
**Purpose**: Link to context and background.
**Tips:**
- Link to related design docs
- Reference issues/PRs
- Cite prior art or research
- Include discussion links
- Always include axiom reference link
### Future Work
```markdown
## Future Work
[Features that build on this:]
- Auto-documentation generation from BuiltinSpec
- LSP integration for builtin type hints
- Hot-reload for faster development iteration
```
**Purpose**: Capture ideas without committing to them.
**Tips:**
- List logical next steps
- Don't commit to timeline
- Link to non-goals if applicable
- Keep brief (1-2 sentences each)
## Implementation Report (for Implemented docs)
Add this section when moving to `implemented/`:
```markdown
## Implementation Report
**Completed**: 2024-10-15
**Version**: v0.3.10
### What Was Built
[Summary of actual implementation vs plan]
- What changed from original design
- Why changes were made
- What worked well
### Code Locations
**New files:**
- `internal/builtins/spec.go` (342 LOC) - BuiltinSpec and registry
- `internal/builtins/validator.go` (187 LOC) - Validation
- `internal/types/builder.go` (234 LOC) - Type Builder DSL
**Modified files:**
- `internal/runtime/builtins.go` (+45/-120 LOC) - Uses registry
- `internal/link/builtin_module.go` (+18/-215 LOC) - Reads from registry
### Test Coverage
- Unit tests: 57 passing
- Integration tests: 12 passing
- Coverage: 92% on new code
- Test files: `internal/builtins/*_test.go`, `internal/types/builder_test.go`
### Metrics
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Development time | 7.5h | 2.5h | -67% |
| Files to edit | 4 | 1 | -75% |
| Type construction LOC | 35 | 10 | -71% |
### Known Limitations
- REPL `:type` command not yet implemented (deferred to M-DX1.6)
- Enhanced diagnostics pending (M-DX1.7)
- `_json_encode` migration pending (complex ADT handling)
### Future Work
See design_docs/planned/m-dx1-future-polish.md for:
- M-DX1.6: REPL `:type` command
- M-DX1.7: Enhanced error diagnostics
- M-DX1.8: Comprehensive documentation
```
**Purpose**: Document what actually happened.
**Tips:**
- Include actual LOC and metrics
- Show before/after data
- List known limitations
- Link to future work
- Reference test files
- Note deviations from plan
## Common Mistakes to Avoid
### 1. Vague Problem Statements
❌ Bad:
```markdown
Development is hard.
```
✅ Good:
```markdown
Adding a builtin takes 7.5 hours due to scattered registration (4 files),
verbose types (35 LOC), and poor errors (1+ hour debugging).
```
### 2. Unmeasurable Goals
❌ Bad:
```markdown
Make development easier and faster.
```
✅ Good:
```markdown
Reduce builtin development time from 7.5h to 2.5h (-67%)
```
### 3. Missing Implementation Details
❌ Bad:
```markdown
Implement the feature.
```
✅ Good:
```markdown
**Phase 1**: Create BuiltinSpec struct (~4h)
- [ ] Define struct in internal/builtins/spec.go
- [ ] Add Register() function
- [ ] Implement validation
- [ ] Write tests (100% coverage)
```
### 4. No Examples
❌ Bad:
```markdown
The new system is better.
```
✅ Good:
```markdown
**Before (4 files, 35 LOC):**
[code example]
**After (1 file, 10 LOC):**
[code example]
```
### 5. Unrealistic Estimates
❌ Bad:
```markdown
Should take about 2 hours.
```
✅ Good:
```markdown
**Estimated**: 3 days (6h implementation + 4h testing + 2h docs + buffer)
```
### 6. Missing Success Criteria
❌ Bad:
```markdown
Feature is done when it works.
```
✅ Good:
```markdown
- [ ] All 49 builtins migrated
- [ ] `ailang doctor builtins` passes
- [ ] Development time < 3 hours (measured)
- [ ] 90%+ test coverage
- [ ] Documentation updated
```
## Version Folder Organization
```
design_docs/
├── planned/
│   ├── unversioned-feature.md        # Version TBD
│   └── v0_4_0/                       # Targeted for v0.4.0
│       ├── reflection-system.md
│       └── schema-registry.md
└── implemented/
    ├── v0_3_10/                      # Shipped in v0.3.10
    │   ├── M-DX1_developer_experience.md
    │   └── M-DASH.md
    ├── v0_3_12/                      # Shipped in v0.3.12
    │   └── show-function-recovery.md
    └── v0_3_14/                      # Shipped in v0.3.14
        └── feature-x.md
```
**Rules:**
- Use underscores in folder names: `v0_3_14`
- Match CHANGELOG.md tags exactly
- Create version folder when first doc needs it
- Move entire folder when version ships
- Keep planned/ docs unversioned if target unclear
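The folder-name rule (CHANGELOG tag with dots replaced by underscores) can be sketched as:

```python
def version_folder(tag):
    """Convert a CHANGELOG tag like 'v0.3.14' to a folder name 'v0_3_14'."""
    return tag.replace(".", "_")

print(version_folder("v0.3.14"))  # -> v0_3_14
```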
## Checklist
Use this when creating a new design doc:
**Before creating:**
- [ ] Feature is well-understood
- [ ] Priority and version target are known
- [ ] Similar features reviewed for patterns
- [ ] Dependencies identified
- [ ] **Axiom quick-check**: Does this obviously violate A1, A3, A4, or A7?
**When creating:**
- [ ] Run `create_planned_doc.sh`
- [ ] Fill in all header metadata
- [ ] Write specific problem statement with metrics
- [ ] Define measurable success criteria
- [ ] Break implementation into phases
- [ ] Include concrete before/after examples
- [ ] Estimate realistically (2x initial guess)
- [ ] List files to create/modify
- [ ] Add testing strategy
- [ ] Define non-goals
- [ ] **Complete Axiom Compliance section** (score all 12 axioms)
- [ ] **Verify net score ā„ +2** (or provide justification)
- [ ] **Check hard violations** (A1, A3, A4, A7 must not be ā1)
**After creating:**
- [ ] Review for clarity and completeness
- [ ] Verify axiom scores are justified
- [ ] Get feedback from others if major feature
- [ ] Commit to git
- [ ] Link from related docs if needed
**When moving to implemented:**
- [ ] Run `move_to_implemented.sh`
- [ ] Add implementation report
- [ ] Include actual metrics
- [ ] List known limitations
- [ ] Update design_docs/README.md
- [ ] Update CHANGELOG.md
- [ ] Commit and delete from planned/
```
### scripts/create_planned_doc.sh
```bash
#!/usr/bin/env bash
set -euo pipefail
# Create a new design document in design_docs/planned/
#
# Usage: create_planned_doc.sh <doc-name> [version]
# doc-name: Lowercase with hyphens (e.g., m-dx2-feature-name)
# version: Optional version folder (e.g., v0_4_0)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
DESIGN_DOCS_DIR="$PROJECT_ROOT/design_docs"
# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color
# Get current version from CHANGELOG.md
get_current_version() {
local CHANGELOG="$PROJECT_ROOT/CHANGELOG.md"
if [ -f "$CHANGELOG" ]; then
grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' "$CHANGELOG" | head -1
else
echo "unknown"
fi
}
# Compute next patch version (e.g., v0.5.6 -> v0_5_7)
get_next_version_folder() {
local current="$1"
# Extract major.minor.patch
local version="${current#v}" # Remove 'v' prefix
local major=$(echo "$version" | cut -d. -f1)
local minor=$(echo "$version" | cut -d. -f2)
local patch=$(echo "$version" | cut -d. -f3)
# Increment patch
local next_patch=$((patch + 1))
# Return folder format (v0_5_7)
echo "v${major}_${minor}_${next_patch}"
}
CURRENT_VERSION=$(get_current_version)
NEXT_VERSION_FOLDER=$(get_next_version_folder "$CURRENT_VERSION")
if [ $# -lt 1 ]; then
echo -e "${RED}❌ Error: Missing required argument${NC}"
echo ""
echo -e "${CYAN}Current AILANG version: $CURRENT_VERSION${NC}"
echo -e "${CYAN}Suggested next version: $NEXT_VERSION_FOLDER${NC}"
echo ""
echo "Usage: create_planned_doc.sh <doc-name> [version]"
echo ""
echo "Arguments:"
echo " doc-name Document name (lowercase-with-hyphens, e.g., m-dx2-better-errors)"
echo " version Optional version folder (e.g., $NEXT_VERSION_FOLDER)"
echo ""
echo "Examples:"
echo " create_planned_doc.sh m-dx2-better-errors"
echo " create_planned_doc.sh reflection-system $NEXT_VERSION_FOLDER"
exit 1
fi
DOC_NAME="$1"
VERSION="${2:-}"
# Convert doc-name to search query (replace hyphens with spaces, remove m-prefix)
search_query() {
local name="$1"
# Remove common prefixes like m-dx1, m-perf2, etc.
local cleaned="${name#m-}"
cleaned="${cleaned#[a-z]*[0-9]-}"
# Replace hyphens with spaces
echo "${cleaned//-/ }"
}
SEARCH_QUERY=$(search_query "$DOC_NAME")
# Check for related design docs before creating
# Run both SimHash (instant) and Neural (better quality), combine unique results
echo -e "${CYAN}🔍 Searching for related design docs...${NC}"
echo ""
# Check if ailang is available
if command -v ailang &> /dev/null || [ -x "$PROJECT_ROOT/bin/ailang" ]; then
AILANG_CMD="${PROJECT_ROOT}/bin/ailang"
if ! [ -x "$AILANG_CMD" ]; then
AILANG_CMD="ailang"
fi
# Function to merge and deduplicate results (neural results take priority)
merge_results() {
local simhash="$1"
local neural="$2"
# Combine, dedupe by path. Neural listed first so its scores take priority.
# Format: "1. path/to/file.md (0.85)" - $2 is the path
{ echo "$neural"; echo "$simhash"; } | grep -E "^[0-9]+\." | \
awk '{ path = $2; if (path && !seen[path]++) print }' | head -5
}
# --- IMPLEMENTED DOCS ---
echo -e "${YELLOW}Implemented docs matching \"$SEARCH_QUERY\":${NC}"
# SimHash first (instant feedback)
echo -e " ${CYAN}[SimHash - instant]${NC}"
IMPL_SIMHASH=$("$AILANG_CMD" docs search --stream implemented --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$IMPL_SIMHASH" ]; then
echo "$IMPL_SIMHASH" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Neural search (better quality)
echo -e " ${CYAN}[Neural - semantic matching]${NC}"
IMPL_NEURAL=$("$AILANG_CMD" docs search --stream implemented --neural --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$IMPL_NEURAL" ]; then
echo "$IMPL_NEURAL" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Merge for template (neural preferred)
IMPLEMENTED=$(merge_results "$IMPL_SIMHASH" "$IMPL_NEURAL")
[ -z "$IMPLEMENTED" ] && IMPLEMENTED=" (none found)"
echo ""
# --- PLANNED DOCS ---
echo -e "${YELLOW}Planned docs matching \"$SEARCH_QUERY\":${NC}"
# SimHash first (instant feedback)
echo -e " ${CYAN}[SimHash - instant]${NC}"
PLAN_SIMHASH=$("$AILANG_CMD" docs search --stream planned --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$PLAN_SIMHASH" ]; then
echo "$PLAN_SIMHASH" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Neural search (better quality)
echo -e " ${CYAN}[Neural - semantic matching]${NC}"
PLAN_NEURAL=$("$AILANG_CMD" docs search --stream planned --neural --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$PLAN_NEURAL" ]; then
echo "$PLAN_NEURAL" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Merge for template (neural preferred)
PLANNED=$(merge_results "$PLAN_SIMHASH" "$PLAN_NEURAL")
[ -z "$PLANNED" ] && PLANNED=" (none found)"
echo ""
# Show info if matches found (no confirmation required - proceed automatically)
if [ "$IMPLEMENTED" != " (none found)" ] || [ "$PLANNED" != " (none found)" ]; then
echo -e "${CYAN}ℹ Related docs found above - review them after creation if needed.${NC}"
echo ""
fi
else
echo -e "${YELLOW}⚠ ailang not found - skipping related doc search${NC}"
echo " Build with: make build"
echo ""
fi
# Determine target directory
if [ -n "$VERSION" ]; then
TARGET_DIR="$DESIGN_DOCS_DIR/planned/$VERSION"
mkdir -p "$TARGET_DIR"
else
TARGET_DIR="$DESIGN_DOCS_DIR/planned"
fi
DOC_PATH="$TARGET_DIR/${DOC_NAME}.md"
# Check if document already exists
if [ -f "$DOC_PATH" ]; then
echo -e "${YELLOW}⚠ Warning: Document already exists at $DOC_PATH${NC}"
echo ""
read -p "Overwrite? (y/N) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Cancelled."
exit 1
fi
fi
# Get current date
CURRENT_DATE=$(date +%Y-%m-%d)
# Create document from template
cat > "$DOC_PATH" <<'EOF'
# [Feature Name]
**Status**: Planned
**Target**: [Version, e.g., v0.4.0]
**Priority**: [P0/P1/P2 - High/Medium/Low]
**Estimated**: [Time estimate, e.g., 2 days]
**Dependencies**: [None or list other features]
## Axiom Compliance
**Canonical reference:** [Design Axioms](/docs/references/axioms)
Every feature must align with AILANG's 12 Design Axioms. Score each axiom and verify no hard violations.
### Axiom Scoring
| Axiom | Score | Justification |
|-------|-------|---------------|
| A1: Determinism | [+1/0/-1] | [e.g., "Enables reproducible traces"] |
| A2: Replayability | [+1/0/-1] | [e.g., "No impact on traces"] |
| A3: Effect Legibility | [+1/0/-1] | [e.g., "Makes IO effects explicit"] |
| A4: Explicit Authority | [+1/0/-1] | [e.g., "Enforces capability constraints"] |
| A5: Bounded Verification | [+1/0/-1] | [e.g., "Enables local type checks"] |
| A6: Safe Concurrency | [+1/0/-1] | [e.g., "No concurrency changes"] |
| A7: Machines First | [+1/0/-1] | [e.g., "Reduces AI token cost"] |
| A8: Minimal Syntax | [+1/0/-1] | [e.g., "No new syntax required"] |
| A9: Cost Visibility | [+1/0/-1] | [e.g., "Resource costs remain visible"] |
| A10: Composability | [+1/0/-1] | [e.g., "Composes with existing effects"] |
| A11: Structured Failure | [+1/0/-1] | [e.g., "Errors remain typed"] |
| A12: System Boundary | [+1/0/-1] | [e.g., "Boundary crossings explicit"] |
**Net Score: [Total]** → **Decision: [Move forward / Reject / Redesign]**
### Hard Violation Check
**These axioms cannot have -1 scores (automatic rejection):**
- [ ] A1 (Determinism): No implicit nondeterminism introduced
- [ ] A3 (Effects): No hidden side effects
- [ ] A4 (Authority): No ambient access granted
- [ ] A7 (Machines First): Not optimizing for human convenience over machine analysis
### Decision Thresholds
| Net Score | Decision |
|-----------|----------|
| ≥ +2 | ✅ Proceed to implementation |
| 0 to +1 | ⚠️ Needs stronger justification |
| < 0 | ❌ Reject or redesign |
| Any -1 on A1/A3/A4/A7 | ❌ Automatic rejection |
## Problem Statement
[What problem does this solve? Why is it needed?]
**Current State:**
- [Describe current pain points]
- [Include metrics if available]
**Impact:**
- [Who is affected?]
- [How significant is the problem?]
## Goals
**Primary Goal:** [Main objective in one sentence]
**Success Metrics:**
- [Measurable outcome 1]
- [Measurable outcome 2]
- [Measurable outcome 3]
## Solution Design
### Overview
[High-level description of the solution]
### Architecture
[Describe the technical approach]
**Components:**
1. **Component 1**: [Description]
2. **Component 2**: [Description]
3. **Component 3**: [Description]
### Implementation Plan
**Phase 1: [Name]** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
**Phase 2: [Name]** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
**Phase 3: [Name]** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
### Files to Modify/Create
**New files:**
- `path/to/new_file.go` - [Purpose, ~XXX LOC]
**Modified files:**
- `path/to/existing_file.go` - [Changes needed, ~XXX LOC]
## Examples
### Example 1: [Use Case]
**Before:**
```
[Code or workflow before the change]
```
**After:**
```
[Code or workflow after the change]
```
### Example 2: [Use Case]
[Additional examples as needed]
## Success Criteria
- [ ] Criterion 1 (with acceptance test)
- [ ] Criterion 2 (with acceptance test)
- [ ] Criterion 3 (with acceptance test)
- [ ] All tests passing
- [ ] Documentation updated
- [ ] Examples added
## Testing Strategy
**Unit tests:**
- [What to test]
**Integration tests:**
- [What to test]
**Manual testing:**
- [What to verify manually]
## Non-Goals
**Not in this feature:**
- [Thing 1] - [Why deferred]
- [Thing 2] - [Why out of scope]
## Timeline
**Week 1** (X hours):
- Phase 1 implementation
**Week 2** (X hours):
- Phase 2 implementation
- Testing
**Week 3** (X hours):
- Phase 3 implementation
- Documentation
- Release
**Total: ~X hours across Y weeks**
## Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|-----------|
| [Risk 1] | [High/Med/Low] | [How to address] |
| [Risk 2] | [High/Med/Low] | [How to address] |
## Related Documents
<!-- Auto-populated by Ollama neural search on "SEARCH_QUERY_PLACEHOLDER" -->
**Implemented (may inform design):**
RELATED_IMPLEMENTED_PLACEHOLDER
**Planned (check for overlap):**
RELATED_PLANNED_PLACEHOLDER
## References
- [Design Axioms](/docs/references/axioms) - The 12 non-negotiable principles
- [Philosophical Foundations](/docs/references/philosophical-foundations) - Block-universe determinism
- [Design Lineage](/docs/references/design-lineage) - What we adopted/rejected and why
- [Link to related design docs]
- [Link to issues or discussions]
## Future Work
[Features that build on this but are out of scope for now]
---
**Document created**: CURRENT_DATE
**Last updated**: CURRENT_DATE
EOF
# Replace CURRENT_DATE with actual date
sed -i.bak "s/CURRENT_DATE/$CURRENT_DATE/g" "$DOC_PATH"
rm "${DOC_PATH}.bak"
# Replace search query placeholder
sed -i.bak "s/SEARCH_QUERY_PLACEHOLDER/$SEARCH_QUERY/g" "$DOC_PATH"
rm "${DOC_PATH}.bak"
# Format search results for markdown
format_results() {
local results="$1"
if [ "$results" = " (none found)" ] || [ -z "$results" ]; then
echo "- (none found)"
else
# Convert numbered list to markdown links
echo "$results" | head -3 | while read -r line; do
# Extract path and score from format "1. path/to/doc.md (0.85)"
local path=$(echo "$line" | sed 's/^[0-9]*\. //' | sed 's/ ([0-9.]*)//')
local score=$(echo "$line" | grep -oE '\([0-9.]+\)' || echo "")
if [ -n "$path" ]; then
echo "- [$path]($path) $score"
fi
done
fi
}
# Replace related docs placeholders
IMPL_FORMATTED=$(format_results "$IMPLEMENTED")
PLAN_FORMATTED=$(format_results "$PLANNED")
# Use perl for multi-line replacement (more portable than sed)
if ! perl -i -pe "s|RELATED_IMPLEMENTED_PLACEHOLDER|$IMPL_FORMATTED|g" "$DOC_PATH" 2>/dev/null; then
  sed -i.bak "s|RELATED_IMPLEMENTED_PLACEHOLDER|$IMPL_FORMATTED|g" "$DOC_PATH" && rm -f "${DOC_PATH}.bak"
fi
if ! perl -i -pe "s|RELATED_PLANNED_PLACEHOLDER|$PLAN_FORMATTED|g" "$DOC_PATH" 2>/dev/null; then
  sed -i.bak "s|RELATED_PLANNED_PLACEHOLDER|$PLAN_FORMATTED|g" "$DOC_PATH" && rm -f "${DOC_PATH}.bak"
fi
# Convert to relative path for coordinator marker
RELATIVE_PATH="${DOC_PATH#$PROJECT_ROOT/}"
# Success message
echo -e "${GREEN}✅ Created design document:${NC}"
echo " $DOC_PATH"
echo ""
echo -e "${CYAN}Version context:${NC}"
echo " Current AILANG: $CURRENT_VERSION"
echo " Next version: $NEXT_VERSION_FOLDER"
if [ -n "$VERSION" ]; then
echo " Doc target: $VERSION"
fi
echo ""
echo -e "${GREEN}Next steps:${NC}"
echo " 1. Edit $DOC_PATH to fill in the template"
echo " 2. Replace [placeholders] with actual content"
echo " 3. Commit when ready: git add $DOC_PATH"
echo ""
echo -e "${YELLOW}Pro tips:${NC}"
echo " - Complete the Axiom Compliance section (score all 12 axioms)"
echo " - Hard violations on A1/A3/A4/A7 = automatic rejection"
echo " - Net axiom score must be ≥ +2 to proceed"
echo " - Use M-XXX naming for milestone features"
echo " - Include concrete examples and metrics"
echo " - Keep estimates realistic (2x your initial guess)"
echo ""
# Output coordinator markers (deterministic - script knows exactly what was created)
echo "---"
echo "DESIGN_DOC_PATH: $RELATIVE_PATH"
```
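The `format_results` helper in the script above turns a search result line such as `1. path/to/doc.md (0.85)` into a markdown link via a sed/grep pipeline. As a minimal standalone sketch of that parsing logic (the sample line is illustrative, the expressions are the ones the script uses):

```shell
# A sample line in the format produced by the embeddings search
line="1. planned/error-messages.md (0.85)"

# Strip the leading "N. " index, then the trailing " (score)"
path=$(echo "$line" | sed 's/^[0-9]*\. //' | sed 's/ ([0-9.]*)//')

# Pull the "(score)" token back out on its own
score=$(echo "$line" | grep -oE '\([0-9.]+\)' || echo "")

echo "- [$path]($path) $score"
# → - [planned/error-messages.md](planned/error-messages.md) (0.85)
```

Running a line like this through the pipeline before wiring it into a template is a quick way to confirm the sed patterns match the search tool's actual output format.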
### scripts/move_to_implemented.sh
```bash
#!/usr/bin/env bash
set -euo pipefail
# Move a design document from planned/ to implemented/
#
# Usage: move_to_implemented.sh <doc-name> <version>
# doc-name: Name of the doc in planned/ (without .md extension)
# version: Target version folder (e.g., v0_3_14)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
DESIGN_DOCS_DIR="$PROJECT_ROOT/design_docs"
# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color
if [ $# -lt 2 ]; then
echo -e "${RED}ā Error: Missing required arguments${NC}"
echo ""
echo "Usage: move_to_implemented.sh <doc-name> <version>"
echo ""
echo "Arguments:"
echo " doc-name Document name in planned/ (without .md)"
echo " version Version folder (e.g., v0_3_14)"
echo ""
echo "Examples:"
echo " move_to_implemented.sh m-dx1-developer-experience v0_3_10"
echo " move_to_implemented.sh reflection-system v0_4_0"
exit 1
fi
DOC_NAME="$1"
VERSION="$2"
# Find source document (check both planned/ root and version folders)
SOURCE_PATH=""
if [ -f "$DESIGN_DOCS_DIR/planned/${DOC_NAME}.md" ]; then
SOURCE_PATH="$DESIGN_DOCS_DIR/planned/${DOC_NAME}.md"
elif [ -f "$DESIGN_DOCS_DIR/planned/v0_4_0/${DOC_NAME}.md" ]; then
SOURCE_PATH="$DESIGN_DOCS_DIR/planned/v0_4_0/${DOC_NAME}.md"
else
# Search for it
FOUND=$(find "$DESIGN_DOCS_DIR/planned" -name "${DOC_NAME}.md" 2>/dev/null | head -1)
if [ -n "$FOUND" ]; then
SOURCE_PATH="$FOUND"
else
echo -e "${RED}ā Error: Document not found in planned/: ${DOC_NAME}.md${NC}"
echo ""
echo "Available docs in planned/:"
find "$DESIGN_DOCS_DIR/planned" -name "*.md" -type f | sed 's|.*/||' | sort
exit 1
fi
fi
# Create target directory
TARGET_DIR="$DESIGN_DOCS_DIR/implemented/$VERSION"
mkdir -p "$TARGET_DIR"
TARGET_PATH="$TARGET_DIR/${DOC_NAME}.md"
# Check if already exists in implemented
if [ -f "$TARGET_PATH" ]; then
echo -e "${YELLOW}ā Warning: Document already exists in implemented/$VERSION/${NC}"
echo ""
read -p "Overwrite? (y/N) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Cancelled."
exit 1
fi
fi
# Get current date for metadata update
CURRENT_DATE=$(date +%Y-%m-%d)
# Copy file (don't delete from planned yet - let user review first)
cp "$SOURCE_PATH" "$TARGET_PATH"
# Update status in the document
if grep -q "^\*\*Status\*\*:" "$TARGET_PATH"; then
# Update existing status line
sed -i.bak "s/^\*\*Status\*\*:.*/\*\*Status\*\*: Implemented/" "$TARGET_PATH"
rm "${TARGET_PATH}.bak" 2>/dev/null || true
fi
# Update last updated date
if grep -q "^\*\*Last updated\*\*:" "$TARGET_PATH"; then
sed -i.bak "s/^\*\*Last updated\*\*:.*/\*\*Last updated\*\*: $CURRENT_DATE/" "$TARGET_PATH"
rm "${TARGET_PATH}.bak" 2>/dev/null || true
fi
# Success message
echo -e "${GREEN}ā Copied design document to implemented:${NC}"
echo " From: $SOURCE_PATH"
echo " To: $TARGET_PATH"
echo ""
echo -e "${YELLOW}Next steps:${NC}"
echo " 1. Review the document at $TARGET_PATH"
echo " 2. Add implementation report section:"
echo " - What was actually built"
echo " - Code locations (internal/...)"
echo " - Test coverage metrics"
echo " - Known limitations"
echo " 3. Update design_docs/README.md with version history"
echo " 4. Commit changes: git add $TARGET_PATH design_docs/README.md"
echo " 5. AFTER committing, delete original:"
echo " git rm $SOURCE_PATH"
echo ""
echo -e "${GREEN}Template for implementation report:${NC}"
echo ""
cat <<'REPORT'
## Implementation Report
**Completed**: [Date]
**Version**: [Version number from CHANGELOG]
### What Was Built
[Summary of what was actually implemented vs planned]
### Code Locations
**New files:**
- `internal/path/file.go` (XXX LOC) - [Purpose]
**Modified files:**
- `internal/path/existing.go` (+XX/-YY LOC) - [Changes]
### Test Coverage
- Unit tests: XX passing
- Integration tests: YY passing
- Coverage: ZZ%
- Test files: `internal/path/*_test.go`
### Metrics
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| [Metric 1] | X | Y | +Z% |
| [Metric 2] | X | Y | +Z% |
### Known Limitations
- [Limitation 1] - [Why/when to address]
- [Limitation 2] - [Why/when to address]
### Future Work
[Features deferred to later versions]
REPORT
```
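The metadata rewrite in `move_to_implemented.sh` depends on the `sed -i.bak` substitution matching a `**Status**:` line exactly at the start of a line. A quick sandbox check of that substitution, using a throwaway temp file rather than a real design doc:

```shell
# Create a throwaway doc with the metadata line the script expects
tmp=$(mktemp)
printf '%s\n' '**Status**: Planned' '**Last updated**: 2024-01-01' > "$tmp"

# The same substitution the script applies after copying to implemented/
sed -i.bak 's/^\*\*Status\*\*:.*/\*\*Status\*\*: Implemented/' "$tmp"
rm -f "${tmp}.bak"

grep '^\*\*Status\*\*:' "$tmp"   # → **Status**: Implemented
rm -f "$tmp"
```

The attached-suffix form `-i.bak` is used because it works under both GNU and BSD sed, which is why the scripts follow each in-place edit with an `rm` of the `.bak` file.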