Design Doc Creator
Create AILANG design documents in the correct format and location. Use when user asks to create a design doc, plan a feature, or document a design. Handles both planned/ and implemented/ docs with proper structure.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install sunholo-data-ailang-design-doc-creator
Repository
Skill path: .claude/skills/design-doc-creator
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Tech Writer, Designer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: sunholo-data.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install Design Doc Creator into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/sunholo-data/ailang before adding Design Doc Creator to shared team environments
- Use Design Doc Creator for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: Design Doc Creator
description: Create AILANG design documents in the correct format and location. Use when user asks to create a design doc, plan a feature, or document a design. Handles both planned/ and implemented/ docs with proper structure.
---
# Design Doc Creator
Create well-structured design documents for AILANG features following the project's conventions.
## Quick Start
**Most common usage:**
```bash
# User says: "Create a design doc for better error messages"
# This skill will:
# 1. AUTO-SEARCH for related design docs (Ollama neural embeddings)
# 2. Show matches from implemented/ and planned/ directories
# 3. Auto-populate "Related Documents" section in template
# 4. Proceed automatically (no confirmation needed)
# 5. Create design_docs/planned/better-error-messages.md
# 6. Fill template with proper structure
```
**Automatic Related Doc Search (v0.6.3+):**
When you run the create script, it automatically:
1. Converts doc name to search query (e.g., `m-dx2-better-errors` → `"better errors"`)
2. Runs **both** SimHash (instant) and Neural (better quality) searches
3. Shows results from both methods so you can compare
4. Merges unique results (neural preferred) for the template
5. Auto-populates the "Related Documents" section
```bash
$ .claude/skills/design-doc-creator/scripts/create_planned_doc.sh m-semantic-caching
🔍 Searching for related design docs...
Implemented docs matching "semantic caching":
[SimHash - instant]
1. design_docs/implemented/v0_4_0/monomorphization.md (1.00)
2. design_docs/implemented/v0_3_18/M-DX4-SPRINT-PLAN.md (0.95)
[Neural - semantic matching]
1. design_docs/implemented/v0_6_0/m-doc-sem-lazy-embeddings.md (0.45)
2. design_docs/implemented/v0_6_0/semantic-caching-complete.md (0.42)
Planned docs matching "semantic caching":
[SimHash - instant]
1. design_docs/planned/v0_7_0/M-REPL1_persistent_bindings.md (1.00)
[Neural - semantic matching]
1. design_docs/planned/v0_7_0/semantic-caching-future.md (0.50)
ℹ Related docs found above - review them after creation if needed.
```
**Why dual search?** SimHash is instant but keyword-dependent. Neural finds semantically related docs even when keywords don't match. You see both so you can judge which results are more relevant.
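As a rough illustration of that pipeline, the name-to-query conversion and the neural-preferred merge could look like the sketch below (hypothetical helpers; the actual script's logic may differ):

```python
import re

def doc_name_to_query(name: str) -> str:
    # Strip milestone prefixes like "m-" or "m-dx2-" and turn hyphens
    # into spaces, e.g. "m-dx2-better-errors" -> "better errors".
    name = re.sub(r'^m(-[a-z]+\d+)?-', '', name)
    return name.replace('-', ' ')

def merge_results(simhash: list[str], neural: list[str], limit: int = 3) -> list[str]:
    # Keep unique paths, listing neural matches first (neural preferred).
    merged: list[str] = []
    for path in neural + simhash:
        if path not in merged:
            merged.append(path)
    return merged[:limit]
```

For example, `doc_name_to_query("m-semantic-caching")` yields `"semantic caching"`, matching the search shown above.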
## When to Use This Skill
Invoke this skill when:
- User asks to "create a design doc" or "write a design doc"
- User says "plan a feature" or "design a feature"
- User mentions "document the design" or "create a spec"
- Before starting implementation of a new feature
- After completing a feature (to move to implemented/)
## Coordinator Integration
**When invoked by the AILANG Coordinator** (detected by GitHub issue reference in the prompt), you MUST output this marker at the end of your response:
```
DESIGN_DOC_PATH: design_docs/planned/vX_Y/design-doc-name.md
```
**Why?** The coordinator uses this marker to:
1. Read the design doc content for GitHub comments
2. Track artifacts across pipeline stages
3. Provide visibility to humans reviewing the issue
**Example completion:**
```
## Design Document Created
I've created the design document...
**DESIGN_DOC_PATH**: `design_docs/planned/v0_6_3/m-feature-design.md`
```
## Available Scripts
### `scripts/create_planned_doc.sh <doc-name> [version]`
Create a new design document in `design_docs/planned/`.
**Version Auto-Detection:**
The script automatically detects the current AILANG version from `CHANGELOG.md` and suggests the next version folder. This prevents accidentally placing docs in wrong version folders.
**Usage:**
```bash
# See current version and suggested target
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh
# Output: Current AILANG version: v0.5.6
# Suggested next version: v0_5_7
# Create doc in planned/ root (no version)
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh m-dx2-better-errors
# Create doc in next version folder (recommended)
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh reflection-system v0_5_7
```
**What it does:**
- **Searches for related docs** using `ailang docs search --neural` (Ollama embeddings)
- Shows top 3 matches from both `implemented/` and `planned/`
- Auto-populates "Related Documents" section with clickable links
- Detects current version from CHANGELOG.md
- Suggests next patch version for targeting
- Creates design doc from template
- Places in correct directory (planned/ or planned/VERSION/)
- Fills in creation date
- Shows version context in output
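The version auto-detection step can be sketched as follows, assuming CHANGELOG.md leads with a heading like `## [v0.5.6]` (an assumption; the real script may parse it differently):

```python
import re

def suggest_next_version(changelog_text: str) -> tuple[str, str]:
    # Find the first semver tag in the changelog and suggest the
    # next patch version as an underscore-named folder.
    m = re.search(r'v(\d+)\.(\d+)\.(\d+)', changelog_text)
    if not m:
        raise ValueError("no version heading found in CHANGELOG")
    major, minor, patch = map(int, m.groups())
    current = f"v{major}.{minor}.{patch}"
    next_folder = f"v{major}_{minor}_{patch + 1}"  # folder names use underscores
    return current, next_folder
```

Given `## [v0.5.6] - ...`, this returns `("v0.5.6", "v0_5_7")`, matching the script output shown below.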
### `scripts/move_to_implemented.sh <doc-name> <version>`
Move a design document from planned/ to implemented/ after completion.
**Usage:**
```bash
.claude/skills/design-doc-creator/scripts/move_to_implemented.sh m-dx1-developer-experience v0_3_10
```
**What it does:**
- Finds doc in planned/
- Copies to implemented/VERSION/
- Updates status to "Implemented"
- Updates last modified date
- Provides template for implementation report
- Keeps original until you verify and commit
## Workflow
### Creating a Planned Design Doc
**1. Gather Requirements**
Ask user:
- What feature are you designing?
- What version is this targeted for? (e.g., v0.4.0)
- What priority? (P0/P1/P2)
- Estimated effort? (e.g., 2 days, 1 week)
- Any dependencies on other features?
**⚠️ CRITICAL: Audit for Systemic Issues FIRST**
**Before writing a design doc for a bug fix, ALWAYS ask: "Is this part of a larger pattern?"**
**The Anti-Pattern (incremental special-casing):**
```
v1: Add feature for case A
v2: Bug! Add special case for B
v3: Bug! Add special case for C
v4: Bug! Add special case for D
...forever patching
```
**The Pattern to Follow (unified solutions):**
```
v1: Bug report for case B
BEFORE writing design doc:
1. Search for similar code paths
2. Check if A, C, D have same gap
3. Design ONE fix covering ALL cases
v2: Unified fix - no future bugs in this area
```
**Concrete Example (M-CODEGEN-UNIFIED-SLICE-CONVERTERS, Dec 2025):**
```
Bug reported: [SolarPlanet] return type panics
❌ Quick fix design doc: Add ConvertToSolarPlanetSlice
(Will need ConvertToAnotherRecordSlice later...)
✅ Systemic design doc: Audit ALL slice types
Found: []float64 ALSO broken!
Found: []*ADTType partially broken!
One unified fix covers all 3 gaps.
```
**Analysis Checklist (do BEFORE writing design doc):**
- [ ] Is this a one-off or part of a pattern?
- [ ] Search codebase for similar code paths
- [ ] Check if other types/cases have the same gap
- [ ] Look at git history - has this area been patched repeatedly?
- [ ] Design fix to cover ALL cases, not just the reported one
**🔍 Use `ailang docs search` to Check Existing Work:**
Before creating a design doc, search for existing implementations. **Always use `--neural` for best results** - it finds semantically related docs even when keywords don't match exactly.
```bash
# RECOMMENDED: Neural search (default for script, best quality)
ailang docs search --stream implemented --neural "feature description"
ailang docs search --stream planned --neural "feature description"
# Fallback: SimHash search (instant, but keyword-dependent)
ailang docs search --stream implemented "exact keywords"
# Search with JSON output for programmatic use
ailang docs search --stream implemented --neural --json "keywords"
```
**Why neural search is better:**
| Query | SimHash Result | Neural Result |
|-------|---------------|---------------|
| "semantic caching" | Irrelevant docs (keyword mismatch) | `semantic-caching-future.md` (exact match) |
| "type inference" | Docs with "type" in name | Conceptually related type system docs |
**Example workflow:**
```bash
# Before creating "lazy embeddings" design doc:
$ ailang docs search --stream implemented --neural "embedding cache"
🔍 Neural search: "embedding cache"
SimHash candidates: 200 (from 298 total docs)
Embeddings: 0 computed, 200 reused (model: ollama:embeddinggemma)
1. design_docs/implemented/v0_6_0/m-doc-sem-lazy-embeddings.md (0.45)
# ⚠️ Already implemented - review existing doc first
$ ailang docs search --stream planned --neural "lazy embedding"
🔍 Neural search: "lazy embedding"
No matching documents found.
# ✅ Safe to create - not planned yet
```
**Key flags:**
- `--stream implemented` - Only search implemented/ directory
- `--stream planned` - Only search planned/ directory
- `--neural` - **Recommended** - semantic embeddings find conceptually similar docs
- `--limit N` - Maximum results to return (default: 10)
- `--json` - JSON output for scripting
**Performance:** Neural search is ~11s on first run (computing embeddings), then ~0.2s cached. Worth the wait for better results.
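That first-run/cached-run gap is what a content-hash keyed cache produces. A minimal sketch of the idea (not the actual `ailang` implementation): unchanged docs hit the cache, so only new or edited docs pay the embedding cost on later runs.

```python
import hashlib

class EmbeddingCache:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.store = {}      # content hash -> embedding vector
        self.computed = 0    # embeddings computed this session
        self.reused = 0      # cache hits this session

    def get(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.store:
            self.reused += 1
        else:
            self.store[key] = self.embed_fn(text)
            self.computed += 1
        return self.store[key]
```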
**Warning Signs of Fragmented Design:**
- Multiple maps tracking similar things (`adtSliceTypes`, `recordTypes`...)
- Switch statements with growing case lists
- Functions named `handleX`, `handleY`, `handleZ` instead of unified `handle`
- Bug fixes that add `|| specialCase` conditions
**When these signs appear:** Expand scope of design doc to unify the system.
**⚠️ IMPORTANT: Axiom Compliance Check**
**Every feature must align with AILANG's 12 Design Axioms.**
The axioms are the non-negotiable principles that define AILANG's semantics. Any feature that violates an axiom must justify why the axiom itself should change.
**Canonical Reference:** [Design Axioms](/docs/references/axioms) on the website
### The 12 Axioms (Quick Reference)
| # | Axiom | Core Principle |
|---|-------|----------------|
| 1 | **Determinism** | Execution must be replayable; nondeterminism is explicit, typed, localized |
| 2 | **Replayability** | Traces are inspectable, serializable, and suitable for verification |
| 3 | **Effects Are Legible** | Side effects are explicit, typed, and machine-readable |
| 4 | **Explicit Authority** | No implicit access; capabilities are declared and constrained |
| 5 | **Bounded Verification** | Local, automatable checks; not whole-program proofs |
| 6 | **Safe Concurrency** | Parallelism must not introduce scheduling-dependent meaning |
| 7 | **Machines First** | Semantic compression, decidable structure, toolability |
| 8 | **Syntax Is Liability** | Only add syntax that reduces ambiguity or improves analysis |
| 9 | **Cost Is Meaning** | Resource implications visible in types and to tools |
| 10 | **Composability** | Features must compose without semantic blind spots |
| 11 | **Failure Is Data** | Errors are structured, typed, inspectable, composable |
| 12 | **System Boundary** | Crossing between intent/execution, authority/action is explicit |
### Axiom Scoring Matrix
**Score the proposed feature against each applicable axiom:**
| Axiom | −1 (violates) | 0 (neutral) | +1 (strengthens) |
|-------|---------------|-------------|------------------|
| **A1: Determinism** | adds implicit nondeterminism | – | increases reproducibility |
| **A2: Replayability** | breaks trace reconstruction | – | improves audit/replay |
| **A3: Effect Legibility** | hides side effects | – | makes effects more visible |
| **A4: Explicit Authority** | grants ambient access | – | enforces capability bounds |
| **A5: Bounded Verification** | requires global reasoning | – | enables local checks |
| **A6: Safe Concurrency** | introduces races | – | preserves determinism |
| **A7: Machines First** | optimizes for human prose | – | improves machine analysis |
| **A8: Minimal Syntax** | adds syntactic surface | – | reduces ambiguity |
| **A9: Cost Visibility** | hides resource costs | – | makes costs explicit |
| **A10: Composability** | breaks when combined | – | composes cleanly |
| **A11: Structured Failure** | uses unstructured exceptions | – | typed error handling |
| **A12: System Boundary** | implicit boundary crossing | – | explicit transitions |
### Decision Rules
**Hard violations (automatic rejection):**
- Any score of **−1** on Axioms 1, 3, 4, or 7 → **REJECT** (core invariants)
**Soft scoring:**
- Net score across all axioms **≥ +2**: Move forward to draft
- Net score **0 to +1**: Needs stronger justification
- Net score **< 0**: Reject or redesign
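These decision rules are mechanical enough to express directly. A small sketch, taking a map of axiom number to −1/0/+1 scores:

```python
CORE_AXIOMS = {1, 3, 4, 7}  # a -1 here is an automatic rejection

def axiom_decision(scores: dict[int, int]) -> str:
    # Hard violations on core axioms reject regardless of net score.
    if any(scores.get(a, 0) == -1 for a in CORE_AXIOMS):
        return "reject (core axiom violated)"
    net = sum(scores.values())
    if net >= 2:
        return "accept"
    if net >= 0:
        return "needs justification"
    return "reject or redesign"
```

Scoring "Entry-module Prelude" as A7:+1, A8:+1, A1:+1 gives net +3 and an accept, while a −1 on A1 short-circuits to rejection.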
### Scoring Examples
| Feature | Axiom Scores | Net | Decision |
|---------|--------------|-----|----------|
| ✅ Entry-module Prelude | A7:+1, A8:+1, A1:+1 | **+3** | Accept |
| ✅ Auto-cap inference | A3:+1, A8:+1, A4:0 | **+2** | Accept |
| ✅ Structured error traces | A2:+1, A11:+1, A7:+1 | **+3** | Accept |
| ❌ Global mutable state | A1:−1, A4:−1, A6:−1 | **−3** | Reject (violates A1) |
| ❌ Implicit DB connections | A3:−1, A4:−1, A12:−1 | **−3** | Reject (violates A3, A4) |
| ⚠️ Optional type annotations | A7:0, A5:+1, A8:−1 | **0** | Needs justification |
### Philosophical Grounding
AILANG treats computation as **navigation through a fixed semantic structure**, not creation of new possibilities.
Key implications:
- **Effects are permissions**, not actions
- **Traces are primary artifacts**, not debugging aids
- **Cost is physical reality**, not incidental overhead
See [Philosophical Foundations](/docs/references/philosophical-foundations) for the full motivation.
### What AILANG Deliberately Excludes
These are design choices, not limitations:
- ❌ **LSP/IDE servers** → AIs use CLI/API, not text editors
- ❌ **Multiple syntaxes** → one canonical way per concept
- ❌ **Implicit behaviors** → all effects are explicit
- ❌ **Human concurrency patterns** → CSP channels → static task graphs
- ❌ **Familiar syntax** → if it adds entropy, reject it
### Reference Documents
- [Design Axioms](/docs/references/axioms) - The 12 non-negotiable principles
- [Philosophical Foundations](/docs/references/philosophical-foundations) - Block-universe determinism
- [Design Lineage](/docs/references/design-lineage) - What we adopted/rejected and why
- [VISION_BENCHMARKS.md](../../../benchmarks/VISION_BENCHMARKS.md) - Vision goals and success metrics
**2. Choose Document Name**
**Naming conventions:**
- Use lowercase with hyphens: `feature-name.md`
- For milestone features: `m-XXX-feature-name.md` (e.g., `m-dx2-better-errors.md`)
- Be specific and descriptive
- Avoid generic names like `improvements.md`
**3. Run Create Script**
```bash
# If version is known (most cases)
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh feature-name v0_4_0
# If version not decided yet
.claude/skills/design-doc-creator/scripts/create_planned_doc.sh feature-name
```
**4. Read Related Docs Found by Search**
**IMPORTANT:** Before filling in the template, read the top 2-3 related docs found by the search. This ensures your design:
- Builds on existing patterns and conventions
- Avoids duplicating work already done
- References relevant prior art
- Identifies potential conflicts early
```bash
# The script outputs related docs like:
# Implemented docs matching "feature name":
# [Neural - semantic matching]
# 1. design_docs/implemented/v0_6_0/similar-feature.md (0.45)
# 2. design_docs/implemented/v0_5_0/related-work.md (0.38)
# READ these docs before proceeding:
# - Look at their structure and patterns
# - Note any design decisions that apply
# - Check for overlap with your feature
# - Reference them in your "Related Documents" section
```
**What to look for in related docs:**
- Architecture patterns used
- Testing strategies employed
- Edge cases already handled
- Implementation trade-offs documented
- Success/failure metrics to compare against
**5. Customize the Template**
The script creates a comprehensive template. Fill in:
**Header section:**
- Feature name (replace `[Feature Name]`)
- Status: Leave as "Planned"
- Target: Version number (e.g., v0.4.0)
- Priority: P0 (High), P1 (Medium), or P2 (Low)
- Estimated: Time estimate (e.g., "3 days", "1 week")
- Dependencies: List prerequisite features or "None"
**Problem Statement:**
- Describe current pain points
- Include metrics if available (e.g., "takes 7.5 hours")
- Explain who is affected and how
**Goals:**
- Primary goal: One-sentence main objective
- Success metrics: 3-5 measurable outcomes
**Solution Design:**
- Overview: High-level approach
- Architecture: Technical design
- Implementation plan: Break into phases with tasks
- Files to modify: List new/changed files with LOC estimates
**Examples:**
- Show before/after code or workflows
- Make examples concrete and runnable
**Success Criteria:**
- Checkboxes for acceptance tests
- Include "All tests passing" and "Documentation updated"
**Timeline:**
- Week-by-week breakdown
- Realistic estimates (2x your initial guess!)
**6. Review and Commit**
```bash
git add design_docs/planned/feature-name.md
git commit -m "Add design doc for feature-name"
```
### Moving to Implemented
**When to move:**
- Feature is complete and shipped
- Tests are passing
- Documentation is updated
- Version is tagged/released
**1. Run Move Script**
```bash
.claude/skills/design-doc-creator/scripts/move_to_implemented.sh feature-name v0_3_14
```
**2. Add Implementation Report**
The script provides a template. Add:
**What Was Built:**
- Summary of actual implementation
- Any deviations from plan
**Code Locations:**
- New files created (with LOC)
- Modified files (with +/- LOC)
**Test Coverage:**
- Number of tests
- Coverage percentage
- Test file locations
**Metrics:**
- Before/after comparison table
- Show improvements achieved
**Known Limitations:**
- What's not yet implemented
- Edge cases not handled
- Performance limitations
**3. Update design_docs/README.md**
Add entry under appropriate version:
```markdown
### v0.3.14 - Feature Name (October 2024)
- Brief description of what shipped
- Key improvements
- [CHANGELOG](../CHANGELOG.md#v0314)
```
**4. Commit Changes**
```bash
git add design_docs/implemented/v0_3_14/feature-name.md design_docs/README.md
git commit -m "Move feature-name design doc to implemented (v0.3.14)"
git rm design_docs/planned/feature-name.md
git commit -m "Remove feature-name from planned (moved to implemented)"
```
## Design Doc Structure
See [resources/design_doc_structure.md](resources/design_doc_structure.md) for:
- Complete template breakdown
- Section-by-section guide
- Best practices for each section
- Common mistakes to avoid
## Best Practices
### 1. Be Specific
**Good:**
```markdown
**Primary Goal:** Reduce builtin development time from 7.5h to 2.5h (-67%)
```
**Bad:**
```markdown
**Primary Goal:** Make development easier
```
### 2. Include Metrics
**Good:**
```markdown
**Current State:**
- Development time: 7.5 hours per builtin
- Files to edit: 4
- Type construction: 35 lines
```
**Bad:**
```markdown
**Current State:**
- Development takes a long time
```
### 3. Break Into Phases
**Good:**
```markdown
**Phase 1: Core Registry** (~4 hours)
- [ ] Create BuiltinSpec struct
- [ ] Implement validation
- [ ] Add unit tests
**Phase 2: Type Builder** (~3 hours)
- [ ] Create DSL methods
- [ ] Add fluent API
- [ ] Test with existing builtins
```
**Bad:**
```markdown
**Implementation:**
- Build everything
```
### 4. Link to Examples
**Good:**
```markdown
See existing M-DX1 implementation:
- design_docs/implemented/v0_3_10/M-DX1_developer_experience.md
- internal/builtins/spec.go
```
**Bad:**
```markdown
Similar to other features
```
### 5. Realistic Estimates
**Rule of thumb:**
- 2x your initial estimate (things always take longer)
- Add buffer for testing and documentation
- Include time for unexpected issues
**Good:**
```markdown
**Estimated**: 3 days (6h implementation + 4h testing + 2h docs + buffer)
```
**Bad:**
```markdown
**Estimated**: Should be quick, maybe 2 hours
```
## Document Locations
```
design_docs/
├── planned/              # Future features
│   ├── feature.md        # Unversioned (version TBD)
│   └── v0_4_0/           # Targeted for v0.4.0
│       └── feature.md
└── implemented/          # Completed features
    ├── v0_3_10/          # Shipped in v0.3.10
    │   └── feature.md
    └── v0_3_14/          # Shipped in v0.3.14
        └── feature.md
```
**Version folder naming:**
- Use underscores: `v0_3_14` not `v0.3.14`
- Match CHANGELOG.md tags exactly
- Create folder when first doc needs it
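The tag-to-folder conversion is simple enough to check mechanically. A minimal sketch (the validation regex is an assumption, not part of the skill's scripts):

```python
import re

def version_folder(tag: str) -> str:
    # Convert a release tag like "v0.3.14" into the underscore
    # folder name "v0_3_14" used under design_docs/.
    if not re.fullmatch(r'v\d+\.\d+\.\d+', tag):
        raise ValueError(f"expected a tag like v0.3.14, got {tag!r}")
    return tag.replace('.', '_')
```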
## Progressive Disclosure
This skill loads information progressively:
1. **Always loaded**: This SKILL.md file (workflow overview)
2. **Execute as needed**: Scripts create/move docs
3. **Load on demand**: `resources/design_doc_structure.md` (detailed guide)
## Coordinator Integration (v0.6.2+)
The design-doc-creator skill integrates with the AILANG Coordinator for automated workflows.
### Autonomous Workflow
When configured in `~/.ailang/config.yaml`, the design-doc-creator agent can:
1. Automatically receive GitHub issues via `github_sync`
2. Create design docs from issue descriptions
3. Hand off to sprint-planner on completion
```yaml
coordinator:
agents:
- id: design-doc-creator
inbox: design-doc-creator
workspace: /path/to/ailang
capabilities: [research, docs]
trigger_on_complete: [sprint-planner]
auto_approve_handoffs: false
session_continuity: true
github_sync:
enabled: true
interval_secs: 300
target_inbox: design-doc-creator
```
### Sending Tasks to design-doc-creator
```bash
# Via CLI
ailang messages send design-doc-creator "Create design doc for semantic caching" \
--title "Feature: Semantic Caching" --from "user"
# The coordinator will:
# 1. Pick up the message
# 2. Create a git worktree
# 3. Execute the design-doc-creator workflow
# 4. Create approval request for human review
# 5. On approval, trigger sprint-planner with handoff message
```
### Handoff Message to sprint-planner
On completion, design-doc-creator sends:
```json
{
"type": "design_doc_ready",
"correlation_id": "task-123",
"design_doc_path": "design_docs/planned/v0_6_3/m-semantic-caching.md",
"session_id": "claude-session-abc",
"discovery": "Key findings from GitHub issue analysis"
}
```
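A consumer like sprint-planner would presumably validate this payload before acting on it. A minimal sketch, assuming only the fields shown above are required (the coordinator's actual schema check may differ):

```python
import json

REQUIRED_FIELDS = {"type", "correlation_id", "design_doc_path"}

def validate_handoff(raw: str) -> dict:
    # Parse a handoff message and confirm the expected fields exist.
    msg = json.loads(raw)
    missing = REQUIRED_FIELDS - msg.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    if msg["type"] != "design_doc_ready":
        raise ValueError(f"unexpected message type: {msg['type']}")
    return msg
```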
### Human-in-the-Loop
With `auto_approve_handoffs: false`:
1. Design doc is created in worktree
2. Approval request shows git diff in dashboard
3. Human reviews design doc quality
4. Approve → Merges to main, triggers sprint-planner
5. Reject → Worktree preserved for manual fixes
## Notes
- All design docs should follow the template structure
- Update CHANGELOG.md when features ship (separate from design doc)
- Link design docs from README.md under version history
- Keep design docs focused - split large features into multiple docs
- Use M-XXX naming for milestone/major features
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### ../../../benchmarks/VISION_BENCHMARKS.md
```markdown
# Vision-Aligned Benchmarks
This document explains how benchmarks test AILANG's vision goals and what metrics track progress toward those goals.
## Vision Goals → Benchmark Mapping
### 1. Explicit State Management (vs Python's Implicit State)
**Vision Problem**: Python's global state makes AI reasoning difficult. Same input → different output.
**Benchmark**: `explicit_state_threading.yml`
**What it tests**:
- ✅ **Behavioral**: Functions return (newState, result) tuples
- ✅ **Observable**: Correct state threading produces correct outputs
- 🔮 **Structural** (future): No global variables used, all state is explicit
**Current eval**: Checks stdout matches expected state progression.
**Enhanced eval**: Could parse AST to verify no global state mutations.
---
### 2. Deterministic Generation (One Canonical Form)
**Vision Problem**: Python has 5+ ways to transform a list. AI picks inconsistently.
**Benchmark**: `deterministic_list_transform.yml`
**What it tests**:
- ✅ **Behavioral**: Correct transformation output
- 🔮 **Consistency** (future): Same prompt → same code structure across runs
**Current eval**: Checks stdout for correct transformed list.
**Enhanced eval**:
- Run same benchmark 10 times with same model/seed
- Measure: `hash(generated_code)` consistency
- Target: ≥95% identical code structure
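The consistency measure described above reduces to hashing each generated sample and taking the share of the most common hash. A sketch:

```python
import hashlib

def consistency_rate(code_samples: list[str]) -> float:
    # Fraction of runs whose generated code hashes to the modal value.
    hashes = [hashlib.sha256(c.encode()).hexdigest() for c in code_samples]
    most_common = max(set(hashes), key=hashes.count)
    return hashes.count(most_common) / len(hashes)
```

Ten runs where nine samples are byte-identical would score 0.9, just under the ≥95% target.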
---
### 3. Effect System Safety (!: IO, FS, Net)
**Benchmarks**:
- `effect_tracking_io_fs.yml` - Basic effect declarations
- `effect_pure_separation.yml` - Pure/effectful separation
- `effect_composition.yml` - Effect propagation
**What they test**:
- ✅ **Behavioral**: Correct execution with effects
- 🔮 **Type-level** (future): Effect signatures match actual behavior
**Current eval**: Checks stdout and file creation.
**Enhanced eval**:
```bash
# Parse generated code to verify:
ailang check-effects generated.ail --verify-signatures
# Returns:
# ✅ computeSum: pure (no effects)
# ✅ printSum: !: IO (correct)
# ✅ processUser: !: IO,FS (correct)
# ❌ readFile: declared pure but does FS (ERROR)
```
**Metrics**:
- **Effect Safety Rate**: % of functions with correct effect declarations
- Target: ≥94% (from vision doc)
---
### 4. Total Functions (Exhaustive Pattern Matching)
**Benchmarks**:
- `exhaustive_pattern_matching.yml`
- `no_runtime_crashes_option.yml`
**What they test**:
- ✅ **Behavioral**: All cases handled, no crashes
- 🔮 **Compile-time** (future): Exhaustiveness verified by type checker
**Current eval**: Checks stdout for all expected outputs.
**Enhanced eval**:
```bash
# Compile generated code, verify exhaustiveness warnings:
ailang compile generated.ail --check-exhaustiveness
# Should report: "Pattern match on Status is exhaustive"
```
**Metrics**:
- **Totality Rate**: % of pattern matches that are exhaustive
- **Crash Rate**: % of runs that crash (should be 0%)
---
### 5. Type Safety (vs Python's Runtime Errors)
**Benchmark**: `type_safe_record_access.yml`
**What it tests**:
- ✅ **Behavioral**: Correct nested field access
- 🔮 **Compile-time** (future): Invalid field access rejected by compiler
**Current eval**: Checks stdout for correct value.
**Enhanced eval**:
```bash
# Test with intentionally broken code:
# Replace: user.profile.city
# With: user.profile.country (doesn't exist)
ailang compile broken.ail
# Should fail with: "Field 'country' not found in Profile record"
```
**Metrics**:
- **Type Correctness Rate**: % of generated code that type-checks
- Target: ≥98% (from vision doc)
---
### 6. Referential Transparency
**Benchmark**: `referential_transparency.yml`
**What it tests**:
- ✅ **Behavioral**: Same input → same output (3 calls produce identical results)
- 🔮 **Purity** (future): No side effects in pure functions
**Current eval**: Checks that all three calls return same value.
**Enhanced eval**:
```bash
# Run same code 100 times, verify:
# - Same inputs always produce same outputs
# - No randomness, no time dependencies
# - Hash of output is consistent
```
**Metrics**:
- **Determinism Rate**: % of runs with identical outputs
- Target: ≥95% (from vision doc)
---
### 7. Canonical Code Structure
**Benchmark**: `canonical_normalization.yml`
**What it tests**:
- ✅ **Behavioral**: Correct functional composition
- 🔮 **Structural** (future): Code normalizes to identical form
**Current eval**: Checks stdout for correct result.
**Enhanced eval** (v0.4+):
```bash
# Generate code 10 times, normalize all variants:
for i in {1..10}; do
ailang eval --benchmark canonical_normalization --model gpt5
ailang normalize generated_$i.ail > normalized_$i.ail
done
# Check: all normalized versions are identical
diff normalized_*.ail
# Should show: 0 differences
```
**Metrics**:
- **Normalizer Lift**: % improvement in consistency after normalization
- Target: +20pp (from vision doc)
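Normalizer lift is then just the consistency delta in percentage points. A sketch:

```python
def normalizer_lift(before: float, after: float) -> float:
    # Lift in percentage points: consistency after normalization
    # minus consistency before, e.g. 0.70 -> 0.90 is +20pp.
    return round((after - before) * 100, 1)
```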
---
### 8. Immutable Data Structures
**Benchmark**: `immutable_data_structures.yml`
**What it tests**:
- ✅ **Behavioral**: Updates create new records, originals unchanged
- 🔮 **Immutability** (future): No in-place mutations
**Current eval**: Verifies all three records print correctly.
**Enhanced eval**:
```python
import re

def detect_mutations(code: str) -> list[str]:
    # Flags in-place assignment like: record.field = value (mutation!)
    # Allows immutable updates like: new_record = {record | field: value}
    return re.findall(r'\w+\.\w+\s*=(?!=)', code)
```
---
## Current Evaluation Capabilities
### ✅ What Works Today (v0.3.14)
**Output Validation**:
```bash
ailang eval-suite --benchmark explicit_state_threading
# ✅ Compares stdout to expected_stdout
# ✅ Measures tokens, duration, cost
# ✅ Tracks success/failure per model
```
**Metrics Available**:
- Final success rate (compile + correct output)
- Zero-shot success (first attempt)
- Repair success rate (after error feedback)
- Token efficiency (AILANG vs Python)
### 🔮 Future Enhancements (v0.4+)
**Structural Validation**:
```bash
ailang eval-suite --validate-structure
# Check:
# - Effect declarations match usage
# - Pattern matches are exhaustive
# - No global state mutations
# - Type safety (all field accesses valid)
```
**Determinism Tracking**:
```bash
ailang eval-suite --runs=10 --measure-determinism
# Run each benchmark 10x with same seed
# Report: code hash consistency, output consistency
```
**Normalization Testing**:
```bash
ailang eval-suite --test-normalization
# Generate code → normalize → compare
# Measure: % of semantically equivalent code that normalizes identically
```
---
## Recommended Eval Enhancements
### Phase 1: Static Analysis (v0.3.15)
Add validation hooks to check generated code properties:
```go
// internal/eval_harness/validators.go
type StructuralValidator interface {
ValidateEffects(code string) (bool, []string)
ValidateExhaustiveness(code string) (bool, []string)
ValidateImmutability(code string) (bool, []string)
}
```
Usage:
```bash
ailang eval-suite --validate=effects,exhaustiveness,immutability
```
### Phase 2: Multi-Run Consistency (v0.3.16)
```bash
ailang eval-suite --determinism-runs=10
# Generates:
# - determinism_report.json (per-benchmark consistency scores)
# - code_variance.md (shows variations in generated code)
```
### Phase 3: Normalization Baseline (v0.4.0)
```bash
ailang eval-baseline --with-normalization
# For each benchmark:
# 1. Generate code (all models)
# 2. Normalize each variant
# 3. Measure semantic equivalence
# 4. Report: normalizer lift (+Xpp improvement)
```
---
## Metrics Tracking
### Current Dashboard (`docs/static/benchmarks/latest.json`)
```json
{
"aggregates": {
"finalSuccess": 0.64,
"zeroShotSuccess": 0.59,
"repairSuccessRate": 0.14
}
}
```
### Enhanced Dashboard (Future)
```json
{
"aggregates": {
"finalSuccess": 0.64,
"typeCorrectness": 0.89, // NEW: % passing type check
"effectSafety": 0.94, // NEW: % with correct effects
"determinismRate": 0.91, // NEW: consistent outputs
"normalizerLift": 0.20 // NEW: consistency improvement
},
"visionAlignment": {
"explicitState": 0.85, // explicit_state_threading success
"canonicalForm": 0.78, // deterministic_list_transform success
"effectTracking": 0.72, // effect_* benchmarks avg
"totalFunctions": 0.88, // exhaustive_* benchmarks avg
"typeSafety": 0.89, // type_safe_* benchmarks avg
"referentialTransparency": 0.91 // referential_transparency success
}
}
```
---
## Running Vision Benchmarks
### Test All Vision Goals
```bash
# Run all vision-aligned benchmarks:
ailang eval-suite \
--benchmarks explicit_state_threading,deterministic_list_transform,effect_tracking_io_fs,effect_pure_separation,effect_composition,exhaustive_pattern_matching,type_safe_record_access,referential_transparency,canonical_normalization,no_runtime_crashes_option,immutable_data_structures \
--models gpt5,claude-sonnet-4-5,gemini-2-5-pro
# Generate vision-specific report:
ailang eval-report eval_results/vision_baseline v0.3.14 --format=vision
```
### Compare AILANG vs Python on Vision Goals
```bash
# Run with both languages:
ailang eval-suite --benchmarks explicit_state_threading --langs python,ailang
# Expected results:
# Python: May use global state (non-idiomatic but works)
# AILANG: Forces explicit state threading (only way to do it)
```
---
## Success Criteria
Based on vision doc targets:
| Metric | Target | Benchmark | Status |
|--------|--------|-----------|--------|
| **Compile Success** | ≥67% | All benchmarks | ⏳ Measuring |
| **Type Correctness** | ≥98% | type_safe_* | 🔮 Need validation |
| **Effect Safety** | ≥94% | effect_* | 🔮 Need validation |
| **Determinism Rate** | ≥95% | referential_transparency | 🔮 Need multi-run |
| **Model Pass Rate** | ≥70% | All benchmarks | ⏳ Measuring |
| **Normalizer Lift** | +20pp | canonical_normalization | 🔮 Need v0.4 |
**Next Steps**:
1. ✅ Run current eval suite on new vision benchmarks
2. 🔮 Add structural validators (v0.3.15)
3. 🔮 Implement multi-run determinism testing (v0.3.16)
4. 🔮 Add normalization baseline (v0.4.0)
---
## Interpreting Results
### High Success = Vision Goal Achieved
If a vision benchmark has >90% success:
- ✅ AI models understand the concept
- ✅ AILANG syntax supports it well
- ✅ Teaching prompt is effective
### Low Success = Opportunity for Improvement
If a vision benchmark has <50% success:
- 🔧 May need syntax improvements
- 🔧 May need better teaching examples
- 🔧 May need tooling support (error messages, suggestions)
- 📊 Still valuable: shows gap between vision and current AI capability
### Python vs AILANG Comparison
**Metrics to watch**:
1. **Token Efficiency**: AILANG should use fewer tokens (more concise)
2. **First-Attempt Success**: AILANG may be lower initially (unfamiliar syntax)
3. **Type Safety**: AILANG should catch errors at compile time (Python: runtime)
4. **Determinism**: AILANG code should be more consistent across runs
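Metric 1 (token efficiency) can be sketched as relative savings, assuming per-language token counts are recorded in the eval results (the helper below is hypothetical):

```python
def token_efficiency(ailang_tokens, python_tokens):
    """Relative savings: positive means AILANG used fewer tokens than Python."""
    return round(1 - ailang_tokens / python_tokens, 2)

print(token_efficiency(120, 160))  # AILANG used 25% fewer tokens -> 0.25
```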
---
## Adding New Vision Benchmarks
Template:
```yaml
id: vision_goal_name
description: "Brief explanation of vision goal being tested"
languages: ["python", "ailang"]
entrypoint: "main"
caps: ["IO"] # or ["IO", "FS", "Net"] as needed
difficulty: "easy|medium|hard"
expected_gain: "low|medium|high|very_high"
task_prompt: |
Clearly describe the task.
Emphasize the vision property being tested:
- Explicit state
- Effect tracking
- Exhaustiveness
- Type safety
- Referential transparency
Output only the code, no explanations.
expected_stdout: |
Expected output
```
**Guidelines**:
- Focus on observable behavior (stdout)
- Highlight properties that differentiate AILANG from Python
- Make success/failure unambiguous
- Include edge cases that test totality
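The guidelines above can be enforced mechanically. A hypothetical validator sketch using the field names from the template (the rule set is an assumption, not the harness's actual checks):

```python
REQUIRED = {"id", "description", "languages", "entrypoint", "caps",
            "difficulty", "task_prompt", "expected_stdout"}
ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}

def validate_benchmark(spec):
    """Return a list of problems; an empty list means the spec looks valid."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - spec.keys())]
    if spec.get("difficulty") not in ALLOWED_DIFFICULTY:
        problems.append("difficulty must be easy|medium|hard")
    return problems

spec = {"id": "vision_goal_name", "description": "...",
        "languages": ["python", "ailang"], "entrypoint": "main",
        "caps": ["IO"], "difficulty": "medium",
        "task_prompt": "...", "expected_stdout": "..."}
print(validate_benchmark(spec))  # -> []
```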
```
### resources/design_doc_structure.md
```markdown
# Design Document Structure Guide
Complete reference for AILANG design documents.
## Template Breakdown
### Header Section
```markdown
# [Feature Name]
**Status**: Planned | Implemented
**Target**: v0.4.0
**Priority**: P0 (High) | P1 (Medium) | P2 (Low)
**Estimated**: 3 days
**Dependencies**: None | Feature X, Feature Y
```
**Purpose**: Quick metadata for understanding scope and priority.
**Tips:**
- Use title case for feature name
- Status starts as "Planned", changes to "Implemented" when done
- Target should be specific version (v0.X.Y)
- Priority determines scheduling (P0 = next release, P1 = soon, P2 = eventually)
- Estimated should be realistic (2x initial guess)
- Dependencies help with scheduling
### Problem Statement
```markdown
## Problem Statement
[What problem does this solve? Why is it needed?]
**Current State:**
- Pain point 1 (with metrics)
- Pain point 2 (with data)
- Pain point 3 (with impact)
**Impact:**
- Who is affected? (developers, users, AI models)
- How significant? (blocker, annoyance, nice-to-have)
```
**Purpose**: Justify why this feature matters.
**Tips:**
- Include real metrics (hours, lines of code, error rates)
- Be specific about pain points
- Quantify impact when possible
- Link to issues or discussions if available
**Good example:**
```markdown
Adding a builtin function currently takes **7.5 hours** due to:
- Scattered registration across 4 files (2+ hours debugging)
- Verbose type construction (1+ hour trial-and-error)
- Poor error messages (1+ hour debugging)
**Impact:** Slows language evolution, hard for new contributors.
```
**Bad example:**
```markdown
Development is slow and hard.
```
### Goals
```markdown
## Goals
**Primary Goal:** [One sentence main objective with measurable outcome]
**Success Metrics:**
- Metric 1 (e.g., reduce time from X to Y)
- Metric 2 (e.g., improve coverage from X% to Y%)
- Metric 3 (e.g., reduce files from X to Y)
```
**Purpose**: Define what success looks like.
**Tips:**
- Primary goal should be measurable
- Success metrics should be objective
- Use before/after comparisons
- Include 3-5 metrics maximum
**Good example:**
```markdown
**Primary Goal:** Reduce builtin development time from 7.5h to 2.5h (-67%)
**Success Metrics:**
- Files touched: 4 → 1 (-75%)
- Type construction LOC: 35 → 10 (-71%)
- Registration errors caught at compile/startup
```
**Bad example:**
```markdown
**Primary Goal:** Make things better
**Success Metrics:**
- Easier to use
- Faster
```
### Solution Design
```markdown
## Solution Design
### Overview
[High-level description in 2-3 paragraphs]
### Architecture
[Technical approach, component breakdown]
**Components:**
1. **Component 1**: Description + responsibility
2. **Component 2**: Description + responsibility
3. **Component 3**: Description + responsibility
### Implementation Plan
**Phase 1: Foundation** (~X hours)
- [ ] Task 1 with clear deliverable
- [ ] Task 2 with clear deliverable
- [ ] Task 3 with clear deliverable
**Phase 2: Integration** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
**Phase 3: Polish** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
```
**Purpose**: Explain how you'll solve the problem.
**Tips:**
- Overview should be understandable by non-experts
- Break into 3-5 phases maximum
- Each task should have a clear definition of done
- Estimate each phase separately
- Use checkboxes for tracking
**Good example:**
```markdown
### Overview
Create a central builtin registry where all builtins are registered once with:
- Function implementation
- Type signature (using builder DSL)
- Arity and effect metadata
- Validation on startup
### Architecture
**Components:**
1. **BuiltinSpec**: Struct holding all builtin metadata
2. **Registry**: Map of name → spec with validation
3. **Type Builder**: Fluent DSL for type construction
4. **Validator**: Startup checks for consistency
### Implementation Plan
**Phase 1: Registry Core** (~4 hours)
- [ ] Create BuiltinSpec struct
- [ ] Implement Register() function
- [ ] Add validation logic
- [ ] Write unit tests (100% coverage)
```
### Files to Modify/Create
```markdown
### Files to Modify/Create
**New files:**
- `internal/builtins/spec.go` (~300 LOC) - BuiltinSpec struct and registry
- `internal/builtins/validator.go` (~150 LOC) - Validation logic
- `internal/types/builder.go` (~200 LOC) - Type builder DSL
**Modified files:**
- `internal/runtime/builtins.go` (+50/-100 LOC) - Use registry instead of manual registration
- `internal/link/builtin_module.go` (+20/-200 LOC) - Read types from registry
```
**Purpose**: Help estimate scope and identify affected areas.
**Tips:**
- List all files that will change
- Estimate LOC for new files
- Show +/- LOC for modified files
- Include purpose for each file
- Group by package
### Examples
```markdown
## Examples
### Example 1: Adding a String Builtin
**Before (scattered across 4 files):**
```go
// internal/effects/string.go
func strReverse(...) { ... }
// internal/runtime/builtins.go
case "_str_reverse": result = effects.strReverse(...)
// internal/builtins/registry.go
"_str_reverse": {NumArgs: 1, IsPure: true}
// internal/link/builtin_module.go
"_str_reverse": &types.TFunc2{...} // 35 lines
```
**After (single file):**
```go
// internal/builtins/register.go
RegisterEffectBuiltin(BuiltinSpec{
Name: "_str_reverse",
NumArgs: 1,
IsPure: true,
Type: func() types.Type {
T := types.NewBuilder()
return T.Func(T.String()).Returns(T.String())
},
Impl: strReverseImpl,
})
```
```
**Purpose**: Show concrete impact of the change.
**Tips:**
- Use real code examples
- Show before/after comparison
- Include 2-3 different use cases
- Make examples runnable
- Highlight key improvements
### Success Criteria
```markdown
## Success Criteria
- [ ] All builtins migrated to new registry
- [ ] `ailang doctor builtins` validates registration
- [ ] Type Builder DSL reduces construction from 35 → 10 LOC
- [ ] Development time measured at <3 hours for new builtin
- [ ] All tests passing (90%+ coverage on new code)
- [ ] Documentation updated (CLAUDE.md, ADDING_BUILTINS.md)
- [ ] Examples working (`make verify-examples`)
```
**Purpose**: Define what "done" means.
**Tips:**
- Use checkboxes
- Make criteria objective and testable
- Include test coverage requirements
- Always include "All tests passing"
- Always include "Documentation updated"
- Include performance/metric targets
### Testing Strategy
```markdown
## Testing Strategy
**Unit tests:**
- Registry validation (invalid specs rejected)
- Type Builder DSL (all type combinations)
- Each builtin impl (hermetic, mocked effects)
**Integration tests:**
- End-to-end builtin calls from REPL
- Cross-module builtin usage
- Error propagation
**Manual testing:**
- Add new builtin in <3 hours
- Run `ailang doctor builtins`
- Check REPL `:type` output
```
**Purpose**: Ensure quality and catch regressions.
**Tips:**
- Break into unit/integration/manual
- List what each test level covers
- Include acceptance tests
- Specify coverage targets
- Mention test files to create
### Non-Goals
```markdown
## Non-Goals
**Not in this feature:**
- Auto-generated documentation from specs - [Deferred to M-DX2]
- Hot-reload builtins for development - [Out of scope]
- Performance benchmarks - [Not critical]
- CI checks for legacy patterns - [Nice-to-have]
```
**Purpose**: Set boundaries and manage expectations.
**Tips:**
- List what you're NOT doing
- Explain why (deferred, out of scope, not valuable)
- Link to future work if applicable
- Prevents scope creep
### Timeline
```markdown
## Timeline
**Week 1** (12 hours):
- Phase 1: Registry core (4h)
- Phase 2: Type Builder (3h)
- Phase 3: Validation (2h)
- Testing + Polish (3h)
**Week 2** (8 hours):
- Migrate 10 builtins (5h)
- Documentation (2h)
- Final testing (1h)
**Total: ~20 hours across 2 weeks**
```
**Purpose**: Realistic scheduling and resource planning.
**Tips:**
- Break by week or phase
- Include buffer time
- Account for testing and docs
- 2x your initial estimates
- Show cumulative total
### Risks & Mitigations
```markdown
## Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|-----------|
| Migration breaks existing builtins | High | Incremental migration, test after each builtin |
| Type Builder too complex | Medium | Start simple, add features as needed |
| Performance regression | Low | Benchmark before/after, registry is cached |
```
**Purpose**: Identify potential issues early.
**Tips:**
- List 3-5 main risks
- Rate impact (High/Medium/Low)
- Include concrete mitigation plan
- Update as risks change
### Axiom Compliance
```markdown
## Axiom Compliance
**Canonical reference:** [Design Axioms](/docs/references/axioms)
### Scoring
| Axiom | Score | Justification |
|-------|-------|---------------|
| A1: Determinism | +1 | Enables reproducible X |
| A2: Replayability | 0 | No impact |
| A3: Effect Legibility | +1 | Makes Y effects explicit |
| A4: Explicit Authority | 0 | No capability changes |
| A5: Bounded Verification | +1 | Enables local Z checks |
| A6: Safe Concurrency | 0 | No concurrency impact |
| A7: Machines First | +1 | Reduces token cost for AI |
| A8: Minimal Syntax | 0 | No new syntax |
| A9: Cost Visibility | 0 | No resource changes |
| A10: Composability | +1 | Composes with existing W |
| A11: Structured Failure | 0 | No error handling changes |
| A12: System Boundary | 0 | No boundary changes |
**Net Score: +5** ✅ Proceed to implementation
### Hard Violation Check
- [ ] A1 (Determinism): No implicit nondeterminism introduced
- [ ] A3 (Effects): No hidden side effects
- [ ] A4 (Authority): No ambient access granted
- [ ] A7 (Machines First): Not optimizing for human convenience over machine analysis
```
**Purpose**: Verify feature aligns with AILANG's core principles.
**Tips:**
- Score EVERY axiom (use 0 for no impact)
- Justify non-zero scores with specific reasoning
- Hard violations (-1 on A1, A3, A4, A7) require redesign
- Net score ≥ +2 to proceed
- Include the hard violation checklist
**Decision thresholds:**
- **≥ +2**: Proceed to draft
- **0 to +1**: Needs stronger justification
- **< 0**: Reject or redesign
- **Any -1 on A1, A3, A4, A7**: Automatic rejection
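The decision thresholds can be expressed as a small helper (a sketch using the axiom IDs and thresholds from this guide):

```python
HARD_AXIOMS = {"A1", "A3", "A4", "A7"}  # any -1 here is an automatic rejection

def axiom_decision(scores):
    """scores: dict of axiom id -> -1/0/+1. Returns the scheduling decision."""
    if any(scores.get(a, 0) == -1 for a in HARD_AXIOMS):
        return "reject (hard violation)"
    net = sum(scores.values())
    if net >= 2:
        return "proceed"
    if net >= 0:
        return "needs stronger justification"
    return "reject or redesign"

print(axiom_decision({"A1": 1, "A3": 1, "A5": 1, "A7": 1, "A10": 1}))  # net +5 -> proceed
```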
### References
```markdown
## References
- **Motivation**: `design_docs/planned/easier-ailang-dev.md`
- **Prior art**: Ruby DSL for types, Rust procedural macros
- **Related issues**: #42, #58
- **Discussion**: Slack thread (Oct 10, 2024)
- **Axiom reference**: [Design Axioms](/docs/references/axioms)
```
**Purpose**: Link to context and background.
**Tips:**
- Link to related design docs
- Reference issues/PRs
- Cite prior art or research
- Include discussion links
- Always include axiom reference link
### Future Work
```markdown
## Future Work
[Features that build on this:]
- Auto-documentation generation from BuiltinSpec
- LSP integration for builtin type hints
- Hot-reload for faster development iteration
```
**Purpose**: Capture ideas without committing to them.
**Tips:**
- List logical next steps
- Don't commit to timeline
- Link to non-goals if applicable
- Keep brief (1-2 sentences each)
## Implementation Report (for Implemented docs)
Add this section when moving to `implemented/`:
```markdown
## Implementation Report
**Completed**: 2024-10-15
**Version**: v0.3.10
### What Was Built
[Summary of actual implementation vs plan]
- What changed from original design
- Why changes were made
- What worked well
### Code Locations
**New files:**
- `internal/builtins/spec.go` (342 LOC) - BuiltinSpec and registry
- `internal/builtins/validator.go` (187 LOC) - Validation
- `internal/types/builder.go` (234 LOC) - Type Builder DSL
**Modified files:**
- `internal/runtime/builtins.go` (+45/-120 LOC) - Uses registry
- `internal/link/builtin_module.go` (+18/-215 LOC) - Reads from registry
### Test Coverage
- Unit tests: 57 passing
- Integration tests: 12 passing
- Coverage: 92% on new code
- Test files: `internal/builtins/*_test.go`, `internal/types/builder_test.go`
### Metrics
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Development time | 7.5h | 2.5h | -67% |
| Files to edit | 4 | 1 | -75% |
| Type construction LOC | 35 | 10 | -71% |
### Known Limitations
- REPL `:type` command not yet implemented (deferred to M-DX1.6)
- Enhanced diagnostics pending (M-DX1.7)
- `_json_encode` migration pending (complex ADT handling)
### Future Work
See design_docs/planned/m-dx1-future-polish.md for:
- M-DX1.6: REPL `:type` command
- M-DX1.7: Enhanced error diagnostics
- M-DX1.8: Comprehensive documentation
```
**Purpose**: Document what actually happened.
**Tips:**
- Include actual LOC and metrics
- Show before/after data
- List known limitations
- Link to future work
- Reference test files
- Note deviations from plan
## Common Mistakes to Avoid
### 1. Vague Problem Statements
❌ Bad:
```markdown
Development is hard.
```
✅ Good:
```markdown
Adding a builtin takes 7.5 hours due to scattered registration (4 files),
verbose types (35 LOC), and poor errors (1+ hour debugging).
```
### 2. Unmeasurable Goals
❌ Bad:
```markdown
Make development easier and faster.
```
✅ Good:
```markdown
Reduce builtin development time from 7.5h to 2.5h (-67%)
```
### 3. Missing Implementation Details
❌ Bad:
```markdown
Implement the feature.
```
✅ Good:
```markdown
**Phase 1**: Create BuiltinSpec struct (~4h)
- [ ] Define struct in internal/builtins/spec.go
- [ ] Add Register() function
- [ ] Implement validation
- [ ] Write tests (100% coverage)
```
### 4. No Examples
❌ Bad:
```markdown
The new system is better.
```
✅ Good:
```markdown
**Before (4 files, 35 LOC):**
[code example]
**After (1 file, 10 LOC):**
[code example]
```
### 5. Unrealistic Estimates
❌ Bad:
```markdown
Should take about 2 hours.
```
✅ Good:
```markdown
**Estimated**: 3 days (6h implementation + 4h testing + 2h docs + buffer)
```
### 6. Missing Success Criteria
❌ Bad:
```markdown
Feature is done when it works.
```
✅ Good:
```markdown
- [ ] All 49 builtins migrated
- [ ] `ailang doctor builtins` passes
- [ ] Development time < 3 hours (measured)
- [ ] 90%+ test coverage
- [ ] Documentation updated
```
## Version Folder Organization
```
design_docs/
├── planned/
│   ├── unversioned-feature.md        # Version TBD
│   └── v0_4_0/                       # Targeted for v0.4.0
│       ├── reflection-system.md
│       └── schema-registry.md
└── implemented/
    ├── v0_3_10/                      # Shipped in v0.3.10
    │   ├── M-DX1_developer_experience.md
    │   └── M-DASH.md
    ├── v0_3_12/                      # Shipped in v0.3.12
    │   └── show-function-recovery.md
    └── v0_3_14/                      # Shipped in v0.3.14
        └── feature-x.md
```
**Rules:**
- Use underscores in folder names: `v0_3_14`
- Match CHANGELOG.md tags exactly
- Create version folder when first doc needs it
- Move entire folder when version ships
- Keep planned/ docs unversioned if target unclear
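The folder-name rule (CHANGELOG tag with dots replaced by underscores) can be sketched as:

```python
def version_folder(tag):
    """Convert a CHANGELOG tag like 'v0.3.14' to a folder name 'v0_3_14'."""
    return tag.replace(".", "_")

print(version_folder("v0.3.14"))  # -> v0_3_14
```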
## Checklist
Use this when creating a new design doc:
**Before creating:**
- [ ] Feature is well-understood
- [ ] Priority and version target are known
- [ ] Similar features reviewed for patterns
- [ ] Dependencies identified
- [ ] **Axiom quick-check**: Does this obviously violate A1, A3, A4, or A7?
**When creating:**
- [ ] Run `create_planned_doc.sh`
- [ ] Fill in all header metadata
- [ ] Write specific problem statement with metrics
- [ ] Define measurable success criteria
- [ ] Break implementation into phases
- [ ] Include concrete before/after examples
- [ ] Estimate realistically (2x initial guess)
- [ ] List files to create/modify
- [ ] Add testing strategy
- [ ] Define non-goals
- [ ] **Complete Axiom Compliance section** (score all 12 axioms)
- [ ] **Verify net score ā„ +2** (or provide justification)
- [ ] **Check hard violations** (A1, A3, A4, A7 must not be ā1)
**After creating:**
- [ ] Review for clarity and completeness
- [ ] Verify axiom scores are justified
- [ ] Get feedback from others if major feature
- [ ] Commit to git
- [ ] Link from related docs if needed
**When moving to implemented:**
- [ ] Run `move_to_implemented.sh`
- [ ] Add implementation report
- [ ] Include actual metrics
- [ ] List known limitations
- [ ] Update design_docs/README.md
- [ ] Update CHANGELOG.md
- [ ] Commit and delete from planned/
```
### scripts/create_planned_doc.sh
```bash
#!/usr/bin/env bash
set -euo pipefail
# Create a new design document in design_docs/planned/
#
# Usage: create_planned_doc.sh <doc-name> [version]
# doc-name: Lowercase with hyphens (e.g., m-dx2-feature-name)
# version: Optional version folder (e.g., v0_4_0)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
DESIGN_DOCS_DIR="$PROJECT_ROOT/design_docs"
# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
CYAN='\033[0;36m'
NC='\033[0m' # No Color
# Get current version from CHANGELOG.md
get_current_version() {
local CHANGELOG="$PROJECT_ROOT/CHANGELOG.md"
if [ -f "$CHANGELOG" ]; then
grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' "$CHANGELOG" | head -1
else
echo "unknown"
fi
}
# Compute next patch version (e.g., v0.5.6 -> v0_5_7)
get_next_version_folder() {
local current="$1"
# Extract major.minor.patch
local version="${current#v}" # Remove 'v' prefix
local major=$(echo "$version" | cut -d. -f1)
local minor=$(echo "$version" | cut -d. -f2)
local patch=$(echo "$version" | cut -d. -f3)
# Increment patch
local next_patch=$((patch + 1))
# Return folder format (v0_5_7)
echo "v${major}_${minor}_${next_patch}"
}
CURRENT_VERSION=$(get_current_version)
NEXT_VERSION_FOLDER=$(get_next_version_folder "$CURRENT_VERSION")
if [ $# -lt 1 ]; then
echo -e "${RED}❌ Error: Missing required argument${NC}"
echo ""
echo -e "${CYAN}Current AILANG version: $CURRENT_VERSION${NC}"
echo -e "${CYAN}Suggested next version: $NEXT_VERSION_FOLDER${NC}"
echo ""
echo "Usage: create_planned_doc.sh <doc-name> [version]"
echo ""
echo "Arguments:"
echo " doc-name Document name (lowercase-with-hyphens, e.g., m-dx2-better-errors)"
echo " version Optional version folder (e.g., $NEXT_VERSION_FOLDER)"
echo ""
echo "Examples:"
echo " create_planned_doc.sh m-dx2-better-errors"
echo " create_planned_doc.sh reflection-system $NEXT_VERSION_FOLDER"
exit 1
fi
DOC_NAME="$1"
VERSION="${2:-}"
# Convert doc-name to search query (replace hyphens with spaces, remove m-prefix)
search_query() {
local name="$1"
# Remove common prefixes like m-dx1, m-perf2, etc.
local cleaned="${name#m-}"
cleaned="${cleaned#[a-z]*[0-9]-}"
# Replace hyphens with spaces
echo "${cleaned//-/ }"
}
SEARCH_QUERY=$(search_query "$DOC_NAME")
# Check for related design docs before creating
# Run both SimHash (instant) and Neural (better quality), combine unique results
echo -e "${CYAN}🔍 Searching for related design docs...${NC}"
echo ""
# Check if ailang is available
if command -v ailang &> /dev/null || [ -x "$PROJECT_ROOT/bin/ailang" ]; then
AILANG_CMD="${PROJECT_ROOT}/bin/ailang"
if ! [ -x "$AILANG_CMD" ]; then
AILANG_CMD="ailang"
fi
# Function to merge and deduplicate results (neural results take priority)
merge_results() {
local simhash="$1"
local neural="$2"
# Combine, dedupe by path. Neural listed first so its scores take priority.
# Format: "1. path/to/file.md (0.85)" - $2 is the path
{ echo "$neural"; echo "$simhash"; } | grep -E "^[0-9]+\." | \
awk '{ path = $2; if (path && !seen[path]++) print }' | head -5
}
# --- IMPLEMENTED DOCS ---
echo -e "${YELLOW}Implemented docs matching \"$SEARCH_QUERY\":${NC}"
# SimHash first (instant feedback)
echo -e " ${CYAN}[SimHash - instant]${NC}"
IMPL_SIMHASH=$("$AILANG_CMD" docs search --stream implemented --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$IMPL_SIMHASH" ]; then
echo "$IMPL_SIMHASH" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Neural search (better quality)
echo -e " ${CYAN}[Neural - semantic matching]${NC}"
IMPL_NEURAL=$("$AILANG_CMD" docs search --stream implemented --neural --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$IMPL_NEURAL" ]; then
echo "$IMPL_NEURAL" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Merge for template (neural preferred)
IMPLEMENTED=$(merge_results "$IMPL_SIMHASH" "$IMPL_NEURAL")
[ -z "$IMPLEMENTED" ] && IMPLEMENTED=" (none found)"
echo ""
# --- PLANNED DOCS ---
echo -e "${YELLOW}Planned docs matching \"$SEARCH_QUERY\":${NC}"
# SimHash first (instant feedback)
echo -e " ${CYAN}[SimHash - instant]${NC}"
PLAN_SIMHASH=$("$AILANG_CMD" docs search --stream planned --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$PLAN_SIMHASH" ]; then
echo "$PLAN_SIMHASH" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Neural search (better quality)
echo -e " ${CYAN}[Neural - semantic matching]${NC}"
PLAN_NEURAL=$("$AILANG_CMD" docs search --stream planned --neural --limit 5 "$SEARCH_QUERY" 2>/dev/null | grep -E "^[0-9]+\." || echo "")
if [ -n "$PLAN_NEURAL" ]; then
echo "$PLAN_NEURAL" | head -3 | sed 's/^/ /'
else
echo " (none found)"
fi
# Merge for template (neural preferred)
PLANNED=$(merge_results "$PLAN_SIMHASH" "$PLAN_NEURAL")
[ -z "$PLANNED" ] && PLANNED=" (none found)"
echo ""
# Show info if matches found (no confirmation required - proceed automatically)
if [ "$IMPLEMENTED" != " (none found)" ] || [ "$PLANNED" != " (none found)" ]; then
echo -e "${CYAN}ℹ Related docs found above - review them after creation if needed.${NC}"
echo ""
fi
else
echo -e "${YELLOW}⚠ ailang not found - skipping related doc search${NC}"
echo " Build with: make build"
echo ""
fi
# Determine target directory
if [ -n "$VERSION" ]; then
TARGET_DIR="$DESIGN_DOCS_DIR/planned/$VERSION"
mkdir -p "$TARGET_DIR"
else
TARGET_DIR="$DESIGN_DOCS_DIR/planned"
fi
DOC_PATH="$TARGET_DIR/${DOC_NAME}.md"
# Check if document already exists
if [ -f "$DOC_PATH" ]; then
echo -e "${YELLOW}⚠ Warning: Document already exists at $DOC_PATH${NC}"
echo ""
read -p "Overwrite? (y/N) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Cancelled."
exit 1
fi
fi
# Get current date
CURRENT_DATE=$(date +%Y-%m-%d)
# Create document from template
cat > "$DOC_PATH" <<'EOF'
# [Feature Name]
**Status**: Planned
**Target**: [Version, e.g., v0.4.0]
**Priority**: [P0/P1/P2 - High/Medium/Low]
**Estimated**: [Time estimate, e.g., 2 days]
**Dependencies**: [None or list other features]
## Axiom Compliance
**Canonical reference:** [Design Axioms](/docs/references/axioms)
Every feature must align with AILANG's 12 Design Axioms. Score each axiom and verify no hard violations.
### Axiom Scoring
| Axiom | Score | Justification |
|-------|-------|---------------|
| A1: Determinism | [+1/0/-1] | [e.g., "Enables reproducible traces"] |
| A2: Replayability | [+1/0/-1] | [e.g., "No impact on traces"] |
| A3: Effect Legibility | [+1/0/-1] | [e.g., "Makes IO effects explicit"] |
| A4: Explicit Authority | [+1/0/-1] | [e.g., "Enforces capability constraints"] |
| A5: Bounded Verification | [+1/0/-1] | [e.g., "Enables local type checks"] |
| A6: Safe Concurrency | [+1/0/-1] | [e.g., "No concurrency changes"] |
| A7: Machines First | [+1/0/-1] | [e.g., "Reduces AI token cost"] |
| A8: Minimal Syntax | [+1/0/-1] | [e.g., "No new syntax required"] |
| A9: Cost Visibility | [+1/0/-1] | [e.g., "Resource costs remain visible"] |
| A10: Composability | [+1/0/-1] | [e.g., "Composes with existing effects"] |
| A11: Structured Failure | [+1/0/-1] | [e.g., "Errors remain typed"] |
| A12: System Boundary | [+1/0/-1] | [e.g., "Boundary crossings explicit"] |
**Net Score: [Total]** → **Decision: [Move forward / Reject / Redesign]**
### Hard Violation Check
**These axioms cannot have -1 scores (automatic rejection):**
- [ ] A1 (Determinism): No implicit nondeterminism introduced
- [ ] A3 (Effects): No hidden side effects
- [ ] A4 (Authority): No ambient access granted
- [ ] A7 (Machines First): Not optimizing for human convenience over machine analysis
### Decision Thresholds
| Net Score | Decision |
|-----------|----------|
| ≥ +2 | ✅ Proceed to implementation |
| 0 to +1 | ⚠️ Needs stronger justification |
| < 0 | ❌ Reject or redesign |
| Any -1 on A1/A3/A4/A7 | ❌ Automatic rejection |
## Problem Statement
[What problem does this solve? Why is it needed?]
**Current State:**
- [Describe current pain points]
- [Include metrics if available]
**Impact:**
- [Who is affected?]
- [How significant is the problem?]
## Goals
**Primary Goal:** [Main objective in one sentence]
**Success Metrics:**
- [Measurable outcome 1]
- [Measurable outcome 2]
- [Measurable outcome 3]
## Solution Design
### Overview
[High-level description of the solution]
### Architecture
[Describe the technical approach]
**Components:**
1. **Component 1**: [Description]
2. **Component 2**: [Description]
3. **Component 3**: [Description]
### Implementation Plan
**Phase 1: [Name]** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
**Phase 2: [Name]** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
**Phase 3: [Name]** (~X hours)
- [ ] Task 1
- [ ] Task 2
- [ ] Task 3
### Files to Modify/Create
**New files:**
- `path/to/new_file.go` - [Purpose, ~XXX LOC]
**Modified files:**
- `path/to/existing_file.go` - [Changes needed, ~XXX LOC]
## Examples
### Example 1: [Use Case]
**Before:**
```
[Code or workflow before the change]
```
**After:**
```
[Code or workflow after the change]
```
### Example 2: [Use Case]
[Additional examples as needed]
## Success Criteria
- [ ] Criterion 1 (with acceptance test)
- [ ] Criterion 2 (with acceptance test)
- [ ] Criterion 3 (with acceptance test)
- [ ] All tests passing
- [ ] Documentation updated
- [ ] Examples added
## Testing Strategy
**Unit tests:**
- [What to test]
**Integration tests:**
- [What to test]
**Manual testing:**
- [What to verify manually]
## Non-Goals
**Not in this feature:**
- [Thing 1] - [Why deferred]
- [Thing 2] - [Why out of scope]
## Timeline
**Week 1** (X hours):
- Phase 1 implementation
**Week 2** (X hours):
- Phase 2 implementation
- Testing
**Week 3** (X hours):
- Phase 3 implementation
- Documentation
- Release
**Total: ~X hours across Y weeks**
## Risks & Mitigations
| Risk | Impact | Mitigation |
|------|--------|-----------|
| [Risk 1] | [High/Med/Low] | [How to address] |
| [Risk 2] | [High/Med/Low] | [How to address] |
## Related Documents
<!-- Auto-populated by Ollama neural search on "SEARCH_QUERY_PLACEHOLDER" -->
**Implemented (may inform design):**
RELATED_IMPLEMENTED_PLACEHOLDER
**Planned (check for overlap):**
RELATED_PLANNED_PLACEHOLDER
## References
- [Design Axioms](/docs/references/axioms) - The 12 non-negotiable principles
- [Philosophical Foundations](/docs/references/philosophical-foundations) - Block-universe determinism
- [Design Lineage](/docs/references/design-lineage) - What we adopted/rejected and why
- [Link to related design docs]
- [Link to issues or discussions]
## Future Work
[Features that build on this but are out of scope for now]
---
**Document created**: CURRENT_DATE
**Last updated**: CURRENT_DATE
EOF
# Replace CURRENT_DATE with actual date
sed -i.bak "s/CURRENT_DATE/$CURRENT_DATE/g" "$DOC_PATH"
rm "${DOC_PATH}.bak"
# Replace search query placeholder
sed -i.bak "s/SEARCH_QUERY_PLACEHOLDER/$SEARCH_QUERY/g" "$DOC_PATH"
rm "${DOC_PATH}.bak"
# Format search results for markdown
format_results() {
local results="$1"
if [ "$results" = " (none found)" ] || [ -z "$results" ]; then
echo "- (none found)"
else
# Convert numbered list to markdown links
echo "$results" | head -3 | while read -r line; do
# Extract path and score from format "1. path/to/doc.md (0.85)"
local path=$(echo "$line" | sed 's/^[0-9]*\. //' | sed 's/ ([0-9.]*)//')
local score=$(echo "$line" | grep -oE '\([0-9.]+\)' || echo "")
if [ -n "$path" ]; then
echo "- [$path]($path) $score"
fi
done
fi
}
# Replace related docs placeholders
IMPL_FORMATTED=$(format_results "$IMPLEMENTED")
PLAN_FORMATTED=$(format_results "$PLANNED")
# Use perl for multi-line replacement (more portable than sed)
if ! perl -i -pe "s|RELATED_IMPLEMENTED_PLACEHOLDER|$IMPL_FORMATTED|g" "$DOC_PATH" 2>/dev/null; then
  sed -i.bak "s|RELATED_IMPLEMENTED_PLACEHOLDER|$IMPL_FORMATTED|g" "$DOC_PATH" && rm -f "${DOC_PATH}.bak"
fi
if ! perl -i -pe "s|RELATED_PLANNED_PLACEHOLDER|$PLAN_FORMATTED|g" "$DOC_PATH" 2>/dev/null; then
  sed -i.bak "s|RELATED_PLANNED_PLACEHOLDER|$PLAN_FORMATTED|g" "$DOC_PATH" && rm -f "${DOC_PATH}.bak"
fi
# Convert to relative path for coordinator marker
RELATIVE_PATH="${DOC_PATH#$PROJECT_ROOT/}"
# Success message
echo -e "${GREEN}✅ Created design document:${NC}"
echo " $DOC_PATH"
echo ""
echo -e "${CYAN}Version context:${NC}"
echo " Current AILANG: $CURRENT_VERSION"
echo " Next version: $NEXT_VERSION_FOLDER"
if [ -n "$VERSION" ]; then
echo " Doc target: $VERSION"
fi
echo ""
echo -e "${GREEN}Next steps:${NC}"
echo " 1. Edit $DOC_PATH to fill in the template"
echo " 2. Replace [placeholders] with actual content"
echo " 3. Commit when ready: git add $DOC_PATH"
echo ""
echo -e "${YELLOW}Pro tips:${NC}"
echo " - Complete the Axiom Compliance section (score all 12 axioms)"
echo " - Hard violations on A1/A3/A4/A7 = automatic rejection"
echo " - Net axiom score must be ≥ +2 to proceed"
echo " - Use M-XXX naming for milestone features"
echo " - Include concrete examples and metrics"
echo " - Keep estimates realistic (2x your initial guess)"
echo ""
# Output coordinator markers (deterministic - script knows exactly what was created)
echo "---"
echo "DESIGN_DOC_PATH: $RELATIVE_PATH"
```
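The `format_results` helper in the script above turns a search result line such as `1. path/to/doc.md (0.85)` into a markdown link via a sed/grep pipeline. As a minimal standalone sketch of that parsing logic (the sample line is illustrative, the expressions are the ones the script uses):

```shell
# A sample line in the format produced by the embeddings search
line="1. planned/error-messages.md (0.85)"

# Strip the leading "N. " index, then the trailing " (score)"
path=$(echo "$line" | sed 's/^[0-9]*\. //' | sed 's/ ([0-9.]*)//')

# Pull the "(score)" token back out on its own
score=$(echo "$line" | grep -oE '\([0-9.]+\)' || echo "")

echo "- [$path]($path) $score"
# → - [planned/error-messages.md](planned/error-messages.md) (0.85)
```

Running a line like this through the pipeline before wiring it into a template is a quick way to confirm the sed patterns match the search tool's actual output format.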
### scripts/move_to_implemented.sh
```bash
#!/usr/bin/env bash
set -euo pipefail
# Move a design document from planned/ to implemented/
#
# Usage: move_to_implemented.sh <doc-name> <version>
# doc-name: Name of the doc in planned/ (without .md extension)
# version: Target version folder (e.g., v0_3_14)
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
DESIGN_DOCS_DIR="$PROJECT_ROOT/design_docs"
# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color
if [ $# -lt 2 ]; then
echo -e "${RED}ā Error: Missing required arguments${NC}"
echo ""
echo "Usage: move_to_implemented.sh <doc-name> <version>"
echo ""
echo "Arguments:"
echo " doc-name Document name in planned/ (without .md)"
echo " version Version folder (e.g., v0_3_14)"
echo ""
echo "Examples:"
echo " move_to_implemented.sh m-dx1-developer-experience v0_3_10"
echo " move_to_implemented.sh reflection-system v0_4_0"
exit 1
fi
DOC_NAME="$1"
VERSION="$2"
# Find source document (check both planned/ root and version folders)
SOURCE_PATH=""
if [ -f "$DESIGN_DOCS_DIR/planned/${DOC_NAME}.md" ]; then
SOURCE_PATH="$DESIGN_DOCS_DIR/planned/${DOC_NAME}.md"
elif [ -f "$DESIGN_DOCS_DIR/planned/v0_4_0/${DOC_NAME}.md" ]; then
SOURCE_PATH="$DESIGN_DOCS_DIR/planned/v0_4_0/${DOC_NAME}.md"
else
# Search for it
FOUND=$(find "$DESIGN_DOCS_DIR/planned" -name "${DOC_NAME}.md" 2>/dev/null | head -1)
if [ -n "$FOUND" ]; then
SOURCE_PATH="$FOUND"
else
echo -e "${RED}ā Error: Document not found in planned/: ${DOC_NAME}.md${NC}"
echo ""
echo "Available docs in planned/:"
find "$DESIGN_DOCS_DIR/planned" -name "*.md" -type f | sed 's|.*/||' | sort
exit 1
fi
fi
# Create target directory
TARGET_DIR="$DESIGN_DOCS_DIR/implemented/$VERSION"
mkdir -p "$TARGET_DIR"
TARGET_PATH="$TARGET_DIR/${DOC_NAME}.md"
# Check if already exists in implemented
if [ -f "$TARGET_PATH" ]; then
echo -e "${YELLOW}ā Warning: Document already exists in implemented/$VERSION/${NC}"
echo ""
read -p "Overwrite? (y/N) " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
echo "Cancelled."
exit 1
fi
fi
# Get current date for metadata update
CURRENT_DATE=$(date +%Y-%m-%d)
# Copy file (don't delete from planned yet - let user review first)
cp "$SOURCE_PATH" "$TARGET_PATH"
# Update status in the document
if grep -q "^\*\*Status\*\*:" "$TARGET_PATH"; then
# Update existing status line
sed -i.bak "s/^\*\*Status\*\*:.*/\*\*Status\*\*: Implemented/" "$TARGET_PATH"
rm "${TARGET_PATH}.bak" 2>/dev/null || true
fi
# Update last updated date
if grep -q "^\*\*Last updated\*\*:" "$TARGET_PATH"; then
sed -i.bak "s/^\*\*Last updated\*\*:.*/\*\*Last updated\*\*: $CURRENT_DATE/" "$TARGET_PATH"
rm "${TARGET_PATH}.bak" 2>/dev/null || true
fi
# Success message
echo -e "${GREEN}ā Copied design document to implemented:${NC}"
echo " From: $SOURCE_PATH"
echo " To: $TARGET_PATH"
echo ""
echo -e "${YELLOW}Next steps:${NC}"
echo " 1. Review the document at $TARGET_PATH"
echo " 2. Add implementation report section:"
echo " - What was actually built"
echo " - Code locations (internal/...)"
echo " - Test coverage metrics"
echo " - Known limitations"
echo " 3. Update design_docs/README.md with version history"
echo " 4. Commit changes: git add $TARGET_PATH design_docs/README.md"
echo " 5. AFTER committing, delete original:"
echo " git rm $SOURCE_PATH"
echo ""
echo -e "${GREEN}Template for implementation report:${NC}"
echo ""
cat <<'REPORT'
## Implementation Report
**Completed**: [Date]
**Version**: [Version number from CHANGELOG]
### What Was Built
[Summary of what was actually implemented vs planned]
### Code Locations
**New files:**
- `internal/path/file.go` (XXX LOC) - [Purpose]
**Modified files:**
- `internal/path/existing.go` (+XX/-YY LOC) - [Changes]
### Test Coverage
- Unit tests: XX passing
- Integration tests: YY passing
- Coverage: ZZ%
- Test files: `internal/path/*_test.go`
### Metrics
| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| [Metric 1] | X | Y | +Z% |
| [Metric 2] | X | Y | +Z% |
### Known Limitations
- [Limitation 1] - [Why/when to address]
- [Limitation 2] - [Why/when to address]
### Future Work
[Features deferred to later versions]
REPORT
```
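The metadata rewrite in `move_to_implemented.sh` depends on the `sed -i.bak` substitution matching a `**Status**:` line exactly at the start of a line. A quick sandbox check of that substitution, using a throwaway temp file rather than a real design doc:

```shell
# Create a throwaway doc with the metadata line the script expects
tmp=$(mktemp)
printf '%s\n' '**Status**: Planned' '**Last updated**: 2024-01-01' > "$tmp"

# The same substitution the script applies after copying to implemented/
sed -i.bak 's/^\*\*Status\*\*:.*/\*\*Status\*\*: Implemented/' "$tmp"
rm -f "${tmp}.bak"

grep '^\*\*Status\*\*:' "$tmp"   # → **Status**: Implemented
rm -f "$tmp"
```

The attached-suffix form `-i.bak` is used because it works under both GNU and BSD sed, which is why the scripts follow each in-place edit with an `rm` of the `.bak` file.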