
AILANG Sprint Executor

Execute approved sprint plans with test-driven development, continuous linting, progress tracking, and pause points. Use when user says "execute sprint", "start sprint", or wants to implement an approved sprint plan.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 22
Hot score: 88
Updated: March 20, 2026
Overall rating: C (2.8)
Composite score: 2.8
Best-practice grade: C (55.6)

Install command

npx @skill-hub/cli install sunholo-data-ailang-sprint-executor
Tags: software development, project management, test-driven development, automation, git

Repository

sunholo-data/ailang

Skill path: .claude/skills/sprint-executor


Open repository

Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack, Testing.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: sunholo-data.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install AILANG Sprint Executor into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/sunholo-data/ailang before adding AILANG Sprint Executor to shared team environments
  • Use AILANG Sprint Executor for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: AILANG Sprint Executor
description: Execute approved sprint plans with test-driven development, continuous linting, progress tracking, and pause points. Use when user says "execute sprint", "start sprint", or wants to implement an approved sprint plan.
---

# AILANG Sprint Executor

Execute an approved sprint plan with continuous progress tracking, testing, and documentation updates.

## Quick Start

**Most common usage:**
```bash
# User says: "Execute the sprint plan in design_docs/20251019/M-S1.md"
# This skill will:
# 1. Validate prerequisites (tests pass, linting clean)
# 2. Create TodoWrite tasks for all milestones
# 3. Execute each milestone with test-driven development
# 4. Run checkpoint after each milestone (tests + lint)
# 5. Update CHANGELOG and sprint plan progressively
# 6. Pause after each milestone for user review
```

## When to Use This Skill

Invoke this skill when:
- User says "execute sprint", "start sprint", "begin implementation"
- User has an approved sprint plan ready to implement
- User wants guided execution with built-in quality checks
- User needs progress tracking and pause points

## Coordinator Integration

**When invoked by the AILANG Coordinator** (detected by GitHub issue reference in the prompt), you MUST output these markers at the end of your response:

```
IMPLEMENTATION_COMPLETE: true
BRANCH_NAME: coordinator/task-XXXX
FILES_CREATED: file1.go, file2.go
FILES_MODIFIED: file3.go, file4.go
```

**Why?** The coordinator uses these markers to:
1. Track implementation completion
2. Post updates to GitHub issues
3. Trigger the merge approval workflow

**Example completion:**
```
## Implementation Complete

All milestones have been completed and tests pass.

**IMPLEMENTATION_COMPLETE**: true
**BRANCH_NAME**: `coordinator/task-abc123`
**FILES_CREATED**: `internal/new_file.go`, `internal/new_test.go`
**FILES_MODIFIED**: `internal/existing.go`
```
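For reference, the marker block can be recovered with a single line scan that tolerates both the plain and bold styles shown above. This is an illustrative sketch, not the coordinator's actual parser; `parseMarkers` and the regex are our own:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// markerPat matches both "KEY: value" and "**KEY**: value" marker lines.
var markerPat = regexp.MustCompile(`(?m)^\**(IMPLEMENTATION_COMPLETE|BRANCH_NAME|FILES_CREATED|FILES_MODIFIED)\**:\s*(.+)$`)

// parseMarkers extracts completion markers from an agent response,
// stripping any backtick wrapping around values.
func parseMarkers(response string) map[string]string {
	markers := map[string]string{}
	for _, m := range markerPat.FindAllStringSubmatch(response, -1) {
		markers[m[1]] = strings.Trim(m[2], " `")
	}
	return markers
}

func main() {
	out := "## Implementation Complete\n\n**IMPLEMENTATION_COMPLETE**: true\n**BRANCH_NAME**: `coordinator/task-abc123`\n"
	fmt.Println(parseMarkers(out)["BRANCH_NAME"])
}
```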

## Core Principles

1. **Test-Driven**: All code must pass tests before moving to next milestone
2. **Lint-Clean**: All code must pass linting before moving to next milestone
3. **Document as You Go**: Update CHANGELOG.md and sprint plan progressively
4. **Pause for Breath**: Stop at natural breakpoints for review and approval
5. **Track Everything**: Use TodoWrite to maintain visible progress
6. **DX-First**: Improve AILANG development experience as we go - make it easier next time

## Multi-Session Continuity (NEW)

**Sprint execution can now span multiple Claude Code sessions!**

Based on [Anthropic's long-running agent patterns](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents), sprint-executor implements the "Coding Agent" pattern:

- **Session Startup Routine**: Every session starts with `session_start.sh`
  - Checks working directory
  - Reads JSON progress file (`.ailang/state/sprints/sprint_<id>.json`)
  - Reviews recent git commits
  - Validates tests pass
  - Prints "Here's where we left off" summary

- **Structured Progress Tracking**: JSON file tracks state
  - Features with `passes: true/false/null` (follows "constrained modification" pattern)
  - Velocity metrics updated automatically
  - Clear checkpoint messages
  - Session timestamps

- **Pause and Resume**: Work can be interrupted at any time
  - Status saved to JSON: `not_started`, `in_progress`, `paused`, `completed`
  - Next session picks up exactly where you left off
  - No loss of context or progress

**For JSON schema details**, see [`resources/json_progress_schema.md`](resources/json_progress_schema.md)

## Available Scripts

### `scripts/session_start.sh <sprint_id>` **NEW**
Resume sprint execution across multiple sessions.

**When to use:** ALWAYS at the start of EVERY session continuing a sprint.

**What it does:**
1. **Syncs GitHub issues** via `ailang messages import-github`
2. Loads sprint JSON progress file
3. Shows linked GitHub issues with titles from local messages
4. Displays feature progress summary
5. Shows velocity metrics
6. Runs tests to verify clean state
7. Prints "Here's where we left off" summary
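Step 7's summary boils down to counting `passes: true` features in the progress JSON. A minimal Go sketch (the struct fields follow the schema in `resources/json_progress_schema.md`; `summarize` is our name, the actual implementation is a shell script):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sprint mirrors just the fields this sketch needs from
// .ailang/state/sprints/sprint_<id>.json.
type sprint struct {
	SprintID string `json:"sprint_id"`
	Status   string `json:"status"`
	Features []struct {
		ID     string `json:"id"`
		Passes *bool  `json:"passes"`
	} `json:"features"`
}

// summarize produces a one-line "here's where we left off" summary.
func summarize(raw []byte) (string, error) {
	var s sprint
	if err := json.Unmarshal(raw, &s); err != nil {
		return "", err
	}
	done := 0
	for _, f := range s.Features {
		if f.Passes != nil && *f.Passes {
			done++
		}
	}
	return fmt.Sprintf("%s: %s, %d/%d features passing", s.SprintID, s.Status, done, len(s.Features)), nil
}

func main() {
	raw := []byte(`{"sprint_id":"M-S1","status":"in_progress","features":[{"id":"M-S1.1","passes":true},{"id":"M-S1.2","passes":null}]}`)
	line, _ := summarize(raw)
	fmt.Println(line)
}
```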

### `scripts/validate_prerequisites.sh`
Validate prerequisites before starting sprint execution.

**What it checks:**
1. **Syncs GitHub issues** via `ailang messages import-github`
2. Working directory status (clean/uncommitted changes)
3. Current branch (dev or main)
4. Test suite passes
5. Linting passes
6. Shows unread messages (potential issues/feedback)

### `scripts/validate_sprint_json.sh <sprint_id>` **NEW**
**REQUIRED before starting any sprint.** Validates that sprint JSON has real milestones (not placeholders).

**What it checks:**
- No placeholder milestone IDs (`MILESTONE_ID`)
- No placeholder acceptance criteria (`Criterion 1/2`)
- At least 2 milestones defined
- All milestones have custom values (not defaults)
- Dependencies reference valid milestone IDs

**Exit codes:**
- `0` - Valid JSON, ready for execution
- `1` - Invalid JSON or placeholders detected (sprint-planner must fix)
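The placeholder and dependency checks amount to a couple of set lookups once the milestones are decoded from the sprint JSON. A minimal sketch (`validate` is illustrative; the real checks live in the shell script):

```go
package main

import "fmt"

// milestone holds only the fields the validator inspects.
type milestone struct {
	ID           string
	Dependencies []string
}

// validate applies the checks listed above: no placeholder IDs, at least
// two milestones, and dependencies that reference defined milestones.
func validate(ms []milestone) error {
	if len(ms) < 2 {
		return fmt.Errorf("need at least 2 milestones, got %d", len(ms))
	}
	ids := map[string]bool{}
	for _, m := range ms {
		if m.ID == "MILESTONE_ID" {
			return fmt.Errorf("placeholder milestone ID found")
		}
		ids[m.ID] = true
	}
	for _, m := range ms {
		for _, d := range m.Dependencies {
			if !ids[d] {
				return fmt.Errorf("%s depends on unknown milestone %q", m.ID, d)
			}
		}
	}
	return nil
}

func main() {
	ms := []milestone{{ID: "M1"}, {ID: "M2", Dependencies: []string{"M1"}}}
	fmt.Println(validate(ms)) // nil means the plan is ready for execution
}
```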

### `scripts/milestone_checkpoint.sh <milestone_name> [sprint_id]`
Run checkpoint after completing a milestone.

**CRITICAL: Tests passing ≠ Feature working!** This script verifies with REAL DATA.

**What it does:**
1. Runs `make test` and `make lint`
2. Shows git diff of changed files
3. Checks file sizes (AI-friendly codebase guidelines)
4. **Runs milestone-specific verification** (database queries, API calls, etc.)
5. Shows JSON update reminder with current milestone statuses

**Sprint-specific verification (e.g., M-TASK-HIERARCHY):**
- **M1**: Checks observatory_sync.go exists, tasks in DB with non-empty IDs
- **M2**: Checks OTEL_RESOURCE_ATTRIBUTES in executor, spans with task attributes
- **M3**: Checks spans have task_id linked
- **M4**: Checks task aggregates (span_count, tokens) are populated
- **M5**: Checks hierarchy API returns data for existing tasks
- **M6**: Checks TaskHierarchy component imported AND connected in UI
- **M7**: Checks backfill command exists and responds to --help

**Example:**
```bash
# Verify M1 (should pass if entity sync works)
.claude/skills/sprint-executor/scripts/milestone_checkpoint.sh M1 M-TASK-HIERARCHY

# Verify M2 (will fail if OTEL attributes not propagated)
.claude/skills/sprint-executor/scripts/milestone_checkpoint.sh M2 M-TASK-HIERARCHY
```

**Exit codes:**
- `0` - Checkpoint passed (all verifications succeeded)
- `1` - Checkpoint FAILED (DO NOT mark milestone complete)

### `scripts/acceptance_test.sh <milestone_id> <test_type>` **NEW**
Run end-to-end acceptance tests (parser, builtin, examples, REPL, e2e).

### `scripts/finalize_sprint.sh <sprint_id> [version]` **NEW**
Finalize a completed sprint by moving design docs and updating status.

**What it does:**
- Moves design doc from `planned/` to `implemented/<version>/`
- Moves sprint plan markdown to `implemented/<version>/`
- Updates design doc status to "IMPLEMENTED"
- Updates sprint JSON status to "completed"
- Updates file paths in sprint JSON

**When to use:** After all milestones pass and sprint is complete.

**Example:**
```bash
.claude/skills/sprint-executor/scripts/finalize_sprint.sh M-BUG-RECORD-UPDATE-INFERENCE v0_4_9
```
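The doc move amounts to a path rewrite plus status edits inside the files. A minimal sketch of the rewrite, assuming the version directory already appears in the planned path as in the schema examples (`implementedPath` is our name, not part of the script):

```go
package main

import (
	"fmt"
	"strings"
)

// implementedPath rewrites a planned design-doc path to its implemented
// home. The real script also updates the status fields inside the
// design doc and sprint JSON.
func implementedPath(planned string) string {
	return strings.Replace(planned, "/planned/", "/implemented/", 1)
}

func main() {
	fmt.Println(implementedPath("design_docs/planned/v0_4_0/m-s1-sprint-plan.md"))
}
```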

## Execution Flow

### Phase 0: Session Resumption (for continuing sprints)

**If this is NOT the first session for this sprint:**

```bash
# ALWAYS run session_start.sh first!
.claude/skills/sprint-executor/scripts/session_start.sh <sprint-id>
```

This prints "Here's where we left off" summary. **Then skip to Phase 2** to continue with the next milestone.

### Phase 1: Initialize Sprint (first session only)

1. **Validate Sprint JSON** - Run `validate_sprint_json.sh <sprint-id>` **REQUIRED FIRST**
   - If validation fails, STOP and notify user that sprint-planner must fix the JSON
   - Do NOT proceed with placeholder milestones
2. **Read Sprint Plan** - Parse markdown + load JSON progress file (`.ailang/state/sprints/sprint_<id>.json`)
3. **Validate Prerequisites** - Run `validate_prerequisites.sh` (tests, linting, git status)
4. **Create Todo List** - Use TodoWrite to track all milestones
5. **Initial Status Update** - Mark sprint as "🔄 In Progress"
6. **Initial DX Review** - Consider tools/helpers that would make sprint easier (see [resources/dx_improvement_patterns.md](resources/dx_improvement_patterns.md))

### Phase 2: Execute Milestones

**For each milestone:**

1. **Pre-Implementation** - Mark milestone as `in_progress` in TodoWrite
2. **Implement** - Write code with DX awareness (helper functions, debug flags, better errors)
3. **Write Tests** - TDD recommended for complex logic, comprehensive coverage required
4. **Verify Quality** - Run `milestone_checkpoint.sh <milestone-name>` (tests + lint must pass)
5. **Update Documentation**:
   - CHANGELOG.md (what, LOC, key decisions)
   - Example files (REQUIRED for new features)
   - Sprint plan markdown (mark milestone as ✅)
6. **Update Sprint JSON** ⚠️ **CRITICAL** - The checkpoint script reminds you!
   - Update `passes: true/false` in `.ailang/state/sprints/sprint_<id>.json`
   - Set `completed: "<ISO timestamp>"`
   - Add `notes: "<summary of what was done>"`
7. **DX Reflection** - Identify and implement quick wins (<15 min), defer larger improvements
8. **Pause for Breath** - Show progress, ask user if ready to continue

**Quick tips:**
- Use parser test helpers from `internal/parser/test_helpers.go`
- Use `DEBUG_PARSER=1` for token flow tracing
- Use `make doc PKG=<package>` for API discovery
- See [resources/parser_patterns.md](resources/parser_patterns.md) for parser/pattern matching
- See [resources/codegen_patterns.md](resources/codegen_patterns.md) for Go code generation (builtins, expressions)
- See [resources/api_patterns.md](resources/api_patterns.md) for common API patterns
- See [resources/dx_improvement_patterns.md](resources/dx_improvement_patterns.md) for DX opportunities

### Phase 3: Finalize Sprint

1. **Final Testing** - Run `make test`, `make lint`, `make test-coverage-badge`
2. **Documentation Review** - Verify CHANGELOG.md, example files, sprint plan complete
3. **Final Commit** - Git commit with sprint summary (milestones, LOC, velocity)
4. **Move Design Docs** - Run `finalize_sprint.sh <sprint-id> [version]` to:
   - Move design docs from `planned/` to `implemented/<version>/`
   - Update design doc status to IMPLEMENTED
   - Update sprint JSON status to "completed"
   - Update file paths in sprint JSON
5. **Summary Report** - Compare planned vs actual (LOC, time, velocity)
6. **DX Impact Summary** - Document improvements made during sprint

## Key Features

### Continuous Testing
- Run `make test` after every file change
- Never proceed if tests fail
- Track test count increase

### Continuous Linting
- Run `make lint` after implementation
- Fix linting issues immediately
- Use `make fmt` for formatting

### Progress Tracking
- TodoWrite shows real-time progress
- Sprint plan updated at each milestone
- CHANGELOG.md grows incrementally
- **JSON file tracks structured state** (NEW)
- Git commits create audit trail

### GitHub Issue Integration (NEW)
**Uses `ailang messages` for GitHub sync and issue tracking!**

**Automatic sync:**
- `session_start.sh` and `validate_prerequisites.sh` run `ailang messages import-github` first
- Linked issues shown with titles from local message database

If `github_issues` is set in sprint JSON:
- `validate_sprint_json.sh` shows linked issues
- `session_start.sh` displays issue titles from messages
- `milestone_checkpoint.sh` reminds you to include `Refs #...` in commits
- `finalize_sprint.sh` suggests commit message with issue references

**Commit message format:**
```bash
# During development - use "refs" to LINK without closing
git commit -m "Complete M1: Parser foundation, refs #17"

# Final sprint commit - use "Fixes" to AUTO-CLOSE issues on merge
git commit -m "Finalize sprint M-BUG-FIX

Fixes #17
Fixes #42"
```

**Important: "refs" vs "Fixes"**
- `refs #17` - Links commit to issue (NO auto-close)
- `Fixes #17`, `Closes #17`, `Resolves #17` - AUTO-CLOSES issue when merged

**Workflow:**
1. Sprint JSON has `github_issues: [17, 42]` (set by sprint-planner, deduplicated)
2. During development: Use `refs #17` to link commits without closing
3. Final commit: Use `Fixes #17` to auto-close issues on merge
4. No duplicates: `ailang messages import-github` checks existing issues before importing
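The "refs" vs "Fixes" distinction can be checked mechanically. A small sketch that flags which issue numbers a commit message would auto-close, based on GitHub's documented closing keywords (`closingRefs` is our helper, not part of `ailang messages`):

```go
package main

import (
	"fmt"
	"regexp"
)

// closePat matches GitHub's closing keywords (fix/fixes/fixed,
// close/closes/closed, resolve/resolves/resolved) followed by an issue
// number. Plain "refs #N" deliberately does not match.
var closePat = regexp.MustCompile(`(?i)\b(?:fix(?:e[sd])?|close[sd]?|resolve[sd]?)\s+#(\d+)`)

// closingRefs returns the issue numbers a commit message would auto-close.
func closingRefs(msg string) []string {
	var ids []string
	for _, m := range closePat.FindAllStringSubmatch(msg, -1) {
		ids = append(ids, m[1])
	}
	return ids
}

func main() {
	fmt.Println(closingRefs("Complete M1: Parser foundation, refs #17")) // links only
	fmt.Println(closingRefs("Finalize sprint M-BUG-FIX\n\nFixes #17\nFixes #42"))
}
```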

### Pause Points
- After each milestone completion
- When tests or linting fail (fix before continuing)
- When user requests "pause"
- When encountering unexpected issues

### Error Handling
- **If tests fail**: Show output, ask how to fix, don't proceed
- **If linting fails**: Show output, ask how to fix, don't proceed
- **If implementation unclear**: Ask for clarification, don't guess
- **If milestone takes much longer than estimated**: Pause and reassess

## Resources

### Multi-Session State
- **JSON Schema**: [`resources/json_progress_schema.md`](resources/json_progress_schema.md) - Sprint progress format
- **Session Startup**: Use `session_start.sh` ALWAYS for continuing sprints

### Development Tools
- **Parser & Patterns**: [`resources/parser_patterns.md`](resources/parser_patterns.md) - Parser development + pattern matching pipeline
- **Codegen Patterns**: [`resources/codegen_patterns.md`](resources/codegen_patterns.md) - Go code generation for new features/builtins
- **API Patterns**: [`resources/api_patterns.md`](resources/api_patterns.md) - Common constructor signatures and API gotchas
- **Developer Tools**: [`resources/developer_tools.md`](resources/developer_tools.md) - Make targets, ailang commands, workflows
- **DX Improvements**: [`resources/dx_improvement_patterns.md`](resources/dx_improvement_patterns.md) - Identifying and implementing DX wins
- **DX Quick Reference**: [`resources/dx_quick_reference.md`](resources/dx_quick_reference.md) - ROI calculator, decision matrix
- **Milestone Checklist**: [`resources/milestone_checklist.md`](resources/milestone_checklist.md) - Step-by-step per milestone

### External Documentation
- **Parser Guide**: [docs/guides/parser_development.md](../../docs/guides/parser_development.md)
- **Website**: https://ailang.sunholo.com/

## Progressive Disclosure

This skill loads information progressively:

1. **Always loaded**: This SKILL.md file (YAML frontmatter + execution workflow) - ~250 lines
2. **Execute as needed**: Scripts in `scripts/` directory (validation, checkpoints, testing)
3. **Load on demand**: Resources in `resources/` directory (detailed guides, patterns, references)

## Prerequisites

- Working directory clean (or only sprint-related changes)
- Current branch `dev` (or specified in sprint plan)
- All existing tests pass
- All existing linting passes
- Sprint plan approved and documented
- **JSON progress file created AND POPULATED by sprint-planner** (not just template!)
- **JSON must pass validation**: `scripts/validate_sprint_json.sh <sprint-id>`

## Failure Recovery

### If Tests Fail During Sprint
1. Show test failure output
2. Ask user: "Tests failing. Options: (a) fix now, (b) revert change, (c) pause sprint"
3. Don't proceed until tests pass

### If Linting Fails During Sprint
1. Show linting output
2. Try auto-fix: `make fmt`
3. If still failing, ask user for guidance
4. Don't proceed until linting passes

### If Implementation Blocked
1. Show what's blocking progress
2. Ask user for guidance or clarification
3. Consider simplifying the approach
4. Document the blocker in sprint plan

### If Velocity Much Lower Than Expected
1. Pause and reassess after 2-3 milestones
2. Calculate actual velocity
3. Propose: (a) continue as-is, (b) reduce scope, (c) extend timeline
4. Update sprint plan with revised estimates
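Step 3's proposal can start from a one-line projection of days remaining at the current pace (`remainingDays` is a hypothetical helper; the inputs come straight from the velocity object):

```go
package main

import "fmt"

// remainingDays projects days left at the current pace:
// (estimated total LOC - LOC done so far) / actual LOC per day.
func remainingDays(estimatedTotalLOC, actualTotalLOC, actualLOCPerDay float64) float64 {
	return (estimatedTotalLOC - actualTotalLOC) / actualLOCPerDay
}

func main() {
	// With 650 LOC planned, 214 done, and 53.5 LOC/day actual velocity,
	// roughly 8 more days are needed; compare against the plan's estimate.
	fmt.Printf("%.1f days remaining\n", remainingDays(650, 214, 53.5))
}
```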

## Coordinator Integration (v0.6.2+)

The sprint-executor skill integrates with the AILANG Coordinator for automated workflows.

### Autonomous Workflow

When configured in `~/.ailang/config.yaml`, the sprint-executor agent:
1. Receives handoff messages from sprint-planner
2. Executes sprint plans with TDD and continuous linting
3. Creates approval request for code review before merge

```yaml
coordinator:
  agents:
    - id: sprint-executor
      inbox: sprint-executor
      workspace: /path/to/ailang
      capabilities: [code, test, docs]
      trigger_on_complete: []  # End of chain
      auto_approve_handoffs: false
      auto_merge: false
      session_continuity: true
      max_concurrent_tasks: 1
```

### Receiving Handoffs from sprint-planner

The sprint-executor receives:
```json
{
  "type": "plan_ready",
  "correlation_id": "sprint_M-CACHE_20251231",
  "sprint_id": "M-CACHE",
  "plan_path": "design_docs/planned/v0_6_3/m-cache-sprint-plan.md",
  "progress_path": ".ailang/state/sprints/sprint_M-CACHE.json",
  "session_id": "claude-session-xyz",
  "estimated_duration": "3 days",
  "total_loc_estimate": 650
}
```

### Session Continuity

With `session_continuity: true`:
- Receives `session_id` from sprint-planner handoff
- Uses `--resume SESSION_ID` for Claude Code CLI
- Preserves full conversation context from design → planning → execution
- Maintains understanding of design decisions and rationale

### Human-in-the-Loop

With `auto_merge: false`:
1. Sprint is executed in isolated worktree
2. All milestones completed with tests passing
3. Approval request shows:
   - Git diff of all changes
   - Test results
   - CHANGELOG updates
4. Human reviews implementation quality
5. Approve → Changes merged to main branch
6. Reject → Worktree preserved for fixes

### End of Pipeline

As the last agent in the chain:
- `trigger_on_complete: []` means no automatic handoff
- Final approval merges directly to main branch
- Sprint JSON updated to `status: completed`
- Design docs moved to `implemented/` directory

## Notes

- This skill is long-running - expect it to take hours or days
- Pause points are built in - you're not locked into finishing
- Sprint plan is the source of truth - but reality may require adjustments
- Git commits create a reversible audit trail
- TodoWrite provides real-time visibility into progress
- Test-driven development is non-negotiable - tests must pass
- **Multi-session continuity** - Sprint can span multiple Claude Code sessions (NEW)
- **JSON state tracking** - Structured progress in `.ailang/state/sprints/sprint_<id>.json` (NEW)


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### resources/json_progress_schema.md

```markdown
# Sprint Progress JSON Schema

This document defines the JSON format for sprint progress tracking, following the [Anthropic long-running agent patterns](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents).

## Purpose

The JSON progress file enables **multi-session continuity** by:
- Providing structured, machine-readable progress state
- Following the "constrained modification" pattern (only `passes` field changes)
- Enabling session resumption across multiple Claude Code sessions
- Supporting velocity tracking and estimation accuracy

## File Location

Sprint progress files are stored in `.ailang/state/` directory:

```
.ailang/state/sprints/sprint_<id>.json
```

**Examples:**
- `.ailang/state/sprints/sprint_M-S1.json` (for sprint M-S1)
- `.ailang/state/sprints/sprint_M-POLY-A.json` (for sprint M-POLY-A)

## Schema Definition

### Root Object

```json
{
  "sprint_id": "string",
  "created": "ISO 8601 timestamp",
  "estimated_duration_days": "number",
  "correlation_id": "string",
  "design_doc": "string (path)",
  "markdown_plan": "string (path)",
  "features": [...],
  "velocity": {...},
  "last_session": "ISO 8601 timestamp",
  "last_checkpoint": "string | null",
  "status": "not_started | in_progress | paused | completed"
}
```

### Features Array

Each feature in the sprint:

```json
{
  "id": "string",
  "description": "string",
  "estimated_loc": "number",
  "actual_loc": "number | null",
  "dependencies": ["string"],
  "acceptance_criteria": ["string"],
  "passes": null | true | false,
  "started": "ISO 8601 timestamp | null",
  "completed": "ISO 8601 timestamp | null",
  "notes": "string | null"
}
```

**⚠️ CRITICAL: Constrained Modification Pattern**

According to the Anthropic article, only the `passes` field should be modified during execution:
- ✅ **Allowed**: Change `passes` from `null` → `true` or `false`
- ✅ **Allowed**: Update `actual_loc`, `completed`, `notes` (progress tracking)
- ❌ **Forbidden**: Change `description`, `acceptance_criteria` (prevents accidental requirement changes)
- ❌ **Forbidden**: Remove features from array (prevents losing work)
- ❌ **Forbidden**: Add new features mid-sprint (add to backlog instead)
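A Go sketch of the constraint: the struct exposes only the mutable fields, and the helper refuses unknown IDs. (`markPassed` is hypothetical; a real updater must also preserve the fields this struct omits when writing the file back.)

```go
package main

import (
	"encoding/json"
	"fmt"
)

// feature includes only the fields the executor is allowed to touch.
type feature struct {
	ID        string  `json:"id"`
	Passes    *bool   `json:"passes"`
	ActualLOC *int    `json:"actual_loc"`
	Completed *string `json:"completed"`
	Notes     *string `json:"notes"`
}

// markPassed mutates only the allowed fields and refuses to add features,
// enforcing the forbidden cases listed above.
func markPassed(features []feature, id string, loc int, ts, notes string) error {
	for i := range features {
		if features[i].ID == id {
			t := true
			features[i].Passes = &t
			features[i].ActualLOC = &loc
			features[i].Completed = &ts
			features[i].Notes = &notes
			return nil
		}
	}
	return fmt.Errorf("unknown feature %q: add new work to the backlog, not mid-sprint", id)
}

func main() {
	raw := `[{"id":"M-S1.1","passes":null,"actual_loc":null,"completed":null,"notes":null}]`
	var feats []feature
	if err := json.Unmarshal([]byte(raw), &feats); err != nil {
		panic(err)
	}
	if err := markPassed(feats, "M-S1.1", 214, "2025-01-27T14:30:00Z", "done"); err != nil {
		panic(err)
	}
	out, _ := json.Marshal(feats)
	fmt.Println(string(out))
}
```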

### Velocity Object

Tracks planned vs actual velocity:

```json
{
  "target_loc_per_day": "number",
  "actual_loc_per_day": "number",
  "target_milestones_per_week": "number",
  "actual_milestones_per_week": "number",
  "estimated_total_loc": "number",
  "actual_total_loc": "number",
  "estimated_days": "number",
  "actual_days": "number | null"
}
```

## Complete Example

### Initial State (Created by sprint-planner)

```json
{
  "sprint_id": "M-S1",
  "created": "2025-01-27T10:00:00Z",
  "estimated_duration_days": 7,
  "correlation_id": "sprint_M-S1",
  "design_doc": "design_docs/planned/v0_4_0/m-s1-parser-improvements.md",
  "markdown_plan": "design_docs/planned/v0_4_0/m-s1-sprint-plan.md",
  "features": [
    {
      "id": "M-S1.1",
      "description": "Parser foundation - Add helper functions",
      "estimated_loc": 200,
      "actual_loc": null,
      "dependencies": [],
      "acceptance_criteria": [
        "Can parse basic type expressions",
        "Test coverage > 80%",
        "No parser errors on example files"
      ],
      "passes": null,
      "started": null,
      "completed": null,
      "notes": null
    },
    {
      "id": "M-S1.2",
      "description": "Type integration - Connect parser to type checker",
      "estimated_loc": 150,
      "actual_loc": null,
      "dependencies": ["M-S1.1"],
      "acceptance_criteria": [
        "Types resolve correctly",
        "Type errors show source location"
      ],
      "passes": null,
      "started": null,
      "completed": null,
      "notes": null
    },
    {
      "id": "M-S1.3",
      "description": "Testing - Add comprehensive parser tests",
      "estimated_loc": 300,
      "actual_loc": null,
      "dependencies": ["M-S1.1", "M-S1.2"],
      "acceptance_criteria": [
        "Test coverage > 90%",
        "All edge cases covered",
        "Golden files updated"
      ],
      "passes": null,
      "started": null,
      "completed": null,
      "notes": null
    }
  ],
  "velocity": {
    "target_loc_per_day": 200,
    "actual_loc_per_day": 0,
    "target_milestones_per_week": 5,
    "actual_milestones_per_week": 0,
    "estimated_total_loc": 650,
    "actual_total_loc": 0,
    "estimated_days": 7,
    "actual_days": null
  },
  "last_session": "2025-01-27T10:00:00Z",
  "last_checkpoint": null,
  "status": "not_started"
}
```

### In Progress (Updated by sprint-executor - Session 1)

```json
{
  "sprint_id": "M-S1",
  "created": "2025-01-27T10:00:00Z",
  "estimated_duration_days": 7,
  "correlation_id": "sprint_M-S1",
  "design_doc": "design_docs/planned/v0_4_0/m-s1-parser-improvements.md",
  "markdown_plan": "design_docs/planned/v0_4_0/m-s1-sprint-plan.md",
  "features": [
    {
      "id": "M-S1.1",
      "description": "Parser foundation - Add helper functions",
      "estimated_loc": 200,
      "actual_loc": 214,
      "dependencies": [],
      "acceptance_criteria": [
        "Can parse basic type expressions",
        "Test coverage > 80%",
        "No parser errors on example files"
      ],
      "passes": true,
      "started": "2025-01-27T10:30:00Z",
      "completed": "2025-01-27T14:30:00Z",
      "notes": "Added test helpers, took 20% longer than estimated due to edge cases"
    },
    {
      "id": "M-S1.2",
      "description": "Type integration - Connect parser to type checker",
      "estimated_loc": 150,
      "actual_loc": 0,
      "dependencies": ["M-S1.1"],
      "acceptance_criteria": [
        "Types resolve correctly",
        "Type errors show source location"
      ],
      "passes": null,
      "started": "2025-01-27T14:45:00Z",
      "completed": null,
      "notes": "In progress - working on type resolution"
    },
    {
      "id": "M-S1.3",
      "description": "Testing - Add comprehensive parser tests",
      "estimated_loc": 300,
      "actual_loc": null,
      "dependencies": ["M-S1.1", "M-S1.2"],
      "acceptance_criteria": [
        "Test coverage > 90%",
        "All edge cases covered",
        "Golden files updated"
      ],
      "passes": null,
      "started": null,
      "completed": null,
      "notes": null
    }
  ],
  "velocity": {
    "target_loc_per_day": 200,
    "actual_loc_per_day": 53.5,
    "target_milestones_per_week": 5,
    "actual_milestones_per_week": 1.75,
    "estimated_total_loc": 650,
    "actual_total_loc": 214,
    "estimated_days": 7,
    "actual_days": 1
  },
  "last_session": "2025-01-27T14:30:00Z",
  "last_checkpoint": "M-S1.1 complete - tests pass, linting clean",
  "status": "in_progress"
}
```

### Paused (User requested pause after Session 1)

```json
{
  "sprint_id": "M-S1",
  ...
  "features": [
    {
      "id": "M-S1.1",
      "passes": true,
      "completed": "2025-01-27T14:30:00Z",
      ...
    },
    {
      "id": "M-S1.2",
      "passes": null,
      "started": "2025-01-27T14:45:00Z",
      "completed": null,
      "notes": "In progress - paused mid-implementation. Next: wire up type resolution logic"
    },
    ...
  ],
  "last_session": "2025-01-27T16:00:00Z",
  "last_checkpoint": "M-S1.1 complete, M-S1.2 50% complete",
  "status": "paused"
}
```

### Completed (All features done)

```json
{
  "sprint_id": "M-S1",
  "created": "2025-01-27T10:00:00Z",
  "estimated_duration_days": 7,
  "correlation_id": "sprint_M-S1",
  "design_doc": "design_docs/implemented/v0_4_0/m-s1-parser-improvements.md",
  "markdown_plan": "design_docs/implemented/v0_4_0/m-s1-sprint-plan.md",
  "features": [
    {
      "id": "M-S1.1",
      "passes": true,
      "actual_loc": 214,
      "completed": "2025-01-27T14:30:00Z",
      ...
    },
    {
      "id": "M-S1.2",
      "passes": true,
      "actual_loc": 163,
      "completed": "2025-01-28T10:15:00Z",
      ...
    },
    {
      "id": "M-S1.3",
      "passes": true,
      "actual_loc": 287,
      "completed": "2025-01-28T16:00:00Z",
      ...
    }
  ],
  "velocity": {
    "target_loc_per_day": 200,
    "actual_loc_per_day": 186,
    "target_milestones_per_week": 5,
    "actual_milestones_per_week": 5.25,
    "estimated_total_loc": 650,
    "actual_total_loc": 664,
    "estimated_days": 7,
    "actual_days": 3.5
  },
  "last_session": "2025-01-28T16:00:00Z",
  "last_checkpoint": "All milestones complete - sprint done!",
  "status": "completed"
}
```

## Usage Workflow

### sprint-planner Creates Initial File

```bash
# In sprint-planner skill, after creating markdown plan:
.claude/skills/sprint-planner/scripts/create_sprint_json.sh \
  "M-S1" \
  "design_docs/planned/v0_4_0/m-s1-sprint-plan.md"

# Creates: .ailang/state/sprints/sprint_M-S1.json
# Sends handoff message with correlation_id to sprint-executor
```

### sprint-executor Reads and Updates File

```bash
# Session 1: Start sprint
.claude/skills/sprint-executor/scripts/session_start.sh "M-S1"
# Reads .ailang/state/sprints/sprint_M-S1.json
# Prints "Here's where we left off" summary

# During milestone completion:
# - Update feature.passes to true/false
# - Update feature.actual_loc
# - Update velocity metrics
# - Update last_session timestamp
# - Update last_checkpoint description

# Session 2: Resume sprint (next day)
.claude/skills/sprint-executor/scripts/session_start.sh "M-S1"
# Reads JSON, sees M-S1.1 complete, M-S1.2 in progress
# Continues from where we left off
```

## Why JSON Instead of Markdown?

From the Anthropic article:

> **Using structured JSON prevents accidental modifications better than markdown.**
>
> The feature list constrains agents to mark only the `passes` field, with strong instructions preventing removal or editing of test descriptions.

**Benefits:**
1. **Machine-readable**: Easy to parse and validate
2. **Constrained updates**: Clear which fields can/can't change
3. **Atomic updates**: jq or Go can update specific fields safely
4. **Validation**: Can validate schema before/after updates
5. **History tracking**: Git shows exactly what changed
6. **Multi-tool support**: Any tool can read/write JSON

**Markdown limitations:**
- Easy to accidentally delete lines
- Parsing is fragile (format changes break parsers)
- Hard to enforce field constraints
- Difficult to validate programmatically

## Validation

### Required Fields

- `sprint_id` - Must be unique, match design doc
- `created` - ISO 8601 timestamp
- `features[]` - Must have at least one feature
- `features[].id` - Must be unique within sprint
- `features[].passes` - Must be null, true, or false
- `velocity` - All fields must be present
- `status` - Must be valid enum value

### Validation Script

```bash
# Validate JSON structure
jq -e . .ailang/state/sprints/sprint_M-S1.json >/dev/null

# Check required fields
jq -e '.sprint_id, .created, .features, .velocity, .status' \
  .ailang/state/sprints/sprint_M-S1.json >/dev/null

# Check passes field only has valid values
jq -e '[.features[].passes] | all(. == null or . == true or . == false)' \
  .ailang/state/sprints/sprint_M-S1.json
```

## Migration from Markdown

**Old workflow:**
1. sprint-planner creates markdown plan
2. sprint-executor marks milestones with ✅ in markdown
3. Hard to parse, easy to corrupt

**New workflow:**
1. sprint-planner creates both markdown (human-readable) and JSON (machine-readable)
2. sprint-executor updates the JSON (authoritative) and the markdown (informational)
3. JSON is source of truth for state, markdown is for documentation

**Backwards compatibility:**
- Old sprint plans (markdown only) still work
- New sprints (JSON + markdown) get improved resumption support
- No migration needed for old sprints

```

### resources/dx_improvement_patterns.md

```markdown
# DX Improvement Patterns

**Comprehensive guide to improving AILANG development experience during sprint execution.**

This resource documents patterns for identifying and implementing DX improvements as you work.

## When to Think About DX

**DX-aware implementation:**
- If you're writing boilerplate, could it be a helper function?
- If you're debugging something, could a debug flag help?
- If you're looking things up repeatedly, should it be documented?
- If an error message confused you, would it confuse others?
- If a test is verbose, could test helpers make it cleaner?

## When to Act on DX Ideas

- đŸŸĸ **Quick (<15 min)**: Do it now as part of this milestone
- 🟡 **Medium (15-30 min)**: Note in TODO list, do at end of milestone if time allows
- 🔴 **Large (>30 min)**: Note for design doc in reflection step

## Common DX Improvement Patterns

### 1. Repetitive Boilerplate → Helper Functions

**Signals:**
- Copying/pasting the same test setup code
- Same validation logic repeated across functions
- Common error handling patterns duplicated

**Quick fixes (5-15 min):**
- Extract to helper function in same package
- Add to `*_helpers.go` file
- Document with usage example
- Add tests for helper if complex

**Example:** M-DX9 added `AssertNoErrors(t, p)` after noticing parser test boilerplate.

**Before DX thinking:**
```go
if p.Errors() != nil {
    // Manually check each error...
}
```

**After DX thinking - Add helper:**
```go
AssertNoErrors(t, p)  // Helper added for reuse
```
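
A minimal sketch of what such a helper can look like. The interfaces and fakes below are illustrative stand-ins so the sketch runs on its own; the real helper in `internal/parser/test_helpers.go` takes `*testing.T` and `*Parser`:

```go
package main

import "fmt"

// testingT is the subset of *testing.T the helper uses; an interface
// keeps this sketch runnable as a plain program.
type testingT interface {
	Helper()
	Errorf(format string, args ...any)
	FailNow()
}

// parserLike is any parser-shaped value that can report its errors.
type parserLike interface{ Errors() []string }

// AssertNoErrors fails the test if the parser recorded any errors.
func AssertNoErrors(t testingT, p parserLike) {
	t.Helper() // attribute failures to the caller, not this helper
	if errs := p.Errors(); len(errs) > 0 {
		for _, msg := range errs {
			t.Errorf("parser error: %q", msg)
		}
		t.FailNow()
	}
}

// Tiny doubles for the demo below.
type fakeT struct{ failed bool }

func (f *fakeT) Helper()                   {}
func (f *fakeT) Errorf(_ string, _ ...any) { f.failed = true }
func (f *fakeT) FailNow()                  { f.failed = true }

type fakeParser struct{ errs []string }

func (p fakeParser) Errors() []string { return p.errs }

func main() {
	ft := &fakeT{}
	AssertNoErrors(ft, fakeParser{})
	fmt.Println("clean parser failed:", ft.failed) // false

	AssertNoErrors(ft, fakeParser{errs: []string{"unexpected RPAREN"}})
	fmt.Println("erroring parser failed:", ft.failed) // true
}
```

Calling `t.Helper()` first is what makes failures point at the test line, not the helper body.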

### 2. Hard-to-Debug Issues → Debug Flags

**Signals:**
- Adding temporary `fmt.Printf()` statements
- Manually tracing execution flow
- Repeatedly inspecting internal state

**Quick fixes (5-10 min):**
- Add `DEBUG_<SUBSYSTEM>=1` environment variable check
- Gate debug output behind flag (zero overhead when off)
- Document in CLAUDE.md or code comments

**Example:** M-DX9 added `DEBUG_PARSER=1` for token flow tracing.

**Before DX thinking:**
```go
// Manually inspecting tokens with fmt.Printf
fmt.Printf("cur=%v peek=%v\n", p.curToken, p.peekToken)
```

**After DX thinking - Add debug mode:**
```go
// DEBUG_PARSER=1 automatically traces token flow
```
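
The gating itself is a one-time env check plus a guarded print. A sketch — only the `DEBUG_PARSER` variable name comes from this doc; `parserDebug` and `debugf` are illustrative:

```go
package main

import (
	"fmt"
	"os"
)

// The flag is read once at startup, so each debugf call pays only a
// boolean check when tracing is off.
var parserDebug = os.Getenv("DEBUG_PARSER") == "1"

func debugf(format string, args ...any) {
	if !parserDebug {
		return // near-zero overhead when disabled
	}
	fmt.Fprintf(os.Stderr, "[parser] "+format+"\n", args...)
}

func main() {
	debugf("cur=%v peek=%v", "IDENT", "LPAREN") // silent unless DEBUG_PARSER=1
	fmt.Println("tracing enabled:", parserDebug)
}
```

Writing trace output to stderr keeps it separable from program output when the flag is on.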

### 3. Manual Workflows → Make Targets

**Signals:**
- Running multi-step commands repeatedly
- Forgetting command flags or order
- Different team members using different commands

**Quick fixes (3-5 min):**
- Add `make <target>` with clear name
- Document what it does in `make help`
- Show example usage in relevant docs

**Example:** `make update-golden` for parser test golden files.

### 4. Confusing APIs → Documentation

**Signals:**
- Looking up API signatures multiple times
- Trial-and-error with function arguments
- Grep-diving to understand usage

**Quick fixes (10-20 min):**
- Add package-level godoc with examples
- Document common patterns in CLAUDE.md
- Add usage examples to function comments
- Create `make doc PKG=<package>` target if missing

**Example:** M-TESTING documented common API patterns in CLAUDE.md.

### 5. Poor Error Messages → Actionable Errors

**Signals:**
- Error doesn't explain what went wrong
- No suggestion for how to fix
- Missing context (line numbers, file names)

**Quick fixes (5-15 min):**
- Add context to error message
- Suggest fix or workaround
- Link to documentation if relevant
- Include values that triggered error

**Example:**
```go
// ❌ Before
return fmt.Errorf("parse error")

// ✅ After
return fmt.Errorf("parse error at %s:%d: expected RPAREN, got %s. Did you forget to close the argument list? See: https://ailang.sunholo.com/docs/guides/parser_development#common-issues",
    p.filename, p.curToken.Line, p.curToken.Type)
```

### 6. Painful Testing → Test Utilities

**Signals:**
- Verbose test setup/teardown
- Repeated value construction
- Brittle test assertions

**Quick fixes (10-20 min):**
- Create test helper package (e.g., `testctx/`)
- Add value constructors (e.g., `MakeString()`, `MakeInt()`)
- Add assertion helpers (e.g., `AssertNoErrors()`)

**Example:** M-DX1 added `testctx` package for builtin testing.

## DX ROI Calculator

**When deciding whether to implement a DX improvement:**

```
Time saved per use × Expected uses = Total savings
If Total savings > Implementation time + Maintenance → DO IT
```

**Examples:**
- Helper function: 2 min × 20 uses = 40 min saved, costs 10 min → ROI = 4x ✅
- Debug flag: 15 min × 5 uses = 75 min saved, costs 8 min → ROI = 9x ✅
- Documentation: 5 min × 30 uses = 150 min saved, costs 20 min → ROI = 7.5x ✅
- New skill: 30 min × 2 uses = 60 min saved, costs 120 min → ROI = 0.5x ❌ (create design doc for later)

**Note:** ROI compounds over time as more developers/sprints benefit!

## DX Reflection After Each Milestone

**After each milestone, reflect on the development experience:**

**Ask yourself:**
- What was painful during this milestone?
- What took longer than expected due to tooling gaps?
- What did we have to look up multiple times?
- What errors/bugs could better tooling prevent?

**Categorize DX improvements:**

**đŸŸĸ Quick wins (<15 min) - Do immediately:**
- Add helper function to reduce boilerplate
- Add debug flag for better visibility
- Improve error message with actionable suggestion
- Add make target for common workflow
- Document pattern in code comments

**🟡 Medium improvements (15-30 min) - Add to current sprint if time allows:**
- Create test utility package
- Add validation script
- Improve CLI flag organization
- Add comprehensive examples

**🔴 Large improvements (>30 min) - Create design doc:**
- New skill for complex workflow
- Major architectural change
- New developer tool or subsystem
- Significant codebase reorganization

**Document in milestone summary:**
```markdown
## DX Improvements (Milestone X)

✅ **Applied**: Added `AssertNoErrors(t, p)` test helper (5 min)
📝 **Deferred**: Created M-DX10 design doc for parser AST viewer tool (estimated 2 hours)
💡 **Considered**: Better REPL error messages (added to backlog)
```

## DX Impact Summary Template

**Consolidate all DX improvements made during sprint:**

```markdown
## DX Improvements Summary (Sprint M-XXX)

### Applied During Sprint
✅ **Test Helpers** (Day 2, 10 min): Added `AssertNoErrors()` and `AssertLiteralInt()` helpers
   - Impact: Reduced test boilerplate by ~30%
   - Files: internal/parser/test_helpers.go

✅ **Debug Flag** (Day 4, 5 min): Added `DEBUG_PARSER=1` for token tracing
   - Impact: Eliminated 2 hours of token position debugging
   - Files: internal/parser/debug.go

✅ **Make Target** (Day 6, 3 min): Added `make update-golden` for parser test updates
   - Impact: Simplified golden file workflow
   - Files: Makefile

### Design Docs Created
📝 **M-DX10**: Parser AST Viewer Tool (estimated 2 hours)
   - Rationale: Spent 45 min manually inspecting AST structures
   - Expected ROI: Save ~30 min per future parser sprint
   - File: design_docs/planned/v0_4_0/m-dx10-ast-viewer.md

📝 **M-DX11**: Unified Error Message System (estimated 4 hours)
   - Rationale: Error messages inconsistent across lexer/parser/type checker
   - Expected ROI: Easier debugging for AI and humans
   - File: design_docs/planned/v0_4_0/m-dx11-error-system.md

### Considered But Deferred
💡 **REPL history search**: Nice-to-have, low impact vs effort
💡 **Syntax highlighting**: Human-focused, AILANG is AI-first
💡 **Auto-completion**: Deferred until reflection system complete

### Total DX Investment This Sprint
- Time spent: 18 min (quick wins)
- Time saved: ~3 hours (estimated, based on future sprint projections)
- Design docs: 2 (total estimated effort: 6 hours for future sprints)
- **Net impact**: Positive ROI even in current sprint
```

## Examples from Real Sprints

### M-DX9: Parser Developer Experience (October 2025)

**Applied:**
- Test helpers (`AssertNoErrors`, `AssertLiteralInt`, etc.) - 10 min
- Debug flag (`DEBUG_PARSER=1`) - 5 min
- Enhanced error messages with context - 15 min

**Impact:**
- Reduced parser test boilerplate by 40%
- Eliminated token position debugging time
- 30% faster parser development

**ROI:** 30 min investment → 2+ hours saved per parser sprint

### M-DX1: Builtin Developer Experience (September 2025)

**Applied:**
- Central registry system - 2 hours (larger improvement)
- Type Builder DSL - 1 hour
- Mock test context - 30 min

**Impact:**
- Reduced builtin development time from 7.5h to 2.5h per builtin
- 67% faster development
- Zero-friction testing with hermetic mocks

**ROI:** 3.5 hours investment → 5 hours saved per builtin (~4x return after 3 builtins, compounding with each additional builtin)

### M-TESTING: Test Infrastructure (November 2025)

**Applied:**
- API discovery with `make doc` - 5 min
- Common constructor reference in CLAUDE.md - 15 min

**Impact:**
- 80% faster API lookups (5-10 min → 30 sec)
- Prevented 23 API misuse errors

**ROI:** 20 min investment → 2+ hours saved on Day 7 alone

## Quick Reference Card

| Problem | Solution | Time | ROI |
|---------|----------|------|-----|
| Test boilerplate | Helper functions | 10 min | 4x |
| Hard to debug | Debug flag | 5 min | 9x |
| Manual workflow | Make target | 3 min | High |
| Confusing API | Documentation | 15 min | 7.5x |
| Poor error | Better message | 5 min | Medium |
| Verbose tests | Test utilities | 15 min | High |

```

### resources/parser_patterns.md

```markdown
# Parser Development Tools & Pattern Matching Pipeline

**Comprehensive reference for parser and pattern matching development.**

## Parser Development Tools (M-DX9)

### Quick Reference Tools

1. **Comprehensive Guide**: [docs/guides/parser_development.md](../../../docs/guides/parser_development.md)
   - Quick start with example (adding new expression type)
   - Token position convention (AT vs AFTER) - prevents 30% of bugs
   - Common AST types reference
   - Parser patterns (delimited lists, optional sections, precedence)
   - Test infrastructure guide
   - Debug tools reference
   - Common gotchas and troubleshooting

2. **Test Helpers**: [internal/parser/test_helpers.go](../../../internal/parser/test_helpers.go)
   - 15 helper functions for cleaner parser tests
   - `AssertNoErrors(t, p)` - Check for parser errors
   - `AssertLiteralInt/String/Bool/Float(t, expr, value)` - Check literals
   - `AssertIdentifier(t, expr, name)` - Check identifiers
   - `AssertFuncCall/List/ListLength(t, expr)` - Check structures
   - `AssertDeclCount/FuncDecl/TypeDecl(t, file, ...)` - Check declarations
   - All helpers call `t.Helper()` for clean stack traces

3. **Debug Tooling**: [internal/parser/debug.go](../../../internal/parser/debug.go), [internal/parser/delimiter_trace.go](../../../internal/parser/delimiter_trace.go)
   - `DEBUG_PARSER=1` environment variable for token flow tracing
   - Shows ENTER/EXIT with cur/peek tokens for parseExpression, parseType
   - Zero overhead when disabled
   - Example: `DEBUG_PARSER=1 ailang run test.ail`

   **Delimiter Stack Tracer (v0.3.21):**
   - `DEBUG_DELIMITERS=1` environment variable for delimiter matching tracing
   - Shows opening/closing of `{` `}` with context (match, block, case, function)
   - Visual indentation shows nesting depth
   - Detects delimiter mismatches and shows expected vs actual
   - Shows stack state on errors
   - Example: `DEBUG_DELIMITERS=1 ailang run test.ail`
   - **Use when**: Debugging nested match expressions, finding unmatched braces, understanding complex nesting

4. **Enhanced Error Messages** (v0.3.21): [internal/parser/parser_error.go](../../../internal/parser/parser_error.go)
   - Context-aware hints for delimiter errors
   - Shows nesting depth when inside nested constructs
   - Suggests DEBUG_DELIMITERS=1 for deep nesting issues
   - Specific guidance for `}`, `)`, `]` errors
   - Actionable workarounds (simplify nesting, use let bindings)

5. **AST Usage Examples**: [internal/ast/ast.go](../../../internal/ast/ast.go)
   - Comprehensive documentation on 6 major AST types
   - Usage examples for Identifier, Literal, Lambda, FuncCall, List, FuncDecl
   - âš ī¸ **CRITICAL**: int64 vs int gotcha prominently documented
   - Common parser patterns for each type

6. **Quick Reference**: CLAUDE.md "Parser Developer Experience Guide" section
   - Token position convention
   - Common AST types
   - Quick token lookup
   - Parsing optional sections pattern
   - Test error printing pattern

### When to Use These Tools

- ✅ Any sprint touching `internal/parser/` code
- ✅ Any sprint adding new expression/statement/type syntax
- ✅ Any sprint modifying AST nodes
- ✅ When encountering token position bugs
- ✅ When writing parser tests

### Impact

M-DX9 tools reduce parser development time by 30%, largely by eliminating token-position debugging overhead.

## Pattern Matching Pipeline (M-DX10)

**For pattern matching sprints (adding/fixing patterns), understand the 4-layer pipeline.**

Pattern changes propagate through: parser → elaborator → type checker → evaluator. Each layer transforms the pattern representation.

### The 4-Layer Pipeline

#### 1. Parser ([internal/parser/parser_pattern.go](../../../internal/parser/parser_pattern.go))

- **Input**: Source syntax (e.g., `::(x, rest)`, `(a, b)`, `[]`)
- **Output**: AST pattern nodes (`ast.ConstructorPattern`, `ast.TuplePattern`, `ast.ListPattern`)
- **Role**: Recognize pattern syntax and build AST
- **Example**: `::(x, rest)` → `ast.ConstructorPattern{Name: "::", Patterns: [x, rest]}`

#### 2. Elaborator ([internal/elaborate/patterns.go](../../../internal/elaborate/patterns.go))

- **Input**: AST patterns
- **Output**: Core patterns (`core.ConstructorPattern`, `core.TuplePattern`, `core.ListPattern`)
- **Role**: Convert surface syntax to core representation
- **âš ī¸ Special cases**: Some AST patterns transform differently in Core!
  - `::` ConstructorPattern → `ListPattern{Elements: [head], Tail: tail}` (M-DX10)
  - Why: Lists are `ListValue` at runtime, not `TaggedValue` with constructors

#### 3. Type Checker ([internal/types/patterns.go](../../../internal/types/patterns.go))

- **Input**: Core patterns
- **Output**: Pattern types, exhaustiveness checking
- **Role**: Infer pattern types, check coverage
- **Example**: `::(x: int, rest: List[int])` → `List[int]`

#### 4. Evaluator ([internal/eval/eval_patterns.go](../../../internal/eval/eval_patterns.go))

- **Input**: Core patterns + runtime values
- **Output**: Pattern match success/failure + bindings
- **Role**: Runtime pattern matching against values
- **âš ī¸ CRITICAL**: Pattern type must match Value type!
  - `ListPattern` matches `ListValue`
  - `ConstructorPattern` matches `TaggedValue`
  - `TuplePattern` matches `TupleValue`
  - Mismatch = pattern never matches!
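
The kind-pairing rule can be shown with a toy matcher. All types below are illustrative stand-ins for `core.*Pattern` and `eval.*Value`, which live in `internal/core` and `internal/eval`:

```go
package main

import "fmt"

// Illustrative stand-ins for core patterns and runtime values.
type Pattern interface{ pattern() }
type Value interface{ value() }

type ListPattern struct{}
type ConstructorPattern struct{ Name string }

type ListValue struct{ Elems []int }
type TaggedValue struct{ Tag string }

func (ListPattern) pattern()        {}
func (ConstructorPattern) pattern() {}
func (ListValue) value()            {}
func (TaggedValue) value()          {}

// match returns false on a kind mismatch rather than erroring, which is
// exactly why a mis-elaborated pattern "silently never matches".
func match(p Pattern, v Value) bool {
	switch p.(type) {
	case ListPattern:
		_, ok := v.(ListValue)
		return ok
	case ConstructorPattern:
		_, ok := v.(TaggedValue)
		return ok
	}
	return false
}

func main() {
	fmt.Println(match(ListPattern{}, ListValue{}))        // true
	fmt.Println(match(ConstructorPattern{}, ListValue{})) // false: silent non-match
}
```

The second call is the M-DX10 bug in miniature: a `::` pattern left as a `ConstructorPattern` can never match a `ListValue`.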

### Cross-References in Code

Each layer has comments pointing to the next layer:

```go
// internal/parser/parser_pattern.go
case lexer.DCOLON:
    // Parses :: pattern syntax
    // See internal/elaborate/patterns.go for elaboration to Core

// internal/elaborate/patterns.go
case *ast.ConstructorPattern:
    if p.Name == "::" {
        // Special case: :: elaborates to ListPattern
        // See internal/eval/eval_patterns.go for runtime matching
    }

// internal/eval/eval_patterns.go
case *core.ListPattern:
    // Matches against ListValue at runtime
    // If pattern type doesn't match value type, match fails
```

### Common Pattern Gotchas

#### 1. Two-Phase Fix Required (M-DX10 Lesson)

- **Symptom**: Parser accepts pattern, but runtime never matches
- **Cause**: Parser fix alone isn't enough - elaborator also needs fixing
- **Solution**: Check elaborator transforms pattern correctly for runtime
- **Example**: `::` parsed as `ConstructorPattern`, but must elaborate to `ListPattern`

#### 2. Pattern Type Mismatch

- **Symptom**: Pattern looks correct but never matches any value
- **Cause**: Pattern type doesn't match value type in evaluator
- **Debug**: Check `matchPattern()` in `eval_patterns.go` - does pattern type match value type?

#### 3. Special Syntax Requires Special Elaboration

- **Symptom**: Standard elaboration doesn't work for custom syntax
- **Solution**: Add special case in elaborator (like `::` → `ListPattern`)
- **When**: Syntax sugar, built-in constructors, or ML-style patterns

### When to Use This Guide

**Use when:**
- ✅ Adding new pattern syntax (e.g., `::`, `@`, guards)
- ✅ Fixing pattern matching bugs
- ✅ Understanding why patterns don't match at runtime
- ✅ Debugging elaboration or evaluation of patterns

**Quick checklist for pattern changes:**
1. Parser: Does `parsePattern()` recognize the syntax?
2. Elaborator: Does it transform to correct Core pattern type?
3. Type Checker: Does pattern type inference work?
4. Evaluator: Does pattern type match value type at runtime?

### Impact

Understanding this pipeline prevents two-phase fix discoveries and reduces pattern debugging time by 50%.

```

### resources/codegen_patterns.md

```markdown
# Code Generation Patterns

**Quick reference for adding Go code generation support for new AILANG features.**

## When Codegen Support is Needed

**Codegen support IS required when:**
- Adding new Core AST expression types (e.g., new operators, new literals)
- Adding effect builtins that need handler wiring (e.g., `_io_*`, `_fs_*`)
- Adding pure builtins that map to Go stdlib (e.g., `_math_*`, `_str_*`)
- Adding new ADT patterns or record operations
- Adding new runtime helpers (e.g., list ops, type conversions)

**Codegen support is NOT required when:**
- Adding interpreter-only builtins (no Go compilation needed)
- Modifying type inference (types package only)
- Adding parser features that don't change Core AST
- Adding CLI commands or tooling

## File Organization

```
internal/gen/golang/
├── codegen.go             # Main generator, Generate(), ADT/record registration
├── codegen_decl.go        # Declaration generation (functions, let bindings)
├── codegen_expr.go        # Main expression dispatch (generateExpr)
├── codegen_expr_simple.go # Literals, Var, VarGlobal, Lambda, builtin mappings
├── codegen_expr_app.go    # Function application (App)
├── codegen_expr_let.go    # Let/LetRec bindings
├── codegen_expr_control.go # If expressions, block expressions
├── codegen_ops.go         # BinOp, UnOp, Record, List, Tuple, DictAbs
├── codegen_match.go       # Pattern matching (Match expressions)
├── codegen_block.go       # Block expression handling
├── codegen_runtime.go     # Entry point for runtime helpers
├── codegen_runtime_arith.go    # Arithmetic helpers (Add, Sub, etc.)
├── codegen_runtime_collections.go # List/array helpers
├── codegen_runtime_records.go  # Record helpers (FieldGet, RecordUpdate)
├── codegen_runtime_misc.go     # Function calling, misc helpers
├── types.go               # TypeMapper, AILANG → Go type conversions
├── naming.go              # Go naming conventions (ToGoVarName, ToPascalCase)
├── effects.go             # Effect handler registration
└── adt.go                 # ADT struct generation
```

## Common Workflows

### 1. Adding a Builtin Function Mapping

**When:** Adding a new builtin that compiles to Go code (not interpreted).

**Location:** `codegen_expr_simple.go`

**Example - Adding a math function:**

```go
// In mapPureMathBuiltin() function:
func (g *Generator) mapPureMathBuiltin(name string) string {
    mathMappings := map[string]string{
        // ... existing mappings ...

        // NEW: Add hyperbolic functions
        "_math_sinh": "math.Sinh",
        "_math_cosh": "math.Cosh",
        "_math_tanh": "math.Tanh",
        "sinh":       "math.Sinh",  // stdlib wrapper
        "cosh":       "math.Cosh",
        "tanh":       "math.Tanh",
    }

    if expr, ok := mathMappings[name]; ok {
        g.needsMathImport = true  // Mark math import needed
        return expr
    }
    return ""
}
```

**Example - Adding an effect builtin:**

```go
// In mapEffectBuiltinToHandler() function:
func mapEffectBuiltinToHandler(name string) string {
    effectMappings := map[string]string{
        // ... existing mappings ...

        // NEW: Add database effect
        "_db_query":   "requireDB().Query",
        "_db_execute": "requireDB().Execute",
        "db_query":    "requireDB().Query",   // stdlib wrapper
        "db_execute":  "requireDB().Execute",
    }
    return effectMappings[name]
}
```

**Checklist:**
- [ ] Add mapping in appropriate function (`mapPureMathBuiltin` or `mapEffectBuiltinToHandler`)
- [ ] Add both `_builtin_name` (internal) and `builtin_name` (stdlib wrapper) variants
- [ ] Set import flags if needed (`g.needsMathImport = true`)
- [ ] Add tests in `codegen_*_test.go`

### 2. Adding a New Core Expression Type

**When:** New AST node type that needs code generation.

**Location:** `codegen_expr.go` (dispatch) + dedicated handler file

**Step 1: Add case to generateExpr dispatch:**

```go
// In codegen_expr.go:
func (g *Generator) generateExpr(expr core.Expr) error {
    switch e := expr.(type) {
    // ... existing cases ...

    case *core.NewExprType:
        return g.generateNewExprType(e)

    default:
        return fmt.Errorf("unsupported expression type: %T", expr)
    }
}
```

**Step 2: Implement the handler:**

```go
// In appropriate file (codegen_expr_*.go):
func (g *Generator) generateNewExprType(e *core.NewExprType) error {
    // Generate Go code for this expression
    g.writef("/* generated code for NewExprType */")
    return nil
}
```

**Checklist:**
- [ ] Add case to `generateExpr()` switch
- [ ] Implement handler function
- [ ] Handle nested expressions recursively (call `g.generateExpr()`)
- [ ] Manage indentation (`g.indent++`, `g.indent--`)
- [ ] Add comprehensive tests

### 3. Adding Runtime Helpers

**When:** Generated code needs supporting Go functions at runtime.

**Location:** `codegen_runtime_*.go` files

**Step 1: Add to writeRuntimeHelpers dispatch:**

```go
// In codegen_runtime.go:
func (g *Generator) writeRuntimeHelpers() {
    g.writeRuntimeRecordHelpers()
    g.writeRuntimeListHelpers()
    g.writeRuntimeArithmeticHelpers()
    g.writeRuntimeMiscHelpers()
    g.writeRuntimeSliceConverters()
    g.writeArrayRuntimeFunctions()
    g.writeADTSliceConverters()

    // NEW: Add custom helpers
    g.writeNewFeatureHelpers()
}
```

**Step 2: Implement helper generator:**

```go
// In codegen_runtime_misc.go (or new file):
func (g *Generator) writeNewFeatureHelpers() {
    g.writef(`
// NewFeatureHelper does something useful
func NewFeatureHelper(x interface{}) interface{} {
    // Implementation
    return x
}
`)
}
```

**Checklist:**
- [ ] Add call in `writeRuntimeHelpers()`
- [ ] Implement helper generator function
- [ ] Use `g.writef()` with backtick strings for multi-line
- [ ] Consider `skipRuntimeHelpers` flag for multi-file compilation
- [ ] Add tests that verify generated helper code

### 4. Adding Type Mappings

**When:** New AILANG type needs Go representation.

**Location:** `types.go`

**Example:**

```go
// In TypeMapper.MapType():
func (tm *TypeMapper) MapType(t types.Type) string {
    switch ty := t.(type) {
    // ... existing cases ...

    case *types.TNewType:
        return "GoNewType"

    default:
        return "interface{}"
    }
}
```

**Checklist:**
- [ ] Add case to `MapType()` switch
- [ ] Consider pointer vs value semantics
- [ ] Update `GoTypeToAILANG()` for reverse mapping if needed
- [ ] Add tests in `types_test.go`

## Testing Patterns

### Basic Codegen Test

```go
// In codegen_*_test.go:
func TestGenerateNewFeature(t *testing.T) {
    // Build Core AST
    prog := &core.Program{
        Decls: []core.Decl{
            &core.FuncDecl{
                Name: "testFunc",
                Body: &core.NewExprType{...},
            },
        },
    }

    // Generate code
    gen := New("main")
    code, err := gen.Generate(prog)
    if err != nil {
        t.Fatalf("Generate failed: %v", err)
    }

    // Verify output contains expected Go code
    codeStr := string(code)
    if !strings.Contains(codeStr, "expectedGoCode") {
        t.Errorf("Missing expected code.\nGot:\n%s", codeStr)
    }
}
```

### Integration Test (Compile + Run)

```go
func TestNewFeatureIntegration(t *testing.T) {
    src := `
module test
let result = newFeature(42)
`
    // Parse → Elaborate → TypeCheck → Generate → Compile → Run
    // See existing integration tests for patterns
}
```

## Common Pitfalls

### 1. Forgetting Interface{} Type Assertions

**Problem:** Generated code uses `interface{}` but Go operations need concrete types.

```go
// ❌ WRONG - Won't compile
func Add(a, b interface{}) interface{} {
    return a + b  // Can't add interface{} values!
}

// ✅ CORRECT - Type assert first
func Add(a, b interface{}) interface{} {
    return a.(int64) + b.(int64)
}
```
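
A related hardening: the bare `a.(int64)` assertion panics on the wrong dynamic type. A comma-ok variant (a sketch; `addInts` is an illustrative name, not a generator helper) turns that into a recoverable error:

```go
package main

import "fmt"

// addInts demonstrates the comma-ok form: a type mismatch produces a
// descriptive error instead of a runtime panic.
func addInts(a, b interface{}) (interface{}, error) {
	ai, ok := a.(int64)
	if !ok {
		return nil, fmt.Errorf("add: expected int64, got %T", a)
	}
	bi, ok := b.(int64)
	if !ok {
		return nil, fmt.Errorf("add: expected int64, got %T", b)
	}
	return ai + bi, nil
}

func main() {
	sum, _ := addInts(int64(2), int64(3))
	fmt.Println(sum) // 5

	_, err := addInts("x", int64(3))
	fmt.Println(err != nil) // true
}
```

Whether generated helpers should assert directly or use comma-ok is a design choice: a panic fails fast on type-checker bugs, while comma-ok yields friendlier runtime errors.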

### 2. Missing Import Tracking

**Problem:** Generated code uses stdlib but import not included.

```go
// ❌ WRONG - Forgets to track import
func (g *Generator) mapSomething(name string) string {
    return "math.Sqrt"  // But math not imported!
}

// ✅ CORRECT - Track import
func (g *Generator) mapSomething(name string) string {
    g.needsMathImport = true  // Mark import needed
    return "math.Sqrt"
}
```

### 3. Indentation Mismatch

**Problem:** Generated code has wrong indentation.

```go
// ❌ WRONG - Forgot to decrement
g.indent++
g.generateExpr(body)
// Missing g.indent--!

// ✅ CORRECT - Always balance
g.indent++
g.generateExpr(body)
g.indent--
```
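
One way to make the balance structural rather than a matter of discipline is a deferred dedent. A sketch, with `Generator` reduced to just the `indent` counter and `withIndent` as an illustrative helper name:

```go
package main

import "fmt"

// Generator is reduced to its indent counter for this sketch.
type Generator struct{ indent int }

// withIndent pairs the increment with a deferred decrement, so early
// returns inside body can never leave the counter unbalanced.
func (g *Generator) withIndent(body func() error) error {
	g.indent++
	defer func() { g.indent-- }()
	return body()
}

func main() {
	g := &Generator{}
	_ = g.withIndent(func() error {
		fmt.Println("indent inside:", g.indent) // 1
		return nil
	})
	fmt.Println("indent after:", g.indent) // 0
}
```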

### 4. Not Handling Nested Expressions

**Problem:** Only generates outer expression, not inner.

```go
// ❌ WRONG - Just writes literal
func (g *Generator) generateIf(e *core.If) error {
    g.writef("if %v { ... }", e.Cond)  // Cond not generated!
    return nil
}

// ✅ CORRECT - Generate nested expression
func (g *Generator) generateIf(e *core.If) error {
    g.writef("if ")
    if err := g.generateExpr(e.Cond); err != nil {
        return err
    }
    g.writef(" { ... }")
    return nil
}
```

## Quick Reference: Key Functions

| Function | Purpose | Location |
|----------|---------|----------|
| `generateExpr()` | Main expression dispatch | codegen_expr.go |
| `generateDecl()` | Declaration generation | codegen_decl.go |
| `mapEffectBuiltinToHandler()` | Effect builtin → handler | codegen_expr_simple.go |
| `mapPureMathBuiltin()` | Math builtin → Go math | codegen_expr_simple.go |
| `writeRuntimeHelpers()` | Runtime helper dispatch | codegen_runtime.go |
| `MapType()` | AILANG type → Go type | types.go |
| `ToGoVarName()` | AILANG name → Go var name | naming.go |
| `ToPascalCase()` | Convert to exported name | naming.go |

## Debugging

```bash
# View generated code without formatting
go test -v -run TestMyFeature ./internal/gen/golang/

# Check type mapper
make doc PKG=internal/gen/golang | grep -A5 "TypeMapper"

# Trace generation with verbose output
# Add debug prints in generateExpr() temporarily
```

## Related Resources

- **Builtin Developer Skill**: [`../.claude/skills/builtin-developer/SKILL.md`](../../builtin-developer/SKILL.md) - For adding builtins to interpreter
- **Parser Patterns**: [`parser_patterns.md`](parser_patterns.md) - For AST changes
- **API Patterns**: [`api_patterns.md`](api_patterns.md) - Common constructor signatures

```

### resources/api_patterns.md

```markdown
# Common API Patterns

**Quick reference for AILANG internal APIs learned from M-TESTING and other sprints.**

## Key Principle

**âš ī¸ ALWAYS check `make doc PKG=<package>` before grepping or guessing APIs!**

API discovery with `make doc` is 80% faster than manual searching (5-10 min → 30 sec per lookup).

## Quick API Lookup

```bash
# Find constructor signatures
make doc PKG=internal/testing | grep "NewCollector"
# Output: func NewCollector(modulePath string) *Collector

# Find struct fields
make doc PKG=internal/ast | grep -A 20 "type FuncDecl"
# Shows: Tests []*TestCase, Properties []*Property
```

## Common Constructors

| Package | Constructor | Signature | Notes |
|---------|-------------|-----------|-------|
| `internal/testing` | `NewCollector(path)` | Takes module path | M-TESTING |
| `internal/elaborate` | `NewElaborator()` | No arguments | Surface → Core |
| `internal/types` | `NewTypeChecker(core, imports)` | Takes Core prog + imports | Type inference |
| `internal/link` | `NewLinker()` | No arguments | Dictionary linking |
| `internal/parser` | `New(lexer)` | Takes lexer instance | Parser |
| `internal/eval` | `NewEvaluator(ctx)` | Takes EffContext | Core evaluator |

## Common API Mistakes

### Test Collection (M-TESTING)

```go
// ✅ CORRECT
collector := testing.NewCollector("module/path")
suite := collector.Collect(file)
for _, test := range suite.Tests { ... }  // Tests is the slice!

// ❌ WRONG
collector := testing.NewCollector(file, modulePath)  // Wrong arg order!
for _, test := range suite.Tests.Cases { ... }      // No .Cases field!
```

### String Formatting

```go
// ✅ CORRECT
name := fmt.Sprintf("test_%d", i+1)

// ❌ WRONG - Produces "\x01" not "1"!
name := "test_" + string(rune(i+1))  // BUG!
```

### Field Access

```go
// ✅ CORRECT
funcDecl.Tests        // []*ast.TestCase
funcDecl.Properties   // []*ast.Property

// ❌ WRONG
funcDecl.InlineTests  // Doesn't exist! Use .Tests
```

## API Discovery Workflow

1. **`make doc PKG=<package>`** (~30 sec) ← Start here!
2. Check source file if you know location (`grep "^func New" file.go`)
3. Check test files for usage examples (`grep "NewCollector" *_test.go`)
4. Read [docs/guides/](../../../docs/guides/) for complex workflows

**Time savings**: 80% reduction (5-10 min → 30 sec per lookup)

## M-DX10: Unit-Argument Model for "Nullary" Builtins

**AILANG has no true nullary functions.** All functions that appear to take no arguments actually take a `unit` parameter.

### The Rule

| Surface syntax | Desugars to | Builtin registration |
|----------------|-------------|---------------------|
| `f()` | `f(())` | `NumArgs: 1`, type `() -> T` |
| `keys()` | `keys(())` | `NumArgs: 1`, type `() -> list[string]` |

### Implementing "Zero-Arg" Builtins

```go
// ✅ CORRECT - Unit-argument model
RegisterEffectBuiltin(BuiltinSpec{
    Name:    "_sharedmem_keys",
    NumArgs: 1,  // Takes unit parameter!
    Type:    func() types.Type {
        T := types.NewBuilder()
        return T.Func(T.Unit()).Returns(T.List(T.String())).Effects("SharedMem")
    },
    Impl: func(ctx *effects.EffContext, args []eval.Value) (eval.Value, error) {
        // args[0] is unit, ignored (but validates arity)
        // ... implementation
    },
})

// ❌ WRONG - True nullary (doesn't work!)
RegisterEffectBuiltin(BuiltinSpec{
    Name:    "_sharedmem_keys",
    NumArgs: 0,  // BUG: Will cause "arity mismatch: 0 vs 1"
    // ...
})
```

### Stdlib Wrappers

```ailang
-- ✅ CORRECT - Call with ()
export func keys(u: unit) -> list[string] ! {SharedMem} {
    _sharedmem_keys(u)
}
-- Or using expression body:
export func now() -> int ! {Clock} = _clock_now()

-- ❌ WRONG - Missing ()
export func now() -> int ! {Clock} = _clock_now  -- Returns function object!
```

### Go Tests for "Zero-Arg" Builtins

```go
// ✅ CORRECT - Pass unit value
result, err := sharedMemKeysImpl(ctx, []eval.Value{&eval.UnitValue{}})

// ❌ WRONG - Empty slice
result, err := sharedMemKeysImpl(ctx, []eval.Value{})  // Arity mismatch!
```

### Why This Model?

- **ML tradition**: Follows OCaml/SML where `()` is the canonical "no value"
- **Uniform semantics**: `f x` works for all functions; `f ()` is just applying unit
- **Higher-order friendly**: `let f = now` has type `() -> int ! {Clock}`
- **S-CALL0 sugar**: Parser automatically desugars `f()` → `f(())`

**Reference**: [M-DX10 design doc](../../../../design_docs/implemented/v0_4_6/m-dx10-nullary-function-calls.md)

## Full Reference

See CLAUDE.md "Common API Patterns" section for additional patterns and examples.

```

### resources/developer_tools.md

```markdown
# AILANG Developer Tools Quick Reference

**For sprint-executor skill** - Load this when you need to know which tools are available for specific development tasks.

## Core Development Tools

### Building & Installing

```bash
# Local build
make build                 # Build to bin/ailang

# System installation
make install               # Full install with version info (slow)
make quick-install         # Fast reinstall (recommended during dev)

# Development workflow
make dev                   # Quick development build
make watch                 # Watch mode (local build)
make watch-install         # Watch mode (auto-install to PATH)
```

**After code changes, always run:**
```bash
make quick-install && ailang run --caps IO examples/hello.ail
```

### Testing

#### Core Testing
```bash
make test                  # All tests (excludes scripts/)
make test-coverage         # Tests with coverage report (coverage.html)
make test-coverage-badge   # Quick coverage check (shows percentage)
make ci                    # Full CI verification locally
make ci-strict             # Extended CI with A2 milestone gates (pre-release)
```

#### Specialized Testing
```bash
# Parser
make test-parser           # Parser tests with golden files
make test-parser-update    # Update parser golden files after changes

# Import system
make test-imports          # Test import system (success + errors)
make test-import-errors    # Test import error goldens
make regen-import-error-goldens  # Regenerate import error goldens

# Operators
make test-lowering         # Test operator lowering (desugaring)
make test-operator-assertions    # Test operator desugaring assertions
make verify-lowering       # Verify operator lowering

# Type system
make test-row-properties   # Test row unification properties
make test-golden-types     # Test builtin type snapshots
make test-iface-determinism # Test interface determinism

# Builtins
make test-builtin-consistency # Test builtin three-way parity
make test-builtin-freeze   # Freeze builtin type signatures
make doctor                # Validate builtin registry

# Stdlib
make test-stdlib-canaries  # Test stdlib health (std/io, std/net)
make test-stdlib-freeze    # Verify stdlib interfaces haven't changed
make freeze-stdlib         # Generate SHA256 golden files for stdlib
make verify-stdlib         # Verify stdlib against golden hashes

# REPL
make test-repl-smoke       # REPL smoke tests (:type command)
make test-parity           # Test REPL/file parity (manual, requires interactive REPL)

# Regression & Guards
make test-regression-guards # Run regression guard tests
make test-recursion        # Test recursion limits
make verify-no-shim        # Verify no shim code exists

# Golden file management
make check-golden-drift    # Check for drift in golden files
```

#### Coverage & Quality Gates
```bash
# Coverage reports
make cover-lines           # Show parser line coverage
make cover-branch          # Open parser branch coverage HTML
make cover-lexer           # Lexer coverage
make cover-parser          # Parser coverage
make cover-all-packages    # Coverage across all packages

# Quality gates
make gate-lexer            # Lexer quality gates
make gate-parser           # Parser quality gates
make gate-all-packages     # Quality gates for all packages
```

### Code Quality

```bash
# Formatting
make fmt                   # Format all Go code (gofmt)
make fmt-check             # Check if code is formatted (CI)

# Linting
make lint                  # Run golangci-lint
make install-lint          # Install golangci-lint

# Static analysis
make vet                   # Run go vet
```

### File Size Management (AI-Friendly Codebase)

```bash
# Check file sizes
make check-file-sizes      # Fail if any file >800 lines (CI check)
make report-file-sizes     # Report all files >500 lines
make codebase-health       # Full codebase health metrics
make largest-files         # Show 20 largest files

# Target: 0 files over 800 lines, <5 files between 500-800 lines
# Use codebase-organizer agent to refactor large files
```

## Stdlib Development

### When Modifying Stdlib

```bash
# 1. Make changes to stdlib/std/*.ail

# 2. Verify interfaces haven't broken
make verify-stdlib         # Checks SHA256 hashes match golden files

# 3. If intentional breaking change
make freeze-stdlib         # Update golden hashes

# 4. Run health checks
make test-stdlib-canaries  # Tests std/io, std/net work
make test-stdlib-freeze    # Verify stdlib interface freeze
```

### Stdlib Tools

```bash
# Output normalized JSON interface
ailang iface stdlib/std/io.ail    # See module interface

# Freeze/verify workflow
tools/freeze-stdlib.sh            # Generate SHA256 golden files
tools/verify-stdlib.sh            # Verify against goldens
```

## Golden File Testing

**What are golden files?** Pre-recorded expected outputs for deterministic testing.
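As a rough sketch of the pattern (not AILANG's actual harness - the `testdata/` layout and `UPDATE_GOLDEN` switch are assumptions), a golden check compares output bytes against a recorded file and rewrites the file on demand:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// update mirrors what `make test-parser-update` does: when set, golden
// files are rewritten instead of compared.
var update = os.Getenv("UPDATE_GOLDEN") == "1"

// checkGolden compares got against testdata/<name>.golden, creating or
// rewriting the file when update is set.
func checkGolden(name string, got []byte) error {
	path := filepath.Join("testdata", name+".golden")
	if update {
		if err := os.MkdirAll("testdata", 0o755); err != nil {
			return err
		}
		return os.WriteFile(path, got, 0o644)
	}
	want, err := os.ReadFile(path)
	if err != nil {
		return fmt.Errorf("missing golden %s (set UPDATE_GOLDEN=1): %w", path, err)
	}
	if string(got) != string(want) {
		return fmt.Errorf("output differs from %s", path)
	}
	return nil
}
```

The important property is determinism: the same input must always produce the same bytes, or the goldens drift on every run.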

### When Parser/Types Change

```bash
# 1. Run tests
make test-parser

# 2. If expected failures (e.g., parser produces new AST format)
make test-parser-update    # Update golden files

# 3. Review changes
git diff tests/golden/

# 4. Check for drift
make check-golden-drift
```

### Other Golden File Targets

```bash
make test-golden-types     # Test builtin type snapshots
make test-lowering         # Test operator lowering goldens
make regen-import-error-goldens  # Regenerate import error goldens
```

## Evaluation & Benchmarking

### Running Evals

```bash
# Development (cheap models: gpt5-mini, claude-haiku, gemini-flash)
make eval-suite
ailang eval-suite          # Same as above

# Full suite (all 6 models: GPT-5, GPT-5-mini, Claude Sonnet/Haiku, Gemini Pro/Flash)
make eval-suite FULL=true
ailang eval-suite --full

# Baseline for release (REQUIRES version!)
make eval-baseline EVAL_VERSION=v0.3.15              # Dev models
make eval-baseline EVAL_VERSION=v0.3.15 FULL=true    # All models

# Resume interrupted baseline (v0.3.14+)
ailang eval-suite --full --skip-existing  # Skip benchmarks with existing results
```

**CRITICAL**: `ailang eval-suite` OVERWRITES the output directory by default!
```bash
# ❌ WRONG - Second run overwrites first
ailang eval-suite --models gpt5
ailang eval-suite --models claude-sonnet-4-5  # DELETES gpt5 results!

# ✅ CORRECT - Run all models in ONE command
ailang eval-suite --models gpt5,claude-sonnet-4-5,gemini-2-5-pro
```

### Comparing & Reporting

```bash
# Compare two baselines
ailang eval-compare eval_results/baselines/v0.3.14 eval_results/baselines/v0.3.15

# Generate comprehensive report
ailang eval-report eval_results/baselines/v0.3.15 v0.3.15 --format=markdown

# Update dashboard JSON (preserves history!)
ailang eval-report eval_results/baselines/v0.3.15 v0.3.15 --format=json
# ✅ Automatically writes to docs/static/benchmarks/latest.json with history

# Performance matrix
ailang eval-matrix eval_results/baselines/v0.3.15 v0.3.15

# Summary
ailang eval-summary eval_results/baselines/v0.3.15
```

### Advanced Eval Workflows

```bash
# Validate specific fix
ailang eval-validate fizzbuzz eval_results/baselines/v0.3.14
# Runs just fizzbuzz against baseline to verify fix works

# A/B test prompts
make eval-prompt-ab PROMPT_A=v0.3.8 PROMPT_B=v0.3.9
tools/eval_prompt_ab.sh v0.3.8 v0.3.9

# Auto-improve prompts
make eval-auto-improve          # Analyzes failures, suggests improvements
make eval-auto-improve-apply    # Applies suggestions

# Full workflow: evals → analysis → design docs
make eval-to-design
tools/eval-to-design.sh
```

### Eval Analysis

```bash
# Analyze eval failures and generate design docs
make eval-analyze              # With deduplication
make eval-analyze-fresh        # Force new design docs (disable dedup)
ailang eval-analyze eval_results/ --output design_docs/planned/

# List available models
make eval-models
ailang eval-suite --list-models
```

### Eval Cleanup

```bash
make eval-clean                # Remove eval_results/
```

## Example Management

```bash
# Verify examples
make verify-examples           # Verify all working examples
make verify-examples-all       # Verify all examples (including broken)
make examples-status           # Show example status

# Update documentation
make update-readme             # Update README with example status
make flag-broken               # Add warning headers to broken examples

# Audit examples
tools/audit-examples.sh        # Comprehensive example audit
```

## Documentation

### Documentation Site (Docusaurus)

```bash
# Install dependencies
make docs-install              # cd docs && npm install

# Development
make docs-serve                # Start dev server (http://localhost:3000)
make docs                      # Alias for docs-serve
make docs-preview              # Preview production build

# Build
make docs-build                # Build for production

# Troubleshooting
make docs-clean                # Clear Docusaurus cache
make docs-restart              # Clear cache + restart dev server

# Manual cache clearing (for webpack chunk errors)
cd docs && npm run clear && rm -rf .docusaurus build && npm start
```

### Documentation Sync

```bash
# Sync prompts to docs site (Docusaurus)
make sync-prompts              # Copies prompts/ to docs/docs/prompts/
docs/scripts/sync-prompts.sh   # (Active prompt + recent versions)

# Generate LLM-friendly context
make generate-llms-txt         # Creates docs/static/llms.txt
tools/generate-llms-txt.sh
```

### Update Benchmark Dashboard

```bash
# Generate dashboard files (markdown + JSON with history)
ailang eval-report eval_results/baselines/v0.3.15 v0.3.15 --format=docusaurus 2>/dev/null > docs/docs/benchmarks/performance.md

# Update JSON (preserves history automatically!)
ailang eval-report eval_results/baselines/v0.3.15 v0.3.15 --format=json
# Writes to docs/static/benchmarks/latest.json

# Verify JSON is valid
jq -r '.version, .aggregates.finalSuccess' docs/static/benchmarks/latest.json

# Clear cache and test
cd docs && npm run clear && npm start
```

## WASM

```bash
make build-wasm                # Build WASM binary for browser REPL
# Output: docs/static/wasm/ailang.wasm
```

## Fuzz Testing

```bash
make fuzz-parser               # Fuzz test parser (short run)
make fuzz-parser-long          # Long-running fuzz test
```

## Debug & Introspection

### `--debug-types` - Type Inference Debugging (v0.5.11+)

**When to use**: Debugging type inference issues, understanding constraint resolution, or investigating "why does this have type X?"

```bash
# Show full type inference debug output
ailang run --debug-types --caps IO --entry main examples/runnable/debug_types_demo.ail

# Filter to specific node ID
ailang run --debug-types --node 42 --caps IO --entry main file.ail
```

**Output sections**:
- **Substitution Map**: Type variable bindings (α → int)
- **Constraints**: Num/Eq/Ord constraints added and resolved
- **CoreTI Entries**: Type info for each Core AST node

**Demo file**: See `examples/runnable/debug_types_demo.ail` for complete example.

**Documentation**: [Debugging Guide](docs/docs/guides/debugging.md)

### Debug Effect (v0.4.10+) - Runtime Tracing

For debugging AILANG code at runtime, use the Debug effect:

```ailang
import std/debug as Debug

func update(e: Entity) -> Entity ! {Debug} {
    Debug.check(e.health >= 0, "health must be non-negative");
    Debug.log("updating entity " ++ show(e.id));
    -- ... entity logic
}
```

**Run with Debug capability:**
```bash
ailang run --caps IO,Debug --entry main game.ail
```

**Key features:**
- `Debug.log(msg)` - Write trace message (host collects)
- `Debug.check(cond, msg)` - Record assertion (doesn't throw, continues execution)
- Ghost effect - erased in `--release` mode (zero runtime cost)
- Write-only from AILANG - only the host calls `DebugContext.Collect()`

**Note:** Use `Debug.check` not `Debug.assert` (`assert` is a reserved keyword).

### Debug AST Command

**NEW in v0.3.16**: Inspect Core AST (ANF) and inferred types

```bash
# Show Core AST with node IDs
ailang debug ast example.ail

# Show Core AST with inferred types
ailang debug --show-types ast example.ail

# Compact output (no indentation)
ailang debug --compact ast example.ail
```

**Example output** (`ailang debug --show-types ast concat.ail`):
```
=== Core AST (ANF) ===
Program:
  [0] Let(xs) [#13] :: [α7]:
    Value: List[3] [#4] :: [α1]:
      [0]: Lit(1) [#1] :: α1
      [1]: Lit(2) [#2] :: α2
      [2]: Lit(3) [#3] :: α3
    Body:  Let(ys) [#12] :: [α7]:
      Value: List[3] [#8] :: [α4]:
        [0]: Lit(4) [#5] :: α4
        [1]: Lit(5) [#6] :: α5
        [2]: Lit(6) [#7] :: α6
      Body:  Intrinsic(11) [#11] :: [α7]:
        Arg[0]: Var(xs) [#9] :: [int]
        Arg[1]: Var(ys) [#10] :: [int]
```

**Use cases**:
- Debug operator lowering (see which types were inferred for operators)
- Understand ANF transformations (see Let bindings, Intrinsics)
- Validate type inference results (verify CoreTypeInfo is populated correctly)
- Learn AILANG internals (see how surface syntax becomes Core AST)
- Investigate type errors (see where inference stopped)

**What you see**:
- `[#N]` - Node ID (used by CoreTypeInfo for type-guided lowering)
- `:: Type` - Inferred type from type checker (when --show-types used)
- `Intrinsic(N)` - Operator intrinsics (e.g., 11 = OpConcat for `++`)
- `Let` bindings - ANF explicit sequencing
- `VarGlobal` - Builtin function references

## Utilities

```bash
# Dependencies
make deps                      # Download Go dependencies (go mod download)

# Cleanup
make clean                     # Remove build artifacts and coverage files

# Help
make help                      # Show all available make targets
make help-release              # Show release workflow
```

## Coordinator Daemon

The coordinator is an always-on daemon that processes tasks automatically using AI agents.

### Basic Commands

```bash
# Start the coordinator daemon
ailang coordinator start

# Check status
ailang coordinator status
ailang coordinator status --json     # JSON output

# Stop the daemon
ailang coordinator stop
```

### Configuration Options

```bash
# Start with custom settings
ailang coordinator start --poll-interval 60s  # Check for tasks every 60s
ailang coordinator start --max-worktrees 5    # Allow 5 concurrent tasks
```

### Delegating Tasks

Claude can delegate tasks to the coordinator:

```bash
# Send a task to the coordinator
ailang messages send user "Fix bug in parser" --type bug --title "Parser bug fix"

# Check coordinator status
ailang coordinator status

# View coordinator logs
tail -f ~/.ailang/logs/coordinator.log
```

### When to Use

Use the coordinator when:
- Task is large and can run autonomously
- You want to parallelize multiple tasks
- Task should continue after session ends
- You need isolated git worktree execution

## Skills (Auto-Invoked)

**Don't call these manually - they're auto-invoked by Claude when appropriate.**

```bash
# Use via natural language:
"Ready to release v0.3.15"     → release-manager skill
"Update benchmarks"            → post-release skill
"Plan the next sprint"         → sprint-planner skill
"Execute the sprint plan"      → sprint-executor skill (YOU ARE HERE!)
"Create a new skill"           → skill-builder skill
"Create a design doc"          → design-doc-creator skill
"Help me write AILANG code"    → use-ailang skill
```

## Agents (Auto-Invoked)

**Don't call these manually - they're auto-invoked by Claude when appropriate.**

```bash
# Use via natural language:
"Run evals and compare results"        → eval-orchestrator agent
"Implement fix from eval failure"      → eval-fix-implementer agent
"Split this large file"                → codebase-organizer agent
"Sync docs with code changes"          → docs-sync-guardian agent
"Verify code matches design spec"      → design-spec-auditor agent
"Analyze test coverage"                → test-coverage-guardian agent
```

## Common Workflows

### During Sprint Execution

```bash
# After each code change
make quick-install             # Fast reinstall
make test                      # Verify tests pass
make lint                      # Verify linting passes

# At milestone checkpoints
.claude/skills/sprint-executor/scripts/milestone_checkpoint.sh "Milestone name"

# Before committing
make test                      # All tests must pass
make lint                      # All linting must pass
make check-file-sizes          # Verify no files >800 lines
```

### After Parser/Type System Changes

```bash
# 1. Run tests
make test-parser

# 2. If golden files need updating
make test-parser-update

# 3. Review changes
git diff tests/golden/

# 4. Verify other systems
make test-lowering             # Operator lowering
make test-row-properties       # Row unification
make test-golden-types         # Builtin types
```

### After Stdlib Changes

```bash
# 1. Verify no breaking changes
make verify-stdlib

# 2. If intentional breaking change
make freeze-stdlib

# 3. Run health checks
make test-stdlib-canaries
```

### Before Releasing

```bash
# Use the release-manager skill:
"Ready to release v0.3.15"

# Manual pre-release checks:
make ci-strict                 # Full CI with A2 gates
make check-file-sizes          # No files >800 lines
make verify-stdlib             # Stdlib unchanged (or frozen)
make test-coverage-badge       # Coverage check
```

### After Releasing

```bash
# Use the post-release skill:
"Update benchmarks for v0.3.15"

# Manual post-release:
make eval-baseline EVAL_VERSION=v0.3.15 FULL=true
ailang eval-report eval_results/baselines/v0.3.15 v0.3.15 --format=json
ailang eval-report eval_results/baselines/v0.3.15 v0.3.15 --format=docusaurus > docs/docs/benchmarks/performance.md
cd docs && npm run clear && npm start
```

## Tool Discovery Cheat Sheet

| I need to... | Tool |
|--------------|------|
| Build locally | `make build` |
| Install to system | `make quick-install` |
| Run all tests | `make test` |
| Run tests with coverage | `make test-coverage` |
| Update parser goldens | `make test-parser-update` |
| Verify stdlib unchanged | `make verify-stdlib` |
| Freeze stdlib after change | `make freeze-stdlib` |
| Run full CI | `make ci` or `make ci-strict` |
| Check file sizes | `make check-file-sizes` |
| Run evals (dev models) | `make eval-suite` |
| Run evals (all models) | `make eval-suite FULL=true` |
| Baseline for release | `make eval-baseline EVAL_VERSION=vX.Y.Z` |
| Compare baselines | `ailang eval-compare <dir1> <dir2>` |
| Validate specific fix | `ailang eval-validate <benchmark> <baseline>` |
| Update dashboard | `ailang eval-report <dir> <ver> --format=json` |
| Format code | `make fmt` |
| Lint code | `make lint` |
| Verify examples | `make verify-examples` |
| Debug AST and types | `ailang debug --show-types ast <file>` |
| Debug type inference | `ailang run --debug-types <file>` |
| Inspect Core AST | `ailang debug ast <file>` |
| Start docs server | `make docs-serve` |
| Clear docs cache | `make docs-clean` |
| Release new version | Use `release-manager` skill |
| Post-release tasks | Use `post-release` skill |
| Refactor large file | Use `codebase-organizer` agent |

## Emergency Recovery

### Tests Failing During Sprint

```bash
# 1. Show test output
make test

# 2. Options:
# (a) Fix now - implement fix
# (b) Revert change - git restore <file>
# (c) Pause sprint - commit WIP, resume later
```

### Linting Failing During Sprint

```bash
# 1. Try auto-fix
make fmt

# 2. If still failing, show output
make lint

# 3. Fix manually or ask for guidance
```

### Stdlib Accidentally Changed

```bash
# 1. Check what changed
make verify-stdlib

# 2. If unintentional
git restore stdlib/std/*.ail

# 3. If intentional
make freeze-stdlib
```

### Docusaurus Webpack Errors

```bash
# Nuclear option - clear everything
cd docs && npm run clear && rm -rf .docusaurus build && npm start
```

### Golden Files Out of Sync

```bash
# Parser goldens
make test-parser-update

# Import error goldens
make regen-import-error-goldens

# Stdlib goldens
make freeze-stdlib

# Review all changes
git diff tests/golden/ .stdlib-golden/
```

---

**Last updated**: 2025-10-21 for v0.3.16 development
**Maintained by**: DX Tools Documentation Initiative
**See also**:
- CLAUDE.md (comprehensive development guide)
- design_docs/planned/v0_3_16/dx-tools-documentation-audit.md (full audit)
- .claude/skills/README.md (skills overview)

```

### resources/dx_quick_reference.md

```markdown
# DX Quick Reference Card

Use this during sprint execution to quickly identify and act on DX improvements.

## Decision Matrix

| Time to Implement | Action |
|-------------------|--------|
| < 15 min | 🟢 **DO NOW** - Add to current milestone |
| 15-30 min | 🟡 **DEFER** - Add to TODO, do at milestone end if time allows |
| > 30 min | 🔴 **DESIGN DOC** - Create in `design_docs/planned/vX_Y/m-dx*.md` |

## Common Patterns

### 1. Boilerplate → Helper Function (5-15 min)
**Signal**: Copy/paste same code 3+ times
**Fix**: Extract to `*_helpers.go`
**Example**: `AssertNoErrors(t, p)`

### 2. Hard to Debug → Debug Flag (5-10 min)
**Signal**: Adding `fmt.Printf()` for visibility
**Fix**: Add `DEBUG_<SUBSYSTEM>=1` check
**Example**: `DEBUG_PARSER=1 ailang run test.ail`

### 3. Manual Workflow → Make Target (3-5 min)
**Signal**: Multi-step command repeated
**Fix**: Add `make <target>` with docs
**Example**: `make update-golden`

### 4. Confusing API → Documentation (10-20 min)
**Signal**: Looking up same API 3+ times
**Fix**: Add godoc, CLAUDE.md section, or `make doc`
**Example**: M-TESTING API patterns in CLAUDE.md

### 5. Bad Error → Actionable Message (5-15 min)
**Signal**: Error doesn't explain how to fix
**Fix**: Add context + suggestion + docs link
**Example**: "expected RPAREN at line 42, got COMMA. See: ..."

### 6. Verbose Test → Test Utility (10-20 min)
**Signal**: Test setup takes 10+ lines
**Fix**: Add helpers in `testctx/` or `*_helpers.go`
**Example**: `MakeString()`, `AssertLiteralInt()`
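Pattern 2 is small enough to sketch here; the subsystem name and helper below are hypothetical, but the shape - one env-var check, one gated print helper - is the whole trick:

```go
package main

import (
	"fmt"
	"os"
)

// debugParser gates trace output on DEBUG_PARSER being set, replacing
// the throwaway fmt.Printf calls that signal this pattern is needed.
var debugParser = os.Getenv("DEBUG_PARSER") != ""

// debugf writes to stderr only when the flag is set, so normal runs
// stay silent and pay almost nothing.
func debugf(format string, args ...any) {
	if debugParser {
		fmt.Fprintf(os.Stderr, "[parser] "+format+"\n", args...)
	}
}

func consume(tok string) string {
	debugf("consuming token %q", tok)
	return tok
}
```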

## ROI Formula

```
Time saved per use × Expected uses = Total savings
If Total savings > Implementation time → DO IT
```

**Examples**:
- Helper (10 min impl): 2 min × 20 uses = 40 min → ROI = 4x ✅
- Debug flag (8 min impl): 15 min × 5 uses = 75 min → ROI = 9x ✅
- Documentation (20 min impl): 5 min × 30 uses = 150 min → ROI = 7.5x ✅
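The examples above reduce to one line of arithmetic; a throwaway Go check of the same numbers:

```go
package main

// roi applies the formula above: minutes saved per use, times expected
// uses, divided by implementation time in minutes.
func roi(implMin, savedPerUse, uses float64) float64 {
	return (savedPerUse * uses) / implMin
}

// Worked examples from the list above:
//   roi(10, 2, 20) == 4.0   (helper, 4x)
//   roi(8, 15, 5)  == 9.375 (debug flag, ~9x)
//   roi(20, 5, 30) == 7.5   (documentation, 7.5x)
```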

## Reflection Questions (Ask After Each Milestone)

- [ ] What was painful during this milestone?
- [ ] What took longer than expected due to tooling gaps?
- [ ] What did I look up multiple times?
- [ ] What errors/bugs could better tooling prevent?
- [ ] Did I write any boilerplate that could be a helper?
- [ ] Did I add debug print statements that should be a flag?
- [ ] Would this error message confuse others?

## Documentation Template (Per Milestone)

```markdown
## DX Improvements (Milestone X)

✅ **Applied**: <What> (<Time>)
   - Impact: <Benefit>
   - Files: <Locations>

📝 **Deferred**: Created M-DX* design doc (<Estimated time>)
   - Rationale: <Why needed>
   - Expected ROI: <Time saved>
   - File: design_docs/planned/vX_Y/m-dx*-name.md

💡 **Considered**: <What> (added to backlog / rejected because...)
```

## Design Doc Template (For Large DX Improvements)

When creating DX design docs for improvements >30 min:

```markdown
# M-DX*: [Title]

## Problem
What pain point does this solve? How did we discover it?

## Current Workaround
What do we do now? Why is it painful?

## Proposed Solution
What should we build? High-level approach.

## Estimated Effort
- Implementation: X hours
- Testing: Y hours
- Documentation: Z hours
- **Total**: X+Y+Z hours

## Expected ROI
- Time saved per use: A minutes
- Expected uses per sprint: B times
- Expected uses per year: C times
- **Total savings**: A × C minutes/year
- **ROI**: (A × C) / ((X+Y+Z) × 60) = D.D×

## Acceptance Criteria
- [ ] What defines "done"?
- [ ] How do we measure success?

## Related
- Sprint: M-XXX (where pain point discovered)
- Related DX improvements: M-DX*, M-DX*
```

## Compiler Debugging (v0.5.11+)

### `--debug-types` - Type Inference Visibility

When debugging type inference issues during development:

```bash
# Full type inference debug output
ailang run --debug-types --caps IO --entry main file.ail

# Filter to specific node
ailang run --debug-types --node 42 --caps IO --entry main file.ail
```

**Output sections:**
- **Substitution Map**: α → int (type variable bindings)
- **Constraints**: Num, Eq, Ord constraints and resolution
- **CoreTI Entries**: Per-node type information

**Use when:**
- Type inference produces unexpected results
- Need to understand constraint resolution
- Debugging "expected X, got Y" errors

**Demo file:** `examples/runnable/debug_types_demo.ail`

## Debug Effect for Runtime Tracing (v0.4.10+)

When implementing features that need runtime tracing, use the Debug effect:

```ailang
import std/debug as Debug

func update(e: Entity) -> Entity ! {Debug} {
    Debug.check(e.health >= 0, "health must be non-negative");
    Debug.log("updating entity " ++ show(e.id));
    -- ... entity logic
}
```

**Key benefits:**
- **Write-only**: AILANG code writes, the host collects - no branching on debug state
- **Ghost effect**: Erased in `--release` mode (zero runtime cost in production)
- **Structured output**: `DebugContext.Collect()` returns JSON-serializable data
- **Assertions don't throw**: `Debug.check(false, msg)` records failure but continues
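A hedged Go sketch of that write-only contract (the event shape is an assumption; only the `Collect` name comes from the notes above):

```go
package main

// DebugEvent is a hypothetical record of one Debug.log or Debug.check.
type DebugEvent struct {
	Kind string // "log" or "check"
	Msg  string
	OK   bool // always true for logs; the condition result for checks
}

// DebugContext buffers events: AILANG code only appends, the host
// only reads them back via Collect.
type DebugContext struct{ events []DebugEvent }

func (d *DebugContext) Log(msg string) {
	d.events = append(d.events, DebugEvent{Kind: "log", Msg: msg, OK: true})
}

// Check records the assertion result but never aborts execution.
func (d *DebugContext) Check(cond bool, msg string) {
	d.events = append(d.events, DebugEvent{Kind: "check", Msg: msg, OK: cond})
}

// Collect drains the buffer for the host; AILANG code never calls this.
func (d *DebugContext) Collect() []DebugEvent {
	out := d.events
	d.events = nil
	return out
}
```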

**Run with Debug capability:**
```bash
ailang run --caps IO,Debug --entry main game.ail
```

**Note:** Use `Debug.check` not `Debug.assert` (`assert` is a reserved keyword).

## Quick Tips

- **Before implementation**: Ask "How can I make this easier for next time?"
- **During debugging**: "Could a debug flag save me 10+ min?"
- **For AILANG code**: "Should I add Debug.log/check here?"
- **During testing**: "Will I write this test pattern again?"
- **After error**: "Would this message confuse me tomorrow?"
- **End of milestone**: "What took the most time? Could tooling help?"

## Red Flags (DON'T DO THESE)

- ❌ Optimizing for hypothetical future use (no proven need)
- ❌ Over-engineering simple solutions (YAGNI)
- ❌ DX improvements with negative ROI (<1x)
- ❌ Tooling that only helps humans (AILANG is AI-first)
- ❌ Breaking changes to save small amounts of time

## Green Flags (DO THESE)

- ✅ Fixing pain you just experienced (proven need)
- ✅ Simple solutions with clear ROI (>2x)
- ✅ Tooling that helps AI understanding (determinism, clarity)
- ✅ Error messages with actionable fixes
- ✅ Documentation where you got confused

```

### resources/milestone_checklist.md

```markdown
# Milestone Execution Checklist

## Pre-Implementation
- [ ] Mark milestone as `in_progress` in TodoWrite
- [ ] Review milestone goals and acceptance criteria
- [ ] Identify files to create/modify
- [ ] Verify estimated LOC matches plan

## Implementation
- [ ] Write implementation code
- [ ] Follow design patterns from sprint plan
- [ ] Add inline comments for complex logic
- [ ] Keep functions small and focused (<50 lines)

## Testing
- [ ] Create/update test files (*_test.go)
- [ ] Comprehensive coverage (all acceptance criteria)
- [ ] Include edge cases and error conditions
- [ ] Test both success and failure paths

## Quality Verification
- [ ] Run tests: `make test` - MUST PASS
- [ ] Run linting: `make lint` - MUST PASS
- [ ] Check formatting: `make fmt-check` or run `make fmt`
- [ ] If any fail, fix immediately before proceeding

## Documentation
- [ ] Update CHANGELOG.md with milestone completion:
  - What was implemented
  - LOC counts (implementation + tests)
  - Key design decisions
  - Files modified/created
- [ ] Update sprint plan with completion status (✅)
- [ ] Add metrics (actual LOC vs estimated, time spent)

## Sprint JSON Update (CRITICAL)
**⚠️ The checkpoint script will remind you, but DON'T SKIP THIS!**
- [ ] Update `.ailang/state/sprints/sprint_<id>.json`:
  - Set `passes: true` (or `false` if failing)
  - Set `completed: "<ISO timestamp>"`
  - Add `notes: "<what was done>"`
- [ ] If sprint is fully complete, change `status: "completed"`
- The file path is printed in the checkpoint output
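For orientation, a feature entry after this step might look roughly like the fragment below. The exact schema is whatever sprint-planner wrote, so treat the field placement as an assumption and edit the file at the path printed by the checkpoint script:

```json
{
  "status": "completed",
  "features": [
    {
      "id": "M1",
      "description": "Parser helper extraction",
      "passes": true,
      "completed": "2025-10-21T14:30:00Z",
      "notes": "Extracted AssertNoErrors helper; 120 LOC impl + 80 LOC tests"
    }
  ]
}
```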

## Pause for Breath
- [ ] Show summary of what was completed
- [ ] Show current sprint progress (X of Y milestones done)
- [ ] Show velocity (LOC/day vs planned)
- [ ] Ask user: "Ready to continue to next milestone?"
- [ ] If user says "pause" or "stop", save state and exit gracefully

## Commit (After Quality Checks Pass)
- [ ] Stage files: `git add <files>`
- [ ] Commit with descriptive message
- [ ] Include milestone name and key changes
- [ ] Push if appropriate

## Milestone Complete
- [ ] Mark milestone as `completed` in TodoWrite
- [ ] Move to next milestone or finalize sprint

```

### scripts/session_start.sh

```bash
#!/bin/bash
#
# Session Start Script for Sprint Executor
#
# Purpose: Resume a sprint across multiple Claude Code sessions
# Based on: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
#
# Usage:
#   .claude/skills/sprint-executor/scripts/session_start.sh <sprint_id>
#
# Example:
#   .claude/skills/sprint-executor/scripts/session_start.sh M-S1
#
# This script implements the "Session Startup Routine" pattern from the Anthropic article:
# 1. Check pwd (working directory)
# 2. Read progress JSON file
# 3. Review git log (recent commits)
# 4. Run tests to verify clean state
# 5. Print "Here's where we left off" summary

set -e  # Exit on error

# Check arguments
if [ $# -ne 1 ]; then
    echo "Usage: $0 <sprint_id>"
    echo "Example: $0 M-S1"
    exit 1
fi

SPRINT_ID="$1"
PROGRESS_FILE=".ailang/state/sprints/sprint_${SPRINT_ID}.json"

# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

echo "═══════════════════════════════════════════════════════════════"
echo " Sprint Continuation Check - ${SPRINT_ID}"
echo "═══════════════════════════════════════════════════════════════"
echo ""

# 0. Sync GitHub issues via ailang messages
echo -e "${BLUE}0. GitHub Sync${NC}"
if command -v ailang &> /dev/null; then
    if ailang messages import-github --labels bug,feature,ailang-message 2>/dev/null; then
        echo -e "   ${GREEN}✓ Synced issues from GitHub${NC}"
    else
        echo -e "   ${YELLOW}⚠ GitHub sync failed (continuing anyway)${NC}"
    fi
else
    echo -e "   ${YELLOW}⚠ ailang not found, skipping GitHub sync${NC}"
fi
echo ""

# 1. Check working directory
echo -e "${BLUE}1. Working Directory${NC}"
echo "   $(pwd)"
echo ""

# 2. Check if progress file exists
if [ ! -f "$PROGRESS_FILE" ]; then
    echo -e "${RED}✗ Progress file not found: $PROGRESS_FILE${NC}"
    echo ""
    echo "Options:"
    echo "  1. If this is a new sprint, run sprint-planner first"
    echo "  2. If continuing an old sprint, check the sprint ID is correct"
    echo "  3. If migrating from markdown-only sprint, see migration notes"
    exit 1
fi

echo -e "${GREEN}✓ Found progress file: $PROGRESS_FILE${NC}"
echo ""

# 3. Load sprint metadata
echo -e "${BLUE}2. Sprint Metadata${NC}"
SPRINT_STATUS=$(jq -r '.status' "$PROGRESS_FILE")
CREATED=$(jq -r '.created' "$PROGRESS_FILE")
LAST_SESSION=$(jq -r '.last_session' "$PROGRESS_FILE")
LAST_CHECKPOINT=$(jq -r '.last_checkpoint // "Not set"' "$PROGRESS_FILE")

echo "   Status: $SPRINT_STATUS"
echo "   Created: $CREATED"
echo "   Last session: $LAST_SESSION"
echo "   Last checkpoint: $LAST_CHECKPOINT"
echo ""

# 3b. Show linked GitHub issues
GITHUB_ISSUES=$(jq -r '.github_issues // [] | @csv' "$PROGRESS_FILE" 2>/dev/null | tr -d '"')
if [ -n "$GITHUB_ISSUES" ]; then
    echo -e "${BLUE}   Linked GitHub Issues${NC}"
    for issue_num in $(echo "$GITHUB_ISSUES" | tr ',' ' '); do
        if command -v ailang &> /dev/null; then
            # Try to get issue details from local messages
            issue_title=$(ailang messages list --json --limit 100 2>/dev/null | jq -r ".[] | select(.github_issue == $issue_num) | .title" 2>/dev/null | head -1 || echo "")
            if [ -n "$issue_title" ]; then
                echo -e "   ${GREEN}→ #${issue_num}: ${issue_title}${NC}"
            else
                echo -e "   → #${issue_num} (no local message)"
            fi
        else
            echo "   → #${issue_num}"
        fi
    done
    echo "   Commits: 'refs #...' (link only) or 'Fixes #...' (auto-close)"
    echo ""
fi

# 4. Feature progress summary
echo -e "${BLUE}3. Feature Progress${NC}"
TOTAL_FEATURES=$(jq '.features | length' "$PROGRESS_FILE")
COMPLETE_FEATURES=$(jq '[.features[] | select(.passes == true)] | length' "$PROGRESS_FILE")
FAILED_FEATURES=$(jq '[.features[] | select(.passes == false)] | length' "$PROGRESS_FILE")
IN_PROGRESS=$(jq '[.features[] | select(.passes == null and .started != null)] | length' "$PROGRESS_FILE")
NOT_STARTED=$(jq '[.features[] | select(.started == null)] | length' "$PROGRESS_FILE")

echo "   Total: $TOTAL_FEATURES features"
echo -e "   ${GREEN}✓ Complete: $COMPLETE_FEATURES${NC}"
if [ "$FAILED_FEATURES" -gt 0 ]; then
    echo -e "   ${RED}✗ Failed: $FAILED_FEATURES${NC}"
fi
if [ "$IN_PROGRESS" -gt 0 ]; then
    echo -e "   ${YELLOW}⟳ In progress: $IN_PROGRESS${NC}"
fi
if [ "$NOT_STARTED" -gt 0 ]; then
    echo "   ○ Not started: $NOT_STARTED"
fi
echo ""

# 5. Show completed features
if [ "$COMPLETE_FEATURES" -gt 0 ]; then
    echo -e "${GREEN}Completed Features:${NC}"
    jq -r '.features[] | select(.passes == true) | "  ✓ \(.id): \(.description) (\(.actual_loc) LOC)"' "$PROGRESS_FILE"
    echo ""
fi

# 6. Show failed features
if [ "$FAILED_FEATURES" -gt 0 ]; then
    echo -e "${RED}Failed Features:${NC}"
    jq -r '.features[] | select(.passes == false) | "  ✗ \(.id): \(.description) - \(.notes // "No notes")"' "$PROGRESS_FILE"
    echo ""
fi

# 7. Show in-progress features
if [ "$IN_PROGRESS" -gt 0 ]; then
    echo -e "${YELLOW}In Progress:${NC}"
    jq -r '.features[] | select(.passes == null and .started != null) | "  ⟳ \(.id): \(.description)\n    Status: \(.notes // "No notes")"' "$PROGRESS_FILE"
    echo ""
fi

# 8. Show next feature to work on
if [ "$NOT_STARTED" -gt 0 ]; then
    echo -e "${BLUE}Next Feature:${NC}"
    # Find first feature with no dependencies or all dependencies complete
    NEXT_FEATURE=$(jq -r '
        . as $root |
        .features[] |
        select(.started == null) |
        select(
            (.dependencies | length) == 0 or
            all(.dependencies[]; . as $dep | any($root.features[]; .id == $dep and .passes == true))
        ) |
        "  → \(.id): \(.description) (estimated: \(.estimated_loc) LOC)"
    ' "$PROGRESS_FILE" | head -1)

    if [ -n "$NEXT_FEATURE" ]; then
        echo "$NEXT_FEATURE"
    else
        echo "  → Check dependencies - some features may be blocked"
    fi
    echo ""
fi

# 9. Velocity metrics
echo -e "${BLUE}4. Velocity Metrics${NC}"
TARGET_LOC=$(jq -r '.velocity.target_loc_per_day' "$PROGRESS_FILE")
ACTUAL_LOC=$(jq -r '.velocity.actual_loc_per_day' "$PROGRESS_FILE")
ESTIMATED_TOTAL=$(jq -r '.velocity.estimated_total_loc' "$PROGRESS_FILE")
ACTUAL_TOTAL=$(jq -r '.velocity.actual_total_loc' "$PROGRESS_FILE")
ESTIMATED_DAYS=$(jq -r '.velocity.estimated_days' "$PROGRESS_FILE")
ACTUAL_DAYS=$(jq -r '.velocity.actual_days // 0' "$PROGRESS_FILE")

echo "   Target: ${TARGET_LOC} LOC/day"
echo "   Actual: ${ACTUAL_LOC} LOC/day"
if [ "${ESTIMATED_TOTAL:-0}" -gt 0 ] 2>/dev/null; then
    echo "   Progress: ${ACTUAL_TOTAL}/${ESTIMATED_TOTAL} LOC ($(echo "scale=1; $ACTUAL_TOTAL * 100 / $ESTIMATED_TOTAL" | bc)%)"
else
    echo "   Progress: ${ACTUAL_TOTAL} LOC (no total estimate set)"
fi
echo "   Days: ${ACTUAL_DAYS}/${ESTIMATED_DAYS}"
echo ""

# 10. Recent git commits
echo -e "${BLUE}5. Recent Work (Last 3 Commits)${NC}"
git log --oneline -3 --color=always | sed 's/^/   /'
echo ""

# 11. Git status
echo -e "${BLUE}6. Working Directory Status${NC}"
if ! git diff-index --quiet HEAD --; then
    echo -e "   ${YELLOW}⚠  Uncommitted changes detected${NC}"
    git status --short | sed 's/^/   /'
else
    echo -e "   ${GREEN}✓ Working directory clean${NC}"
fi
echo ""

# 12. Run tests to verify clean state
echo -e "${BLUE}7. Pre-Session Validation${NC}"
echo "   Running tests..."

# Use make's exit code rather than grepping for "FAIL", so failures that
# don't print that word (e.g. compile errors) are still caught.
if ! make test > /tmp/sprint_status_test.log 2>&1; then
    echo -e "   ${RED}✗ Tests failing!${NC}"
    echo ""
    echo "⚠️  You should fix tests before continuing the sprint."
    echo "   Run 'make test' to see details (log: /tmp/sprint_status_test.log)."
    exit 1
else
    echo -e "   ${GREEN}✓ All tests pass${NC}"
fi
echo ""

# 13. Summary and recommendations
echo "═══════════════════════════════════════════════════════════════"
echo " Summary"
echo "═══════════════════════════════════════════════════════════════"
echo ""

if [ "$SPRINT_STATUS" = "completed" ]; then
    echo -e "${GREEN}🎉 This sprint is complete!${NC}"
    echo ""
    echo "Next steps:"
    echo "  1. Review the completed work"
    echo "  2. Create git tag if this is a release"
    echo "  3. Move design docs to implemented/"
    echo "  4. Start planning next sprint"
elif [ "$SPRINT_STATUS" = "paused" ]; then
    echo -e "${YELLOW}⏸  Sprint is paused${NC}"
    echo ""
    echo "Last checkpoint: $LAST_CHECKPOINT"
    echo ""
    echo "To resume:"
    echo "  1. Review the in-progress feature above"
    echo "  2. Continue implementation from where you left off"
    echo "  3. Update JSON progress file as you complete milestones"
elif [ "$SPRINT_STATUS" = "in_progress" ]; then
    echo -e "${BLUE}▶  Sprint in progress${NC}"
    echo ""
    echo "Current progress: $COMPLETE_FEATURES/$TOTAL_FEATURES features complete"
    echo ""
    if [ "$IN_PROGRESS" -gt 0 ]; then
        echo "Continue working on in-progress features first."
    elif [ "$NOT_STARTED" -gt 0 ]; then
        echo "Start working on the next feature listed above."
    fi
else
    echo -e "${BLUE}Ready to start sprint${NC}"
    echo ""
    echo "Begin with the first feature listed above."
fi

echo ""
echo "Progress file: $PROGRESS_FILE"
echo "═══════════════════════════════════════════════════════════════"

exit 0

```
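
The dependency-aware "next feature" lookup is the trickiest jq in this script: while iterating one feature, the query still needs to inspect every other feature to see whether a dependency has passed, which requires capturing the root object (here as `$root`) before descending. A minimal, self-contained sketch of that pattern (the file name and sample data are illustrative):

```bash
# Illustrative data: M1 is complete, M2 depends on M1 and has not started.
cat > demo_progress.json <<'EOF'
{"features":[
  {"id":"M1","description":"Core sync","started":"2026-01-01","passes":true,"estimated_loc":100,"dependencies":[]},
  {"id":"M2","description":"Hierarchy API","started":null,"passes":null,"estimated_loc":150,"dependencies":["M1"]}
]}
EOF

# Capture the root so any() can scan all features while iterating one feature.
jq -r '
  . as $root |
  .features[] |
  select(.started == null) |
  select(
    (.dependencies | length) == 0 or
    all(.dependencies[]; . as $dep | any($root.features[]; .id == $dep and .passes == true))
  ) |
  "  → \(.id): \(.description) (estimated: \(.estimated_loc) LOC)"
' demo_progress.json
```

M2 is selected because its only dependency, M1, has `passes == true`; without the `$root` capture, `any(.features[]; ...)` would be evaluated against each dependency string and error out.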

### scripts/validate_prerequisites.sh

```bash
#!/usr/bin/env bash
# Validate prerequisites before starting sprint execution

set -euo pipefail

echo "Validating sprint prerequisites..."
echo

FAILURES=0

# 0. Sync GitHub issues via ailang messages
echo "0/5 Syncing GitHub issues..."
if command -v ailang &> /dev/null; then
    if ailang messages import-github --labels bug,feature,ailang-message 2>/dev/null; then
        echo "  ✓ Synced issues from GitHub"
    else
        echo "  ⚠ GitHub sync failed (continuing anyway)"
    fi
else
    echo "  ⚠ ailang not found, skipping GitHub sync"
fi
echo

# 1. Check working directory is clean
echo "1/5 Checking working directory..."
if [[ -z $(git status --short) ]]; then
    echo "  ✓ Working directory clean"
else
    echo "  ⚠ Working directory has uncommitted changes:"
    git status --short | head -10
    echo "  Consider committing or stashing before starting sprint"
fi
echo

# 2. Check current branch
echo "2/5 Checking current branch..."
BRANCH=$(git branch --show-current)
if [[ "$BRANCH" == "dev" ]] || [[ "$BRANCH" == "main" ]]; then
    echo "  ✓ On branch: $BRANCH"
else
    echo "  ⚠ On branch: $BRANCH (expected 'dev' or 'main')"
fi
echo

# 3. Run tests
echo "3/5 Running tests..."
if make test > /tmp/sprint_prereq_test.log 2>&1; then
    echo "  ✓ All tests pass"
else
    echo "  ✗ Tests failing"
    echo "  See: /tmp/sprint_prereq_test.log"
    FAILURES=$((FAILURES + 1))
fi
echo

# 4. Run linting
echo "4/5 Running linter..."
if make lint > /tmp/sprint_prereq_lint.log 2>&1; then
    echo "  ✓ Linting passes"
else
    echo "  ✗ Linting fails"
    echo "  See: /tmp/sprint_prereq_lint.log"
    FAILURES=$((FAILURES + 1))
fi
echo

# 5. Show unread messages (potential issues/feedback)
echo "5/5 Checking unread messages..."
if command -v ailang &> /dev/null; then
    UNREAD_COUNT=$(ailang messages list --unread --json 2>/dev/null | jq 'length' 2>/dev/null || echo "0")
    if [[ "$UNREAD_COUNT" -gt 0 ]]; then
        echo "  ⚠ $UNREAD_COUNT unread message(s):"
        ailang messages list --unread --limit 5 2>/dev/null | head -10 | sed 's/^/    /'
        echo "  Consider reviewing before starting sprint"
    else
        echo "  ✓ No unread messages"
    fi
else
    echo "  ⚠ ailang not found, skipping message check"
fi
echo

# Summary
if [[ $FAILURES -eq 0 ]]; then
    echo "✓ All prerequisites validated!"
    echo "Ready to start sprint execution."
    exit 0
else
    echo "✗ $FAILURES prerequisite(s) failed"
    echo "Fix issues before starting sprint."
    exit 1
fi

```

### scripts/validate_sprint_json.sh

```bash
#!/bin/bash
#
# Validate Sprint JSON Progress File
#
# Purpose: Ensure sprint JSON has real milestones before execution begins
# This prevents sprint-executor from starting with placeholder data
#
# Usage:
#   .claude/skills/sprint-executor/scripts/validate_sprint_json.sh <sprint_id>
#
# Exit codes:
#   0 - Valid JSON with real milestones
#   1 - Invalid JSON or placeholder data detected

set -e

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[0;33m'
NC='\033[0m'

# Check arguments
if [ $# -lt 1 ]; then
    echo "Usage: $0 <sprint_id>"
    echo "Example: $0 M-IMPORT-ALIASING"
    exit 1
fi

SPRINT_ID="$1"
PROGRESS_FILE=".ailang/state/sprints/sprint_${SPRINT_ID}.json"

echo "═══════════════════════════════════════════════════════════════"
echo " Validating Sprint JSON: ${SPRINT_ID}"
echo "═══════════════════════════════════════════════════════════════"
echo ""

# Check file exists
if [ ! -f "$PROGRESS_FILE" ]; then
    echo -e "${RED}ERROR: Sprint JSON not found: ${PROGRESS_FILE}${NC}"
    echo ""
    echo "sprint-planner must create the JSON file first:"
    echo "  .claude/skills/sprint-planner/scripts/create_sprint_json.sh ${SPRINT_ID} <plan.md> <design.md>"
    exit 1
fi

echo "File: ${PROGRESS_FILE}"
echo ""

ERRORS=0

# Validate JSON syntax
if ! jq -e . "$PROGRESS_FILE" >/dev/null 2>&1; then
    echo -e "${RED}ERROR: Invalid JSON syntax${NC}"
    exit 1
fi
echo -e "${GREEN}✓ JSON syntax valid${NC}"

# Check for placeholder milestone ID
if jq -e '.features[] | select(.id == "MILESTONE_ID")' "$PROGRESS_FILE" >/dev/null 2>&1; then
    echo -e "${RED}ERROR: Found placeholder milestone ID 'MILESTONE_ID'${NC}"
    echo "  sprint-planner must replace placeholders with real milestone IDs"
    ERRORS=$((ERRORS + 1))
else
    echo -e "${GREEN}✓ No placeholder milestone IDs${NC}"
fi

# Check for placeholder acceptance criteria
if jq -e '.features[].acceptance_criteria[] | select(. == "Criterion 1" or . == "Criterion 2")' "$PROGRESS_FILE" >/dev/null 2>&1; then
    echo -e "${RED}ERROR: Found placeholder acceptance criteria${NC}"
    echo "  sprint-planner must replace 'Criterion 1/2' with real criteria"
    ERRORS=$((ERRORS + 1))
else
    echo -e "${GREEN}✓ No placeholder acceptance criteria${NC}"
fi

# Check minimum milestone count
MILESTONE_COUNT=$(jq '.features | length' "$PROGRESS_FILE")
if [ "$MILESTONE_COUNT" -lt 2 ]; then
    echo -e "${RED}ERROR: Only ${MILESTONE_COUNT} milestone(s) defined (minimum: 2)${NC}"
    echo "  Most sprints should have 2+ milestones"
    ERRORS=$((ERRORS + 1))
else
    echo -e "${GREEN}✓ ${MILESTONE_COUNT} milestones defined${NC}"
fi

# Check estimated LOC is reasonable
TOTAL_LOC=$(jq '.velocity.estimated_total_loc // 0' "$PROGRESS_FILE")
if [ "$TOTAL_LOC" -eq 1000 ]; then
    echo -e "${YELLOW}WARNING: estimated_total_loc is default value (1000)${NC}"
    echo "  Consider updating to match actual sprint plan estimates"
fi

# Check for GitHub issues (informational, not error)
GITHUB_ISSUES=$(jq -r '.github_issues // [] | length' "$PROGRESS_FILE")
if [ "$GITHUB_ISSUES" -gt 0 ]; then
    echo -e "${GREEN}✓ ${GITHUB_ISSUES} GitHub issue(s) linked:${NC}"
    jq -r '.github_issues[]' "$PROGRESS_FILE" | while read -r issue; do
        echo "    #$issue"
    done
    echo "  Commits will include Refs #... in messages"
else
    echo -e "${YELLOW}ℹ️  No GitHub issues linked to this sprint${NC}"
    echo "  To link issues: jq '.github_issues = [123]' $PROGRESS_FILE > tmp && mv tmp $PROGRESS_FILE"
fi

# Check each milestone has required fields
# Note: estimated_loc == 0 is the placeholder from create_sprint_json.sh (not 200 - that's a valid real estimate)
INCOMPLETE_MILESTONES=$(jq -r '.features[] | select(.description == "Milestone description" or .estimated_loc == 0) | .id' "$PROGRESS_FILE")
if [ -n "$INCOMPLETE_MILESTONES" ]; then
    echo -e "${RED}ERROR: Milestones with default/placeholder values:${NC}"
    echo "$INCOMPLETE_MILESTONES"
    ERRORS=$((ERRORS + 1))
else
    echo -e "${GREEN}✓ All milestones have custom values${NC}"
fi

# Check dependencies are valid (reference existing milestone IDs)
ALL_IDS=$(jq -r '.features[].id' "$PROGRESS_FILE" | sort | uniq)
INVALID_DEPS=$(jq -r '.features[].dependencies[]' "$PROGRESS_FILE" 2>/dev/null | while read -r dep; do
    if ! echo "$ALL_IDS" | grep -q "^${dep}$"; then
        echo "$dep"
    fi
done)
if [ -n "$INVALID_DEPS" ]; then
    echo -e "${YELLOW}WARNING: Dependencies reference unknown milestone IDs:${NC}"
    echo "$INVALID_DEPS"
fi

echo ""
echo "═══════════════════════════════════════════════════════════════"

if [ $ERRORS -gt 0 ]; then
    echo -e "${RED}VALIDATION FAILED: ${ERRORS} error(s) found${NC}"
    echo ""
    echo "sprint-planner must populate the JSON with real milestone data"
    echo "before sprint-executor can begin."
    echo ""
    echo "See: .claude/skills/sprint-planner/SKILL.md section 8"
    echo "═══════════════════════════════════════════════════════════════"
    exit 1
else
    echo -e "${GREEN}VALIDATION PASSED: Sprint JSON is ready for execution${NC}"
    echo "═══════════════════════════════════════════════════════════════"
    exit 0
fi

```
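
The placeholder checks above lean on `jq -e`, whose exit status is 0 only when the filter produces a truthy, non-empty result, so it doubles as a boolean test in shell conditionals. A self-contained sketch of both checks against a deliberately bad sample file (file name and contents are illustrative):

```bash
# Illustrative sample containing both placeholder patterns the validator rejects.
cat > sample_sprint.json <<'EOF'
{"features":[{"id":"MILESTONE_ID","acceptance_criteria":["Criterion 1","Criterion 2"]}]}
EOF

# jq -e exits 0 when the filter yields a truthy value, non-zero otherwise.
if jq -e '.features[] | select(.id == "MILESTONE_ID")' sample_sprint.json >/dev/null; then
    echo "placeholder milestone ID detected"
fi
if jq -e '.features[].acceptance_criteria[] | select(. == "Criterion 1" or . == "Criterion 2")' sample_sprint.json >/dev/null; then
    echo "placeholder acceptance criteria detected"
fi
```

Against a properly populated sprint JSON, both filters yield nothing, `jq -e` exits non-zero, and neither branch fires.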

### scripts/milestone_checkpoint.sh

```bash
#!/usr/bin/env bash
# Run checkpoint after completing a milestone
#
# CRITICAL: This script enforces VERIFICATION, not just compilation.
# Tests passing != Feature working. You MUST verify with real data.
#
# Usage:
#   milestone_checkpoint.sh <milestone_name> [sprint_id]
#
# Example:
#   milestone_checkpoint.sh M1 M-TASK-HIERARCHY
#
# The script will:
# 1. Run tests and linting (basic hygiene)
# 2. Run milestone-specific verification if defined
# 3. Remind you to manually verify with real data
# 4. FAIL LOUDLY if verification commands fail

set -euo pipefail

MILESTONE_NAME="${1:-Unknown Milestone}"
SPRINT_ID="${2:-}"
SPRINT_DIR=".ailang/state/sprints"

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "MILESTONE CHECKPOINT: $MILESTONE_NAME"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo
echo "âš ī¸  REMINDER: Tests passing ≠ Feature working"
echo "   You MUST verify with real data before marking complete!"
echo

FAILURES=0
WARNINGS=0

# 0. Find current sprint JSON
CURRENT_SPRINT=""
if [[ -d "$SPRINT_DIR" ]]; then
    # If SPRINT_ID was passed as argument, use that directly
    if [[ -n "$SPRINT_ID" && -f "$SPRINT_DIR/sprint_$SPRINT_ID.json" ]]; then
        CURRENT_SPRINT="$SPRINT_DIR/sprint_$SPRINT_ID.json"
    else
        # Find in_progress or ready sprint
        for f in "$SPRINT_DIR"/sprint_*.json; do
            if [[ -f "$f" ]]; then
                STATUS=$(grep -o '"status": "[^"]*"' "$f" 2>/dev/null | head -1 | cut -d'"' -f4 || echo "")
                if [[ "$STATUS" == "in_progress" || "$STATUS" == "ready" ]]; then
                    CURRENT_SPRINT="$f"
                    SPRINT_ID=$(basename "$f" .json | sed 's/sprint_//')
                    break
                fi
            fi
        done
    fi
fi

# 1. Run tests
echo "1/5 Running tests..."
if make test > /tmp/milestone_test.log 2>&1; then
    echo "  ✓ Tests pass"
else
    echo "  ✗ Tests fail"
    echo "  See: /tmp/milestone_test.log"
    echo "  FIX BEFORE PROCEEDING!"
    FAILURES=$((FAILURES + 1))
fi
echo

# 2. Run linting
echo "2/5 Running linter..."
if make lint > /tmp/milestone_lint.log 2>&1; then
    echo "  ✓ Linting passes"
else
    echo "  ✗ Linting fails"
    echo "  See: /tmp/milestone_lint.log"
    echo "  FIX BEFORE PROCEEDING!"
    FAILURES=$((FAILURES + 1))
fi
echo

# 3. Show files changed
echo "3/5 Files changed in this milestone..."
git diff --stat HEAD | tail -10 || echo "No changes yet"
echo

# 4. File size warnings (AI-friendly codebase - keep files <800 lines)
echo "4/5 Checking file sizes..."
LARGE_FILES=0
for file in $(git diff --name-only --diff-filter=AM HEAD 2>/dev/null || echo ""); do
    if [[ -f "$file" && "$file" == *.go ]]; then
        lines=$(wc -l < "$file" 2>/dev/null || echo "0")
        if [[ $lines -gt 800 ]]; then
            echo "  ❌ $file: $lines lines (CRITICAL: >800, must split!)"
            LARGE_FILES=$((LARGE_FILES + 1))
        elif [[ $lines -gt 500 ]]; then
            echo "  âš ī¸  $file: $lines lines (warning: consider splitting if >800)"
        elif [[ $lines -gt 300 ]]; then
            echo "  📝 $file: $lines lines (healthy)"
        fi
    fi
done

if [[ $LARGE_FILES -gt 0 ]]; then
    echo "  âš ī¸  $LARGE_FILES file(s) exceed 800 lines (see codebase-health guidelines)"
    echo "  Consider splitting before adding more features"
else
    echo "  ✓ All files within size guidelines"
fi
echo

# 5. CRITICAL: Data Verification
echo "5/5 Running data verification..."
echo "    (Tests passing ≠ Feature working - verify real data!)"
echo

# M-TASK-HIERARCHY specific verification
if [[ "$SPRINT_ID" == "M-TASK-HIERARCHY" ]] || [[ -n "$CURRENT_SPRINT" && "$CURRENT_SPRINT" == *"M-TASK-HIERARCHY"* ]]; then
    echo "  📊 M-TASK-HIERARCHY Sprint Verification"
    echo

    case "$MILESTONE_NAME" in
        M1|M1:*|*Entity*Sync*)
            echo "  Verifying M1: Entity Sync..."

            # Check if observatory_sync.go exists
            if [[ -f "internal/coordinator/observatory_sync.go" ]]; then
                echo "    ✓ observatory_sync.go exists"
            else
                echo "    ✗ FAIL: observatory_sync.go not found!"
                FAILURES=$((FAILURES + 1))
            fi

            # Check if there are tasks in observatory DB
            if command -v sqlite3 &> /dev/null; then
                TASK_COUNT=$(sqlite3 ~/.ailang/state/observatory.db "SELECT COUNT(*) FROM tasks" 2>/dev/null || echo "0")
                if [[ "$TASK_COUNT" -gt 0 ]]; then
                    echo "    ✓ Observatory has $TASK_COUNT task(s)"
                    SAMPLE_TASK=$(sqlite3 ~/.ailang/state/observatory.db "SELECT id FROM tasks LIMIT 1" 2>/dev/null || echo "")
                    if [[ -n "$SAMPLE_TASK" && "$SAMPLE_TASK" != "" ]]; then
                        echo "    ✓ Sample task ID: $SAMPLE_TASK (non-empty)"
                    else
                        echo "    ✗ FAIL: Task IDs are empty! Sync not working."
                        FAILURES=$((FAILURES + 1))
                    fi
                else
                    echo "    âš ī¸  WARNING: No tasks in Observatory DB"
                    echo "       Run a coordinator task first to verify sync"
                    WARNINGS=$((WARNINGS + 1))
                fi
            else
                echo "    âš ī¸  sqlite3 not available - manual verification required"
                WARNINGS=$((WARNINGS + 1))
            fi
            ;;

        M2|M2:*|*Context*Propagation*)
            echo "  Verifying M2: Context Propagation..."

            # Check if OTEL_RESOURCE_ATTRIBUTES is set in executor code
            if grep -q "OTEL_RESOURCE_ATTRIBUTES" internal/executor/claude/claude.go 2>/dev/null; then
                echo "    ✓ OTEL_RESOURCE_ATTRIBUTES found in claude executor"
            else
                echo "    ✗ FAIL: OTEL_RESOURCE_ATTRIBUTES not in claude executor!"
                echo "       Context is NOT being propagated to executors."
                FAILURES=$((FAILURES + 1))
            fi

            # Check if ailang.task_id is being added to attributes
            if grep -q "ailang.task_id" internal/executor/claude/claude.go 2>/dev/null; then
                echo "    ✓ ailang.task_id attribute found in claude executor"
            else
                echo "    ✗ FAIL: ailang.task_id not added to OTEL attributes!"
                FAILURES=$((FAILURES + 1))
            fi

            # Check for ailang.task_id in span attributes
            if command -v sqlite3 &> /dev/null; then
                SPANS_WITH_TASK=$(sqlite3 ~/.ailang/state/observatory.db \
                    "SELECT COUNT(*) FROM spans WHERE resource_attributes LIKE '%ailang.task_id%'" 2>/dev/null || echo "0")
                if [[ "$SPANS_WITH_TASK" -gt 0 ]]; then
                    echo "    ✓ Found $SPANS_WITH_TASK spans with ailang.task_id"
                else
                    echo "    âš ī¸  WARNING: No spans have ailang.task_id attribute yet"
                    echo "       Run a task and check spans after execution"
                    WARNINGS=$((WARNINGS + 1))
                fi
            fi
            ;;

        M3|M3:*|*OTLP*|*Extraction*)
            echo "  Verifying M3: OTLP Extraction..."

            # Check if OTLP receiver code extracts ailang.task_id
            if grep -q 'extractString.*ailang.task_id' internal/observatory/otlp_receiver.go 2>/dev/null; then
                echo "    ✓ OTLP receiver extracts ailang.task_id from resource attrs"
            else
                echo "    ✗ FAIL: OTLP receiver doesn't extract ailang.task_id!"
                FAILURES=$((FAILURES + 1))
            fi

            # Check if Span struct has TaskID field that gets set
            if grep -q 'TaskID.*taskID' internal/observatory/otlp_receiver.go 2>/dev/null; then
                echo "    ✓ TaskID is set on spans"
            else
                echo "    ✗ FAIL: TaskID not being set on spans"
                FAILURES=$((FAILURES + 1))
            fi

            # Check database for linked spans (informational)
            if command -v sqlite3 &> /dev/null; then
                LINKED_SPANS=$(sqlite3 ~/.ailang/state/observatory.db \
                    "SELECT COUNT(*) FROM spans WHERE task_id IS NOT NULL AND task_id != ''" 2>/dev/null || echo "0")
                if [[ "$LINKED_SPANS" -gt 0 ]]; then
                    echo "    ✓ Found $LINKED_SPANS spans already linked to tasks"
                else
                    echo "    âš ī¸  WARNING: No spans linked yet (run a task to generate data)"
                    WARNINGS=$((WARNINGS + 1))
                fi
            fi
            ;;

        M4|M4:*|*Aggregation*)
            echo "  Verifying M4: Aggregations..."

            # Check if aggregation update code exists in store
            if grep -q 'updateTaskAggregatesFromSpan' internal/observatory/store.go 2>/dev/null; then
                echo "    ✓ updateTaskAggregatesFromSpan function exists"
            else
                echo "    ✗ FAIL: updateTaskAggregatesFromSpan not found!"
                FAILURES=$((FAILURES + 1))
            fi

            # Check if CreateSpan calls aggregation update
            if grep -q 'updateTaskAggregatesFromSpan.*span' internal/observatory/store.go 2>/dev/null; then
                echo "    ✓ CreateSpan calls aggregation update"
            else
                echo "    ✗ FAIL: CreateSpan doesn't call aggregation update!"
                FAILURES=$((FAILURES + 1))
            fi

            # Check database for tasks with stats (informational)
            if command -v sqlite3 &> /dev/null; then
                TASKS_WITH_STATS=$(sqlite3 ~/.ailang/state/observatory.db \
                    "SELECT COUNT(*) FROM tasks WHERE span_count > 0 OR total_tokens_in > 0" 2>/dev/null || echo "0")
                if [[ "$TASKS_WITH_STATS" -gt 0 ]]; then
                    echo "    ✓ Found $TASKS_WITH_STATS tasks with aggregated stats"
                else
                    echo "    âš ī¸  WARNING: No tasks have stats yet (run a task with linked spans)"
                    WARNINGS=$((WARNINGS + 1))
                fi
            fi
            ;;

        M5|M5:*|*Hierarchy*API*)
            echo "  Verifying M5: Hierarchy API..."

            # Check if server is running and hierarchy endpoint works
            if curl -s http://localhost:1957/health > /dev/null 2>&1; then
                # Get a task ID
                TASK_ID=$(curl -s http://localhost:1957/api/observatory/tasks 2>/dev/null | jq -r '.[0].id // empty' 2>/dev/null || echo "")
                if [[ -n "$TASK_ID" ]]; then
                    HIERARCHY=$(curl -s "http://localhost:1957/api/observatory/tasks/$TASK_ID/hierarchy" 2>/dev/null)
                    if [[ -n "$HIERARCHY" && "$HIERARCHY" != "null" && "$HIERARCHY" != "{}" ]]; then
                        echo "    ✓ Hierarchy API returns data for task $TASK_ID"
                    else
                        echo "    ✗ FAIL: Hierarchy API returns empty for task $TASK_ID"
                        FAILURES=$((FAILURES + 1))
                    fi
                else
                    echo "    âš ī¸  WARNING: No tasks found to test hierarchy API"
                    WARNINGS=$((WARNINGS + 1))
                fi
            else
                echo "    âš ī¸  WARNING: Server not running - start with 'ailang serve'"
                WARNINGS=$((WARNINGS + 1))
            fi
            ;;

        M6|M6:*|*UI*)
            echo "  Verifying M6: UI Hierarchy View..."

            # Check if TaskHierarchy component exists
            if [[ -f "ui/src/features/observatory/TaskHierarchy.tsx" ]]; then
                echo "    ✓ TaskHierarchy.tsx exists"
            else
                echo "    ✗ FAIL: TaskHierarchy.tsx not found!"
                FAILURES=$((FAILURES + 1))
            fi

            # Check if TaskHierarchy is imported and used in Observatory
            if grep -q "TaskHierarchy" ui/src/features/observatory/Observatory.tsx 2>/dev/null; then
                echo "    ✓ TaskHierarchy imported in Observatory.tsx"
            else
                echo "    ✗ FAIL: TaskHierarchy not imported in Observatory.tsx!"
                echo "       Component exists but is not connected to main view"
                FAILURES=$((FAILURES + 1))
            fi

            # Check for actual tab navigation (button or tab that switches to tasks view)
            if grep -qE "setActiveView.*['\"]tasks['\"]|activeView.*===.*['\"]tasks['\"]|onClick.*tasks" ui/src/features/observatory/Observatory.tsx 2>/dev/null; then
                echo "    ✓ Tasks tab navigation found"
            else
                echo "    ✗ FAIL: No Tasks tab navigation in Observatory!"
                echo "       Users cannot switch to TaskHierarchyView"
                FAILURES=$((FAILURES + 1))
            fi
            ;;

        M7|M7:*|*Backfill*)
            echo "  Verifying M7: Backfill Tool..."

            # Check if backfill command exists
            if [[ -f "cmd/ailang/observatory_backfill.go" ]]; then
                echo "    ✓ observatory_backfill.go exists"
            else
                echo "    ✗ FAIL: observatory_backfill.go not found!"
                FAILURES=$((FAILURES + 1))
            fi

            # Test --help to verify command is registered
            if ./bin/ailang observatory backfill --help > /dev/null 2>&1; then
                echo "    ✓ 'ailang observatory backfill' command works"
            else
                echo "    ✗ FAIL: 'ailang observatory backfill' command not working"
                FAILURES=$((FAILURES + 1))
            fi
            ;;

        *)
            echo "    No specific verification defined for milestone: $MILESTONE_NAME"
            echo "    âš ī¸  MANUAL VERIFICATION REQUIRED"
            echo "       Check the sprint plan for verification commands"
            WARNINGS=$((WARNINGS + 1))
            ;;
    esac
else
    echo "  No sprint-specific verification defined."
    echo "  âš ī¸  MANUAL VERIFICATION REQUIRED"
    echo "     Ensure you test with REAL DATA before marking complete!"
    WARNINGS=$((WARNINGS + 1))
fi
echo

# Summary
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
if [[ $FAILURES -eq 0 ]]; then
    if [[ $WARNINGS -gt 0 ]]; then
        echo "âš ī¸  Checkpoint passed with $WARNINGS warning(s)"
        echo "   Review warnings above - some manual verification may be needed"
    else
        echo "✓ Milestone checkpoint PASSED!"
    fi
    echo

    # Sprint JSON reminder
    if [[ -n "$CURRENT_SPRINT" ]]; then
        echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
        echo "📋 SPRINT JSON UPDATE REQUIRED"
        echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
        echo
        echo "Sprint: $SPRINT_ID"
        echo "File:   $CURRENT_SPRINT"
        echo
        # Check for linked GitHub issues
        if command -v jq &> /dev/null; then
            GITHUB_ISSUES=$(jq -r '.github_issues // [] | map("#" + tostring) | join(", ")' "$CURRENT_SPRINT" 2>/dev/null || echo "")
            if [[ -n "$GITHUB_ISSUES" ]]; then
                echo "🔗 Linked GitHub issues: $GITHUB_ISSUES"
                echo "   Include 'Refs $GITHUB_ISSUES' in commit messages"
                echo
            fi
        fi
        echo "Current milestone status:"
        # Show milestone statuses from JSON
        if command -v jq &> /dev/null; then
            jq -r '.features[] | "  • \(.id): passes=\(.passes // "null"), completed=\(.completed // "null")"' "$CURRENT_SPRINT" 2>/dev/null || echo "  (could not parse JSON)"
        else
            grep -E '"id"|"passes"|"completed"' "$CURRENT_SPRINT" | head -20 || echo "  (jq not available)"
        fi
        echo
        echo "âš ī¸  After completing $MILESTONE_NAME:"
        echo "   1. Update passes: true/false in the JSON"
        echo "   2. Set completed: \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\""
        echo "   3. Add notes about what was done"
        if command -v jq &> /dev/null; then
            GITHUB_ISSUES=$(jq -r '.github_issues // [] | map("#" + tostring) | join(", ")' "$CURRENT_SPRINT" 2>/dev/null || echo "")
            if [[ -n "$GITHUB_ISSUES" ]]; then
                echo "   4. Include 'Refs $GITHUB_ISSUES' in commit message"
            fi
        fi
        echo
        echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    fi
    echo
    echo "Ready to proceed to next milestone."
    exit 0
else
    echo "✗ CHECKPOINT FAILED: $FAILURES verification(s) failed"
    echo
    echo "❌ DO NOT mark this milestone as complete!"
    echo "   Fix the issues above first."
    echo
    echo "Common fixes:"
    echo "  - Run actual tasks to generate real data"
    echo "  - Check that code changes are actually deployed"
    echo "  - Verify with manual commands (curl, sqlite3)"
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    exit 1
fi

```
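
The checkpoint ends by reminding you to update the sprint JSON by hand (`passes`, `completed`, `notes`). A minimal sketch of that update with jq; the field names match what the checkpoint prints, but the sample file, milestone ID, and note text are illustrative (real files live under `.ailang/state/sprints/`):

```bash
# Illustrative sprint file with one unfinished milestone.
cat > sprint_DEMO.json <<'EOF'
{"features":[{"id":"M1","passes":null,"completed":null,"notes":null}]}
EOF

# Mark milestone M1 as passed, stamp the completion time, and record a note.
jq --arg id "M1" --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" '
  .features |= map(
    if .id == $id
    then .passes = true | .completed = $ts | .notes = "Verified with real data"
    else .
    end)
' sprint_DEMO.json > tmp.json && mv tmp.json sprint_DEMO.json
```

Writing to a temp file and renaming avoids truncating the JSON mid-read, since jq cannot edit a file in place.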

### scripts/acceptance_test.sh

```bash
#!/bin/bash
#
# Acceptance Testing Script
#
# Purpose: Run end-to-end tests for a sprint milestone
# Based on: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
#
# Usage:
#   .claude/skills/sprint-executor/scripts/acceptance_test.sh <milestone_id> <test_type>
#
# Example:
#   # Parser sprint - run example files
#   .claude/skills/sprint-executor/scripts/acceptance_test.sh M-S1.1 parser
#
#   # Builtin sprint - test in REPL
#   .claude/skills/sprint-executor/scripts/acceptance_test.sh M-DX1.2 builtin
#
#   # General feature - run specific examples
#   .claude/skills/sprint-executor/scripts/acceptance_test.sh M-POLY-A.3 examples
#
# Test Types:
#   parser    - Run example AILANG files through parser
#   builtin   - Test builtin functions in REPL or test harness
#   examples  - Run specific example files for the feature
#   repl      - Interactive REPL testing (manual)
#   e2e       - Full end-to-end workflow test
#
# This implements the "testing as user would" pattern from the Anthropic article.
# Unit tests are great, but end-to-end tests catch integration issues.

set -e  # Exit on error

# Check arguments
if [ $# -lt 2 ]; then
    echo "Usage: $0 <milestone_id> <test_type>"
    echo ""
    echo "Test types:"
    echo "  parser    - Run example files through parser"
    echo "  builtin   - Test builtin functions"
    echo "  examples  - Run specific example files"
    echo "  repl      - Interactive REPL testing"
    echo "  e2e       - Full end-to-end workflow"
    echo ""
    echo "Example:"
    echo "  $0 M-S1.1 parser"
    exit 1
fi

MILESTONE_ID="$1"
TEST_TYPE="$2"

# Colors for output
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

echo "═══════════════════════════════════════════════════════════════"
echo " Acceptance Tests - ${MILESTONE_ID}"
echo "═══════════════════════════════════════════════════════════════"
echo ""
echo "Test type: $TEST_TYPE"
echo "Milestone: $MILESTONE_ID"
echo ""

# Ensure ailang binary is up to date
echo -e "${BLUE}Ensuring ailang binary is current...${NC}"
if ! make quick-install > /dev/null 2>&1; then
    echo -e "${YELLOW}⚠  Failed to rebuild ailang, using existing binary${NC}"
fi
echo ""

# Test based on type
case "$TEST_TYPE" in
    parser)
        echo -e "${BLUE}Running parser acceptance tests...${NC}"
        echo ""

        # Test that example files parse correctly
        echo "Testing example files..."
        FAILED=0
        PASSED=0

        for file in examples/*.ail; do
            if [[ "$file" == *"broken"* ]] || [[ "$file" == *"wip"* ]]; then
                echo -e "  ${YELLOW}⊘ Skipping: $(basename "$file")${NC}"
                continue
            fi

            if ailang run "$file" > /dev/null 2>&1; then
                echo -e "  ${GREEN}✓ $(basename "$file")${NC}"
                PASSED=$((PASSED + 1))  # not ((PASSED++)): that exits under set -e when PASSED is 0
            else
                echo -e "  ${RED}✗ $(basename "$file")${NC}"
                FAILED=$((FAILED + 1))  # not ((FAILED++)): that exits under set -e when FAILED is 0
            fi
        done

        echo ""
        if [ $FAILED -eq 0 ]; then
            echo -e "${GREEN}✓ All parser tests passed ($PASSED files)${NC}"
            exit 0
        else
            echo -e "${RED}✗ Some parser tests failed ($FAILED/$((PASSED + FAILED)))${NC}"
            exit 1
        fi
        ;;

    builtin)
        echo -e "${BLUE}Running builtin acceptance tests...${NC}"
        echo ""

        # Check if there are specific test files for this milestone
        # This assumes builtins have test files in tests/ or examples/
        echo "Testing builtin functions..."

        # Run unit tests for the builtins package directly. (Grepping `make test`
        # output for the package name would match even when those tests FAIL.)
        if go test ./internal/builtins/... > /dev/null 2>&1; then
            echo -e "${GREEN}✓ Builtin unit tests pass${NC}"
        else
            echo -e "${RED}✗ Builtin unit tests failed${NC}"
            exit 1
        fi

        # Check if there are example files demonstrating the builtin
        # Format: examples/builtin_<name>.ail
        BUILTIN_EXAMPLES=$(find examples -name "builtin_*.ail" 2>/dev/null | wc -l)
        if [ "$BUILTIN_EXAMPLES" -gt 0 ]; then
            echo ""
            echo "Testing builtin example files..."
            for file in examples/builtin_*.ail; do
                if ailang run "$file" > /dev/null 2>&1; then
                    echo -e "  ${GREEN}✓ $(basename "$file")${NC}"
                else
                    echo -e "  ${RED}✗ $(basename "$file")${NC}"
                    exit 1
                fi
            done
        fi

        echo ""
        echo -e "${GREEN}✓ Builtin acceptance tests passed${NC}"
        ;;

    examples)
        echo -e "${BLUE}Running example file tests...${NC}"
        echo ""

        # Ask user which examples to test
        echo "Example files in examples/:"
        ls -1 examples/*.ail | head -10
        echo ""

        read -p "Enter example file pattern (e.g., 'list_*.ail' or specific file): " PATTERN

        echo ""
        echo "Testing examples matching: $PATTERN"
        FAILED=0
        PASSED=0

        for file in examples/${PATTERN}; do
            if [ ! -f "$file" ]; then
                echo -e "${RED}✗ File not found: $file${NC}"
                continue
            fi

            echo "Running: $(basename "$file")"
            if ailang run "$file"; then
                echo -e "${GREEN}✓ $(basename "$file")${NC}"
                PASSED=$((PASSED + 1))  # not ((PASSED++)): that exits under set -e when PASSED is 0
            else
                echo -e "${RED}✗ $(basename "$file") failed${NC}"
                FAILED=$((FAILED + 1))  # not ((FAILED++)): that exits under set -e when FAILED is 0
            fi
            echo ""
        done

        if [ $FAILED -eq 0 ] && [ $PASSED -gt 0 ]; then
            echo -e "${GREEN}✓ All example tests passed ($PASSED files)${NC}"
            exit 0
        else
            echo -e "${RED}✗ Some tests failed or no files matched${NC}"
            exit 1
        fi
        ;;

    repl)
        echo -e "${BLUE}Manual REPL testing...${NC}"
        echo ""
        echo "Instructions:"
        echo "1. Test the feature interactively in the REPL"
        echo "2. Try edge cases and error conditions"
        echo "3. Verify error messages are helpful"
        echo "4. Exit when done (Ctrl+D)"
        echo ""
        echo "Starting REPL..."
        echo ""

        ailang repl

        echo ""
        read -p "Did the feature work as expected? (y/n): " -n 1 -r
        echo
        if [[ $REPLY =~ ^[Yy]$ ]]; then
            echo -e "${GREEN}✓ REPL testing passed${NC}"
            exit 0
        else
            echo -e "${RED}✗ REPL testing failed${NC}"
            exit 1
        fi
        ;;

    e2e)
        echo -e "${BLUE}Running end-to-end workflow test...${NC}"
        echo ""

        # Full workflow: parse → elaborate → type check → evaluate
        echo "Testing full pipeline..."

        # Create a test file
        TEST_FILE="/tmp/ailang_e2e_test.ail"
        cat > "$TEST_FILE" << 'EOF'
module e2e_test

def add(x: int, y: int): int = x + y

def main(): int = {
    let result = add(40, 2);
    result
}
EOF

        echo "Test file created: $TEST_FILE"
        echo ""

        # `ailang run` exercises the whole pipeline (parse → elaborate →
        # type check → evaluate), so one silent invocation verifies the
        # front end before we inspect the actual output.
        echo "1. Parse + type check..."
        if ailang run "$TEST_FILE" > /dev/null 2>&1; then
            echo -e "   ${GREEN}✓ Parse and type check successful${NC}"
        else
            echo -e "   ${RED}✗ Parse or type check failed${NC}"
            exit 1
        fi

        echo "2. Execution..."
        # `|| true` keeps set -e from exiting before we can report the failure
        OUTPUT=$(ailang run --entry main --caps IO "$TEST_FILE" 2>&1 || true)
        if echo "$OUTPUT" | grep -q "42"; then
            echo -e "   ${GREEN}✓ Execution successful (output: 42)${NC}"
        else
            echo -e "   ${RED}✗ Execution failed or wrong output${NC}"
            echo "   Output: $OUTPUT"
            exit 1
        fi

        echo ""
        echo -e "${GREEN}✓ End-to-end test passed${NC}"

        # Cleanup
        rm "$TEST_FILE"
        ;;

    *)
        echo -e "${RED}Unknown test type: $TEST_TYPE${NC}"
        echo ""
        echo "Valid types: parser, builtin, examples, repl, e2e"
        exit 1
        ;;
esac

echo ""
echo "═══════════════════════════════════════════════════════════════"
echo " Acceptance Tests Complete"
echo "═══════════════════════════════════════════════════════════════"

exit 0

```
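A note for anyone extending the pass/fail loops in these scripts: under `set -e`, the common `((VAR++))` increment aborts the script the first time it fires on a zero-valued counter, because the arithmetic command's exit status reflects the pre-increment value. A minimal reproduction, plain bash with no project dependencies:

```bash
#!/bin/bash
set -e

COUNT=0
# ((COUNT++)) here would kill the script: the expression evaluates to the
# pre-increment value 0, which bash reports as a non-zero exit status.
COUNT=$((COUNT + 1))
echo "count=$COUNT"
```

Assignment-style arithmetic (`COUNT=$((COUNT + 1))`) always returns success, so it is the safer idiom in `set -e` scripts.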

### scripts/finalize_sprint.sh

```bash
#!/bin/bash
# finalize_sprint.sh - Finalize a completed sprint
# Moves design docs from planned/ to implemented/<version>/
# Updates sprint JSON status to "completed"

set -e

SPRINT_ID="${1:-}"
TARGET_VERSION="${2:-}"

if [ -z "$SPRINT_ID" ]; then
    echo "Usage: $0 <sprint-id> [target-version]"
    echo ""
    echo "Example: $0 M-BUG-RECORD-UPDATE-INFERENCE v0_4_9"
    echo ""
    echo "If target-version is not specified, will try to extract from design doc path"
    exit 1
fi

SPRINT_FILE=".ailang/state/sprints/sprint_${SPRINT_ID}.json"

if [ ! -f "$SPRINT_FILE" ]; then
    echo "Error: Sprint file not found: $SPRINT_FILE"
    exit 1
fi

echo "═══════════════════════════════════════════════════════════════"
echo " Finalizing Sprint: $SPRINT_ID"
echo "═══════════════════════════════════════════════════════════════"
echo ""

# Check if all milestones pass
NOT_PASSING=$(jq -r '.features | map(select(.passes != true)) | length' "$SPRINT_FILE")
if [ "$NOT_PASSING" != "0" ]; then
    echo "Warning: Not all milestones have passes=true"
    echo "Milestones with passes != true:"
    jq -r '.features[] | select(.passes != true) | "  - \(.id): passes=\(.passes)"' "$SPRINT_FILE"
    echo ""
    read -p "Continue anyway? (y/N) " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        echo "Aborted."
        exit 1
    fi
fi

# Get design doc path from sprint JSON
DESIGN_DOC=$(jq -r '.design_doc // empty' "$SPRINT_FILE")
SPRINT_PLAN=$(jq -r '.sprint_plan // empty' "$SPRINT_FILE")

if [ -z "$DESIGN_DOC" ]; then
    echo "Warning: No design_doc field in sprint JSON"
    DESIGN_DOC=""
fi

# Determine target version
if [ -z "$TARGET_VERSION" ]; then
    # Try to extract from design doc path (e.g., design_docs/planned/v0_4_9/...)
    TARGET_VERSION=$(echo "$DESIGN_DOC" | grep -oE 'v[0-9]+_[0-9]+(_[0-9]+)?' | head -1)
    if [ -z "$TARGET_VERSION" ]; then
        echo "Error: Could not determine target version from design doc path"
        echo "Please specify target version as second argument"
        exit 1
    fi
fi

echo "Target version: $TARGET_VERSION"
echo ""

# Create implemented directory if needed
IMPL_DIR="design_docs/implemented/$TARGET_VERSION"
mkdir -p "$IMPL_DIR"

# Move design doc if it exists in planned/
MOVED_FILES=()

if [ -n "$DESIGN_DOC" ] && [ -f "$DESIGN_DOC" ]; then
    BASENAME=$(basename "$DESIGN_DOC")
    DEST="$IMPL_DIR/$BASENAME"

    if [ -f "$DEST" ]; then
        echo "Design doc already exists at: $DEST"
    else
        echo "Moving design doc:"
        echo "  From: $DESIGN_DOC"
        echo "  To:   $DEST"
        mv "$DESIGN_DOC" "$DEST"
        MOVED_FILES+=("$DESIGN_DOC -> $DEST")

        # Update design doc status to IMPLEMENTED
        if grep -q "^\\*\\*Status\\*\\*:" "$DEST"; then
            # BSD/macOS sed requires a suffix after -i; GNU sed rejects a separate
            # empty argument. Writing a backup file and deleting it works on both.
            sed -i.bak 's/^\*\*Status\*\*:.*$/\*\*Status\*\*: IMPLEMENTED/' "$DEST" && rm -f "$DEST.bak"
            echo "  Updated status to IMPLEMENTED"
        fi
    fi
fi

# Move sprint plan if it exists in planned/
if [ -n "$SPRINT_PLAN" ] && [ -f "$SPRINT_PLAN" ]; then
    BASENAME=$(basename "$SPRINT_PLAN")
    DEST="$IMPL_DIR/$BASENAME"

    if [ -f "$DEST" ]; then
        echo "Sprint plan already exists at: $DEST"
    else
        echo "Moving sprint plan:"
        echo "  From: $SPRINT_PLAN"
        echo "  To:   $DEST"
        mv "$SPRINT_PLAN" "$DEST"
        MOVED_FILES+=("$SPRINT_PLAN -> $DEST")
    fi
fi

# Update sprint JSON status to completed
echo ""
echo "Updating sprint JSON status to 'completed'..."
TEMP_FILE=$(mktemp)
jq '.status = "completed" | .completed = (now | strftime("%Y-%m-%dT%H:%M:%SZ"))' "$SPRINT_FILE" > "$TEMP_FILE"
mv "$TEMP_FILE" "$SPRINT_FILE"

# Update design_doc and sprint_plan paths in JSON to point to new locations
if [ ${#MOVED_FILES[@]} -gt 0 ]; then
    TEMP_FILE=$(mktemp)
    NEW_DESIGN_DOC="$IMPL_DIR/$(basename "$DESIGN_DOC" 2>/dev/null || echo "")"
    NEW_SPRINT_PLAN="$IMPL_DIR/$(basename "$SPRINT_PLAN" 2>/dev/null || echo "")"

    jq --arg dd "$NEW_DESIGN_DOC" --arg sp "$NEW_SPRINT_PLAN" '
        if .design_doc then .design_doc = $dd else . end |
        if .sprint_plan then .sprint_plan = $sp else . end
    ' "$SPRINT_FILE" > "$TEMP_FILE"
    mv "$TEMP_FILE" "$SPRINT_FILE"
    echo "Updated file paths in sprint JSON"
fi

# Get linked GitHub issues if any
GITHUB_ISSUES=$(jq -r '.github_issues // [] | map("#" + tostring) | join(", ")' "$SPRINT_FILE" 2>/dev/null || echo "")
ISSUE_REF=""
if [ -n "$GITHUB_ISSUES" ]; then
    ISSUE_REF=", refs $GITHUB_ISSUES"
fi

echo ""
echo "═══════════════════════════════════════════════════════════════"
echo " Sprint Finalized Successfully!"
echo "═══════════════════════════════════════════════════════════════"
echo ""
echo "Summary:"
echo "  Sprint ID: $SPRINT_ID"
echo "  Status: completed"
echo "  Target version: $TARGET_VERSION"
if [ -n "$GITHUB_ISSUES" ]; then
    echo "  Linked issues: $GITHUB_ISSUES"
fi
if [ ${#MOVED_FILES[@]} -gt 0 ]; then
    echo ""
    echo "Files moved:"
    for f in "${MOVED_FILES[@]}"; do
        echo "  $f"
    done
fi
echo ""
echo "Next steps:"
if [ -n "$GITHUB_ISSUES" ]; then
    echo "  1. Commit the changes: git add -A && git commit -m 'Finalize sprint $SPRINT_ID${ISSUE_REF}'"
else
    echo "  1. Commit the changes: git add -A && git commit -m 'Finalize sprint $SPRINT_ID'"
fi
echo "  2. Update CHANGELOG.md if not already done"
echo "  3. Consider creating a release if milestone reached"

```
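For reference, here is a hypothetical minimal sprint JSON with the fields the script above reads via jq (`.status`, `.design_doc`, `.sprint_plan`, `.github_issues`, and `.features[].passes`). The sprint ID, paths, and issue number are made up for illustration:

```bash
mkdir -p .ailang/state/sprints
cat > .ailang/state/sprints/sprint_M-EXAMPLE.json << 'EOF'
{
  "id": "M-EXAMPLE",
  "status": "in_progress",
  "design_doc": "design_docs/planned/v0_4_9/M-EXAMPLE.md",
  "sprint_plan": "design_docs/planned/v0_4_9/M-EXAMPLE_plan.md",
  "github_issues": [12],
  "features": [
    { "id": "M-EXAMPLE.1", "passes": true },
    { "id": "M-EXAMPLE.2", "passes": true }
  ]
}
EOF

# The "all milestones pass" gate from the script reports 0 non-passing features:
jq -r '.features | map(select(.passes != true)) | length' \
    .ailang/state/sprints/sprint_M-EXAMPLE.json
```

With every `passes` field true, `finalize_sprint.sh M-EXAMPLE v0_4_9` would skip the confirmation prompt and proceed straight to moving docs and marking the sprint completed.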