AILANG Post-Release Tasks
Run automated post-release workflow (eval baselines, dashboard, docs) for AILANG releases. Executes 46-benchmark suite (medium/hard/stretch) + full standard eval with validation and progress reporting. Use when user says "post-release tasks for vX.X.X" or "update dashboard". Fully autonomous with pre-flight checks.
Packaged view
This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.
Install command
npx @skill-hub/cli install sunholo-data-ailang-post-release
Repository
Skill path: .claude/skills/post-release
Best for
Primary workflow: Write Technical Docs.
Technical facets: Full Stack, Data / AI, Tech Writer.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: sunholo-data.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install AILANG Post-Release Tasks into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/sunholo-data/ailang before adding AILANG Post-Release Tasks to shared team environments
- Use AILANG Post-Release Tasks for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: AILANG Post-Release Tasks
description: Run automated post-release workflow (eval baselines, dashboard, docs) for AILANG releases. Executes 46-benchmark suite (medium/hard/stretch) + full standard eval with validation and progress reporting. Use when user says "post-release tasks for vX.X.X" or "update dashboard". Fully autonomous with pre-flight checks.
---
# AILANG Post-Release Tasks
Run post-release tasks for an AILANG release: evaluation baselines, dashboard updates, and documentation.
## Quick Start
**Most common usage:**
```bash
# User says: "Run post-release tasks for v0.3.14"
# This skill will:
# 1. Run eval baseline (ALL 6 PRODUCTION MODELS, both languages) - ALWAYS USE --full FOR RELEASES
# 2. Update website dashboard (JSON with history preservation)
# 3. Update axiom scorecard KPI (if features affect axiom compliance)
# 4. Extract metrics and UPDATE CHANGELOG.md automatically
# 5. Move design docs from planned/ to implemented/
# 6. Run docs-sync to verify website accuracy (version constants, PLANNED banners, examples)
# 7. Commit all changes to git
```
**🚨 CRITICAL: For releases, ALWAYS use --full flag by default**
- Dev models (without --full) are only for quick testing/validation, NOT releases
- Users expect full benchmark results when they say "post-release" or "update dashboard"
- Never start with dev models and then try to add production models later
## When to Use This Skill
Invoke this skill when:
- User says "post-release tasks", "update dashboard", "run benchmarks"
- After successful release (once GitHub release is published)
- User asks about eval baselines or benchmark results
- User wants to update documentation after a release
## Available Scripts
### `scripts/run_eval_baseline.sh <version> [--full]`
Run evaluation baseline for a release version.
**🚨 CRITICAL: ALWAYS use --full for releases!**
**Usage:**
```bash
# ✅ CORRECT - For releases (ALL 6 production models)
.claude/skills/post-release/scripts/run_eval_baseline.sh 0.3.14 --full
.claude/skills/post-release/scripts/run_eval_baseline.sh v0.3.14 --full
# ❌ WRONG - Only use without --full for quick testing/validation
.claude/skills/post-release/scripts/run_eval_baseline.sh 0.3.14
```
**Output:**
```
Running eval baseline for 0.3.14...
Mode: FULL (all 6 production models)
Expected cost: ~$0.50-1.00
Expected time: ~15-20 minutes
[Running benchmarks...]
✓ Baseline complete
Results: eval_results/baselines/0.3.14
Files: 264 result files
```
**What it does:**
- **Step 1**: Runs `make eval-baseline` (standard 0-shot + repair evaluation)
- Tests both AILANG and Python implementations
- Uses all 6 production models (--full) or 3 dev models (default)
- Tests all benchmarks in benchmarks/ directory
- **Step 2**: Runs agent eval on full benchmark suite
- **Current suite** (v0.4.8+): 46 benchmarks (trimmed from 56)
- Removed: trivial benchmarks (print tests), most easy benchmarks
- Kept: fizzbuzz (1 easy for validation), all medium/hard/stretch benchmarks
- **Stretch goals** (6 new): symbolic_diff, mini_interpreter, lambda_calc,
graph_bfs, type_unify, red_black_tree
- Expected: ~55-70% success rate with haiku, higher with sonnet/opus
- Uses haiku+sonnet (--full) or haiku only (default)
- Tests both AILANG and Python implementations
- Saves combined results to eval_results/baselines/X.X.X/
- Accepts version with or without 'v' prefix
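The 'v'-prefix handling can be sketched with shell parameter expansion (a minimal illustration under assumed conventions, not the script's actual code):

```bash
# Normalize a version argument: strip a single leading 'v' if present,
# so "0.3.14" and "v0.3.14" both resolve to the same baseline directory.
raw="v0.3.14"          # example input; could equally be "0.3.14"
version="${raw#v}"     # parameter expansion: remove one leading "v"
echo "eval_results/baselines/$version"
# prints eval_results/baselines/0.3.14 for either input form
```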
### `scripts/update_dashboard.sh <version>`
Update website benchmark dashboard with new release data.
**Usage:**
```bash
.claude/skills/post-release/scripts/update_dashboard.sh 0.3.14
```
**Output:**
```
Updating dashboard for 0.3.14...
1/5 Generating Docusaurus markdown...
✓ Written to docs/docs/benchmarks/performance.md
2/5 Generating dashboard JSON with history...
✓ Written to docs/static/benchmarks/latest.json (history preserved)
3/5 Validating JSON...
✓ Version: 0.3.14
✓ Success rate: 0.627
4/5 Clearing Docusaurus cache...
✓ Cache cleared
5/5 Summary
✓ Dashboard updated for 0.3.14
✓ Markdown: docs/docs/benchmarks/performance.md
✓ JSON: docs/static/benchmarks/latest.json
Next steps:
1. Test locally: cd docs && npm start
2. Visit: http://localhost:3000/ailang/docs/benchmarks/performance
3. Verify timeline shows 0.3.14
4. Commit: git add docs/docs/benchmarks/performance.md docs/static/benchmarks/latest.json
5. Commit: git commit -m 'Update benchmark dashboard for 0.3.14'
6. Push: git push
```
**What it does:**
- Generates Docusaurus-formatted markdown
- Updates dashboard JSON with history preservation
- Validates JSON structure (version matches input exactly)
- Clears Docusaurus build cache
- Provides next steps for testing and committing
- Accepts version with or without 'v' prefix
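The validation step can be pictured as follows; the JSON shape here is an assumed minimal example, not the real dashboard schema:

```bash
# Validate that the dashboard JSON's "version" field matches the release
# being published, before any file is overwritten.
expected="0.3.14"
json='{"version": "0.3.14", "success_rate": 0.627}'   # assumed minimal shape
actual=$(printf '%s' "$json" | python3 -c 'import json,sys; print(json.load(sys.stdin)["version"])')
if [ "$actual" = "$expected" ]; then
  echo "version OK: $actual"
else
  echo "version mismatch: expected $expected, got $actual" >&2
  exit 1
fi
```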
### `scripts/extract_changelog_metrics.sh [json_file]`
Extract benchmark metrics from dashboard JSON for CHANGELOG.
**Usage:**
```bash
.claude/skills/post-release/scripts/extract_changelog_metrics.sh
# Or specify JSON file:
.claude/skills/post-release/scripts/extract_changelog_metrics.sh docs/static/benchmarks/latest.json
```
**Output:**
```
Extracting metrics from docs/static/benchmarks/latest.json...
=== CHANGELOG.md Template ===
### Benchmark Results (M-EVAL)
**Overall Performance**: 59.1% success rate (399 total runs)
**By Language:**
- **AILANG**: 33.0% - New language, learning curve
- **Python**: 87.0% - Baseline for comparison
- **Gap**: 54.0 percentage points (expected for new language)
**Comparison**: -15.2% AILANG regression from 0.3.14 (48.2% → 33.0%)
=== End Template ===
Use this template in CHANGELOG.md for 0.3.15
```
**What it does:**
- Parses dashboard JSON for metrics
- Calculates percentages and gap between AILANG/Python
- **Auto-compares with previous version from history**
- **Formats comparison text automatically** (+X% improvement or -X% regression)
- Generates ready-to-paste CHANGELOG template with no manual work needed
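The arithmetic behind the gap and comparison lines can be sketched with awk, using the example figures from the output above (a sketch, not the script's actual code):

```bash
# Compute the AILANG/Python gap and the release-over-release comparison.
ailang=33.0; python=87.0; previous=48.2   # example values from the template above
awk -v a="$ailang" -v p="$python" -v prev="$previous" 'BEGIN {
  printf "Gap: %.1f percentage points\n", p - a
  d = a - prev
  printf "Comparison: %+.1f%% AILANG %s (%.1f%% -> %.1f%%)\n",
         d, (d >= 0 ? "improvement" : "regression"), prev, a
}'
# Gap: 54.0 percentage points
# Comparison: -15.2% AILANG regression (48.2% -> 33.0%)
```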
### `scripts/cleanup_design_docs.sh <version> [--dry-run] [--force] [--check-only]`
Move design docs from planned/ to implemented/ after a release.
**Features:**
- Detects **duplicates** (docs already in implemented/)
- Detects **misplaced docs** (Target: field doesn't match folder version)
- Only moves docs with "Status: Implemented" in their frontmatter
- Docs with other statuses are flagged for review
**Usage:**
```bash
# Check-only: Report issues without making changes
.claude/skills/post-release/scripts/cleanup_design_docs.sh 0.5.9 --check-only
# Preview what would be moved/deleted/relocated
.claude/skills/post-release/scripts/cleanup_design_docs.sh 0.5.9 --dry-run
# Execute: Move implemented docs, delete duplicates, relocate misplaced
.claude/skills/post-release/scripts/cleanup_design_docs.sh 0.5.9
# Force move all docs regardless of status
.claude/skills/post-release/scripts/cleanup_design_docs.sh 0.5.9 --force
```
**Output:**
```
Design Doc Cleanup for v0.5.9
==================================
Checking 5 design doc(s) in design_docs/planned/v0_5_9/:
Phase 1: Detecting issues...
[DUPLICATE] m-fix-if-else-let-block.md
Already exists in design_docs/implemented/v0_5_9/
[MISPLACED] m-codegen-value-types.md
Target: v0.5.10 (folder: v0_5_9)
Should be in: design_docs/planned/v0_5_10/
Issues found:
- 1 duplicate(s) (can be deleted)
- 1 misplaced doc(s) (wrong version folder)
Phase 2: Processing docs...
[DELETED] m-fix-if-else-let-block.md (duplicate - already in implemented/)
[RELOCATED] m-codegen-value-types.md → design_docs/planned/v0_5_10/
[MOVED] m-dx11-cycles.md → design_docs/implemented/v0_5_9/
[NEEDS REVIEW] m-unfinished-feature.md
Found: **Status**: Planned
Summary:
✓ Deleted 1 duplicate(s)
✓ Relocated 1 doc(s) to correct version folder
✓ Moved 1 doc(s) to design_docs/implemented/v0_5_9/
⚠ 1 doc(s) need review (not marked as Implemented)
```
**What it does:**
- **Phase 1 (Detection)**: Identifies duplicates and misplaced docs
- **Phase 2 (Processing)**:
- Deletes duplicates (same file already exists in implemented/)
- Relocates misplaced docs (Target: version doesn't match folder)
- Moves docs with "Status: Implemented" to implemented/
- Flags docs without Implemented status for review
- Creates target folders if needed
- Removes empty planned folder after cleanup
- Use `--check-only` to only report issues, `--dry-run` to preview actions, `--force` to move all
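The status gate can be sketched as a frontmatter grep; the `**Status**:` line format is taken from the example output above, and the decision logic is an assumption about the script's behavior:

```bash
# Decide a doc's fate from its Status line: only "Implemented" docs move.
doc=$(mktemp)
printf '**Status**: Implemented\n' > "$doc"
if grep -q '^\*\*Status\*\*: Implemented' "$doc"; then
  echo "move to implemented/"
else
  echo "flag for review"
fi
rm -f "$doc"
# prints: move to implemented/
```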
## Post-Release Workflow
### 1. Verify Release Exists
```bash
git tag -l vX.X.X
gh release view vX.X.X
```
If release doesn't exist, run `release-manager` skill first.
### 2. Run Eval Baseline
**🚨 CRITICAL: ALWAYS use --full for releases!**
**Correct workflow:**
```bash
# ✅ ALWAYS do this for releases - ONE command, let it complete
.claude/skills/post-release/scripts/run_eval_baseline.sh X.X.X --full
```
This runs all 6 production models with both AILANG and Python.
**Cost**: ~$0.50-1.00
**Time**: ~15-20 minutes
**❌ WRONG workflow (what happened with v0.3.22):**
```bash
# DON'T do this for releases!
.claude/skills/post-release/scripts/run_eval_baseline.sh X.X.X # Missing --full!
# Then try to add production models later with --skip-existing
# Result: Confusion, multiple processes, incomplete baseline
```
**If baseline times out or is interrupted:**
```bash
# Resume with ALL 6 models (maintains --full semantics)
ailang eval-suite --full --langs python,ailang --parallel 5 \
--output eval_results/baselines/X.X.X --skip-existing
```
The `--skip-existing` flag skips benchmarks that already have result files, allowing resumption of interrupted runs. But ONLY use this for recovery, not as a strategy to "add more models later".
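The resume behavior can be pictured as a per-benchmark existence check (a sketch with benchmark names borrowed from the suite above; the real flag lives inside `ailang eval-suite`):

```bash
# Skip any benchmark whose result file already exists; run the rest.
outdir=$(mktemp -d)
touch "$outdir/fizzbuzz.json"    # pretend this result survived the interruption
for bench in fizzbuzz graph_bfs; do
  if [ -f "$outdir/$bench.json" ]; then
    echo "skip $bench (result exists)"
  else
    echo "run $bench"
  fi
done
rm -rf "$outdir"
# prints: skip fizzbuzz (result exists)
#         run graph_bfs
```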
### 3. Update Website Dashboard
**Use the automation script:**
```bash
.claude/skills/post-release/scripts/update_dashboard.sh X.X.X
```
**IMPORTANT**: This script automatically:
- Generates Docusaurus markdown (docs/docs/benchmarks/performance.md)
- Updates JSON with history preservation (docs/static/benchmarks/latest.json)
- Does NOT overwrite historical data - merges new version into existing history
- Validates JSON structure before writing
- Clears Docusaurus cache to prevent webpack errors
**Test locally (optional but recommended):**
```bash
cd docs && npm start
# Visit: http://localhost:3000/ailang/docs/benchmarks/performance
```
Verify:
- Timeline shows vX.X.X
- Success rate matches eval results
- No errors or warnings
**Commit dashboard updates:**
```bash
git add docs/docs/benchmarks/performance.md docs/static/benchmarks/latest.json
git commit -m "Update benchmark dashboard for vX.X.X"
git push
```
### 4. Update Axiom Scorecard
**Review and update the axiom scorecard:**
```bash
# View current scorecard
ailang axioms
# The scorecard is at docs/static/benchmarks/axiom_scorecard.json
# Update scores if features were added/improved that affect axiom compliance
```
**When to update scores:**
- +1 → +2 if a partial implementation becomes complete
- New feature aligns with an axiom → update evidence
- Gaps were fixed → remove from gaps array
- Add to history array to track KPI over time
**Update history entry:**
```json
{
"version": "vX.X.X",
"date": "YYYY-MM-DD",
"score": 18,
"maxScore": 24,
"percentage": 75.0,
"notes": "Added capability budgets (A9 +1)"
}
```
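Appending such an entry can be scripted; the scorecard shape below is a minimal assumed example, not the full `axiom_scorecard.json` schema:

```bash
# Append a history entry and recompute the percentage from score/maxScore.
printf '%s' '{"history": []}' | python3 -c '
import json, sys
card = json.load(sys.stdin)
entry = {"version": "vX.X.X", "score": 18, "maxScore": 24}
entry["percentage"] = round(entry["score"] / entry["maxScore"] * 100, 1)
card["history"].append(entry)
print(entry["percentage"])   # 75.0
'
```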
### 5. Extract Metrics for CHANGELOG
**Generate metrics template:**
```bash
.claude/skills/post-release/scripts/extract_changelog_metrics.sh
```
This outputs a formatted template with:
- Overall success rate
- Standard eval metrics (0-shot, final with repair, repair effectiveness)
- Agent eval metrics by language
- **Automatic comparison with previous version** (no manual work!)
**Update CHANGELOG.md automatically:**
1. Run the script to generate the template
2. Insert the "Benchmark Results (M-EVAL)" section into CHANGELOG.md
3. Place it after the feature/fix sections and before the next version
For CHANGELOG template format, see [`resources/version_notes.md`](resources/version_notes.md).
### 5a. Analyze Agent Evaluation Results
```bash
# Get KPIs (turns, tokens, cost by language)
.claude/skills/eval-analyzer/scripts/agent_kpis.sh eval_results/baselines/X.X.X
```
Target metrics: Avg Turns ≤1.5x gap, Avg Tokens ≤2.0x gap vs Python.
For detailed agent analysis guide, see [`resources/version_notes.md`](resources/version_notes.md).
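The target check can be sketched with awk; the turn and token figures are assumed example values in the style of the KPI output:

```bash
# Compare AILANG turn/token averages against Python's, with the
# 1.5x-turns and 2.0x-tokens targets from above.
awk -v at=12.0 -v pt=10.6 -v atok=72 -v ptok=58 'BEGIN {
  turns = at / pt; tokens = atok / ptok
  printf "turns %.2fx %s, tokens %.2fx %s\n",
         turns,  (turns  <= 1.5 ? "within target" : "needs optimization"),
         tokens, (tokens <= 2.0 ? "within target" : "needs optimization")
}'
# prints: turns 1.13x within target, tokens 1.24x within target
```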
### 6. Move Design Docs to Implemented
**Step 1: Check for issues (duplicates, misplaced docs):**
```bash
.claude/skills/post-release/scripts/cleanup_design_docs.sh X.X.X --check-only
```
**Step 2: Preview all changes:**
```bash
.claude/skills/post-release/scripts/cleanup_design_docs.sh X.X.X --dry-run
```
**Step 3: Check any flagged docs:**
- `[DUPLICATE]` - These will be deleted (already in implemented/)
- `[MISPLACED]` - These will be relocated to correct version folder
- `[NEEDS REVIEW]` - Update `**Status**:` to `Implemented` if done, or leave for next version
**Step 4: Execute the cleanup:**
```bash
.claude/skills/post-release/scripts/cleanup_design_docs.sh X.X.X
```
**Step 5: Commit the changes:**
```bash
git add design_docs/
git commit -m "docs: cleanup design docs for vX_Y_Z"
```
The script automatically:
- Deletes duplicates (same file already in implemented/)
- Relocates misplaced docs (Target: version doesn't match folder)
- Moves docs with "Status: Implemented" to implemented/
- Flags remaining docs for manual review
### 7. Update Public Documentation
- Update `prompts/` with latest AILANG syntax
- Update website docs (`docs/`) with latest features
- Remove outdated examples or references
- Add new examples to website
- Update `docs/guides/evaluation/` if significant benchmark improvements
- **Update `docs/LIMITATIONS.md`**:
- Remove limitations that were fixed in this release
- Add new known limitations discovered during development/testing
- Update workarounds if they changed
- Update version numbers in "Since" and "Fixed in" fields
- **Test examples**: Verify that limitations listed still exist and workarounds still work
```bash
# Test examples from LIMITATIONS.md
# Example: Test Y-combinator still fails (should fail)
echo 'let Y = \f. (\x. f(x(x)))(\x. f(x(x))) in Y' | ailang repl
# Example: Test named recursion works (should succeed)
ailang run examples/factorial.ail
# Example: Test polymorphic operator limitation (should panic with floats)
# Create test file and verify behavior matches documentation
```
- Commit changes: `git add docs/LIMITATIONS.md && git commit -m "Update LIMITATIONS.md for vX.X.X"`
### 8. Run Documentation Sync Check
**Run docs-sync to verify website accuracy:**
```bash
# Check version constants are correct
.claude/skills/docs-sync/scripts/check_versions.sh
# Audit design docs vs website claims
.claude/skills/docs-sync/scripts/audit_design_docs.sh
# Generate full sync report
.claude/skills/docs-sync/scripts/generate_report.sh
```
**What docs-sync checks:**
- Version constants in `docs/src/constants/version.js` match git tag
- Teaching prompt references point to latest version
- Architecture pages have PLANNED banners for unimplemented features
- Design docs status (planned vs implemented) matches website claims
- Examples referenced in website actually work
**If issues found:**
1. Update version.js if stale
2. Add PLANNED banners to theoretical feature pages
3. Move implemented features from roadmap to current sections
4. Fix broken example references
**Commit docs-sync fixes:**
```bash
git add docs/
git commit -m "docs: sync website with vX.X.X implementation"
```
See [docs-sync skill](../docs-sync/SKILL.md) for full documentation.
## Resources
### Post-Release Checklist
See [`resources/post_release_checklist.md`](resources/post_release_checklist.md) for complete step-by-step checklist.
## Prerequisites
- Release vX.X.X completed successfully
- Git tag vX.X.X exists
- GitHub release published with all binaries
- `ailang` binary installed (for eval baseline)
- Node.js/npm installed (for dashboard, optional)
## Common Issues
### Eval Baseline Times Out
**Solution**: Use `--skip-existing` flag to resume:
```bash
bin/ailang eval-suite --full --skip-existing --output eval_results/baselines/vX.X.X
```
### Dashboard Shows "null" for Aggregates
**Cause**: Wrong JSON file (performance matrix vs dashboard JSON)
**Solution**: Use `update_dashboard.sh` script, not manual file copying
### Webpack/Cache Errors in Docusaurus
**Cause**: Stale build cache
**Solution**: Run `cd docs && npm run clear && rm -rf .docusaurus build`
### Dashboard Shows Old Version
**Cause**: Didn't run update_dashboard.sh with correct version
**Solution**: Re-run `update_dashboard.sh X.X.X` with correct version
## Progressive Disclosure
This skill loads information progressively:
1. **Always loaded**: This SKILL.md file (YAML frontmatter + workflow overview)
2. **Execute as needed**: Scripts in `scripts/` directory (automation)
3. **Load on demand**: `resources/post_release_checklist.md` (detailed checklist)
Scripts execute without loading into context window, saving tokens while providing powerful automation.
## Version Notes
For historical improvements and lessons learned, see [`resources/version_notes.md`](resources/version_notes.md).
Key points:
- All scripts accept version with or without 'v' prefix
- Use `--validate` flag to check configuration before running
- Agent eval requires explicit `--benchmarks` list (defined in script)
## Notes
- This skill follows Anthropic's Agent Skills specification (Oct 2025)
- Scripts handle 100% of automation (eval baseline, dashboard, metrics extraction, design docs)
- Can be run hours or even days after release
- Dashboard JSON preserves history - never overwrites historical data
- Always use `--full` flag for release baselines (all production models)
- Design docs cleanup now handles duplicates, misplaced docs, and status-based moves
- `--check-only` to report issues without changes
- `--dry-run` to preview all actions
- `--force` to move all regardless of status
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### resources/version_notes.md
```markdown
# Post-Release Version Notes
Historical improvements and lessons learned from past releases.
## Recent Improvements (v0.3.15)
### Version Format Handling
All scripts now accept version with or without 'v' prefix:
- `0.3.15` works
- `v0.3.15` works
- Scripts pass version to underlying tools exactly as given
- No more "directory not found" errors due to version format mismatch
### Auto-Comparison in Metrics
The `extract_changelog_metrics.sh` script now:
- Automatically finds previous version from history
- Calculates AILANG performance difference
- Formats comparison text ("+X% improvement" or "-X% regression")
- Shows before/after percentages: "(48.2% → 33.0%)"
- **Zero manual jq queries needed!**
### Eliminated Manual Work
Before v0.3.15, you needed to:
- Run 15+ manual jq queries to extract metrics
- Manually compare with previous version
- Format comparison text yourself
- Handle version prefix inconsistencies
After v0.3.15:
- **3 scripts do everything automatically**
- No jq queries needed
- No manual calculations
- Version format doesn't matter
## Lessons Learned (v0.4.8)
### Always Validate Before Long-Running Operations
The pre-release checks now include:
1. **Golden file validation**: `make test-import-errors` to catch stale goldens
2. **Agent eval config validation**: `--validate` flag to verify benchmarks list
**Issue discovered**: Agent eval requires explicit `--benchmarks` list (safety feature), but script didn't have it defined. Now fixed with `AGENT_BENCHMARKS` at top of script.
### Agent Eval Requirements
- Agent mode REQUIRES explicit `--benchmarks` list (46 benchmarks as of v0.4.8)
- The list is defined in `AGENT_BENCHMARKS` variable at top of `run_eval_baseline.sh`
- Keep in sync with `benchmarks/` directory
### Golden Files Can Become Stale
When error behavior changes (e.g., module now exists → different error code), golden files need regeneration:
```bash
make regen-import-error-goldens
```
Pre-release checks now validate goldens match current behavior.
### Validate Script Before Running
Use `--validate` flag to check configuration without running full eval:
```bash
.claude/skills/post-release/scripts/run_eval_baseline.sh --validate
```
This checks:
- AGENT_BENCHMARKS is defined
- Benchmark count (46 expected)
- ailang command exists
- Benchmark files exist
## CHANGELOG Template Example
When updating CHANGELOG.md with benchmark results, use this format:
```markdown
### Benchmark Results (M-EVAL)
**Overall Performance**: 68.1% success rate (480 total runs)
**Standard Eval (0-shot + self-repair):**
| Metric | 0.4.4 | 0.4.5 | Change |
|--------|--------|--------|--------|
| **0-shot (first attempt)** | 55.6% | 64.0% (182/284) | **+8.4%** |
| **Final (with repair)** | 60.5% | 68.6% (195/284) | **+8.1%** |
| **Repair effectiveness** | +4.9pp | +4.6pp | -0.3pp |
| **Python (final)** | 73.1% | 76.4% (208/272) | +3.3% |
**Agent Eval (multi-turn iterative problem solving):**
| Language | 0.4.4 | 0.4.5 | Change |
|----------|--------|--------|--------|
| **AILANG** | 92.1% | 100.0% (38/38) | **+7.9%** |
| **Python** | 100.0% | 100.0% (38/38) | 0% |
**Key Findings:**
- Major improvement in 0-shot performance
- Perfect agent eval score achieved
```
## Agent Eval Analysis Guide
### Check agent efficiency metrics:
```bash
# Get KPIs (turns, tokens, cost by language)
.claude/skills/eval-analyzer/scripts/agent_kpis.sh eval_results/baselines/X.X.X
```
### Key metrics to track:
- **Avg Turns**: AILANG vs Python (target: ≤1.5x gap)
- **Avg Tokens**: AILANG vs Python (target: ≤2.0x gap)
- **Success Rate**: Both should be 100% (agent mode corrects mistakes)
### Example good result:
```
🐍 Python: 10.6 avg turns, 58k tokens, 100% success
🔷 AILANG: 12.0 avg turns, 72k tokens, 100% success
Gap: 1.13x turns, 1.24x tokens ✅ (within target!)
```
### Example needs work:
```
🐍 Python: 10.6 avg turns, 58k tokens, 100% success
🔷 AILANG: 18.0 avg turns, 178k tokens, 100% success
Gap: 1.7x turns, 3.0x tokens ⚠️ (needs optimization!)
```
### If AILANG significantly worse than Python:
1. Identify expensive benchmarks (from "Most Expensive" section)
2. View transcripts: `.claude/skills/eval-analyzer/scripts/agent_transcripts.sh eval_results/baselines/X.X.X <benchmark>`
3. File optimization tasks for next release
```
### ../docs-sync/SKILL.md
```markdown
---
name: docs-sync
description: Sync AILANG documentation website with codebase reality. Use after releases, when features are implemented, or when website accuracy is questioned. Checks design docs vs website, validates examples, updates feature status.
---
# Documentation Sync Skill
Keep the AILANG website in sync with actual implementation by checking design docs, validating examples, and tracking feature status.
## Quick Start
```bash
# After a release - full sync check
# User: "sync docs for v0.5.6" or "run docs-sync"
# Check specific area
# User: "verify landing pages" or "check working examples"
```
## When to Use This Skill
Invoke this skill when:
- After any release (post-release should trigger this)
- User questions website accuracy
- Features move from planned → implemented
- Examples may be broken or outdated
- Version constants need updating
## Workflow
### 1. Pre-flight Checks
Run all diagnostic scripts to understand current state:
```bash
# Check design docs status
.claude/skills/docs-sync/scripts/audit_design_docs.sh
# Check version constants
.claude/skills/docs-sync/scripts/check_versions.sh
# Validate working examples
.claude/skills/docs-sync/scripts/check_examples.sh
```
### 2. Review Feature Themes
Features are grouped into themes (see [resources/feature_themes.md](resources/feature_themes.md)):
| Theme | Status Page | Description |
|-------|-------------|-------------|
| Core Language | `/reference/language-syntax` | Types, ADTs, pattern matching |
| Effect System | `/reference/effects` | Capabilities, IO, FS, Net |
| Module System | `/reference/modules` | Imports, exports, aliasing |
| Go Codegen | `/guides/go-codegen` | Compilation to Go |
| AI Integration | `/guides/ai-integration` | Prompts, benchmarks, agents |
| Testing | `/guides/testing` | Inline tests, property-based |
| Developer Experience | `/guides/development` | REPL, debugging, CLI |
| **Roadmap: Execution Profiles** | `/roadmap/execution-profiles` | v0.6.0 planned |
| **Roadmap: Shared Semantic State** | `/roadmap/shared-semantic-state` | v0.6.0 planned |
| **Roadmap: Deterministic Tooling** | `/roadmap/deterministic-tooling` | v0.7.0 planned |
### 3. Fix Priority Order
1. **Version Constants** - Update `docs/src/constants/version.js`
2. **Landing Pages** - intro.mdx, vision.mdx, why-ailang.mdx
3. **Working Examples** - Ensure examples use raw-loader, all pass
4. **Status Banners** - Add "PLANNED" banners to future features
5. **Feature Docs** - Create missing docs for implemented features
6. **Roadmap Section** - Move theoretical pages to roadmap
### 4. Update Checklist
After fixing, verify:
```bash
# Rebuild and check
cd docs && npm run build
# Verify no broken links
npm run serve # Manual check
# Commit changes
git add docs/
git commit -m "docs: sync website with v0.X.X implementation"
```
## Scripts
| Script | Purpose |
|--------|---------|
| `audit_design_docs.sh` | Compare planned vs implemented design docs |
| `derive_roadmap_versions.sh` | **Derive target versions from design doc folders** |
| `check_versions.sh` | Verify version constants match releases |
| `check_examples.sh` | Validate example files compile/run |
| `generate_report.sh` | Generate sync status report |
### CLI Example Verification (External)
The Makefile provides CLI example verification that complements this skill:
```bash
# Verify all CLI examples documented in examples/cli_examples.txt
make verify-cli-examples
# Full verification: code examples + CLI examples
make verify-examples && make verify-cli-examples
```
**CLI Examples File Format** (`examples/cli_examples.txt`):
```
# Comment explaining the example
$ ailang run --caps IO --entry main examples/runnable/hello.ail
Hello, AILANG!
```
This ensures CLI syntax in documentation matches actual behavior.
### Version Derivation Script
```bash
# List all planned features with derived target versions
.claude/skills/docs-sync/scripts/derive_roadmap_versions.sh
# Full lifecycle: planned + implemented features
.claude/skills/docs-sync/scripts/derive_roadmap_versions.sh --full
# Check website consistency (exits 1 if mismatches)
.claude/skills/docs-sync/scripts/derive_roadmap_versions.sh --check
# JSON output for automation
.claude/skills/docs-sync/scripts/derive_roadmap_versions.sh --json --full
# Full validation: all features + website check
.claude/skills/docs-sync/scripts/derive_roadmap_versions.sh --full --check
```
## Resources
| Resource | Content |
|----------|---------|
| `feature_themes.md` | Feature groupings and expected pages |
| `landing_page_checklist.md` | Requirements for main pages |
## Integration with Post-Release
The `post-release` skill should invoke docs-sync automatically:
```bash
# In post-release workflow after eval baselines
# Run docs-sync to update website
```
## Key Principles
1. **Theoretical is OK** - Future features can be documented, but must:
- Link to design docs on GitHub
- Have clear status banner (e.g., "PLANNED FOR v0.6.0")
- Be in roadmap section, not current features
2. **Examples Must Work** - Every code example should:
- Use raw-loader to import from `examples/`
- Be tested with `ailang run` or `ailang test`
- Never embed code directly in MDX
- **CLI commands must be added to `examples/cli_examples.txt`** and verified with `./tools/verify_cli_examples.sh`
- Default entrypoint is `main` - don't pass `--entry` unless demonstrating a non-main entrypoint
- Test all commands before documenting: `./bin/ailang run --caps IO examples/runnable/hello.ail`
**Exception for Reference Documentation:**
- Reference pages (e.g., `language-syntax.md`, `effects.md`) may use inline syntax snippets
- These are small patterns showing language constructs, not complete runnable programs
- Tutorial pages (`examples.mdx`, `getting-started.mdx`, `guides/`) must always import from files
- Rule: If the code snippet is a complete runnable program, it MUST be imported
3. **Design Docs = Ultimate Source of Truth** - The folder structure tracks complete feature lifecycle:
**Planned features:**
- `design_docs/planned/v0_6_0/foo.md` → Feature targets v0.6.0
- Website pages must match: `PLANNED FOR v0.6.0` banner
- Website MUST link to design doc on GitHub as ultimate reference
- OVERDUE = design doc folder version ≤ current release but not implemented
**Implemented features:**
- `design_docs/implemented/v0_5_6/foo.md` → Feature shipped in v0.5.6
- Website reference pages can link to implemented design docs
- Moving `planned/` → `implemented/` = feature is done
**Validation:**
- Run `./scripts/derive_roadmap_versions.sh --check` to validate website
- Run `./scripts/derive_roadmap_versions.sh --full` to see complete lifecycle
- Script checks for GitHub links in roadmap pages
4. **One Source of Truth** - Version comes from:
- `git describe --tags` → actual version
- `prompts/versions.json` → latest prompt
- These feed into `docs/src/constants/version.js`
5. **Themes Over Changelog** - Group features by theme, not by version. Users care about "how do effects work?" not "what changed in v0.5.3?"
6. **Evolving Themes** - Themes should evolve as the language grows:
- When a feature doesn't fit existing themes, consider a new theme
- New themes warrant new landing pages
- Update `resources/feature_themes.md` when adding themes
- Current themes: Core Language, Type System, Effect System, Module System, Go Codegen, Arrays, Testing, AI Integration, Developer Experience
- Roadmap themes: Execution Profiles, Deterministic Tooling, Shared Semantic State
## Theme Evolution Guidelines
When reviewing new features, ask:
1. **Does this fit an existing theme?** → Add to that theme's page
2. **Is this a major new capability?** → Consider a new theme
3. **Is this cross-cutting?** → May need multiple mentions but one authoritative page
### Signals for a New Theme
- 3+ related features with no natural home
- A new design doc folder (e.g., `v0_7_0/`) with a coherent focus
- User questions consistently asking "how do I do X?" where X isn't covered
- A planned feature graduating to implemented that's substantial
### Creating a New Theme
1. Add entry to `resources/feature_themes.md`
2. Create website page at appropriate location
3. Update sidebar in `docs/sidebars.js`
4. Cross-link from related themes
5. Update this skill's theme table above
```
### resources/post_release_checklist.md
```markdown
# Post-Release Checklist
## Prerequisites
- [ ] Release vX.X.X completed successfully
- [ ] Git tag vX.X.X exists
- [ ] GitHub release published with all binaries
## 1. Run Eval Baseline
Use the automation script:
```bash
.claude/skills/post-release/scripts/run_eval_baseline.sh X.X.X --full
```
**Manual alternative:**
```bash
make eval-baseline EVAL_VERSION=X.X.X FULL=true
```
**If baseline times out or is interrupted:**
```bash
bin/ailang eval-suite --full --langs python,ailang --parallel 5 \
--output ./eval_results/baselines/vX.X.X --self-repair --skip-existing
```
- [ ] Eval baseline complete (eval_results/baselines/vX.X.X/)
- [ ] Results include both AILANG and Python
- [ ] All models run successfully
## 2. Update Website Dashboard
Use the automation script:
```bash
.claude/skills/post-release/scripts/update_dashboard.sh X.X.X
```
This will:
- Generate docs/docs/benchmarks/performance.md
- Update docs/static/benchmarks/latest.json (preserves history)
- Validate JSON output
- Clear Docusaurus cache
- [ ] Markdown generated (docs/docs/benchmarks/performance.md)
- [ ] JSON updated (docs/static/benchmarks/latest.json)
- [ ] JSON validation passed
- [ ] Cache cleared
## 3. Test Dashboard Locally (Optional)
```bash
cd docs && npm start
# Visit: http://localhost:3000/ailang/docs/benchmarks/performance
```
- [ ] Timeline shows vX.X.X
- [ ] Success rate matches eval results
- [ ] No webpack/cache errors
## 4. Update Axiom Scorecard (if applicable)
Review axiom compliance if the release includes features that affect design axioms:
```bash
# View current scorecard
ailang axioms
# Edit scorecard JSON if needed
# File: docs/static/benchmarks/axiom_scorecard.json
```
Update if:
- A partial implementation became complete (+1 → +2)
- New feature aligns with an axiom (add evidence)
- Gaps were fixed (remove from gaps array)
Always add history entry:
```json
{
"version": "vX.X.X",
"date": "YYYY-MM-DD",
"score": 18,
"maxScore": 24,
"percentage": 75.0,
"notes": "Brief description of changes"
}
```
- [ ] Axiom scorecard reviewed
- [ ] Scores updated if applicable
- [ ] History entry added
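The history entry can be appended with `jq` instead of hand-editing. A sketch, assuming the scorecard JSON has top-level `score`/`maxScore` fields and a `history` array (inferred from the template above — verify against the real `docs/static/benchmarks/axiom_scorecard.json` first):

```bash
#!/usr/bin/env bash
# Sketch: append an axiom-scorecard history entry with jq.
# ASSUMPTION: JSON shape inferred from the template above.
set -euo pipefail
cd "$(mktemp -d)"

SCORECARD="axiom_scorecard.json"
echo '{"score": 18, "maxScore": 24, "history": []}' > "$SCORECARD"

# Plain `=` (not `+=`) so the right side is evaluated against the root
# object and can read .score/.maxScore.
jq --arg v "v0.5.6" --arg d "2025-01-15" --arg n "Brief description of changes" \
  '.history = .history + [{version: $v, date: $d,
                           score: .score, maxScore: .maxScore,
                           percentage: (.score * 100 / .maxScore),
                           notes: $n}]' \
  "$SCORECARD" > tmp.json && mv tmp.json "$SCORECARD"

jq -r '.history[-1].percentage' "$SCORECARD"   # → 75
```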
## 5. Extract Metrics for CHANGELOG
Use the automation script:
```bash
.claude/skills/post-release/scripts/extract_changelog_metrics.sh X.X.X
```
This outputs a CHANGELOG.md template with:
- Overall success rate
- AILANG-only rate
- Python baseline rate
- Gap analysis
- [ ] Metrics extracted
- [ ] CHANGELOG.md updated with benchmark results
## 6. Commit Dashboard Updates
```bash
git add docs/docs/benchmarks/performance.md docs/static/benchmarks/latest.json
git commit -m "Update benchmark dashboard for vX.X.X"
git push
```
- [ ] Files staged
- [ ] Committed
- [ ] Pushed
## 7. Verify Sprint JSON Tracking
Check that sprint state JSON files are properly completed:
```bash
# List sprint JSON files in project
ls -la .ailang/state/sprints/
# Verify sprint completion for this release
jq '.' .ailang/state/sprints/sprint_<MILESTONE>.json
```
**For each sprint JSON file related to this release:**
- [ ] Sprint status is `"completed"` (not `"in_progress"`)
- [ ] All milestones marked with `"passes": true`
- [ ] `completed` timestamp is filled out for each milestone
- [ ] `actual_loc` values are present (not 0)
- [ ] `velocity` section has final metrics calculated
- [ ] `completion_summary` section exists with:
- [ ] Total milestones count
- [ ] Key deliverable counts (tests, files, functions, etc.)
- [ ] Phase 2 message ID if applicable
- [ ] Any important metrics specific to the sprint
**If sprint JSON is incomplete:**
1. Review sprint completion documents (e.g., `M-<MILESTONE>-SPRINT-COMPLETE.md`)
2. Update sprint JSON with correct values
3. Create tracked completion summary if missing (sprint JSONs are gitignored)
**Common issues:**
- Sprint left in `"in_progress"` status
- Milestones missing completion timestamps
- `actual_loc` not calculated
- `velocity.efficiency` not computed
- `completion_summary` section missing
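The checks above can be scripted rather than eyeballed. A sketch using `jq` against a synthetic sprint file — the field names follow this checklist, but the real sprint schema may differ:

```bash
#!/usr/bin/env bash
# Sketch: automate the sprint-JSON completeness checks.
# ASSUMPTION: field names follow this checklist, not a verified schema.
set -euo pipefail
cd "$(mktemp -d)"

cat > sprint.json <<'EOF'
{
  "status": "completed",
  "milestones": [
    {"name": "M1", "passes": true, "completed": "2025-01-10", "actual_loc": 420},
    {"name": "M2", "passes": true, "completed": "2025-01-14", "actual_loc": 310}
  ],
  "velocity": {"efficiency": 1.2},
  "completion_summary": {"total_milestones": 2}
}
EOF

# jq -e sets a nonzero exit status when the boolean result is false
if jq -e '
  (.status == "completed") and
  all(.milestones[]; .passes and .completed != null and .actual_loc > 0) and
  (.velocity.efficiency != null) and
  has("completion_summary")
' sprint.json > /dev/null; then
  echo "sprint JSON complete"
else
  echo "sprint JSON incomplete"
fi
```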
## 8. Update Design Docs
- [ ] Move completed design docs to design_docs/implemented/vX_Y/
- [ ] Update design docs with what was actually implemented
- [ ] Create new design docs in design_docs/planned/ for deferred features
## 9. Update Public Documentation
- [ ] Ensure prompts/ reflects latest AILANG syntax
- [ ] Update website docs (docs/) with latest features
- [ ] Remove old references or outdated examples
- [ ] Add new examples to website if applicable
- [ ] Update docs/guides/evaluation/ with significant improvements
- [ ] **Update docs/LIMITATIONS.md**:
- [ ] Remove limitations fixed in this release
- [ ] Add new limitations discovered during development
- [ ] Update workarounds if they changed
- [ ] Update version numbers ("Since", "Fixed in" fields)
- [ ] Test examples in LIMITATIONS.md still work/fail as documented
- [ ] Commit: `git add docs/LIMITATIONS.md && git commit -m "Update LIMITATIONS.md for vX.X.X"`
## 10. Run Documentation Sync Check
Use the docs-sync skill to verify website accuracy:
```bash
# Check version constants match git tag
.claude/skills/docs-sync/scripts/check_versions.sh
# Audit design docs vs website claims
.claude/skills/docs-sync/scripts/audit_design_docs.sh
# Generate full sync report
.claude/skills/docs-sync/scripts/generate_report.sh
```
- [ ] Version constants match git tag (docs/src/constants/version.js)
- [ ] Teaching prompt references point to latest version
- [ ] Architecture pages have PLANNED banners for unimplemented features
- [ ] No unimplemented features claimed as current
- [ ] Examples referenced in website actually work
**If issues found:**
- [ ] Update version.js if stale
- [ ] Add PLANNED banners to theoretical feature pages
- [ ] Move implemented features from roadmap to current sections
- [ ] Fix broken example references
- [ ] Commit: `git add docs/ && git commit -m "docs: sync website with vX.X.X"`
## Final Verification
- [ ] Eval baseline complete for vX.X.X
- [ ] CHANGELOG.md has benchmark results (AILANG + combined)
- [ ] Website dashboard shows vX.X.X as latest
- [ ] Dashboard timeline includes vX.X.X data point
- [ ] Dashboard JSON preserves history (multiple versions)
- [ ] **Axiom scorecard reviewed and updated if applicable**
- [ ] **Axiom history entry added for this version**
- [ ] Design docs moved to implemented/
- [ ] Public docs updated
- [ ] **docs/LIMITATIONS.md updated and tested**
- [ ] **docs-sync report shows no critical issues**
- [ ] All changes committed and pushed
## Notes
- Can be run hours or days after release
- Eval baselines cost ~\$0.50-1.00 (full) or ~\$0.10-0.20 (dev)
- Dashboard requires Node.js/npm installed
- Design doc migration is manual review
```
### scripts/run_eval_baseline.sh
```bash
#!/usr/bin/env bash
# Run evaluation baseline for a release version
set -euo pipefail
# Agent benchmarks list (46 benchmarks as of v0.4.8)
# This MUST be kept in sync with benchmarks/ directory
AGENT_BENCHMARKS="adt_option,api_call_json,balanced_parens,binary_tree_sum,canonical_normalization,cli_args,config_file_parser,csv_to_json_converter,effect_composition,effect_pure_separation,effect_tracking_io_fs,error_handling,exhaustive_pattern_matching,explicit_state_threading,expression_evaluator,fizzbuzz,float_eq,fold_reduce,gcd_lcm,graph_bfs,higher_order_functions,immutable_data_structures,inline_tests,json_encode,json_parse,json_transform,lambda_calc,list_comprehension,log_file_analyzer,merge_sort,mini_interpreter,nested_records,no_runtime_crashes_option,numeric_modulo,pattern_matching_complex,pipeline,record_update,records_person,recursion_fibonacci,red_black_tree,run_length_encode,state_machine_traffic_light,symbolic_diff,tree_transformation_pipeline,type_safe_record_access,type_unify"
# Validation mode: check configuration without running full eval
if [[ "${1:-}" == "--validate" ]]; then
echo "Validating agent eval configuration..."
# Check AGENT_BENCHMARKS is defined and non-empty
if [[ -z "${AGENT_BENCHMARKS:-}" ]]; then
echo "ERROR: AGENT_BENCHMARKS not defined"
exit 1
fi
# Count benchmarks
BENCHMARK_COUNT=$(echo "$AGENT_BENCHMARKS" | tr ',' '\n' | wc -l | tr -d ' ')
echo " Benchmarks defined: $BENCHMARK_COUNT"
# Verify ailang command exists
if ! command -v ailang &> /dev/null; then
echo "ERROR: ailang command not found"
exit 1
fi
echo " ailang command: found"
# Check that benchmark files exist (spot check first benchmark)
FIRST_BENCHMARK=$(echo "$AGENT_BENCHMARKS" | cut -d',' -f1)
if [[ -f "benchmarks/${FIRST_BENCHMARK}.yml" ]]; then
echo " Benchmark files: found (checked $FIRST_BENCHMARK.yml)"
else
echo "WARNING: Benchmark file not found: benchmarks/${FIRST_BENCHMARK}.yml"
fi
echo " ✓ Agent eval configuration valid"
exit 0
fi
# Progress monitoring (runs in background)
monitor_progress() {
local results_dir="$1"
local expected_count="$2"
local phase="$3"
echo "[$phase] Monitoring progress..."
while true; do
sleep 120 # Report every 2 minutes
if [[ -d "$results_dir" ]]; then
current=$(find "$results_dir" -name "*.json" 2>/dev/null | wc -l | tr -d ' ')
percent=$((current * 100 / expected_count))
echo "[$phase] Progress: $current/$expected_count files ($percent%)"
fi
done
}
if [[ $# -eq 0 ]]; then
echo "Usage: $0 <version> [--full]" >&2
echo "Example: $0 0.3.14 --full" >&2
echo "" >&2
echo "Options:" >&2
echo " --full Run with all production models (default: dev models only)" >&2
echo " --validate Check configuration and exit without running the eval (must be first arg)" >&2
exit 1
fi
VERSION="$1"
FULL_FLAG=""
# Normalize version: ensure directory always has "v" prefix
# Strip any existing "v" prefix first, then add it
VERSION_NORMALIZED="${VERSION#v}"
RESULTS_DIR="eval_results/baselines/v$VERSION_NORMALIZED"
if [[ $# -gt 1 ]] && [[ "$2" == "--full" ]]; then
FULL_FLAG="FULL=true"
fi
echo "Running eval baseline for $VERSION..."
if [[ -n "$FULL_FLAG" ]]; then
echo "Mode: FULL (all 6 production models)"
echo "Expected cost: ~\$0.50-1.00"
echo "Expected time: ~15-20 minutes"
else
echo "Mode: DEV (3 cheap models only)"
echo "Expected cost: ~\$0.10-0.20"
echo "Expected time: ~5-10 minutes"
fi
echo
# Step 1: Run standard eval baseline (pass normalized version WITH v prefix)
echo "=== Step 1/2: Standard Eval (0-shot + repair) ==="
if [[ -n "$FULL_FLAG" ]]; then
monitor_progress "$RESULTS_DIR" 552 "Standard" & # 46 benchmarks × 6 models × 2 langs
MONITOR_PID=$!
make eval-baseline EVAL_VERSION="v$VERSION_NORMALIZED" FULL=true
kill $MONITOR_PID 2>/dev/null || true
else
monitor_progress "$RESULTS_DIR" 276 "Standard" & # 46 benchmarks × 3 models × 2 langs
MONITOR_PID=$!
make eval-baseline EVAL_VERSION="v$VERSION_NORMALIZED"
kill $MONITOR_PID 2>/dev/null || true
fi
# Step 2: Run agent eval on curated benchmarks
echo
echo "=== Step 2/2: Agent Eval (multi-turn) ==="
echo "Running agent eval on curated benchmarks..."
# AGENT_BENCHMARKS is defined at the top of the script (46 benchmarks)
# See: benchmark-manager skill for benchmark categories
echo "Benchmarks: all 46 benchmarks (see AGENT_BENCHMARKS at top of script)"
echo
# Pre-flight validation and configuration summary
echo "=== Pre-Flight Check ==="
echo "Version: $VERSION"
echo "Mode: $([[ -n "$FULL_FLAG" ]] && echo FULL || echo DEV)"
echo "Benchmarks: 46 (1 easy, ~19 medium, ~17 hard, 6 stretch)"
echo "Agent models: claude-sonnet-4-5 (Claude executor), gemini-3-flash (Gemini executor)"
echo "Agent parallelism: 2"
echo "Agent timeout: default (no override)"
echo "Prompt version: latest (auto-selected)"
echo
echo "Expected results:"
if [[ -n "$FULL_FLAG" ]]; then
echo " Standard eval: ~552 files (46 benchmarks × 6 models × 2 langs)"
echo " Agent eval: ~184 files (46 benchmarks × 2 executors × 2 langs)"
echo " Total: ~736 files"
else
echo " Standard eval: ~276 files (46 benchmarks × 3 models × 2 langs)"
echo " Agent eval: ~92 files (46 benchmarks × 2 executors × 1 lang)"
echo " Total: ~368 files"
fi
echo
echo "Starting in 3 seconds... (Ctrl-C to abort)"
sleep 3
echo
# Agent eval uses both Claude and Gemini executors (multi-executor support v0.6.0+)
# --benchmarks is required for agent mode (safety feature)
# Models: claude-sonnet-4-5 (Claude executor), gemini-3-flash (Gemini executor)
AGENT_MODELS="claude-sonnet-4-5,gemini-3-flash"
if [[ -n "$FULL_FLAG" ]]; then
# Full mode: run both models on both languages
echo "Mode: FULL (Claude Sonnet + Gemini 3 Flash)"
echo "Executors: claude, gemini"
monitor_progress "$RESULTS_DIR" 184 "Agent" &
MONITOR_PID=$!
ailang eval-suite --agent \
--models "$AGENT_MODELS" \
--benchmarks "$AGENT_BENCHMARKS" \
--langs ailang,python \
--agent-parallel 2 \
--output "$RESULTS_DIR"
kill $MONITOR_PID 2>/dev/null || true
else
# Dev mode: both executors but AILANG only (faster)
echo "Mode: DEV (Claude Sonnet + Gemini 3 Flash, AILANG only)"
echo "Executors: claude, gemini"
monitor_progress "$RESULTS_DIR" 92 "Agent" &
MONITOR_PID=$!
ailang eval-suite --agent \
--models "$AGENT_MODELS" \
--benchmarks "$AGENT_BENCHMARKS" \
--langs ailang \
--agent-parallel 2 \
--output "$RESULTS_DIR"
kill $MONITOR_PID 2>/dev/null || true
fi
# Show combined results
echo
if [[ -d "$RESULTS_DIR" ]]; then
STANDARD_COUNT=$(find "$RESULTS_DIR/standard" -name "*.json" 2>/dev/null | wc -l | tr -d ' ')
AGENT_COUNT=$(find "$RESULTS_DIR/agent" -name "*.json" 2>/dev/null | wc -l | tr -d ' ')
TOTAL_COUNT=$((STANDARD_COUNT + AGENT_COUNT))
echo "✓ Baseline complete"
echo " Results: $RESULTS_DIR"
echo " Standard eval: $STANDARD_COUNT files"
echo " Agent eval: $AGENT_COUNT files"
echo " Total: $TOTAL_COUNT files"
else
echo "✗ Results directory not found: $RESULTS_DIR" >&2
exit 1
fi
# Validation
echo
echo "=== Validation ==="
# Check file counts (46 benchmarks × models × 2 langs)
if [[ -n "$FULL_FLAG" ]]; then
EXPECTED_STANDARD=552 # 46 × 6 × 2
EXPECTED_AGENT=184 # 46 × 2 × 2
EXPECTED_TOTAL=736
else
EXPECTED_STANDARD=276 # 46 × 3 × 2
EXPECTED_AGENT=92 # 46 × 2 executors × 1 lang
EXPECTED_TOTAL=368
fi
# Allow 5% tolerance for filtering/failures
MIN_STANDARD=$((EXPECTED_STANDARD * 95 / 100))
MIN_AGENT=$((EXPECTED_AGENT * 85 / 100)) # Agent eval can have more variance
MIN_TOTAL=$((EXPECTED_TOTAL * 90 / 100))
if [[ $STANDARD_COUNT -lt $MIN_STANDARD ]]; then
echo "⚠️ WARNING: Standard eval produced fewer files than expected"
echo " Expected: ~$EXPECTED_STANDARD, Got: $STANDARD_COUNT (minimum: $MIN_STANDARD)"
fi
if [[ $AGENT_COUNT -lt $MIN_AGENT ]]; then
echo "⚠️ WARNING: Agent eval produced fewer files than expected"
echo " Expected: ~$EXPECTED_AGENT, Got: $AGENT_COUNT (minimum: $MIN_AGENT)"
fi
if [[ $TOTAL_COUNT -lt $MIN_TOTAL ]]; then
echo "⚠️ WARNING: Total file count below expected minimum"
echo " Expected: ~$EXPECTED_TOTAL, Got: $TOTAL_COUNT (minimum: $MIN_TOTAL)"
echo " This may indicate interrupted runs or configuration issues"
else
echo "✓ File counts within expected range"
fi
```
### scripts/update_dashboard.sh
```bash
#!/usr/bin/env bash
# Update website benchmark dashboard with new release data
set -euo pipefail
if [[ $# -eq 0 ]]; then
echo "Usage: $0 <version>" >&2
echo "Example: $0 0.3.14" >&2
exit 1
fi
VERSION="$1"
# Normalize version: ensure directory always has "v" prefix
# Strip any existing "v" prefix first, then add it
VERSION_NORMALIZED="${VERSION#v}"
RESULTS_DIR="eval_results/baselines/v$VERSION_NORMALIZED"
MARKDOWN_FILE="docs/docs/benchmarks/performance.md"
JSON_FILE="docs/static/benchmarks/latest.json"
# Verify results directory exists
if [[ ! -d "$RESULTS_DIR" ]]; then
echo "Error: Results directory not found: $RESULTS_DIR" >&2
echo "Run eval baseline first with run_eval_baseline.sh" >&2
exit 1
fi
echo "Updating dashboard for $VERSION..."
echo
# NOTE: We do NOT regenerate performance.md - it's a static template with React components
# The React components read data from latest.json, so we only update the JSON file
# Generate JSON with history preservation (writes to docs/static/benchmarks/latest.json automatically)
echo "1/3 Generating dashboard JSON with history..."
if ailang eval-report "$RESULTS_DIR" "$VERSION" --format=json > /dev/null; then
echo " ✓ Written to $JSON_FILE (history preserved)"
else
echo " ✗ Failed to generate JSON" >&2
exit 1
fi
echo
# Validate JSON (version in JSON matches what we passed)
echo "2/3 Validating JSON..."
if VERSION_CHECK=$(jq -r '.version' "$JSON_FILE" 2>/dev/null) && [[ "$VERSION_CHECK" == "$VERSION" ]]; then
SUCCESS_RATE=$(jq -r '.aggregates.finalSuccess' "$JSON_FILE" 2>/dev/null)
echo " ✓ Version: $VERSION_CHECK"
echo " ✓ Success rate: $SUCCESS_RATE"
else
echo " ✗ JSON validation failed (expected: $VERSION, got: $VERSION_CHECK)" >&2
exit 1
fi
echo
# Clear Docusaurus cache
echo "3/3 Clearing Docusaurus cache..."
if (cd docs && npm run clear > /dev/null 2>&1); then
echo " ✓ Cache cleared"
else
echo " ⚠ Cache clear failed (npm may not be installed)"
fi
echo
# Summary
echo "✓ Dashboard JSON updated for $VERSION"
echo " Data file: $JSON_FILE"
echo " Template: $MARKDOWN_FILE (static - not modified)"
echo
echo "Next steps:"
echo " 1. Test locally: cd docs && npm start"
echo " 2. Visit: http://localhost:3000/ailang/docs/benchmarks/performance"
echo " 3. Verify radar charts and timeline show $VERSION"
echo " 4. Commit: git add $JSON_FILE"
echo " 5. Commit: git commit -m 'Update benchmark dashboard for $VERSION'"
echo " 6. Push: git push"
echo
echo "Note: performance.md is a static template with React components."
echo " The components read data from latest.json - no need to regenerate it."
```
### scripts/extract_changelog_metrics.sh
```bash
#!/usr/bin/env bash
# Extract comprehensive benchmark metrics for CHANGELOG
# Extracts: 0-shot, final, repair effectiveness, and agent eval metrics
set -euo pipefail
VERSION="${1:-}"
if [[ -z "$VERSION" ]]; then
echo "Usage: $0 <version> [results_dir]" >&2
echo "Example: $0 0.4.1" >&2
exit 1
fi
# Normalize version: ensure directory always has "v" prefix
VERSION_NORMALIZED="${VERSION#v}"
RESULTS_DIR="${2:-eval_results/baselines/v$VERSION_NORMALIZED}"
JSON_FILE="docs/static/benchmarks/latest.json"
if [[ ! -d "$RESULTS_DIR" ]]; then
echo "Error: Results directory not found: $RESULTS_DIR" >&2
exit 1
fi
if [[ ! -f "$RESULTS_DIR/summary.jsonl" ]]; then
echo "Error: summary.jsonl not found. Run 'ailang eval-summary $RESULTS_DIR' first" >&2
exit 1
fi
if [[ ! -f "$JSON_FILE" ]]; then
echo "Error: Dashboard JSON not found: $JSON_FILE" >&2
echo "Run update_dashboard.sh first" >&2
exit 1
fi
SUMMARY="$RESULTS_DIR/summary.jsonl"
echo "Extracting metrics from $RESULTS_DIR..."
echo
# Extract standard eval metrics (AILANG only)
AILANG_TOTAL=$(jq -s 'map(select(.lang == "ailang")) | length' "$SUMMARY")
AILANG_0SHOT=$(jq -s 'map(select(.lang == "ailang" and .first_attempt_ok == true)) | length' "$SUMMARY")
AILANG_FINAL=$(jq -s 'map(select(.lang == "ailang" and .stdout_ok == true)) | length' "$SUMMARY")
# Calculate percentages
AILANG_0SHOT_PCT=$(echo "scale=1; $AILANG_0SHOT * 100 / $AILANG_TOTAL" | bc)
AILANG_FINAL_PCT=$(echo "scale=1; $AILANG_FINAL * 100 / $AILANG_TOTAL" | bc)
AILANG_REPAIR_GAIN=$(echo "scale=1; $AILANG_FINAL_PCT - $AILANG_0SHOT_PCT" | bc)
# Python metrics
PYTHON_TOTAL=$(jq -s 'map(select(.lang == "python")) | length' "$SUMMARY")
PYTHON_FINAL=$(jq -s 'map(select(.lang == "python" and .stdout_ok == true)) | length' "$SUMMARY")
PYTHON_FINAL_PCT=$(echo "scale=1; $PYTHON_FINAL * 100 / $PYTHON_TOTAL" | bc)
# Agent eval metrics
AILANG_AGENT_TOTAL=$(jq -s 'map(select(.lang == "ailang" and .eval_mode == "agent")) | length' "$SUMMARY")
AILANG_AGENT_SUCCESS=$(jq -s 'map(select(.lang == "ailang" and .eval_mode == "agent" and .stdout_ok == true)) | length' "$SUMMARY")
# bc exits 0 even on divide-by-zero, so guard the zero case explicitly
AILANG_AGENT_PCT=$([[ "$AILANG_AGENT_TOTAL" -gt 0 ]] && echo "scale=1; $AILANG_AGENT_SUCCESS * 100 / $AILANG_AGENT_TOTAL" | bc || echo "0.0")
PYTHON_AGENT_TOTAL=$(jq -s 'map(select(.lang == "python" and .eval_mode == "agent")) | length' "$SUMMARY")
PYTHON_AGENT_SUCCESS=$(jq -s 'map(select(.lang == "python" and .eval_mode == "agent" and .stdout_ok == true)) | length' "$SUMMARY")
PYTHON_AGENT_PCT=$([[ "$PYTHON_AGENT_TOTAL" -gt 0 ]] && echo "scale=1; $PYTHON_AGENT_SUCCESS * 100 / $PYTHON_AGENT_TOTAL" | bc || echo "0.0")
# Overall success rate
OVERALL_RATE=$(jq -r '.aggregates.finalSuccess' "$JSON_FILE")
TOTAL_RUNS=$(jq -r '.totalRuns' "$JSON_FILE")
OVERALL_PCT=$(echo "$OVERALL_RATE * 100" | bc -l | xargs printf "%.1f")
# Find previous version in history
PREV_VERSION=$(jq -r '.history | sort_by(.timestamp) | reverse | .[1].version // "none"' "$JSON_FILE" 2>/dev/null || echo "none")
# Try to get previous version's summary.jsonl if it exists
# Normalize previous version path too
if [[ "$PREV_VERSION" != "none" ]] && [[ "$PREV_VERSION" != "null" ]]; then
PREV_VERSION_NORMALIZED="${PREV_VERSION#v}"
PREV_RESULTS_DIR="eval_results/baselines/v$PREV_VERSION_NORMALIZED"
else
PREV_RESULTS_DIR="eval_results/baselines/$PREV_VERSION"
fi
PREV_SUMMARY="$PREV_RESULTS_DIR/summary.jsonl"
if [[ "$PREV_VERSION" != "none" ]] && [[ "$PREV_VERSION" != "null" ]] && [[ -f "$PREV_SUMMARY" ]]; then
# Extract previous metrics
PREV_AILANG_0SHOT=$(jq -s 'map(select(.lang == "ailang" and .first_attempt_ok == true)) | length' "$PREV_SUMMARY")
PREV_AILANG_FINAL=$(jq -s 'map(select(.lang == "ailang" and .stdout_ok == true)) | length' "$PREV_SUMMARY")
PREV_AILANG_TOTAL=$(jq -s 'map(select(.lang == "ailang")) | length' "$PREV_SUMMARY")
PREV_AILANG_0SHOT_PCT=$(echo "scale=1; $PREV_AILANG_0SHOT * 100 / $PREV_AILANG_TOTAL" | bc)
PREV_AILANG_FINAL_PCT=$(echo "scale=1; $PREV_AILANG_FINAL * 100 / $PREV_AILANG_TOTAL" | bc)
PREV_AILANG_REPAIR_GAIN=$(echo "scale=1; $PREV_AILANG_FINAL_PCT - $PREV_AILANG_0SHOT_PCT" | bc)
PREV_PYTHON_TOTAL=$(jq -s 'map(select(.lang == "python")) | length' "$PREV_SUMMARY")
PREV_PYTHON_FINAL=$(jq -s 'map(select(.lang == "python" and .stdout_ok == true)) | length' "$PREV_SUMMARY")
PREV_PYTHON_FINAL_PCT=$(echo "scale=1; $PREV_PYTHON_FINAL * 100 / $PREV_PYTHON_TOTAL" | bc)
# Agent eval
PREV_AILANG_AGENT_TOTAL=$(jq -s 'map(select(.lang == "ailang" and .eval_mode == "agent")) | length' "$PREV_SUMMARY")
PREV_AILANG_AGENT_SUCCESS=$(jq -s 'map(select(.lang == "ailang" and .eval_mode == "agent" and .stdout_ok == true)) | length' "$PREV_SUMMARY")
PREV_AILANG_AGENT_PCT=$([[ "$PREV_AILANG_AGENT_TOTAL" -gt 0 ]] && echo "scale=1; $PREV_AILANG_AGENT_SUCCESS * 100 / $PREV_AILANG_AGENT_TOTAL" | bc || echo "0.0")
PREV_PYTHON_AGENT_TOTAL=$(jq -s 'map(select(.lang == "python" and .eval_mode == "agent")) | length' "$PREV_SUMMARY")
PREV_PYTHON_AGENT_SUCCESS=$(jq -s 'map(select(.lang == "python" and .eval_mode == "agent" and .stdout_ok == true)) | length' "$PREV_SUMMARY")
PREV_PYTHON_AGENT_PCT=$([[ "$PREV_PYTHON_AGENT_TOTAL" -gt 0 ]] && echo "scale=1; $PREV_PYTHON_AGENT_SUCCESS * 100 / $PREV_PYTHON_AGENT_TOTAL" | bc || echo "0.0")
# Calculate changes
CHANGE_0SHOT=$(echo "scale=1; $AILANG_0SHOT_PCT - $PREV_AILANG_0SHOT_PCT" | bc)
CHANGE_FINAL=$(echo "scale=1; $AILANG_FINAL_PCT - $PREV_AILANG_FINAL_PCT" | bc)
CHANGE_REPAIR=$(echo "scale=1; $AILANG_REPAIR_GAIN - $PREV_AILANG_REPAIR_GAIN" | bc)
CHANGE_PYTHON=$(echo "scale=1; $PYTHON_FINAL_PCT - $PREV_PYTHON_FINAL_PCT" | bc)
CHANGE_AGENT_AILANG=$(echo "scale=1; $AILANG_AGENT_PCT - $PREV_AILANG_AGENT_PCT" | bc)
CHANGE_AGENT_PYTHON=$(echo "scale=1; $PYTHON_AGENT_PCT - $PREV_PYTHON_AGENT_PCT" | bc)
# Format change strings with +/- signs
[[ $(echo "$CHANGE_0SHOT > 0" | bc) -eq 1 ]] && CHANGE_0SHOT="+$CHANGE_0SHOT"
[[ $(echo "$CHANGE_FINAL > 0" | bc) -eq 1 ]] && CHANGE_FINAL="+$CHANGE_FINAL"
[[ $(echo "$CHANGE_REPAIR > 0" | bc) -eq 1 ]] && CHANGE_REPAIR="+$CHANGE_REPAIR"
[[ $(echo "$CHANGE_PYTHON > 0" | bc) -eq 1 ]] && CHANGE_PYTHON="+$CHANGE_PYTHON"
[[ $(echo "$CHANGE_AGENT_AILANG > 0" | bc) -eq 1 ]] && CHANGE_AGENT_AILANG="+$CHANGE_AGENT_AILANG"
[[ $(echo "$CHANGE_AGENT_PYTHON > 0" | bc) -eq 1 ]] && CHANGE_AGENT_PYTHON="+$CHANGE_AGENT_PYTHON"
else
PREV_VERSION="none"
PREV_AILANG_0SHOT_PCT="N/A"
PREV_AILANG_FINAL_PCT="N/A"
PREV_AILANG_REPAIR_GAIN="N/A"
PREV_PYTHON_FINAL_PCT="N/A"
PREV_AILANG_AGENT_PCT="N/A"
PREV_PYTHON_AGENT_PCT="N/A"
CHANGE_0SHOT="N/A"
CHANGE_FINAL="N/A"
CHANGE_REPAIR="N/A"
CHANGE_PYTHON="N/A"
CHANGE_AGENT_AILANG="N/A"
CHANGE_AGENT_PYTHON="N/A"
fi
# Output CHANGELOG template
echo "=== CHANGELOG.md Template ==="
echo
cat << EOF
### Benchmark Results (M-EVAL)
**Overall Performance**: ${OVERALL_PCT}% success rate ($TOTAL_RUNS total runs)
**Standard Eval (0-shot + self-repair):**
| Metric | $PREV_VERSION | $VERSION | Change |
|--------|--------|--------|--------|
| **0-shot (first attempt)** | ${PREV_AILANG_0SHOT_PCT}% | ${AILANG_0SHOT_PCT}% ($AILANG_0SHOT/$AILANG_TOTAL) | **${CHANGE_0SHOT}%** |
| **Final (with repair)** | ${PREV_AILANG_FINAL_PCT}% | ${AILANG_FINAL_PCT}% ($AILANG_FINAL/$AILANG_TOTAL) | **${CHANGE_FINAL}%** |
| **Repair effectiveness** | +${PREV_AILANG_REPAIR_GAIN}pp | +${AILANG_REPAIR_GAIN}pp | **${CHANGE_REPAIR}pp** |
| **Python (final)** | ${PREV_PYTHON_FINAL_PCT}% | ${PYTHON_FINAL_PCT}% ($PYTHON_FINAL/$PYTHON_TOTAL) | ${CHANGE_PYTHON}% |
**Agent Eval (multi-turn iterative problem solving):**
| Language | $PREV_VERSION | $VERSION | Change |
|----------|--------|--------|--------|
| **AILANG** | ${PREV_AILANG_AGENT_PCT}% | ${AILANG_AGENT_PCT}% ($AILANG_AGENT_SUCCESS/$AILANG_AGENT_TOTAL) | **${CHANGE_AGENT_AILANG}%** |
| **Python** | ${PREV_PYTHON_AGENT_PCT}% | ${PYTHON_AGENT_PCT}% ($PYTHON_AGENT_SUCCESS/$PYTHON_AGENT_TOTAL) | **${CHANGE_AGENT_PYTHON}%** |
**Key Findings:**
[Add analysis based on the data above]
EOF
echo "=== End Template ==="
echo
echo "Template generated for $VERSION (compared to $PREV_VERSION)"
```
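The metric extraction in the script above reduces to one pattern: slurp the JSONL with `jq -s`, filter, count, then divide with `bc`. A self-contained reproduction on synthetic data — the field names (`lang`, `first_attempt_ok`, `stdout_ok`) are taken from the script, the sample rows are invented:

```bash
#!/usr/bin/env bash
# Sketch: the jq-over-JSONL counting pattern from extract_changelog_metrics.sh.
set -euo pipefail
cd "$(mktemp -d)"

cat > summary.jsonl <<'EOF'
{"lang": "ailang", "first_attempt_ok": true,  "stdout_ok": true}
{"lang": "ailang", "first_attempt_ok": false, "stdout_ok": true}
{"lang": "ailang", "first_attempt_ok": false, "stdout_ok": false}
{"lang": "python", "first_attempt_ok": true,  "stdout_ok": true}
EOF

# -s slurps the line-delimited objects into one array, then filter and count
TOTAL=$(jq -s 'map(select(.lang == "ailang")) | length' summary.jsonl)
ZERO_SHOT=$(jq -s 'map(select(.lang == "ailang" and .first_attempt_ok)) | length' summary.jsonl)
FINAL=$(jq -s 'map(select(.lang == "ailang" and .stdout_ok)) | length' summary.jsonl)

echo "0-shot: $(echo "scale=1; $ZERO_SHOT * 100 / $TOTAL" | bc)%"   # 33.3%
echo "final:  $(echo "scale=1; $FINAL * 100 / $TOTAL" | bc)%"       # 66.6%
```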
### scripts/cleanup_design_docs.sh
```bash
#!/bin/bash
#
# Move design docs from planned/ to implemented/ after a release.
# Only moves docs with "Status: Implemented" in their frontmatter.
# Docs without this status are flagged for review.
#
# Features:
# - Detects duplicates (docs already in implemented/)
# - Detects misplaced docs (Target: field doesn't match folder)
# - Suggests removal of superseded docs
#
# Usage: cleanup_design_docs.sh <version> [--dry-run] [--force] [--check-only]
# Example: cleanup_design_docs.sh 0.5.7
# cleanup_design_docs.sh 0.5.7 --dry-run # Preview without changes
# cleanup_design_docs.sh 0.5.7 --force # Move all, ignore status
# cleanup_design_docs.sh 0.5.7 --check-only # Only report issues, don't move
#
set -e
VERSION="${1:-}"
DRY_RUN=false
FORCE=false
CHECK_ONLY=false
# Check for flags
for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=true ;;
--force) FORCE=true ;;
--check-only) CHECK_ONLY=true ;;
esac
done
# Remove 'v' prefix if present
VERSION="${VERSION#v}"
if [ -z "$VERSION" ]; then
echo "Usage: $0 <version> [--dry-run] [--force] [--check-only]"
echo "Example: $0 0.5.7"
echo " $0 0.5.7 --dry-run # Preview without changes"
echo " $0 0.5.7 --force # Move all docs, ignore status"
echo " $0 0.5.7 --check-only # Report issues only"
exit 1
fi
# Convert version to folder format (0.5.7 -> v0_5_7)
FOLDER_VERSION="v${VERSION//./_}"
PLANNED_DIR="design_docs/planned"
IMPLEMENTED_DIR="design_docs/implemented"
echo "Design Doc Cleanup for v${VERSION}"
echo "=================================="
echo ""
if [ "$CHECK_ONLY" = true ]; then
echo "Mode: CHECK ONLY (reporting issues, no changes)"
echo ""
elif [ "$DRY_RUN" = true ]; then
echo "Mode: DRY RUN (no changes will be made)"
echo ""
fi
if [ "$FORCE" = true ] && [ "$CHECK_ONLY" = false ]; then
echo "Mode: FORCE (moving all docs regardless of status)"
echo ""
fi
# Check if planned folder for this version exists
if [ ! -d "${PLANNED_DIR}/${FOLDER_VERSION}" ]; then
echo "No ${PLANNED_DIR}/${FOLDER_VERSION}/ folder found."
echo "Nothing to move."
exit 0
fi
# Count docs
DOC_COUNT=$(find "${PLANNED_DIR}/${FOLDER_VERSION}" -name "*.md" -type f 2>/dev/null | wc -l | tr -d ' ')
if [ "$DOC_COUNT" -eq 0 ]; then
echo "No design docs found in ${PLANNED_DIR}/${FOLDER_VERSION}/"
echo "Nothing to move."
exit 0
fi
echo "Checking ${DOC_COUNT} design doc(s) in ${PLANNED_DIR}/${FOLDER_VERSION}/:"
echo ""
# --- Phase 1: Issue Detection ---
echo "Phase 1: Detecting issues..."
echo ""
DUPLICATE_COUNT=0
MISPLACED_COUNT=0
DUPLICATE_DOCS=""
MISPLACED_DOCS=""
for doc in "${PLANNED_DIR}/${FOLDER_VERSION}"/*.md; do
if [ -f "$doc" ]; then
basename="$(basename "$doc")"
# Check for duplicates (already exists in implemented/)
if [ -f "${IMPLEMENTED_DIR}/${FOLDER_VERSION}/${basename}" ]; then
echo " [DUPLICATE] $basename"
echo " Already exists in ${IMPLEMENTED_DIR}/${FOLDER_VERSION}/"
DUPLICATE_COUNT=$((DUPLICATE_COUNT + 1))
DUPLICATE_DOCS="$DUPLICATE_DOCS$basename\n"
continue
fi
# Check for misplaced docs (Target: field doesn't match folder version)
# Handles formats: **Target**: v0.5.10, Target: v0.5.10, **Target:** 0.5.10
target_version=$(head -20 "$doc" | grep -iE "Target" | grep -oE "v?[0-9]+\.[0-9]+\.[0-9]+" | sed 's/^v//' | head -1 || echo "")
if [ -n "$target_version" ] && [ "$target_version" != "$VERSION" ]; then
target_folder="v${target_version//./_}"
echo " [MISPLACED] $basename"
echo " Target: v${target_version} (folder: ${FOLDER_VERSION})"
echo " Should be in: ${PLANNED_DIR}/${target_folder}/"
MISPLACED_COUNT=$((MISPLACED_COUNT + 1))
MISPLACED_DOCS="$MISPLACED_DOCS$basename:$target_folder\n"
fi
fi
done
if [ "$DUPLICATE_COUNT" -gt 0 ] || [ "$MISPLACED_COUNT" -gt 0 ]; then
echo ""
echo "Issues found:"
[ "$DUPLICATE_COUNT" -gt 0 ] && echo " - $DUPLICATE_COUNT duplicate(s) (can be deleted)"
[ "$MISPLACED_COUNT" -gt 0 ] && echo " - $MISPLACED_COUNT misplaced doc(s) (wrong version folder)"
echo ""
fi
# In check-only mode, stop here
if [ "$CHECK_ONLY" = true ]; then
echo "Check complete. Run without --check-only to process docs."
exit 0
fi
# --- Phase 2: Move implemented docs ---
echo "Phase 2: Processing docs..."
echo ""
MOVED_COUNT=0
NEEDS_REVIEW_COUNT=0
NEEDS_REVIEW_DOCS=""
SKIPPED_DUPLICATE=0
SKIPPED_MISPLACED=0
# Check each doc
for doc in "${PLANNED_DIR}/${FOLDER_VERSION}"/*.md; do
if [ -f "$doc" ]; then
basename="$(basename "$doc")"
# Skip duplicates (already flagged in Phase 1)
if [ -f "${IMPLEMENTED_DIR}/${FOLDER_VERSION}/${basename}" ]; then
if [ "$DRY_RUN" = true ]; then
echo " [WOULD DELETE] $basename (duplicate)"
else
rm "$doc"
echo " [DELETED] $basename (duplicate - already in implemented/)"
fi
SKIPPED_DUPLICATE=$((SKIPPED_DUPLICATE + 1))
continue
fi
# Check for misplaced docs
# Handles formats: **Target**: v0.5.10, Target: v0.5.10, **Target:** 0.5.10
target_version=$(head -20 "$doc" | grep -iE "Target" | grep -oE "v?[0-9]+\.[0-9]+\.[0-9]+" | sed 's/^v//' | head -1 || echo "")
if [ -n "$target_version" ] && [ "$target_version" != "$VERSION" ]; then
target_folder="v${target_version//./_}"
if [ "$DRY_RUN" = true ]; then
echo " [WOULD RELOCATE] $basename → ${PLANNED_DIR}/${target_folder}/"
else
mkdir -p "${PLANNED_DIR}/${target_folder}"
mv "$doc" "${PLANNED_DIR}/${target_folder}/"
echo " [RELOCATED] $basename → ${PLANNED_DIR}/${target_folder}/"
fi
SKIPPED_MISPLACED=$((SKIPPED_MISPLACED + 1))
continue
fi
# Check for "Status: Implemented" in first 15 lines (handles various formats)
is_implemented=false
if head -15 "$doc" | grep -qiE "^\*?\*?Status\*?\*?:?\s*\*?\*?Implemented"; then
is_implemented=true
fi
if [ "$FORCE" = true ] || [ "$is_implemented" = true ]; then
if [ "$DRY_RUN" = true ]; then
if [ "$is_implemented" = true ]; then
echo " [WOULD MOVE] $basename (Status: Implemented)"
else
echo " [WOULD MOVE] $basename (forced)"
fi
else
# Create implemented folder if needed
mkdir -p "${IMPLEMENTED_DIR}/${FOLDER_VERSION}"
# Move the doc
mv "$doc" "${IMPLEMENTED_DIR}/${FOLDER_VERSION}/"
if [ "$is_implemented" = true ]; then
echo " [MOVED] $basename → ${IMPLEMENTED_DIR}/${FOLDER_VERSION}/"
else
echo " [MOVED] $basename → ${IMPLEMENTED_DIR}/${FOLDER_VERSION}/ (forced)"
fi
fi
MOVED_COUNT=$((MOVED_COUNT + 1))
else
# Check what the actual status is
status_line=$(head -15 "$doc" | grep -iE "^\*?\*?Status\*?\*?:" | head -1 || echo "")
if [ -n "$status_line" ]; then
echo " [NEEDS REVIEW] $basename"
echo " Found: $status_line"
else
echo " [NEEDS REVIEW] $basename (no Status field found)"
fi
NEEDS_REVIEW_COUNT=$((NEEDS_REVIEW_COUNT + 1))
NEEDS_REVIEW_DOCS="$NEEDS_REVIEW_DOCS$basename\n"
fi
fi
done
echo ""
# Check if planned folder is now empty and remove it
if [ "$DRY_RUN" = false ] && [ $((SKIPPED_DUPLICATE + SKIPPED_MISPLACED + MOVED_COUNT)) -gt 0 ]; then
remaining=$(find "${PLANNED_DIR}/${FOLDER_VERSION}" -type f 2>/dev/null | wc -l | tr -d ' ')
if [ "$remaining" -eq 0 ]; then
rmdir "${PLANNED_DIR}/${FOLDER_VERSION}" 2>/dev/null || true
echo "Removed empty ${PLANNED_DIR}/${FOLDER_VERSION}/ folder"
echo ""
fi
fi
# Summary
echo "Summary:"
if [ "$DRY_RUN" = true ]; then
[ "$SKIPPED_DUPLICATE" -gt 0 ] && echo " Would delete: $SKIPPED_DUPLICATE duplicate(s)"
[ "$SKIPPED_MISPLACED" -gt 0 ] && echo " Would relocate: $SKIPPED_MISPLACED misplaced doc(s)"
[ "$MOVED_COUNT" -gt 0 ] && echo " Would move: $MOVED_COUNT doc(s) to implemented/"
[ "$NEEDS_REVIEW_COUNT" -gt 0 ] && echo " Needs review: $NEEDS_REVIEW_COUNT doc(s)"
echo ""
echo "Run without --dry-run to apply changes."
else
if [ "$SKIPPED_DUPLICATE" -gt 0 ]; then
echo " ✓ Deleted $SKIPPED_DUPLICATE duplicate(s)"
fi
if [ "$SKIPPED_MISPLACED" -gt 0 ]; then
echo " ✓ Relocated $SKIPPED_MISPLACED doc(s) to correct version folder"
fi
if [ "$MOVED_COUNT" -gt 0 ]; then
echo " ✓ Moved $MOVED_COUNT doc(s) to ${IMPLEMENTED_DIR}/${FOLDER_VERSION}/"
fi
if [ "$NEEDS_REVIEW_COUNT" -gt 0 ]; then
echo " ⚠ $NEEDS_REVIEW_COUNT doc(s) need review (not marked as Implemented)"
echo ""
echo "To fix docs needing review:"
echo " 1. Update the doc's Status field to 'Implemented'"
echo " 2. Re-run this script"
echo " Or use --force to move regardless of status"
fi
fi
TOTAL_CHANGES=$((SKIPPED_DUPLICATE + SKIPPED_MISPLACED + MOVED_COUNT))
if [ "$TOTAL_CHANGES" -gt 0 ] && [ "$DRY_RUN" = false ]; then
echo ""
echo "Next steps:"
echo " git add design_docs/"
echo " git commit -m 'docs: cleanup design docs for ${FOLDER_VERSION}'"
fi
```
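The `Target:` extraction pipeline in cleanup_design_docs.sh is worth exercising in isolation. A sketch that runs it against the three frontmatter formats the script's own comment mentions — the sample files are invented:

```bash
#!/usr/bin/env bash
# Sketch: test the Target-field extraction from cleanup_design_docs.sh
# against the three frontmatter formats its comment lists.
set -euo pipefail
cd "$(mktemp -d)"

# Same pipeline as the script: scan the first 20 lines, pull the first
# semver-looking token from a Target line, strip any leading "v".
extract_target() {
  head -20 "$1" | grep -iE "Target" | grep -oE "v?[0-9]+\.[0-9]+\.[0-9]+" \
    | sed 's/^v//' | head -1 || echo ""
}

printf '**Target**: v0.5.10\n' > a.md
printf 'Target: v0.5.10\n'     > b.md
printf '**Target:** 0.5.10\n'  > c.md

for f in a.md b.md c.md; do
  echo "$f -> $(extract_target "$f")"   # all three yield 0.5.10
done
```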