test-runner
MANDATORY skill for running tests and lint after EVERY code change. Focuses on adherence to just commands and running tests in parallel. If tests fail, use test-fixer skill.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install cncorp-arsenal-test-runner
Repository
Skill path: dot-claude/skills/test-runner
Open repository
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack, Testing.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: cncorp.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install test-runner into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/cncorp/arsenal before adding test-runner to shared team environments
- Use test-runner for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: test-runner
description: MANDATORY skill for running tests and lint after EVERY code change. Focuses on adherence to just commands and running tests in parallel. If tests fail, use test-fixer skill.
---

# Test Runner - MANDATORY WORKFLOW

╔══════════════════════════════════════════════════════════════════════════╗
║ 🚨 BANNED PHRASE: "All tests pass"                                       ║
║                                                                          ║
║ You CANNOT say "all tests pass" unless you:                              ║
║ 1. Run `.claude/skills/test-runner/scripts/run_tests_parallel.sh`        ║
║ 2. Check ALL log files (mocked + e2e-live + smoke)                       ║
║ 3. Verify ZERO failures across all suites                                ║
║                                                                          ║
║ `just test-all-mocked` = "quick tests pass" (NOT "all tests pass")       ║
╚══════════════════════════════════════════════════════════════════════════╝

## 📊 Claim Language Must Match Command

**Your claim MUST match what you actually ran:**

| Command Run | ✅ Allowed Claims | ❌ BANNED Claims |
|-------------|-------------------|------------------|
| `just test-unit` | "unit tests pass" | "tests pass", "all tests pass" |
| `just test-integration` | "integration tests pass" | "tests pass", "all tests pass" |
| `just test-all-mocked` | "quick tests pass", "mocked tests pass" | "tests pass", "all tests pass" |
| `run_tests_parallel.sh` + verified logs | "all tests pass" | - |

**Examples:**

❌ WRONG: "I ran `just test-all-mocked`, all tests pass"
✅ RIGHT: "Quick tests pass (483 unit + 198 integration + 49 e2e_mocked)"

❌ WRONG: "Tests are passing" (after only running mocked tests)
✅ RIGHT: "Mocked tests pass (730 tests)"

❌ WRONG: "All tests pass" (without running parallel script)
✅ RIGHT: *Runs parallel script* → *Checks logs* → "All tests pass"

**The phrase "all tests" is RESERVED for the full parallel suite. No exceptions.**

## 🚨 CRITICAL FOR TEST WRITING

- **BEFORE writing tests** → Use test-writer skill (MANDATORY - analyzes code type, dependencies, contract)
- **AFTER writing tests** → Invoke pytest-test-reviewer agent (validates patterns)
- **YOU CANNOT WRITE TESTS WITHOUT test-writer SKILL** - No exceptions, no shortcuts, every test, every time

## 🔥 CRITICAL: This Skill Is Not Optional

**After EVERY code change, you MUST follow this workflow.** No exceptions. No shortcuts. No "it's a small change" excuses.

## ⚠️ FUNDAMENTAL HYGIENE: Only Commit Code That Passes Tests

**CRITICAL WORKFLOW PRINCIPLE:** We only commit code that passes tests. This means:

**If tests fail after your changes → YOUR changes broke them (until proven otherwise)**

### The Stash/Pop Verification Protocol

**NEVER claim test failures are "unrelated" or "pre-existing" without proof.**

**To verify a failure is truly unrelated:**

```bash
# 1. Remove your changes temporarily
git stash

# 2. Run the failing test suite
just test-all-mocked  # Or whichever suite failed

# 3. Observe the result:
#    - If tests PASS → YOUR changes broke them (fix your code)
#    - If tests FAIL → pre-existing issue (rare on main/merge base)

# 4. Restore your changes
git stash pop
```

**Why This Matters:**

- Tests on `main` branch ALWAYS pass (CI enforces this)
- Tests at your merge base ALWAYS pass (they passed to get into main)
- Therefore: test failures after your changes = your changes broke them
- The stash/pop protocol is the ONLY way to prove otherwise

**DO NOT:**
- ❌ Assume failures are unrelated
- ❌ Say "that test was already broken"
- ❌ Claim "it's just a flaky test" without verification
- ❌ Skip investigation because "it's not my area"

**ALWAYS:**
- ✅ Stash changes first
- ✅ Verify tests pass without your changes
- ✅ Only then claim pre-existing issue (if true)
- ✅ Otherwise: use test-fixer skill to diagnose and fix

**If tests fail after your changes:**
- DO NOT guess at fixes
- DO NOT investigate manually
- ✅ **Use the test-fixer skill** - it will systematically investigate, identify root cause, and iterate on fixes until tests pass

## ⚠️ Always Use `just` Commands

**Direct `pytest` is removed from blocklist** - but ALWAYS prefer `just` commands, which handle Docker, migrations, and environment setup. Pass pytest args in quotes: `just test-unit "path/to/test.py::test_name -vv"`

## MANDATORY WORKFLOW: Every Code Change

**After making ANY code change:**

### Step 0: ALWAYS Run Lint-and-Fix (Auto-fix + Type Checking)

```bash
cd api && just lint-and-fix
```

**This command:**
1. Auto-fixes formatting/lint issues (runs `just ruff` internally)
2. Verifies all issues are resolved
3. Runs mypy type checking

**YOU MUST:**
- ✅ Run this command and see the output
- ✅ Verify output shows "✅ All linting checks passed!"
- ✅ If failures occur: Fix them IMMEDIATELY before continuing
- ✅ NEVER skip this step, even for "tiny" changes

**NEVER say "linting passed" unless you:**
- Actually ran the command
- Saw the actual output
- Confirmed it shows success

### Step 1: ALWAYS Run Quick Tests (Development Cycle)

```bash
cd api && just test-all-mocked
```

**⚠️ CRITICAL: This is NOT all tests! This is for rapid development iteration.**

**YOU MUST:**
- ✅ Run this command and see the output
- ✅ Verify output shows "X passed in Y.Ys" or a similar success message
- ✅ If failures occur: Fix them IMMEDIATELY before continuing
- ✅ Read the actual test output - don't assume

**This runs ONLY:**
- Unit tests (SQLite + FakeRedis)
- Integration tests (SQLite + FakeRedis)
- E2E mocked tests (PostgreSQL + Redis + mock APIs)

**This DOES NOT run:**
- ❌ E2E live tests (real OpenAI/Langfuse APIs)
- ❌ Smoke tests (full Docker stack)

**Takes ~20 seconds. Use for rapid iteration.**

### Step 2: ALWAYS Run ALL Tests Before Saying "All Tests Pass"

```bash
.claude/skills/test-runner/scripts/run_tests_parallel.sh
```

**🚨 CRITICAL: You can NEVER say "all tests pass" or "tests are passing" without running THIS command.**

**This command runs EVERYTHING:**
- ✅ All mocked tests (unit + integration + e2e_mocked)
- ✅ E2E live tests (real OpenAI/Langfuse APIs)
- ✅ Smoke tests (full Docker stack)

**After running, CHECK THE RESULTS:**

```bash
# Check for any failures
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log

# View summary
for log in api/tmp/test-logs/test-*_*.log; do
  echo "=== $(basename $log) ==="
  grep -E "passed|failed" "$log" | tail -1
done
```

**Takes ~5 minutes. MANDATORY before saying "all tests pass".**

## Primary Commands (Reference)

### Frequent: Mocked Tests (~20s)

```bash
cd api && just test-all-mocked
```

Runs unit + integration + e2e_mocked in parallel. No real APIs. Use frequently during development.

### Exhaustive: All Suites in Parallel (~5 mins)

```bash
.claude/skills/test-runner/scripts/run_tests_parallel.sh
```

Runs ALL suites in background (mocked, e2e-live, smoke). Logs to `api/tmp/test-logs/`.
**Check results after completion:**

```bash
# Check for failures
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log | tail -20
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log | tail -20
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log | tail -20

# Summary
for log in api/tmp/test-logs/test-*_*.log; do
  echo "=== $(basename $log) ==="
  grep -E "passed|failed" "$log" | tail -1
done
```

## 🚨 VIOLATIONS: What NOT To Do

**These are VIOLATIONS of this skill:**

❌ **CRITICAL: Claiming test failures are "unrelated" to your changes**
- WRONG: "The smoke test failure is unrelated to our changes"
- WRONG: "That test was already failing"
- WRONG: "This failure is just a flaky test"
- RIGHT: **Use test-fixer skill to systematically investigate and fix**

**FUNDAMENTAL RULE: Tests ALWAYS pass on main/merge base. If a test fails after your changes, YOUR changes broke it.**

**When tests fail:**
- ✅ **Use test-fixer skill** - it will investigate, identify root cause, and iterate on fixes
- ❌ DO NOT manually investigate
- ❌ DO NOT guess at fixes
- ❌ DO NOT claim "unrelated" without proof

**NEVER assume. ALWAYS use test-fixer skill.**

❌ **CRITICAL: Saying "all tests pass" without running the full suite**
- WRONG: "I ran `just test-all-mocked`, all tests pass"
- WRONG: "Tests are passing" (after only running mocked tests)
- WRONG: "All tests pass" (without running the parallel script)
- RIGHT: *Runs `.claude/skills/test-runner/scripts/run_tests_parallel.sh`* → *Checks all logs* → "All tests pass"

**The phrase "all tests" requires THE FULL SUITE.**
- `just test-all-mocked` = "quick tests pass" or "mocked tests pass"
- Parallel script = "all tests pass"

❌ **Claiming tests pass without showing output (LYING)**
- WRONG: "all 464 tests passed" (WHERE is the pytest output?)
- WRONG: "just ran them and tests pass" (WHERE is the output?)
- WRONG: "I fixed the bug, tests should pass"
- WRONG: "Yes - want me to run them again?" (DEFLECTION)
- RIGHT: *Runs `just test-all-mocked` and shows "===== X passed in Y.YYs ====="*

**🚨 THE RULE: If you can't see "X passed in Y.YYs" in your context, you're lying about tests passing.**

❌ **Skipping linting "because it's a small change"**
- WRONG: "It's just 3 lines, lint isn't needed"
- RIGHT: *Runs `just lint-and-fix` ALWAYS, regardless of change size*

❌ **Assuming tests pass without verification**
- WRONG: "The change is simple, tests will pass"
- RIGHT: *Runs tests and confirms actual output shows success*

❌ **Not reading the actual test output**
- WRONG: "Command completed, so tests passed"
- RIGHT: *Reads output, sees "15 passed in 18.2s"*

❌ **Batching multiple changes before testing**
- WRONG: *Makes 5 changes, then tests once*
- RIGHT: *Make change → test → make change → test*

## ⚡ When to Use This Skill

**ALWAYS. Use this skill:**
- After EVERY code modification
- After ANY file edit
- After fixing ANY bug
- After adding ANY feature
- After refactoring ANYTHING

**The only acceptable time to skip this skill:**
- Never. There is no acceptable time.

## Development Workflow

### Simple Changes (Quick Iteration)

1. Make change
2. Run `just lint-and-fix` (auto-fix + type checking)
3. Run `just test-all-mocked` (quick tests)
4. **DONE for iteration** (but cannot say "all tests pass" yet)

### Before Marking Task Complete

1. Run `.claude/skills/test-runner/scripts/run_tests_parallel.sh`
2. Check all logs for failures
3. **ONLY NOW** can you say "all tests pass"

### Complex Changes (Multiple Files/Features)

1. Make a logical change
2. **Stage it:** `git add <files>`
3. Run `just lint-and-fix`
4. Run `just test-all-mocked`
5. Repeat steps 1-4 for each logical chunk
6. **At the end, MANDATORY:** Run `.claude/skills/test-runner/scripts/run_tests_parallel.sh`
7. Check all logs
8. **ONLY NOW** can you say "all tests pass"

This workflow ensures you catch issues early and don't accumulate breaking changes.
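The stash/pop verification protocol described above can be rehearsed end to end in a throwaway repository before you rely on it in a real one. This is an illustrative sketch only: the scratch repo, `module.txt`, and the `cat` stand-in are all hypothetical, and in real use step 2 runs the failing suite (e.g. `just test-all-mocked`) instead of printing a file.

```shell
#!/usr/bin/env sh
# Rehearse the stash/pop mechanics in a disposable repo (hypothetical names).
set -eu
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git config user.email dev@example.com
git config user.name dev

# Known-good state, standing in for main / your merge base:
echo "baseline" > module.txt
git add module.txt
git commit -q -m "known-good state"

# Uncommitted work under suspicion:
echo "my risky change" >> module.txt

git stash push -q     # 1. remove your changes temporarily
cat module.txt        # 2. "run the suite" against the clean tree (prints only: baseline)
git stash pop -q      # 4. restore your changes
cat module.txt        # both lines are back
```

The point of the rehearsal: after `git stash push`, the tree is exactly the known-good state, so any test failure observed there is genuinely pre-existing; `git stash pop` then restores your work untouched.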
**Remember:**
- **Quick iteration:** lint → test-all-mocked (Steps 0-1)
- **Task complete:** Run parallel script, check logs (Step 2)
- **Never say "all tests pass" without Step 2**

## Individual Test Suites

```bash
# Unit tests (SQLite, FakeRedis) - fastest, run most frequently
cd api && just test-unit

# Integration tests (SQLite, FakeRedis)
cd api && just test-integration

# E2E mocked (PostgreSQL, Redis, mock APIs)
cd api && just test-e2e

# E2E live (real OpenAI/Langfuse)
cd api && just test-e2e-live

# Smoke tests (full stack with Docker)
cd api && just test-smoke
```

## Running Specific Tests

```bash
# Specific test:
just test-unit "path/to/test.py::test_name -vv"

# Keyword filter:
just test-unit "-k test_message"

# With markers:
just test-e2e "-m 'not slow'"
```

## When to Use

- **ALWAYS:** Run `just lint-and-fix` → `just test-all-mocked` after every code change
- User asks to run tests
- Validating code changes
- After modifying code
- Debugging test failures

**Every change (Steps 0-1):**
- Run `just lint-and-fix` (auto-fix + type checking)
- Run `just test-all-mocked` (quick tests)

**Before saying "all tests pass" (Step 2):**
- Run `.claude/skills/test-runner/scripts/run_tests_parallel.sh`
- Check all logs for failures
- Verify ALL suites passed

**Terminology:**
- "Quick tests pass" = `just test-all-mocked` passed
- "Mocked tests pass" = `just test-all-mocked` passed
- "All tests pass" = parallel script passed (ONLY after running it)

## Interpreting Results

**Success:** `====== X passed in Y.Ys ======`
**Failure:** `FAILED tests/path/test.py::test_name - AssertionError`

## Troubleshooting

```bash
# Smoke test failures - check Docker logs
docker compose logs --since 15m | grep -iE -B 10 -A 10 "error|fail|exception"

# Kill hung tests
pkill -f pytest

# Docker not running (smoke tests)
docker compose up -d
```

## Quick Reference

```bash
# 🔥 Step 0: ALWAYS run lint-and-fix (auto-fix + type checking)
cd api && just lint-and-fix

# 🔥 Step 1: ALWAYS run quick tests (development)
cd api && just test-all-mocked
# ^ This is NOT "all tests" - only say "quick tests pass" or "mocked tests pass"

# 🔥 Step 2: Run ALL tests before saying "all tests pass" (MANDATORY before task complete)
.claude/skills/test-runner/scripts/run_tests_parallel.sh

# Check results (MUST do this before saying "all tests pass")
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log

# View summary
for log in api/tmp/test-logs/test-*_*.log; do
  echo "=== $(basename $log) ==="
  grep -E "passed|failed" "$log" | tail -1
done

# Monitor parallel tests while running
tail -f api/tmp/test-logs/test-*.log

# 🚨 REMEMBER:
# - Steps 0-1 = quick iteration (DO NOT say "all tests pass")
# - Step 2 = full validation (ONLY NOW can say "all tests pass")
```

---

## 🔐 Before Responding About Test Results

**You ran tests. Now before you respond to the user:**

1. ✅ Did you use correct claim language? (check table above)
2. ✅ Did you show actual test output?
3. ✅ Are you about to say "all tests pass"? → Did you run parallel script?

**CRITICAL: Get manager approval before responding.**

Your response about test results MUST include: `✅ approve_7f3d8a2e********`

Get the full approval token from `.claude/skills/manager-review/SKILL.md`

**Common Mistake #1 in the manager's table is claiming "all tests pass" after only running mocked tests. The manager WILL catch this and reject your response.**
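The per-suite grep checks that this skill repeats in several places can be wrapped in one small helper. This is a sketch, not part of the packaged skill: `check_logs`, the sample directory, and the fabricated log contents are hypothetical, and a real run would point it at `api/tmp/test-logs/` following the layout shown above.

```shell
#!/usr/bin/env sh
# Hypothetical log checker, demonstrated against fabricated sample logs.
set -eu

logs=$(mktemp -d)
# Fabricated suite output standing in for api/tmp/test-logs/:
printf '483 passed in 19.8s\n' > "$logs/test-mocked_001.log"
printf '12 passed in 240.1s\n' > "$logs/test-e2e-live_001.log"
printf 'FAILED tests/smoke/test_boot.py::test_up - AssertionError\n1 failed in 301.2s\n' \
  > "$logs/test-smoke_001.log"

check_logs() {
  dir=$1
  status=0
  for log in "$dir"/test-*_*.log; do
    echo "=== $(basename "$log") ==="
    grep -E "passed|failed" "$log" | tail -1      # per-suite summary line
    if grep -qE "failed|ERROR|FAILED" "$log"; then
      status=1                                    # any hit blocks the claim
    fi
  done
  if [ "$status" -eq 0 ]; then
    echo "ALL SUITES PASSED"
  else
    echo "FAILURES FOUND - do not claim all tests pass"
  fi
  return "$status"
}

check_logs "$logs" || true   # sample smoke log fails, so this reports failures
```

Because the helper's exit status is nonzero whenever any suite log shows a failure, it can gate a commit or a "task complete" step directly, rather than relying on eyeballing three separate grep runs.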