
recover-from-error

Provides structured recovery procedures for Claude agent execution errors, focusing on build failures, agent crashes, and session interruptions. It offers concrete bash scripts to diagnose issues and decision trees for appropriate actions.

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Updated: March 20, 2026.

Overall rating: A 7.5.

Composite score: 5.2.

Best-practice grade: B 77.6.

Install command

npx @skill-hub/cli install cowwoc-styler-recover-from-error

Tags: error-recovery, agent-workflow, build-troubleshooting, session-management.

Repository

cowwoc/styler

Skill path: .claude/skills/recover-from-error


Best for

Primary workflow: Run DevOps.

Technical facets: DevOps, Full Stack.

Target audience: Developers using Claude agents for automated task execution who need error recovery procedures.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: cowwoc.

This is still a mirrored public skill entry. Review the repository before installing into production workflows.

What it helps with

Install recover-from-error into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
Review https://github.com/cowwoc/styler before adding recover-from-error to shared team environments
Use recover-from-error for devops workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode.

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: recover-from-error
description: Diagnose task protocol errors and provide targeted recovery procedures
allowed-tools: Bash, Read, Write
---

# Recovery from Error Skill

**Purpose**: Diagnose errors during task protocol execution and provide step-by-step recovery procedures.

**Performance**: Reduces recovery time from 15+ minutes to 2-5 minutes through targeted diagnosis.

## When to Use This Skill

### Use recover-from-error When:

- Build failure occurs during VALIDATION
- Agent fails or times out during REQUIREMENTS/IMPLEMENTATION
- Session interrupted and needs to resume
- State corruption detected (lock file issues, worktree problems)
- Cannot transition to next state (prerequisites not met)

### Do NOT Use When:

- Normal protocol execution (no errors)
- User requests different approach (not an error)
- Task completed successfully

---

## Error Classification Decision Tree

**Step 1: Identify Error Category**

```
What type of error occurred?
    │
    ├─ Build/Test Failure ────────────► Section A: Build Recovery
    │
    ├─ Agent Incomplete ──────────────► Section B: Agent Recovery
    │
    ├─ Session Interrupted ───────────► Section C: Session Recovery
    │
    ├─ State Mismatch ────────────────► Section D: State Recovery
    │
    └─ Lock/Worktree Issue ───────────► Section E: Infrastructure Recovery
```

---

## Section A: Build Failure Recovery {#build-recovery}

### A1: Diagnose Build Failure Type

```bash
# Run build and capture output
./mvnw verify 2>&1 | tee /tmp/build-output.log

# Classify failure type
if grep -q "COMPILATION ERROR" /tmp/build-output.log; then
    echo "TYPE: Compilation Error → See A2"
elif grep -q "There are test failures" /tmp/build-output.log; then
    echo "TYPE: Test Failure → See A3"
elif grep -q "Checkstyle violations" /tmp/build-output.log; then
    echo "TYPE: Style Violation → See A4"
elif grep -q "PMD Failure" /tmp/build-output.log; then
    echo "TYPE: PMD Violation → See A4"
else
    echo "TYPE: Unknown → See A5"
fi
```
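
Because the build output is piped into `tee`, the pipeline's exit status is that of `tee`, not of Maven. A minimal sketch of keeping the build's own status, assuming bash (it relies on the `PIPESTATUS` array); `run_and_log` is a hypothetical helper name, not part of the skill:

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, tee its combined output to a log file,
# and return the command's own exit status instead of tee's.
run_and_log() {
    local log_file="$1"
    shift
    "$@" 2>&1 | tee "$log_file"
    # PIPESTATUS[0] holds the exit status of the first command in the pipeline.
    return "${PIPESTATUS[0]}"
}

# Example usage (assumes the Maven wrapper and log path used above):
#   if run_and_log /tmp/build-output.log ./mvnw verify; then
#       echo "Build passed - no recovery needed"
#   else
#       echo "Build failed - classify /tmp/build-output.log"
#   fi
```

With this in place, the classification only needs to run when the build actually failed.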

### A2: Compilation Error Recovery

**Quick Fixes (Main Agent Can Fix):**
- Missing imports: Add import statements
- Symbol not found (typo): Fix identifier names
- Module not exported: Update module-info.java

**Delegation Required (Re-invoke Agent):**
- Missing method implementations
- Type mismatches requiring design changes
- Interface contract violations

```bash
# Extract compilation errors
grep -A5 "COMPILATION ERROR" /tmp/build-output.log

# If simple fix (import/typo): Fix directly
# If complex (design issue): Re-invoke architect agent
```
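
The quick-fix versus delegation split above can be sketched as a small triage function. The mapping from `javac` message fragments to categories is an assumption made for illustration, and `triage_compile_error` is a hypothetical name:

```bash
# Hypothetical triage helper: map one compiler error line to the
# categories above. The message-to-category mapping is an assumption.
triage_compile_error() {
    case "$1" in
        *"cannot find symbol"*|*"does not exist"*)
            echo "quick-fix" ;;   # likely a missing import or a typo
        *"is not visible"*)
            echo "quick-fix" ;;   # likely a missing export in module-info.java
        *"incompatible types"*)
            echo "delegate" ;;    # type mismatch, may need a design change
        *"does not override abstract method"*)
            echo "delegate" ;;    # missing method implementation
        *)
            echo "unknown" ;;     # inspect manually
    esac
}

# Example usage:
#   grep "ERROR" /tmp/build-output.log | while IFS= read -r line; do
#       echo "$(triage_compile_error "$line"): $line"
#   done
```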

### A3: Test Failure Recovery

```bash
# List failing tests (assertion failures and unexpected errors)
grep -E "Tests run:.*(Failures|Errors): [1-9]" /tmp/build-output.log

# Get failure details
grep -B5 "FAILURE!" /tmp/build-output.log

# Decision:
# - Test logic wrong → Re-invoke tester agent
# - Implementation wrong → Re-invoke architect agent
# - Both → Re-invoke both agents
```
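
To decide which agent to re-invoke, it helps to know which test classes failed. A minimal sketch, assuming the per-class summary lines in the log end with `in <fully.qualified.ClassName>`; `failing_test_classes` is a hypothetical helper, and the exact line format may differ between Surefire versions:

```bash
# Hypothetical helper: print the unique test classes whose summary line
# reports at least one failure or error. Assumes each per-class summary
# line ends with " in <ClassName>".
failing_test_classes() {
    grep -E 'Tests run:.*(Failures|Errors): [1-9].* in [^ ]+$' "$1" \
        | sed -E 's/.* in ([^ ]+)$/\1/' \
        | sort -u
}

# Example usage:
#   failing_test_classes /tmp/build-output.log
```

The overall summary line has no trailing class name, so it is skipped.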

### A4: Style Violation Recovery

```bash
# Count violations (grep -c prints 0 itself when nothing matches;
# "|| true" only masks its non-zero exit status)
CHECKSTYLE_COUNT=$(grep -c "Checkstyle violation" /tmp/build-output.log || true)
PMD_COUNT=$(grep -c "PMD violation" /tmp/build-output.log || true)

echo "Checkstyle: $CHECKSTYLE_COUNT, PMD: $PMD_COUNT"

# Decision tree:
if [ $((CHECKSTYLE_COUNT + PMD_COUNT)) -le 5 ]; then
    echo "Small count - Main agent can fix directly"
else
    echo "Many violations - Re-invoke formatter agent"
fi
```

### A5: Unknown Build Failure

```bash
# Check for common issues
echo "=== Checking common causes ==="

# 1. Module resolution
grep -i "module" /tmp/build-output.log | grep -i "error"

# 2. Dependency issues
grep -i "dependency" /tmp/build-output.log | grep -i "error"

# 3. Resource issues
grep -i "resource" /tmp/build-output.log | grep -i "not found"

# If still unclear: Present full log to user
echo "Unclassified failure - manual investigation required"
head -n 100 /tmp/build-output.log
```

---

## Section B: Agent Recovery {#agent-recovery}

### B1: Diagnose Agent Status

```bash
TASK_NAME="${TASK_NAME:?TASK_NAME must be set}"  # Fail fast if the task name is unset
AGENTS=$(jq -r '.required_agents[]' "/workspace/tasks/$TASK_NAME/task.json")

echo "=== Agent Status Check ==="
for agent in $AGENTS; do
    STATUS_FILE="/workspace/tasks/$TASK_NAME/agents/$agent/status.json"
    if [ -f "$STATUS_FILE" ]; then
        STATUS=$(jq -r '.status' "$STATUS_FILE")
        DECISION=$(jq -r '.decision // "N/A"' "$STATUS_FILE")
        echo "$agent: status=$STATUS, decision=$DECISION"
    else
        echo "$agent: NO STATUS FILE (not invoked or failed early)"
    fi
done
```

### B2: Agent Missing Status File

**Cause**: Agent never started or crashed before creating status.json

**Recovery**:
```bash
# 1. Check if agent worktree exists
ls -la "/workspace/tasks/$TASK_NAME/agents/$agent/code"

# 2. If worktree exists but no status: Agent crashed mid-execution
#    → Re-invoke agent with same requirements

# 3. If no worktree: Agent was never invoked
#    → Create worktree and invoke agent
#    (-b fails if the branch already exists; in that case pass the
#     existing branch name as the last argument instead)
git worktree add "/workspace/tasks/$TASK_NAME/agents/$agent/code" -b "$TASK_NAME-$agent"
```

### B3: Agent Status = ERROR

```bash
agent="$AGENT_NAME"
ERROR_MSG=$(jq -r '.error_message // "Unknown"' "/workspace/tasks/$TASK_NAME/agents/$agent/status.json")
echo "Agent $agent ERROR: $ERROR_MSG"

# Common errors and fixes:
# - "Merge conflict" → Resolve conflicts in agent worktree, retry merge
# - "Build failed" → Fix build in agent worktree, retry
# - "Tool error" → Re-invoke with adjusted scope
# - "Timeout" → Check if work partial, resume or restart
```
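
The comment list above can be sketched as a lookup function. The action labels and the fallback for unrecognized messages are assumptions, and `recovery_action_for` is a hypothetical name:

```bash
# Hypothetical helper: map an agent error message to a recovery action.
# The four known cases mirror the list above; the fallback is an assumption.
recovery_action_for() {
    case "$1" in
        *"Merge conflict"*) echo "resolve-conflicts-then-retry-merge" ;;
        *"Build failed"*)   echo "fix-build-then-retry" ;;
        *"Tool error"*)     echo "reinvoke-with-adjusted-scope" ;;
        *"Timeout"*)        echo "check-partial-work-then-resume-or-restart" ;;
        *)                  echo "escalate-to-user" ;;
    esac
}

# Example usage:
#   echo "Suggested action: $(recovery_action_for "$ERROR_MSG")"
```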

### B4: Agent Status = WORKING (Stuck)

```bash
agent="$AGENT_NAME"
LAST_UPDATE=$(jq -r '.updated_at' "/workspace/tasks/$TASK_NAME/agents/$agent/status.json")
# date -d is GNU date syntax (not available in BSD/macOS date)
AGE_MINUTES=$(( ($(date +%s) - $(date -d "$LAST_UPDATE" +%s)) / 60 ))

if [ "$AGE_MINUTES" -gt 60 ]; then
    echo "Agent $agent stuck for $AGE_MINUTES minutes"

    # Check retry count
    RETRIES=$(jq -r '.retry_count // 0' "/workspace/tasks/$TASK_NAME/agents/$agent/status.json")
    if [ $RETRIES -ge 3 ]; then
        echo "Max retries exceeded - escalate to user"
    else
        echo "Re-invoking agent (retry $((RETRIES + 1))/3)"
        # Re-invoke via Task tool
    fi
fi
```
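
Separating the decision from the timestamp parsing makes the stuck-agent logic easier to check on its own. A minimal sketch using the same thresholds as above (60 minutes, 3 retries); `stuck_agent_action` and the `wait` outcome are assumptions added for illustration:

```bash
# Hypothetical pure function: decide what to do with a WORKING agent.
# Arguments: age of the last status update in minutes, current retry count.
stuck_agent_action() {
    local age_minutes="$1" retries="$2"
    if [ "$age_minutes" -le 60 ]; then
        echo "wait"                        # not considered stuck yet
    elif [ "$retries" -ge 3 ]; then
        echo "escalate"                    # max retries exceeded
    else
        echo "retry $((retries + 1))/3"    # re-invoke the agent
    fi
}

# Example usage:
#   stuck_agent_action "$AGE_MINUTES" "$RETRIES"
```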

---

## Section C: Session Recovery {#session-recovery}

### C1: Detect Current Session State

```bash
SESSION_ID="${SESSION_ID:?SESSION_ID must be set}"
# Find the task.json (lock file) that mentions this session
TASK_FILE=$(find /workspace/tasks -maxdepth 2 -name "task.json" -exec grep -lF -- "$SESSION_ID" {} + | head -1)

if [ -n "$TASK_FILE" ]; then
    TASK_NAME=$(basename "$(dirname "$TASK_FILE")")
    CURRENT_STATE=$(jq -r '.state' "$TASK_FILE")
    echo "Active task: $TASK_NAME (State: $CURRENT_STATE)"
else
    echo "No active task for this session"
fi
```

### C2: Resume From State

Based on current state, resume appropriately:

| State | Resume Action |
|-------|---------------|
| INIT/CLASSIFIED | Safe to restart from beginning |
| REQUIREMENTS | Check agent reports, re-invoke incomplete agents |
| SYNTHESIS | Check if plan exists, present to user if complete |
| IMPLEMENTATION | Check agent status, resume incomplete agents |
| VALIDATION | Re-run build verification |
| AWAITING_USER_APPROVAL | Re-present changes for approval |
| COMPLETE | Proceed to CLEANUP |

```bash
case "$CURRENT_STATE" in
    "SYNTHESIS")
        # Check for approval flag
        if [ -f "/workspace/tasks/$TASK_NAME/user-approved-synthesis.flag" ]; then
            echo "Plan approved - proceed to IMPLEMENTATION"
        else
            echo "Re-present plan to user for approval"
        fi
        ;;
    "AWAITING_USER_APPROVAL")
        COMMIT=$(jq -r '.checkpoint.commit_sha' "/workspace/tasks/$TASK_NAME/task.json")
        echo "Re-present changes (commit $COMMIT) for user approval"
        ;;
    *)
        echo "Resume from $CURRENT_STATE state"
        ;;
esac
```
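
The `case` statement above handles two states and falls through for the rest. A sketch that covers every row of the resume table, with the action text copied from it; `resume_action` is a hypothetical name, and rejecting unknown states is an assumption:

```bash
# Hypothetical helper: print the resume action for a protocol state,
# following the table above. Unknown states return a non-zero status.
resume_action() {
    case "$1" in
        INIT|CLASSIFIED)        echo "Safe to restart from beginning" ;;
        REQUIREMENTS)           echo "Check agent reports, re-invoke incomplete agents" ;;
        SYNTHESIS)              echo "Check if plan exists, present to user if complete" ;;
        IMPLEMENTATION)         echo "Check agent status, resume incomplete agents" ;;
        VALIDATION)             echo "Re-run build verification" ;;
        AWAITING_USER_APPROVAL) echo "Re-present changes for approval" ;;
        COMPLETE)               echo "Proceed to CLEANUP" ;;
        *)                      echo "Unknown state: $1" >&2; return 1 ;;
    esac
}

# Example usage:
#   resume_action "$CURRENT_STATE" || echo "Escalate to user"
```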

---

## Section D: State Recovery {#state-recovery}

### D1: State Mismatch Detection

```bash
TASK_NAME="${TASK_NAME:?TASK_NAME must be set}"
LOCK_STATE=$(jq -r '.state' "/workspace/tasks/$TASK_NAME/task.json")

# Check what work actually exists
HAS_REQUIREMENTS=$(ls "/workspace/tasks/$TASK_NAME"/*-requirements.md 2>/dev/null | wc -l)
HAS_APPROVAL_FLAG=$(test -f "/workspace/tasks/$TASK_NAME/user-approved-synthesis.flag" && echo 1 || echo 0)
# grep -c prints 0 itself when nothing matches; "|| true" only masks its non-zero exit status
HAS_IMPL_COMMITS=$(git -C "/workspace/tasks/$TASK_NAME/code" log --oneline -10 | grep -c "\[.*\]" || true)

echo "Lock state: $LOCK_STATE"
echo "Requirements reports: $HAS_REQUIREMENTS"
echo "Approval flag: $HAS_APPROVAL_FLAG"
echo "Implementation commits: $HAS_IMPL_COMMITS"

# Detect mismatches
if [ "$LOCK_STATE" = "INIT" ] && [ $HAS_REQUIREMENTS -gt 0 ]; then
    echo "MISMATCH: Lock says INIT but requirements exist"
fi
```

### D2: Fix State Mismatch

```bash
# Update lock state to match actual work
CORRECT_STATE="REQUIREMENTS"  # Determine correct state from evidence

# Replace the lock file only if jq succeeded, so a failed rewrite
# cannot overwrite task.json with an empty file
jq --arg state "$CORRECT_STATE" \
   --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
   '.state = $state | .transition_log += [{"type": "recovery", "to": $state, "timestamp": $ts}]' \
   "/workspace/tasks/$TASK_NAME/task.json" > /tmp/task.json.tmp \
   && mv /tmp/task.json.tmp "/workspace/tasks/$TASK_NAME/task.json" \
   && echo "State corrected to: $CORRECT_STATE"
```
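
`CORRECT_STATE` is hard-coded above. One possible way to derive it from the evidence gathered in D1 is sketched below. The precedence (implementation commits, then approval flag, then requirements reports) is an assumption rather than the protocol's rule, and `infer_state` is a hypothetical name:

```bash
# Hypothetical helper: infer the furthest state supported by the evidence.
# Arguments: requirements report count, approval flag (0/1),
# implementation commit count. The precedence order is an assumption.
infer_state() {
    local requirements="$1" approval_flag="$2" impl_commits="$3"
    if [ "$impl_commits" -gt 0 ]; then
        echo "IMPLEMENTATION"
    elif [ "$approval_flag" -eq 1 ]; then
        echo "SYNTHESIS"
    elif [ "$requirements" -gt 0 ]; then
        echo "REQUIREMENTS"
    else
        echo "INIT"
    fi
}

# Example usage with the variables from D1:
#   CORRECT_STATE=$(infer_state "$HAS_REQUIREMENTS" "$HAS_APPROVAL_FLAG" "$HAS_IMPL_COMMITS")
```

Review the inferred state before writing it to the lock file, since this heuristic cannot distinguish later states such as VALIDATION.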

---

## Section E: Infrastructure Recovery {#infrastructure-recovery}

### E1: Lock File Issues

```bash
TASK_NAME="${TASK_NAME:?TASK_NAME must be set}"
LOCK_FILE="/workspace/tasks/$TASK_NAME/task.json"

# Check if lock file exists and is valid
if [ ! -f "$LOCK_FILE" ]; then
    echo "Lock file missing"
    # If worktree exists: May be crashed session
    # Ask user whether to clean up or recover
elif ! jq empty "$LOCK_FILE" 2>/dev/null; then
    echo "Lock file corrupted (invalid JSON)"
    # Backup and recreate
    cp "$LOCK_FILE" "${LOCK_FILE}.backup"
    echo "Backed up to ${LOCK_FILE}.backup"
fi
```

### E2: Worktree Issues

```bash
TASK_NAME="${TASK_NAME:?TASK_NAME must be set}"

# Check task worktree
# (-b fails if the branch already exists; in that case pass the
#  existing branch name as the last argument instead)
if [ ! -d "/workspace/tasks/$TASK_NAME/code" ]; then
    echo "Task worktree missing - recreating"
    git worktree add "/workspace/tasks/$TASK_NAME/code" -b "$TASK_NAME"
fi

# Check agent worktrees
for agent in architect formatter tester; do
    if [ ! -d "/workspace/tasks/$TASK_NAME/agents/$agent/code" ]; then
        echo "Agent worktree missing: $agent - recreating"
        mkdir -p "/workspace/tasks/$TASK_NAME/agents/$agent"
        git worktree add "/workspace/tasks/$TASK_NAME/agents/$agent/code" -b "$TASK_NAME-$agent"
    fi
done
```

---

## Quick Reference: Error → Recovery

| Error | Section | Quick Action |
|-------|---------|--------------|
| Compilation error | A2 | Fix imports/typos or re-invoke architect |
| Test failure | A3 | Re-invoke tester or architect |
| Style violations | A4 | Fix directly (≤5) or re-invoke formatter (>5) |
| Agent missing status | B2 | Re-invoke agent |
| Agent ERROR | B3 | Check error, fix, re-invoke |
| Agent stuck | B4 | Re-invoke (max 3 retries) |
| Session interrupted | C | Resume from detected state |
| State mismatch | D | Correct lock file state |
| Lock corrupted | E1 | Backup and recreate |
| Worktree missing | E2 | Recreate worktree |

---

## Related Skills

- **state-transition**: Safe state transitions with validation
- **reinvoke-agent-fixes**: Re-invoke agents for specific fixes
- **validate-style-complete**: Three-component style validation
- **build-test-report**: Run build and capture results