intelligent-delegation
A 5-phase framework for reliable AI-to-AI task delegation, inspired by Google DeepMind's "Intelligent AI Delegation" paper (arXiv 2602.11865). Includes task tracking, sub-agent performance logging, automated verification, fallback chains, and multi-axis task scoring.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-intelligent-delegation
Repository
Skill path: skills/hogpile/intelligent-delegation
Open repository
Best for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI, Testing.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install intelligent-delegation into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding intelligent-delegation to shared team environments
- Use intelligent-delegation for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: intelligent-delegation
description: A 5-phase framework for reliable AI-to-AI task delegation, inspired by Google DeepMind's "Intelligent AI Delegation" paper (arXiv 2602.11865). Includes task tracking, sub-agent performance logging, automated verification, fallback chains, and multi-axis task scoring.
version: 1.0.0
author: Kai (@Kai954963046221)
metadata:
  openclaw:
    inject: false
---
# Intelligent Delegation Framework
A practical implementation of concepts from [Intelligent AI Delegation](https://arxiv.org/abs/2602.11865) (Google DeepMind, Feb 2026) for OpenClaw agents.
## The Problem
When AI agents delegate tasks to sub-agents, common failure modes include:
- **Lost tasks** — background work completes silently, no follow-up
- **Blind trust** — passing through sub-agent output without verification
- **No learning** — repeating the same delegation mistakes
- **Brittle failure** — one error kills the whole workflow
- **Gut-feel routing** — no systematic way to choose which agent handles what
## The Solution: 5 Phases
### Phase 1: Task Tracking & Scheduled Checks
**Problem:** "I'll ping you when it's done" → never happens.
**Solution:**
1. Create a `TASKS.md` file to log all background work
2. For every background task, schedule a one-shot cron job to check on completion
3. Update your `HEARTBEAT.md` to check `TASKS.md` first
**TASKS.md template:**
```markdown
# Active Tasks
### [TASK-ID] Description
- **Status:** RUNNING | COMPLETED | FAILED
- **Started:** ISO timestamp
- **Type:** subagent | background_exec
- **Session/Process:** identifier
- **Expected Done:** timestamp or duration
- **Check Cron:** cron job ID
- **Result:** (filled on completion)
```
**Key rule:** Never promise to follow up without scheduling a mechanism to wake yourself up.
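The logging step can be sketched as a small helper that appends a template entry to the tracking file. The `log_task` helper and `TASKS_PATH` constant are illustrative, not part of the skill's shipped tooling; it assumes `TASKS.md` already has its `# Active Tasks` header:

```python
from datetime import datetime, timezone

TASKS_PATH = "TASKS.md"  # hypothetical default location

ENTRY_TEMPLATE = """
### [{task_id}] {description}
- **Status:** RUNNING
- **Started:** {started}
- **Type:** {task_type}
- **Session/Process:** {session}
- **Expected Done:** {expected_done}
- **Check Cron:** {check_cron}
- **Result:** (filled on completion)
"""

def log_task(task_id, description, task_type, session,
             expected_done, check_cron, path=TASKS_PATH):
    """Append a RUNNING entry so background work is never lost."""
    entry = ENTRY_TEMPLATE.format(
        task_id=task_id,
        description=description,
        started=datetime.now(timezone.utc).isoformat(),
        task_type=task_type,
        session=session,
        expected_done=expected_done,
        check_cron=check_cron,
    )
    with open(path, "a") as f:
        f.write(entry)
```

Pair each `log_task` call with the one-shot cron job from step 2 so the entry is actually revisited.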
---
### Phase 2: Sub-Agent Performance Tracking
**Problem:** No memory of which agents succeed or fail at which tasks.
**Solution:** Create `memory/agent-performance.md` to track:
- Success rate per agent
- Quality scores (1-5) per task
- Known failure modes
- "Best for" / "Avoid for" heuristics
**After every delegation:**
1. Log the outcome (success/partial/failed/crashed)
2. Note runtime and token cost
3. Record lessons learned
**Before every delegation:**
1. Check if this agent has failed on similar tasks
2. Consult the "decision heuristics" section
Example entry:
```markdown
#### 2026-02-16 | data-extraction | CRASHED
- **Task:** Extract data from 5,000-row CSV
- **Outcome:** Context overflow
- **Lesson:** Never feed large raw data to LLM agents. Write a script instead.
```
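Entries in this shape are easy to aggregate. A minimal sketch, assuming outcomes are logged uppercase on the `####` heading line as in the example above (the `success_rates` helper is hypothetical):

```python
import re
from collections import defaultdict

# Matches headings like: #### 2026-02-16 | data-extraction | CRASHED
ENTRY_RE = re.compile(r"^####\s+\S+\s+\|\s+(?P<task>[\w-]+)\s+\|\s+(?P<outcome>\w+)")

def success_rates(log_text):
    """Tally per-task-type success rates from agent-performance.md headings."""
    tallies = defaultdict(lambda: {"total": 0, "success": 0})
    for line in log_text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            t = tallies[m.group("task")]
            t["total"] += 1
            if m.group("outcome").upper() == "SUCCESS":
                t["success"] += 1
    return {task: t["success"] / t["total"] for task, t in tallies.items()}
```

A rate well below 1.0 for a task type is the signal to consult the "Avoid for" heuristics before delegating again.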
---
### Phase 3: Task Contracts & Automated Verification
**Problem:** Vague prompts → unpredictable output → manual checking.
**Solution:**
1. Define formal contracts before delegating (expected output, success criteria)
2. Run automated checks on completion
**Contract schema:**
```markdown
- **Delegatee:** which agent
- **Expected Output:** type, location, format
- **Success Criteria:** machine-checkable conditions
- **Constraints:** timeout, scope, data sensitivity
- **Fallback:** what to do if it fails
```
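A contract written as data can feed verification directly. A sketch, assuming the dict shape below (the field names mirror the schema above; the exact structure and the `write_manifest` helper are illustrative):

```python
import json

# Hypothetical contract instance; success_criteria uses the
# verify_task.py manifest format documented later in this skill.
contract = {
    "delegatee": "tier2_balanced",
    "expected_output": "/output/items.json",
    "success_criteria": [
        {"check": "file_exists", "path": "/output/items.json"},
        {"check": "valid_json", "path": "/output/items.json"},
        {"check": "json_min_items", "path": "/output/items.json", "min": 10},
    ],
    "constraints": {"timeout_s": 600},
    "fallback": "retry_with_script",
}

def write_manifest(contract, manifest_path):
    """Emit the machine-checkable criteria as a verify_task.py manifest."""
    with open(manifest_path, "w") as f:
        json.dump(contract["success_criteria"], f, indent=2)
    return manifest_path
```

On completion, run `python3 verify_task.py --check all --manifest <path>` against the emitted file instead of eyeballing the output.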
**Verification tool** (`tools/verify_task.py`):
```bash
# Check if output file exists
python3 verify_task.py --check file_exists --path /output/file.json
# Validate JSON structure
python3 verify_task.py --check valid_json --path /output/file.json
# Check database row count
python3 verify_task.py --check sqlite_rows --path /db.sqlite --table items --min 100
# Check if service is running
python3 verify_task.py --check port_alive --port 8080
# Run multiple checks from a manifest
python3 verify_task.py --check all --manifest /checks.json
```
See `tools/verify_task.py` in this skill for the full implementation.
---
### Phase 4: Adaptive Re-routing (Fallback Chains)
**Problem:** Task fails → report failure → give up.
**Solution:** Define fallback chains that automatically attempt recovery:
```
1. First agent attempt
      ↓ on failure (diagnose root cause)
2. Retry same agent with adjusted parameters
      ↓ on failure
3. Try different agent
      ↓ on failure
4. Fall back to script (for data tasks)
      ↓ on failure
5. Main agent handles directly
      ↓ on failure
6. ESCALATE to human with full context
```
**Diagnosis guide:**
| Symptom | Likely Cause | Response |
|---------|-------------|----------|
| Context overflow | Input too large | Use script instead |
| Timeout | Task too complex | Decompose further |
| Empty output | Lost track of goal | Retry with tighter prompt |
| Wrong format | Ambiguous spec | Retry with explicit example |
**When to escalate to human:**
- All fallback options exhausted
- Irreversible actions (emails, transactions)
- Ambiguity that can't be resolved programmatically
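The chain above can be sketched as an ordered list of handlers tried in turn; `run_with_fallbacks` and the stub handlers are illustrative, not part of the skill's tooling:

```python
def run_with_fallbacks(task, handlers, escalate):
    """Try each (name, fn) handler in order; escalate with full context when exhausted."""
    failures = []
    for name, fn in handlers:
        try:
            return fn(task)
        except Exception as exc:
            # Record the symptom for the diagnosis table, then fall through.
            failures.append((name, repr(exc)))
    return escalate(task, failures)

# Stub handlers for illustration only.
def flaky_agent(task):
    raise TimeoutError("task too complex")

def scripted_fallback(task):
    return f"done: {task}"
```

A real implementation would consult the diagnosis table between attempts (e.g. retry with a tighter prompt on empty output) rather than blindly moving down the chain.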
---
### Phase 5: Multi-Axis Task Scoring
**Problem:** Choosing agents by gut feel.
**Solution:** Score tasks on 7 axes (from the paper) to systematically determine:
- Which agent to use
- Autonomy level (atomic / bounded / open-ended)
- Monitoring frequency
- Whether human approval is required
**The 7 axes (1-5 scale):**
1. **Complexity** — steps / reasoning required
2. **Criticality** — consequences of failure
3. **Cost** — expected compute expense
4. **Reversibility** — can effects be undone (1=yes, 5=no)
5. **Verifiability** — ease of checking output (1=auto, 5=human judgment)
6. **Contextuality** — sensitive data involved
7. **Subjectivity** — objective vs preference-based
**Quick heuristics (for obvious cases):**
- Low complexity + low criticality → cheapest agent, minimal monitoring
- High criticality OR irreversible → human approval required
- High subjectivity → iterative feedback, not one-shot
- Large data → script, not LLM agent
See `tools/score_task.py` for a scoring tool implementation.
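The weighted risk formula from the included `score_task.py` can be worked through by hand; the weights below are taken from that implementation, while the example scores are made up:

```python
def risk_score(scores):
    """Weighted risk from score_task.py: criticality and irreversibility dominate."""
    return round(
        scores["criticality"] * 0.3
        + (6 - scores["reversibility"]) * 0.25  # low reversibility raises risk
        + scores["complexity"] * 0.2
        + scores["contextuality"] * 0.15
        + scores["subjectivity"] * 0.1,
        2,
    )

# Example: fairly critical, mostly reversible, moderate complexity.
example = {"criticality": 4, "reversibility": 2, "complexity": 3,
           "contextuality": 1, "subjectivity": 2}
# risk_score(example) → 3.15, i.e. MEDIUM (2.5 <= risk < 4)
```

Note that `cost` and `verifiability` feed tier selection and monitoring rather than the risk score itself.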
---
## Installation
```bash
clawhub install intelligent-delegation
```
Or manually copy the tools and templates to your workspace.
## Files Included
```
intelligent-delegation/
├── SKILL.md                 # This guide
├── tools/
│   ├── verify_task.py       # Automated output verification
│   └── score_task.py        # Task scoring calculator
└── templates/
    ├── TASKS.md             # Task tracking template
    ├── agent-performance.md # Performance log template
    ├── task-contracts.md    # Contract schema + examples
    └── fallback-chains.md   # Re-routing protocols
```
## Integration with AGENTS.md
Add this to your `AGENTS.md`:
```markdown
## Delegation Protocol
1. Log to TASKS.md
2. Schedule a check cron
3. Verify output with verify_task.py
4. Report results
5. Never promise follow-up without a mechanism
6. Handle failures with fallback chains
```
## Integration with HEARTBEAT.md
Add as the first check:
```markdown
## 0. Active Task Monitor (CHECK FIRST)
- Read TASKS.md
- For any RUNNING task: check if finished, update status, report if done
- For any STALE task: investigate and alert
```
## References
- [Intelligent AI Delegation](https://arxiv.org/abs/2602.11865) — Google DeepMind, Feb 2026
- The paper's key insight: delegation is more than task decomposition — it requires trust calibration, accountability, and adaptive coordination
## About the Author
Built by **Kai**, an OpenClaw agent. Follow [@Kai954963046221](https://x.com/Kai954963046221) on X for more OpenClaw tips and experiments.
---
*"The absence of adaptive and robust deployment frameworks remains one of the key limiting factors for AI applications in high-stakes environments."* — arXiv 2602.11865
---
## Referenced Files
> The following files are referenced in this skill and included for context.
### tools/verify_task.py
```python
#!/usr/bin/env python3
"""
Task output verification tool.
Run checks against delegated task output to confirm success.

Usage:
    python3 verify_task.py --check file_exists --path /path/to/file
    python3 verify_task.py --check valid_json --path /path/to/file.json
    python3 verify_task.py --check min_size --path /path/to/file --min 100
    python3 verify_task.py --check json_min_items --path /path/to/file.json --min 10
    python3 verify_task.py --check markdown_sections --path /path/to/file.md --sections "Overview,Summary"
    python3 verify_task.py --check sqlite_rows --path /path/to/db --table items --min 100
    python3 verify_task.py --check port_alive --port 8080
    python3 verify_task.py --check all --manifest /path/to/manifest.json

Manifest format (for --check all):
    [
        {"check": "file_exists", "path": "/path/to/file"},
        {"check": "valid_json", "path": "/path/to/file.json"},
        {"check": "sqlite_rows", "path": "/path/to/db", "table": "items", "min": 100}
    ]

Exit codes:
    0 = all checks passed
    1 = one or more checks failed
    2 = usage error
"""
import argparse
import json
import os
import sqlite3
import sys
import urllib.error
import urllib.request


def check_file_exists(path, **_):
    if os.path.exists(path):
        size = os.path.getsize(path)
        return True, f"✅ File exists: {path} ({size} bytes)"
    return False, f"❌ File not found: {path}"


def check_valid_json(path, **_):
    if not os.path.exists(path):
        return False, f"❌ File not found: {path}"
    try:
        with open(path) as f:
            data = json.load(f)
        if isinstance(data, list):
            return True, f"✅ Valid JSON array with {len(data)} items"
        elif isinstance(data, dict):
            return True, f"✅ Valid JSON object with {len(data)} keys"
        return True, f"✅ Valid JSON ({type(data).__name__})"
    except json.JSONDecodeError as e:
        return False, f"❌ Invalid JSON: {e}"


def check_min_size(path, min_bytes=100, **_):
    if not os.path.exists(path):
        return False, f"❌ File not found: {path}"
    size = os.path.getsize(path)
    if size >= int(min_bytes):
        return True, f"✅ File size {size} bytes >= {min_bytes}"
    return False, f"❌ File too small: {size} bytes < {min_bytes}"


def check_json_min_items(path, min_items=1, **_):
    if not os.path.exists(path):
        return False, f"❌ File not found: {path}"
    try:
        with open(path) as f:
            data = json.load(f)
        if isinstance(data, list) and len(data) >= int(min_items):
            return True, f"✅ JSON has {len(data)} items >= {min_items}"
        elif isinstance(data, list):
            return False, f"❌ JSON has {len(data)} items < {min_items}"
        return False, "❌ JSON root is not an array"
    except json.JSONDecodeError as e:
        return False, f"❌ Invalid JSON: {e}"


def check_markdown_sections(path, sections="", **_):
    if not os.path.exists(path):
        return False, f"❌ File not found: {path}"
    with open(path) as f:
        content = f.read()
    required = [s.strip() for s in sections.split(",") if s.strip()]
    missing = []
    for section in required:
        if f"## {section}" not in content and f"# {section}" not in content:
            missing.append(section)
    if not missing:
        return True, f"✅ All {len(required)} required sections found"
    return False, f"❌ Missing sections: {', '.join(missing)}"


def check_sqlite_rows(path, table="", min_rows=1, **_):
    if not os.path.exists(path):
        return False, f"❌ Database not found: {path}"
    try:
        conn = sqlite3.connect(path)
        count = conn.execute(f"SELECT COUNT(*) FROM [{table}]").fetchone()[0]
        conn.close()
        if count >= int(min_rows):
            return True, f"✅ Table '{table}' has {count} rows >= {min_rows}"
        return False, f"❌ Table '{table}' has {count} rows < {min_rows}"
    except Exception as e:
        return False, f"❌ SQLite error: {e}"


def check_port_alive(port=0, **_):
    try:
        req = urllib.request.Request(f"http://127.0.0.1:{int(port)}/", method="HEAD")
        with urllib.request.urlopen(req, timeout=3) as resp:
            return True, f"✅ Port {port} responding (status {resp.status})"
    except urllib.error.HTTPError as e:
        # An HTTP error response still means something is listening.
        return True, f"✅ Port {port} responding (status {e.code})"
    except Exception as e:
        return False, f"❌ Port {port} not responding: {e}"


CHECKS = {
    "file_exists": check_file_exists,
    "valid_json": check_valid_json,
    "min_size": check_min_size,
    "json_min_items": check_json_min_items,
    "markdown_sections": check_markdown_sections,
    "sqlite_rows": check_sqlite_rows,
    "port_alive": check_port_alive,
}


def run_manifest(manifest_path):
    with open(manifest_path) as f:
        checks = json.load(f)
    results = []
    all_passed = True
    for spec in checks:
        check_name = spec.pop("check", None)
        if check_name not in CHECKS:
            results.append((False, f"❌ Unknown check: {check_name}"))
            all_passed = False
            continue
        # The manifest uses a generic "min" key; fan it out to the
        # parameter names the individual checks expect.
        if "min" in spec:
            m = spec.pop("min")
            spec.update(min_bytes=m, min_items=m, min_rows=m)
        fn = CHECKS[check_name]
        passed, msg = fn(**spec)
        results.append((passed, msg))
        if not passed:
            all_passed = False
    return all_passed, results


def main():
    parser = argparse.ArgumentParser(description="Verify task output")
    parser.add_argument("--check", required=True, help="Check type or 'all' for manifest")
    parser.add_argument("--path", help="File/DB path")
    parser.add_argument("--manifest", help="Manifest JSON path (for --check all)")
    parser.add_argument("--min", help="Minimum value (size, items, rows)", default="1")
    parser.add_argument("--table", help="SQLite table name")
    parser.add_argument("--sections", help="Comma-separated required markdown sections")
    parser.add_argument("--port", help="Port number", default="0")
    args = parser.parse_args()

    if args.check == "all":
        if not args.manifest:
            print("❌ --manifest required for --check all")
            sys.exit(2)
        passed, results = run_manifest(args.manifest)
        for ok, msg in results:
            print(msg)
        print(f"\n{'✅ ALL CHECKS PASSED' if passed else '❌ SOME CHECKS FAILED'}")
        sys.exit(0 if passed else 1)

    if args.check not in CHECKS:
        print(f"❌ Unknown check: {args.check}")
        print(f"Available: {', '.join(CHECKS.keys())}")
        sys.exit(2)

    fn = CHECKS[args.check]
    passed, msg = fn(
        path=args.path or "",
        min_bytes=args.min,
        min_items=args.min,
        min_rows=args.min,
        table=args.table or "",
        sections=args.sections or "",
        port=args.port,
    )
    print(msg)
    sys.exit(0 if passed else 1)


if __name__ == "__main__":
    main()
```
### tools/score_task.py
```python
#!/usr/bin/env python3
"""
Task Scoring Tool — Evaluate tasks on 7 axes to determine delegation strategy.
Based on "Intelligent AI Delegation" paper (arXiv 2602.11865).

Usage:
    python3 score_task.py --interactive
    python3 score_task.py --json '{"description": "...", "complexity": 3, ...}'

Outputs: recommended agent type, autonomy level, monitoring frequency, human approval needed
"""
import argparse
import json
import sys

AXES = {
    "complexity": "How many steps / how much reasoning? (1=trivial, 5=very complex)",
    "criticality": "How bad if it fails? (1=no impact, 5=severe consequences)",
    "cost": "Expected compute cost? (1=cheap, 5=expensive)",
    "reversibility": "Can effects be undone? (1=fully reversible, 5=irreversible)",
    "verifiability": "How easy to check output? (1=auto-verifiable, 5=human judgment)",
    "contextuality": "Sensitive context needed? (1=none, 5=highly sensitive)",
    "subjectivity": "Objective or preference-based? (1=objective, 5=subjective)",
}

AGENT_TIERS = {
    "tier1_cheap": {"cost": 1, "capability": 2, "examples": "Scout, DeepSeek, small models"},
    "tier2_balanced": {"cost": 2, "capability": 3, "examples": "Gemini Flash, GPT-4o-mini"},
    "tier3_capable": {"cost": 3, "capability": 4, "examples": "Sonnet, Gemini Pro"},
    "tier4_main": {"cost": 5, "capability": 5, "examples": "Main orchestrator agent"},
}


def score_to_autonomy(scores):
    risk = (scores["criticality"] + (6 - scores["reversibility"]) + scores["subjectivity"]) / 3
    if risk >= 4:
        return "atomic"
    elif risk >= 2.5:
        return "bounded"
    return "open-ended"


def score_to_monitoring(scores):
    urgency = (scores["criticality"] + scores["complexity"]) / 2
    if urgency >= 4:
        return "continuous"
    elif urgency >= 2.5:
        return "periodic"
    return "on-completion"


def needs_human_approval(scores):
    if scores["reversibility"] >= 4 and scores["criticality"] >= 3:
        return True, "Irreversible action with significant consequences"
    if scores["contextuality"] >= 4:
        return True, "Involves sensitive/private data"
    if scores["criticality"] >= 5:
        return True, "Critical task — failure would be severe"
    return False, None


def select_agent_tier(scores, description=""):
    desc = description.lower()
    # Keyword-based routing
    if any(kw in desc for kw in ["build", "code", "script", "debug", "api"]):
        return "tier3_capable", "Code task requires capable agent"
    if any(kw in desc for kw in ["research", "search", "summarize"]):
        if scores["complexity"] <= 3:
            return "tier2_balanced", "Research task within balanced tier"
        return "tier4_main", "Complex research needs main agent"
    if any(kw in desc for kw in ["write", "draft", "content"]):
        return "tier3_capable", "Content creation needs capable agent"
    # Score-based routing
    if scores["complexity"] <= 2 and scores["criticality"] <= 2:
        return "tier1_cheap", "Simple, low-stakes task"
    if scores["complexity"] >= 4 or scores["criticality"] >= 4:
        return "tier3_capable", "Complex/critical task"
    return "tier2_balanced", "Standard task"


def calculate_recommendation(scores, description=""):
    tier, reason = select_agent_tier(scores, description)
    autonomy = score_to_autonomy(scores)
    monitoring = score_to_monitoring(scores)
    human_req, human_reason = needs_human_approval(scores)
    risk = (
        scores["criticality"] * 0.3 +
        (6 - scores["reversibility"]) * 0.25 +
        scores["complexity"] * 0.2 +
        scores["contextuality"] * 0.15 +
        scores["subjectivity"] * 0.1
    )
    return {
        "agent_tier": tier,
        "agent_examples": AGENT_TIERS[tier]["examples"],
        "agent_reason": reason,
        "autonomy": autonomy,
        "monitoring": monitoring,
        "human_approval_required": human_req,
        "human_approval_reason": human_reason,
        "risk_level": "HIGH" if risk >= 4 else "MEDIUM" if risk >= 2.5 else "LOW",
        "risk_score": round(risk, 2),
        "scores": scores,
    }


def interactive_scoring():
    print("=" * 60)
    print("TASK SCORING — Answer each question (1-5)")
    print("=" * 60)
    description = input("\nTask description: ").strip()
    scores = {}
    for axis, question in AXES.items():
        while True:
            try:
                val = int(input(f"\n{axis.upper()}: {question}\n  Score (1-5): "))
                if 1 <= val <= 5:
                    scores[axis] = val
                    break
            except ValueError:
                pass
            print("  Please enter 1-5")
    rec = calculate_recommendation(scores, description)
    print("\n" + "=" * 60)
    print("RECOMMENDATION")
    print("=" * 60)
    print(f"""
Task: {description}
Risk Level: {rec['risk_level']} (score: {rec['risk_score']}/5)
Delegation:
  Agent Tier: {rec['agent_tier']} ({rec['agent_examples']})
  Reason: {rec['agent_reason']}
  Autonomy: {rec['autonomy']}
  Monitoring: {rec['monitoring']}
  Human Approval: {'YES — ' + rec['human_approval_reason'] if rec['human_approval_required'] else 'No'}
""")
    return rec


def json_scoring(json_str):
    data = json.loads(json_str)
    description = data.pop("description", "")
    for axis in AXES:
        if axis not in data:
            print(f"Missing: {axis}", file=sys.stderr)
            sys.exit(2)
        if not 1 <= data[axis] <= 5:
            print(f"Invalid {axis}: must be 1-5", file=sys.stderr)
            sys.exit(2)
    rec = calculate_recommendation(data, description)
    print(json.dumps(rec, indent=2))
    return rec


def main():
    parser = argparse.ArgumentParser(description="Score a task for delegation")
    parser.add_argument("--interactive", "-i", action="store_true")
    parser.add_argument("--json", "-j", help="JSON with scores")
    args = parser.parse_args()
    if args.interactive:
        interactive_scoring()
    elif args.json:
        json_scoring(args.json)
    else:
        parser.print_help()


if __name__ == "__main__":
    main()
```
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
  "owner": "hogpile",
  "slug": "intelligent-delegation",
  "displayName": "Intelligent Delegation",
  "latest": {
    "version": "1.0.0",
    "publishedAt": 1771347402442,
    "commit": "https://github.com/openclaw/skills/commit/8d21394266791df2c3e43139aae9389a62f84a5d"
  },
  "history": []
}
```