model-council
Multi-model consensus system: send a query to 3+ different LLMs via OpenRouter simultaneously, then a judge model evaluates all responses and produces a winner, reasoning, and synthesized best answer. Like having a board of AI advisors. Use for important decisions, code review, research verification.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install openclaw-skills-model-council
Repository
Skill path: skills/aiwithabidi/model-council
Best for
Primary workflow: Research & Ops.
Technical facets: Full Stack, Data / AI, Testing.
Target audience: everyone.
License: MIT.
Original source
Catalog source: SkillHub Club.
Repository owner: openclaw.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install model-council into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/openclaw/skills before adding model-council to shared team environments
- Use model-council for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: model-council
description: Multi-model consensus system: send a query to 3+ different LLMs via OpenRouter simultaneously, then a judge model evaluates all responses and produces a winner, reasoning, and synthesized best answer. Like having a board of AI advisors. Use for important decisions, code review, research verification.
homepage: https://www.agxntsix.ai
license: MIT
compatibility: Python 3.10+, OpenRouter API key
metadata: {"openclaw": {"emoji": "\ud83c\udfdb\ufe0f", "requires": {"env": ["OPENROUTER_API_KEY"]}, "primaryEnv": "OPENROUTER_API_KEY", "homepage": "https://www.agxntsix.ai"}}
---
# Model Council 🏛️
**Get consensus from multiple AI models on any question.**
Send your query to 3+ different LLMs simultaneously via OpenRouter. A judge model evaluates all responses and produces a winner, reasoning, and synthesized best answer.
## When to Use
- **Important decisions**: Don't trust one model's opinion
- **Code review**: Get multiple perspectives on architecture choices
- **Research verification**: Cross-check facts across models
- **Creative work**: Compare writing styles and pick the best
- **Debugging**: When one model is stuck, others might see the issue
## How It Works
```
Your Question
 ├── Claude Sonnet 4 ──→ Response A
 ├── GPT-4o ──→ Response B
 └── Gemini 2.0 Flash ──→ Response C
          ↓
Judge (Opus) evaluates all
          ↓
 ├── Winner + Reasoning
 ├── Synthesized Best Answer
 └── Cost Breakdown
```
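The flow above can be sketched without any network calls. In this toy version the council members are stand-in functions rather than OpenRouter calls, and the judge simply picks the longest answer; that criterion, the function names, and the sample answers are all illustrative, not the skill's actual rubric:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in council members; the real skill sends the question to LLMs via OpenRouter.
def member_a(question): return "Use Postgres with TimescaleDB."
def member_b(question): return "ClickHouse handles real-time analytics well."
def member_c(question): return "Consider DuckDB."

def convene(question, members):
    # Fan the question out to every member in parallel, as the script does.
    with ThreadPoolExecutor(max_workers=len(members)) as pool:
        answers = list(pool.map(lambda m: m(question), members))
    # Toy judge: declare the longest response the "winner".
    winner = max(range(len(answers)), key=lambda i: len(answers[i]))
    return {"answers": answers, "winner": winner}

result = convene("Best analytics DB?", [member_a, member_b, member_c])
```

The real judge is itself a model call with an evaluation prompt; only the fan-out/aggregate shape is the same here.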
## Quick Start
```bash
# Basic usage
python3 {baseDir}/scripts/model_council.py "What's the best database for a real-time analytics dashboard?"
# Custom models
python3 {baseDir}/scripts/model_council.py --models "anthropic/claude-sonnet-4,openai/gpt-4o,google/gemini-2.5-pro" "Your question"
# Custom judge
python3 {baseDir}/scripts/model_council.py --judge "openai/gpt-4o" "Your question"
# JSON output
python3 {baseDir}/scripts/model_council.py --json "Your question"
# Set max tokens per response
python3 {baseDir}/scripts/model_council.py --max-tokens 2000 "Your question"
```
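With `--json`, the script prints a single object whose fields (`verdict`, `responses`, `total_cost`, per-response `cost` and `error`) are assembled in its `main()`. A hypothetical downstream consumer might pull out the essentials like this; the sample payload below is trimmed and its values are made up:

```python
import json

# Illustrative sample of the script's --json output, reduced to the fields read here.
raw = """{
  "question": "Best DB?",
  "verdict": {"winner": "openai/gpt-4o", "reasoning": "...", "synthesized": "..."},
  "responses": [
    {"model": "openai/gpt-4o", "cost": 0.0038, "error": null},
    {"model": "anthropic/claude-sonnet-4", "cost": 0.0043, "error": null}
  ],
  "total_cost": 0.0206
}"""

data = json.loads(raw)
winner = data["verdict"]["winner"]          # judge's pick
spent = data["total_cost"]                  # council + judge cost in USD
failed = [r["model"] for r in data["responses"] if r["error"]]  # models that errored
```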
## Configuration
| Flag | Default | Description |
|------|---------|-------------|
| `--models` | claude-sonnet-4, gpt-4o, gemini-2.0-flash-001 | Comma-separated model list |
| `--judge` | anthropic/claude-opus-4-6 | Judge model |
| `--max-tokens` | 1024 | Max tokens per council member |
| `--json` | false | Output as JSON |
| `--timeout` | 60 | Timeout per model (seconds) |
## Environment
Requires `OPENROUTER_API_KEY` environment variable.
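The script exits immediately if the key is absent, so callers may want to check up front. A minimal guard, mirroring the script's own `get_api_key` (the helper name here is ours):

```python
import os

def have_openrouter_key() -> bool:
    # The skill reads only OPENROUTER_API_KEY from the environment.
    return bool(os.environ.get("OPENROUTER_API_KEY"))

ok = have_openrouter_key()
```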
## Output Example
```
═══ MODEL COUNCIL RESULTS ═══
Question: What's the best way to handle auth in a microservices architecture?
── Council Member Responses ──
🤖 anthropic/claude-sonnet-4 ($0.0043)
Use a centralized auth service with JWT tokens...
🤖 openai/gpt-4o ($0.0038)
Implement OAuth 2.0 with an API gateway...
🤖 google/gemini-2.0-flash-001 ($0.0012)
Consider using service mesh with mTLS...
── Judge Verdict (anthropic/claude-opus-4-6, $0.0125) ──
🏆 Winner: anthropic/claude-sonnet-4
Reasoning: Most comprehensive and practical approach...
📋 Synthesized Answer:
The best approach combines elements from all three...
💰 Total Cost: $0.0218
```
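The judge's verdict is plain text in a fixed `WINNER:` / `REASONING:` / `SYNTHESIZED ANSWER:` layout, which the bundled `parse_judge_verdict` walks line by line. A condensed sketch of that parsing (a simplified re-implementation, not the script's exact function):

```python
def parse_verdict(text):
    # Pull the three labelled sections out of the judge's structured reply.
    verdict = {"winner": "", "reasoning": "", "synthesized": ""}
    current = None
    for line in text.splitlines():
        upper = line.strip().upper()
        if upper.startswith("WINNER:"):
            verdict["winner"] = line.split(":", 1)[1].strip()
            current = None
        elif upper.startswith("REASONING:"):
            verdict["reasoning"] = line.split(":", 1)[1].strip()
            current = "reasoning"
        elif upper.startswith("SYNTHESIZED ANSWER:"):
            verdict["synthesized"] = line.split(":", 1)[1].strip()
            current = "synthesized"
        elif current and line.strip():
            # Continuation lines belong to the most recent section.
            verdict[current] += " " + line.strip()
    return verdict

sample = ("WINNER: anthropic/claude-sonnet-4\n"
          "REASONING: Most practical.\n"
          "SYNTHESIZED ANSWER: Combine all three.")
v = parse_verdict(sample)
```

Because the format is prompt-enforced rather than guaranteed, the parser tolerates missing sections by leaving them as empty strings.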
## Credits
Built by [M. Abidi](https://www.linkedin.com/in/mohammad-ali-abidi) | [agxntsix.ai](https://www.agxntsix.ai)
[YouTube](https://youtube.com/@aiwithabidi) | [GitHub](https://github.com/aiwithabidi)
Part of the **AgxntSix Skill Suite** for OpenClaw agents.
**Need help setting up OpenClaw for your business?** [Book a free consultation](https://cal.com/agxntsix/abidi-openclaw)
---
## Skill Companion Files
> Additional files collected from the skill directory layout.
### _meta.json
```json
{
"owner": "aiwithabidi",
"slug": "model-council",
"displayName": "Model Council",
"latest": {
"version": "1.0.0",
"publishedAt": 1772736754164,
"commit": "https://github.com/openclaw/skills/commit/f0abdf286c7b5b6cead3fd97c8d6a40155a1bd9b"
},
"history": []
}
```
### scripts/model_council.py
```python
#!/usr/bin/env python3
"""Model Council - Multi-model consensus system via OpenRouter."""
import argparse
import json
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"
DEFAULT_MODELS = [
    "anthropic/claude-sonnet-4",
    "openai/gpt-4o",
    "google/gemini-2.0-flash-001",
]
DEFAULT_JUDGE = "anthropic/claude-opus-4-6"


def get_api_key():
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        print("ERROR: OPENROUTER_API_KEY environment variable not set.", file=sys.stderr)
        sys.exit(1)
    return key


def call_model(api_key, model, prompt, max_tokens=1024, timeout=60):
    """Call a single model via OpenRouter. Returns dict with response, cost, timing."""
    start = time.time()
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }).encode()
    req = Request(
        OPENROUTER_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "HTTP-Referer": "https://agxntsix.ai",
            "X-Title": "Model Council",
        },
    )
    try:
        with urlopen(req, timeout=timeout) as resp:
            data = json.loads(resp.read().decode())
    except HTTPError as e:
        body = e.read().decode() if e.fp else ""
        return {
            "model": model,
            "response": None,
            "error": f"HTTP {e.code}: {body[:200]}",
            "cost": 0,
            "duration": time.time() - start,
        }
    except (URLError, TimeoutError) as e:
        return {
            "model": model,
            "response": None,
            "error": str(e),
            "cost": 0,
            "duration": time.time() - start,
        }
    content = ""
    if data.get("choices"):
        content = data["choices"][0].get("message", {}).get("content", "")
    usage = data.get("usage", {})
    prompt_tokens = usage.get("prompt_tokens", 0)
    completion_tokens = usage.get("completion_tokens", 0)
    # OpenRouter may report the cost directly in the usage block.
    total_cost = usage.get("cost", 0)
    # If cost is not in the response, query the generation stats endpoint.
    if not total_cost and "id" in data:
        try:
            gen_req = Request(
                f"https://openrouter.ai/api/v1/generation?id={data['id']}",
                headers={"Authorization": f"Bearer {api_key}"},
            )
            with urlopen(gen_req, timeout=10) as gen_resp:
                gen_data = json.loads(gen_resp.read().decode())
            total_cost = gen_data.get("data", {}).get("total_cost", 0)
        except Exception:
            total_cost = 0
    return {
        "model": model,
        "response": content,
        "error": None,
        "cost": total_cost or 0,
        "duration": time.time() - start,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
    }


def judge_responses(api_key, judge_model, question, responses, max_tokens=2048, timeout=90):
    """Have the judge model evaluate all council responses."""
    resp_text = ""
    for i, r in enumerate(responses, 1):
        if r["error"]:
            resp_text += f"\n--- Response {i} ({r['model']}) ---\n[ERROR: {r['error']}]\n"
        else:
            resp_text += f"\n--- Response {i} ({r['model']}) ---\n{r['response']}\n"
    judge_prompt = f"""You are an expert judge evaluating multiple AI model responses to a question.
QUESTION:
{question}
RESPONSES:
{resp_text}
Evaluate each response for accuracy, completeness, clarity, and usefulness.
Provide your verdict in this EXACT format:
WINNER: [model name that gave the best response]
REASONING: [2-3 sentences explaining why this response won]
SYNTHESIZED ANSWER: [Your synthesized best answer combining the strongest elements from all responses. Be thorough.]"""
    return call_model(api_key, judge_model, judge_prompt, max_tokens, timeout)


def parse_judge_verdict(text):
    """Parse the judge's structured response."""
    verdict = {"winner": "", "reasoning": "", "synthesized": ""}
    if not text:
        return verdict
    current = None
    for line in text.split("\n"):
        upper = line.strip().upper()
        if upper.startswith("WINNER:"):
            verdict["winner"] = line.split(":", 1)[1].strip()
            current = "winner"
        elif upper.startswith("REASONING:"):
            verdict["reasoning"] = line.split(":", 1)[1].strip()
            current = "reasoning"
        elif upper.startswith("SYNTHESIZED ANSWER:"):
            verdict["synthesized"] = line.split(":", 1)[1].strip()
            current = "synthesized"
        elif current == "reasoning" and line.strip():
            verdict["reasoning"] += " " + line.strip()
        elif current == "synthesized" and line.strip():
            verdict["synthesized"] += "\n" + line.strip()
    return verdict


def print_human(question, responses, judge_result, verdict):
    """Print human-readable output."""
    print("\n" + "═" * 50)
    print(" MODEL COUNCIL RESULTS")
    print("═" * 50)
    print(f"\nQuestion: {question}\n")
    print("── Council Member Responses ──\n")
    for r in responses:
        cost_str = f"${r['cost']:.4f}" if r['cost'] else "N/A"
        dur_str = f"{r['duration']:.1f}s"
        print(f"🤖 {r['model']} ({cost_str}, {dur_str})")
        if r["error"]:
            print(f"   ❌ Error: {r['error']}")
        else:
            # Truncate for display
            text = r["response"] or ""
            if len(text) > 500:
                text = text[:500] + "..."
            for line in text.split("\n"):
                print(f"   {line}")
        print()
    judge_cost = f"${judge_result['cost']:.4f}" if judge_result['cost'] else "N/A"
    print(f"── Judge Verdict ({judge_result['model']}, {judge_cost}) ──\n")
    if verdict["winner"]:
        print(f"🏆 Winner: {verdict['winner']}")
    if verdict["reasoning"]:
        print(f"📝 Reasoning: {verdict['reasoning']}")
    if verdict["synthesized"]:
        print(f"\n📋 Synthesized Answer:\n{verdict['synthesized']}")
    total_cost = sum(r["cost"] for r in responses) + (judge_result["cost"] or 0)
    print(f"\n💰 Total Cost: ${total_cost:.4f}")
    print("═" * 50)


def main():
    parser = argparse.ArgumentParser(description="Model Council - Multi-model consensus")
    parser.add_argument("question", help="Question to ask the council")
    parser.add_argument("--models", default=",".join(DEFAULT_MODELS),
                        help="Comma-separated list of models")
    parser.add_argument("--judge", default=DEFAULT_JUDGE, help="Judge model")
    parser.add_argument("--max-tokens", type=int, default=1024, help="Max tokens per response")
    parser.add_argument("--timeout", type=int, default=60, help="Timeout per model (seconds)")
    parser.add_argument("--json", action="store_true", dest="json_output", help="JSON output")
    args = parser.parse_args()

    api_key = get_api_key()
    models = [m.strip() for m in args.models.split(",") if m.strip()]
    if not args.json_output:
        print(f"🏛️ Convening council with {len(models)} models...")
        for m in models:
            print(f"  • {m}")
        print(f"  Judge: {args.judge}\n")

    # Query all models in parallel
    responses = []
    with ThreadPoolExecutor(max_workers=len(models)) as executor:
        futures = {
            executor.submit(call_model, api_key, m, args.question, args.max_tokens, args.timeout): m
            for m in models
        }
        for future in as_completed(futures):
            result = future.result()
            responses.append(result)
            if not args.json_output:
                status = "✓" if not result["error"] else f"✗ {result['error'][:50]}"
                print(f"  [{status}] {result['model']} ({result['duration']:.1f}s)")

    # Restore the original model order (as_completed yields in finish order)
    model_order = {m: i for i, m in enumerate(models)}
    responses.sort(key=lambda r: model_order.get(r["model"], 99))

    # Bail out if no model produced a usable answer; errored responses are
    # still passed to the judge, which sees them flagged as [ERROR: ...]
    valid = [r for r in responses if not r["error"]]
    if not valid:
        print("ERROR: All models failed. No consensus possible.", file=sys.stderr)
        sys.exit(1)

    if not args.json_output:
        print("\n⚖️ Judge evaluating responses...")
    judge_result = judge_responses(api_key, args.judge, args.question, responses,
                                   args.max_tokens * 2, args.timeout + 30)
    if judge_result["error"]:
        print(f"ERROR: Judge failed: {judge_result['error']}", file=sys.stderr)
        sys.exit(1)
    verdict = parse_judge_verdict(judge_result["response"])

    if args.json_output:
        output = {
            "question": args.question,
            "models": models,
            "judge": args.judge,
            "responses": [
                {
                    "model": r["model"],
                    "response": r["response"],
                    "error": r["error"],
                    "cost": r["cost"],
                    "duration": r["duration"],
                    "prompt_tokens": r.get("prompt_tokens", 0),
                    "completion_tokens": r.get("completion_tokens", 0),
                }
                for r in responses
            ],
            "verdict": verdict,
            "judge_cost": judge_result["cost"],
            "total_cost": sum(r["cost"] for r in responses) + (judge_result["cost"] or 0),
        }
        print(json.dumps(output, indent=2))
    else:
        print_human(args.question, responses, judge_result, verdict)


if __name__ == "__main__":
    main()
```