Evals
Agent evaluation framework based on Anthropic's best practices. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test. Includes three grader types (code-based, model-based, human), transcript capture, pass@k/pass^k metrics, and ALGORITHM integration.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install majiayu000-claude-skill-registry-evals-majiayu000-claude-skill-registr
Repository
Skill path: skills/other/evals-majiayu000-claude-skill-registr
Agent evaluation framework based on Anthropic's best practices. USE WHEN eval, evaluate, test agent, benchmark, verify behavior, regression test, capability test. Includes three grader types (code-based, model-based, human), transcript capture, pass@k/pass^k metrics, and ALGORITHM integration.
Open repositoryBest for
Primary workflow: Analyze Data & AI.
Technical facets: Full Stack, Data / AI, Testing, Integration.
Target audience: Development teams looking for install-ready agent workflows..
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: majiayu000.
This is still a mirrored public skill entry. Review the repository before installing into production workflows.
What it helps with
- Install Evals into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/majiayu000/claude-skill-registry before adding Evals to shared team environments
- Use Evals for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.