evaluate-model
Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.
Packaged view
This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.
Install command
npx @skill-hub/cli install mvillmow-projectodyssey-evaluate-model
Repository
Skill path: .claude/skills/tier-2/evaluate-model
Best for
Primary workflow: Ship Full Stack.
Technical facets: Full Stack, Testing.
Target audience: everyone.
License: Unknown.
Original source
Catalog source: SkillHub Club.
Repository owner: mvillmow.
This is a mirrored public skill entry. Review the repository before installing it into production workflows.
What it helps with
- Install evaluate-model into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
- Review https://github.com/mvillmow/ProjectOdyssey before adding evaluate-model to shared team environments
- Use evaluate-model for development workflows
Works across
Favorites: 0.
Sub-skills: 0.
Aggregator: No.
Original source / Raw SKILL.md
---
name: evaluate-model
description: "Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics."
mcp_fallback: none
category: ml
tier: 2
---
# Evaluate Model
Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).
## When to Use
- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation
## Quick Reference
```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
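The Mojo signatures above are pseudocode (`ExTensor` and `ModelEvaluator` come from the repository, which is not shown here). A minimal, self-contained sketch of the same metrics in plain Python, assuming binary labels with 1 as the positive class, might look like:

```python
def classification_metrics(predictions, ground_truth):
    """Return (accuracy, precision, recall) for binary labels (1 = positive)."""
    pairs = list(zip(predictions, ground_truth))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # false negatives
    accuracy = sum(1 for p, t in pairs if p == t) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

def regression_metrics(predictions, ground_truth):
    """Return (MSE, MAE)."""
    errors = [p - t for p, t in zip(predictions, ground_truth)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae
```

The zero-division guards matter in practice: a model that never predicts the positive class has an undefined precision, and returning 0.0 there keeps reports well-defined.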
## Workflow
1. **Load test data**: Prepare test/validation dataset
2. **Generate predictions**: Run model inference on test set
3. **Select metrics**: Choose appropriate metrics (accuracy, precision, recall, F1, AUC, MSE, etc.)
4. **Calculate metrics**: Compute performance metrics
5. **Analyze results**: Compare to baseline and identify strengths/weaknesses
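The five steps above can be sketched end-to-end in plain Python (the repository itself prefers Mojo; `model_predict` and `baseline_accuracy` here are hypothetical stand-ins for your own model and baseline):

```python
def evaluate(model_predict, test_inputs, test_labels, baseline_accuracy=0.5):
    """Run the workflow: predict on held-out data, score, compare to baseline.

    `model_predict` is a hypothetical callable returning one label per input.
    """
    predictions = [model_predict(x) for x in test_inputs]   # step 2: inference
    correct = sum(p == t for p, t in zip(predictions, test_labels))
    accuracy = correct / len(test_labels)                   # steps 3-4: metric
    return {                                                # step 5: analysis
        "accuracy": accuracy,
        "baseline_accuracy": baseline_accuracy,
        "beats_baseline": accuracy > baseline_accuracy,
    }

# Usage with a trivial threshold "model" standing in for real inference:
report = evaluate(lambda x: 1 if x > 0.5 else 0,
                  [0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0])
```

Real evaluations would swap in the metrics appropriate to the task (step 3), but the shape of the loop is the same.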
## Output Format
Evaluation report:
- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification)
- Error analysis
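For the confusion-matrix item above, a minimal Python sketch (assuming integer class labels `0..num_classes-1`) shows the conventional layout:

```python
def confusion_matrix(predictions, ground_truth, num_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predictions, ground_truth):
        matrix[t][p] += 1
    return matrix
```

The diagonal counts correct predictions; off-diagonal cells point directly at the class confusions worth covering in the error analysis.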
## References
- See CLAUDE.md > Language Preference (Mojo for ML models)
- See `train-model` skill for model training
- See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations