
evaluate-model

Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.

Stars: 14
Hot score: 86
Updated: March 20, 2026
Overall rating: C (3.9)
Composite score: 3.9
Best-practice grade: S (96.0)

Install command

npx @skill-hub/cli install mvillmow-projectodyssey-evaluate-model

Repository

mvillmow/ProjectOdyssey

Skill path: .claude/skills/tier-2/evaluate-model



Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack, Testing.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: mvillmow.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install evaluate-model into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/mvillmow/ProjectOdyssey before adding evaluate-model to shared team environments
  • Use evaluate-model when assessing accuracy, precision, recall, and other metrics during development

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: evaluate-model
description: "Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics."
mcp_fallback: none
category: ml
tier: 2
---

# Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

## When to Use

- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting (see the gap-check sketch after this list)
- Reporting model accuracy for papers and documentation
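
For the overfitting check, a minimal sketch; the accuracy values and the 0.05 gap threshold here are illustrative assumptions, not project conventions:

```mojo
fn main():
    # Hypothetical results from a train/test evaluation run.
    var train_accuracy: Float32 = 0.98
    var test_accuracy: Float32 = 0.81

    # A large train/test gap is a common overfitting signal.
    var gap = train_accuracy - test_accuracy
    if gap > 0.05:
        print("Possible overfitting: train/test accuracy gap =", gap)
```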

## Quick Reference

```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
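
The two method bodies above are intentionally elided. As a minimal runnable sketch of the classification variant, assuming plain `List[Int]` label vectors in place of `ExTensor` (whose API is not shown in this skill) and class 1 as the positive class:

```mojo
from collections import List

fn evaluate_classification(
    predictions: List[Int], ground_truth: List[Int]
) -> Tuple[Float32, Float32, Float32]:
    # Count confusion-matrix cells for the positive class (label 1).
    var tp = 0       # predicted 1, truly 1
    var fp = 0       # predicted 1, truly 0
    var misses = 0   # predicted 0, truly 1 (false negatives)
    var correct = 0
    for i in range(len(predictions)):
        if predictions[i] == ground_truth[i]:
            correct += 1
        if predictions[i] == 1 and ground_truth[i] == 1:
            tp += 1
        elif predictions[i] == 1 and ground_truth[i] == 0:
            fp += 1
        elif predictions[i] == 0 and ground_truth[i] == 1:
            misses += 1
    # NOTE: assumes tp + fp > 0 and tp + misses > 0; guard in real code.
    var accuracy = Float32(correct) / Float32(len(predictions))
    var precision = Float32(tp) / Float32(tp + fp)
    var recall = Float32(tp) / Float32(tp + misses)
    return (accuracy, precision, recall)

fn main():
    var preds = List[Int](1, 0, 1, 1, 0, 1)
    var truth = List[Int](1, 0, 0, 1, 0, 0)
    var metrics = evaluate_classification(preds, truth)
    print("accuracy:", metrics[0])
    print("precision:", metrics[1])
    print("recall:", metrics[2])
```

A regression counterpart would accumulate squared and absolute differences instead of confusion counts (see the MSE/MAE formulas in the Workflow section below).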

## Workflow

1. **Load test data**: Prepare test/validation dataset
2. **Generate predictions**: Run model inference on test set
3. **Select metrics**: Choose metrics appropriate to the task (accuracy, precision, recall, F1, AUC, MSE, etc.); the standard formulas appear after this list
4. **Calculate metrics**: Compute performance metrics
5. **Analyze results**: Compare to baseline and identify strengths/weaknesses
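
For reference, the standard definitions behind steps 3 and 4, with TP/FP/TN/FN the confusion-matrix counts and $\hat{y}_i$, $y_i$ the predicted and true values:

```math
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
```

```math
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \qquad
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert
```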

## Output Format

Evaluation report:

- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification; see the sketch after this list)
- Error analysis
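
A minimal sketch of the confusion-matrix portion of the report, again assuming `List[Int]` labels rather than `ExTensor`; rows are true classes, columns are predicted classes, stored row-major in a flat list:

```mojo
from collections import List

fn main():
    var num_classes = 3
    var predictions = List[Int](0, 2, 1, 0, 2, 2)
    var ground_truth = List[Int](0, 1, 1, 0, 2, 1)

    # Flat row-major matrix: entry [row][col] lives at row * num_classes + col.
    var matrix = List[Int]()
    for _ in range(num_classes * num_classes):
        matrix.append(0)
    for i in range(len(predictions)):
        matrix[ground_truth[i] * num_classes + predictions[i]] += 1

    # Print one row per true class.
    for row in range(num_classes):
        for col in range(num_classes):
            print(matrix[row * num_classes + col], end=" ")
        print()
```

Off-diagonal cells feed directly into the error analysis: each nonzero entry names a specific true-class/predicted-class confusion.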

## References

- See CLAUDE.md > Language Preference (Mojo for ML models)
- See `train-model` skill for model training
- See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations