
evaluate-model

Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics.

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.

Stars: 14
Hot score: 86
Updated: March 20, 2026
Overall rating: C (3.9)
Composite score: 3.9
Best-practice grade: S (96.0)

Install command

npx @skill-hub/cli install mvillmow-projectodyssey-evaluate-model

Repository

mvillmow/ProjectOdyssey

Skill path: .claude/skills/tier-2/evaluate-model



Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack, Testing.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: mvillmow.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install evaluate-model into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/mvillmow/ProjectOdyssey before adding evaluate-model to shared team environments
  • Use evaluate-model when assessing accuracy, precision, recall, and other metrics during development

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: evaluate-model
description: "Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics."
mcp_fallback: none
category: ml
tier: 2
---

# Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

## When to Use

- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting (see the gap-check sketch after this list)
- Reporting model accuracy for papers and documentation
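
For the overfitting check, a minimal sketch; the accuracy values and the 0.05 gap threshold here are illustrative assumptions, not project conventions:

```mojo
fn main():
    # Hypothetical results from a train/test evaluation run.
    var train_accuracy: Float32 = 0.98
    var test_accuracy: Float32 = 0.81

    # A large train/test gap is a common overfitting signal.
    var gap = train_accuracy - test_accuracy
    if gap > 0.05:
        print("Possible overfitting: train/test accuracy gap =", gap)
```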

## Quick Reference

```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```
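
The two method bodies above are intentionally elided. As a minimal runnable sketch of the classification variant, assuming plain `List[Int]` label vectors in place of `ExTensor` (whose API is not shown in this skill) and class 1 as the positive class:

```mojo
from collections import List

fn evaluate_classification(
    predictions: List[Int], ground_truth: List[Int]
) -> Tuple[Float32, Float32, Float32]:
    # Count confusion-matrix cells for the positive class (label 1).
    var tp = 0       # predicted 1, truly 1
    var fp = 0       # predicted 1, truly 0
    var misses = 0   # predicted 0, truly 1 (false negatives)
    var correct = 0
    for i in range(len(predictions)):
        if predictions[i] == ground_truth[i]:
            correct += 1
        if predictions[i] == 1 and ground_truth[i] == 1:
            tp += 1
        elif predictions[i] == 1 and ground_truth[i] == 0:
            fp += 1
        elif predictions[i] == 0 and ground_truth[i] == 1:
            misses += 1
    # NOTE: assumes tp + fp > 0 and tp + misses > 0; guard in real code.
    var accuracy = Float32(correct) / Float32(len(predictions))
    var precision = Float32(tp) / Float32(tp + fp)
    var recall = Float32(tp) / Float32(tp + misses)
    return (accuracy, precision, recall)

fn main():
    var preds = List[Int](1, 0, 1, 1, 0, 1)
    var truth = List[Int](1, 0, 0, 1, 0, 0)
    var metrics = evaluate_classification(preds, truth)
    print("accuracy:", metrics[0])
    print("precision:", metrics[1])
    print("recall:", metrics[2])
```

A regression counterpart would accumulate squared and absolute differences instead of confusion counts (see the MSE/MAE formulas in the Workflow section below).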

## Workflow

1. **Load test data**: Prepare test/validation dataset
2. **Generate predictions**: Run model inference on test set
3. **Select metrics**: Choose metrics appropriate to the task (accuracy, precision, recall, F1, AUC, MSE, etc.); the standard formulas appear after this list
4. **Calculate metrics**: Compute performance metrics
5. **Analyze results**: Compare to baseline and identify strengths/weaknesses
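
For reference, the standard definitions behind steps 3 and 4, with TP/FP/TN/FN the confusion-matrix counts and $\hat{y}_i$, $y_i$ the predicted and true values:

```math
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
```

```math
F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}, \qquad
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2, \qquad
\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert \hat{y}_i - y_i \rvert
```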

## Output Format

Evaluation report:

- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification; see the sketch after this list)
- Error analysis
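
A minimal sketch of the confusion-matrix portion of the report, again assuming `List[Int]` labels rather than `ExTensor`; rows are true classes, columns are predicted classes, stored row-major in a flat list:

```mojo
from collections import List

fn main():
    var num_classes = 3
    var predictions = List[Int](0, 2, 1, 0, 2, 2)
    var ground_truth = List[Int](0, 1, 1, 0, 2, 1)

    # Flat row-major matrix: entry [row][col] lives at row * num_classes + col.
    var matrix = List[Int]()
    for _ in range(num_classes * num_classes):
        matrix.append(0)
    for i in range(len(predictions)):
        matrix[ground_truth[i] * num_classes + predictions[i]] += 1

    # Print one row per true class.
    for row in range(num_classes):
        for col in range(num_classes):
            print(matrix[row * num_classes + col], end=" ")
        print()
```

Off-diagonal cells feed directly into the error analysis: each nonzero entry names a specific true-class/predicted-class confusion.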

## References

- See CLAUDE.md > Language Preference (Mojo for ML models)
- See `train-model` skill for model training
- See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations