
ml-experiment-tracker

Plan reproducible ML experiment runs with explicit parameters, metrics, and artifacts. Use before model training to standardize tracking-ready experiment definitions.

Packaged view

This page reorganizes the original catalog entry to put fit, installability, and workflow context first. The original raw source appears below.

Stars: 3,071.

Hot score: 99.

Updated: March 20, 2026.

Overall rating: C (4.0).

Composite score: 4.0.

Best-practice grade: B (80.4).

Install command

npx @skill-hub/cli install openclaw-skills-ml-experiment-tracker

Repository

openclaw/skills

Skill path: skills/0x-professor/ml-experiment-tracker



Best for

Primary workflow: Analyze Data & AI.

Technical facets: Full Stack, Data / AI.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: openclaw.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install ml-experiment-tracker into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/openclaw/skills before adding ml-experiment-tracker to shared team environments
  • Use ml-experiment-tracker for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode.

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: ml-experiment-tracker
description: Plan reproducible ML experiment runs with explicit parameters, metrics, and artifacts. Use before model training to standardize tracking-ready experiment definitions.
---

# ML Experiment Tracker

## Overview

Generate structured experiment plans that can be logged consistently in experiment tracking systems.

## Workflow

1. Define dataset, target task, model family, and parameter search space.
2. Define metrics and acceptance thresholds before training.
3. Produce run plan with version and artifact expectations.
4. Export the run plan for execution in tracking tools.
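
A minimal input payload covering steps 1 and 2 might look like the sketch below. The field names match the bundled `scripts/build_experiment_plan.py`; the concrete values are illustrative only.

```python
import json
from pathlib import Path

# Illustrative experiment definition; adjust names, paths, and ranges to the task.
payload = {
    "experiment_name": "baseline-experiment",
    "dataset": "data/dataset.csv",
    "parameters": {"model_family": "gradient_boosting", "max_depth": 6, "learning_rate": 0.1},
    "metrics": ["accuracy", "f1"],
    "owner": "ml-team",
    "environment": "dev",
}
Path("experiment_input.json").write_text(json.dumps(payload, indent=2), encoding="utf-8")
```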

## Use Bundled Resources

- Run `scripts/build_experiment_plan.py` to generate consistent run plans.
- Read `references/tracking-guide.md` for reproducibility checklist.

## Guardrails

- Keep inputs explicit and machine-readable.
- Always include metrics and baseline criteria.
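
A small pre-flight check in the spirit of these guardrails might look like the sketch below. The `parameters` and `metrics` field names follow the bundled script; treating a missing `baseline_criteria` entry as an error is an assumed convention, not something the skill enforces.

```python
def check_experiment_inputs(payload: dict) -> None:
    """Reject payloads that are too implicit to track reproducibly (sketch only)."""
    parameters = payload.get("parameters")
    if not isinstance(parameters, dict) or not parameters:
        raise ValueError("parameters must be an explicit, non-empty mapping")
    if not payload.get("metrics"):
        raise ValueError("metrics must be declared before training starts")
    # Assumed convention: acceptance thresholds are recorded under baseline_criteria.
    if not payload.get("baseline_criteria"):
        raise ValueError("baseline_criteria must state acceptance thresholds")
```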


---

## Referenced Files

> The following files are referenced in this skill and included for context.

### scripts/build_experiment_plan.py

```python
#!/usr/bin/env python3
from __future__ import annotations

import argparse
import csv
import json
from pathlib import Path

MAX_INPUT_BYTES = 1_048_576


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Build a reproducible ML experiment plan.")
    parser.add_argument("--input", required=False, help="Path to JSON input.")
    parser.add_argument("--output", required=True, help="Path to output artifact.")
    parser.add_argument("--format", choices=["json", "md", "csv"], default="json")
    parser.add_argument("--dry-run", action="store_true", help="Run without side effects.")
    return parser.parse_args()


def load_payload(path: str | None, max_input_bytes: int = MAX_INPUT_BYTES) -> dict:
    if not path:
        return {}
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(f"Input file not found: {p}")
    if p.stat().st_size > max_input_bytes:
        raise ValueError(f"Input file exceeds {max_input_bytes} bytes: {p}")
    return json.loads(p.read_text(encoding="utf-8"))


def render(result: dict, output_path: Path, fmt: str) -> None:
    output_path.parent.mkdir(parents=True, exist_ok=True)

    if fmt == "json":
        output_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
        return

    if fmt == "md":
        details = result["details"]
        lines = [
            f"# {result['summary']}",
            "",
            f"- status: {result['status']}",
            f"- experiment_name: {details['experiment_name']}",
            f"- dataset: {details['dataset']}",
            "",
            "## Metrics",
        ]
        for metric in details["metrics"]:
            lines.append(f"- {metric}")
        lines.extend(["", "## Parameters"])
        for key, value in details["parameters"].items():
            lines.append(f"- {key}: {value}")
        output_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        return

    details = result["details"]
    with output_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["field", "value"])
        writer.writerow(["experiment_name", details["experiment_name"]])
        writer.writerow(["dataset", details["dataset"]])
        writer.writerow(["metrics", ",".join(details["metrics"])])
        for key, value in details["parameters"].items():
            writer.writerow([f"param:{key}", value])


def main() -> int:
    args = parse_args()
    payload = load_payload(args.input)

    experiment_name = str(payload.get("experiment_name", "baseline-experiment"))
    dataset = str(payload.get("dataset", "data/dataset.csv"))
    parameters = payload.get("parameters", {})
    metrics = payload.get("metrics", ["accuracy", "f1"])

    if not isinstance(parameters, dict):
        parameters = {}
    if not isinstance(metrics, list):
        metrics = ["accuracy"]

    details = {
        "experiment_name": experiment_name,
        "dataset": dataset,
        "metrics": [str(metric) for metric in metrics],
        "parameters": {str(key): value for key, value in parameters.items()},
        "tracking_tags": {
            "owner": str(payload.get("owner", "ml-team")),
            "environment": str(payload.get("environment", "dev")),
        },
        "artifact_expectations": [
            "metrics.json",
            "model.pkl or model.bin",
            "feature_schema.json",
        ],
        "dry_run": args.dry_run,
    }

    result = {
        "status": "ok",
        "summary": f"Built experiment plan for '{experiment_name}'",
        "artifacts": [str(Path(args.output))],
        "details": details,
    }

    if args.dry_run:
        # Dry run: report the plan without writing any files.
        print(json.dumps(result, indent=2))
        return 0

    render(result, Path(args.output), args.format)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

```
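
For reference, one way the script above might be invoked from Python; the input and output paths are illustrative, and the flags match the argparse definition in the script.

```python
import subprocess

# Build a JSON run plan from a previously written payload file (paths are illustrative).
subprocess.run(
    [
        "python",
        "scripts/build_experiment_plan.py",
        "--input", "experiment_input.json",
        "--output", "runs/baseline-experiment/plan.json",
        "--format", "json",
    ],
    check=True,
)
```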

### references/tracking-guide.md

```markdown
# Tracking Guide

## Reproducibility Checklist

- Capture dataset version or snapshot ID.
- Capture model family and hyperparameters.
- Capture metrics and threshold criteria.
- Capture run environment metadata.
- Capture artifact paths for model and schema outputs.

## Minimum Experiment Fields

- `experiment_name`
- `dataset`
- `parameters`
- `metrics`

## Recommended Next Steps

- Log plan output into MLflow or equivalent tracker.
- Compare baseline and candidate runs using consistent metric names.

```
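
As a sketch of the first recommended step, a plan produced by the bundled script could be logged to MLflow roughly as follows. This assumes the `mlflow` package is available; the plan path is illustrative, and any tracker with a comparable API would work.

```python
import json
from pathlib import Path

import mlflow  # assumed dependency; any tracker with a comparable API works

plan_path = Path("runs/baseline-experiment/plan.json")  # illustrative path
details = json.loads(plan_path.read_text(encoding="utf-8"))["details"]

with mlflow.start_run(run_name=details["experiment_name"]):
    mlflow.log_params(details["parameters"])
    mlflow.set_tags(details["tracking_tags"])
    # The plan only names the metrics; values arrive later from training runs,
    # so record the planned metric names as a tag for consistent comparison.
    mlflow.set_tag("planned_metrics", ",".join(details["metrics"]))
    mlflow.log_artifact(str(plan_path))
```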



---

## Skill Companion Files

> Additional files collected from the skill directory layout.

### _meta.json

```json
{
  "owner": "0x-professor",
  "slug": "ml-experiment-tracker",
  "displayName": "Ml Experiment Tracker",
  "latest": {
    "version": "0.1.0",
    "publishedAt": 1772136555876,
    "commit": "https://github.com/openclaw/skills/commit/1861bb488b994c3375ef62021a07e9c40aeb0a2d"
  },
  "history": []
}

```
