
local-finetune

Packaged view

This page reorganizes the original catalog entry around fit, installability, and workflow context first. The original raw source lives below.

Stars: 10
Hot score: 84
Updated: March 20, 2026
Overall rating: C (composite score 3.7)
Best-practice grade: B (75.1)

Install command

npx @skill-hub/cli install plurigrid-asi-local-finetune

Repository

plurigrid/asi

Skill path: skills/local-finetune


Best for

Primary workflow: Ship Full Stack.

Technical facets: Full Stack.

Target audience: everyone.

License: Unknown.

Original source

Catalog source: SkillHub Club.

Repository owner: plurigrid.

This is a mirrored public skill entry. Review the repository before installing it into production workflows.

What it helps with

  • Install local-finetune into Claude Code, Codex CLI, Gemini CLI, or OpenCode workflows
  • Review https://github.com/plurigrid/asi before adding local-finetune to shared team environments
  • Use local-finetune for development workflows

Works across

Claude Code, Codex CLI, Gemini CLI, OpenCode

Favorites: 0.

Sub-skills: 0.

Aggregator: No.

Original source / Raw SKILL.md

---
name: local-finetune
description: Local model fine-tuning pipeline using ACSets + DuckDB + MLX
version: 1.0.0
---

# local-finetune

> Local model fine-tuning pipeline using ACSets + DuckDB + MLX

**Trit**: 0 (Coordinator - orchestrates data flow)
**Bundle**: substrate
**Requires**: duckdb, mlx-lm, acsets skill

## Overview

Pipeline for embedding skills into local models via LoRA fine-tuning on Apple Silicon.

```
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   ACSets    │───▶│   DuckDB    │───▶│   JSONL     │───▶│  mlx-lm     │
│   Schema    │    │   Corpus    │    │  Training   │    │  LoRA       │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
```

## Database Location

```
~/skill-substrate/skill_corpus.duckdb
```

## Schema (ACSet-inspired)

```sql
-- Objects: Skill, Example, Category
-- Morphisms: skill_of, category_of, trit_of

CREATE TABLE skills (
    id INTEGER PRIMARY KEY,
    name VARCHAR UNIQUE,
    description TEXT,
    location VARCHAR,
    fingerprint UBIGINT,
    color_hex VARCHAR,
    trit INTEGER CHECK (trit IN (-1, 0, 1))
);

CREATE TABLE examples (
    id INTEGER PRIMARY KEY,
    skill_id INTEGER REFERENCES skills(id),
    instruction TEXT NOT NULL,
    input TEXT,
    output TEXT,
    fingerprint UBIGINT,
    trit INTEGER
);

CREATE TABLE claude_history (
    id INTEGER,
    content TEXT,
    ts TIMESTAMP,
    project VARCHAR,
    sessionId VARCHAR,
    fingerprint UBIGINT,
    color_hex VARCHAR,
    trit INTEGER
);
```

## Ingest Claude History

```sql
CREATE TABLE claude_history AS
SELECT
    row_number() OVER () as id,
    display as content,
    to_timestamp(timestamp/1000) as ts,
    project,
    sessionId,
    hash(display || COALESCE(project,'')) as fingerprint,
    '#' || printf('%06x', ABS(hash(display)) % 16777216) as color_hex,
    CAST(ABS(hash(display)) % 3 AS INTEGER) - 1 as trit
FROM read_json('~/.claude/history.jsonl',
    format='newline_delimited',
    ignore_errors=true,
    columns={display: 'VARCHAR', timestamp: 'BIGINT', project: 'VARCHAR', sessionId: 'VARCHAR'}
)
WHERE display IS NOT NULL AND LENGTH(display) > 10;
```

## Ingest Skills from Filesystem

```sql
CREATE TABLE skill_files AS
SELECT
    row_number() OVER () as id,
    regexp_extract(file, '/([^/]+)/[^/]+\.md$', 1) as skill_name,
    file as path,
    hash(file) as fingerprint,
    CAST(ABS(hash(file)) % 3 AS INTEGER) - 1 as trit
FROM glob('~/.claude/skills/*/*.md');

INSERT INTO skills (id, name, location, fingerprint, trit)
SELECT MIN(id), skill_name, FIRST(path), FIRST(fingerprint), FIRST(trit)
FROM skill_files WHERE skill_name IS NOT NULL
GROUP BY skill_name;
```

## Generate Training Pairs from History

```sql
CREATE VIEW training_candidates AS
WITH consecutive AS (
    SELECT
        id, content, ts, project,
        LAG(content) OVER (PARTITION BY project ORDER BY ts) as prev_content,
        LAG(ts) OVER (PARTITION BY project ORDER BY ts) as prev_ts,
        trit, fingerprint
    FROM claude_history
    WHERE LENGTH(content) > 20
)
SELECT
    prev_content as instruction,
    content as output,
    project as category,
    trit, fingerprint, ts
FROM consecutive
WHERE prev_content IS NOT NULL
  AND LENGTH(prev_content) > 10
  AND LENGTH(content) > 50
  AND ts - prev_ts < INTERVAL '5 minutes';
```
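The window logic above can be sketched in plain Python: pair each message with its predecessor in the same project when they arrive within five minutes. This is a minimal re-implementation of the `LAG`-based view for illustration only; the SQL view remains the source of truth.

```python
from datetime import timedelta

def training_pairs(messages, max_gap=timedelta(minutes=5)):
    """Pair each message with its predecessor in the same project,
    mirroring the LAG-based training_candidates view."""
    last = {}  # project -> previous message
    pairs = []
    for msg in sorted(messages, key=lambda m: m["ts"]):
        if len(msg["content"]) <= 20:  # same prefilter as the CTE
            continue
        prev = last.get(msg["project"])
        if (prev is not None
                and len(prev["content"]) > 10
                and len(msg["content"]) > 50
                and msg["ts"] - prev["ts"] < max_gap):
            pairs.append({"instruction": prev["content"],
                          "output": msg["content"]})
        last[msg["project"]] = msg
    return pairs
```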

## Export to JSONL (Chat Format)

```sql
COPY (
    SELECT json_object(
        'messages', json_array(
            json_object('role', 'user', 'content', instruction),
            json_object('role', 'assistant', 'content', output)
        )
    ) as json_line
    FROM training_candidates
    WHERE LENGTH(instruction) < 2000 AND LENGTH(output) < 4000
    ORDER BY RANDOM()
) TO 'skills_train.jsonl' (FORMAT CSV, QUOTE '', HEADER false);
```
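Because the `QUOTE ''` CSV trick writes raw JSON lines, it is worth validating the export before training. A small check like this (a hypothetical helper, not part of the skill) confirms each line parses and has the two-message chat shape used above:

```python
import json

def valid_chat_line(line: str) -> bool:
    """Check one JSONL line against the {'messages': [user, assistant]} shape."""
    try:
        obj = json.loads(line)
    except json.JSONDecodeError:
        return False
    msgs = obj.get("messages")
    if not isinstance(msgs, list) or len(msgs) != 2:
        return False
    if not all(isinstance(m, dict) for m in msgs):
        return False
    return [m.get("role") for m in msgs] == ["user", "assistant"]
```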

## Generate Skill Knowledge Examples (Python)

```python
#!/usr/bin/env python3
"""Generate training data from skill markdown files."""
import json, os, re
from pathlib import Path

skills_dir = Path.home() / ".claude" / "skills"
output = []

for skill_dir in sorted(skills_dir.iterdir()):
    if not skill_dir.is_dir():
        continue

    skill_name = skill_dir.name
    readme = skill_dir / "README.md"

    if not readme.exists():
        mds = list(skill_dir.glob("*.md"))
        if mds:
            readme = mds[0]
        else:
            continue

    content = readme.read_text()[:8000]

    # Extract description
    desc_match = re.search(r'^#[^#].*?\n\n(.+?)(?:\n\n|\n#)', content, re.DOTALL)
    description = desc_match.group(1).strip() if desc_match else content[:500]

    # Q&A: "What is X skill?"
    output.append({
        "messages": [
            {"role": "user", "content": f"What is the {skill_name} skill?"},
            {"role": "assistant", "content": description[:1500]}
        ]
    })

    # Extract code blocks as examples
    code_blocks = re.findall(r'```(\w+)?\n(.+?)```', content, re.DOTALL)
    for lang, code in code_blocks[:3]:
        if 50 < len(code) < 2000:
            output.append({
                "messages": [
                    {"role": "user", "content": f"Show me an example of using {skill_name}" + (f" in {lang}" if lang else "")},
                    {"role": "assistant", "content": f"```{lang or ''}\n{code.strip()}\n```"}
                ]
            })

with open("skill_knowledge.jsonl", "w") as f:
    for item in output:
        f.write(json.dumps(item) + "\n")

print(f"Generated {len(output)} examples")
```

## Split Train/Valid/Test

```bash
cd ~/skill-substrate
cat skills_train.jsonl skill_knowledge.jsonl | \
  awk 'BEGIN{srand()}{print rand()"\t"$0}' | sort -n | cut -f2- > combined_train.jsonl

total=$(wc -l < combined_train.jsonl)
train_n=$((total * 80 / 100))
valid_n=$((total * 10 / 100))

mkdir -p train_data
head -n $train_n combined_train.jsonl > train_data/train.jsonl
tail -n +$((train_n + 1)) combined_train.jsonl | head -n $valid_n > train_data/valid.jsonl
tail -n +$((train_n + valid_n + 1)) combined_train.jsonl > train_data/test.jsonl
```
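The same shuffle-and-split can be done in Python if you prefer to avoid the shell pipeline. A sketch (the seed echoes the `--seed 1069` used for training below, purely for reproducibility):

```python
import random

def split_dataset(lines, train_frac=0.8, valid_frac=0.1, seed=1069):
    """Shuffle and split lines 80/10/10, mirroring the shell pipeline."""
    rng = random.Random(seed)
    shuffled = list(lines)
    rng.shuffle(shuffled)
    n = len(shuffled)
    train_n = int(n * train_frac)
    valid_n = int(n * valid_frac)
    return (shuffled[:train_n],                       # train.jsonl
            shuffled[train_n:train_n + valid_n],      # valid.jsonl
            shuffled[train_n + valid_n:])             # test.jsonl (remainder)
```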

## MLX LoRA Fine-Tuning

**IMPORTANT**: Must run from native arm64 shell (not Rosetta).

```bash
# Check architecture first
arch  # Should show 'arm64', not 'i386'

# If i386, wrap with:
arch -arm64 /bin/zsh

# Then run:
mlx_lm.lora \
  --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
  --train \
  --data train_data \
  --batch-size 2 \
  --iters 200 \
  --learning-rate 1e-5 \
  --num-layers 8 \
  --steps-per-report 10 \
  --adapter-path adapters/skill-substrate \
  --seed 1069
```

## Model Recommendations by RAM

| RAM | Model | Batch Size |
|-----|-------|------------|
| 16GB | Qwen2.5-0.5B-4bit | 4 |
| 24GB | Qwen2.5-Coder-7B-4bit | 2 |
| 32GB | Qwen2.5-14B-4bit | 1 |
| 64GB+ | Qwen2.5-32B-4bit | 1 |
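The table above can be encoded as a small lookup helper (a hypothetical convenience, with model names abbreviated exactly as in the table):

```python
# (RAM floor in GB, model, batch size) taken from the table above
MODEL_BY_RAM = [
    (64, "Qwen2.5-32B-4bit", 1),
    (32, "Qwen2.5-14B-4bit", 1),
    (24, "Qwen2.5-Coder-7B-4bit", 2),
    (16, "Qwen2.5-0.5B-4bit", 4),
]

def recommend(ram_gb: int):
    """Pick the largest preset whose RAM floor fits the machine."""
    for floor, model, batch in MODEL_BY_RAM:
        if ram_gb >= floor:
            return model, batch
    raise ValueError("Need at least 16GB RAM for these presets")
```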

## Inference with Adapter

```bash
mlx_lm.generate \
  --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
  --adapter-path adapters/skill-substrate \
  --prompt "What is the acsets skill?"
```

## Fuse Adapter into Model

```bash
mlx_lm.fuse \
  --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
  --adapter-path adapters/skill-substrate \
  --save-path models/skill-substrate-7B
```

## GF(3) Conservation Check

```sql
SELECT
    'skills' as source, COUNT(*) as n, SUM(trit) as gf3,
    CASE WHEN SUM(trit) % 3 = 0 THEN '✓' ELSE '⚠' END as status
FROM skills
UNION ALL
SELECT 'history', COUNT(*), SUM(trit),
    CASE WHEN SUM(trit) % 3 = 0 THEN '✓' ELSE '⚠' END
FROM claude_history;
```

## Troubleshooting

### MLX float16_t Error

**Symptom**: `error: no member named 'float16_t' in the global namespace`

**Cause**: Running under Rosetta (x86_64) instead of native arm64.

**Fix**:
```bash
# Check current arch
arch  # If 'i386', you're in Rosetta

# Option 1: Force arm64
arch -arm64 /bin/zsh -c "mlx_lm.lora ..."

# Option 2: Use native Terminal.app
# Finder > Applications > right-click Terminal > Get Info > uncheck "Open using Rosetta"
```

### Out of Memory

Reduce batch-size or num-layers:
```bash
--batch-size 1 --num-layers 4
```

## CLI Quick Reference

```bash
# Status check
duckdb ~/skill-substrate/skill_corpus.duckdb -c \
  "SELECT source, COUNT(*), SUM(trit) FROM (
    SELECT 'skills' as source, trit FROM skills
    UNION ALL SELECT 'history', trit FROM claude_history
  ) GROUP BY source;"

# Regenerate training data
duckdb ~/skill-substrate/skill_corpus.duckdb -c \
  "COPY (SELECT * FROM training_candidates LIMIT 1000)
   TO 'new_train.jsonl' (FORMAT JSON);"

# Test model
mlx_lm.chat --model mlx-community/Qwen2.5-Coder-7B-Instruct-4bit \
  --adapter-path adapters/skill-substrate
```

## Files

```
~/skill-substrate/
├── skill_corpus.duckdb       # Main database
├── combined_train.jsonl      # All training examples
├── skill_knowledge.jsonl     # Skill-derived examples
├── skills_train.jsonl        # History-derived examples
├── generate_skill_data.py    # Skill extraction script
├── train_data/
│   ├── train.jsonl          # 80%
│   ├── valid.jsonl          # 10%
│   └── test.jsonl           # 10%
└── adapters/
    └── skill-substrate/     # LoRA weights
```

## Related Skills

- `acsets` - Schema design foundation
- `duckdb-ies` - Interactome analytics
- `gay-mcp` - Deterministic coloring
- `mlx-whisper` - Audio transcription (same MLX stack)

## GF(3) Triad

| Trit | Role | Skill |
|------|------|-------|
| -1 | Data source | duckdb-ies |
| 0 | Orchestrator | **local-finetune** |
| +1 | Model output | mlx-lm inference |

Conservation: (-1) + (0) + (+1) = 0 ✓
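The conservation rule amounts to checking that the trit sum vanishes mod 3; a one-line Python equivalent of the SQL check above:

```python
def gf3_conserved(trits):
    """True when the trit sum is 0 in GF(3), as in the SQL conservation check."""
    return sum(trits) % 3 == 0
```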