
The sft-wagmi pipeline: deliberately rudimentary

How we fine-tune Wagmi today: JSONL in a folder, a Python launcher that shells out, notebooks as escape hatches, and no pretence of enterprise MLOps.

The repository DealExMachina/sft-wagmi holds our supervised fine-tuning (SFT) workflow for Wagmi, the small assistant that backs part of this site. The sibling repo DealExMachina/dexm-one-page generates the training data. This post describes the glue between them — and why we are comfortable calling it rudimentary.

If you want the product story (RAG + SFT + autotune on a 1.5B CPU model), read Taming a Small Model on CPU first. Here we stay close to the filesystem and the shell.


1. What “rudimentary” means here

Not broken — small-surface. There is no feature store, no lineage service, no Kubernetes operator for training jobs. Instead:

  • Flat files: train.jsonl, eval.jsonl, and metadata.json sit under sft-wagmi/data/. They are plain chat-formatted JSONL, good enough for Unsloth and similar trainers.
  • A thin orchestrator: scripts/pipeline.py runs preflight checks, optionally calls npm run dataset:wagmi:refresh in a checked-out copy of dexm-one-page next door, then executes baseline.py, train.py, autotune.py, eval_sft.py, eval_sft_rag.py, and export_gguf.py in sequence when you pass --all. It is mostly subprocess.run and path checks — not a workflow engine.
  • Notebooks as fallback: If a .py step is missing, the launcher can try jupyter nbconvert --execute on the matching notebook. That is a compatibility shim, not a design goal.
  • Secrets via env: HF_TOKEN, OPENAI_API_KEY, optional .env in the repo root — nothing fancier.
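The bullets above amount to a very small launcher. As a minimal sketch of that shape (illustrative only, not the actual pipeline.py; step and directory names are assumptions):

```python
import os
import subprocess
import sys
from pathlib import Path

DATA_FILES = ["train.jsonl", "eval.jsonl", "metadata.json"]
STEPS = ["baseline", "train", "autotune", "eval_sft", "eval_sft_rag", "export_gguf"]

def preflight(repo: Path) -> None:
    """Fail fast on missing data files; nag (but do not abort) on missing secrets."""
    missing = [f for f in DATA_FILES if not (repo / "data" / f).exists()]
    if missing:
        sys.exit(f"preflight: missing {missing} under data/")
    for var in ("HF_TOKEN", "OPENAI_API_KEY"):
        if not os.environ.get(var):
            print(f"preflight: warning, {var} is not set")

def run_step(repo: Path, step: str) -> None:
    """Prefer the .py script; fall back to executing the matching notebook."""
    script = repo / "scripts" / f"{step}.py"
    notebook = repo / "notebooks" / f"{step}.ipynb"
    if script.exists():
        subprocess.run([sys.executable, str(script)], check=True)
    elif notebook.exists():
        # The compatibility shim: execute the notebook in place of the script.
        subprocess.run(
            ["jupyter", "nbconvert", "--execute", "--to", "notebook", str(notebook)],
            check=True,
        )
    else:
        sys.exit(f"no script or notebook for step {step!r}")
```

That is essentially the whole control flow: a list of step names, a loop of subprocess.run calls, and path checks in between.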

The point is to ship a credible small-model behaviour for one product, not to win a platform bake-off.


2. Where the data comes from

Authoritative generation happens in dexm-one-page:

npx tsx scripts/generate-wagmi-sft-dataset.ts

That script walks the blog, wagmi-skills.md, ai.txt, optional Obsidian notes (OBSIDIAN_VAULT_PATH + wagmi_sft: true), and synthetic guardrail rows, then writes datasets/wagmi-sft/*.jsonl. Row counts and tag histograms live in datasets/wagmi-sft/metadata.json after each run — treat that file as the source of truth, not a README table that went stale last week.
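For a concrete sense of what lands in those files, here is a hypothetical row in the common chat-message convention (the exact keys and tag values are assumptions, not the generator's confirmed schema), plus the trivial row count that metadata.json should agree with:

```python
import json

# Hypothetical JSONL row: standard messages/role/content chat format.
# The actual field names in datasets/wagmi-sft/*.jsonl may differ.
row = {
    "messages": [
        {"role": "user", "content": "What is Wagmi?"},
        {"role": "assistant", "content": "A small assistant that backs part of this site."},
    ],
    "tags": ["guardrail"],
}

def count_rows(jsonl_text: str) -> int:
    """Count non-empty JSONL lines; the row counts in metadata.json
    should match this, which is why metadata is the source of truth."""
    return sum(1 for line in jsonl_text.splitlines() if line.strip())
```

Checking count_rows against the counts recorded in metadata.json is a cheap way to catch a stale or truncated copy before training starts.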

Copying into sft-wagmi/data/ is either manual or handled by npm run dataset:wagmi:sync / dataset:wagmi:refresh from dexm, depending on how your trees are laid out. The pipeline assumes the three files exist before training.
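The manual variant of that sync is just a three-file copy. A sketch, assuming the generator's output names match what sft-wagmi expects (the npm scripts do the equivalent when the two repos sit side by side):

```python
import shutil
from pathlib import Path

def sync_dataset(dexm_root: Path, sft_root: Path) -> list:
    """Copy the generated JSONL + metadata from dexm-one-page into sft-wagmi/data/.

    Assumes dexm writes train.jsonl / eval.jsonl / metadata.json under
    datasets/wagmi-sft/ -- adjust the names if your generator differs.
    """
    src = dexm_root / "datasets" / "wagmi-sft"
    dst = sft_root / "data"
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in ("train.jsonl", "eval.jsonl", "metadata.json"):
        source = src / name
        if source.exists():
            shutil.copy2(source, dst / name)
            copied.append(name)
    return copied
```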


3. What pipeline.py actually does

Rough order when you run python3 scripts/pipeline.py --all:

  1. Preflight — Verifies data/*.jsonl, looks for Python scripts, nags if HF_TOKEN or OPENAI_API_KEY is missing.
  2. Sync — If ../dexm-one-page exists, runs npm run dataset:wagmi:refresh there; otherwise skips with a message.
  3. Baseline — baseline.py (or notebook): measure the base model before SFT.
  4. Train — train.py (or train.ipynb): Unsloth + LoRA on Qwen2.5-1.5B-Instruct (the MODEL_PROFILE setting, small vs auth, switches paths and model family).
  5. Autotune — Optional judge-and-correct loop (autotune.py); needs a capable closed-model API. Expensive and opinionated — we use it sparingly.
  6. Eval / eval-rag — Script evals with and without retrieval context.
  7. Export — export_gguf.py: merge the adapter, convert to GGUF, quantize for CPU inference (Ollama or llama.cpp-style tooling).

--dry-run only prints commands; useful when you are wiring a new machine.
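The dry-run behaviour is the simplest part of the launcher: same command list, different sink. A sketch of the idea (flag names mirror the post; the internals are illustrative):

```python
import argparse
import shlex
import subprocess

def run(cmd, dry_run: bool) -> None:
    """Print the command in --dry-run mode; execute it otherwise."""
    if dry_run:
        print("DRY-RUN:", shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)

def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="sft-wagmi launcher (sketch)")
    parser.add_argument("--all", action="store_true", help="run every step")
    parser.add_argument("--dry-run", action="store_true", help="print commands only")
    return parser.parse_args(argv)
```

On a new machine, running with --dry-run first shows exactly which scripts and notebooks would fire before anything touches the GPU.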


4. Honest limitations

  • Path coupling: Default sync assumes dexm-one-page sits beside sft-wagmi on disk. Rename or move clones and you adjust paths or sync by hand.
  • Single-machine mindset: We run on one GPU box (e.g. Hugging Face Spaces or a rented L40), not a queue of jobs with autoscaling.
  • Autotune is not science: Scores depend on the judge model and prompts; it is a heuristic loop, not a published benchmark.
  • Documentation drift: The README in sft-wagmi may still show older row counts. Regenerate metadata in dexm after content changes.

5. Why keep it this way

For a 1.5B instruct model scoped to one company’s public voice, a rudimentary pipeline is fast to change: edit the generator in TypeScript, re-export JSONL, retrain, push an adapter. The complexity we refuse to add (for now) is the complexity we do not have to operate at midnight.

When the cost of coordination exceeds the cost of a few manual steps, we will promote pieces into something stricter. Until then, this is the honest shape of the system: files in, weights out, with a short Python script holding the checklist.


Further reading: sft-wagmi README · dexm generate-wagmi-sft-dataset.ts · Obsidian → SFT notes (dataset ingestion section) · RAG + SFT + autotune article