prospector/docs/features/training-loop.md
Natalie 1fa1787dd4
docs(prospector): fix unverified claims found by the doc-review workflow
Multi-agent review against the real repo confirmed 3 accuracy errors (the
design docs were correctly cleared as forward-looking, not state claims):
- ai-system-plan: drop '95% terse' — score.py emits only on-voice/location/
  malformed; cite those.
- tooling/eval/README: pseudonym is RQ_NN only (extract.py), not THREAD_NN.
- training-loop: mark PROSPECTOR_TRAINING.md as an external Executor doc not
  yet in this repo (also dangling-cited in fast-classifier.ts:4).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 11:14:54 -04:00

# Training loop — CoT-labeled corpus → LoRA → eval-gated build flip
> How the OSS models ([`draft-engine.md`](./draft-engine.md)) actually improve from
> Quinn's **10K+ message history**. The thesis: **don't train on raw
> `(inbound → reply)` pairs.** First enrich every historical turn into a structured,
> reasoning-bearing record — that record is the durable asset; the weights are
> downstream and disposable. One extraction feeds classifier training, generator
> training, the prompt-level workflow library, the hard-example miner, and the eval
> set.
>
> Model policy unchanged: serving is OSS-on-GPU (no Claude in the runtime loop). The
> **offline labeling pass** may use a strong teacher (incl. hosted Claude) — it
> analyzes history, it does not generate adult copy at runtime.
## 1. The two producers (provenance is a routing key, not metadata)
Quinn's recent good outbound came from two distinct systems. They map onto the
engine's two draft modes and two of the three AI roles
([`ai-first-v4.md`](./ai-first-v4.md)):
| Producer | What it does | Engine mode | Trains |
|---|---|---|---|
| **matcher** | classify convo → campaign, classify inbound, **match** to a reply in the campaign's set | `template` (selection) | the **classifier** + the **retrieval / move-selection** policy |
| **agent** | reads the full convo, **generates** a freeform reply | `do-gpu` (generative `prospect.draft`) | the **generative message-generator** (voice) |
**Do not pool the 10K into one SFT set.** Matcher outputs are *selections from a
fixed library* — training the generator on them teaches it to parrot canned copy
(no generalization). So `source ∈ {matcher, agent}` **dispatches** each example to
the model it can actually improve.
## 2. The per-turn extraction record (the asset)
For every turn in every historical thread, produce one record keyed to the engine's
**existing** vocabulary — invent nothing:
```jsonc
{
  "read": { "archetype": "...", "atoms": { /* ProspectAtomsV4, src/engine/atoms.ts */ } },
  "state": { "stage": "...", "transition": "..." }, // engine/state.ts state machine
  "move": "qualify|screen|deflect-price|book|tease|re-engage|disengage", // ← NEW label, the gold
  "constraints": { "noPriceInWriting": true, "humanOwned": false, "bookingTriad": "partial" }, // Gate-2 facts honored
  "outcome": "booked|engaged|ghosted|blocked|scam", // mined from the thread tail = reward/weight
  "source": { "actor": "agent|matcher|human|runner", "agentId": "...", "context": "..." },
  "reply": "<the actual message sent>"
}
```
- **`read.atoms`** = the canonical **22-atom `ProspectAtomsV4`** (`src/engine/atoms.ts`).
Reuse the existing defensive parser; do not define a parallel schema.
- **`move`** is the one field not yet in code (~6 to 10 recurring plays). It is the
highest-value extraction; see §4.
- **`outcome`** is mined from how the thread actually ended, not asserted.
- **`source`** is the v4 governance **actor attribution** applied retroactively to
history (the same column the going-forward pipeline adds to `prospect_drafts`).
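
A minimal sketch of how the record above could be checked during the labeling pass, assuming the records are loaded into Python. The `TurnRecord` class and its `problems` method are hypothetical names, and the label sets are copied from the example record rather than from the engine code, so they should be replaced by the real enums once the schema is locked.

```python
from dataclasses import dataclass
from typing import Any

# Assumption: vocabularies copied from the example record in this section.
MOVES = {"qualify", "screen", "deflect-price", "book", "tease", "re-engage", "disengage"}
OUTCOMES = {"booked", "engaged", "ghosted", "blocked", "scam"}
ACTORS = {"agent", "matcher", "human", "runner"}


@dataclass
class TurnRecord:
    """One per-turn extraction record (hypothetical Python mirror of the JSON)."""
    read: dict[str, Any]        # {"archetype": ..., "atoms": {...}}
    state: dict[str, str]       # {"stage": ..., "transition": ...}
    move: str
    constraints: dict[str, Any]
    outcome: str
    source: dict[str, str]      # {"actor": ..., "agentId": ..., "context": ...}
    reply: str

    def problems(self) -> list[str]:
        """Return schema violations as readable strings (empty list if none)."""
        found = []
        if self.move not in MOVES:
            found.append(f"unknown move: {self.move!r}")
        if self.outcome not in OUTCOMES:
            found.append(f"unknown outcome: {self.outcome!r}")
        if self.source.get("actor") not in ACTORS:
            found.append(f"unknown actor: {self.source.get('actor')!r}")
        if not self.reply.strip():
            found.append("empty reply")
        return found
```

Returning a list of problems instead of raising keeps the ETL pass running over all 10K turns while still surfacing every malformed record.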
> **To confirm against code when locking the schema:**
>
> - the full 22-atom enum set in `src/engine/atoms.ts`;
> - the `move` taxonomy, derived from `PROSPECTOR_TRAINING.md` plus the campaign
>   reply sets. `PROSPECTOR_TRAINING.md` is an **external Executor doc, not yet in
>   this repo** (it is also dangling-cited at `src/engine/fast-classifier.ts:4`);
>   create it here when the taxonomy is locked;
> - where the 10K physically lives: Apple Messages via macsync, or a legacy LP
>   export. `prospect_drafts` is going-forward only, and the macsync client is
>   outbox-only.
>
> Step zero is an **export → structure** job.
## 3. CoT labeling = rationalization (backward from a known-good answer)
We already *know* the good reply — it is Quinn's actual message on a thread that
**converted**. So we generate the reasoning **backward**, conditioned on the answer
(STaR / "distilling step-by-step" / rejection-sampled rationales):
1. Teacher reads `(thread context, atoms, state)` **and the known reply**.
2. Teacher emits the trace: *read → chosen move → constraints honored → therefore
this reply*.
3. **Reject any rationale that does not reconstruct her actual move** — automatic
quality filter; bad rationales never enter the corpus.
Backward rationales are far higher quality than forward reasoning because they are
anchored to a verified outcome. The SFT target becomes
`(context, read, state) → [reasoning] → reply`, which distills the **decision
procedure**, not the surface wording. Keep the trace at inference (explainable —
feeds Plane-2 `prospector_explain`) or distill it away for latency.
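
The rejection step can be sketched as a simple filter. This assumes each record already carries a `move` label and that the teacher is exposed as a callable returning a dict with `move` and `reasoning` keys; the `Teacher` type and the function name `keep_reconstructing` are hypothetical, not an existing interface.

```python
from typing import Callable, Iterable

# Hypothetical teacher interface: receives the context and the known reply,
# returns a trace dict such as {"move": "...", "reasoning": "..."}.
Teacher = Callable[[dict, str], dict]


def keep_reconstructing(records: Iterable[dict], teacher: Teacher) -> list[dict]:
    """Keep only records whose teacher trace arrives at the labeled move."""
    kept = []
    for rec in records:
        context = {"read": rec["read"], "state": rec["state"]}
        trace = teacher(context, rec["reply"])
        if trace.get("move") != rec["move"]:
            # The rationale does not explain the move actually made: reject it.
            continue
        kept.append({**rec, "reasoning": trace.get("reasoning", "")})
    return kept
```

Rejected records are only dropped from the rationale corpus here; whether they are retried with another sample or sent to manual review is left open.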
## 4. The shortcut: distill agent wins into the matcher library
The matcher is cheap (classify + retrieve, no generation); the agent is expensive
(GPU generation). The self-improving cascade turns expensive wins into cheap future
coverage:
```
inbound → classify (cheap)
  ├── good match in campaign library? ─── matcher emits it (cheap) ✅
  └── novel / no good match? ──────────── escalate to agent (generate, costly)
        └── agent reply converts?
              └── PROMOTE it into the campaign
                  library as a new matchable
                  reply, keyed by (archetype ×
                  state × move)
```
Every generator win on a novel situation becomes a new entry the matcher can select
next time → **coverage grows, fewer inbounds need the generator, cost falls while
quality rises.** This *is* "shortcut improvements": you grow behavior at
**prompt/library speed (hours)** instead of **retraining speed (days)**, and only
LoRA when the prompt/library layer saturates. The promoted replies are literally new
`appliesTo: { archetype, state, templateKey }` entries in the
[`draft-engine.md`](./draft-engine.md) CoT-workflow library.
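
The cascade can be illustrated with an in-memory library. This is a sketch under stated assumptions: `ReplyLibrary`, `draft`, and `record_result` are hypothetical names, the key is the `(archetype, state, move)` triple from the diagram, and whether a reply "converted" is decided by the caller.

```python
from typing import Callable, Optional

LibraryKey = tuple[str, str, str]  # (archetype, state, move)


class ReplyLibrary:
    """Hypothetical in-memory stand-in for a campaign reply library."""

    def __init__(self) -> None:
        self._entries: dict[LibraryKey, str] = {}

    def match(self, key: LibraryKey) -> Optional[str]:
        return self._entries.get(key)

    def promote(self, key: LibraryKey, reply: str) -> None:
        # Keep the first promoted reply for a key; later wins do not overwrite it.
        self._entries.setdefault(key, reply)


def draft(key: LibraryKey, library: ReplyLibrary,
          generate: Callable[[LibraryKey], str]) -> tuple[str, str]:
    """Return (reply, producer): the cheap match if one exists, else generate."""
    hit = library.match(key)
    if hit is not None:
        return hit, "matcher"
    return generate(key), "agent"


def record_result(key: LibraryKey, reply: str, producer: str,
                  converted: bool, library: ReplyLibrary) -> None:
    """Promote a converting agent reply so the matcher can select it next time."""
    if producer == "agent" and converted:
        library.promote(key, reply)
```

After one converting agent reply for a key, the next `draft` call with the same key returns the promoted reply with producer `"matcher"`, which is the cost shift the section describes.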
## 5. Data routing — provenance routes, outcome admits
`source` decides *which model* an example trains; `outcome` decides *whether it is
admitted at all*. Both gates always apply (training on agent output is
self-distillation — keep only the wins, or you reinforce mistakes):
| Source | Good outcome → | Bad outcome → |
|---|---|---|
| **matcher** | classifier SFT + retrieval positive | retrieval negative; **and a "library gap" signal** (novel → had to escalate) marking exactly where the matcher needs new entries |
| **agent** | generator voice SFT + **library-promotion candidate** (§4) | dropped (bad voice example) |
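
The table can be read as a small routing function. In this sketch the destination names are hypothetical labels for the training sets, and treating `booked` and `engaged` as the "good" outcomes is an assumption; the document does not fix where the line falls.

```python
# Assumption: which outcomes count as "good" is not specified in this doc.
GOOD_OUTCOMES = {"booked", "engaged"}


def route(record: dict) -> list[str]:
    """Return the training destinations for one record (empty list = dropped)."""
    actor = record["source"]["actor"]
    good = record["outcome"] in GOOD_OUTCOMES
    if actor == "matcher":
        if good:
            return ["classifier_sft", "retrieval_positive"]
        return ["retrieval_negative", "library_gap"]
    if actor == "agent":
        if good:
            return ["generator_sft", "promotion_candidate"]
        return []  # bad voice example: dropped
    # Rows from other actors (human, runner) are not covered by the table,
    # so this sketch leaves them unrouted.
    return []
```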
Three derived sets fall out of the same extraction — the data-efficiency multipliers:
- **Hard-example set (active learning).** Run the current `fast-classifier` over all
10K; diff its atoms/archetype vs the CoT ground truth. **Train only on the
disagreements** — where you are currently wrong. Biggest efficiency lever.
- **DPO contrast pairs.** Same `archetype × state`, one reply converted, one ghosted
→ a direct preference pair. The `original_body → corrected_body` rows in
`prospect_corrections` are ready-made pairs.
- **Held-out eval set.** Atom-stratified split (10 to 20%) → honest metrics. With 10K
this is real; at 25 examples (the current rule-classifier's set) the eval gate was
theater.
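
Two of the derived sets can be sketched directly from the per-turn records. The `classify` callable stands in for the current `fast-classifier` and is hypothetical, as are the function names; using `booked` as "converted" and `ghosted` as the rejected side is an assumption based on the outcome labels in §2.

```python
from collections import defaultdict
from itertools import product
from typing import Callable


def hard_examples(records: list[dict], classify: Callable[[dict], dict]) -> list[dict]:
    """Records where the current classifier disagrees with the labeled read."""
    out = []
    for rec in records:
        predicted = classify(rec)  # expected shape: {"archetype": ..., "atoms": {...}}
        gold = rec["read"]
        if (predicted.get("archetype") != gold["archetype"]
                or predicted.get("atoms") != gold["atoms"]):
            out.append(rec)
    return out


def contrast_pairs(records: list[dict], chosen: str = "booked",
                   rejected: str = "ghosted") -> list[dict]:
    """Pair replies that share archetype and stage but ended differently."""
    buckets = defaultdict(lambda: {"chosen": [], "rejected": []})
    for rec in records:
        key = (rec["read"]["archetype"], rec["state"]["stage"])
        if rec["outcome"] == chosen:
            buckets[key]["chosen"].append(rec["reply"])
        elif rec["outcome"] == rejected:
            buckets[key]["rejected"].append(rec["reply"])
    pairs = []
    for key, bucket in buckets.items():
        # Full cross product per bucket; cap or sample it if buckets are large.
        for good, bad in product(bucket["chosen"], bucket["rejected"]):
            pairs.append({"key": key, "chosen": good, "rejected": bad})
    return pairs
```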
## 6. Training & serving mechanics
- **Never train on the serving droplet.** Training is an hours-long, 100%-GPU batch
job; it would starve live inference. It runs on a **separate transient droplet**
(spin up → produce adapter → write to storage → tear down). This does **not**
violate "1 inference droplet until I complain" — that decision governs *serving*
topology, not an offline job.
- **LoRA/QLoRA, not full fine-tune.** 10K is comfortably above the LoRA threshold
(hundreds to low thousands), well below full-FT scale. Tiny adapters, cheap, and,
crucially, the serving GPU does **multi-LoRA**: one resident base + per-task adapters
(classifier, draft) swapped per request at ~zero cost. This is why one droplet
serves all roles concurrently (continuous batching + multi-LoRA + the
`ChatPriority` queue in `src/gpu/types.ts`). See
[`gpu-cost-control.md`](./gpu-cost-control.md).
- **Eval-gated build flip.** New build → held-out eval → metrics **and** Gate-2
safety must pass → only then flip `draft_engine` / the task-registry version. The
`do-gpu-<model>_<build>` convention + per-decision engine-id recording let
corrections bucket per build, closing the loop.
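
The gate in the last bullet could look roughly like the following. `EvalReport`, `should_flip`, and the threshold dictionary are hypothetical; only the `do-gpu-<model>_<build>` naming comes from this document, and the metric names and floors are placeholders to be chosen when the eval set exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalReport:
    """Hypothetical result of running a new build over the held-out set."""
    metrics: dict[str, float]
    safety_passed: bool  # outcome of the Gate-2 safety eval


def should_flip(report: EvalReport, thresholds: dict[str, float]) -> bool:
    """Allow the flip only if safety passed and every metric meets its floor."""
    if not report.safety_passed:
        return False
    # A metric missing from the report counts as a failure, not a pass.
    return all(
        report.metrics.get(name, float("-inf")) >= floor
        for name, floor in thresholds.items()
    )


def engine_id(model: str, build: str) -> str:
    """Build the engine id following the do-gpu-<model>_<build> convention."""
    return f"do-gpu-{model}_{build}"
```

Treating a missing metric as a failure keeps a partially run eval from flipping a build by accident.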
## 7. Sequencing
1. **Export → structure → label** the 10K into per-turn records (§2, §3). The first
real work is ETL + the labeling pass, not training.
2. **Classifier first** — labels come free from matcher classify decisions + outcome
mining; cleanest eval; LoRA the hard-example set (§5).
3. **Message-generator second** — voice SFT on **agent** wins (filtered by outcome),
then DPO on contrast pairs. Uncensored OSS base. Hard-gated by safety eval.
4. **Orchestrator: never trained** — it needs tool-calling + instruction-following,
and zero orchestration transcripts exist. Strong OSS instruct model + MCP tool
schemas + few-shot. (Tier A, [`ai-first-v4.md`](./ai-first-v4.md).)
5. **Compounding loop:** label → `{read, move, outcome, source}` → (a) library/prompt
wins now → (b) hard-example LoRA when saturated → (c) eval gate → (d) the new
build's corrections re-feed the labels.
## 8. Invariant
Training never widens the send floor. A new build only changes *which body* is
proposed (matched, generated, or `template` fallback); Gate-2, the kill-switch, and
the macsync-outbox floor are untouched, and the eval gate proves safety before any
build reaches the auto-send path. A cold/failed/unproven model degrades to the
`template` fallback — never to an unsafe or placeholder send.
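
The degrade-to-template rule can be sketched as a single selection step that runs before the unchanged send-side checks. The function name `choose_body`, its parameters, and the `"model"` mode label are hypothetical; Gate-2, the kill-switch, and the outbox floor are assumed to apply afterwards regardless of which body is returned.

```python
from typing import Optional


def choose_body(candidate: Optional[str], build_proven: bool,
                template_body: str) -> tuple[str, str]:
    """Return (body, mode); anything short of a proven, non-empty draft falls back."""
    if build_proven and candidate is not None and candidate.strip():
        return candidate, "model"
    # Cold, failed, or unproven model: propose the template body instead.
    return template_body, "template"
```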