Bake-off harness in src/eval/ with Claude as offline labeler/judge/advisor (never in the serving loop). Per-role scoring (classifier F1, generator refusal+voice+policy+85% gate, orchestrator tool-call), replay harness to fix Executor cycle-1's no-batch-replay blocker, researched candidate roster (de-refused instruct base + Quinn-voice LoRA over heavy RP fine-tunes). Reuses outcomes.jsonl/gold-turnpairs/RUNNER-POLICY.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

||
|---|---|---|
| ai-first-v4.md | ||
| deploy.md | ||
| draft-engine.md | ||
| gpu-cost-control.md | ||
| mcp.md | ||
| model-eval-pipeline.md | ||
| training-loop.md | ||