prospector/tooling/eval/score.py
Natalie 19c578bead
feat(prospector): add tooling/eval draft-engine bake-off harness
Validated OSS (Qwen3.6-27B-AEON-Uncensored) Quinn-voice drafting against the
agent-matcher reply-queue baseline. Four methodology fixes eliminate the early
weaknesses: json_schema strict (0% malformed), canon few-shot (100% on-voice),
current-facts/location-from-context (0 location errors), and classify-move-first
then reply (matcher-level discipline on defensive moves: withhold address,
redirect harvesters+crude to OF). PII stays under gitignored .data/; scripts
only. Claude is the offline judge/advisor, never the runtime generator.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 01:47:56 -04:00
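The commit's "json_schema strict" and "classify-move-first then reply" fixes can be sketched as a structured-output request in which the model must emit a move label before the reply text. This is a minimal sketch that assumes an OpenAI-compatible `response_format` of type `json_schema`. It also assumes the serving backend emits keys in schema order; OpenAI's Structured Outputs documentation states this, but other servers may differ. The schema name `draft`, the `move`/`reply` field names and the helpers `build_response_format`/`parse_draft` are hypothetical. The enum reuses the move vocabulary from the `FAM` table in score.py and is not the harness's actual schema.

```python
import json

# Hypothetical move vocabulary, borrowed from the values of FAM in score.py.
MOVES = ["opener", "subhour", "address", "out-of-area", "of"]


def build_response_format(moves=MOVES):
    """Return an OpenAI-style response_format dict (assumed shape) that forces
    a JSON object with a classified move first, then the drafted reply."""
    schema = {
        "type": "object",
        "properties": {
            # Listed first so the model commits to a move before drafting.
            "move": {"type": "string", "enum": list(moves)},
            "reply": {"type": "string"},
        },
        # Strict mode expects every property to be required and no extras allowed.
        "required": ["move", "reply"],
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {"name": "draft", "strict": True, "schema": schema},
    }


def parse_draft(raw, moves=MOVES):
    """Parse a model response; return (move, reply), or None if malformed."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or obj.get("move") not in moves:
        return None
    reply = obj.get("reply")
    if not isinstance(reply, str) or not reply.strip():
        return None
    return obj["move"], reply
```

A `None` from `parse_draft` corresponds to what score.py counts as malformed, which is an empty `oss_reply`.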

#!/usr/bin/env python3
"""Score results.json: malformed %, on-voice %, and move agreement vs. the matcher.

The agent-matcher's tmpl is the move baseline. We report how often the OSS model's
classified move agrees with it and list each disagreement for review. The only
PII printed to stdout is the first 60 characters of the client's last line
(needed to judge each case), so keep this terminal local.

Env: DATA_DIR (default: .data/ next to this script).
"""
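import json
import os

# Results live under the gitignored .data/ next to this script unless DATA_DIR overrides it.
DATA = os.environ.get("DATA_DIR", os.path.join(os.path.dirname(__file__), ".data"))
with open(os.path.join(DATA, "results.json"), encoding="utf-8") as f:
    r = json.load(f)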
# Map matcher tmpl names onto our move vocabulary.
FAM = {"opener": "opener", "opener-q": "opener", "opener-pink": "opener",
       "subhour": "subhour", "address": "address", "napa": "out-of-area", "of": "of"}
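def voiced(s):
    """Crude on-voice check: does the reply contain any Quinn-voice marker?"""
    # Treat a missing (None) reply as empty instead of crashing on .lower().
    return any(w in (s or "").lower() for w in ["hun", "babe", "💗", "😘", "🥰"])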
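def agrees(x):
    """True if the OSS move matches the matcher's tmpl family.

    A tmpl with no FAM mapping never counts as agreement, so a missing
    oss_move cannot "match" a missing mapping (None == None).
    """
    expected = FAM.get(x.get("tmpl"))
    return expected is not None and expected == x.get("oss_move")


n = len(r)
# An empty or missing reply is malformed (no usable draft).
malformed = sum(1 for x in r if not x.get("oss_reply"))
on_voice = sum(voiced(x.get("oss_reply") or "") for x in r)
move_match = sum(1 for x in r if agrees(x))
pct = 100 * malformed // n if n else 0  # guard against an empty results file
print(f"n={n} malformed={malformed} ({pct}%) "
      f"on-voice={on_voice}/{n} move-agrees-matcher={move_match}/{n}")

dis = [x for x in r if not agrees(x)]
if dis:
    print("\nmove disagreements (matcher_tmpl -> oss_move):")
    for x in dis:
        client = (x.get("their_last") or "")[:60]  # truncate to limit PII on screen
        print(f"  [{x.get('id')}] {x.get('tmpl')} -> {x.get('oss_move')} | client: {client}")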
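The same three metrics can also be computed by a pure function over the loaded records rather than at module level, which makes them easy to unit-test and lets disagreements be tallied by (expected, got) pair. This is a minimal sketch that assumes records shaped like results.json entries (`tmpl`, `oss_move`, `oss_reply`). The name `score_records` and its `confusion` counter are hypothetical additions, not part of score.py.

```python
from collections import Counter

# Copied from score.py: matcher tmpl -> move vocabulary, and the voice markers.
FAM = {"opener": "opener", "opener-q": "opener", "opener-pink": "opener",
       "subhour": "subhour", "address": "address", "napa": "out-of-area", "of": "of"}
VOICE_MARKERS = ("hun", "babe", "💗", "😘", "🥰")


def score_records(records, fam=FAM, markers=VOICE_MARKERS):
    """Return malformed/on-voice/move-agreement counts plus a Counter of
    (expected_move, oss_move) pairs for every disagreement."""
    n = len(records)
    malformed = on_voice = agree = 0
    confusion = Counter()
    for x in records:
        reply = x.get("oss_reply") or ""
        if not reply:
            malformed += 1  # empty or missing draft
        elif any(m in reply.lower() for m in markers):
            on_voice += 1
        expected = fam.get(x.get("tmpl"))
        got = x.get("oss_move")
        if expected is not None and expected == got:
            agree += 1
        else:
            confusion[(expected, got)] += 1
    return {"n": n, "malformed": malformed, "on_voice": on_voice,
            "move_agree": agree, "confusion": confusion}
```

This keeps the original's substring heuristic, so "hun" also fires inside words such as "hunt". A word-boundary check like `re.search(r"\bhun\b", reply, re.I)` would be stricter.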