Replace the brittle keyword verdict with an LLM-consolidated rating profile per
caller, and capture the COMPLETE report history instead of the first screen.
- open_report_detail(): land on the caller detail page (taps the Recent-lookups
row when the number was searched before) — fixes the 0-reports regression
- expand_all_reports() + capture_full_history(): tap "View all N", scroll-capture
every page until the UI dump stops changing; merge_reports() dedupes across pages
- build_rating_profile() (batch SDK, sonnet): 0-100 score + A–F grade + per-axis
sub-scores (reliability/payment/respect/safety) + signals + nuanced_notes.
Domain nuance: deposit mentions weight POSITIVE; law-enforcement forces denied
- result_from_profile(): honors recommendation, score fallback, hard safety override
- decide_result(): kept as deterministic fallback, fixed to never approve over a
model 'denied' / red flag and to match punctuation variants (no-show == no show)
- save_history(): persist full consolidated history + profile per caller
- tests: 18/18 (mapping, dedupe, safety override, full flow); DESIGN.md updated
Verified live against the redroid droplet (45.55.191.82): 15166687821 → 3 reports
consolidated → 18/100 grade F → denied, with multi-axis breakdown.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>