@mr-number — Design
How this app is structured, why it's shaped this way, and how a screening flows
end-to-end. Companion to the top-level README.md (which is the usage guide); this
document is the why.
1. What problem it solves
Quinn screens inbound phone numbers against Mr. Number (com.mrnumber.blocker,
by Hiya) — a paid app whose crowdsourced caller reports flag no-shows, abusive
clients, timewasters, and law-enforcement stings. Mr. Number has no public API;
the reports only exist inside the Android app's UI. So the only way to extract them
programmatically is to drive the app like a human and read the screen.
This app automates that: drive the app over adb → screenshot every page of the caller's report history → vision-extract the reports → consolidate them into a rating profile → decide a verdict → record it into the platform's screening service. The recorded verdict then feeds reputation events, client filters, and the prospect-reply gate.
2. Design principles
These are the constraints that shaped every decision below.
- Standalone supporting app, not a platform feature. It lives in its own repo (`~/Code/@applications/@mr-number/`), exactly like `@mac-sync` and `net-tools`. It never imports platform code and never opens the platform DB; the only link is HTTP. This keeps the device-automation mess (adb, emulators, droplets, vision) out of the platform and lets the app be deployed, versioned, and broken independently.
- The platform owns the data; the app owns the device. The screening data model (`screening_checks`, reputation events), the consume-side gate, and the trigger queue all live in lilith-platform. This app is a producer of verdicts and a consumer of trigger jobs, never an owner of screening state. There is exactly one source of truth for a screening result, and it is the platform DB.
- One pipeline, two front-ends. The device→vision→record pipeline is implemented once, in `client/mr_lookup.py`. The CLI and the MCP are both thin front-ends over it (the MCP literally shells out to `mr_lookup.py --json`). No second implementation to drift.
- Device-agnostic. The same code runs against a USB phone on plum or the cloud redroid droplet, selected purely by `--device` / `$MR_NUMBER_DEVICE`. No forks per host.
- Testable without hardware. The whole flow (navigation, screenshot, vision, record) is unit-testable with mocks; no real device, adb, app, or network is needed. The wire body sent to the platform is asserted in tests, because that contract is the thing most likely to silently break.
- Black-independent. The homelan (black/apricot) is dead, and nothing here depends on it: the vision SDK is invoked locally, the MCP's only npm dep is the public MCP SDK, and the record target is the public quinn.api edge.
3. Architecture
3.1 Two tiers (mirrors mac-sync)
┌─ plum (this Mac) ─────────────────────────┐ ┌─ DO redroid droplet ────────────────┐
│ client/mr_lookup.py lookup + vision │ adb │ lilith-store-redroid 45.55.191.82 │
│ client/console-tray/ SSH-tunnel console │◄──────►│ · redroid Android (Mr. Number) │
│ mcp/ stdio MCP front-end │ :5555 │ · ws-scrcpy :8000 │
│ deploy/ install + droplet │ │ · cloud/adb-keyboard :8001 (loop) │
└───────────────┬───────────────────────────┘ │ · /data on volume redroidmrnumberdata
│ HTTPS + service token └──────────────────────────────────────┘
▼
quinn.api screening service (POST /admin/screening/check via my.transquinnftw.com)
- plum tier runs the brain: the lookup logic, the Claude vision and rating-profile calls, the record POST, and the MCP. It does not need to be the host the Android device lives on.
- cloud tier is just a headless Android device. The droplet runs the OS + the app; `cloud/adb-keyboard/server.py` + ws-scrcpy give a browser console for what automation can't do: a human Google/Mr. Number sign-in and occasional calibration.
3.2 Why a droplet and a USB fallback
The lookup needs a real Android device with the paid app signed in. Two ways to get one:
- Redroid droplet (`45.55.191.82`, primary): containerized Android on DigitalOcean, always-on, with a persistent `/data` volume so the signed-in paid state survives reboots. adb is reached over the network (`adb connect 45.55.191.82:5555`).
- USB phone on plum (fallback): a physical phone with the paid app and USB debugging enabled. The tool runs unchanged; just point `--device` at the serial.
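The droplet and the USB phone differ only in the device string, so device selection can stay a single pure function. The sketch below is minimal and rests on assumptions. `--device` is assumed to take precedence over `$MR_NUMBER_DEVICE`. A `host:port` value is assumed to mean a network device that needs `adb connect` first. The helper names `resolve_device` and `adb_commands` are hypothetical and do not come from `mr_lookup.py`.

```python
import os
import re
from typing import Mapping, Optional

# ASSUMPTION: "host:port" means a network device (like the droplet);
# anything else is treated as a USB serial.
NETWORK_DEVICE = re.compile(r"^[\w.-]+:\d+$")


def resolve_device(cli_device: Optional[str], env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical precedence: --device wins, then $MR_NUMBER_DEVICE."""
    device = cli_device or env.get("MR_NUMBER_DEVICE")
    if not device:
        raise SystemExit("no device: pass --device or set MR_NUMBER_DEVICE")
    return device


def adb_commands(device: str, *args: str) -> list[list[str]]:
    """Build the adb invocations needed to run `adb <args>` on one device."""
    commands: list[list[str]] = []
    if NETWORK_DEVICE.match(device):
        # A network device must be connected before `-s` can address it.
        commands.append(["adb", "connect", device])
    commands.append(["adb", "-s", device, *args])
    return commands
```

With `MR_NUMBER_DEVICE=45.55.191.82:5555`, `adb_commands(resolve_device(None), "shell", "uiautomator", "dump")` yields two commands. The first is a connect call. The second is `adb -s 45.55.191.82:5555 shell uiautomator dump`. A USB serial yields only the second form.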
The first redroid attempt (2026-06-27, on the stock-kernel ct:prod box) genuinely failed: binder/ashmem wouldn't load and the box was destroyed. That post-mortem is in `docs/archive/`. The current `lilith-store-redroid` droplet is its working successor. Don't conflate the two; see `docs/archive/` for the distinction.
3.3 Directory layout and why each piece exists
client/
mr_lookup.py THE pipeline. adb drive → screenshot → vision → decide → record.
--json mode emits one result object on stdout (for the MCP).
mr_lookup_test.py host-free unit tests (mock adb/vision/network; assert wire body).
console-tray/ macOS menu-bar app: maintains the SSH tunnel to the droplet and
opens the combined screen+keyboard console. Human-only surface.
mcp/ bun stdio MCP. Thin wrapper: shells `mr_lookup.py --json`, exposes
mr_number_lookup + mr_number_devices. For coworker-agent/Desktop.
cloud/
adb-keyboard/ HTTP+WS keyboard server that runs ON the droplet (loopback only).
terraform/ *.reference — read-only copy of the droplet IaC for context.
deploy/
install.sh plum: install MCP deps, run tests, print next steps.
deploy-droplet.sh push the adb-keyboard server to the droplet and restart it.
docs/
DESIGN.md this file.
archive/ the failed first-attempt handoffs, kept for history.
4. The screening pipeline (how a lookup works)
client/mr_lookup.py, main_async() — the single code path both front-ends drive:
- Launch + navigate. adb launches `com.mrnumber.blocker`, then uses a uiautomator UI dump to find the search field by text/resource-id. This is resilient to minor app-UI changes, and it falls back to a center-top tap if nothing matches.
- Input. The phone is cleaned to `^\+?\d+$` before `adb input text` (raw spaces and parens mangle adb input). The "Look up " suggestion row is tapped. The app does not search on Enter; tapping that row is what triggers the paid lookup.
- Land on the detail page. `open_report_detail()` verifies that we're on the caller's detail page, using UI-dump markers like "Recent reports" / "View all". If the number was searched before, the app shows the Recent lookups list instead. In that case it taps the matching row (by formatted number variants) to open the detail. Without this step the capture silently grabs the wrong screen and extracts zero reports.
- Capture the FULL history. `expand_all_reports()` taps "View all N reports". `capture_full_history()` then screenshots and scrolls down page by page, stopping when the UI dump stops changing (that means the bottom has been reached). The result is one screenshot per scroll page. This is what solves the visible-3-reports problem: we capture everything, not just the first screen.
- Vision extraction (per page). Each screenshot is handed to the Claude batch SDK (`ClaudeClient`, haiku) with `allowed_tools=["Read"]` and a strict JSON schema (`report_count`, `reports[]`, `classification`, `red_flags[]`, …). `merge_reports()` then consolidates all pages and dedupes reports case- and whitespace-insensitively.
- Rating profile (the consolidation). `build_rating_profile()` sends the whole deduped history to the SDK (sonnet, the stronger model) with a domain-aware system prompt and gets back a multi-axis profile. The profile contains:
  - a 0–100 `score`;
  - a letter `grade` (A ≥85, B 70–84, C 55–69, D 40–54, F <40);
  - per-axis sub-scores (`reliability`, `payment`, `respect`, `safety`);
  - `positive_signals`, `negative_signals`, and `nuanced_notes`;
  - a `summary` and a `recommended_result`.

  The prompt encodes the insider nuance. For example, deposit mentions are a positive signal (deposit-payers are serious clients), and law-enforcement signals force `denied`. `is_mixed` flags genuinely conflicting reviews so the axes aren't blindly averaged.
- Map to a verdict. `result_from_profile()` maps the profile to the screening enum in three ways:
  - it honors `recommended_result`;
  - it falls back to `result_from_score` (≥70 approved, <45 denied, else pending);
  - it applies a hard safety override (safety axis <30 → denied regardless of the overall score).

  `decide_result()` remains only as a deterministic fallback for when the SDK profile is unavailable. It was fixed to never return `approved` over a model `denied` or a red flag, and to match punctuation variants (`no-show` == `no show`).
- Save + record. The full consolidated history + profile is written to `client/output/history/<phone>-<ts>.json`. Unless `--dry-run` is set, the verdict is POSTed to the platform (see §5). `rawResponse` carries the entire profile + report history for the audit trail.
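The mechanical parts of the Input, Capture, and Vision-extraction steps can be sketched as small pure helpers, along with the punctuation-tolerant matching in `decide_result()`. This is a minimal sketch, not the code in `mr_lookup.py`. The names `clean_phone`, `capture_until_stable`, `merge_pages`, and `mentions` are hypothetical. The device operations are injected as callables, which keeps the logic testable without hardware. Each report is assumed to be a dict with a `text` field.

```python
import re
from typing import Callable, Iterable

# Input: the shape `adb input text` should receive.
PHONE_RE = re.compile(r"^\+?\d+$")


def clean_phone(raw: str) -> str:
    """Drop spaces, parens, dashes, dots; keep one optional leading '+'."""
    cleaned = ("+" if raw.strip().startswith("+") else "") + re.sub(r"\D", "", raw)
    if not PHONE_RE.match(cleaned):
        raise ValueError(f"not a phone number: {raw!r}")
    return cleaned


# Capture: keep scrolling until the UI dump stops changing.
def capture_until_stable(
    dump_ui: Callable[[], str],
    screenshot: Callable[[int], str],
    scroll_down: Callable[[], None],
    max_pages: int = 30,  # hypothetical safety cap, not from the source
) -> list[str]:
    """Return one screenshot per page; an unchanged dump means the bottom."""
    shots = [screenshot(0)]
    previous = dump_ui()
    for page in range(1, max_pages):
        scroll_down()
        current = dump_ui()
        if current == previous:  # nothing moved, so we are at the bottom
            break
        shots.append(screenshot(page))
        previous = current
    return shots


# Merge: dedupe reports across pages, case- and whitespace-insensitively.
def merge_pages(pages: Iterable[list[dict]]) -> list[dict]:
    seen: set[str] = set()
    merged: list[dict] = []
    for page in pages:
        for report in page:
            key = " ".join(str(report.get("text", "")).lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(report)
    return merged


# Keyword matching: treat "no-show" and "no show" as the same phrase.
def _normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def mentions(text: str, keyword: str) -> bool:
    """Whole-phrase match that ignores punctuation variants."""
    return f" {_normalize(keyword)} " in f" {_normalize(text)} "
```

In the real pipeline, the injected callables would wrap adb calls such as `adb shell uiautomator dump` and `adb shell input swipe`. Comparing consecutive dumps is what makes "the bottom" detectable without knowing the number of pages up front.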
Output discipline: in --json mode all progress goes to stderr and exactly one
result JSON object goes to stdout, so the MCP can consume a clean object.
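The *Map to a verdict* step packs several thresholds together, and the sketch below shows how they might compose. It makes these assumptions:

- The profile is a plain dict with `score` and `recommended_result`.
- The per-axis sub-scores live under a hypothetical `axes` key.
- The safety override may only make a verdict stricter, so an existing `denied` or `cop_flag` is kept.

The `cop_flag` value comes from the gate described in §5. The exact enum and precedence inside `result_from_profile()` may differ.

```python
from typing import Any, Mapping

# ASSUMED screening enum; `cop_flag` appears in the platform gate (§5).
VALID_RESULTS = {"approved", "pending", "denied", "cop_flag"}


def grade_from_score(score: float) -> str:
    """Letter grade bands: A >=85, B 70-84, C 55-69, D 40-54, F <40."""
    for floor, grade in ((85, "A"), (70, "B"), (55, "C"), (40, "D")):
        if score >= floor:
            return grade
    return "F"


def result_from_score(score: float) -> str:
    """Score fallback: >=70 approved, <45 denied, otherwise pending."""
    if score >= 70:
        return "approved"
    if score < 45:
        return "denied"
    return "pending"


def result_from_profile(profile: Mapping[str, Any]) -> str:
    """Map a rating profile to a verdict (sketch; the key layout is assumed)."""
    recommended = profile.get("recommended_result")
    if recommended in VALID_RESULTS:
        result = recommended
    else:
        result = result_from_score(profile["score"])
    safety = profile.get("axes", {}).get("safety")
    # Hard safety override: a low safety axis can only make the verdict
    # stricter, so it never downgrades an existing denied/cop_flag.
    if safety is not None and safety < 30 and result in {"approved", "pending"}:
        return "denied"
    return result
```

Here is how the sketch behaves on two cases. A profile with no usable recommendation and a score of 18 grades F and maps to `denied`. A model recommendation of `approved` is still overridden to `denied` when the safety axis is below 30.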
5. Coupling with the platform (the contracts)
Plum is not the only client — quinn.api and prospector both depend on this integration. The boundary is a job-queue bridge (the same shape as the macsync outbox), with three contracts. None of them require sharing code or a DB.
┌──────────────────── lilith-platform (quinn.api) ───────────────────┐
│ screening_checks · reputation events │
(1) RECORD │ POST /admin/screening/check ◄── app posts verdicts │
app → platform │ (via my.transquinnftw.com, service token) │
│ │
(2) CONSUME │ prospect-qualification/mr-number-gate.ts │
platform-internal│ getLatestMrNumberCheckByClient → blocks denied/cop_flag leads │
│ │
(3) TRIGGER │ screening-job queue + enqueue API ──► app drains & runs lookup │
platform → app │ (quinn.api can't drive a phone; plum runner does) │
└────────────────────────────────────────────────────────────────────┘
- Record (app → platform). `mr_lookup.py` POSTs `{clientId, service:"mr-number", lookupValue, result, rawResponse}` to `${QUINN_MY_URL}/api/clients/{id}/screening` with `QUINN_MY_SERVICE_TOKEN`. The quinn.my BFF rewrites that to `/admin/screening/check`. `clientId` must be in the body, because the rewrite drops it from the path and the server zod schema requires it. The unit tests assert it's present (this was once a real 400-on-every-record bug).
- Consume (platform-internal). The prospect-runner gate reads the latest `mr-number` check for a client and blocks `denied`/`cop_flag` like a scam hit. This is pure platform code reading its own table. It lives in the platform, not here, and is unaffected by anything in this repo.
- Trigger (platform → app). quinn.api can't drive a phone, so prospector enqueues a screening job (`{phone, clientId, reason}`, deduped). A plum-side drain runner (this app) polls that queue and invokes `mr_lookup.py`. The queue + enqueue API stay in quinn.api; the drain runner ships here. (This runner is the one piece still to be built. It depends on the platform's Slice-3 queue API landing first.)
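The `clientId`-in-body rule has already caused a 400-on-every-record bug, so it helps to see the request shape concretely. This minimal sketch assumes a JSON body and a `Bearer` authorization header. The header scheme is an assumption; this document only says the call uses `QUINN_MY_SERVICE_TOKEN`. `build_record_request` is a hypothetical name. The function builds a `urllib.request.Request` without sending it, so the wire body can be asserted offline.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def build_record_request(
    base_url: str,
    token: str,
    client_id: str,
    phone: str,
    result: str,
    raw_response: dict[str, Any],
) -> urllib.request.Request:
    """Build (without sending) the verdict POST to the quinn.my BFF."""
    body = {
        # clientId must be in the BODY: the BFF rewrite to
        # /admin/screening/check drops it from the path, and zod requires it.
        "clientId": client_id,
        "service": "mr-number",
        "lookupValue": phone,
        "result": result,
        "rawResponse": raw_response,  # full profile + report history (audit trail)
    }
    path = f"/api/clients/{urllib.parse.quote(client_id, safe='')}/screening"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # ASSUMED auth scheme
        },
        method="POST",
    )
```

Sending would then be `urllib.request.urlopen(req, timeout=30)`. Keeping construction separate from sending lets a test assert the wire body without any network.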
What deliberately does not cross the boundary: no shared DB writes (the app only POSTs via the token API), no shared npm workspace, no entry in the platform's port registry.
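Neither the drain runner from the Trigger contract nor the queue API it will poll exists yet. The following is therefore only a shape sketch under stated assumptions:

- Jobs arrive as `{phone, clientId, reason}` dicts from a hypothetical `fetch_jobs` callable.
- Each unique job goes to a hypothetical `run_lookup` callable, which in practice would spawn `mr_lookup.py`.
- The sketch dedupes again locally by `(phone, clientId)`, so a job repeated within one batch is looked up only once. The platform already dedupes on enqueue, so this is only a second guard.

```python
from typing import Callable, Iterable, Mapping


def drain_once(
    fetch_jobs: Callable[[], Iterable[Mapping[str, str]]],
    run_lookup: Callable[[str, str], str],
) -> dict[tuple[str, str], str]:
    """One polling pass: run each unique (phone, clientId) job exactly once."""
    results: dict[tuple[str, str], str] = {}
    for job in fetch_jobs():
        key = (job["phone"], job["clientId"])
        if key in results:
            continue  # duplicate within this batch; the queue also dedupes
        results[key] = run_lookup(*key)
    return results
```

Injecting both ends keeps the runner independent of the final queue API. Only `fetch_jobs` would change once that API lands.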
6. The two front-ends
| | CLI | MCP |
|---|---|---|
| Entry | `python3 client/mr_lookup.py …` | `bun run mcp/index.ts` (stdio) |
| Used by | humans, the drain runner, cron | coworker-agent, Claude Desktop |
| Tools | n/a | `mr_number_lookup`, `mr_number_devices` |
| Implementation | the pipeline itself | shells `mr_lookup.py --json`, parses the last stdout line |
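The table's last row depends on the output discipline from §4: progress goes to stderr, and exactly one JSON object is the final stdout line. The MCP itself is TypeScript. To keep this document's examples in one language, here is that contract sketched in Python. `log`, `emit_result`, and `parse_last_json_line` are hypothetical helper names.

```python
import json
import sys
from typing import Any, Optional, TextIO


def log(message: str) -> None:
    """Progress always goes to stderr, so it can never pollute the result."""
    print(message, file=sys.stderr)


def emit_result(result: dict[str, Any], out: Optional[TextIO] = None) -> None:
    """Write the single result object as one compact line on stdout."""
    out = out or sys.stdout
    out.write(json.dumps(result, separators=(",", ":")) + "\n")
    out.flush()


def parse_last_json_line(stdout: str) -> dict[str, Any]:
    """Front-end side: take the last non-empty stdout line as the result."""
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        raise ValueError("mr_lookup.py produced no result on stdout")
    return json.loads(lines[-1])
```

Parsing only the last line tolerates stray output earlier on stdout. A stray line printed after the result would still break parsing, which is why all progress belongs on stderr.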
The MCP is intentionally dumb: it spawns the Python with a timeout, passes the service
token through the environment (read from ~/.config/quinn-secrets/quinn-my.service-token
if not in env), and returns the parsed result. All real logic stays in one place.
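The token lookup described above checks the environment first, then the secrets file, and then passes the token to the child process. This sketch assumes the environment variable is `QUINN_MY_SERVICE_TOKEN`, the name §5 uses. It also assumes the file holds the bare token. The real MCP code is TypeScript, and `resolve_service_token` and `child_env` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TOKEN_ENV = "QUINN_MY_SERVICE_TOKEN"
TOKEN_FILE = Path.home() / ".config" / "quinn-secrets" / "quinn-my.service-token"


def resolve_service_token(
    env: Mapping[str, str] = os.environ, token_file: Path = TOKEN_FILE
) -> Optional[str]:
    """Prefer the environment; fall back to the secrets file on plum."""
    token = env.get(TOKEN_ENV, "").strip()
    if token:
        return token
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


def child_env(base: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Environment for the spawned mr_lookup.py, with the token passed through."""
    env = dict(base)
    token = resolve_service_token(base)
    if token:
        env[TOKEN_ENV] = token
    return env
```

Passing the token through the child's environment, not the command line, keeps it out of process listings.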
It is distinct from the mr_number_check / mr_number_history tools that live
inside the platform's mcp-prospector server — those are the in-API surface
(record/list against the DB). This MCP drives the device. They complement each other.
7. Infrastructure ownership
The redroid droplet itself is not provisioned from this repo. Its canonical
Terraform lives in the infranet IaC repo ~/Code/@projects/uvlava/terraform/do/
(applied, in TF state, with lifecycle{ignore_changes=[user_data]} to stop drift from
destroying the live box). cloud/terraform/android-redroid.tf.reference here is a
read-only copy for context only — never terraform apply it from this repo. This keeps
droplet lifecycle with the infranet that owns all DO droplets, and avoids a second
state file fighting the first.
8. Security notes
- The droplet is logged into Quinn's Google + paid Mr. Number account. The ws-scrcpy console and adb-keyboard bind loopback only on the droplet. They are reached only through the key-authed SSH tunnel from plum (`console-tray`) and are never exposed on a public port.
- Secrets are flat 0600 files under `~/.config/quinn-secrets/` on plum (`quinn-my.service-token`); the droplet SSH key is `~/.ssh/id_ed25519_1984`. Nothing is committed.
- Domain context: this is trust-and-safety tooling for the legal German adult industry; screening protects a sex worker from dangerous clients. See `CLAUDE.md`.
9. Status & open edges
- Built + verified: the pipeline (including full-history capture and the rating profile), unit tests (18/18), the MCP (typechecks + boots with both tools), the console tray, deploy scripts.
- To build: the trigger drain runner (§5.3), once the platform's screening-job queue API exists.
- Cutover: until the platform's `.mcp.json` is repointed from the old in-tree path to this app's `mcp/index.ts`, both copies exist side by side. The cutover is a one-line config change + deleting the old `users/transquinnftw/tools/mr-number-lookup/`.