feat(rating): full-history capture + multi-axis SDK rating profile
Replace the brittle keyword verdict with an LLM-consolidated rating profile per
caller, and capture the COMPLETE report history instead of the first screen.

- open_report_detail(): land on the caller detail page (taps the Recent-lookups
  row when the number was searched before); fixes the 0-reports regression
- expand_all_reports() + capture_full_history(): tap "View all N", scroll-capture
  every page until the UI dump stops changing; merge_reports() dedupes across pages
- build_rating_profile() (batch SDK, sonnet): 0-100 score + A–F grade + per-axis
  sub-scores (reliability/payment/respect/safety) + signals + nuanced_notes.
  Domain nuance: deposit mentions weigh POSITIVE; law-enforcement signals force denied
- result_from_profile(): honors recommendation, score fallback, hard safety override
- decide_result(): kept as deterministic fallback, fixed to never approve over a
  model 'denied' / red flag and to match punctuation variants (no-show == no show)
- save_history(): persist full consolidated history + profile per caller
- tests: 18/18 (mapping, dedupe, safety override, full flow); DESIGN.md updated

Verified live against the redroid droplet (45.55.191.82): 15166687821 → 3 reports
consolidated → 18/100 grade F → denied, with multi-axis breakdown.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
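The result mapping described in the commit message (hard safety override first, then the model's recommendation, then a score fallback) can be sketched roughly as follows. The function names and thresholds mirror the diff; the `sample` profile is a made-up illustration, not captured data.

```python
from typing import Any, Optional


def grade_from_score(score: Optional[float]) -> str:
    """Letter grade from the commit's thresholds: A>=85, B>=70, C>=55, D>=40, else F."""
    if score is None:
        return "?"
    for floor, grade in ((85, "A"), (70, "B"), (55, "C"), (40, "D")):
        if score >= floor:
            return grade
    return "F"


def result_from_profile(profile: Optional[dict[str, Any]]) -> str:
    """Map a rating profile to a screening result."""
    if not profile:
        return "pending"
    # 1. Hard safety override: a low safety axis denies regardless of anything else.
    safety = (profile.get("axes") or {}).get("safety") or {}
    s_score = safety.get("score")
    if isinstance(s_score, (int, float)) and s_score < 30:
        return "denied"
    # 2. Otherwise honor the model's recommendation when it is a known enum value.
    rec = profile.get("recommended_result")
    if rec in ("approved", "denied", "pending", "not_found"):
        return rec
    # 3. Last resort: derive the result from the overall score.
    score = profile.get("score")
    if score is None:
        return "pending"
    if score >= 70:
        return "approved"
    if score < 45:
        return "denied"
    return "pending"


# Hypothetical profile: good overall score, but a severe safety signal.
sample = {
    "score": 82,
    "recommended_result": "approved",
    "axes": {"safety": {"score": 10}},
}
verdict = result_from_profile(sample)
print(grade_from_score(sample["score"]), verdict)
```

The ordering matters: because the safety check runs before the recommendation is read, an "approved" recommendation cannot outvote a safety sub-score below 30.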
parent 2f1bfd452b
commit 263cc18aa1
3 changed files with 568 additions and 258 deletions
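The cross-page dedupe and the punctuation-variant keyword match named in the commit message can be illustrated with a small standalone sketch. The two regexes are the ones used in the diff; the helper names `dedupe_key`, `merge_pages` and `has_negative`, the two-keyword list, and the sample reports are simplified stand-ins invented for this example.

```python
import re


def dedupe_key(text: str) -> str:
    """Case- and whitespace-insensitive key, as merge_reports() builds it."""
    return re.sub(r"\s+", " ", text.strip().lower())


def merge_pages(pages: list[list[str]]) -> list[str]:
    """Keep the first occurrence of each report across scrolled screenshots."""
    seen: set[str] = set()
    merged: list[str] = []
    for page in pages:
        for report in page:
            key = dedupe_key(report)
            if key and key not in seen:
                seen.add(key)
                merged.append(report.strip())
    return merged


def normalize(text: str) -> str:
    """Replace every run of non-alphanumeric, non-space characters with one space."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower())


def has_negative(reports: list[str], keywords=("no-show", "chargeback")) -> bool:
    """True if any keyword appears once both sides are normalized."""
    blob = normalize(" ".join(reports))
    return any(normalize(kw) in blob for kw in keywords)


# Two overlapping "pages": the deposit report shows up on both screenshots.
pages = [
    ["No show, twice.", "Paid a deposit"],
    ["paid a  deposit ", "Polite and on time"],
]
merged = merge_pages(pages)
print(merged)
print(has_negative(merged))
```

Because the match is a plain substring test, a short keyword can also hit inside a longer word (for example "cop" inside "scope"), which is one reason this heuristic is only a fallback.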
@@ -2,19 +2,22 @@
 """
 mr-number-lookup
 
-Drive an Android emulator running the Mr. Number app (com.mrnumber.blocker),
-perform a phone lookup, screenshot the results, extract reports/comments via
-the project's claude-code-batch-sdk vision (same pattern as ad-watch classify_photos.py),
-decide a screening result, and record it through the existing mr-number screening
-service (so it feeds reputation events + all your client filters).
+Drive an Android device (USB phone or the redroid droplet) running the Mr. Number
+app (com.mrnumber.blocker), perform a phone lookup, expand + scroll-capture the
+*full* community-report history, vision-extract every report, consolidate them with
+the lilith claude-code-batch-sdk into a multi-axis **rating profile** (0-100 + letter
+grade) for the caller, decide a screening result, save the full history, and record
+it through the existing mr-number screening service (so it feeds reputation events +
+all client filters).
 
-Usage (after emulator + paid Mr. Number app is set up inside it):
+Usage:
 python3 mr_lookup.py --phone "+15551234567" --client-id 12345 [--dry-run]
 
 Requires:
-- adb in PATH, emulator running (usually emulator-5554) with the app installed + logged in (paid tier).
+- adb in PATH; a device connected (USB serial, or `adb connect <host>:5555` for redroid)
+with the paid Mr. Number app installed + signed in.
 - QUINN_MY_URL + QUINN_MY_SERVICE_TOKEN in env (for recording).
-- The claude batch SDK on disk (for vision on the screenshot).
+- The claude batch SDK on disk (for vision + rating consolidation).
 
 The manual path in quinn.my (Screening tab) remains the fallback / review surface.
 """
@@ -33,7 +36,7 @@ from pathlib import Path
 from typing import Any
 
 # requests is only needed for the final recording step (guarded import so unit tests
-# can run in environments without it; the emulator path itself is fully testable).
+# can run in environments without it; the device path itself is fully testable).
 
 # --- Vision SDK (exact same pattern as codebase/@features/ad-watch/scripts/classify_photos.py)
 _SDK_SRC = os.environ.get(
@@ -55,7 +58,15 @@ QUINN_MY_SERVICE_TOKEN = os.environ.get("QUINN_MY_SERVICE_TOKEN", "")
 DEVICE = os.environ.get("MR_NUMBER_DEVICE", "emulator-5554")
 PACKAGE = "com.mrnumber.blocker"
 OUTPUT_DIR = Path(__file__).parent / "output"
+HISTORY_DIR = OUTPUT_DIR / "history"
 OUTPUT_DIR.mkdir(exist_ok=True)
+HISTORY_DIR.mkdir(exist_ok=True)
+
+# Vision = fast/cheap text-from-image. Rating = reasoning over the consolidated
+# history, so it defaults to a stronger model (override via env).
+VISION_MODEL = os.environ.get("MR_NUMBER_VISION_MODEL", "haiku")
+RATING_MODEL = os.environ.get("MR_NUMBER_RATING_MODEL", "sonnet")
+MAX_SCROLL_CAPTURES = int(os.environ.get("MR_NUMBER_MAX_SCROLLS", "10"))
 
 # --json mode: progress goes to stderr, a single result JSON object goes to stdout.
 # Lets the mr-number MCP (mcp/index.ts) drive the lookup and consume a clean result.
@@ -66,31 +77,37 @@ def log(*args: Any) -> None:
|
|||
"""Progress line — stderr in --json mode so stdout stays a clean JSON object."""
|
||||
print(*args, file=sys.stderr if JSON_MODE else sys.stdout)
|
||||
|
||||
# Vision prompt tuned for Mr. Number results screen
|
||||
|
||||
# ----------------------------------------------------------------------------
|
||||
# Vision extraction (per screenshot)
|
||||
# ----------------------------------------------------------------------------
|
||||
MR_NUMBER_SYSTEM = (
|
||||
"You are looking at a screenshot from the Mr. Number (caller ID + community reports) Android app. "
|
||||
"Extract the information shown for the looked-up phone number. Respond ONLY with a single JSON object, no markdown."
|
||||
)
|
||||
|
||||
|
||||
def _build_vision_prompt(screenshot_path: str, phone: str) -> str:
|
||||
schema = {
|
||||
"phone": "the exact phone number that was searched (string)",
|
||||
"report_count": "integer or null — how many user reports/comments are visible",
|
||||
"reports": "array of strings — the actual report/comment text shown (the valuable paid content)",
|
||||
"classification": "string or null — e.g. 'personal', 'business', 'suspected spam', or whatever the app shows at top",
|
||||
"red_flags": "array of strings — any negative signals mentioned (no-show, rude, cop, timewaster, boundary issues, etc.)",
|
||||
"summary": "short one-sentence overall impression from the reports",
|
||||
"suggested_result": "one of: approved, denied, not_found — your best guess for a screening result based on the reports"
|
||||
"report_count": "integer or null — the total number of reports the app says exist (e.g. 'View all 7 reports' -> 7), not just visible",
|
||||
"reports": "array of strings — every report/comment text VISIBLE in this screenshot, verbatim (the valuable paid content)",
|
||||
"classification": "string or null — the label at the top (e.g. 'Personal Line', 'Business', 'Suspected Spam')",
|
||||
"red_flags": "array of strings — negative signals mentioned (no-show, ghosting, rude, cop/law-enforcement, timewaster, boundary issues, etc.)",
|
||||
"summary": "short one-sentence impression from the reports visible here",
|
||||
"suggested_result": "one of: approved, denied, not_found — your best guess from what's visible",
|
||||
}
|
||||
return (
|
||||
f"Read the image file at: {screenshot_path}\n\n"
|
||||
f"This is a screenshot after looking up {phone} in the Mr. Number app.\n"
|
||||
"Extract the community reports and any top-level caller info. "
|
||||
"Extract the community reports and any top-level caller info VISIBLE in this image. "
|
||||
"Transcribe report text verbatim — do not paraphrase. "
|
||||
f"Respond with ONLY one JSON object:\n{json.dumps(schema, indent=2)}"
|
||||
)
|
||||
|
||||
|
||||
async def _extract_from_screenshot(screenshot_path: str, phone: str) -> dict[str, Any]:
|
||||
client = ClaudeClient(model="haiku", max_concurrent=1) # haiku is fast and sufficient for text extraction
|
||||
client = ClaudeClient(model=VISION_MODEL, max_concurrent=1)
|
||||
prompt = _build_vision_prompt(str(screenshot_path), phone)
|
||||
|
||||
resp = await client.generate(
|
||||
|
|
@@ -108,12 +125,173 @@ async def _extract_from_screenshot(screenshot_path: str, phone: str) -> dict[str
|
|||
|
||||
return parsed
|
||||
|
||||
# --- adb helpers (refactored into class for testability of the emulator method)
|
||||
|
||||
def merge_reports(extractions: list[dict[str, Any]], phone: str) -> dict[str, Any]:
|
||||
"""Consolidate per-screenshot extractions into one deduped report history."""
|
||||
reports: list[str] = []
|
||||
seen: set[str] = set()
|
||||
red_flags: list[str] = []
|
||||
red_seen: set[str] = set()
|
||||
classification: str | None = None
|
||||
declared_count = 0
|
||||
|
||||
for ex in extractions:
|
||||
if not isinstance(ex, dict):
|
||||
continue
|
||||
if not classification and ex.get("classification"):
|
||||
classification = ex.get("classification")
|
||||
rc = ex.get("report_count")
|
||||
if isinstance(rc, int):
|
||||
declared_count = max(declared_count, rc)
|
||||
for r in ex.get("reports") or []:
|
||||
key = re.sub(r"\s+", " ", str(r).strip().lower())
|
||||
if key and key not in seen:
|
||||
seen.add(key)
|
||||
reports.append(str(r).strip())
|
||||
for f in ex.get("red_flags") or []:
|
||||
key = re.sub(r"\s+", " ", str(f).strip().lower())
|
||||
if key and key not in red_seen:
|
||||
red_seen.add(key)
|
||||
red_flags.append(str(f).strip())
|
||||
|
||||
return {
|
||||
"phone": phone,
|
||||
"reports": reports,
|
||||
"red_flags": red_flags,
|
||||
"classification": classification,
|
||||
# report_count = the larger of what the app declared vs. how many we captured
|
||||
"report_count": max(declared_count, len(reports)),
|
||||
"captured_count": len(reports),
|
||||
"declared_count": declared_count,
|
||||
}
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------------
|
||||
# Rating profile (consolidation via the batch SDK)
|
||||
# ----------------------------------------------------------------------------
|
||||
RATING_SYSTEM = (
|
||||
"You are a trust-and-safety analyst for an independent adult-industry provider (legal, "
|
||||
"regulated). You read crowdsourced caller reports from Mr. Number and produce a structured "
|
||||
"rating profile for the caller — how safe and worthwhile they are as a potential client. "
|
||||
"Respond ONLY with a single JSON object, no markdown.\n\n"
|
||||
"DOMAIN NUANCE — read signals like an insider, not literally:\n"
|
||||
"- DEPOSITS ARE GOOD. A report mentioning the caller 'paid a deposit', 'sent a deposit', "
|
||||
"'offered/asked to send a deposit', or 'always deposits' is a STRONG POSITIVE — deposit-payers "
|
||||
"are serious, vetted, low-risk clients. Weight this heavily toward A/B. Only 'refused/won't pay "
|
||||
"a deposit' or 'chargeback' is negative.\n"
|
||||
"- 'Get a deposit' / 'make him deposit' written as advice from another provider means the caller "
|
||||
"is known to follow through once a deposit is taken — treat as a manageable/positive signal, NOT a red flag.\n"
|
||||
"- RELIABILITY: no-show, ghosting, flaking, cancelling last-minute → negative.\n"
|
||||
"- SAFETY (critical): law enforcement / cop / sting / 'asks weird LE questions', violence, coercion, "
|
||||
"robbery, attempts to remove agency → severe negative; if present, recommend denied regardless of other axes.\n"
|
||||
"- RESPECT: rude, pushy, haggling, boundary-pushing → negative.\n"
|
||||
"- MIXED REVIEWS: when reports conflict, do NOT average blindly — score each axis on its own evidence "
|
||||
"and explain the split.\n\n"
|
||||
"SCORING: 0-100 overall (higher = safer/better client). Grade A>=85, B 70-84, C 55-69, D 40-54, F<40."
|
||||
)
|
||||
|
||||
|
||||
def _build_rating_prompt(history: dict[str, Any]) -> str:
|
||||
schema = {
|
||||
"score": "integer 0-100 — overall safety/desirability as a client",
|
||||
"grade": "one of A,B,C,D,F (A>=85, B 70-84, C 55-69, D 40-54, F<40)",
|
||||
"is_mixed": "boolean — true if the reports conflict / are genuinely mixed",
|
||||
"axes": {
|
||||
"reliability": {"score": "0-100", "note": "shows up vs no-shows/ghosting/flaking"},
|
||||
"payment": {"score": "0-100", "note": "deposits (GOOD), pays agreed rate, no haggling/chargebacks"},
|
||||
"respect": {"score": "0-100", "note": "politeness, respects boundaries, not pushy"},
|
||||
"safety": {"score": "0-100", "note": "no law-enforcement/violence/coercion signals"},
|
||||
},
|
||||
"positive_signals": "array of strings — concrete positives found (quote/paraphrase the report)",
|
||||
"negative_signals": "array of strings — concrete negatives found",
|
||||
"nuanced_notes": "array of strings — where you read a signal NON-literally (e.g. deposit mentions as positive)",
|
||||
"summary": "2-3 sentence consolidated profile of this caller",
|
||||
"recommended_result": "one of: approved, denied, pending, not_found",
|
||||
}
|
||||
reports_block = "\n".join(f"- {r}" for r in history.get("reports") or []) or "(no report text captured)"
|
||||
return (
|
||||
f"Caller: {history.get('phone')}\n"
|
||||
f"App classification: {history.get('classification')}\n"
|
||||
f"Reports the app says exist: {history.get('report_count')} "
|
||||
f"(captured {history.get('captured_count')})\n\n"
|
||||
f"All captured community reports:\n{reports_block}\n\n"
|
||||
f"Vision-flagged terms: {', '.join(history.get('red_flags') or []) or '(none)'}\n\n"
|
||||
"Produce the caller's rating profile. Apply the domain nuance from the system prompt "
|
||||
"(especially: deposits are a positive signal; law-enforcement signals force denied). "
|
||||
f"Respond with ONLY one JSON object:\n{json.dumps(schema, indent=2)}"
|
||||
)
|
||||
|
||||
|
||||
async def build_rating_profile(history: dict[str, Any]) -> dict[str, Any] | None:
|
||||
"""Consolidate the full report history into a multi-axis rating profile via the SDK."""
|
||||
if not (history.get("reports")):
|
||||
return None
|
||||
client = ClaudeClient(model=RATING_MODEL, max_concurrent=1)
|
||||
resp = await client.generate(
|
||||
system=RATING_SYSTEM,
|
||||
user=_build_rating_prompt(history),
|
||||
cwd=str(OUTPUT_DIR),
|
||||
allowed_tools=[],
|
||||
)
|
||||
if not resp:
|
||||
return None
|
||||
parsed = parse_json_response(resp)
|
||||
if not isinstance(parsed, dict):
|
||||
return None
|
||||
# Normalize: coerce score to an int; derive the grade from it only when the model omitted one.
|
||||
score = parsed.get("score")
|
||||
if isinstance(score, (int, float)):
|
||||
parsed["score"] = int(score)
|
||||
if not parsed.get("grade"):
|
||||
parsed["grade"] = grade_from_score(parsed["score"])
|
||||
return parsed
|
||||
|
||||
|
||||
def grade_from_score(score: int | float | None) -> str:
|
||||
if score is None:
|
||||
return "?"
|
||||
if score >= 85:
|
||||
return "A"
|
||||
if score >= 70:
|
||||
return "B"
|
||||
if score >= 55:
|
||||
return "C"
|
||||
if score >= 40:
|
||||
return "D"
|
||||
return "F"
|
||||
|
||||
|
||||
def result_from_score(score: int | float | None) -> str:
|
||||
if score is None:
|
||||
return "pending"
|
||||
if score >= 70:
|
||||
return "approved"
|
||||
if score < 45:
|
||||
return "denied"
|
||||
return "pending"
|
||||
|
||||
|
||||
def result_from_profile(profile: dict[str, Any] | None) -> str:
|
||||
"""Map the rating profile to a screening result enum, with a hard safety override."""
|
||||
if not profile:
|
||||
return "pending"
|
||||
axes = profile.get("axes") or {}
|
||||
safety = axes.get("safety") or {}
|
||||
s_score = safety.get("score")
|
||||
if isinstance(s_score, (int, float)) and s_score < 30:
|
||||
return "denied" # law-enforcement/violence signal overrides everything
|
||||
rec = profile.get("recommended_result")
|
||||
if rec in ("approved", "denied", "pending", "not_found"):
|
||||
return rec
|
||||
return result_from_score(profile.get("score"))
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------------
|
||||
# adb device control (class for testability)
|
||||
# ----------------------------------------------------------------------------
|
||||
class MrNumberEmulator:
|
||||
"""Encapsulates adb interactions with the Mr. Number app in an emulator.
|
||||
This design allows full unit testing of the "emulator path" by subclassing
|
||||
or monkey-patching without requiring a real Android device/emulator.
|
||||
"""
|
||||
"""Encapsulates adb interactions with the Mr. Number app. Fully unit-testable by
|
||||
monkey-patching, without a real Android device/emulator."""
|
||||
|
||||
def __init__(self, device: str | None = None, package: str | None = None):
|
||||
self.device = device or DEVICE
|
||||
|
|
@@ -122,8 +300,7 @@ class MrNumberEmulator:
|
|||
def adb(self, args: list[str], check: bool = True) -> str:
|
||||
cmd = ["adb", "-s", self.device] + args
|
||||
try:
|
||||
out = subprocess.check_output(cmd, text=True, stderr=subprocess.STDOUT)
|
||||
return out
|
||||
return subprocess.check_output(cmd, text=True, stderr=subprocess.STDOUT)
|
||||
except subprocess.CalledProcessError as e:
|
||||
if check:
|
||||
raise
|
||||
|
|
@@ -139,6 +316,16 @@ class MrNumberEmulator:
|
|||
def adb_keyevent(self, code: int) -> None:
|
||||
self.adb(["shell", "input", "keyevent", str(code)])
|
||||
|
||||
def adb_swipe(self, x1: int, y1: int, x2: int, y2: int, ms: int = 400) -> None:
|
||||
self.adb(["shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(ms)])
|
||||
|
||||
def screen_size(self) -> tuple[int, int]:
|
||||
out = self.adb(["shell", "wm", "size"], check=False)
|
||||
m = re.search(r"(\d+)x(\d+)", out or "")
|
||||
if m:
|
||||
return int(m.group(1)), int(m.group(2))
|
||||
return 720, 1280
|
||||
|
||||
def get_ui_dump(self) -> str:
|
||||
self.adb(["shell", "uiautomator", "dump", "/sdcard/mr_ui.xml"])
|
||||
self.adb(["pull", "/sdcard/mr_ui.xml", "/tmp/mr_ui.xml"])
|
||||
|
|
@@ -153,15 +340,13 @@ class MrNumberEmulator:
|
|||
dump = self.get_ui_dump()
|
||||
root = ET.fromstring(dump)
|
||||
for node in root.iter("node"):
|
||||
text = (node.get("text") or "") + " " + (node.get("content-desc") or "")
|
||||
text = text.lower()
|
||||
text = ((node.get("text") or "") + " " + (node.get("content-desc") or "")).lower()
|
||||
for t in target_texts:
|
||||
if t.lower() in text:
|
||||
bounds = node.get("bounds")
|
||||
if bounds:
|
||||
x1, y1, x2, y2 = self.parse_bounds(bounds)
|
||||
cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
|
||||
self.adb_tap(cx, cy)
|
||||
self.adb_tap((x1 + x2) // 2, (y1 + y2) // 2)
|
||||
time.sleep(0.8)
|
||||
return True
|
||||
return False
|
||||
|
|
@@ -177,11 +362,8 @@ class MrNumberEmulator:
|
|||
bounds = node.get("bounds")
|
||||
if bounds:
|
||||
x1, y1, x2, y2 = self.parse_bounds(bounds)
|
||||
cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
|
||||
self.adb_tap(cx, cy)
|
||||
self.adb_tap((x1 + x2) // 2, (y1 + y2) // 2)
|
||||
time.sleep(0.5)
|
||||
# Clear any prior content so we don't append to a stale number.
|
||||
# Bulletproof: select-all + delete, then backspaces as a fallback.
|
||||
self.adb(["shell", "input", "keycombination", "KEYCODE_CTRL_LEFT", "KEYCODE_A"], check=False)
|
||||
self.adb(["shell", "input", "keyevent", "67"], check=False)
|
||||
self.adb_keyevent(123) # MOVE_END
|
||||
|
|
@@ -190,9 +372,6 @@ class MrNumberEmulator:
|
|||
time.sleep(0.2)
|
||||
self.adb_text(phone)
|
||||
time.sleep(0.3)
|
||||
# NOTE: do NOT send Enter here — Mr. Number collapses the
|
||||
# "Look up <number>" suggestion on Enter. The caller taps that
|
||||
# row instead (see main_async) to perform the lookup.
|
||||
return True
|
||||
return False
|
||||
except Exception:
|
||||
|
|
@@ -202,15 +381,19 @@ class MrNumberEmulator:
|
|||
self.adb(["shell", "monkey", "-p", self.package, "-c", "android.intent.category.LAUNCHER", "1"], check=False)
|
||||
time.sleep(2.5)
|
||||
|
||||
def take_screenshot(self, phone: str) -> Path:
|
||||
def take_screenshot(self, phone: str, tag: str = "") -> Path:
|
||||
ts = int(time.time())
|
||||
local = OUTPUT_DIR / f"mr-number-{phone.replace('+', '')}-{ts}.png"
|
||||
digits = phone.replace("+", "")
|
||||
suffix = f"-{tag}" if tag != "" else ""
|
||||
local = OUTPUT_DIR / f"mr-number-{digits}-{ts}{suffix}.png"
|
||||
self.adb(["shell", "screencap", "-p", "/sdcard/mr_result.png"])
|
||||
self.adb(["pull", "/sdcard/mr_result.png", str(local)])
|
||||
return local
|
||||
|
||||
# Backwards-compatible module-level shims (for existing call sites and minimal diff)
|
||||
_emulator = None
|
||||
|
||||
# Module-level shims (existing call sites + patchability in tests)
|
||||
_emulator: MrNumberEmulator | None = None
|
||||
|
||||
|
||||
def _get_emulator() -> MrNumberEmulator:
|
||||
global _emulator
|
||||
|
|
@@ -218,65 +401,151 @@ def _get_emulator() -> MrNumberEmulator:
|
|||
_emulator = MrNumberEmulator()
|
||||
return _emulator
|
||||
|
||||
|
||||
def adb(args: list[str], check: bool = True) -> str:
|
||||
return _get_emulator().adb(args, check)
|
||||
|
||||
def adb_tap(x: int, y: int) -> None:
|
||||
_get_emulator().adb_tap(x, y)
|
||||
|
||||
def adb_text(text: str) -> None:
|
||||
_get_emulator().adb_text(text)
|
||||
|
||||
|
||||
def adb_keyevent(code: int) -> None:
|
||||
_get_emulator().adb_keyevent(code)
|
||||
|
||||
|
||||
def get_ui_dump() -> str:
|
||||
return _get_emulator().get_ui_dump()
|
||||
|
||||
def parse_bounds(bounds: str) -> tuple[int, int, int, int]:
|
||||
return _get_emulator().parse_bounds(bounds)
|
||||
|
||||
def find_and_tap_text(target_texts: list[str]) -> bool:
|
||||
return _get_emulator().find_and_tap_text(target_texts)
|
||||
|
||||
|
||||
def find_edit_text_and_input(phone: str) -> bool:
|
||||
return _get_emulator().find_edit_text_and_input(phone)
|
||||
|
||||
|
||||
def launch_app() -> None:
|
||||
_get_emulator().launch_app()
|
||||
|
||||
def take_screenshot(phone: str) -> Path:
|
||||
return _get_emulator().take_screenshot(phone)
|
||||
|
||||
def take_screenshot(phone: str, tag: str = "") -> Path:
|
||||
return _get_emulator().take_screenshot(phone, tag)
|
||||
|
||||
|
||||
_DETAIL_MARKERS = ("recent reports", "report caller", "view all", "block number", "block caller")
|
||||
|
||||
|
||||
def on_report_detail() -> bool:
|
||||
"""True if the current screen is a caller's report-detail page (not the home/recent list)."""
|
||||
try:
|
||||
dump = get_ui_dump().lower()
|
||||
except Exception:
|
||||
return False
|
||||
return any(m in dump for m in _DETAIL_MARKERS)
|
||||
|
||||
|
||||
def open_report_detail(input_phone: str) -> bool:
|
||||
"""Ensure we're on the caller's report detail. If we landed on the 'Recent lookups'
|
||||
list (e.g. the number was searched before), tap its row to open the detail."""
|
||||
if on_report_detail():
|
||||
return True
|
||||
digits = re.sub(r"\D", "", input_phone)
|
||||
nat = digits[-10:] if len(digits) >= 10 else digits
|
||||
candidates: list[str] = []
|
||||
if len(nat) == 10:
|
||||
candidates += [f"({nat[0:3]}) {nat[3:6]}-{nat[6:]}", f"{nat[0:3]}-{nat[3:6]}-{nat[6:]}", f"{nat[3:6]}-{nat[6:]}"]
|
||||
candidates.append(digits)
|
||||
if find_and_tap_text(candidates):
|
||||
time.sleep(3.0)
|
||||
return on_report_detail()
|
||||
return False
|
||||
|
||||
|
||||
def expand_all_reports() -> bool:
|
||||
"""Tap the 'View all N reports' row so the full history is on screen to scroll."""
|
||||
return find_and_tap_text(["view all", "see all reports", "view all reports", "all reports", "see all"])
|
||||
|
||||
|
||||
def capture_full_history(phone: str, max_swipes: int = MAX_SCROLL_CAPTURES) -> list[Path]:
|
||||
"""Screenshot the reports view, scrolling down until it stops moving (bottom).
|
||||
Returns the list of screenshot paths (top → bottom)."""
|
||||
emu = _get_emulator()
|
||||
w, h = emu.screen_size()
|
||||
x, y_from, y_to = w // 2, int(h * 0.78), int(h * 0.28)
|
||||
shots = [emu.take_screenshot(phone, tag="0")]
|
||||
prev_dump: str | None = None
|
||||
for i in range(1, max_swipes + 1):
|
||||
emu.adb_swipe(x, y_from, x, y_to, 450)
|
||||
time.sleep(0.9)
|
||||
try:
|
||||
dump = emu.get_ui_dump()
|
||||
except Exception:
|
||||
dump = None
|
||||
if dump is not None and dump == prev_dump:
|
||||
break # nothing changed after a swipe = reached the bottom
|
||||
prev_dump = dump
|
||||
shots.append(emu.take_screenshot(phone, tag=str(i)))
|
||||
return shots
|
||||
|
||||
|
||||
# ----------------------------------------------------------------------------
|
||||
# Verdict (deterministic fallback when the SDK profile is unavailable)
|
||||
# ----------------------------------------------------------------------------
|
||||
_NEG_KEYWORDS = (
|
||||
"no show", "no-show", "noshow", "ghost", "flake", "flaked", "stood me up",
|
||||
"rude", "aggressive", "harass", "boundary", "pushy", "haggl",
|
||||
"cop", "leo", "police", "law enforcement", "sting", "officer",
|
||||
"time waster", "timewaster", "timewaste", "scam", "robbery", "violent", "unsafe", "danger",
|
||||
"chargeback", "refused deposit", "wouldn't pay", "wont pay",
|
||||
)
|
||||
|
||||
|
||||
def _normalize(text: str) -> str:
|
||||
return re.sub(r"[^a-z0-9 ]+", " ", text.lower())
|
||||
|
||||
|
||||
def decide_result(extracted: dict[str, Any]) -> str:
|
||||
reports = " ".join(extracted.get("reports") or []).lower()
|
||||
red_flags = " ".join(extracted.get("red_flags") or []).lower()
|
||||
negative = any(kw in reports or kw in red_flags for kw in ("no show", "noshow", "rude", "cop", "timewaste", "harass", "boundary", "unsafe", "aggressive"))
|
||||
if negative:
|
||||
"""Deterministic fallback heuristic (used only if the SDK rating profile fails).
|
||||
Fixes the historical bug: it never returns 'approved' over a model 'denied' or a
|
||||
red flag, and it matches punctuation-variant phrasing (no-show == no show)."""
|
||||
blob = _normalize(" ".join((extracted.get("reports") or []) + (extracted.get("red_flags") or [])))
|
||||
suggested = extracted.get("suggested_result")
|
||||
negative = any(_normalize(kw) in blob for kw in _NEG_KEYWORDS)
|
||||
|
||||
if suggested == "denied" or negative:
|
||||
return "denied"
|
||||
if extracted.get("report_count") and extracted["report_count"] > 0:
|
||||
return "approved"
|
||||
return extracted.get("suggested_result") or "pending"
|
||||
if suggested in ("approved", "denied", "not_found"):
|
||||
return suggested
|
||||
if extracted.get("report_count"):
|
||||
# report count declared but no text captured → human gate; captured text with no negative signal → approved
|
||||
return "pending" if not extracted.get("reports") else "approved"
|
||||
return "pending"
|
||||
|
||||
|
||||
def clean_phone(p: str) -> str:
|
||||
r"""Return only leading + (if present) followed by digits. Matches ^\+?\d+$ .
|
||||
Used before adb input text to prevent mangling from spaces/parens/etc.
|
||||
"""
|
||||
# Keep optional leading +, strip everything else non-digit
|
||||
has_plus = p.strip().startswith('+')
|
||||
digits = re.sub(r'\D', '', p)
|
||||
if has_plus:
|
||||
return '+' + digits
|
||||
return digits
|
||||
r"""Return only a leading + (if present) followed by digits (^\+?\d+$)."""
|
||||
has_plus = p.strip().startswith("+")
|
||||
digits = re.sub(r"\D", "", p)
|
||||
return ("+" + digits) if has_plus else digits
|
||||
|
||||
|
||||
def save_history(phone: str, history_obj: dict[str, Any]) -> Path:
|
||||
"""Persist the full consolidated history + profile to a per-caller JSON file."""
|
||||
ts = int(time.time())
|
||||
path = HISTORY_DIR / f"{clean_phone(phone).replace('+', '')}-{ts}.json"
|
||||
path.write_text(json.dumps(history_obj, indent=2))
|
||||
return path
|
||||
|
||||
|
||||
def record_screening(client_id: int, phone: str, result: str, raw: str) -> dict[str, Any]:
|
||||
if not QUINN_MY_SERVICE_TOKEN:
|
||||
return {"skipped": "no QUINN_MY_SERVICE_TOKEN"}
|
||||
|
||||
try:
|
||||
import requests
|
||||
except ImportError:
|
||||
return {"error": "requests not available; cannot record (install with pip install requests or run in an env with it)"}
|
||||
return {"error": "requests not available; cannot record (pip install requests)"}
|
||||
|
||||
url = f"{QUINN_MY_URL}/api/clients/{client_id}/screening"
|
||||
body = {
|
||||
|
|
@@ -291,6 +560,7 @@ def record_screening(client_id: int, phone: str, result: str, raw: str) -> dict[
|
|||
resp.raise_for_status()
|
||||
return resp.json()
|
||||
|
||||
|
||||
async def main_async(phone: str, client_id: int | None, dry_run: bool, dump_ui: bool = False) -> dict[str, Any]:
|
||||
log(f"[mr-number] Starting lookup for {phone} on {DEVICE} (client_id={client_id}, dry_run={dry_run})")
|
||||
|
||||
|
|
@@ -298,53 +568,73 @@ async def main_async(phone: str, client_id: int | None, dry_run: bool, dump_ui:
|
|||
if input_phone != phone:
|
||||
log(f"[mr-number] Cleaned phone for input: {input_phone} (from {phone})")
|
||||
|
||||
# 1. Launch + navigate
|
||||
# 1. Launch + search
|
||||
launch_app()
|
||||
time.sleep(1.5)
|
||||
|
||||
if dump_ui:
|
||||
log("[mr-number] UI dump after launch:")
|
||||
log(get_ui_dump()[:1500])
|
||||
|
||||
# Mr. Number flow (calibrated against com.mrnumber.blocker):
|
||||
# 1. focus the search bar (id/searchBar) and type the number
|
||||
# 2. typing reveals a "Look up <number>" row that MUST be tapped — the app does
|
||||
# NOT search on Enter; tapping that row performs the (paid) lookup
|
||||
# 3. wait for the reports view to load over the network
|
||||
if not find_edit_text_and_input(input_phone):
|
||||
# Fallback: blast the number if no field was focused
|
||||
adb_text(input_phone)
|
||||
time.sleep(1.5)
|
||||
|
||||
# Tap the "Look up <number>" suggestion row to actually perform the lookup
|
||||
if not find_and_tap_text([f"look up {input_phone}", "look up"]):
|
||||
adb_keyevent(66) # last-resort fallback
|
||||
time.sleep(9.0) # let the paid reports load (results render in id/recyclerView)
|
||||
adb_keyevent(66)
|
||||
time.sleep(9.0) # let the paid reports load
|
||||
|
||||
# 2. Screenshot (filename from cleaned)
|
||||
shot = take_screenshot(input_phone)
|
||||
log(f"[mr-number] Screenshot saved to {shot}")
|
||||
# 1b. Make sure we're on the caller's report detail (not the recent-lookups list).
|
||||
if open_report_detail(input_phone):
|
||||
log("[mr-number] On report detail page.")
|
||||
else:
|
||||
log("[mr-number] WARNING: could not confirm the report detail page; capturing what's shown.")
|
||||
|
||||
# 3. Vision extraction
|
||||
log("[mr-number] Running vision extraction on screenshot...")
|
||||
extracted = await _extract_from_screenshot(str(shot), phone)
|
||||
log("[mr-number] Extraction:", json.dumps(extracted, indent=2)[:800])
|
||||
# 2. Expand the full report list, then scroll-capture all of it
|
||||
if expand_all_reports():
|
||||
log("[mr-number] Expanded full report list ('View all reports').")
|
||||
time.sleep(2.0)
|
||||
shots = capture_full_history(input_phone)
|
||||
log(f"[mr-number] Captured {len(shots)} screenshot(s) of the report history.")
|
||||
|
||||
# 4. Decide result + build raw
|
||||
result = decide_result(extracted)
|
||||
# 3. Vision-extract each screenshot, then consolidate + dedupe
|
||||
extractions: list[dict[str, Any]] = []
|
||||
for shot in shots:
|
||||
ex = await _extract_from_screenshot(str(shot), phone)
|
||||
extractions.append(ex)
|
||||
history = merge_reports(extractions, phone)
|
||||
log(f"[mr-number] Consolidated {history['captured_count']} unique reports "
|
||||
f"(app declares {history['declared_count']}).")
|
||||
|
||||
# 4. Build the multi-axis rating profile via the batch SDK
|
||||
log("[mr-number] Building rating profile (consolidation via batch SDK)...")
|
||||
profile = await build_rating_profile(history)
|
||||
if profile:
|
||||
result = result_from_profile(profile)
|
||||
log(f"[mr-number] Rating: {profile.get('score')}/100 grade {profile.get('grade')} "
|
||||
f"→ result '{result}' ({profile.get('summary', '')})")
|
||||
else:
|
||||
result = decide_result(history)
|
||||
log(f"[mr-number] Rating profile unavailable; fallback heuristic → '{result}'")
|
||||
|
||||
# 5. Save full history + profile, build the raw record
|
||||
raw_obj = {
|
||||
"source": "mr-number",
|
||||
"phone": phone,
|
||||
"extracted": extracted,
|
||||
"screenshot": str(shot),
|
||||
"classification": history.get("classification"),
|
||||
"reports": history.get("reports"),
|
||||
"red_flags": history.get("red_flags"),
|
||||
"report_count": history.get("report_count"),
|
||||
"captured_count": history.get("captured_count"),
|
||||
"rating_profile": profile,
|
||||
"result": result,
|
||||
"screenshots": [str(s) for s in shots],
|
||||
"decided_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
|
||||
}
|
||||
history_path = save_history(phone, raw_obj)
|
||||
log(f"[mr-number] Saved full history → {history_path}")
|
||||
raw_response = json.dumps(raw_obj, indent=2)
|
||||
|
||||
log(f"[mr-number] Decided result: {result}")
|
||||
|
||||
# 5. Record (if we have everything)
|
||||
recorded: dict[str, Any] | None = None
|
||||
# 6. Record (if we have everything)
|
||||
recorded: dict[str, Any] | None
|
||||
if client_id and not dry_run:
|
||||
try:
|
||||
recorded = record_screening(client_id, phone, result, raw_response)
|
||||
|
|
@@ -357,41 +647,48 @@ async def main_async(phone: str, client_id: int | None, dry_run: bool, dump_ui:
|
|||
recorded = {"skipped": "dry_run" if dry_run else "no_client_id"}
|
||||
log("[mr-number] Dry run or missing client_id — not recording.")
|
||||
if not JSON_MODE:
|
||||
log("To record manually: open the client in quinn.my, go to Screening tab, choose Mr. Number, paste the following as raw notes:")
|
||||
log("Raw record (paste into quinn.my Screening tab if recording manually):")
|
||||
log(raw_response)
|
||||
|
||||
log("[mr-number] Done. Check the client's Screening history in quinn.my.")
|
||||
log("[mr-number] Done.")
|
||||
|
||||
return {
|
||||
"phone": phone,
|
||||
"inputPhone": input_phone,
|
||||
"result": result,
|
||||
"extracted": extracted,
|
||||
"screenshot": str(shot),
|
||||
"score": (profile or {}).get("score"),
|
||||
"grade": (profile or {}).get("grade"),
|
||||
"ratingProfile": profile,
|
||||
"reports": history.get("reports"),
|
||||
"classification": history.get("classification"),
|
||||
"reportCount": history.get("report_count"),
|
||||
"capturedCount": history.get("captured_count"),
|
||||
"screenshots": [str(s) for s in shots],
|
||||
"historyFile": str(history_path),
|
||||
"decidedAt": raw_obj["decided_at"],
|
||||
"rawResponse": raw_response,
|
||||
"recorded": recorded,
|
||||
}
|
||||
|
||||
|
||||
def main() -> None:
|
||||
global DEVICE, JSON_MODE
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--phone", required=True, help="Phone number to look up (any format)")
|
||||
parser.add_argument("--client-id", type=int, help="quinn client id (from /clients/12345 URL). Required to auto-record.")
|
||||
parser.add_argument("--dry-run", action="store_true", help="Do lookup + vision but do not POST the screening record")
|
||||
parser.add_argument("--device", default=DEVICE, help="adb serial (default emulator-5554)")
|
||||
parser.add_argument("--dump-ui", action="store_true", help="Always dump the current UI hierarchy before actions (for calibration/troubleshooting)")
|
||||
parser.add_argument("--json", action="store_true", help="Emit one JSON result object on stdout (progress to stderr). Used by the mr-number MCP.")
|
||||
parser.add_argument("--dry-run", action="store_true", help="Do lookup + vision + rating but do not POST the screening record")
|
||||
parser.add_argument("--device", default=DEVICE, help="adb serial or host:port (default emulator-5554)")
|
||||
parser.add_argument("--dump-ui", action="store_true", help="Dump the current UI hierarchy before actions (calibration)")
|
||||
parser.add_argument("--json", action="store_true", help="Emit one JSON result object on stdout (progress to stderr). Used by the MCP.")
|
||||
args = parser.parse_args()
|
||||
|
||||
DEVICE = args.device
|
||||
JSON_MODE = args.json
|
||||
|
||||
# Basic sanity
|
||||
try:
|
||||
adb(["shell", "echo", "ok"], check=True)
|
||||
except Exception as e:
|
||||
msg = f"Cannot talk to device via adb on {DEVICE}. Is it connected and USB debugging enabled? {e}"
|
||||
msg = f"Cannot talk to device via adb on {DEVICE}. Is it connected/authorized? {e}"
|
||||
if JSON_MODE:
|
||||
print(json.dumps({"error": "adb_unavailable", "message": msg}))
|
||||
print(f"ERROR: {msg}", file=sys.stderr)
|
||||
|
|
@@ -406,5 +703,6 @@ def main() -> None:
|
|||
if JSON_MODE:
|
||||
print(json.dumps(result))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
main()
|
||||
|
|
|
|||
|
|
@@ -1,210 +1,201 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Unit tests for the emulator automation path in mr-number-lookup.
|
||||
Unit tests for mr-number-lookup.
|
||||
|
||||
These tests allow exercising the "Android emulator method" (adb control, UI navigation,
|
||||
screenshot, vision extraction, result decision, and screening record) **without** a real
|
||||
emulator, real Mr. Number app, real adb, real vision calls, or real network.
|
||||
Exercise the whole device path (adb control, navigation, full-history capture,
|
||||
vision extraction, consolidation, multi-axis rating, result mapping, and the
|
||||
screening record) **without** a real device, adb, app, vision, or network.
|
||||
|
||||
Run with:
|
||||
python -m unittest users.transquinnftw.tools.mr-number-lookup.mr_lookup_test -v
|
||||
|
||||
Or from the tool directory:
|
||||
python -m unittest mr_lookup_test -v
|
||||
|
||||
The design (MrNumberEmulator class + injectable callables) makes the emulator
|
||||
path fully unit-testable while keeping the CLI behavior unchanged.
|
||||
Run from this directory:
|
||||
python3 -m unittest mr_lookup_test -v
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import tempfile
|
||||
import unittest
|
||||
from pathlib import Path
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
# Import the module under test
|
||||
import sys
|
||||
sys.path.insert(0, str(Path(__file__).parent))
|
||||
import mr_lookup
|
||||
|
||||
|
||||
class TestDecideResult(unittest.TestCase):
|
||||
"""Pure logic for mapping vision extraction to screening result."""
|
||||
class TestDecideResultFallback(unittest.TestCase):
|
||||
"""The deterministic fallback heuristic (used only if the SDK profile fails)."""
|
||||
|
||||
def test_denied_on_negative_flags(self):
|
||||
extracted = {
|
||||
"reports": ["no show last week", "was rude over text"],
|
||||
"red_flags": ["cop vibes"],
|
||||
"suggested_result": "approved",
|
||||
}
|
||||
extracted = {"reports": ["no show last week", "was rude over text"], "red_flags": ["cop vibes"], "suggested_result": "approved"}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "denied")
|
||||
|
||||
def test_denied_on_keywords_in_reports(self):
|
||||
extracted = {"reports": ["timewaster, kept asking for free stuff"], "red_flags": []}
|
||||
def test_denied_on_hyphenated_variant(self):
|
||||
# The historical bug: 'no-show' (hyphen) must still match.
|
||||
extracted = {"reports": ["total no-show, ghosted me"], "red_flags": [], "report_count": 1}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "denied")
|
||||
|
||||
def test_approved_when_reports_present_and_clean(self):
|
||||
extracted = {"report_count": 2, "reports": ["good client, on time"], "red_flags": []}
|
||||
def test_never_approves_over_model_denied(self):
|
||||
# Even with clean-looking text, a model 'denied' is honored (the real bug).
|
||||
extracted = {"report_count": 3, "reports": ["seemed ok"], "suggested_result": "denied"}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "denied")
|
||||
|
||||
def test_deposit_is_not_negative(self):
|
||||
# 'deposit' must NOT trip the negative keywords.
|
||||
extracted = {"report_count": 1, "reports": ["always sends a deposit, great client"], "suggested_result": "approved"}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "approved")
|
||||
|
||||
def test_falls_back_to_suggested(self):
|
||||
extracted = {"report_count": 0, "suggested_result": "not_found"}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "not_found")
|
||||
self.assertEqual(mr_lookup.decide_result({"report_count": 0, "suggested_result": "not_found"}), "not_found")
|
||||
|
||||
def test_pending_default(self):
|
||||
extracted = {}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "pending")
|
||||
self.assertEqual(mr_lookup.decide_result({}), "pending")
|
||||
|
||||
|
||||
class TestMrNumberEmulator(unittest.TestCase):
|
||||
"""Test the emulator controller in isolation by mocking subprocess."""
|
||||
class TestRatingMapping(unittest.TestCase):
|
||||
"""Pure score/grade/result mapping + the safety override."""
|
||||
|
||||
def setUp(self):
|
||||
self.emulator = mr_lookup.MrNumberEmulator(device="emulator-test", package="com.test.mrnumber")
|
||||
def test_grade_bands(self):
|
||||
self.assertEqual(mr_lookup.grade_from_score(90), "A")
|
||||
self.assertEqual(mr_lookup.grade_from_score(75), "B")
|
||||
self.assertEqual(mr_lookup.grade_from_score(60), "C")
|
||||
self.assertEqual(mr_lookup.grade_from_score(45), "D")
|
||||
self.assertEqual(mr_lookup.grade_from_score(20), "F")
|
||||
|
||||
@patch("mr_lookup.subprocess.check_output")
|
||||
def test_adb_success(self, mock_check):
|
||||
mock_check.return_value = "ok\n"
|
||||
out = self.emulator.adb(["shell", "echo", "ok"])
|
||||
self.assertIn("ok", out)
|
||||
mock_check.assert_called_once()
|
||||
def test_result_from_score(self):
|
||||
self.assertEqual(mr_lookup.result_from_score(80), "approved")
|
||||
self.assertEqual(mr_lookup.result_from_score(55), "pending")
|
||||
self.assertEqual(mr_lookup.result_from_score(30), "denied")
|
||||
self.assertEqual(mr_lookup.result_from_score(None), "pending")
|
||||
|
||||
@patch("mr_lookup.subprocess.check_output")
|
||||
def test_adb_failure_non_check(self, mock_check):
|
||||
mock_check.side_effect = mr_lookup.subprocess.CalledProcessError(1, [], output="err")
|
||||
out = self.emulator.adb(["bad"], check=False)
|
||||
self.assertEqual(out, "err")
|
||||
def test_profile_prefers_recommendation(self):
|
||||
prof = {"score": 90, "recommended_result": "pending", "axes": {"safety": {"score": 90}}}
|
||||
self.assertEqual(mr_lookup.result_from_profile(prof), "pending")
|
||||
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "adb")
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "get_ui_dump")
|
||||
def test_find_and_tap_text_success(self, mock_dump, mock_adb):
|
||||
# Simulate a dump with a node we care about
|
||||
mock_dump.return_value = '''
|
||||
<node text="Lookup" bounds="[100,200][300,400]" />
|
||||
'''
|
||||
result = self.emulator.find_and_tap_text(["lookup"])
|
||||
self.assertTrue(result)
|
||||
# Should have called adb_tap with center
|
||||
mock_adb.assert_called() # tap happens inside via adb_tap which calls self.adb
|
||||
def test_profile_safety_override_forces_denied(self):
|
||||
# High overall score but a law-enforcement/safety signal → denied regardless.
|
||||
prof = {"score": 88, "recommended_result": "approved", "axes": {"safety": {"score": 10}}}
|
||||
self.assertEqual(mr_lookup.result_from_profile(prof), "denied")
|
||||
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "get_ui_dump")
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "adb_tap")
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "adb_text")
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "adb_keyevent")
|
||||
def test_find_edit_text_and_input_fallback(self, mock_key, mock_text, mock_tap, mock_dump):
|
||||
# Simulate a dump that will cause the parser to find an EditText
|
||||
mock_dump.return_value = '''<?xml version="1.0"?>
|
||||
<hierarchy>
|
||||
<node class="android.widget.EditText" resource-id="com.mrnumber.blocker:id/search" bounds="[50,100][400,150]" />
|
||||
</hierarchy>
|
||||
'''
|
||||
result = self.emulator.find_edit_text_and_input("+15551234567")
|
||||
self.assertTrue(result)
|
||||
mock_tap.assert_called()
|
||||
mock_text.assert_called()
|
||||
def test_profile_none_is_pending(self):
|
||||
self.assertEqual(mr_lookup.result_from_profile(None), "pending")
|
||||
|
||||
|
||||
class TestEmulatorMethodFlow(unittest.IsolatedAsyncioTestCase):
|
||||
"""End-to-end unit test of the emulator automation path using heavy mocking.
|
||||
class TestMergeReports(unittest.TestCase):
|
||||
"""Consolidation across multiple screenshots: dedupe + counts."""
|
||||
|
||||
This is the key test that lets us "try out our emulator method with unit testing"
|
||||
without any real Android, real vision, or real API calls.
|
||||
"""
|
||||
def test_dedupes_and_unions(self):
|
||||
extractions = [
|
||||
{"reports": ["paid deposit", "On time"], "red_flags": ["none"], "classification": "Personal Line", "report_count": 4},
|
||||
{"reports": ["paid deposit", " on time ", "ghosted once"], "red_flags": ["ghosting"], "report_count": 4},
|
||||
]
|
||||
merged = mr_lookup.merge_reports(extractions, "+15551112222")
|
||||
# 'paid deposit' and 'On time'/'on time' dedupe case/space-insensitively → 3 unique
|
||||
self.assertEqual(merged["captured_count"], 3)
|
||||
self.assertEqual(merged["declared_count"], 4)
|
||||
self.assertEqual(merged["classification"], "Personal Line")
|
||||
self.assertIn("ghosting", merged["red_flags"])
|
||||
|
||||
async def test_full_emulator_path_records_correct_mr_number_screening_body(self):
|
||||
fake_phone = "+15551234567"
|
||||
fake_client_id = 42
|
||||
fake_shot = Path("/tmp/fake-screenshot.png")
|
||||
|
||||
# Mock extraction that looks like real paid Mr. Number reports
|
||||
class TestFullFlow(unittest.IsolatedAsyncioTestCase):
|
||||
"""End-to-end device path with the expensive parts mocked."""
|
||||
|
||||
async def test_records_correct_wire_body_with_rating(self):
|
||||
phone = "+15551234567"
|
||||
client_id = 42
|
||||
shots = [Path("/tmp/s0.png"), Path("/tmp/s1.png")]
|
||||
|
||||
fake_extracted = {
|
||||
"phone": fake_phone,
|
||||
"report_count": 3,
|
||||
"reports": [
|
||||
"no show last month",
|
||||
"rude when asking for references",
|
||||
"otherwise seemed fine, paid on time"
|
||||
],
|
||||
"red_flags": ["no show"],
|
||||
"classification": "personal",
|
||||
"phone": phone, "report_count": 4,
|
||||
"reports": ["no-show, ghosted", "time waster"],
|
||||
"red_flags": ["no-show", "ghosting"], "classification": "Personal Line",
|
||||
"suggested_result": "denied",
|
||||
}
|
||||
fake_profile = {
|
||||
"score": 18, "grade": "F", "is_mixed": False,
|
||||
"axes": {"reliability": {"score": 10}, "payment": {"score": 40}, "respect": {"score": 30}, "safety": {"score": 70}},
|
||||
"recommended_result": "denied", "summary": "Repeated no-shows and time-wasting.",
|
||||
}
|
||||
|
||||
# Patch the expensive parts
|
||||
mock_requests = MagicMock()
|
||||
mock_post = mock_requests.post
|
||||
mock_post.return_value.json.return_value = {"id": 999, "status": "created"}
|
||||
mock_post.return_value.raise_for_status = MagicMock()
|
||||
|
||||
with patch("mr_lookup.launch_app") as mock_launch, \
|
||||
patch("mr_lookup.find_and_tap_text", return_value=True) as mock_tap, \
|
||||
patch("mr_lookup.find_edit_text_and_input", return_value=True) as mock_input, \
|
||||
patch("mr_lookup.take_screenshot", return_value=fake_shot) as mock_shot, \
|
||||
patch("mr_lookup._extract_from_screenshot", new_callable=AsyncMock) as mock_vision, \
|
||||
with patch("mr_lookup.launch_app"), \
|
||||
patch("mr_lookup.find_and_tap_text", return_value=True), \
|
||||
patch("mr_lookup.find_edit_text_and_input", return_value=True), \
|
||||
patch("mr_lookup.open_report_detail", return_value=True), \
|
||||
patch("mr_lookup.expand_all_reports", return_value=True), \
|
||||
patch("mr_lookup.capture_full_history", return_value=shots), \
|
||||
patch("mr_lookup._extract_from_screenshot", new_callable=AsyncMock, return_value=fake_extracted), \
|
||||
patch("mr_lookup.build_rating_profile", new_callable=AsyncMock, return_value=fake_profile), \
|
||||
patch("mr_lookup.save_history", return_value=Path("/tmp/hist.json")), \
|
||||
patch.dict("sys.modules", {"requests": mock_requests}), \
|
||||
patch("mr_lookup.QUINN_MY_SERVICE_TOKEN", "fake-token-for-test"), \
|
||||
patch("mr_lookup.time.sleep"): # speed up
|
||||
patch("mr_lookup.QUINN_MY_SERVICE_TOKEN", "fake-token"), \
|
||||
patch("mr_lookup.time.sleep"):
|
||||
|
||||
mock_vision.return_value = fake_extracted
|
||||
out = await mr_lookup.main_async(phone=phone, client_id=client_id, dry_run=False)
|
||||
|
||||
# Run the core async flow (bypassing CLI arg parsing)
|
||||
await mr_lookup.main_async(
|
||||
phone=fake_phone,
|
||||
client_id=fake_client_id,
|
||||
dry_run=False,
|
||||
dump_ui=False,
|
||||
)
|
||||
# Result comes from the rating profile (denied), score/grade surfaced.
|
||||
self.assertEqual(out["result"], "denied")
|
||||
self.assertEqual(out["score"], 18)
|
||||
self.assertEqual(out["grade"], "F")
|
||||
|
||||
# Verify vision was called with a real-looking screenshot path
|
||||
mock_vision.assert_awaited_once()
|
||||
self.assertIn(str(fake_shot), str(mock_vision.call_args))
|
||||
|
||||
# Verify the *actual wire body* sent via requests.post (the one that reaches
|
||||
# the zod checkSchema in admin/screening.ts and must contain clientId).
|
||||
# The actual wire body (must carry clientId for the zod schema).
|
||||
mock_post.assert_called_once()
|
||||
post_kwargs = mock_post.call_args[1]
|
||||
body = post_kwargs.get("json", {})
|
||||
self.assertEqual(body.get("clientId"), fake_client_id)
|
||||
body = mock_post.call_args[1].get("json", {})
|
||||
self.assertEqual(body.get("clientId"), client_id)
|
||||
self.assertEqual(body.get("service"), "mr-number")
|
||||
self.assertEqual(body.get("lookupValue"), fake_phone)
|
||||
self.assertEqual(body.get("lookupValue"), phone)
|
||||
self.assertEqual(body.get("result"), "denied")
|
||||
self.assertIn("mr-number", body.get("rawResponse", ""))
|
||||
self.assertIn("no show last month", body.get("rawResponse", ""))
|
||||
self.assertIn("suggested_result", body.get("rawResponse", ""))
|
||||
|
||||
# Sanity: launch and navigation were attempted (emulator method exercised)
|
||||
mock_launch.assert_called()
|
||||
self.assertTrue(mock_tap.called or mock_input.called)
|
||||
# rawResponse carries the full history + profile.
|
||||
self.assertIn("rating_profile", body.get("rawResponse", ""))
|
||||
self.assertIn("time waster", body.get("rawResponse", ""))
|
||||
|
||||
async def test_dry_run_does_not_record(self):
|
||||
with patch("mr_lookup.launch_app"), \
|
||||
patch("mr_lookup.find_and_tap_text", return_value=True), \
|
||||
patch("mr_lookup.find_edit_text_and_input", return_value=True), \
|
||||
patch("mr_lookup.take_screenshot") as mock_shot, \
|
||||
patch("mr_lookup._extract_from_screenshot", new_callable=AsyncMock) as mock_vision, \
|
||||
patch("mr_lookup.open_report_detail", return_value=True), \
|
||||
patch("mr_lookup.expand_all_reports", return_value=False), \
|
||||
patch("mr_lookup.capture_full_history", return_value=[Path("/tmp/s0.png")]), \
|
||||
patch("mr_lookup._extract_from_screenshot", new_callable=AsyncMock, return_value={"report_count": 0, "reports": [], "suggested_result": "not_found"}), \
|
||||
patch("mr_lookup.build_rating_profile", new_callable=AsyncMock, return_value=None), \
|
||||
patch("mr_lookup.save_history", return_value=Path("/tmp/hist.json")), \
|
||||
patch("mr_lookup.record_screening") as mock_record, \
|
||||
patch("mr_lookup.time.sleep"):
|
||||
|
||||
mock_shot.return_value = Path("/tmp/dry.png")
|
||||
mock_vision.return_value = {"report_count": 0, "suggested_result": "not_found"}
|
||||
|
||||
await mr_lookup.main_async(
|
||||
phone="+10000000000",
|
||||
client_id=99,
|
||||
dry_run=True,
|
||||
)
|
||||
|
||||
out = await mr_lookup.main_async(phone="+10000000000", client_id=99, dry_run=True)
|
||||
mock_record.assert_not_called()
|
||||
# No reports + no profile → fallback heuristic → pending.
|
||||
self.assertEqual(out["result"], "pending")
|
||||
|
||||
def test_decide_result_uses_heuristic_over_suggested_when_negative(self):
|
||||
# Even if vision says "approved", our local heuristic should win on bad reports
|
||||
extracted = {
|
||||
"reports": ["no show and kept pushing boundaries"],
|
||||
"suggested_result": "approved",
|
||||
}
|
||||
self.assertEqual(mr_lookup.decide_result(extracted), "denied")
|
||||
|
||||
class TestEmulatorControl(unittest.TestCase):
|
||||
"""adb controller in isolation."""
|
||||
|
||||
def setUp(self):
|
||||
self.emu = mr_lookup.MrNumberEmulator(device="emulator-test", package="com.test.mrnumber")
|
||||
|
||||
@patch("mr_lookup.subprocess.check_output")
|
||||
def test_adb_success(self, mock_check):
|
||||
mock_check.return_value = "ok\n"
|
||||
self.assertIn("ok", self.emu.adb(["shell", "echo", "ok"]))
|
||||
|
||||
@patch("mr_lookup.subprocess.check_output")
|
||||
def test_screen_size_parsed(self, mock_check):
|
||||
mock_check.return_value = "Physical size: 1080x1920\n"
|
||||
self.assertEqual(self.emu.screen_size(), (1080, 1920))
|
||||
|
||||
@patch("mr_lookup.subprocess.check_output")
|
||||
def test_screen_size_fallback(self, mock_check):
|
||||
mock_check.return_value = "weird output"
|
||||
self.assertEqual(self.emu.screen_size(), (720, 1280))
|
||||
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "adb")
|
||||
@patch.object(mr_lookup.MrNumberEmulator, "get_ui_dump")
|
||||
def test_find_and_tap_text(self, mock_dump, mock_adb):
|
||||
mock_dump.return_value = '<node text="View all 4 reports" bounds="[100,200][300,400]" />'
|
||||
self.assertTrue(self.emu.find_and_tap_text(["view all"]))
|
||||
mock_adb.assert_called()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
unittest.main()
|
||||
|
|
|
|||
|
|
@@ -130,19 +130,40 @@ docs/
|
|||
2. **Input.** The phone is cleaned to `^\+?\d+$` *before* `adb input text` (raw spaces
|
||||
/ parens mangle adb input). The "Look up <number>" suggestion row is tapped — the
|
||||
app does **not** search on Enter; tapping that row triggers the paid lookup.
|
||||
3. **Wait + screenshot.** A fixed wait lets the paid community reports render over the
|
||||
network, then `screencap` captures the full screen to `client/output/`.
|
||||
4. **Vision extraction.** The screenshot is handed to the Claude batch SDK
|
||||
(`ClaudeClient`, haiku) with `allowed_tools=["Read"]` and a prompt that says "Read
|
||||
the image file at <path>" + a strict JSON schema (report_count, reports[],
|
||||
classification, red_flags[], summary, suggested_result). Same pattern as ad-watch's
|
||||
`classify_photos.py`. No extra API keys — it reuses the platform's vision plumbing.
|
||||
5. **Decide.** `decide_result()` maps the extraction → a verdict heuristic: negative
|
||||
keywords (no-show, rude, cop, timewaster…) → `denied`; clean reports → `approved`;
|
||||
nothing found → `not_found`; ambiguous → `pending` (human gate). This is pure,
|
||||
deterministic, and the most-tested function in the repo.
|
||||
6. **Record.** Unless `--dry-run`, POST the verdict to the platform (see §5). The
|
||||
`rawResponse` carries the full extraction + screenshot path for the audit trail.
|
||||
3. **Land on the detail page.** `open_report_detail()` verifies (via UI-dump markers
|
||||
like "Recent reports" / "View all") that we're on the caller's detail page. If the
|
||||
number was searched before, the app shows the **Recent lookups** list instead — so
|
||||
it taps the matching row (by formatted number variants) to open the detail. Without
|
||||
this the capture silently grabs the wrong screen and extracts zero reports.
|
||||
4. **Capture the FULL history.** `expand_all_reports()` taps "View all N reports", then
`capture_full_history()` screenshots and scrolls the list page by page, stopping once the
UI dump stops changing (the bottom of the list), producing one screenshot per scroll page.
The earlier limitation, where only the reports visible on the first screen were captured, is solved here.
|
||||
5. **Vision extraction (per page).** Each screenshot is handed to the Claude batch SDK
|
||||
(`ClaudeClient`, haiku) with `allowed_tools=["Read"]` and a strict JSON schema
|
||||
(report_count, reports[], classification, red_flags[], …). `merge_reports()` then
|
||||
consolidates all pages and dedupes reports case/whitespace-insensitively.
|
||||
6. **Rating profile (the consolidation).** `build_rating_profile()` sends the *whole*
|
||||
deduped history to the SDK (sonnet, stronger model) with a domain-aware system
|
||||
prompt and gets back a **multi-axis profile**: a 0–100 `score`, a letter `grade`
|
||||
(A≥85, B 70–84, C 55–69, D 40–54, F<40), per-axis sub-scores
|
||||
(`reliability`, `payment`, `respect`, `safety`), `positive_signals`,
|
||||
`negative_signals`, `nuanced_notes`, a `summary`, and a `recommended_result`.
|
||||
The prompt encodes the insider nuance — e.g. **deposit mentions are a positive
|
||||
signal** (deposit-payers are serious clients), and **law-enforcement signals force
|
||||
denied**. `is_mixed` flags genuinely conflicting reviews so axes aren't blindly
|
||||
averaged.
|
||||
7. **Map to a verdict.** `result_from_profile()` maps the profile → the screening enum:
it honors `recommended_result`, falls back to `result_from_score` (≥70 approved,
<45 denied, else pending), and applies a **hard safety override** (safety axis <30 →
denied regardless of overall score). `decide_result()` remains as a *deterministic
fallback*, used only when the SDK profile is unavailable. It was also fixed to never
return `approved` over a model `denied` or a red flag, and to match punctuation variants
(`no-show` == `no show`).
|
||||
8. **Save + record.** The full consolidated history + profile is written to
|
||||
`client/output/history/<phone>-<ts>.json`. Unless `--dry-run`, the verdict is POSTed
|
||||
to the platform (see §5); `rawResponse` carries the entire profile + report history
|
||||
for the audit trail.
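
The scroll-capture loop from step 4 can be sketched as follows. This is a minimal illustration, not the actual `capture_full_history()`: the name `capture_until_stable`, its three injected callables, and the `max_pages` safety cap are assumptions made for the example.

```python
from typing import Callable, List, Optional


def capture_until_stable(
    dump_ui: Callable[[], str],       # hypothetical: returns the current UI dump
    take_shot: Callable[[int], str],  # hypothetical: saves a screenshot, returns its path
    scroll: Callable[[], None],       # hypothetical: scrolls the list by one page
    max_pages: int = 20,              # assumed cap so a flaky UI cannot loop forever
) -> List[str]:
    """Screenshot, scroll, repeat; stop once the UI dump no longer changes."""
    shots: List[str] = []
    previous: Optional[str] = None
    for page in range(max_pages):
        current = dump_ui()
        if current == previous:
            break  # identical dump after a scroll: we are at the bottom
        shots.append(take_shot(page))
        previous = current
        scroll()
    return shots
```

Injecting the three callables keeps the loop testable without adb, in the same spirit as the mocked tests in `mr_lookup_test`.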
|
||||
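
The per-page consolidation in step 5 might look like the sketch below. It is not the real `merge_reports()`: the helper names, taking the largest per-page `report_count` as `declared_count`, and keeping the first non-empty `classification` are assumptions chosen to match the behavior described above.

```python
from typing import Any, Dict, List, Optional, Set


def _dedupe_key(text: str) -> str:
    """Collapse whitespace and ignore case so 'On time' and ' on time ' collide."""
    return " ".join(text.split()).casefold()


def _unique(values: List[str], seen: Set[str]) -> List[str]:
    """Return values whose normalized form has not been seen yet, in order."""
    kept: List[str] = []
    for value in values:
        key = _dedupe_key(value)
        if key and key not in seen:
            seen.add(key)
            kept.append(value.strip())
    return kept


def merge_pages(pages: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Union reports and red flags across pages, deduped case/whitespace-insensitively."""
    reports: List[str] = []
    red_flags: List[str] = []
    seen_reports: Set[str] = set()
    seen_flags: Set[str] = set()
    classification: Optional[str] = None
    declared = 0
    for page in pages:
        reports += _unique(page.get("reports") or [], seen_reports)
        red_flags += _unique(page.get("red_flags") or [], seen_flags)
        # Assumption: the app's declared total is the largest count seen on any page.
        declared = max(declared, int(page.get("report_count") or 0))
        # Assumption: the first page that states a classification wins.
        classification = classification or page.get("classification")
    return {
        "reports": reports,
        "red_flags": red_flags,
        "classification": classification,
        "declared_count": declared,
        "captured_count": len(reports),
    }
```

Comparing `captured_count` with `declared_count` gives a quick check on whether the scroll capture actually reached every report.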
|
||||
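
The grade bands and verdict mapping from steps 6 and 7 can be written down compactly. The sketch below is reconstructed from the thresholds stated above and is not the code in `mr_lookup.py`. The `VALID_RESULTS` set and checking the safety override before the recommendation are assumptions.

```python
from typing import Any, Dict, Optional

# Assumption: the screening enum consists of the four values named in this document.
VALID_RESULTS = {"approved", "denied", "pending", "not_found"}
SAFETY_FLOOR = 30  # safety axis below this forces 'denied'


def grade_from_score(score: float) -> str:
    """A>=85, B 70-84, C 55-69, D 40-54, F<40."""
    for floor, grade in ((85, "A"), (70, "B"), (55, "C"), (40, "D")):
        if score >= floor:
            return grade
    return "F"


def result_from_score(score: Optional[float]) -> str:
    """>=70 approved, <45 denied, anything else (including no score) pending."""
    if score is None:
        return "pending"
    if score >= 70:
        return "approved"
    if score < 45:
        return "denied"
    return "pending"


def result_from_profile(profile: Optional[Dict[str, Any]]) -> str:
    """Safety override first, then the model's recommendation, then the score."""
    if not profile:
        return "pending"
    safety = ((profile.get("axes") or {}).get("safety") or {}).get("score")
    if safety is not None and safety < SAFETY_FLOOR:
        return "denied"  # hard override, regardless of overall score
    recommended = profile.get("recommended_result")
    if recommended in VALID_RESULTS:
        return recommended
    return result_from_score(profile.get("score"))
```

Checking the safety axis before anything else means a high overall score cannot mask a law-enforcement or safety signal, which is the point of the override.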
Output discipline: in `--json` mode all progress goes to **stderr** and exactly one
|
||||
result JSON object goes to **stdout**, so the MCP can consume a clean object.
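
A minimal sketch of that output discipline follows. The `JSON_MODE` flag and a `log()` helper appear in the tool itself, but the bodies here, and the `emit_result` name, are assumptions for illustration only.

```python
import json
import sys
from typing import Any, Dict

JSON_MODE = True  # in the tool this is set from the --json flag


def log(*parts: object) -> None:
    """Send progress to stderr in JSON mode so stdout stays machine-readable."""
    stream = sys.stderr if JSON_MODE else sys.stdout
    print(*parts, file=stream)


def emit_result(result: Dict[str, Any]) -> None:
    """Write exactly one JSON object to stdout when running in JSON mode."""
    if JSON_MODE:
        print(json.dumps(result))
```

A consumer such as the MCP can then parse stdout as a single JSON document and treat stderr purely as a progress log.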