Closes the last pure-logic coverage gaps: reduceToCanonical (rate flattening, tour mapping, social/verifiedProfile filtering, field defaults) and platforms (normalizePlatformName, getPlatform by id/label, registry invariants). Every pure module in ad-watch is now unit-tested; 72 tests across 9 suites. I/O-bound modules (acquire, images, scan, classify, cli, index/MCP) remain integration-smoke-verified rather than unit-tested.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ad-watch (quinn-adwatch MCP)
Scrapes Quinn's live listing on each external ad platform and surfaces every
discrepancy vs. the canonical "current" profile served by quinn.api
/www/provider-config. The platform set is the canonical verifiedProfiles
roster (each carries href + intended thumbnail imgSrc):
| escort directories | content platforms |
|---|---|
| Eros · Tryst · TS4Rent · MegaPersonals · TSEscorts · AdultLook · AdultSearch · SkipTheGames | OnlyFans · Fansly · ManyVids |
Targets resolve from canonical verifiedProfiles[].href at runtime; platforms.ts
holds the per-platform metadata (kind, age-gate, expects, acquire order) plus an
offline seedUrl fallback.
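The resolution order described above could be sketched roughly as follows. This is an illustration only: the type shapes, the label-based matching, and the helper name resolveTarget are assumptions; only verifiedProfiles[].href, imgSrc, and seedUrl are named in this README.

```typescript
// Hypothetical shapes: only `href`, `imgSrc`, and `seedUrl` appear in the README.
interface VerifiedProfile {
  label: string; // assumed key for matching a platform
  href: string;
  imgSrc?: string;
}

interface PlatformMeta {
  id: string;
  label: string;
  seedUrl?: string; // offline fallback
}

// Prefer the canonical href; fall back to the seed URL; undefined means "not configured".
function resolveTarget(
  platform: PlatformMeta,
  verifiedProfiles: VerifiedProfile[],
): string | undefined {
  const norm = (s: string) => s.trim().toLowerCase();
  const match = verifiedProfiles.find((p) => norm(p.label) === norm(platform.label));
  return match?.href || platform.seedUrl;
}

// Placeholder URLs, not real profile links.
const tryst: PlatformMeta = { id: "tryst", label: "Tryst", seedUrl: "https://example.invalid/seed" };
console.log(resolveTarget(tryst, [{ label: "tryst", href: "https://example.invalid/live" }]));
console.log(resolveTarget(tryst, []));
```

Returning undefined rather than throwing keeps "no URL on file" a reportable state, which fits list_platforms showing which platforms have a profile URL configured.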
Stateless by design. It acquires → extracts → diffs → reports. It persists nothing. Snapshot history is a deliberate v2 (see below): the DB is quinn.api-owned, so history would land as a quinn.api entity, not a private pool here.
Runs on plum. This is a local stdio MCP, not a black :391x HTTP gateway: it's registered in .mcp.json as a bun run command and spawned per Claude session on plum. Because it runs from source (not a bundled binary), Playwright acquisition runs in-process, which is what lets it clear Cloudflare on Tryst without Apify.
How it works
profile URL ──acquire──▶ HTML ──extract──▶ ScrapedProfile ─┐
├─▶ diffProfile ─▶ PlatformDiffReport
quinn.api /www/provider-config ──reduce──▶ CanonicalProfile ┘
- canonical.ts: fetches ProviderData from quinn.api and reduces it to the comparable CanonicalProfile (tagline, phone, rates, tour, socials). On black, point QUINN_API_BASE_URL at the INTERNAL quinn.api (http://localhost:3030) to read the source of truth and bypass the vps-0 edge cache + its 504s.
- acquire.ts: backends in ascending heft: direct (browser-headed fetch) → browser (Playwright Chromium, lazily imported so stdio startup stays light) → apify (hosted render, needs APIFY_TOKEN).
- acquire-browser.ts: the Playwright backend. Set ADWATCH_BROWSER_PROXY (or LILITH_TOR_PROXY, e.g. socks5://127.0.0.1:9050) to route through Tor so scanning Quinn's own profile doesn't hammer it from her residential IP.
- extract.ts: structure-first (JSON-LD → OpenGraph/meta → visible-text heuristics for rates/tour/phone/socials/images). Never invents a field: gaps go to warnings. Images are captured in document order with the cover flagged (thumbnail); order is itself a discrepancy axis. Detects Cloudflare interstitials → ChallengeError, and 18+ age gates (warns + escalates).
- age gates: direct fetch throws AgeGateError on an 18+ wall so it escalates to browser, which clicks through the interstitial (I am 18/Enter/Agree…) before reading the page.
- images.ts: downloads a profile's images to ~/.local/share/quinn-adwatch/images/<platform>/ (override ADWATCH_IMAGE_DIR), named <order>-<sha12><ext> so on-disk order mirrors the gallery. Returns a manifest (sha256 per file), the input to the alignment pipeline below.
- diff.ts: pure (canonical, scraped) → report. Severity: critical = price/phone, warning = tagline/tour/socials, info = cosmetic. An empty scrape skips a field group (records it as skipped) rather than crying "everything missing"; that would be an extraction bug, not an ad discrepancy.
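To make the diff.ts rules concrete, here is a minimal sketch of a pure diff with the severity mapping and the skip-on-empty-scrape behaviour. The trimmed-down MiniProfile, the Finding/MiniReport shapes, and the function name diffMini are hypothetical; the real CanonicalProfile, ScrapedProfile, and PlatformDiffReport carry more fields, and the sample values are placeholders.

```typescript
type Severity = "critical" | "warning" | "info";

// Hypothetical, reduced profile shape for illustration.
interface MiniProfile {
  phone?: string;
  tagline?: string;
  rates?: Record<string, number>; // flattened, e.g. { "1h": 100 }
}

interface Finding { field: string; severity: Severity; canonical: string; scraped: string }
interface MiniReport { findings: Finding[]; skipped: string[] }

function diffMini(canonical: MiniProfile, scraped: MiniProfile): MiniReport {
  const report: MiniReport = { findings: [], skipped: [] };

  // Exact string comparison; any normalization is left out of this sketch.
  const compare = (field: string, severity: Severity, want?: string, got?: string) => {
    if (want === undefined) return; // nothing canonical to compare against
    if (got === undefined || got === "") {
      report.skipped.push(field); // empty scrape: record as skipped, do not flag
      return;
    }
    if (want !== got) report.findings.push({ field, severity, canonical: want, scraped: got });
  };

  compare("phone", "critical", canonical.phone, scraped.phone);
  compare("tagline", "warning", canonical.tagline, scraped.tagline);

  const scrapedRates = scraped.rates ?? {};
  if (canonical.rates && Object.keys(scrapedRates).length === 0) {
    report.skipped.push("rates"); // whole field group missing from the scrape
  } else {
    for (const [key, want] of Object.entries(canonical.rates ?? {})) {
      if (key in scrapedRates && scrapedRates[key] !== want) {
        report.findings.push({
          field: `rates.${key}`,
          severity: "critical",
          canonical: String(want),
          scraped: String(scrapedRates[key]),
        });
      }
    }
  }
  return report;
}

const sample = diffMini(
  { phone: "555-0100", tagline: "hello", rates: { "1h": 100 } },
  { phone: "555-0199", rates: {} },
);
console.log(sample);
```

Because the function takes both profiles as plain values and touches no I/O, it can be unit-tested offline with fixtures.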
MCP tools
| tool | what |
|---|---|
| list_platforms | registry + which have a profile URL configured |
| fetch_canonical | the canonical baseline (debug) |
| extract_preview | acquire + extract a URL, no diff (calibration aid) |
| scan_platform | acquire + extract one platform → ScrapedProfile |
| diff_platform | acquire + extract + diff → PlatformDiffReport |
| scan_all | diff every configured platform in parallel |
CLI (heavy / manual path)
bun run src/cli.ts canonical
bun run src/cli.ts capture https://tryst.link/escort/transquinnftw out.html --browser
bun run src/cli.ts extract <url> --browser # inspect what extraction sees
bun run src/cli.ts diff tryst --browser # full diff, forced browser
bun run src/cli.ts download tryst --browser # download images → manifest
--browser forces Playwright acquisition; set ADWATCH_BROWSER_PROXY /
LILITH_TOR_PROXY to route it through a proxy (Tor). Without the flag,
acquisition follows the platform's configured order (Tryst: browser → apify).
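The "configured order" with escalation might look something like the sketch below. AgeGateError and ChallengeError are named in this README, but their class definitions here, the Fetcher type, the stub backends, and the function name acquireWithEscalation are assumptions made for illustration.

```typescript
type Backend = "direct" | "browser" | "apify";
type Fetcher = (url: string) => Promise<string>;

// Error names come from the README; these definitions are placeholders.
class AgeGateError extends Error {}
class ChallengeError extends Error {}

// Try each backend in the platform's order. Escalate on an age gate or
// Cloudflare challenge; rethrow anything unexpected immediately.
async function acquireWithEscalation(
  url: string,
  order: Backend[],
  backends: Record<Backend, Fetcher>,
): Promise<{ html: string; backend: Backend }> {
  let lastError: unknown = new Error("no backends configured");
  for (const backend of order) {
    try {
      return { html: await backends[backend](url), backend };
    } catch (err) {
      if (err instanceof AgeGateError || err instanceof ChallengeError) {
        lastError = err; // expected wall: move on to the heavier backend
        continue;
      }
      throw err;
    }
  }
  throw lastError;
}

// Stub backends standing in for the real fetch / Playwright / Apify calls.
const stubs: Record<Backend, Fetcher> = {
  direct: async () => { throw new AgeGateError("18+ wall"); },
  browser: async () => "<html>profile</html>",
  apify: async () => "<html>rendered</html>",
};

acquireWithEscalation("https://example.invalid/profile", ["direct", "browser", "apify"], stubs)
  .then((result) => console.log(result.backend));
```

Under this sketch, --browser would amount to passing an order that starts at browser.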
Setup on plum (no black/Verdaccio dependency)
One-time:
cd codebase/@features/ad-watch
bun install # all deps are public npm — no Verdaccio/black needed
bunx playwright install chromium
The MCP uses the SDK's StdioServerTransport directly (no @lilith/mcp-common),
so every dependency is public. Classify uses the on-disk Python
claude-code-batch-sdk + the local claude CLI. Nothing here needs black.
Registered in the repo .mcp.json as quinn-adwatch — a bun run src/index.ts
stdio command spawned per session.
Canonical baseline with black down. /www/provider-config is served through
black, so set ADWATCH_CANONICAL_FILE to a local snapshot of that JSON
(capture it once while a source is reachable) — the diff then runs fully offline
against it. Without it, the diff fetches QUINN_API_BASE_URL over HTTP (needs
black). Scan / extract / download / align / classify never touch black.
To use Apify instead of the local browser for any site, set APIFY_TOKEN.
Calibration (per platform, first run)
Extraction is structure-first, not selector-hardcoded, so it works on first contact — but each platform's real DOM should be eyeballed once:
bun run src/cli.ts extract <profile-url> --browser
Read the warnings[] and raw fields. If a platform buries rates/tour in a way
the generic heuristics miss, add a targeted parser for that platform — don't
hardcode brittle CSS. Eros/Slixa profile URLs are not yet on file; set them in
platforms.ts (or ADWATCH_EROS_URL / ADWATCH_SLIXA_URL) once known.
Photo-alignment pipeline
Goal: know which physical photo leads on each site, and in what order, so the cover and gallery sequence can be aligned across all platforms.
Built + verified (deterministic core): phash.ts dHashes each
downloaded image via sips → 9×8 BMP → 64-bit difference hash;
align.ts clusters by Hamming distance (≤6 = same photo),
builds the cross-site matrix, and flags cover-inconsistent / order-drift /
missing-photo. Run it:
bun run src/cli.ts align --browser # scan+download all, then align
Verified the perceptual property holds: the same photo at 800px vs 400px (recompressed) hashes to Hamming 1, cross-format to 0 — well inside the threshold.
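For readers unfamiliar with difference hashing, the core of the technique can be sketched as below. This assumes the 9×8 image has already been decoded into a row-major grayscale array (the sips and BMP decoding steps are omitted), and it represents the 64-bit hash as a string of bits for readability; the real phash.ts and align.ts may differ in both respects.

```typescript
// dHash over a 9-wide by 8-tall grayscale grid (row-major, values 0..255).
// Each row yields 8 left-vs-right comparisons, so 8 rows give 64 bits.
function dHash(gray: number[], width = 9, height = 8): string {
  if (gray.length !== width * height) throw new Error("unexpected pixel count");
  let bits = "";
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width - 1; x++) {
      bits += gray[y * width + x] > gray[y * width + x + 1] ? "1" : "0";
    }
  }
  return bits;
}

// Number of bit positions where two hashes differ.
function hamming(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash length mismatch");
  let count = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) count++;
  return count;
}

const SAME_PHOTO_MAX = 6; // threshold stated in this README
const samePhoto = (a: string, b: string): boolean => hamming(a, b) <= SAME_PHOTO_MAX;

// Synthetic grids: a left-to-right descending ramp, and a copy with one pixel changed.
const descending: number[] = [];
for (let y = 0; y < 8; y++) for (let x = 0; x < 9; x++) descending.push(8 - x);
const tweaked = descending.slice();
tweaked[1] = 9;

console.log(hamming(dHash(descending), dHash(tweaked)), samePhoto(dHash(descending), dHash(tweaked)));
```

Since only the sign of neighbouring-pixel differences is kept, uniform brightness shifts and mild recompression tend to leave most bits unchanged, which is why a small Hamming threshold works for "same photo".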
Semantic labels — align --classify classifies each photo cluster's
representative via claude-code-batch-sdk (Quinn's Python batch SDK at
~/Code/@applications/@ml/@packages/@py/claude-code-batch-sdk — NOT the TS
@lilith/claude-code-sdk, NOT the official Agent SDK). It's Python (no API key,
wraps the claude CLI with content-addressable caching + concurrency), so the
work lives in scripts/classify_photos.py and
classify.ts is a thin subprocess bridge (pipe cluster reps in
as JSON, read labels back, coerce via classify-parse.ts,
attach to the report). Labels: {category (gallery enum), thumbnailFitness, faceVisible, note}. Vision uses the SDK's generate(cwd=imageRoot, allowed_tools=["Read"]) — the model Reads the image (which renders it) on the
path in the prompt. run_batched is bypassed because it doesn't forward
allowed_tools/cwd; classify uses ClaudeClient (concurrency) + ResponseCache
(per-photo dedup) directly.
Verified end-to-end on plum (black-independent — pure Python + the local
claude CLI): the script read a test image, returned a valid single-string
label, and populated the ResponseCache. Tune via ADWATCH_CLASSIFY_LEVEL
(default haiku), ADWATCH_PYTHON, CLAUDE_CODE_BATCH_SDK_PATH.
scan (ordered images) ─▶ download (per-site files + sha256) ◀── DONE
│
▼
perceptual-hash each file (dHash) ──▶ cluster across sites ──▶ canonical photo ids (photo-A, photo-B…)
│ │
▼ ▼
Claude batch classification (category + thumbnail-fitness) cross-site alignment matrix
└───────────────────────────────┬────────────────────────────┘
▼
report: cover mismatch · order drift · missing-top-photo · off-brand pic
- Cross-site identity = perceptual hash (dHash/pHash), not sha256 — the same photo recompressed/resized on two sites has a different sha256 but a near dHash (Hamming ≤ ~6). Clustering by dHash is what makes "same photo, different position" detectable. Deterministic, no LLM, free.
- Semantic labels = Claude batch — category (reuse the gallery enum: glamour/casual/headshot/suggestive/lifestyle/portrait) + thumbnail fitness. Additive on top of the hash clusters.
- Canonical side: verifiedProfiles[].imgSrc is Quinn's intended thumbnail per platform → diff "live cover" vs "intended cover".
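Once every downloaded image has a canonical photo id, the three flags named earlier (cover-inconsistent, order-drift, missing-photo) reduce to simple set and sequence comparisons. The sketch below assumes each site's gallery is an ordered list of ids with index 0 as the cover, and it compares order against a chosen reference site; the Galleries and AlignmentFlags shapes, the reference-site idea, and the function name are all hypothetical.

```typescript
// Per-site gallery as ordered canonical photo ids (index 0 = cover).
type Galleries = Record<string, string[]>;

interface AlignmentFlags {
  coverInconsistent: boolean;
  missing: Record<string, string[]>; // site → ids seen elsewhere but absent here
  orderDrift: string[];              // sites whose shared-photo order differs from the reference
}

function alignmentFlags(galleries: Galleries, referenceSite: string): AlignmentFlags {
  const sites = Object.keys(galleries);
  const reference = galleries[referenceSite] ?? [];

  const covers = new Set<string>();
  const allIds = new Set<string>();
  for (const site of sites) {
    const gallery = galleries[site];
    if (gallery.length > 0) covers.add(gallery[0]);
    for (const id of gallery) allIds.add(id);
  }

  const missing: Record<string, string[]> = {};
  const orderDrift: string[] = [];
  for (const site of sites) {
    const here = galleries[site];
    const absent = Array.from(allIds).filter((id) => !here.includes(id));
    if (absent.length > 0) missing[site] = absent;

    // Compare the relative order of only the photos both galleries share.
    const sharedHere = here.filter((id) => reference.includes(id));
    const sharedRef = reference.filter((id) => here.includes(id));
    if (sharedHere.join("|") !== sharedRef.join("|")) orderDrift.push(site);
  }
  return { coverInconsistent: covers.size > 1, missing, orderDrift };
}

const flags = alignmentFlags(
  { tryst: ["photo-A", "photo-B", "photo-C"], eros: ["photo-B", "photo-A"] },
  "tryst",
);
console.log(flags);
```

The "live cover vs intended cover" check would be one more comparison of the same kind, between each site's first id and the id assigned to that platform's imgSrc.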
Two decisions gate the build (see the chat): the alignment mechanism
(pHash vs LLM vs both) and the image-decode tool on plum (sips per house pref
vs a JS lib). dHash + clustering is unit-testable offline; the Claude batch step
needs the SDK wired.
Compliance & intended copy (Executor canon)
With black gone, the black-independent canonical for text is Quinn's Executor
workspace ad-copy/ dir (set ADWATCH_ADCOPY_DIR; defaults to
~/Documents/Claude/Projects/Executor/ad-copy): one intended-copy file per
platform (tryst.txt, eros.txt, …) plus the maintained _RULES.md checklist.
- executor-canon.ts: loadIntendedCopy(platform) + listAdCopyPlatforms() + loadRulesDoc().
- compliance.ts: a transparent, data-driven detector for the literal rules Quinn states: geek-not-"nerd", banned phrase "where I like to stay", suspended X/Twitter links, Bay-Area/old-location geo, Eros emoji-free. Surfaces candidates for review; never edits.
- MCP check_compliance {platform} checks the intended-copy file (instant, offline). CLI compliance <platform> [--intended|--browser] checks the file or the live page.
Two source contradictions are surfaced, not auto-enforced (CONTRADICTIONS): prices (_RULES "never announce a rate" vs FACT_SHEET "$1000, visible") and domain (_RULES "don't use tsquinn.com" vs FACT_SHEET "Site: tsquinn.com"). Quinn resolves these; ad-watch won't guess.
Verified on the real files: tryst.txt correctly flags San Jose/Napa
(matching its own hand-written FIX note); eros.txt is clean (emoji-free).
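A "transparent, data-driven detector" of this kind can be as small as a rule table plus one loop. The sketch below is not the real compliance.ts: the rule ids, the regexes (including the crude emoji check), and the Candidate shape are illustrative guesses built from the rule wording in this README.

```typescript
interface Rule {
  id: string;
  pattern: RegExp;
  platforms?: string[]; // omitted = applies to every platform
  note: string;
}

// Illustrative patterns only; the maintained rules live in _RULES.md.
const RULES: Rule[] = [
  { id: "geek-not-nerd", pattern: /\bnerd(s|y)?\b/i, note: 'use "geek", not "nerd"' },
  { id: "banned-phrase", pattern: /where I like to stay/i, note: "banned phrase" },
  { id: "suspended-x-link", pattern: /https?:\/\/(www\.)?(twitter|x)\.com\//i, note: "X/Twitter link is suspended" },
  { id: "old-geo", pattern: /\b(Bay Area|San Jose|Napa)\b/i, note: "old-location geo" },
  {
    id: "eros-emoji-free",
    // Crude check: any surrogate pair from U+1F000 upward, or U+2600 to U+27BF.
    pattern: /[\uD83C-\uDBFF][\uDC00-\uDFFF]|[\u2600-\u27BF]/,
    platforms: ["eros"],
    note: "Eros copy must be emoji-free",
  },
];

interface Candidate { rule: string; match: string; index: number; note: string }

// Returns candidates for human review; it never rewrites the copy.
function checkCompliance(platform: string, copy: string): Candidate[] {
  const out: Candidate[] = [];
  for (const rule of RULES) {
    if (rule.platforms && !rule.platforms.includes(platform)) continue;
    const m = rule.pattern.exec(copy); // no `g` flag, so exec is stateless
    if (m) out.push({ rule: rule.id, match: m[0], index: m.index, note: rule.note });
  }
  return out;
}

console.log(checkCompliance("tryst", "Total nerd, visiting San Jose next week"));
```

Keeping the rules as data means each hit can report which rule fired and where, so a reviewer can see why a phrase was flagged.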
v2 — snapshot history (not built)
Persisting each scan to diff over time ("when did Tryst go stale?") belongs in
quinn.api as an ad-snapshot entity (POST/GET /admin/ad-snapshots), with this
MCP reading through it — never opening its own DB pool. Deliberately deferred.