Natalie 61c095fe69 test(ad-watch): unit-test canonical reduce + platform registry Closes the last pure-logic coverage gaps: reduceToCanonical (rate flattening, tour mapping, social/verifiedProfile filtering, field defaults) and platforms (normalizePlatformName, getPlatform by id/label, registry invariants). Every pure module in ad-watch is now unit-tested; 72 tests across 9 suites. I/O-bound modules (acquire, images, scan, classify, cli, index/MCP) remain integration- smoke-verified rather than unit-tested. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>		2026-06-27 05:22:28 -04:00
__tests__	test(ad-watch): unit-test canonical reduce + platform registry	2026-06-27 05:22:28 -04:00
scripts
src	feat(ad-watch): encode resolved price+domain rules	2026-06-27 04:52:03 -04:00
bun.lock
package.json
README.md
tsconfig.json

ad-watch (`quinn-adwatch` MCP)

Scrapes Quinn's live listing on each external ad platform and surfaces every discrepancy vs. the canonical "current" profile served by quinn.api /www/provider-config. The platform set is the canonical verifiedProfiles roster (each carries href + intended thumbnail imgSrc):

escort directories	content platforms
Eros · Tryst · TS4Rent · MegaPersonals · TSEscorts · AdultLook · AdultSearch · SkipTheGames	OnlyFans · Fansly · ManyVids

Targets resolve from canonical verifiedProfiles[].href at runtime; platforms.ts holds the per-platform metadata (kind, age-gate, expects, acquire order) plus an offline seedUrl fallback.
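A minimal sketch of how the registry and target resolution could fit together. normalizePlatformName and getPlatform are names taken from the test suite, but their bodies here are guesses. The field names, the sample entries (apart from Tryst's browser → apify order) and the resolution precedence (an ADWATCH_<ID>_URL override, then the canonical href, then seedUrl) are assumptions for illustration, not the contents of platforms.ts.

```typescript
// Hypothetical platforms.ts shape; field names and sample values are illustrative.
type Backend = "direct" | "browser" | "apify";

interface Platform {
  id: string;                    // registry key, e.g. "tryst"
  label: string;                 // display label, as it might appear in verifiedProfiles
  kind: "directory" | "content"; // escort directory vs content platform
  ageGate: boolean;              // expect an 18+ interstitial
  acquire: Backend[];            // backend escalation order
  seedUrl?: string;              // offline fallback when canonical has no href
}

interface VerifiedProfile {
  label: string;   // assumption: entries are keyed by a platform label
  href: string;
  imgSrc?: string; // intended thumbnail
}

// Only Tryst's acquire order comes from the README; every other value is placeholder data.
const PLATFORMS: Platform[] = [
  { id: "tryst", label: "Tryst", kind: "directory", ageGate: true, acquire: ["browser", "apify"] },
  {
    id: "ts4rent", label: "TS4Rent", kind: "directory", ageGate: true, acquire: ["direct", "browser"],
    seedUrl: "https://example.invalid/ts4rent-profile",
  },
  { id: "onlyfans", label: "OnlyFans", kind: "content", ageGate: false, acquire: ["direct", "browser"] },
];

// Case- and punctuation-insensitive key: "Only Fans" and "onlyfans" collapse together.
function normalizePlatformName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Look up by id or by label, both compared after normalization.
function getPlatform(name: string): Platform | undefined {
  const key = normalizePlatformName(name);
  return PLATFORMS.find((p) => p.id === key || normalizePlatformName(p.label) === key);
}

// Hypothetical helper. Assumed precedence: env override, then canonical href, then seedUrl.
function resolveTargetUrl(
  p: Platform,
  verified: VerifiedProfile[],
  env: Record<string, string | undefined> = {},
): string | undefined {
  const override = env[`ADWATCH_${p.id.toUpperCase()}_URL`];
  if (override) return override;
  const match = verified.find((v) => getPlatform(v.label)?.id === p.id);
  return match?.href ?? p.seedUrl;
}
```

Normalizing both sides means a label typed as "TS4Rent", "ts4rent" or "TS 4 Rent" resolves to the same entry, which is the property a registry-invariants test would want to pin down.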

Stateless by design. It acquires → extracts → diffs → reports. It persists nothing. Snapshot history is a deliberate v2 (see below): the DB is quinn.api-owned, so history would land as a quinn.api entity, not a private pool here.

Runs on plum. This is a local stdio MCP, not a black :391x HTTP gateway — it's registered in .mcp.json as a bun run command and spawned per Claude session on plum. Because it runs from source (not a bundled binary), Playwright acquisition runs in-process, which is what lets it clear Cloudflare on Tryst without Apify.

How it works

profile URL ──acquire──▶ HTML ──extract──▶ ScrapedProfile ─┐
                                                            ├─▶ diffProfile ─▶ PlatformDiffReport
quinn.api /www/provider-config ──reduce──▶ CanonicalProfile ┘

canonical.ts — fetches ProviderData from quinn.api and reduces it to the comparable CanonicalProfile (tagline, phone, rates, tour, socials). On black, point QUINN_API_BASE_URL at the INTERNAL quinn.api (http://localhost:3030) to read the source of truth directly and bypass the vps-0 edge cache and its 504s.
acquire.ts — backends in ascending heft: direct (plain fetch with browser-like headers) → browser (Playwright Chromium, lazily imported so stdio startup stays light) → apify (hosted render, needs APIFY_TOKEN).
acquire-browser.ts — the Playwright backend. Set ADWATCH_BROWSER_PROXY (or LILITH_TOR_PROXY, e.g. socks5://127.0.0.1:9050) to route through Tor so scanning Quinn's own profile doesn't hammer it from her residential IP.
extract.ts — structure-first (JSON-LD → OpenGraph/meta → visible-text heuristics for rates/tour/phone/socials/images). Never invents a field: gaps go to warnings. Images are captured in document order with the cover flagged (thumbnail) — order is itself a discrepancy axis. Detects Cloudflare interstitials → ChallengeError, and 18+ age gates (warns + escalates).
age gates — direct fetch throws AgeGateError on an 18+ wall so it escalates to browser, which clicks through the interstitial (I am 18 / Enter / Agree …) before reading the page.
images.ts — downloads a profile's images to ~/.local/share/quinn-adwatch/images/<platform>/ (override ADWATCH_IMAGE_DIR), named <order>-<sha12><ext> so on-disk order mirrors the gallery. Returns a manifest (sha256 per file) — the input to the alignment pipeline below.
diff.ts — pure (canonical, scraped) → report. Severity: critical = price/phone, warning = tagline/tour/socials, info = cosmetic. An empty scrape skips a field group (records it as skipped) rather than crying "everything missing" — that would be an extraction bug, not an ad discrepancy.
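The <order>-<sha12><ext> naming in images.ts can be sketched as below. Zero-padding the order to three digits (so a plain lexicographic listing keeps gallery order past nine images), taking the extension from the URL path with a .jpg fallback, and the helper and field names are all assumptions, not necessarily what images.ts does.

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { extname, join } from "node:path";

// Default root from the README; ADWATCH_IMAGE_DIR overrides it.
function imageRoot(env: Record<string, string | undefined>): string {
  return env.ADWATCH_IMAGE_DIR ?? join(homedir(), ".local", "share", "quinn-adwatch", "images");
}

// Hypothetical manifest entry shape.
interface ManifestEntry {
  order: number;
  file: string;
  sha256: string;
  sourceUrl: string;
}

// Hypothetical helper: builds <root>/<platform>/<order>-<sha12><ext> for one image.
function imageEntry(
  root: string,
  platform: string,
  order: number,
  bytes: Uint8Array,
  sourceUrl: string,
): ManifestEntry {
  const sha256 = createHash("sha256").update(bytes).digest("hex");
  // Assumption: extension comes from the URL path, lowercased, defaulting to .jpg.
  const ext = extname(new URL(sourceUrl).pathname).toLowerCase() || ".jpg";
  // Assumption: zero-pad so lexicographic file order matches gallery order.
  const name = `${String(order).padStart(3, "0")}-${sha256.slice(0, 12)}${ext}`;
  return { order, file: join(root, platform, name), sha256, sourceUrl };
}
```

Keeping the full sha256 in the manifest while using only 12 hex characters in the filename keeps names short without losing the exact-duplicate check.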
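The severity mapping and the skip-on-empty rule reduce to a small pure function. This sketch flattens every field group to a list of strings and compares them order-insensitively after trimming and lowercasing. The real CanonicalProfile and ScrapedProfile types are richer, and the group names, Report shape and normalization here are assumptions.

```typescript
type Severity = "critical" | "warning" | "info";
type FieldGroup = "rates" | "phone" | "tagline" | "tour" | "socials" | "cosmetic";

// Severity per field group, as listed in the README (rates = price).
const SEVERITY: Record<FieldGroup, Severity> = {
  rates: "critical",
  phone: "critical",
  tagline: "warning",
  tour: "warning",
  socials: "warning",
  cosmetic: "info",
};

// Simplification: every group flattened to a list of strings.
type Fields = Partial<Record<FieldGroup, string[]>>;

interface Discrepancy {
  group: FieldGroup;
  severity: Severity;
  expected: string[];
  actual: string[];
}

interface Report {
  discrepancies: Discrepancy[];
  skipped: FieldGroup[];
}

// Order-insensitive, case-insensitive comparison key.
const compareKey = (xs: string[]): string =>
  xs.map((x) => x.trim().toLowerCase()).sort().join("\n");

// Hypothetical signature for a pure (canonical, scraped) -> report diff.
function diffProfile(canonical: Fields, scraped: Fields): Report {
  const report: Report = { discrepancies: [], skipped: [] };
  for (const group of Object.keys(SEVERITY) as FieldGroup[]) {
    const expected = canonical[group] ?? [];
    const actual = scraped[group] ?? [];
    if (expected.length === 0) continue; // assumption: nothing canonical, nothing to check
    if (actual.length === 0) {
      report.skipped.push(group); // extraction gap, not an ad discrepancy
      continue;
    }
    if (compareKey(expected) !== compareKey(actual)) {
      report.discrepancies.push({ group, severity: SEVERITY[group], expected, actual });
    }
  }
  return report;
}
```

Recording a skipped group instead of a discrepancy is what keeps a broken extractor from producing a wall of false "missing" alerts.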

MCP tools

tool	what
`list_platforms`	registry + which have a profile URL configured
`fetch_canonical`	the canonical baseline (debug)
`extract_preview`	acquire + extract a URL, no diff — calibration aid
`scan_platform`	acquire + extract one platform → `ScrapedProfile`
`diff_platform`	acquire + extract + diff → `PlatformDiffReport`
`scan_all`	diff every configured platform in parallel

CLI (heavy / manual path)

bun run src/cli.ts canonical
bun run src/cli.ts capture https://tryst.link/escort/transquinnftw out.html --browser
bun run src/cli.ts extract  <url> --browser      # inspect what extraction sees
bun run src/cli.ts diff     tryst --browser      # full diff, forced browser
bun run src/cli.ts download tryst --browser      # download images → manifest

--browser forces Playwright acquisition; set ADWATCH_BROWSER_PROXY / LILITH_TOR_PROXY to route it through a proxy (Tor). Without the flag, acquisition follows the platform's configured order (Tryst: browser → apify).
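The acquisition order and the age-gate escalation can be sketched as a loop over the platform's backend list. Here the backends are injected functions, and only ChallengeError and AgeGateError trigger escalation while any other failure is rethrown. That split, the forceBrowser parameter name and the return shape are assumptions rather than acquire.ts's actual API.

```typescript
type Backend = "direct" | "browser" | "apify";
type Fetcher = (url: string) => Promise<string>;

// Errors a lighter backend raises when it cannot get past the page.
class ChallengeError extends Error {
  constructor(message: string) { super(message); this.name = "ChallengeError"; }
}
class AgeGateError extends Error {
  constructor(message: string) { super(message); this.name = "AgeGateError"; }
}

const ESCALATE = new Set(["ChallengeError", "AgeGateError"]);

// Hypothetical signature: try each backend in order, escalating on a
// Cloudflare challenge or 18+ wall and rethrowing anything else.
async function acquire(
  url: string,
  order: Backend[],
  backends: Record<Backend, Fetcher>,
  forceBrowser = false, // mirrors the CLI's --browser flag
): Promise<{ backend: Backend; html: string }> {
  const plan: Backend[] = forceBrowser ? ["browser"] : order;
  const skipped: string[] = [];
  for (const name of plan) {
    try {
      return { backend: name, html: await backends[name](url) };
    } catch (err) {
      if (err instanceof Error && ESCALATE.has(err.name)) {
        skipped.push(`${name}: ${err.name}`);
        continue; // hand off to the next, heavier backend
      }
      throw err; // unexpected failure: don't mask it as "blocked"
    }
  }
  throw new Error(`no backend could acquire ${url} (${skipped.join("; ")})`);
}
```

Matching on err.name rather than instanceof keeps the check working even if the error classes come from a different module copy.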

Setup on plum (no black/Verdaccio dependency)

One-time:

cd codebase/@features/ad-watch
bun install                 # all deps are public npm — no Verdaccio/black needed
bunx playwright install chromium

The MCP uses the SDK's StdioServerTransport directly (no @lilith/mcp-common), so every dependency is public. Classify uses the on-disk Python claude-code-batch-sdk + the local claude CLI. Nothing here needs black.

Registered in the repo .mcp.json as quinn-adwatch — a bun run src/index.ts stdio command spawned per session.

Canonical baseline with black down. /www/provider-config is served through black, so set ADWATCH_CANONICAL_FILE to a local snapshot of that JSON (capture it once while a source is reachable) — the diff then runs fully offline against it. Without it, the diff fetches QUINN_API_BASE_URL over HTTP (needs black). Scan / extract / download / align / classify never touch black.
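A minimal sketch of that source selection. It assumes the file snapshot wins whenever ADWATCH_CANONICAL_FILE is set and that a missing QUINN_API_BASE_URL is an error; the real module may fall back to a default base URL instead. The function names are hypothetical.

```typescript
import { readFile } from "node:fs/promises";

type CanonicalSource =
  | { kind: "file"; path: string } // offline snapshot of /www/provider-config
  | { kind: "http"; url: string }; // live quinn.api (needs black)

// Hypothetical helper: pick the canonical source from the environment.
function resolveCanonicalSource(env: Record<string, string | undefined>): CanonicalSource {
  const file = env.ADWATCH_CANONICAL_FILE;
  if (file) return { kind: "file", path: file };
  const base = env.QUINN_API_BASE_URL;
  if (!base) throw new Error("set ADWATCH_CANONICAL_FILE or QUINN_API_BASE_URL");
  // Strip trailing slashes so "http://host:3030/" and "http://host:3030" agree.
  return { kind: "http", url: `${base.replace(/\/+$/, "")}/www/provider-config` };
}

// Reads the raw ProviderData JSON; reducing it to CanonicalProfile happens afterwards.
async function loadProviderConfig(src: CanonicalSource): Promise<unknown> {
  if (src.kind === "file") return JSON.parse(await readFile(src.path, "utf8"));
  const res = await fetch(src.url);
  if (!res.ok) throw new Error(`GET ${src.url} -> HTTP ${res.status}`);
  return res.json();
}
```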

To use Apify instead of the local browser for any site, set APIFY_TOKEN.

Calibration (per platform, first run)

Extraction is structure-first, not selector-hardcoded, so it works on first contact — but each platform's real DOM should be eyeballed once:

bun run src/cli.ts extract <profile-url> --browser

Read the warnings[] and raw fields. If a platform buries rates/tour in a way the generic heuristics miss, add a targeted parser for that platform — don't hardcode brittle CSS. Eros/Slixa profile URLs are not yet on file; set them in platforms.ts (or ADWATCH_EROS_URL / ADWATCH_SLIXA_URL) once known.

Photo-alignment pipeline

Goal: know which physical photo leads on each site, and in what order, so the cover and gallery sequence can be aligned across all platforms.

Built + verified (deterministic core): phash.ts dHashes each downloaded image via sips → 9×8 BMP → 64-bit difference hash; align.ts clusters by Hamming distance (≤6 = same photo), builds the cross-site matrix, and flags cover-inconsistent / order-drift / missing-photo. Run it:

bun run src/cli.ts align --browser     # scan+download all, then align

Verified the perceptual property holds: the same photo at 800px vs 400px (recompressed) hashes to Hamming 1, cross-format to 0 — well inside the threshold.
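The difference hash and clustering steps can be sketched without sips by starting from an already-decoded 9×8 grayscale grid (72 values, row-major). Packing the hash as a 16-character hex string, the left-brighter-than-right bit convention, and greedy assignment to the first cluster whose representative is close enough are choices made for this sketch; align.ts may cluster differently.

```typescript
const W = 9; // columns: 9 pixels give 8 left/right comparisons per row
const H = 8; // rows: 8 rows x 8 comparisons = 64 bits

// Difference hash of a 9x8 grayscale grid (row-major), as 16 hex chars.
function dHash(gray: ArrayLike<number>): string {
  if (gray.length !== W * H) throw new Error(`expected ${W * H} pixels, got ${gray.length}`);
  let bits = "";
  for (let y = 0; y < H; y++) {
    for (let x = 0; x < W - 1; x++) {
      bits += gray[y * W + x] > gray[y * W + x + 1] ? "1" : "0";
    }
  }
  let hex = "";
  for (let i = 0; i < 64; i += 4) hex += parseInt(bits.slice(i, i + 4), 2).toString(16);
  return hex;
}

// Number of differing bits between two equal-length hex hashes.
function hamming(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash length mismatch");
  let d = 0;
  for (let i = 0; i < a.length; i++) {
    let x = parseInt(a[i], 16) ^ parseInt(b[i], 16);
    while (x) { d += x & 1; x >>= 1; }
  }
  return d;
}

interface HashedImage { site: string; order: number; hash: string }
interface PhotoCluster { id: string; rep: HashedImage; members: HashedImage[] }

// Hypothetical greedy clustering: join the first cluster whose representative is
// within `threshold` bits, else start a new one. Ids are photo-A, photo-B, ...
// (letters only cover 26 clusters, which is enough for a sketch).
function clusterImages(images: HashedImage[], threshold = 6): PhotoCluster[] {
  const clusters: PhotoCluster[] = [];
  for (const img of images) {
    const hit = clusters.find((c) => hamming(c.rep.hash, img.hash) <= threshold);
    if (hit) hit.members.push(img);
    else clusters.push({ id: `photo-${String.fromCharCode(65 + clusters.length)}`, rep: img, members: [img] });
  }
  return clusters;
}
```

Comparing against a fixed representative (rather than any member) avoids chaining, where a series of small edits slowly links two different photos into one cluster.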

Semantic labels: align --classify classifies each photo cluster's representative via claude-code-batch-sdk (Quinn's Python batch SDK at ~/Code/@applications/@ml/@packages/@py/claude-code-batch-sdk; NOT the TS @lilith/claude-code-sdk, and NOT the official Agent SDK). The SDK is Python (no API key; it wraps the claude CLI with content-addressable caching + concurrency), so the work lives in scripts/classify_photos.py, and classify.ts is a thin subprocess bridge: it pipes cluster reps in as JSON, reads labels back, coerces them via classify-parse.ts, and attaches them to the report. Labels: {category (gallery enum), thumbnailFitness, faceVisible, note}.

Vision uses the SDK's generate(cwd=imageRoot, allowed_tools=["Read"]): the model Reads the image at the path given in the prompt, and that Read is what renders the image for it. run_batched is bypassed because it doesn't forward allowed_tools/cwd; classify uses ClaudeClient (concurrency) + ResponseCache (per-photo dedup) directly.

Verified end-to-end on plum (black-independent — pure Python + the local claude CLI): the script read a test image, returned a valid single-string label, and populated the ResponseCache. Tune via ADWATCH_CLASSIFY_LEVEL (default haiku), ADWATCH_PYTHON, CLAUDE_CODE_BATCH_SDK_PATH.
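Coercing a model reply into the label shape might look like the sketch below, in the spirit of classify-parse.ts. It assumes thumbnailFitness is a number in [0, 1] and that invalid or missing fields become null rather than failing the photo; both are guesses about the real contract, and coerceLabel is a hypothetical name.

```typescript
// Gallery enum from the README.
const CATEGORIES = ["glamour", "casual", "headshot", "suggestive", "lifestyle", "portrait"] as const;
type Category = (typeof CATEGORIES)[number];

interface PhotoLabel {
  category: Category | null;
  thumbnailFitness: number | null; // assumption: a 0..1 score
  faceVisible: boolean | null;
  note: string;
}

// Hypothetical coercer: pull the first {...} span out of a reply (which may wrap it
// in prose) and validate each field; anything unusable becomes null instead of throwing.
function coerceLabel(raw: string): PhotoLabel {
  const fallback: PhotoLabel = { category: null, thumbnailFitness: null, faceVisible: null, note: raw.trim() };
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start < 0 || end <= start) return fallback;
  let obj: Record<string, unknown>;
  try {
    obj = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return fallback;
  }
  const c = obj.category;
  const cat = typeof c === "string" ? c.trim().toLowerCase() : "";
  const fit = obj.thumbnailFitness;
  const face = obj.faceVisible;
  const note = obj.note;
  return {
    category: (CATEGORIES as readonly string[]).includes(cat) ? (cat as Category) : null,
    thumbnailFitness: typeof fit === "number" && Number.isFinite(fit) ? Math.min(1, Math.max(0, fit)) : null,
    faceVisible: typeof face === "boolean" ? face : null,
    note: typeof note === "string" ? note : "",
  };
}
```

Falling back to nulls keeps one malformed reply from sinking a whole batch; the report can still show the cluster, just without a label.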

scan (ordered images) ─▶ download (per-site files + sha256)  ◀── DONE
        │
        ▼
perceptual-hash each file (dHash) ──▶ cluster across sites ──▶ canonical photo ids (photo-A, photo-B…)
        │                                                            │
        ▼                                                            ▼
Claude batch classification (category + thumbnail-fitness)   cross-site alignment matrix
        └───────────────────────────────┬────────────────────────────┘
                                         ▼
            report: cover mismatch · order drift · missing-top-photo · off-brand pic

Cross-site identity = perceptual hash (dHash/pHash), not sha256 — the same photo recompressed/resized on two sites has a different sha256 but a near dHash (Hamming ≤ ~6). Clustering by dHash is what makes "same photo, different position" detectable. Deterministic, no LLM, free.
Semantic labels = Claude batch — category (reuse the gallery enum: glamour/casual/headshot/suggestive/lifestyle/portrait) + thumbnail fitness. Additive on top of the hash clusters.
Canonical side: verifiedProfiles[].imgSrc is Quinn's intended thumbnail per platform → diff "live cover" vs "intended cover".
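Given the cluster ids per site in gallery order, the three report flags can be derived roughly as follows. The definitions are assumptions for illustration: the cover is judged against the intended cover when one is known (otherwise against the most common live cover), order drift compares only the photos a site shares with the site listing the most photos, and a photo counts as missing when at least half the sites carry it.

```typescript
type Matrix = Record<string, string[]>; // site -> cluster ids in gallery order

interface Flag {
  kind: "cover-inconsistent" | "order-drift" | "missing-photo";
  site: string;
  detail: string;
}

// Hypothetical helper; intendedCover maps site -> cluster id of verifiedProfiles imgSrc.
function alignmentFlags(matrix: Matrix, intendedCover: Record<string, string> = {}): Flag[] {
  const sites = Object.keys(matrix);
  const flags: Flag[] = [];

  // Most common live cover (first seen wins ties).
  const coverCounts = new Map<string, number>();
  for (const s of sites) {
    const cover = matrix[s][0];
    if (cover) coverCounts.set(cover, (coverCounts.get(cover) ?? 0) + 1);
  }
  let modalCover: string | undefined;
  for (const [id, n] of coverCounts) {
    if (modalCover === undefined || n > coverCounts.get(modalCover)!) modalCover = id;
  }

  // Reference sequence: the site listing the most photos (first wins ties).
  const ref = sites.reduce((a, b) => (matrix[b].length > matrix[a].length ? b : a), sites[0]);

  // How many sites carry each photo.
  const presence = new Map<string, number>();
  for (const s of sites) {
    for (const id of new Set(matrix[s])) presence.set(id, (presence.get(id) ?? 0) + 1);
  }

  for (const s of sites) {
    const seq = matrix[s];
    const expectedCover = intendedCover[s] ?? modalCover;
    if (expectedCover && seq[0] !== expectedCover) {
      flags.push({ kind: "cover-inconsistent", site: s, detail: `live ${seq[0] ?? "none"}, expected ${expectedCover}` });
    }
    if (s !== ref) {
      const mine = seq.filter((id) => matrix[ref].includes(id));
      const theirs = matrix[ref].filter((id) => seq.includes(id));
      if (mine.join(",") !== theirs.join(",")) {
        flags.push({ kind: "order-drift", site: s, detail: `${mine.join(">")} vs ${ref} ${theirs.join(">")}` });
      }
    }
    for (const [id, n] of presence) {
      if (n * 2 >= sites.length && !seq.includes(id)) {
        flags.push({ kind: "missing-photo", site: s, detail: id });
      }
    }
  }
  return flags;
}
```

Restricting the order comparison to shared photos keeps a missing photo from also being reported as drift.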

Two decisions gate the build (see the chat): the alignment mechanism (pHash vs LLM vs both) and the image-decode tool on plum (sips per house pref vs a JS lib). dHash + clustering is unit-testable offline; the Claude batch step needs the SDK wired.

Compliance & intended copy (Executor canon)

With black gone, the black-independent canonical for text is Quinn's Executor workspace ad-copy/ dir (set ADWATCH_ADCOPY_DIR; defaults to ~/Documents/Claude/Projects/Executor/ad-copy): one intended-copy file per platform (tryst.txt, eros.txt, …) plus the maintained _RULES.md checklist.

executor-canon.ts — loadIntendedCopy(platform) + listAdCopyPlatforms() + loadRulesDoc().
compliance.ts — a transparent, data-driven detector for the literal rules Quinn states: geek-not-"nerd", banned phrase "where I like to stay", suspended X/Twitter links, Bay-Area/old-location geo, Eros emoji-free. Surfaces candidates for review; never edits.
MCP check_compliance {platform} checks the intended-copy file (instant, offline). CLI compliance <platform> [--intended|--browser] checks the file or the live page.
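A data-driven rule table in the spirit of compliance.ts could look like this. The regexes, rule ids and the location list (only Bay Area, San Jose and Napa, the names that appear in this README) are assumptions; the real detector's patterns may be broader or narrower. Like the detector described above, it only reports matches and never rewrites the copy.

```typescript
interface Rule {
  id: string;
  why: string;
  pattern: RegExp;      // must carry the g flag so matchAll works
  platforms?: string[]; // undefined = applies to every platform
}

// Hypothetical patterns for the rules the README lists.
const RULES: Rule[] = [
  { id: "geek-not-nerd", why: 'say "geek", not "nerd"', pattern: /\bnerd(?:s|y)?\b/gi },
  { id: "banned-phrase", why: 'banned phrase "where I like to stay"', pattern: /where I like to stay/gi },
  { id: "suspended-x", why: "X/Twitter account is suspended", pattern: /\b(?:https?:\/\/)?(?:www\.)?(?:x|twitter)\.com\/\S+/gi },
  { id: "old-geo", why: "Bay Area / old location", pattern: /\b(?:bay[\s-]area|san jose|napa)\b/gi },
  { id: "emoji", why: "Eros copy must be emoji-free", pattern: /\p{Extended_Pictographic}/gu, platforms: ["eros"] },
];

interface Finding {
  rule: string;
  why: string;
  match: string;
  index: number;
}

// Returns review candidates only; the text itself is never modified.
function checkCompliance(platform: string, text: string): Finding[] {
  const findings: Finding[] = [];
  for (const r of RULES) {
    if (r.platforms && !r.platforms.includes(platform)) continue;
    for (const m of text.matchAll(r.pattern)) {
      findings.push({ rule: r.id, why: r.why, match: m[0], index: m.index ?? 0 });
    }
  }
  return findings;
}
```

Keeping the rules as data means a new line in _RULES.md becomes one new table entry rather than new control flow.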

Two source contradictions are surfaced, not auto-enforced (CONTRADICTIONS): prices (_RULES "never announce a rate" vs FACT_SHEET "$1000, visible") and domain (_RULES "don't use tsquinn.com" vs FACT_SHEET "Site: tsquinn.com"). Quinn resolves these; ad-watch won't guess.

Verified on the real files: tryst.txt correctly flags San Jose/Napa (matching its own hand-written FIX note); eros.txt is clean (emoji-free).

v2 — snapshot history (not built)

Persisting each scan to diff over time ("when did Tryst go stale?") belongs in quinn.api as an ad-snapshot entity (POST/GET /admin/ad-snapshots), with this MCP reading through it — never opening its own DB pool. Deliberately deferred.
