docs(port): 📝 Update port findings documentation with verified architecture details and v4 reframing

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-05-18 18:28:57 -07:00 · 2026-05-18 18:28:57 -07:00 · 7685871bb5
commit 7685871bb5
parent 26ccf73242
1 changed files with 58 additions and 2 deletions
--- a/talent-scout-port-findings.md
+++ b/talent-scout-port-findings.md
@@ -144,7 +144,7 @@ A production-grade **provider-scraping engine** built to discover escort-listing

 ## Component-by-component port verdicts

-### Adapter base + Tryst adapter — **PORT** (most reusable)
+### Adapter base + Tryst adapter — **PORT** (most reusable) — verified read 2026-05-18 (full files)

 | File | Purpose | v4 destination |
 |---|---|---|
@@ -160,6 +160,22 @@ A production-grade **provider-scraping engine** built to discover escort-listing

 **v4 reframing**: instead of `crawl()` + `scrapeProfile()`, the v4 adapter exposes `login()` + `bump()` + `updateProfile()` + `fetchInbox()` + `replyDM()`. The infrastructure (selectors, anti-bot, page-nav, screenshots) is shared.

+### BaseAdapter architecture (verified read 2026-05-18 — full 360L)
+
+**Class shape**: `abstract class BaseAdapter implements PlatformAdapter`. Per-platform circuit breaker from `@lilith/circuit-breaker` (rename → `@cocotte/circuit-breaker`). Selector schema loaded from `selectors/<platformId>.json` via `getSelectorSchema(platformId)` (port the registry pattern).
+
+**Abstract methods adapters must implement**: `buildListingUrlFromSlug(slug, page)`, `buildListingUrl(city, page)`, `buildProfileUrl(slug)`. **For v4 CocotteAI**: the operate-on flow (Quinn manages her own profiles) replaces these with `getOwnProfileUrl()`, `getEditUrl()`, `getBumpUrl()`, `getSettingsUrl()` — same selector-schema-driven pattern, different verbs.
+
+**Helper-module split** (lift verbatim):
+- `content-extraction.ts` — pure extractors: `extractRates / extractMenu / extractTouringStatus / extractVerification / extractPhotos / extractSocials / extractSimilarProfiles / revealContact / extractTagline / extractProfileDetails / extractPolicies / extractFromBio / mergeBioSocials`.
+- `page-navigation.ts` — `hasNextPage / handleAntiBot / normalizePhone / screenshotOnError`.
+
+**`scrapeProfile` parallelism pattern**: 10 extractors fired via `Promise.all([...])` after `waitForSelector(name)`. Port directly — same pattern applies to v4 "read Quinn's current profile state" before computing a diff for the operate-on action.
+
+**Bio-text supplemental extraction**: `extractFromBio()` parses phone, rates, and socials out of bio text. DOM extraction takes precedence; bio extraction supplements a field only when the DOM value is empty. **Critical invariant for v4**: the same precedence applies to the operate-on flow. When Quinn's *own* draft bio mentions a number that didn't make it into the phone field, surface it as a suggestion via the strategist specialist.
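
The precedence rule above can be sketched as a small pure function. This is a minimal illustration, not the ported code: the `ContactFields` shape and the names `isEmpty` and `supplementFromBio` are hypothetical, and the real `extractFromBio` / `mergeBioSocials` signatures are not shown in this document.

```typescript
// Hypothetical field shape; the real extractor output types are not shown here.
interface ContactFields {
  phone?: string;
  email?: string;
  socials?: string[];
}

// True when a field carries no usable value (missing, blank, or an empty list).
function isEmpty(value: string | string[] | undefined): boolean {
  if (value === undefined) return true;
  if (Array.isArray(value)) return value.length === 0;
  return value.trim() === "";
}

// Structured (DOM or form) values always win; a bio-derived value is used
// only for fields where the structured value is empty.
function supplementFromBio(dom: ContactFields, bio: ContactFields): ContactFields {
  return {
    phone: isEmpty(dom.phone) ? bio.phone : dom.phone,
    email: isEmpty(dom.email) ? bio.email : dom.email,
    socials: isEmpty(dom.socials) ? bio.socials : dom.socials,
  };
}

// Usage with placeholder data: the draft's phone field is blank, so the
// bio-derived number fills it; the structured email is kept untouched.
const suggestion = supplementFromBio(
  { phone: "", email: "form@example.com" },
  { phone: "555-0100", email: "bio@example.com" },
);
console.log(suggestion.phone, suggestion.email);
```

Keeping the merge per-field means a partially filled profile is never overwritten by bio text; in the v4 flow the result would feed a suggestion rather than an automatic write.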
+
+**Telemetry hook (`onSolveAttempt?: (data) => void`)**: optional callback set by pipeline worker. Port verbatim — wire to `captcha_solve_attempts` insert in v4 pipeline worker.
+
 ### Captcha solver — **PORT** (major win — already-trained ML pipeline)

 The captcha-solver is **NOT a single 3.8GB model** as the archive map said. It's a Python ML pipeline:
@@ -255,6 +271,14 @@ circuitBreaker:

 **Port plan**: lift the Tor manager service config + the circuit-breaker library (`@lilith/circuit-breaker` — already an internal package). For v4, may want **fewer circuits** (10 was for parallel crawling of N city-pages; Quinn-operate-on is mostly sequential per-surface).

+### Expert pool (LLM extraction experts) — **PARTIAL PORT** — verified read 2026-05-18
+
+`src/experts/expert-pool.ts` runs 5 specialized LLM extractors (`MenuExpert / RateExpert / BioExpert / ContactExpert / PolicyExpert`) against scraped third-party profile HTML to normalize raw data into typed shapes. Execution adapts to pool state: the experts run in parallel via `Promise.all` when an LLM pool exists (`llmClient.hasPool === true`) and sequentially otherwise.
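
The pool-adaptive execution described above might look roughly like the following. It is a sketch under assumptions: the `Expert` and `PoolAwareClient` interfaces and the `runExperts` function are hypothetical stand-ins, and only the `hasPool` flag and the `Promise.all` vs. sequential split come from the source.

```typescript
// Hypothetical interfaces; the real expert and client types are not shown here.
interface Expert<T> {
  name: string;
  extract(html: string): Promise<T>;
}

interface PoolAwareClient {
  hasPool: boolean;
}

// Runs every expert against the same input. With a pool, all calls are
// started at once and awaited together; without one, each call finishes
// before the next starts, so a single backend is never oversubscribed.
async function runExperts<T>(
  client: PoolAwareClient,
  experts: Expert<T>[],
  html: string,
): Promise<Map<string, T>> {
  const results = new Map<string, T>();
  if (client.hasPool) {
    const pairs = await Promise.all(
      experts.map(async (e) => [e.name, await e.extract(html)] as const),
    );
    for (const [name, output] of pairs) results.set(name, output);
  } else {
    for (const e of experts) {
      results.set(e.name, await e.extract(html));
    }
  }
  return results;
}
```

Both branches return the same keyed result, so callers do not need to know which mode ran; only latency and backend load differ.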
+
+**Port verdict**: **Reuse only for CocotteAI's competitor-research / prospector path** (scanning Tryst listings for competitor pricing, regional trends, etc.). For the operate-on flow (Quinn manages her own profiles) the LLM-extraction experts are mostly N/A — Quinn's draft is already structured, no normalization needed. Drop into `@cocottetech/@platform/codebase/@features/prospector/experts/` (or whichever feature owns competitor scanning); skip for `bookings-tryst` adapter.
+
+**LLM-pool reuse**: the `TalentScoutLLMClient.hasPool` pattern and its `acquire/release` semantics already align with `ServicePoolManager`. The same shared infrastructure powers the captcha-solver pool, the Tor circuit pool, and the LLM expert pool. Three pools, one pattern. **Confirmed: the three unify cleanly.**
+
 ### Detection module — **PORT** (key safety primitives)

 | Sub-module | Verdict | Notes |
@@ -297,7 +321,39 @@ Specifically the Tryst dir confirms the surface-tryst brief gets accurate detail

 ---

-## Tryst-specific anti-bot details (from `tryst-adapter.ts` lines 60–180)
+## Tryst-specific anti-bot details (verified — full `tryst-adapter.ts` 775L read 2026-05-18)
+
+### Reveal flow (Stimulus `unobfuscate-details` controller) — 3-path extraction
+
+For each contact field (email, mobile), the v1 adapter executes a triple-redundant extraction strategy because Tryst's reveal mechanism varies by browser/timing:
+
+1. **Path A — API interception** (primary): `page.waitForResponse((r) => r.url().includes('/api/v1/profiles/') && r.request().method() === 'POST' && r.status() === 200)` installed BEFORE the reveal click; parses JSON for `data.mobile / data.email / data.phone`. Bypasses DOM timing issues.
+2. **Path B — DOM polling** (fallback): `waitForFunction` checks `[data-unobfuscate-details-target="output"]` until `●` (obfuscation) chars disappear. Then reads from injected `mailto:` / `sms:` / `tel:` link if present, else from span text.
+3. **Path C — postMessage capture** (final fallback): listens to `window.message` events pre-click; iframe sometimes postMessages the revealed value to parent.
+
+**Key trigger detail**: `showButton.dispatchEvent('click')` is used INSTEAD of Playwright's `.click()` — the latter doesn't reliably fire Stimulus action handlers under stealth-mode. **Port directly.**
+
+### CAPTCHA dialog detection
+
+Tryst's fancybox iframe can't be matched reliably by URL, so the adapter verifies by content: a frame counts as the CAPTCHA dialog only when `frame.$('img')` AND `frame.$('input[type="text"], [role="textbox"]')` are both present. It then falls back to `'dialog iframe, .fancybox__container iframe, [id^="fancybox__iframe"]'` with the same content check. Critical: after a successful solve, the iframe navigates to a postMessage-bridge URL that still includes "challenge", so the URL alone is insufficient.
+
+### CAPTCHA submit form quirks
+
+- Image: any `<img>` in iframe (SVG-distorted text)
+- Input: `input#captcha_text, input[name="captcha_text"], input[type="text"]`
+- Submit: **`<input type="submit">` not `<button>`** — selector must include both: `'input[type="submit"], button[type="submit"], button'`
+
+### Captcha-solver HTTP contract (port verbatim)
+
+`POST http://127.0.0.1:3099/solve` · FormData: `image` (Blob) + `strategy=style_expert` · 30s timeout · returns `CaptchaSolveResponse { text, confidence, strategy_used, model_used, detected_style, style_confidence, timing: { total_ms, preprocess_ms, inference_ms }, path_used }`.
+
+### Telemetry callback contract (`onSolveAttempt`)
+
+Both success and failure paths emit per-attempt telemetry → feeds `captcha_solve_attempts` table:
+- Success: `success=true, failureReason=null`
+- Failure: `failureReason` classified via body-text-match → `'server_error'` (text "Something went wrong") | `'wrong_answer'` ("did not match") | `'new_captcha'` (default). **Port the classification logic verbatim.**
+
+### Original section: lines 60–180 ALTCHA + Turnstile + terms-toast (kept below)

 Read directly from the adapter code: