_engineering-surface-adapter-container — Container-based surface automation

Genre: engineering annex (non-UX). The architecture for how Cocotte actually operates on external surfaces (Tryst, TS4Rent, Slixa, Eros, OF, X, …). Supersedes the cookie-paste model in earlier drafts of tryst-connect.screen.md and surface-tryst.brief.md §2.

Why a container model

Earlier drafts (2026-05-18, before the user's correction) proposed cookie-paste: Quinn extracts session cookies from her Safari, pastes into Cocotte, Cocotte replays them. That model has fatal problems:

  • High friction. Quinn has to open DevTools and copy a string. Per-surface. Per re-auth (every ~30 days). Quinn's stated #1 time-sink is the 23h Tryst bump (tier-dependent per surface-tryst §canonical-facts); adding a manual re-auth ritual every 30d pushes in the wrong direction.
  • No 2FA support. Cookies-only is useless on surfaces requiring fresh 2FA per login.
  • No captcha handling. Cookie replay bypasses login entirely, but rate-limit / IP-flag triggers can demand a mid-session captcha, which Cocotte has no way to solve.
  • Fragile to fingerprint changes. Tryst's bot detection compares browser fingerprint; if Cocotte's User-Agent / Accept-Language / etc. don't match Quinn's, the session is revoked. Cookie-paste doesn't carry fingerprint.

The right model: per-surface ephemeral container that performs the full login dance with stored credentials, persists session state, handles captchas in 3 tiers, and is fingerprint-stable across runs.

Architecture

┌──────────────────────────────────────────────────────────────┐
│  Quinn's iOS device (CocotteAI)                              │
│   "bump Tryst" → ai-copilot routes to bookings-tryst         │
└──────────────────────────────┬───────────────────────────────┘
                               │ MCP / HTTP
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  bookings-tryst specialist (NestJS, black)                   │
│   Reads policy from platform.api, calls adapter action       │
└──────────────────────────────┬───────────────────────────────┘
                               │ HTTP
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  @features/bookings-tryst/adapter/bump (NestJS)              │
│   1. Lookup credentials from vault (per credentials-vault)   │
│   2. Acquire/reuse browser session in container pool         │
│   3. Issue Playwright instruction (click bump, verify)       │
│   4. Capture result + write agent_actions row                │
└──────────────────────────────┬───────────────────────────────┘
                               │ container.exec
                               ▼
┌──────────────────────────────────────────────────────────────┐
│  Surface-adapter container pool (apricot, GPU host)          │
│   • Playwright headless Chromium image (primary)             │
│   • Android emulator image (secondary, for app-only)         │
│   • Per-(user_id, surface) browser context — persistent      │
│     storage of cookies/localStorage/IndexedDB                │
│   • Tor circuit pool for IP rotation                         │
│   • Fingerprint manager (stable per Quinn)                   │
│   • Captcha-solver service (3 tiers below)                   │
└──────────────────────────────────────────────────────────────┘

Layer 1 — Container runtime

Primary: Playwright headless Chromium, Docker image, one container per (user_id, surface) — or pool of N containers sharded by surface. Containers are long-lived (browser context persisted to volume) but disposable (restart on crash; one user's container crash never affects another's).

Secondary: Android emulator (e.g. redroid/redroid), reserved for surfaces with NO web equivalent — Signal, Wickr, Threema, maybe WhatsApp business flows. Heavier (~1GB RAM/instance vs ~300MB Playwright); only spawned for those surfaces.

Pool sizing: per-user N (default 3) Playwright containers, lazily warmed; one Android instance only when needed. The pool lives on apricot (it has GPU + RAM headroom; black runs the auth-critical NestJS services).

Per-surface browser context: each (user_id, surface) tuple gets its own Chromium context with persistent storage at /data/contexts/<user_id>/<surface>/. Cookies, localStorage, IndexedDB, service workers all survive container restarts. This is the session-persistence layer — Cocotte doesn't re-login on every bump.
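
As a minimal sketch of the per-(user_id, surface) layout described above, the helper below derives the storage directory for one context. The function name `contextDir`, the `CONTEXT_ROOT` constant, and the character whitelist are assumptions for illustration; only the `/data/contexts/<user_id>/<surface>/` pattern comes from this document. With Playwright, a directory like this would typically be passed as the `userDataDir` argument of `chromium.launchPersistentContext`.

```typescript
// Hypothetical helper: maps a (user_id, surface) pair to its persistent
// browser-context directory. Only the path pattern is taken from this doc.
const CONTEXT_ROOT = "/data/contexts";

// Assumption: ids are restricted to a conservative character set so that a
// malformed id can never escape CONTEXT_ROOT (for example via "../").
const SAFE_SEGMENT = /^[A-Za-z0-9_-]+$/;

function contextDir(userId: string, surface: string): string {
  for (const segment of [userId, surface]) {
    if (!SAFE_SEGMENT.test(segment)) {
      throw new Error(`unsafe path segment: ${JSON.stringify(segment)}`);
    }
  }
  return `${CONTEXT_ROOT}/${userId}/${surface}/`;
}
```

Rejecting unsafe segments up front keeps one user's context volume from ever resolving into another user's directory, which matters because the storage holds live session cookies.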

Layer 2 — Tor circuit pool (IP rotation)

Port from v2's event-scrapers / tour-scout (per brief J):

  • HAProxy on black.lan:3131 fronts 20 Tor circuits.
  • Each adapter request acquires a circuit from the pool; circuit rotates on rate-limit / IP-flag detection.
  • Per-surface circuit affinity (Cocotte tries to reuse the same exit IP for the same surface; surfaces flag IP changes as suspicious).
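
The affinity and rotation rules above can be sketched as a small in-memory pool. This is an illustration only: `CircuitPool`, `acquire`, and `rotate` are hypothetical names, circuits are modeled as plain integer ids, and the HAProxy front end is not modeled at all.

```typescript
// Sketch of per-surface circuit affinity. All names are hypothetical.
class CircuitPool {
  private readonly affinity = new Map<string, number>();
  private readonly size: number;
  private next = 0;

  constructor(size: number) {
    if (size < 2) throw new Error("pool needs at least 2 circuits to rotate");
    this.size = size;
  }

  // Reuse the circuit already bound to this surface, else bind the next one.
  acquire(surface: string): number {
    const existing = this.affinity.get(surface);
    if (existing !== undefined) return existing;
    const id = this.next;
    this.next = (this.next + 1) % this.size;
    this.affinity.set(surface, id);
    return id;
  }

  // Called on rate-limit / IP-flag detection: move the surface to a
  // different circuit and remember the new binding.
  rotate(surface: string): number {
    const current = this.acquire(surface);
    const replacement = (current + 1) % this.size;
    this.affinity.set(surface, replacement);
    return replacement;
  }
}
```

A pool built with `new CircuitPool(20)` would mirror the 20 circuits mentioned above; repeated `acquire("tryst")` calls return the same id until `rotate("tryst")` is called.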

Hostname allowlist: per-surface config restricts which domains the container can reach (Tryst container can only call tryst.link + Tor + captcha-solver — no public internet escape). Defense-in-depth against credential exfiltration if a container is somehow compromised.
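
The allowlist decision itself is simple enough to sketch. The config shape (`SURFACE_ALLOWLIST`) and the function name `isHostAllowed` are assumptions; in practice enforcement would more likely sit at the network layer (egress proxy or firewall rules), with logic like this only as a second check.

```typescript
// Hypothetical per-surface allowlist. Only the Tryst entry is grounded in
// this document; internal services (Tor, captcha-solver) are not modeled.
const SURFACE_ALLOWLIST: Record<string, readonly string[]> = {
  tryst: ["tryst.link"],
};

// A host is allowed if it equals an allowlisted domain or is a subdomain of it.
function isHostAllowed(surface: string, host: string): boolean {
  const allowed = SURFACE_ALLOWLIST[surface] ?? [];
  const h = host.toLowerCase();
  return allowed.some((domain) => h === domain || h.endsWith(`.${domain}`));
}
```

Matching on a leading dot (`.tryst.link`) rather than a bare suffix avoids accepting look-alike hosts such as `eviltryst.link`.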

Layer 3 — Browser fingerprint manager

Per-(user_id, surface) stable fingerprint:

  • User-Agent (matches Quinn's actual primary browser — Safari macOS, derived once at first connect).
  • Accept-Language: en-US,en;q=0.9 + locale-derived.
  • Screen + viewport (1440×900 default; macOS variants).
  • Timezone, platform, hardware concurrency.
  • Canvas + WebGL fingerprints (stable noise per user, not randomized per session — surface detectors flag fingerprint flux).
  • Navigator props (plugins, mimeTypes, devicePixelRatio).

Library: playwright-extra + stealth plugin as starting point; per-surface overrides for known fingerprint gotchas.
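
One way to get "stable noise per user, not randomized per session" is to seed a deterministic PRNG from the (user_id, surface) pair. The sketch below uses FNV-1a for the seed and mulberry32 for the generator; both are swapped-in, well-known algorithms, not something this document prescribes, and `canvasNoise` is a hypothetical name. How the offsets would be applied to canvas or WebGL output is left out.

```typescript
// FNV-1a (32-bit) over UTF-16 code units: stable numeric seed from a string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// mulberry32: small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical: derive `count` small integer offsets in [-2, 2] that a canvas
// shim could apply. The same (userId, surface) always yields the same list.
function canvasNoise(userId: string, surface: string, count: number): number[] {
  const rand = mulberry32(fnv1a(`${userId}:${surface}`));
  return Array.from({ length: count }, () => Math.floor(rand() * 5) - 2);
}
```

Because the seed depends only on the pair, the noise survives container restarts without being stored anywhere, while different surfaces for the same user still get unrelated sequences.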

Layer 4 — Credentials injection (dual-mode)

Per _engineering-credentials-vault.md, credentials can be stored under one of two auth_mode values:

Mode A: cookie mode (pasted session-cookie path)

  • Vault row carries a cookie_blob_enc field (encrypted session cookie value).
  • Adapter action at start: decrypt cookie → load into Playwright BrowserContext.addCookies(...) → context is ready to navigate already-authenticated.
  • No login dance. No captcha exposure at session-establish time.
  • Recovery: when adapter detects 401/403 mid-action, action fails with session-expired; specialist degrades; Quinn must re-paste via tryst-connect.screen.md cookie mode.
  • Best for: fast initial onboarding, captcha-solver-bootstrap-pending periods.

Mode B — auth_mode='credentials' (full credentials path)

  • Vault row carries username, password_enc, optional totp_secret_enc.
  • Adapter action at start: check existing browser context cookies; if valid, proceed as Mode A. If expired: trigger login flow — navigate to surface's sign-in URL, fill form, handle 2FA via auto-generated TOTP (from totp_secret), handle email-OTP via mail-sync inbox interception (per brief P), handle captcha via the 3-tier solver (Layer 5).
  • After successful login, captured cookies are persisted to the browser context volume; subsequent actions reuse the session without re-login until expiry.
  • Best for: long-haul autonomous operation.

Mode resolution at action-time

  • Adapter checks browser-context cookies first (both modes use them after first connect).
  • If cookies valid → proceed; mode doesn't matter for this action.
  • If cookies expired:
    • Mode B: trigger auto-login.
    • Mode A: fail with session-expired; degrade to user-recoverable.
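
The resolution rules above reduce to a two-input decision, sketched here as a pure function. The type and function names are hypothetical, and the `"cookie"` literal is a placeholder for Mode A: this document only confirms `'credentials'` as the Mode B value.

```typescript
// Assumption: "cookie" stands in for Mode A's auth_mode value.
type AuthMode = "cookie" | "credentials";
type AuthResolution = "proceed" | "auto-login" | "session-expired";

function resolveAuth(cookiesValid: boolean, mode: AuthMode): AuthResolution {
  // Valid browser-context cookies win regardless of mode.
  if (cookiesValid) return "proceed";
  // Expired cookies: only Mode B can recover without the user.
  return mode === "credentials" ? "auto-login" : "session-expired";
}
```

Keeping this as a pure function makes the degrade path (`"session-expired"`) easy to unit-test separately from any browser work.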

Common invariants (both modes)

  • Credentials live in adapter process memory ONLY — never written to container disk, never logged.
  • Cookie blobs likewise — decrypted only at injection time, GC'd after addCookies().
  • agent_actions rows include auth_mode for audit visibility but never the credentials values themselves.

Layer 5 — Captcha solver (3 tiers)

This is the load-bearing piece. Tryst, OF, X, and most directories occasionally surface captchas — Cocotte needs all three tiers.

Tier 1 — anti-detection (avoid trigger)

  • Stable fingerprint per Layer 3.
  • Human-like timing: pre-action mouse move (page.mouse.move(...)), 200–800ms delays before clicks, scroll-jitter.
  • Avoid requestAnimationFrame patterns automation libraries leave behind.
  • Tor exit-IP reputation check before action (rotate if flagged on OpenProxy lists).
  • Honor rate-limit hints (Tryst's cadence cap is ~3/hr; Cocotte never exceeds even if Quinn's policy allows higher).
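
Two of the Tier 1 measures, the randomized pre-click delay and the hard rate cap, can be sketched without any browser dependency. The names `humanDelayMs` and `SlidingWindowLimiter` are hypothetical, and a sliding window is just one reasonable way to enforce a cap such as ~3 actions per hour.

```typescript
// Pick a pre-click delay in [minMs, maxMs]; `rand` is injectable for tests.
function humanDelayMs(rand: () => number = Math.random, minMs = 200, maxMs = 800): number {
  return Math.floor(minMs + rand() * (maxMs - minMs + 1));
}

// Allows at most `maxActions` within any trailing window of `windowMs`.
class SlidingWindowLimiter {
  private readonly stamps: number[] = [];
  private readonly maxActions: number;
  private readonly windowMs: number;

  constructor(maxActions: number, windowMs: number) {
    this.maxActions = maxActions;
    this.windowMs = windowMs;
  }

  // Records the action and returns true if it fits in the window, else false.
  tryAcquire(nowMs: number): boolean {
    for (;;) {
      const oldest = this.stamps[0];
      if (oldest === undefined || nowMs - oldest < this.windowMs) break;
      this.stamps.shift(); // drop timestamps that left the window
    }
    if (this.stamps.length >= this.maxActions) return false;
    this.stamps.push(nowMs);
    return true;
  }
}
```

An adapter could call `new SlidingWindowLimiter(3, 60 * 60 * 1000)` for the Tryst cap and consult `tryAcquire(Date.now())` before acting, independent of whatever Quinn's policy allows.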

Tier 2 — ML captcha solver (port from v1 talent-scout)

v1's talent-scout had a 3.8GB custom-trained model for solving the captchas Tryst specifically used. Per the archive map (.archive/ARCHIVED.md):

talent-scout (tryst scraper) | platform.1/codebase/tools/talent-scout/ + platform.1/operations/talent-scout/ | Provider intel scraper (excluding the 3.8G captcha-solver model)
talent-scout/captcha-solver | rebuild via @applications/@ml/ if needed

Port plan:

  1. Extract v1 archive (apricot once reachable, or build the tarball locally) to get the scraper code + the model's training data + the inference code.
  2. Retrain the model in @ml/ workspace using the original training data (the 3.8GB weights are not in archive; the training pipeline + data should be).
  3. Wrap as a service: captcha-solver:8080 container with POST /solve { image_b64, type: "hcaptcha"|"recaptcha"|"text"|"img-grid" } → { solution }.
  4. Adapter integration: when Playwright detects a captcha challenge in the page, screenshot the challenge, POST to captcha-solver, paste solution back.

Captcha types the model handles (per v1 talent-scout context): hCaptcha image grids, reCAPTCHA v2 image grids, text-distortion (a few platforms still use), Tryst's specific challenge style.

Tier 3 — Human-in-the-loop (HITL) fallback

When Tier 1 fails AND Tier 2 fails (or confidence is too low):

  • Adapter pauses the action mid-flight.
  • Captures the challenge image.
  • Sends a high-stakes push notification to Quinn's iOS: "Tryst captcha needs you. Tap to solve."
  • Quinn taps → iOS deeplink opens a captcha-solve sheet (new screen — captcha-solve.screen.md, to be designed) — renders the challenge image, accepts her solution (tap, drag, or type), submits.
  • Adapter receives the solution via webhook, resumes the action.
  • If Quinn doesn't respond within N minutes (configurable, default 5), action fails with failed: captcha-timeout and surfaces in audit + chat-home receipt per brief M.

HITL has costs (Quinn's attention) but is the safety net for cases Tier 2 doesn't cover (new captcha format, model degradation, paranoid platform).
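
The Tier 2 to Tier 3 escalation can be sketched as below. Several things are assumptions: the `confidence` field (the `/solve` response above only shows `{ solution }`), the callback-based shape of `askMl` and `askHuman`, and every type name. The `captcha-timeout` reason is taken from the text above.

```typescript
type CaptchaOutcome =
  | { status: "solved"; by: "ml" | "hitl"; solution: string }
  | { status: "failed"; reason: "captcha-timeout" };

// Assumed Tier 2 answer shape; `confidence` is not part of the documented API.
interface MlAnswer { solution: string; confidence: number }

// Resolves to null if `p` has not settled within `ms`; errors still propagate.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve(null), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

async function solveCaptcha(
  askMl: () => Promise<MlAnswer | null>, // Tier 2: null means "could not solve"
  askHuman: () => Promise<string>,       // Tier 3: resolves when Quinn submits
  minConfidence: number,
  hitlTimeoutMs: number,
): Promise<CaptchaOutcome> {
  const ml = await askMl();
  if (ml !== null && ml.confidence >= minConfidence) {
    return { status: "solved", by: "ml", solution: ml.solution };
  }
  // Tier 2 failed or was unsure: pause and wait for the human, bounded in time.
  const human = await withTimeout(askHuman(), hitlTimeoutMs);
  return human === null
    ? { status: "failed", reason: "captcha-timeout" }
    : { status: "solved", by: "hitl", solution: human };
}
```

With the default from above, `hitlTimeoutMs` would be `5 * 60 * 1000`; the human callback is only invoked when Tier 2 does not produce a confident answer, so Quinn is not notified unnecessarily.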

Layer 6 — Adapter API contract

Every @cocottetech/@platform/codebase/@features/{bookings,content}-{surface}/adapter/{verb}/ exports:

export interface SurfaceAdapterAction<I, O> {
  surface: SurfaceKind;            // 'tryst' | 'ts4rent' | ...
  action: ActionVerb;              // 'bump' | 'update-profile' | 'reply' | 'login' | ...
  schema: { input: ZodSchema<I>; output: ZodSchema<O> };
  
  // Two required functions per action (precheck, execute), plus an optional rollback:
  precheck(input: I, ctx: AdapterContext): Promise<PrecheckResult>;
  execute(input: I, ctx: AdapterContext): Promise<O>;
  rollback?(output: O, ctx: AdapterContext): Promise<void>;  // optional for undoable actions
}

export interface AdapterContext {
  user_id: string;
  org_id?: string;
  credentials: SurfaceCredentials;   // decrypted at action-start, scoped to function
  browserContext: BrowserContext;     // Playwright context, ready to use
  torCircuit: TorCircuit;             // pre-acquired
  captchaSolver: CaptchaSolverClient; // 3-tier
  agentActionsClient: AgentActionsClient;  // writes the audit row
  logger: Logger;                     // structured logging (never logs credential values)
}

precheck runs deterministic eligibility gates (per brief K blocklist + per-surface rate-limit check + jurisdiction per K §K4); if any fails, action is declined without container spin-up.

execute runs the Playwright instructions, handling captchas via the 3-tier captcha-solver, writing audit rows on success/fail.

rollback (optional) undoes the action — e.g. delete the post, remove the bump (where the surface supports it).
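
How a caller might sequence precheck and execute is sketched below with trimmed-down stand-in types so the block is self-contained. Everything here is an assumption: `PrecheckOutcome` (the document names PrecheckResult without defining it), the `MiniAction` stand-in, the `acquireCtx` callback, and the simplification that precheck takes no context. The one property taken from the text is that a failed precheck declines the action before any container work starts.

```typescript
// Assumed shape; PrecheckResult is named but not defined in this document.
type PrecheckOutcome = { ok: true } | { ok: false; reason: string };

// Trimmed-down stand-in for SurfaceAdapterAction (no schema, no rollback).
interface MiniAction<I, O, Ctx> {
  precheck(input: I): Promise<PrecheckOutcome>;
  execute(input: I, ctx: Ctx): Promise<O>;
}

type RunOutcome<O> =
  | { status: "declined"; reason: string }
  | { status: "ok"; output: O }
  | { status: "failed"; error: string };

async function runAction<I, O, Ctx>(
  action: MiniAction<I, O, Ctx>,
  input: I,
  acquireCtx: () => Promise<Ctx>, // deferred so declined actions cost nothing
): Promise<RunOutcome<O>> {
  const gate = await action.precheck(input);
  if (gate.ok === false) return { status: "declined", reason: gate.reason };

  const ctx = await acquireCtx(); // container / browser context acquired here
  try {
    return { status: "ok", output: await action.execute(input, ctx) };
  } catch (err) {
    return { status: "failed", error: err instanceof Error ? err.message : String(err) };
  }
}
```

Passing context acquisition as a callback is one way to make "declined without container spin-up" structural rather than a convention each action has to remember.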

Layer 7 — Observability + safety

  • Structured logs: every adapter action emits {user_id, surface, action, step, outcome, duration_ms} to platform.api's logging pipeline. Credential values, raw HTML, and screenshots are NEVER logged (PII risk; container-only debug).
  • Screenshot capture: on every failure + on opt-in --debug, save screenshots to /data/debug/<user_id>/<surface>/<timestamp>.png with 7-day TTL. Helps diagnose flakes without leaking creds.
  • Per-surface rate-limit guardrails: enforced at adapter layer regardless of policy (Cocotte respects platform rate-caps even if Quinn's policy says otherwise).
  • Kill-switch integration: per brief K §K5, kill-switch causes adapter pool to drain (in-flight actions complete or abort; queued actions purge; no new actions accepted).
  • Per-container resource caps: 512MB RAM, 1 CPU, 10MB/s network. Prevents one runaway action from starving the pool.
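
The "never logged" rule for structured logs is easiest to hold if log events are built from an allowlist rather than by redacting a denylist. The sketch below assumes hypothetical names (`LOG_FIELDS`, `toLogEvent`); the six field names are the ones listed in the structured-logs bullet above.

```typescript
// Field list taken from the structured-log bullet above.
const LOG_FIELDS = ["user_id", "surface", "action", "step", "outcome", "duration_ms"] as const;
type LogField = (typeof LOG_FIELDS)[number];
type AdapterLogEvent = Partial<Record<LogField, string | number>>;

// Copy only allowlisted scalar fields; anything else (cookies, HTML,
// passwords, screenshots) is dropped rather than redacted.
function toLogEvent(raw: Record<string, unknown>): AdapterLogEvent {
  const out: AdapterLogEvent = {};
  for (const key of LOG_FIELDS) {
    const value = raw[key];
    if (typeof value === "string" || typeof value === "number") out[key] = value;
  }
  return out;
}
```

With an allowlist, a new sensitive field added to the adapter context later is excluded by default instead of leaking until someone remembers to redact it.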

Migration plan

Step 1 — Extract v1 talent-scout from archive

  • Build v1 archive tarball if not yet built (via ./scripts/build-archives.sh on apricot).
  • ./scripts/extract-archive.sh platform.1 to local /tmp/cocottetech-archive/platform.1/.
  • Inspect codebase/tools/talent-scout/ + operations/talent-scout/:
    • Scraper code (Playwright? Puppeteer?)
    • Captcha-solver model training pipeline
    • Training data
    • Inference code

Step 2 — Rebuild captcha solver model in @ml/

  • Workspace location: ~/Code/@applications/@ml/captcha-solver/
  • Inputs: training data from v1 + any open-source captcha datasets to bolster.
  • Output: ONNX-portable model (~200–500MB target; smaller than 3.8GB v1 model via distillation if possible).
  • Service wrapper: FastAPI/Python or Node-onnxruntime; POST /solve API.

Step 3 — Build Playwright surface-adapter base image

  • Dockerfile at @ai/@skills/_shared/surface-adapter-base/Dockerfile.
  • Base: mcr.microsoft.com/playwright:focal or equivalent.
  • Adds: playwright-extra, stealth plugin, Tor SOCKS5 client, fingerprint manager.
  • Exposes: gRPC or HTTP interface for adapter actions to issue browser commands.

Step 4 — First per-surface adapter: @cocottetech/@platform/codebase/@features/bookings-tryst/adapter/

  • login/index.ts — handles Tryst's login form including 2FA + captcha.
  • bump/index.ts — issues the availability bump (calls login first if session expired).
  • update-profile/index.ts — applies structured profile edits per tryst-profile-editor.screen.md.
  • fetch-inbox/index.ts — polls DMs per tryst-inbox.screen.md.
  • Each action exports the SurfaceAdapterAction interface above.

Step 5 — Captcha HITL screen

  • New screen captcha-solve.screen.md — image render + input + submit. iOS push deeplink target.
  • Backend: /api/v1/captcha-challenges/:id endpoint that surfaces pending challenges to iOS + accepts solutions.

Step 6 — Per-surface adapter rollout

  • TS4Rent, Slixa, Eros, OnlyFans, X follow the Tryst template. Per-surface variations:
    • X / Threads / Bluesky: real APIs exist (cheaper to skip Playwright; direct HTTP).
    • WhatsApp / Signal / Telegram: Android emulator route (slower; only when web-equivalent absent).
    • Tryst / TS4Rent / Slixa / Eros / OF: full Playwright + captcha pipeline.

Captcha-solver retraining notes

The 3.8GB v1 model is too big for our needs. Recommended:

  • Distill to a 200–500MB model via teacher-student training (use the v1 model as teacher if we can resurrect it; otherwise use commercial APIs as ephemeral teachers during distillation).
  • Multi-task the new model — train on hCaptcha + reCAPTCHA + Tryst-specific + a few others rather than per-platform-per-model. Saves disk + reduces retraining frequency.
  • Online refinement: every HITL captcha Quinn solves becomes a labeled training example (with consent). Slow but compounds.

Open questions

  • Captcha-solver vendor fallback: ship with paid 2captcha/anti-captcha/capsolver as a cheap Tier-2.5 (between ML and HITL)? Cost is ~$0.001–0.003 per solve; small for Quinn's volume. Lean: yes, as a third Tier-2 alternative; configurable per-user (some prefer HITL over paying a 3rd party).
  • Android emulator host: apricot or a dedicated GPU host? Emulators are RAM-heavy; ~2GB per instance. With Quinn alone, 1 instance suffices; multi-tenant scaling will need allocation strategy. Defer.
  • Per-surface "warmth" persistence: how long do we keep a browser context idle before destroying it? Tradeoff between fast re-acquire (warm context = no re-login) and resource cost. Lean: per-surface configurable; default 24h idle TTL.
  • Recovery from "this browser is automated" detection: Cloudflare / Akamai / DataDome often catch automation regardless of stealth measures. When detected (specific error patterns), Cocotte should escalate to HITL captcha + fingerprint regeneration; if recurring, surface as a degraded-mode banner per brief M.

Out of scope

  • Container orchestration platform choice (k8s / nomad / docker-compose) — engineering call later.
  • Anti-detection cat-and-mouse with specific platforms (will be ongoing; what is spec'd here is the framework, not per-surface tactics).
  • Multi-region container deployment (Quinn-only at P0; multi-tenant scaling is W brief territory).