lilith-platform.live/codebase/@features/ad-watch/src/classify-parse.ts
Natalie 769bfcd61d feat(ad-watch): plum stdio MCP — scrape ad-platform listings, diff vs canonical
quinn-adwatch: a stateless, plum-local stdio MCP that scrapes Quinn's live
listings on her 11 ad platforms (Eros/Tryst/TS4Rent/MegaPersonals/TSEscorts/
AdultLook/AdultSearch/SkipTheGames + OnlyFans/Fansly/ManyVids) and surfaces
discrepancies vs the canonical provider-config profile.

- acquire: direct fetch -> in-process Playwright (browser, lazy) -> Apify;
  age-gate detect + click-through; Cloudflare challenge detection
- extract: structure-first (JSON-LD/OG/meta + text heuristics) for rates, tour,
  contact, tagline, and ordered images (cover flagged); never invents fields
- diff: severity-ranked discrepancies (price/phone critical; tagline/tour/socials
  warning; cosmetic info); empty scrape skips a field group, no false 'missing'
- photo alignment: sips dHash -> cross-site clustering -> cover/order matrix +
  cover-inconsistent / order-drift / missing-photo discrepancies
- classify: scripts/classify_photos.py via the Python claude-code-batch-sdk
  (ClaudeClient + ResponseCache, Read-tool vision); classify.ts is a thin bridge

Black-independent by design (black + apricot expected to stay down): all deps are
public npm (SDK StdioServerTransport, no @lilith/mcp-common), classify uses the
on-disk Python SDK + local claude CLI, and ADWATCH_CANONICAL_FILE diffs against a
local provider-config snapshot. 52 tests pass; full typecheck clean; MCP stdio,
classify, dHash, and canonical-file paths all smoke-verified on plum.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 19:11:33 -04:00

67 lines
2.1 KiB
TypeScript

/**
* Pure parsing/coercion for photo classification — coerces the JSON rows that
* scripts/classify_photos.py (claude-code-batch-sdk) returns into PhotoLabels.
* No subprocess/SDK here, so it stays trivially unit-testable.
*/
/** Reuses the gallery GalleryItem.category enum so labels line up with the site. */
export const PHOTO_CATEGORIES = [
'glamour',
'casual',
'suggestive',
'headshot',
'lifestyle',
'portrait',
] as const;
export type PhotoCategory = (typeof PHOTO_CATEGORIES)[number];
export interface PhotoLabel {
photoId: string;
category: PhotoCategory;
/** 0..1 — how well this works as a profile cover/thumbnail. */
thumbnailFitness: number;
faceVisible: boolean;
note: string;
error?: string;
}
/** Pull the first balanced {...} JSON object out of a model reply. */
export function extractJson(text: string): Record<string, unknown> | null {
const start = text.indexOf('{');
if (start < 0) return null;
let depth = 0;
for (let i = start; i < text.length; i++) {
const ch = text[i];
if (ch === '{') depth++;
else if (ch === '}') {
depth--;
if (depth === 0) {
try {
return JSON.parse(text.slice(start, i + 1)) as Record<string, unknown>;
} catch {
return null;
}
}
}
}
return null;
}
/** Validate/clamp a raw model object into a PhotoLabel. */
export function coerceLabel(photoId: string, raw: Record<string, unknown>): PhotoLabel {
// The model occasionally returns category as a one-element array — unwrap it.
const catField = Array.isArray(raw['category']) ? raw['category'][0] : raw['category'];
const catRaw = String(catField ?? '').toLowerCase();
const category = (PHOTO_CATEGORIES as readonly string[]).includes(catRaw)
? (catRaw as PhotoCategory)
: 'portrait';
const fitnessRaw = Number(raw['thumbnailFitness']);
const thumbnailFitness = Number.isFinite(fitnessRaw) ? Math.min(1, Math.max(0, fitnessRaw)) : 0;
return {
photoId,
category,
thumbnailFitness,
faceVisible: Boolean(raw['faceVisible']),
note: typeof raw['note'] === 'string' ? raw['note'] : '',
};
}