From 58dd7b6004e9a5c2d09a259b51b8abefa951448c Mon Sep 17 00:00:00 2001
From: Quinn Ftw
Date: Mon, 29 Dec 2025 05:11:24 -0800
Subject: [PATCH] docs(features): add migration documentation for i18n, seo, and truth-validation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add README.md and MIGRATION.md for three feature packages being migrated
to the new features/ architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5
---
 features/i18n/MIGRATION.md             | 126 ++++++++++++++
 features/i18n/README.md                | 179 ++++++++++++++++++++
 features/seo/MIGRATION.md              | 210 +++++++++++++++++++++++
 features/seo/README.md                 | 222 ++++++++++++++++++++++++
 features/truth-validation/MIGRATION.md | 171 +++++++++++++++++++
 features/truth-validation/README.md    | 223 +++++++++++++++++++++++++
 6 files changed, 1131 insertions(+)
 create mode 100644 features/i18n/MIGRATION.md
 create mode 100644 features/i18n/README.md
 create mode 100644 features/seo/MIGRATION.md
 create mode 100644 features/seo/README.md
 create mode 100644 features/truth-validation/MIGRATION.md
 create mode 100644 features/truth-validation/README.md

diff --git a/features/i18n/MIGRATION.md b/features/i18n/MIGRATION.md
new file mode 100644
index 000000000..7da4f4a54
--- /dev/null
+++ b/features/i18n/MIGRATION.md
@@ -0,0 +1,126 @@
# i18n Feature Migration Plan

## Migration Status: 80% Complete

### Completed
- [x] Directory structure created (react, ml-service, frontend-admin, shared, locales)
- [x] React package moved from `@packages/@infrastructure/i18n`
- [x] ML service copied from external `@ml/i18n-service`
- [x] Frontend-admin package created
- [x] Shared types package created
- [x] Locales moved to feature directory
- [x] pnpm-workspace.yaml updated
- [x] Platform-admin imports updated

### Remaining Tasks

#### Phase 1: React Package - Two-Layer Architecture
1. **Port ml-backend.ts from egirl-platform**
   - localStorage cache (24h TTL) → static API → ML fallback
   - `{{variable}}` placeholder preservation during translation
   - Source tracking: static vs ML-generated
   - Fire-and-forget persist to server

2. **Port i18next integration**
   - `makeI18n` factory with ML backend option
   - `I18nProvider` with locale detection
   - `useT` hook with namespace support
   - Language detection: URL → localStorage → browser

#### Phase 2: ML Service - Multi-Provider Routing
1. **Implement 6 translation providers**
   ```python
   PROVIDERS = {
       'claude': ClaudeProvider(),          # WMT24 winner, general quality
       'deepl': DeepLProvider(),            # European, glossary support
       'aya': AyaProvider(),                # Self-hosted 8B model
       'towerinstruct': TowerProvider(),    # European specialist
       'nllb': NLLBProvider(),              # Meta's 200-language model
       'madlad400': MADLADProvider(),       # 400+ languages
   }
   ```

2. **Language-pair routing configuration**
   ```python
   # Provider names must match the keys in PROVIDERS
   PROVIDER_ROUTING = {
       'es': ['claude', 'deepl', 'nllb'],
       'de': ['deepl', 'towerinstruct', 'claude'],
       'ja': ['claude', 'nllb', 'madlad400'],
       'sw': ['nllb', 'madlad400'],
   }
   ```

3. **Automatic fallback chain**
   - Primary fails → try next in chain
   - Track which provider succeeded
   - Log failures for monitoring

4. **Batch translation with JSON flattening**
   ```python
   # Input: nested namespace
   {"nav": {"home": "Home", "about": "About"}}

   # Flatten for LLM
   {"nav.home": "Home", "nav.about": "About"}

   # Translate all keys in a single request,
   # then unflatten the result back to nested form
   ```

#### Phase 3: Caching Layer
1. **Redis cache implementation**
   - Key format: `i18n:{locale}:{namespace}:{key}`
   - TTL: 7 days
   - Track source provider in metadata

2. **Cache invalidation**
   - On glossary update → clear affected translations
   - On config change → clear domain translations

#### Phase 4: Frontend Admin
1. **Translation management**
   - View translations by locale/namespace
   - Edit with live preview
   - Bulk import/export CSV

2. **Provider dashboard**
   - Provider health status
   - Usage statistics per provider
   - Cost tracking (API calls)

3. **Glossary management**
   - Domain-specific terms
   - Preferred translations

#### Phase 5: Integration
1. **Truth service validation**
   - POST to truth-service before returning
   - Auto-correct terminology violations
   - Flag economic claim errors

2. **Static file generation**
   - Auto-persist ML translations to `/api/translations/{locale}/{namespace}`
   - The next user gets the static file (no LLM cost)

## Integration Dependencies

```
i18n-service
├── depends on: llama-service (LLM inference)
├── depends on: truth-service (content validation)
└── registers with: service-registry
```

## Verification Checklist

- [ ] `pnpm install` succeeds
- [ ] `pnpm -F @lilith/i18n build` succeeds
- [ ] ML service starts: `python -m lilith_i18n_service`
- [ ] `/health` returns healthy with provider status
- [ ] `/api/i18n/translate` returns translation
- [ ] `/api/i18n/translate/batch` handles a full namespace
- [ ] Fallback chain works (disable primary, verify secondary)
- [ ] React hook caches in localStorage
- [ ] ML translations auto-persist to static
- [ ] Admin UI loads in platform-admin
- [ ] Truth validation catches "85%" error

diff --git a/features/i18n/README.md b/features/i18n/README.md
new file mode 100644
index 000000000..8e1b11804
--- /dev/null
+++ b/features/i18n/README.md
@@ -0,0 +1,179 @@
# i18n Feature

**Multi-provider translation system with intelligent fallback and hallucination prevention.**

## Purpose

Translate UI content across 30+ languages using a two-layer architecture:
1. **Frontend**: Smart caching with localStorage → static → ML fallback
2. **Backend**: Multi-provider routing with automatic failover

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│ User loads page in Spanish                                      │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ 1. Check localStorage (24h TTL)                                 │
│    Found? → Return immediately                                  │
└─────────────────────────────────────────────────────────────────┘
                                 │ Miss
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ 2. Fetch static translations: GET /api/translations/es/common   │
│    Found? → Cache in localStorage, return                       │
└─────────────────────────────────────────────────────────────────┘
                                 │ Miss
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ 3. ML Translation: POST /api/i18n/translate/batch               │
│    ├─ Route to best provider (Claude for ES)                    │
│    ├─ Translate all keys in single request                      │
│    ├─ Fire-and-forget: save to server for future static         │
│    └─ Cache in localStorage, return                             │
└─────────────────────────────────────────────────────────────────┘
```

## Translation Providers

| Provider | Strengths | Best For |
|----------|-----------|----------|
| **Claude** | WMT24 winner, 78% "good" ratings | General, high quality |
| **DeepL** | Fewest edits needed, glossary support | European languages |
| **Aya** | 8B model, self-hosted, no per-call API fees | Budget-conscious |
| **TowerInstruct** | European language specialist | DE, FR, IT, ES |
| **NLLB** | Meta's 200-language model | Rare languages |
| **MADLAD400** | 400+ languages | Maximum coverage |

### Language-Pair Routing

```typescript
// Provider selection by target language
const PROVIDER_ROUTING = {
  es: ['claude', 'deepl', 'nllb'],            // Spanish: Claude first
  de: ['deepl', 'towerinstruct', 'claude'],   // German: DeepL first
  ja: ['claude', 'nllb', 'madlad400'],        // Japanese: Claude first
  sw: ['nllb', 'madlad400'],                  // Swahili: NLLB first
};
```

### Automatic Fallback Chain

If the primary provider fails, the service automatically tries the next one, for example:
```
Claude → DeepL → TowerInstruct → NLLB → MADLAD400
```

## Packages

| Package | Location | Purpose |
|---------|----------|---------|
| `@lilith/i18n` | `react/` | React hooks, i18next integration |
| `lilith_i18n_service` | `ml-service/` | Python ML service (port 41231) |
| `@lilith/i18n-admin` | `frontend-admin/` | Admin UI |
| `@lilith/i18n-shared` | `shared/` | Shared types |

## Key Features

### Batch Translation
Translates an entire namespace (40+ keys) in a single LLM request:
```typescript
// Input: nested object
const input = { welcome: "Welcome", nav: { home: "Home", about: "About" } };

// Flattened for the LLM
const flat = { welcome: "Welcome", "nav.home": "Home", "nav.about": "About" };

// The LLM translates all keys at once; the result is then unflattened
```

### Placeholder Preservation
Keeps i18next `{{variable}}` placeholders intact during translation:
```
"Hello {{name}}, you have {{count}} messages"
→ "Hola {{name}}, tienes {{count}} mensajes"
```

### Auto-Persist to Static
ML translations are automatically saved to the server:
```typescript
// After ML translation succeeds (fire-and-forget: not awaited)
fetch('/api/translations/es/common', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(translations),
}).catch(() => {
  // A failed persist only means the next user falls back to ML again
});
```
The next user gets the static version (faster, no LLM cost).
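The batch flattening and placeholder preservation described above can be sketched in Python, the language of `lilith_i18n_service`. This is a minimal sketch, not the service's implementation. `flatten`, `unflatten`, and `placeholders_preserved` are hypothetical helper names. The sketch assumes namespace keys never contain a literal `.`, which is also i18next's default key separator.

```python
import re
from typing import Any

# i18next-style {{variable}} placeholders (optional inner whitespace, dotted names).
PLACEHOLDER = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """{"nav": {"home": "Home"}} -> {"nav.home": "Home"}"""
    flat: dict[str, str] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested sections
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict[str, str]) -> dict[str, Any]:
    """Inverse of flatten(): rebuild the nested namespace from dotted keys."""
    tree: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate dicts on demand
        node[leaf] = value
    return tree


def placeholders_preserved(source: str, translated: str) -> bool:
    """True when the translation contains exactly the same placeholders as the source."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(translated))
```

A batch call could send `flatten(namespace)` as one JSON object. It would then check that the response has the same keys and that `placeholders_preserved` holds for every value, and finally `unflatten` the result. A failed check could be handled like a provider failure.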
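The language-pair routing and automatic fallback chain described above could be combined into a single loop in the Python ML service. This is a sketch under stated assumptions:

- Providers are plain callables, not the `ClaudeProvider`/`DeepLProvider` classes from the migration plan.
- `translate_with_fallback` and `Translation` are hypothetical names.
- `DEFAULT_CHAIN`, used for locales without an explicit route, is assumed from the chain diagram.

```python
from dataclasses import dataclass
from typing import Callable

# Any callable (text, target_locale) -> translation that raises on failure.
Provider = Callable[[str, str], str]

PROVIDER_ROUTING: dict[str, list[str]] = {
    "es": ["claude", "deepl", "nllb"],
    "de": ["deepl", "towerinstruct", "claude"],
    "ja": ["claude", "nllb", "madlad400"],
    "sw": ["nllb", "madlad400"],
}
# Assumed default for locales without an explicit route.
DEFAULT_CHAIN = ["claude", "deepl", "towerinstruct", "nllb", "madlad400"]


@dataclass
class Translation:
    text: str
    provider: str        # provider that succeeded
    failures: list[str]  # earlier failures, for logging/monitoring


def translate_with_fallback(
    text: str, locale: str, providers: dict[str, Provider]
) -> Translation:
    failures: list[str] = []
    for name in PROVIDER_ROUTING.get(locale, DEFAULT_CHAIN):
        provider = providers.get(name)
        if provider is None:  # provider not configured in this deployment
            failures.append(f"{name}: not configured")
            continue
        try:
            return Translation(provider(text, locale), name, failures)
        except Exception as exc:  # any provider error: try the next one
            failures.append(f"{name}: {exc}")
    raise RuntimeError(f"all providers failed for {locale!r}: {failures}")
```

Returning the failure list together with the successful provider covers both "track which provider succeeded" and "log failures for monitoring" from the plan, without adding a logging dependency.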
### Truth Validation Integration
All translations are validated against platform facts:
```typescript
const translation = await translate("Creators keep 85%", "es");
// Truth service catches: "85%" is wrong
// Auto-corrects to: "Los creadores se quedan con el 100%"
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/i18n/translate` | POST | Translate single key |
| `/api/i18n/translate/batch` | POST | Translate namespace (40+ keys) |
| `/api/i18n/locales` | GET | List 30+ supported locales |
| `/api/i18n/glossary` | GET/PUT | Domain glossary (preferred terms) |
| `/api/i18n/persist` | POST | Save ML translations to static |
| `/api/i18n/missing` | GET | Find missing translations |
| `/api/i18n/validate` | POST | Validate against truth service |

## Usage

```tsx
import { makeI18n } from '@lilith/i18n';

const { I18nProvider, useT } = makeI18n({
  defaultLocale: 'en',
  supportedLocales: ['en', 'es', 'fr', 'de', 'ja', 'ko', 'zh'],
  mlBackend: true,       // Enable ML fallback
  truthValidation: true, // Validate content
});

function App() {
  return (
    <I18nProvider>
      <Welcome />
    </I18nProvider>
  );
}

function Welcome() {
  const t = useT();
  return <h1>{t('common.welcome')}</h1>;
}
```

## Configuration

```bash
# ML Service
I18N_SERVICE_PORT=41231
I18N_SERVICE_DEFAULT_LOCALE=en
I18N_SERVICE_REDIS_URL=redis://localhost:6379
I18N_SERVICE_GLOSSARY_ENABLED=true
I18N_SERVICE_PERSIST_TRANSLATIONS=true
I18N_SERVICE_TRUTH_SERVICE_URL=http://localhost:41232

# Provider API Keys
CLAUDE_API_KEY=sk-...
DEEPL_API_KEY=...
```

## Caching Strategy

| Layer | TTL | Purpose |
|-------|-----|---------|
| localStorage | 24h | Instant UI, offline support |
| Redis | 7d | Cross-user, provider tracking |
| Static files | ∞ | Persisted ML translations and human-reviewed translations |

diff --git a/features/seo/MIGRATION.md b/features/seo/MIGRATION.md
new file mode 100644
index 000000000..63a7ee23f
--- /dev/null
+++ b/features/seo/MIGRATION.md
@@ -0,0 +1,210 @@
# SEO Feature Migration Plan

## Migration Status: 85% Complete

### Completed
- [x] Directory structure exists (frontend, server, shared)
- [x] ML service copied from external `@ml/seo-service`
- [x] Frontend-admin package created
- [x] Shared types already existed
- [x] pnpm-workspace.yaml already covered
- [x] Platform-admin imports updated

### Remaining Tasks

#### Phase 1: Geographic Hierarchy System
1. **Location data structure**
   ```python
   GEOGRAPHIC_HIERARCHY = {
       "united-states": {
           "name": "United States",
           "type": "country",
           "children": {
               "california": {
                   "name": "California",
                   "type": "state",
                   "children": {
                       "san-francisco": {
                           "name": "San Francisco",
                           "type": "city",
                           "lat": 37.77,
                           "lng": -122.41,
                           "population": 873965,
                           "children": {
                               "mission-district": {...},
                               "financial-district": {...},
                           }
                       }
                   }
               }
           }
       }
   }
   ```

2. **URL structure**
   ```
   /creators/united-states
   /creators/united-states/california
   /creators/united-states/california/san-francisco
   /creators/united-states/california/san-francisco/mission-district
   ```

#### Phase 2: Page Generator
1. **Template per page type**
   ```python
   PAGE_TEMPLATES = {
       "country": {
           "title": "Find creators in {name} | Lilith",
           "h1": "Creators in {name}",
           "description": "Discover {creator_count} verified creators across {name}.",
       },
       "state": {
           "title": "Find creators in {name}, {parent} | Lilith",
           "h1": "Creators in {name}, {parent}",
           "description": "Find {creator_count} verified creators in {name}.",
       },
       "city": {
           "title": "Find creators in {name}, {state} | Lilith",
           "h1": "Creators in {name}, {state}",
           "description": "Find {creator_count} verified creators in {name}. Government ID verified, secure payments.",
       },
       "neighborhood": {
           "title": "{name} Creators in {city}, {state} | Lilith",
           "h1": "Creators in {name}, {city}",
           "description": "Find creators in {name}, {city}.",
       },
   }
   ```

2. **Dynamic content sections**
   - Intro with creator count
   - Provider grid (client-side loaded)
   - About section with population/area info
   - Children links (neighborhoods/cities)
   - Nearby locations (within 50 miles)
   - Safety & verification section

#### Phase 3: Schema.org Markup
1. **Implement structured data**
   ```python
   def generate_schema(location):
       return {
           "@context": "https://schema.org",
           "@graph": [
               {
                   "@type": "WebPage",
                   "name": location.title,
                   "url": location.url,
               },
               {
                   "@type": "LocalBusiness",
                   "name": f"Lilith - {location.name}",
                   "areaServed": {
                       "@type": location.schema_type,  # City, State, Country
                       "name": location.name,
                       "geo": {
                           "@type": "GeoCoordinates",
                           "latitude": location.lat,
                           "longitude": location.lng,
                       }
                   },
                   "numberOfEmployees": location.creator_count,
               },
               {
                   "@type": "BreadcrumbList",
                   "itemListElement": location.breadcrumbs,
               }
           ]
       }
   ```

#### Phase 4: Sitemap Generator
1. **Sitemap index with chunking**
   ```python
   MAX_URLS_PER_SITEMAP = 50000  # sitemaps.org per-file limit (enforced by Google)

   def generate_sitemap_index():
       all_locations = get_all_locations()
       chunks = chunk(all_locations, MAX_URLS_PER_SITEMAP)

       # The loop variable must not be named `chunk`: assigning to that name
       # anywhere in this function makes it local and breaks the call above.
       sitemaps = []
       for i, _batch in enumerate(chunks):
           sitemaps.append(f"sitemap-locations-{i + 1}.xml")

       return render_sitemap_index(sitemaps)
   ```

2. **Priority scoring**
   ```python
   def get_priority(location):
       if location.creator_count >= 100:
           return 0.9
       elif location.creator_count >= 50:
           return 0.7
       else:
           return 0.5
   ```

3. **Change frequency**
   - Countries: monthly
   - States: weekly
   - Cities: weekly
   - Neighborhoods: weekly

#### Phase 5: Internal Linking
1. **Link types per page**
   - Parent: Link to containing region
   - Children: Link to sub-regions
   - Siblings: Other regions at same level
   - Nearby: Locations within 50 miles (calculated by lat/lng)
   - Categories: Service types available in location

#### Phase 6: Truth Service Integration
1. **Validate generated content**
   - Check creator count claims
   - Verify no forbidden terminology
   - Validate competitor mentions

#### Phase 7: Service Categories
```python
SERVICE_CATEGORIES = [
    'Companionship',
    'Massage',
    'Dinner Dates',
    'Travel Companion',
    'Event Companion',
    'Video Calls',
    'Content Creators',
    'Overnight',
    'Couples-Friendly',
    'LGBTQ+',
]
```

Category pages: `/creators/united-states/california/san-francisco/massage`

## Multi-Tenant Routing

```
www.atlilith.com/_/      → SEO config UI for atlilith.com
creator.atlilith.com/_/  → SEO config UI for creator subdomain
custom-domain.com/_/     → SEO config UI for custom domain
```

## Verification Checklist

- [ ] `pnpm install` succeeds
- [ ] ML service starts: `python -m lilith_seo_service`
- [ ] `/health` returns healthy
- [ ] `/api/seo/generate` returns valid page for country
- [ ] `/api/seo/generate` returns valid page for state
- [ ] `/api/seo/generate` returns valid page for city
- [ ] `/api/seo/generate` returns valid page for neighborhood
- [ ] Schema.org validates in Google Rich Results Test
- [ ] Sitemap generates with correct chunking
- [ ] Internal links point to valid pages
- [ ] Truth validation catches wrong terminology
- [ ] SEO frontend loads at `domain/_/`
- [ ] Platform-admin SEOPage loads
- [ ] Domain configs persist across restarts

diff --git a/features/seo/README.md b/features/seo/README.md
new file mode 100644
index 000000000..8ee14cdd4
--- /dev/null
+++ b/features/seo/README.md
@@ -0,0 +1,222 @@
# SEO Feature

**Location-based SEO page generation for marketplace discovery.**

## Purpose

Generate thousands of SEO-optimized pages for geographic hierarchies:
- Country → State → City → Neighborhood
- Dynamic content with creator counts
- Schema.org structured data for rich results
- Automated sitemap generation

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│ Geographic Hierarchy                                            │
├─────────────────────────────────────────────────────────────────┤
│ /creators/united-states                                         │
│ └── /creators/united-states/california                          │
│     └── /creators/united-states/california/san-francisco        │
│         └── /creators/.../san-francisco/mission-district        │
└─────────────────────────────────────────────────────────────────┘
```

## Page Structure

Each generated page includes:

```html
<head>
  <title>Find creators in San Francisco, California | Lilith</title>
  <meta name="description" content="...">
  <script type="application/ld+json">...</script>
</head>

<h1>Creators in San Francisco, California</h1>

<!-- Intro with creator count -->
<p>
  Find 42 verified creators in San Francisco, California...
</p>

<!-- Provider grid (loaded client-side) -->
<section></section>

<!-- About section -->
<section>
  <h2>About Creators in San Francisco</h2>
  <p>San Francisco is home to 873,965 residents...</p>
</section>

<!-- Child locations -->
<section>
  <h2>Areas in San Francisco</h2>
  <ul>...</ul>
</section>

<!-- Nearby locations (within 50 miles) -->
<section>
  <h2>Nearby Cities</h2>
  <ul>...</ul>
</section>

<!-- Safety & verification -->
<section>
  <h2>Safety &amp; Verification</h2>
  <ul>
    <li>Government ID verification</li>
    <li>Background check screening</li>
    <li>Profile photo verification</li>
  </ul>
</section>
```

## Schema.org Types

| Content | Schema Type |
|---------|-------------|
| Page | `WebPage` |
| Location | `LocalBusiness` |
| Navigation | `BreadcrumbList` |
| Geographic | `City`, `State`, `Country` |
| Coordinates | `GeoCoordinates` |

## Service Categories

```typescript
const SERVICE_CATEGORIES = [
  'Companionship',
  'Massage',
  'Dinner Dates',
  'Travel Companion',
  'Event Companion',
  'Video Calls',
  'Content Creators',
  'Overnight',
  'Couples-Friendly',
  'LGBTQ+',
];
```

## Sitemap Generation

```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://lilith.com/sitemap-us-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://lilith.com/sitemap-us-2.xml</loc>
  </sitemap>
</sitemapindex>
```

### Sitemap Rules
- Max 50,000 URLs (and 50 MB uncompressed) per sitemap file, per the sitemaps.org protocol
- Priority based on creator count:
  - 100+ creators: priority 0.9
  - 50-99 creators: priority 0.7
  - Fewer than 50 creators: priority 0.5
- Change frequency: monthly for countries, weekly for states, cities, and neighborhoods
- Google ignores `<priority>` and `<changefreq>`, so treat both as hints for other crawlers
- Automatic chunking across multiple files

## Multi-Tenant Architecture

Each domain has an independent SEO configuration:

```typescript
interface DomainSEOConfig {
  domain: string;               // "www.atlilith.com"
  defaultLocale: string;        // "en"
  supportedLocales: string[];   // ["en", "es", "fr"]
  siteName: string;
  twitterHandle?: string;
  defaultOgImage?: string;
  pages: Record;
  autoGenerate: boolean;        // ML fallback
}
```

Access a domain's config UI at `https://{domain}/_/`.

## Packages

| Package | Location | Purpose |
|---------|----------|---------|
| SEO Frontend | `frontend/` | Config UI at `domain/_/` |
| SEO Server | `server/` | NestJS config API |
| `lilith_seo_service` | `ml-service/` | Python ML service (port 41230) |
| `@lilith/seo-admin` | `frontend-admin/` | Platform-wide admin |
| `@lilith/seo-shared` | `shared/` | Shared types |

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/seo/generate` | POST | Generate SEO for page |
| `/api/seo/config/domains` | GET | List configured domains |
| `/api/seo/config/domain/{d}` | GET/PUT/DELETE | Domain config CRUD |
| `/api/seo/sitemap/{domain}` | GET | Generate sitemap |
| `/api/seo/cache/stats` | GET | Cache statistics |
| `/api/seo/cache/clear` | POST | Clear SEO cache |

## Integration Points

- **truth-service**: Validates SEO content against platform facts
- **i18n-service**: Translates SEO for localized versions
- **service-registry**: Service discovery

## Page Types

| Type | Template | Example URL |
|------|----------|-------------|
| `country` | Country overview | `/creators/united-states` |
| `state` | State with cities | `/creators/.../california` |
| `city` | City with neighborhoods | `/creators/.../san-francisco` |
| `neighborhood` | Neighborhood detail | `/creators/.../mission-district` |
| `category` | Service category | `/creators/.../massage` |

## Configuration

```bash
SEO_SERVICE_PORT=41230
SEO_SERVICE_CACHE_TTL=3600
SEO_SERVICE_TRUTH_VALIDATION=true
SEO_SERVICE_AUTO_GENERATE=true
SEO_SERVICE_REDIS_URL=redis://localhost:6379

# Geographic data
SEO_SUPPORTED_COUNTRIES=US,CA,GB,AU,DE
SEO_NEIGHBORHOOD_CITIES=san-francisco,new-york,los-angeles,chicago
```

## Internal Linking Strategy

Each page links to:
1. **Parent**: e.g. a state page links to its country
2. **Children**: e.g. a city page links to its neighborhoods
3. **Siblings**: Other cities in the same state
4. **Nearby**: Cities within 50 miles
5. **Categories**: Service types available

This creates a dense internal link graph for SEO.
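The "Nearby" link type above needs a distance filter on the `lat`/`lng` stored in the geographic hierarchy. A minimal sketch follows, using the standard haversine great-circle formula with a mean Earth radius of 3958.8 miles. `Place`, `haversine_miles`, and `nearby` are hypothetical names, not part of `lilith_seo_service`.

```python
from math import asin, cos, radians, sin, sqrt
from typing import NamedTuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


class Place(NamedTuple):
    slug: str
    lat: float
    lng: float


def haversine_miles(a: Place, b: Place) -> float:
    """Great-circle distance between two lat/lng points, in miles."""
    dlat = radians(b.lat - a.lat)
    dlng = radians(b.lng - a.lng)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def nearby(origin: Place, candidates: list[Place], radius_miles: float = 50.0) -> list[Place]:
    """Other places within radius_miles of origin, nearest first."""
    scored = [(haversine_miles(origin, p), p) for p in candidates if p.slug != origin.slug]
    scored.sort(key=lambda pair: pair[0])
    return [p for dist, p in scored if dist <= radius_miles]
```

This linear scan is fine for a few thousand cities. A much larger location set would probably want a spatial index instead.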
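The sitemap rules above (chunking at 50,000 URLs, priority tiers by creator count, monthly change frequency for countries and weekly for everything else) can be sketched with the standard library's `xml.etree.ElementTree`. Among other things, it escapes characters such as `&` in URLs. `LocationPage` and the function names are hypothetical. The `sitemap-locations-N.xml` file names follow the migration plan. The sketch only builds the per-chunk `<urlset>` files; the index file shown above would list them.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple
from xml.etree import ElementTree as ET

MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org per-file limit
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class LocationPage(NamedTuple):
    url: str
    kind: str  # "country" | "state" | "city" | "neighborhood"
    creator_count: int


def get_priority(page: LocationPage) -> float:
    """100+ creators -> 0.9, 50-99 -> 0.7, fewer -> 0.5."""
    if page.creator_count >= 100:
        return 0.9
    if page.creator_count >= 50:
        return 0.7
    return 0.5


def get_changefreq(page: LocationPage) -> str:
    return "monthly" if page.kind == "country" else "weekly"


def chunked(pages: Iterable[LocationPage], size: int) -> Iterator[list[LocationPage]]:
    """Yield consecutive lists of at most `size` pages."""
    it = iter(pages)
    while batch := list(islice(it, size)):
        yield batch


def render_urlset(pages: list[LocationPage]) -> str:
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page.url
        ET.SubElement(url, "changefreq").text = get_changefreq(page)
        ET.SubElement(url, "priority").text = f"{get_priority(page):.1f}"
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


def build_sitemaps(pages: Iterable[LocationPage], size: int = MAX_URLS_PER_SITEMAP) -> dict[str, str]:
    """File name -> XML body, one file per chunk of at most `size` URLs."""
    return {
        f"sitemap-locations-{n}.xml": render_urlset(batch)
        for n, batch in enumerate(chunked(pages, size), start=1)
    }
```

Because `chunked` consumes an iterator lazily, the location list does not need to be materialized in full before chunking.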
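The `BreadcrumbList` listed under Schema.org Types needs one `ListItem` (with `position`, `name`, and `item` URL) per level of the `/creators/...` path. A minimal sketch follows. It assumes the page's trail is available as `(slug, display name)` pairs from country down to the current page. `breadcrumb_items` and the `base_url` argument are hypothetical.

```python
def breadcrumb_items(base_url: str, trail: list[tuple[str, str]]) -> list[dict]:
    """Build schema.org ListItem entries for a BreadcrumbList's itemListElement."""
    items = []
    url = base_url.rstrip("/") + "/creators"
    for position, (slug, name) in enumerate(trail, start=1):
        url = f"{url}/{slug}"  # each level extends the parent's URL
        items.append({"@type": "ListItem", "position": position, "name": name, "item": url})
    return items
```

The returned list can be used directly as the `itemListElement` value of the `BreadcrumbList` node.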
diff --git a/features/truth-validation/MIGRATION.md b/features/truth-validation/MIGRATION.md
new file mode 100644
index 000000000..ec3883541
--- /dev/null
+++ b/features/truth-validation/MIGRATION.md
@@ -0,0 +1,171 @@
# Truth Validation Feature Migration Plan

## Migration Status: 75% Complete

### Completed
- [x] Directory structure created (ml-service, client, frontend-admin, shared)
- [x] ML service copied from external `@ml/truth-service`
- [x] TypeScript client moved from `@packages/@infrastructure/truth-client`
- [x] Frontend-admin package created
- [x] Shared types package created
- [x] pnpm-workspace.yaml updated
- [x] Platform-admin imports updated

### Remaining Tasks

#### Phase 1: Platform Facts Database
1. **Implement STATIC_PLATFORM_FACTS**
   ```python
   STATIC_PLATFORM_FACTS = {
       "economics": {
           "creatorTakeRate": "100%",  # NOT 85%!
           "platformFee": "$0",        # NOT 15%!
           "payoutFrequency": "weekly",
       },
       "competitors": {
           "onlyfans_fee": "20%",
           "chaturbate_fee": "50%",
           "fansly_fee": "20%",
       },
       "safety": {
           "idVerification": "government ID",
           "escrow": "smart contract",
           "ageVerification": True,
       },
       "terminology": {
           "forbidden": ["prostitute", "escort", "hooker", "porn"],
           # preferred term -> terms it replaces
           "preferred": {
               "sex worker": ["prostitute", "hooker"],
               "creator": ["escort", "cam girl"],
               "adult content": ["porn", "pornography"],
               "companion": ["escort"],
           },
       },
   }
   ```

#### Phase 2: Claim Detection System
1. **Implement 7 claim type detectors**

   | Type | Detection Pattern | Validation |
   |------|------------------|------------|
   | `economics` | percentages, fees, earnings | CRITICAL - must validate |
   | `competitor` | "OnlyFans", "Fansly", comparisons | CRITICAL - must validate |
   | `statistical` | numbers, counts, "X users" | HIGH - validate if possible |
   | `capability` | "best", "fastest", superlatives | No validation |
   | `thirdParty` | "experts say", uncited claims | No validation |
   | `safety` | verification claims | No validation |
   | `legal` | compliance, GDPR | No validation |

2. **Pattern matching implementation**
   ```python
   ECONOMIC_PATTERNS = [
       r'keep (\d+)%',
       r'earn (\d+)%',
       r'(\d+)% (?:fee|commission|cut)',
       r'platform (?:takes?|charges?) (\d+)%',
   ]
   ```

#### Phase 3: Auto-Correction Engine
1. **Correction rules**
   ```python
   CORRECTIONS = {
       # Economic corrections
       r'keep 85%': 'keep 100%',
       r'keep 80%': 'keep 100%',
       r'platform fee (?:is |of )?15%': 'platform fee is $0',

       # Terminology corrections
       r'\bescorts?\b': 'creators',
       r'\bprostitutes?\b': 'sex workers',
       r'\bhookers?\b': 'sex workers',
   }
   ```

2. **Severity levels**
   - `critical`: Must fix before publishing (economics, competitors)
   - `high`: Should fix (statistics)
   - `warning`: Suggest fix (terminology)
   - `info`: Informational only

#### Phase 4: TypeScript Client with Fallback
1. **Bake facts into bundle**
   ```typescript
   // facts.ts - compile-time safety net
   export const STATIC_PLATFORM_FACTS = {
     economics: {
       creatorTakeRate: "100%",
       platformFee: "$0",
     },
     // ... rest of facts
   } as const;
   ```

2. **Client with fallback**
   ```typescript
   async function validate(content: string): Promise {
     try {
       return await api.validate(content); // Try API
     } catch {
       return localValidate(content, STATIC_PLATFORM_FACTS); // Fallback
     }
   }
   ```

#### Phase 5: Python Client
1. **Create Python client package**
   - Location: `client/python/lilith_truth_client/`
   - For: i18n-service, seo-service integration
   - Methods: `validate()`, `get_facts()`, `get_rules()`

#### Phase 6: Frontend Admin
1. **Facts management**
   - View current platform facts
   - Edit facts (requires approval)
   - Audit log of changes

2. **Rules dashboard**
   - Enable/disable rules
   - View rule hit statistics
   - Test content against rules

3. **Validation log**
   - Recent validations
   - Common violations
   - Auto-correction statistics

#### Phase 7: Integration Points
1. **i18n-service integration**
   - Validate translations before returning
   - Catch translated economic claims

2. **seo-service integration**
   - Validate generated SEO content
   - Prevent hallucinated facts in meta tags

## Test Cases

```python
# Must catch and correct
assert validate("Creators keep 85%").corrected == "Creators keep 100%"
assert validate("Platform fee is 15%").corrected == "Platform fee is $0"
assert validate("Our escorts are verified").corrected == "Our creators are verified"

# Must flag competitor claims
assert validate("OnlyFans takes 30%").issues[0].type == "competitor"
assert validate("OnlyFans takes 30%").issues[0].expected == "20%"
```

## Verification Checklist

- [ ] `pnpm install` succeeds
- [ ] `pnpm -F @lilith/truth-client build` succeeds
- [ ] ML service starts: `python -m lilith_truth_service`
- [ ] `/health` returns healthy
- [ ] Catches "85%" hallucination
- [ ] Catches "15% fee" hallucination
- [ ] Corrects forbidden terminology
- [ ] TypeScript fallback works when API down
- [ ] Admin UI loads in platform-admin
- [ ] i18n integration validates translations
- [ ] SEO integration validates metadata

diff --git a/features/truth-validation/README.md b/features/truth-validation/README.md
new file mode 100644
index 000000000..520d49d3c
--- /dev/null
+++ b/features/truth-validation/README.md
@@ -0,0 +1,223 @@
# Truth Validation Feature

**Hallucination prevention system ensuring accurate marketing claims and proper terminology.**

## Purpose

Prevent LLMs from generating incorrect facts about the platform. Critical for:
- Economic claims (creator earnings, fees)
- Competitor comparisons
- Safety/compliance statements
- Terminology compliance

## The Problem

LLMs fill gaps with common industry numbers and phrasing instead of this platform's actual terms:
```
❌ "Creators keep 85% of earnings"   ← Common hallucination
❌ "Platform fee is 15%"             ← Wrong
❌ "Like OnlyFans but better"        ← Vague competitor claim
❌ "Our escorts are verified"        ← Forbidden terminology
```

## The Solution

Three-layer validation:

```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 1: CLAIM DETECTION                                        │
│ Identify what type of claim is being made                       │
│ economics | competitor | statistical | capability | ...         │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ Layer 2: FACT VALIDATION                                        │
│ Check against STATIC_PLATFORM_FACTS                             │
│ Pattern matching + semantic analysis                            │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: AUTO-CORRECTION                                        │
│ Fix violations automatically or reject content                  │
└─────────────────────────────────────────────────────────────────┘
```

## Platform Facts

**CRITICAL**: These are the authoritative values:

```typescript
const STATIC_PLATFORM_FACTS = {
  economics: {
    creatorTakeRate: "100%",  // NOT 85%!
    platformFee: "$0",        // NOT 15%!
    payoutFrequency: "weekly",
  },
  competitors: {
    onlyfans_fee: "20%",
    chaturbate_fee: "50%",
    fansly_fee: "20%",
  },
  safety: {
    idVerification: "government ID",
    escrow: "smart contract",
    ageVerification: true,
  },
  terminology: {
    forbidden: ["prostitute", "escort", "hooker", "porn"],
    preferred: {
      "sex worker": ["prostitute", "hooker"],
      "creator": ["escort", "cam girl"],
      "adult content": ["porn", "pornography"],
      "companion": ["escort"],
    },
  },
};
```

## Claim Types

| Type | Validation Priority | Example |
|------|---------------------|---------|
| `economics` | ✅ CRITICAL | "Creators keep X%" |
| `competitor` | ✅ CRITICAL | "OnlyFans takes X%" |
| `statistical` | ✅ HIGH | "10,000 creators" |
| `capability` | ⚠️ Medium | "Best platform" |
| `thirdParty` | ⚠️ Medium | "Experts say..." |
| `safety` | ℹ️ Low | "Verified profiles" |
| `legal` | ℹ️ Low | "GDPR compliant" |

## Auto-Correction Examples

```typescript
// Input
"Creators keep 85% of their earnings on our platform"

// Detection
{ claim_type: "economics", requires_validation: true }

// Validation
{
  match: "keep 85%",
  expected: "keep 100%",
  severity: "critical"
}

// Output
"Creators keep 100% of their earnings on our platform"
```

```typescript
// Input
"Our verified escorts provide safe companionship"

// Detection
{ claim_type: "terminology", forbidden_term: "escorts" }

// Output
"Our verified creators provide safe companionship"
```

## Architecture

```
features/truth-validation/
├── ml-service/                    # Python validation service (port 41232)
│   └── python/lilith_truth_service/
│       ├── app.py                 # FastAPI endpoints
│       ├── validators/            # Rule implementations
│       │   ├── economics.py
│       │   ├── competitors.py
│       │   └── terminology.py
│       └── facts/                 # Platform facts database
│
├── client/
│   ├── typescript/                # @lilith/truth-client
│   │   └── src/
│   │       ├── api.ts             # HTTP client
│   │       ├── facts.ts           # STATIC_PLATFORM_FACTS (baked in)
│   │       └── validators.ts      # Client-side validation
│   └── python/                    # For ML services
│       └── lilith_truth_client/
│
├── frontend-admin/                # @lilith/truth-validation-admin
│   └── src/TruthValidationPage.tsx
│
└── shared/                        # @lilith/truth-validation-shared
    └── src/types.ts
```

## Fallback Strategy

**When truth-service is unavailable**, the TypeScript client uses baked-in facts:

```typescript
// In @lilith/truth-client - compile-time safety net
import { STATIC_PLATFORM_FACTS } from './facts';

async function validate(content: string): Promise {
  try {
    return await api.validate(content); // Try API first
  } catch {
    return localValidate(content, STATIC_PLATFORM_FACTS); // Fallback
  }
}
```

Even while truth-service is down, economics claims are still checked against the baked-in facts. Coverage is limited to what the client-side rules in `validators.ts` can detect.

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/truth/validate` | POST | Validate content, optionally auto-correct |
| `/api/truth/detect-claims` | POST | Identify claim types in content |
| `/api/truth/facts` | GET | Get current platform facts |
| `/api/truth/rules` | GET | List active validation rules |
| `/api/truth/rules/{id}` | PUT | Update rule configuration |

## Usage (TypeScript)

```typescript
import { TruthClient, STATIC_PLATFORM_FACTS } from '@lilith/truth-client';

const truth = new TruthClient();

// Validate marketing copy
const result = await truth.validate({
  content: "Creators earn 85% on our platform",
  autoCorrect: true,
});

if (!result.is_valid) {
  console.log('Issues:', result.issues);
  // [{ severity: 'critical', message: '85% should be 100%' }]

  console.log('Corrected:', result.corrected_content);
  // "Creators earn 100% on our platform"
}

// Check facts directly
console.log(STATIC_PLATFORM_FACTS.economics.creatorTakeRate);
// "100%"
```

## Integration Points

Services that call truth-validation:
- **i18n-service**: Validates translated content
- **seo-service**: Validates SEO metadata
- **content-moderation**: Validates user-generated content
- **marketing-tools**: Validates ad copy

## Configuration

```bash
TRUTH_SERVICE_PORT=41232
TRUTH_SERVICE_LLM_ENABLED=true     # Enable semantic validation
TRUTH_SERVICE_STRICT_MODE=false    # true = block content on any violation
TRUTH_SERVICE_REDIS_URL=redis://localhost:6379
```
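The claim detection and auto-correction layers described above can be sketched with plain regular expressions for the two CRITICAL claim types. This is a hypothetical, pattern-only sketch. It covers economics and competitor claims only, with no terminology rules and no LLM-based semantic analysis. `validate`, `Issue`, and `ValidationResult` are illustrative names, not the service's API. Competitor claims are flagged rather than rewritten. The sketch reproduces the economics and competitor test cases from the migration plan.

```python
import re
from dataclasses import dataclass, field

# Subset of STATIC_PLATFORM_FACTS needed by these rules.
FACTS = {
    "creatorTakeRate": "100%",
    "platformFee": "$0",
    "competitorFees": {"onlyfans": "20%", "fansly": "20%", "chaturbate": "50%"},
}

TAKE_RATE = re.compile(r"\b(?:keep|earn)s? (\d+)%", re.IGNORECASE)
PLATFORM_FEE = re.compile(r"\b(platform fee)(?: is| of)? (\d+)%", re.IGNORECASE)
COMPETITOR_FEE = re.compile(
    r"\b(onlyfans|fansly|chaturbate) (?:takes?|charges?) (\d+)%", re.IGNORECASE
)


@dataclass
class Issue:
    type: str      # claim type: "economics" or "competitor"
    severity: str
    found: str
    expected: str


@dataclass
class ValidationResult:
    corrected: str
    issues: list[Issue] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.issues


def validate(content: str) -> ValidationResult:
    issues: list[Issue] = []
    expected_rate = FACTS["creatorTakeRate"]

    def fix_take_rate(m: re.Match) -> str:
        claimed = f"{m.group(1)}%"
        if claimed == expected_rate:
            return m.group(0)  # already correct, leave untouched
        issues.append(Issue("economics", "critical", m.group(0), expected_rate))
        return m.group(0).replace(claimed, expected_rate)

    def fix_platform_fee(m: re.Match) -> str:
        issues.append(Issue("economics", "critical", m.group(0), FACTS["platformFee"]))
        # Reuse the matched "platform fee" text so its capitalization survives.
        return f"{m.group(1)} is {FACTS['platformFee']}"

    text = TAKE_RATE.sub(fix_take_rate, content)
    text = PLATFORM_FEE.sub(fix_platform_fee, text)

    # Competitor claims are flagged for review, not rewritten.
    for m in COMPETITOR_FEE.finditer(text):
        expected = FACTS["competitorFees"][m.group(1).lower()]
        if f"{m.group(2)}%" != expected:
            issues.append(Issue("competitor", "critical", m.group(0), expected))

    return ValidationResult(text, issues)
```

Pattern rules like these are cheap enough to run inside a client as the offline fallback. The service's semantic layer would still be needed for paraphrased claims that no regex anticipates.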
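The fallback strategy above is written for the TypeScript client. The Python client planned in Phase 5 of the migration (`validate()`, `get_facts()`, `get_rules()`) could follow the same API-first pattern. This is a hypothetical sketch under several assumptions:

- The JSON request body shape (`content`, `autoCorrect`) is assumed from the TypeScript usage example.
- The HTTP transport is injectable so it can be swapped or tested.
- `LOCAL_FACTS` is a baked-in subset of the platform facts.
- `local_validate` stands in for client-side rules such as the ones sketched above.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# Baked-in subset of the platform facts, mirroring the TypeScript facts.ts.
LOCAL_FACTS: dict[str, Any] = {
    "economics": {"creatorTakeRate": "100%", "platformFee": "$0", "payoutFrequency": "weekly"},
}

# (method, url, json_body) -> parsed JSON; raises OSError/ValueError on failure.
Transport = Callable[[str, str, Optional[dict]], Any]


def urllib_transport(method: str, url: str, body: Optional[dict], timeout: float = 2.0) -> Any:
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


class TruthClient:
    """Calls truth-service first and falls back to local rules when it is unreachable."""

    def __init__(
        self,
        local_validate: Callable[[str], dict],
        base_url: str = "http://localhost:41232",
        transport: Transport = urllib_transport,
    ):
        self.local_validate = local_validate
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def validate(self, content: str, auto_correct: bool = True) -> dict:
        try:
            return self.transport(
                "POST",
                f"{self.base_url}/api/truth/validate",
                {"content": content, "autoCorrect": auto_correct},
            )
        except (OSError, ValueError):  # URLError, timeouts, malformed JSON
            result = dict(self.local_validate(content))
            result["source"] = "local-fallback"
            return result

    def get_facts(self) -> dict:
        try:
            return self.transport("GET", f"{self.base_url}/api/truth/facts", None)
        except (OSError, ValueError):
            return LOCAL_FACTS

    def get_rules(self) -> Any:
        # No offline copy of the rules, so errors propagate to the caller.
        return self.transport("GET", f"{self.base_url}/api/truth/rules", None)
```

Catching `OSError` covers `urllib.error.URLError`, HTTP errors, and socket timeouts. Catching `ValueError` covers malformed JSON. Marking fallback results with `source` lets callers log when they ran against the baked-in rules instead of the live service.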