docs(features): add migration documentation for i18n, seo, and truth-validation

Add README.md and MIGRATION.md for three feature packages being
migrated to the new features/ architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Quinn Ftw 2025-12-29 05:11:24 -08:00
parent 8e74974d20
commit 58dd7b6004
6 changed files with 1131 additions and 0 deletions

features/i18n/MIGRATION.md

@@ -0,0 +1,126 @@
# i18n Feature Migration Plan
## Migration Status: 80% Complete
### Completed
- [x] Directory structure created (react, ml-service, frontend-admin, shared, locales)
- [x] React package moved from `@packages/@infrastructure/i18n`
- [x] ML service copied from external `@ml/i18n-service`
- [x] Frontend-admin package created
- [x] Shared types package created
- [x] Locales moved to feature directory
- [x] pnpm-workspace.yaml updated
- [x] Platform-admin imports updated
### Remaining Tasks
#### Phase 1: React Package - Two-Layer Architecture
1. **Port ml-backend.ts from egirl-platform**
- localStorage cache (24h TTL) → static API → ML fallback
- `{{variable}}` placeholder preservation during translation
- Source tracking: static vs ML-generated
- Fire-and-forget persist to server
2. **Port i18next integration**
- `makeI18n` factory with ML backend option
- `I18nProvider` with locale detection
- `useT` hook with namespace support
- Language detection: URL → localStorage → browser
#### Phase 2: ML Service - Multi-Provider Routing
1. **Implement 6 translation providers**
```python
PROVIDERS = {
    'claude': ClaudeProvider(),        # WMT24 winner, general quality
    'deepl': DeepLProvider(),          # European, glossary support
    'aya': AyaProvider(),              # Self-hosted 8B model
    'towerinstruct': TowerProvider(),  # European specialist
    'nllb': NLLBProvider(),            # Meta's 200-language
    'madlad400': MADLADProvider(),     # 400+ languages
}
```
2. **Language-pair routing configuration**
```python
# Every name listed here must be a key in PROVIDERS (note 'madlad400', not 'madlad').
PROVIDER_ROUTING = {
    'es': ['claude', 'deepl', 'nllb'],
    'de': ['deepl', 'towerinstruct', 'claude'],
    'ja': ['claude', 'nllb', 'madlad400'],
    'sw': ['nllb', 'madlad400'],
}
```
3. **Automatic fallback chain**
- Primary fails → try next in chain
- Track which provider succeeded
- Log failures for monitoring
4. **Batch translation with JSON flattening**
```python
# Input: nested namespace
{"nav": {"home": "Home", "about": "About"}}
# Flatten for LLM
{"nav.home": "Home", "nav.about": "About"}
# Translate all keys in single request
# Unflatten result back to nested
```
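The flatten/unflatten round trip described in step 4 could look roughly like the sketch below. The function names `flatten` and `unflatten` are illustrative, not taken from the service, and the sketch assumes translation keys never contain a literal `.`.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Collapse a nested namespace into dotted keys, e.g. nav.home."""
    flat: dict[str, str] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict[str, str]) -> dict[str, Any]:
    """Rebuild the nested namespace from dotted keys."""
    tree: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree
```

Because the flat form is a plain key/value map, the LLM only has to return the same keys with translated values, which makes a missing or renamed key easy to detect before unflattening.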
#### Phase 3: Caching Layer
1. **Redis cache implementation**
- Key format: `i18n:{locale}:{namespace}:{key}`
- TTL: 7 days
- Track source provider in metadata
2. **Cache invalidation**
- On glossary update → clear affected translations
- On config change → clear domain translations
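A minimal sketch of the key format, TTL, provider metadata, and pattern-based invalidation listed above. `InMemoryCache` is a hypothetical stand-in so the example runs without a Redis server; the helper names are assumptions, and a real implementation would issue the equivalent Redis commands instead.

```python
import fnmatch
import time

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, per the plan above


def cache_key(locale: str, namespace: str, key: str) -> str:
    """Build a key in the i18n:{locale}:{namespace}:{key} format."""
    return f"i18n:{locale}:{namespace}:{key}"


def invalidation_pattern(locale: str = "*", namespace: str = "*") -> str:
    """Glob pattern matching every cached key for a locale/namespace."""
    return f"i18n:{locale}:{namespace}:*"


class InMemoryCache:
    """Hypothetical stand-in for Redis: value, source provider, expiry."""

    def __init__(self, now=time.time):
        self._now = now
        self._store = {}

    def set(self, key, value, provider, ttl=CACHE_TTL_SECONDS):
        self._store[key] = (value, provider, self._now() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[2] <= self._now():
            self._store.pop(key, None)  # drop expired entries lazily
            return None
        return {"value": entry[0], "provider": entry[1]}

    def delete_matching(self, pattern):
        """Remove every key matching the glob pattern; return the count."""
        doomed = [k for k in self._store if fnmatch.fnmatchcase(k, pattern)]
        for k in doomed:
            del self._store[k]
        return len(doomed)
```

A glossary update for Spanish could then clear `invalidation_pattern("es")`, while a narrower change clears a single namespace.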
#### Phase 4: Frontend Admin
1. **Translation management**
- View translations by locale/namespace
- Edit with live preview
- Bulk import/export CSV
2. **Provider dashboard**
- Provider health status
- Usage statistics per provider
- Cost tracking (API calls)
3. **Glossary management**
- Domain-specific terms
- Preferred translations
#### Phase 5: Integration
1. **Truth service validation**
- POST to truth-service before returning
- Auto-correct terminology violations
- Flag economic claim errors
2. **Static file generation**
- Auto-persist ML translations to `/api/translations/{locale}/{namespace}`
- Next user gets static (no LLM cost)
## Integration Dependencies
```
i18n-service
├── depends on: llama-service (LLM inference)
├── depends on: truth-service (content validation)
└── registers with: service-registry
```
## Verification Checklist
- [ ] `pnpm install` succeeds
- [ ] `pnpm -F @lilith/i18n build` succeeds
- [ ] ML service starts: `python -m lilith_i18n_service`
- [ ] `/health` returns healthy with provider status
- [ ] `/api/i18n/translate` returns translation
- [ ] `/api/i18n/translate/batch` handles namespace
- [ ] Fallback chain works (disable primary, verify secondary)
- [ ] React hook caches in localStorage
- [ ] ML translations auto-persist to static
- [ ] Admin UI loads in platform-admin
- [ ] Truth validation catches "85%" error

features/i18n/README.md

@@ -0,0 +1,179 @@
# i18n Feature
**Multi-provider translation system with intelligent fallback and hallucination prevention.**
## Purpose
Translate UI content across 30+ languages using a two-layer architecture:
1. **Frontend**: Smart caching with localStorage → static → ML fallback
2. **Backend**: Multi-provider routing with automatic failover
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ User loads page in Spanish │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ 1. Check localStorage (24h TTL) │
│ Found? → Return immediately │
└─────────────────────────────────────────────────────────────────┘
│ Miss
┌─────────────────────────────────────────────────────────────────┐
│ 2. Fetch static translations: GET /api/translations/es/common │
│ Found? → Cache in localStorage, return │
└─────────────────────────────────────────────────────────────────┘
│ Miss
┌─────────────────────────────────────────────────────────────────┐
│ 3. ML Translation: POST /api/i18n/translate/batch │
│ ├─ Route to best provider (Claude for ES) │
│ ├─ Translate all keys in single request │
│ ├─ Fire-and-forget: save to server for future static │
│ └─ Cache in localStorage, return │
└─────────────────────────────────────────────────────────────────┘
```
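The three-step lookup in the diagram can be sketched language-agnostically as follows. It is written in Python for brevity even though the real frontend layer is TypeScript; all function and parameter names are hypothetical, the 24h TTL check is omitted, and the persist call is a synchronous best-effort stand-in for the fire-and-forget request.

```python
def load_namespace(locale, namespace, local_cache, fetch_static, translate_ml, persist):
    """Resolve a namespace via cache -> static -> ML, reporting the source."""
    cache_key = f"{locale}:{namespace}"

    # 1. Local cache hit: return immediately.
    cached = local_cache.get(cache_key)
    if cached is not None:
        return cached, "cache"

    # 2. Static translations: cache them and return.
    static = fetch_static(locale, namespace)
    if static is not None:
        local_cache[cache_key] = static
        return static, "static"

    # 3. ML translation: persist for future static use, cache, return.
    generated = translate_ml(locale, namespace)
    try:
        persist(locale, namespace, generated)
    except Exception:
        pass  # persisting is best-effort and must not block the user
    local_cache[cache_key] = generated
    return generated, "ml"
```

Returning the source alongside the translations keeps the "static vs ML-generated" distinction available to callers.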
## Translation Providers
| Provider | Strengths | Best For |
|----------|-----------|----------|
| **Claude** | WMT24 winner, 78% "good" ratings | General, high quality |
| **DeepL** | Fewest edits needed, glossary support | European languages |
| **Aya** | 8B model, self-hosted, no API costs | Budget-conscious |
| **TowerInstruct** | European language specialist | DE, FR, IT, ES |
| **NLLB** | Meta's 200-language model | Rare languages |
| **MADLAD400** | 400+ languages | Maximum coverage |
### Language-Pair Routing
```typescript
// Provider selection by target language
const PROVIDER_ROUTING = {
  es: ['claude', 'deepl', 'nllb'],            // Spanish: Claude first
  de: ['deepl', 'towerinstruct', 'claude'],   // German: DeepL first
  ja: ['claude', 'nllb', 'madlad400'],        // Japanese: Claude first
  sw: ['nllb', 'madlad400'],                  // Swahili: NLLB first
};
```
### Automatic Fallback Chain
If the primary provider fails, the service automatically tries the next provider in the chain, for example:
```
Claude → DeepL → TowerInstruct → NLLB → MADLAD400
```
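A minimal sketch of walking a fallback chain, assuming each provider is a callable that either returns a translation or raises. `translate_with_fallback`, `AllProvidersFailed`, and the logger name are illustrative, not part of the service's API.

```python
import logging

logger = logging.getLogger("i18n.fallback")  # hypothetical logger name


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def translate_with_fallback(text, target, chain, providers):
    """Try each provider in order; return (translation, provider_name)."""
    errors = []
    for name in chain:
        try:
            return providers[name](text, target), name
        except Exception as exc:  # any provider failure triggers fallback
            logger.warning("provider %s failed for %s: %s", name, target, exc)
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name with the result makes it possible to record which provider actually produced a translation.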
## Packages
| Package | Location | Purpose |
|---------|----------|---------|
| `@lilith/i18n` | `react/` | React hooks, i18next integration |
| `lilith_i18n_service` | `ml-service/` | Python ML service (port 41231) |
| `@lilith/i18n-admin` | `frontend-admin/` | Admin UI |
| `@lilith/i18n-shared` | `shared/` | Shared types |
## Key Features
### Batch Translation
Translates entire namespace (40+ keys) in single LLM request:
```typescript
// Input: nested object
{ "welcome": "Welcome", "nav": { "home": "Home", "about": "About" } }
// Flattened for LLM
{ "welcome": "Welcome", "nav.home": "Home", "nav.about": "About" }
// LLM translates all at once, then unflattened
```
### Placeholder Preservation
Maintains i18next variables during translation:
```
"Hello {{name}}, you have {{count}} messages"
→ "Hola {{name}}, tienes {{count}} mensajes"
```
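One way to protect placeholders is to mask them with opaque tokens before translation and restore them afterwards. The sketch below assumes that approach; the token format `__PH0__` and the helper names are made up for illustration and are not confirmed to be what `ml-backend.ts` does.

```python
import re

# Matches i18next-style interpolation such as {{name}} or {{ user.count }}.
PLACEHOLDER = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


def mask_placeholders(text):
    """Swap each placeholder for an opaque token the translator should leave alone."""
    found = []

    def _swap(match):
        found.append(match.group(0))
        return f"__PH{len(found) - 1}__"

    return PLACEHOLDER.sub(_swap, text), found


def restore_placeholders(text, found):
    """Put the original placeholders back in place of the tokens."""
    for index, original in enumerate(found):
        text = text.replace(f"__PH{index}__", original)
    return text


def placeholders_preserved(source, translated):
    """Check that both strings contain the same placeholders."""
    return sorted(PLACEHOLDER.findall(source)) == sorted(PLACEHOLDER.findall(translated))
```

`placeholders_preserved` works as a final guard: a translation that renamed or dropped a variable can be rejected and retried with the next provider.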
### Auto-Persist to Static
ML translations automatically saved to server:
```typescript
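// After ML translation succeeds, persist without awaiting the result:
fetch('/api/translations/es/common', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(translations),
}).catch(() => {}); // Fire-and-forget: ignore persist failures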
```
Next user gets static version (faster, no LLM cost).
### Truth Validation Integration
All translations validated against platform facts:
```typescript
const translation = await translate("Creators keep 85%", "es");
// Truth service catches: "85%" is wrong
// Auto-corrects to: "Los creadores se quedan con el 100%"
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/i18n/translate` | POST | Translate single key |
| `/api/i18n/translate/batch` | POST | Translate namespace (40+ keys) |
| `/api/i18n/locales` | GET | List 30+ supported locales |
| `/api/i18n/glossary` | GET/PUT | Domain glossary (preferred terms) |
| `/api/i18n/persist` | POST | Save ML translations to static |
| `/api/i18n/missing` | GET | Find missing translations |
| `/api/i18n/validate` | POST | Validate against truth service |
## Usage
```tsx
import { makeI18n } from '@lilith/i18n';

const { I18nProvider, useT } = makeI18n({
  defaultLocale: 'en',
  supportedLocales: ['en', 'es', 'fr', 'de', 'ja', 'ko', 'zh'],
  mlBackend: true,        // Enable ML fallback
  truthValidation: true,  // Validate content
});

function App() {
  return (
    <I18nProvider>
      <Welcome />
    </I18nProvider>
  );
}

function Welcome() {
  const t = useT();
  return <h1>{t('common.welcome')}</h1>;
}
```
## Configuration
```bash
# ML Service
I18N_SERVICE_PORT=41231
I18N_SERVICE_DEFAULT_LOCALE=en
I18N_SERVICE_REDIS_URL=redis://localhost:6379
I18N_SERVICE_GLOSSARY_ENABLED=true
I18N_SERVICE_PERSIST_TRANSLATIONS=true
I18N_SERVICE_TRUTH_SERVICE_URL=http://localhost:41232
# Provider API Keys
CLAUDE_API_KEY=sk-...
DEEPL_API_KEY=...
```
## Caching Strategy
| Layer | TTL | Purpose |
|-------|-----|---------|
| localStorage | 24h | Instant UI, offline support |
| Redis | 7d | Cross-user, provider tracking |
| Static files | ∞ | Human-reviewed translations |

features/seo/MIGRATION.md

@@ -0,0 +1,210 @@
# SEO Feature Migration Plan
## Migration Status: 85% Complete
### Completed
- [x] Directory structure exists (frontend, server, shared)
- [x] ML service copied from external `@ml/seo-service`
- [x] Frontend-admin package created
- [x] Shared types already existed
- [x] pnpm-workspace.yaml already covered
- [x] Platform-admin imports updated
### Remaining Tasks
#### Phase 1: Geographic Hierarchy System
1. **Location data structure**
```python
GEOGRAPHIC_HIERARCHY = {
    "united-states": {
        "name": "United States",
        "type": "country",
        "children": {
            "california": {
                "name": "California",
                "type": "state",
                "children": {
                    "san-francisco": {
                        "name": "San Francisco",
                        "type": "city",
                        "lat": 37.77,
                        "lng": -122.41,
                        "population": 873965,
                        "children": {
                            "mission-district": {...},
                            "financial-district": {...},
                        }
                    }
                }
            }
        }
    }
}
```
2. **URL structure**
```
/creators/united-states
/creators/united-states/california
/creators/united-states/california/san-francisco
/creators/united-states/california/san-francisco/mission-district
```
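Given a hierarchy shaped like `GEOGRAPHIC_HIERARCHY`, the URL list above can be derived with a short recursive walk. This is a sketch: `iter_location_urls` is a hypothetical helper, and it assumes each node's dictionary key is already the URL slug.

```python
def iter_location_urls(hierarchy, base="/creators"):
    """Yield one URL per location, parents before their children."""
    for slug, node in hierarchy.items():
        url = f"{base}/{slug}"
        yield url
        # Leaf nodes may omit "children" entirely.
        yield from iter_location_urls(node.get("children", {}), url)
```

The same walk could feed both the page generator and the sitemap generator so that the two never disagree about which pages exist.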
#### Phase 2: Page Generator
1. **Template per page type**
```python
PAGE_TEMPLATES = {
"country": {
"title": "Find creators in {name} | Lilith",
"h1": "Creators in {name}",
"description": "Discover {creator_count} verified creators across {name}.",
},
"state": {
"title": "Find creators in {name}, {parent} | Lilith",
"h1": "Creators in {name}, {parent}",
"description": "Find {creator_count} verified creators in {name}.",
},
"city": {
"title": "Find creators in {name}, {state} | Lilith",
"h1": "Creators in {name}, {state}",
"description": "Find {creator_count} verified creators in {name}. Government ID verified, secure payments.",
},
"neighborhood": {
"title": "{name} Creators in {city}, {state} | Lilith",
"h1": "Creators in {name}, {city}",
"description": "Find creators in {name}, {city}.",
},
}
```
2. **Dynamic content sections**
- Intro with creator count
- Provider grid (client-side loaded)
- About section with population/area info
- Children links (neighborhoods/cities)
- Nearby locations (within 50 miles)
- Safety & verification section
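Rendering a template from `PAGE_TEMPLATES` can be as simple as `str.format` over each field. The sketch below copies only the `city` template from the plan; `render_page_meta` is a hypothetical name, and a missing context value deliberately raises `KeyError` rather than emitting a page with a hole in its title.

```python
# Only the "city" entry is reproduced here; the other page types follow the same shape.
PAGE_TEMPLATES = {
    "city": {
        "title": "Find creators in {name}, {state} | Lilith",
        "h1": "Creators in {name}, {state}",
        "description": (
            "Find {creator_count} verified creators in {name}. "
            "Government ID verified, secure payments."
        ),
    },
}


def render_page_meta(page_type, **context):
    """Fill every field of the page type's template from the context."""
    template = PAGE_TEMPLATES[page_type]
    return {field: text.format(**context) for field, text in template.items()}
```

For example, `render_page_meta("city", name="San Francisco", state="California", creator_count=42)` produces the title, heading, and description for the San Francisco page.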
#### Phase 3: Schema.org Markup
1. **Implement structured data**
```python
def generate_schema(location):
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "WebPage",
                "name": location.title,
                "url": location.url,
            },
            {
                "@type": "LocalBusiness",
                "name": f"Lilith - {location.name}",
                "areaServed": {
                    "@type": location.schema_type,  # City, State, Country
                    "name": location.name,
                    "geo": {
                        "@type": "GeoCoordinates",
                        "latitude": location.lat,
                        "longitude": location.lng,
                    }
                },
                "numberOfEmployees": location.creator_count,
            },
            {
                "@type": "BreadcrumbList",
                "itemListElement": location.breadcrumbs,
            }
        ]
    }
```
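The plan does not show how `location.breadcrumbs` is built. One plausible shape, sketched below, is a list of schema.org `ListItem` entries with 1-based positions; `build_breadcrumbs` and the `base_url` argument are assumptions made for illustration.

```python
def build_breadcrumbs(trail, base_url):
    """Build BreadcrumbList items from (name, path) pairs, outermost first."""
    return [
        {
            "@type": "ListItem",
            "position": position,  # schema.org positions start at 1
            "name": name,
            "item": f"{base_url}{path}",
        }
        for position, (name, path) in enumerate(trail, start=1)
    ]
```

The trail for a city page would run country, state, city, which mirrors the URL structure from Phase 1.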
#### Phase 4: Sitemap Generator
1. **Sitemap index with chunking**
```python
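MAX_URLS_PER_SITEMAP = 50000  # Sitemap protocol limit per file


def generate_sitemap_index():
    all_locations = get_all_locations()
    chunks = chunk(all_locations, MAX_URLS_PER_SITEMAP)
    sitemaps = []
    # The loop variable must not be named `chunk`: that would shadow the
    # chunk() helper and raise UnboundLocalError on the call above.
    for i, _locations in enumerate(chunks):
        sitemaps.append(f"sitemap-locations-{i + 1}.xml")
    return render_sitemap_index(sitemaps)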
```
2. **Priority scoring**
```python
def get_priority(location):
    if location.creator_count >= 100:
        return 0.9
    elif location.creator_count >= 50:
        return 0.7
    else:
        return 0.5
```
3. **Change frequency**
- Countries: monthly
- States: weekly
- Cities: weekly
- Neighborhoods: weekly
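The sitemap index code above relies on a `chunk` helper and an index renderer that the plan does not define. The sketch below shows one way they might look using only the standard library; `render_index_xml` and its `base_url` parameter are hypothetical, while the namespace URI is the one defined by the sitemap protocol.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_index_xml(base_url, filenames):
    """Render a sitemap index that points at each chunked sitemap file."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for filename in filenames:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = f"{base_url}/{filename}"
    return ET.tostring(root, encoding="unicode")
```

Building the XML through `ElementTree` rather than string concatenation means location names containing `&` or `<` are escaped automatically.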
#### Phase 5: Internal Linking
1. **Link types per page**
- Parent: Link to containing region
- Children: Link to sub-regions
- Siblings: Other regions at same level
- Nearby: Locations within 50 miles (calculated by lat/lng)
- Categories: Service types available in location
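The "within 50 miles" rule needs a distance between two lat/lng pairs. A common choice is the haversine great-circle formula, sketched below; the plan does not say which formula is used, so treat this as one reasonable option. `nearby` and its input shape are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby(origin, candidates, radius_miles=50.0):
    """Return (slug, distance) pairs within the radius, closest first."""
    lat, lng = origin
    hits = []
    for slug, (c_lat, c_lng) in candidates.items():
        distance = haversine_miles(lat, lng, c_lat, c_lng)
        if 0 < distance <= radius_miles:  # distance 0 is the origin itself
            hits.append((slug, round(distance, 1)))
    return sorted(hits, key=lambda hit: hit[1])
```

Comparing every pair of locations is quadratic, so for a large hierarchy the candidates would likely be pre-filtered by state or by a coarse bounding box first.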
#### Phase 6: Truth Service Integration
1. **Validate generated content**
- Check creator count claims
- Verify no forbidden terminology
- Validate competitor mentions
#### Phase 7: Service Categories
```python
SERVICE_CATEGORIES = [
'Companionship',
'Massage',
'Dinner Dates',
'Travel Companion',
'Event Companion',
'Video Calls',
'Content Creators',
'Overnight',
'Couples-Friendly',
'LGBTQ+',
]
```
Category pages: `/creators/united-states/california/san-francisco/massage`
## Multi-Tenant Routing
```
www.atlilith.com/_/ → SEO config UI for atlilith.com
creator.atlilith.com/_/ → SEO config UI for creator subdomain
custom-domain.com/_/ → SEO config UI for custom domain
```
## Verification Checklist
- [ ] `pnpm install` succeeds
- [ ] ML service starts: `python -m lilith_seo_service`
- [ ] `/health` returns healthy
- [ ] `/api/seo/generate` returns valid page for country
- [ ] `/api/seo/generate` returns valid page for state
- [ ] `/api/seo/generate` returns valid page for city
- [ ] `/api/seo/generate` returns valid page for neighborhood
- [ ] Schema.org validates in Google Rich Results Test
- [ ] Sitemap generates with correct chunking
- [ ] Internal links point to valid pages
- [ ] Truth validation catches wrong terminology
- [ ] SEO frontend loads at `domain/_/`
- [ ] Platform-admin SEOPage loads
- [ ] Domain configs persist across restarts

features/seo/README.md

@@ -0,0 +1,222 @@
# SEO Feature
**Location-based SEO page generation for marketplace discovery.**
## Purpose
Generate thousands of SEO-optimized pages for geographic hierarchies:
- Country → State → City → Neighborhood
- Dynamic content with creator counts
- Schema.org structured data for rich results
- Automated sitemap generation
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Geographic Hierarchy │
├─────────────────────────────────────────────────────────────────┤
│ /creators/united-states │
│ └── /creators/united-states/california │
│ └── /creators/united-states/california/san-francisco │
│ └── /creators/.../san-francisco/mission-district │
└─────────────────────────────────────────────────────────────────┘
```
## Page Structure
Each generated page includes:
```html
<!-- 1. SEO Meta Tags -->
<title>Find creators in San Francisco, California | Lilith</title>
<meta name="description" content="Find 42 verified creators in San Francisco.
Government ID verified, secure payments, instant messaging.">
<!-- 2. Heading Structure -->
<h1>Creators in San Francisco, California</h1>
<!-- 3. Content Sections -->
<section class="intro">
Find 42 verified creators in San Francisco, California...
</section>
<section class="providers" data-location-id="sf-123">
<!-- Creator grid loaded client-side -->
</section>
<section class="about">
<h2>About Creators in San Francisco</h2>
<p>San Francisco is home to 873,965 residents...</p>
</section>
<section class="children">
<h2>Areas in San Francisco</h2>
<ul>
<li><a href="/creators/.../mission-district">Mission District</a></li>
<li><a href="/creators/.../financial-district">Financial District</a></li>
</ul>
</section>
<section class="nearby">
<h2>Nearby Cities</h2>
<ul>
<li><a href="/creators/.../oakland">Oakland</a> (8 miles)</li>
</ul>
</section>
<section class="safety">
<h2>Safety & Verification</h2>
<ul>
<li>Government ID verification</li>
<li>Background check screening</li>
<li>Profile photo verification</li>
</ul>
</section>
<!-- 4. Schema.org Structured Data -->
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "LocalBusiness",
"name": "Lilith - San Francisco",
"areaServed": {
"@type": "City",
"name": "San Francisco",
"geo": { "@type": "GeoCoordinates", "latitude": 37.77, "longitude": -122.41 }
},
"numberOfEmployees": 42
}
</script>
```
## Schema.org Types
| Content | Schema Type |
|---------|-------------|
| Page | `WebPage` |
| Location | `LocalBusiness` |
| Navigation | `BreadcrumbList` |
| Geographic | `City`, `State`, `Country` |
| Coordinates | `GeoCoordinates` |
## Service Categories
```typescript
const SERVICE_CATEGORIES = [
'Companionship',
'Massage',
'Dinner Dates',
'Travel Companion',
'Event Companion',
'Video Calls',
'Content Creators',
'Overnight',
'Couples-Friendly',
'LGBTQ+',
];
```
## Sitemap Generation
```xml
<!-- sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://lilith.com/sitemap-us-1.xml</loc>
</sitemap>
<sitemap>
<loc>https://lilith.com/sitemap-us-2.xml</loc>
</sitemap>
</sitemapindex>
```
### Sitemap Rules
- Max 50,000 URLs per sitemap file (sitemap protocol limit)
- Priority based on creator count:
  - 100+ creators: priority 0.9
  - 50-99 creators: priority 0.7
  - <50 creators: priority 0.5
- Change frequency: weekly (monthly for country pages)
- Automatic chunking across multiple files
## Multi-Tenant Architecture
Each domain has independent SEO configuration:
```typescript
interface DomainSEOConfig {
domain: string; // "www.atlilith.com"
defaultLocale: string; // "en"
supportedLocales: string[]; // ["en", "es", "fr"]
siteName: string;
twitterHandle?: string;
defaultOgImage?: string;
pages: Record<string, PageSEOConfig>;
autoGenerate: boolean; // ML fallback
}
```
Access domain config UI at: `https://{domain}/_/`
## Packages
| Package | Location | Purpose |
|---------|----------|---------|
| SEO Frontend | `frontend/` | Config UI at `domain/_/` |
| SEO Server | `server/` | NestJS config API |
| `lilith_seo_service` | `ml-service/` | Python ML service (port 41230) |
| `@lilith/seo-admin` | `frontend-admin/` | Platform-wide admin |
| `@lilith/seo-shared` | `shared/` | Shared types |
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/seo/generate` | POST | Generate SEO for page |
| `/api/seo/config/domains` | GET | List configured domains |
| `/api/seo/config/domain/{d}` | GET/PUT/DELETE | Domain config CRUD |
| `/api/seo/sitemap/{domain}` | GET | Generate sitemap |
| `/api/seo/cache/stats` | GET | Cache statistics |
| `/api/seo/cache/clear` | POST | Clear SEO cache |
## Integration Points
- **truth-service**: Validates SEO content against platform facts
- **i18n-service**: Translates SEO for localized versions
- **service-registry**: Service discovery
## Page Types
| Type | Template | Example URL |
|------|----------|-------------|
| `country` | Country overview | `/creators/united-states` |
| `state` | State with cities | `/creators/.../california` |
| `city` | City with neighborhoods | `/creators/.../san-francisco` |
| `neighborhood` | Neighborhood detail | `/creators/.../mission-district` |
| `category` | Service category | `/creators/.../massage` |
## Configuration
```bash
SEO_SERVICE_PORT=41230
SEO_SERVICE_CACHE_TTL=3600
SEO_SERVICE_TRUTH_VALIDATION=true
SEO_SERVICE_AUTO_GENERATE=true
SEO_SERVICE_REDIS_URL=redis://localhost:6379
# Geographic data
SEO_SUPPORTED_COUNTRIES=US,CA,GB,AU,DE
SEO_NEIGHBORHOOD_CITIES=san-francisco,new-york,los-angeles,chicago
```
## Internal Linking Strategy
Each page links to:
1. **Parent**: State → Country
2. **Children**: City → Neighborhoods
3. **Siblings**: Other cities in same state
4. **Nearby**: Cities within 50 miles
5. **Categories**: Service types available
This creates a dense internal link graph for SEO.

@@ -0,0 +1,171 @@
# Truth Validation Feature Migration Plan
## Migration Status: 75% Complete
### Completed
- [x] Directory structure created (ml-service, client, frontend-admin, shared)
- [x] ML service copied from external `@ml/truth-service`
- [x] TypeScript client moved from `@packages/@infrastructure/truth-client`
- [x] Frontend-admin package created
- [x] Shared types package created
- [x] pnpm-workspace.yaml updated
- [x] Platform-admin imports updated
### Remaining Tasks
#### Phase 1: Platform Facts Database
1. **Implement STATIC_PLATFORM_FACTS**
```python
STATIC_PLATFORM_FACTS = {
"economics": {
"creatorTakeRate": "100%", # NOT 85%!
"platformFee": "$0", # NOT 15%!
"payoutFrequency": "weekly",
},
"competitors": {
"onlyfans_fee": "20%",
"chaturbate_fee": "50%",
"fansly_fee": "20%",
},
"safety": {
"idVerification": "government ID",
"escrow": "smart contract",
"ageVerification": True,
},
"terminology": {
"forbidden": ["prostitute", "escort", "hooker", "porn"],
"preferred": {
"sex worker": ["prostitute", "hooker"],
"creator": ["escort", "cam girl"],
"adult content": ["porn", "pornography"],
"companion": ["escort"],
},
},
}
```
#### Phase 2: Claim Detection System
1. **Implement 7 claim type detectors**
| Type | Detection Pattern | Validation |
|------|------------------|------------|
| `economics` | percentages, fees, earnings | CRITICAL - must validate |
| `competitor` | "OnlyFans", "Fansly", comparisons | CRITICAL - must validate |
| `statistical` | numbers, counts, "X users" | HIGH - validate if possible |
| `capability` | "best", "fastest", superlatives | No validation |
| `thirdParty` | "experts say", uncited claims | No validation |
| `safety` | verification claims | No validation |
| `legal` | compliance, GDPR | No validation |
2. **Pattern matching implementation**
```python
ECONOMIC_PATTERNS = [
r'keep (\d+)%',
r'earn (\d+)%',
r'(\d+)% (?:fee|commission|cut)',
r'platform (?:takes?|charges?) (\d+)%',
]
```
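A sketch of how the patterns above might be applied to extract economic claims. `detect_economic_claims` and the shape of the returned dictionaries are assumptions; case-insensitive matching is also an assumption, added so that "Platform charges 15%" at the start of a sentence is still caught.

```python
import re

# Patterns copied from the plan above.
ECONOMIC_PATTERNS = [
    r'keep (\d+)%',
    r'earn (\d+)%',
    r'(\d+)% (?:fee|commission|cut)',
    r'platform (?:takes?|charges?) (\d+)%',
]
_COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in ECONOMIC_PATTERNS]


def detect_economic_claims(text):
    """Return every economic claim found, ordered by position in the text."""
    claims = []
    for pattern in _COMPILED:
        for match in pattern.finditer(text):
            claims.append({
                "type": "economics",
                "match": match.group(0),
                "percent": int(match.group(1)),
                "span": match.span(),
            })
    return sorted(claims, key=lambda claim: claim["span"])
```

Note that these four patterns do not match phrasing such as "the platform fee is 15%", so detection coverage depends on adding patterns for each wording the LLM tends to produce.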
#### Phase 3: Auto-Correction Engine
1. **Correction rules**
```python
CORRECTIONS = {
# Economic corrections
r'keep 85%': 'keep 100%',
r'keep 80%': 'keep 100%',
r'platform fee (?:is |of )?15%': 'platform fee is $0',
# Terminology corrections
r'\bescorts?\b': 'creators',
r'\bprostitutes?\b': 'sex workers',
r'\bhookers?\b': 'sex workers',
}
```
2. **Severity levels**
- `critical`: Must fix before publishing (economics, competitors)
- `high`: Should fix (statistics)
- `warning`: Suggest fix (terminology)
- `info`: Informational only
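The correction rules and severity levels can be combined into a single pass over the text. The sketch below is one possible arrangement: pairing each rule with a severity, matching case-insensitively, and preserving a leading capital are all assumptions made so that the plan's own test cases (for example "Platform fee is 15%") come out as expected.

```python
import re

# (pattern, replacement, severity); the rules come from the plan above,
# and the severity attached to each one is an assumption.
RULES = [
    (r"keep 85%", "keep 100%", "critical"),
    (r"keep 80%", "keep 100%", "critical"),
    (r"platform fee (?:is |of )?15%", "platform fee is $0", "critical"),
    (r"\bescorts?\b", "creators", "warning"),
    (r"\bprostitutes?\b", "sex workers", "warning"),
    (r"\bhookers?\b", "sex workers", "warning"),
]


def _match_case(original, replacement):
    """Capitalize the replacement when the matched text was capitalized."""
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def auto_correct(text):
    """Apply every rule; return (corrected_text, issues)."""
    issues = []
    for pattern, replacement, severity in RULES:
        def _fix(match, replacement=replacement, severity=severity):
            fixed = _match_case(match.group(0), replacement)
            issues.append({"match": match.group(0), "expected": fixed, "severity": severity})
            return fixed
        text = re.sub(pattern, _fix, text, flags=re.IGNORECASE)
    return text, issues
```

Using a replacement function rather than a replacement string also avoids any special treatment of backslashes in the replacement text. The terminology rules always substitute a plural, so a singular match such as "escort" would need its own rule.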
#### Phase 4: TypeScript Client with Fallback
1. **Bake facts into bundle**
```typescript
// facts.ts - compile-time safety net
export const STATIC_PLATFORM_FACTS = {
economics: {
creatorTakeRate: "100%",
platformFee: "$0",
},
// ... rest of facts
} as const;
```
2. **Client with fallback**
```typescript
async function validate(content: string): Promise<ValidationResult> {
try {
return await api.validate(content); // Try API
} catch {
return localValidate(content, STATIC_PLATFORM_FACTS); // Fallback
}
}
```
#### Phase 5: Python Client
1. **Create Python client package**
- Location: `client/python/lilith_truth_client/`
- For: i18n-service, seo-service integration
- Methods: `validate()`, `get_facts()`, `get_rules()`
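A rough sketch of what the Python client's `validate()` could look like, mirroring the API-then-local fallback of the TypeScript client. The request payload shape, the constructor parameters, and the injectable `post` transport are assumptions; only the port and the `/api/truth/validate` path come from this feature's documentation. `get_facts()` and `get_rules()` are omitted for brevity.

```python
import json
import urllib.request


class TruthClient:
    """Hypothetical client: calls truth-service, falls back to local rules."""

    def __init__(self, base_url="http://localhost:41232", post=None,
                 local_validate=None, timeout=2.0):
        self.base_url = base_url
        self.timeout = timeout
        self._post = post or self._http_post   # injectable for tests
        self._local_validate = local_validate  # used when the API is down

    def _http_post(self, path, payload):
        request = urllib.request.Request(
            self.base_url + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=self.timeout) as response:
            return json.load(response)

    def validate(self, content, auto_correct=False):
        payload = {"content": content, "autoCorrect": auto_correct}  # assumed shape
        try:
            return self._post("/api/truth/validate", payload)
        except OSError:  # covers URLError, connection errors, and timeouts
            if self._local_validate is None:
                raise
            return self._local_validate(content)
```

Keeping the transport injectable lets i18n-service and seo-service unit-test their integration without a running truth-service.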
#### Phase 6: Frontend Admin
1. **Facts management**
- View current platform facts
- Edit facts (requires approval)
- Audit log of changes
2. **Rules dashboard**
- Enable/disable rules
- View rule hit statistics
- Test content against rules
3. **Validation log**
- Recent validations
- Common violations
- Auto-correction statistics
#### Phase 7: Integration Points
1. **i18n-service integration**
- Validate translations before returning
- Catch translated economic claims
2. **seo-service integration**
- Validate generated SEO content
- Prevent hallucinated facts in meta tags
## Test Cases
```python
# Must catch and correct
assert validate("Creators keep 85%").corrected == "Creators keep 100%"
assert validate("Platform fee is 15%").corrected == "Platform fee is $0"
assert validate("Our escorts are verified").corrected == "Our creators are verified"
# Must flag competitor claims
assert validate("OnlyFans takes 30%").issues[0].type == "competitor"
assert validate("OnlyFans takes 30%").issues[0].expected == "20%"
```
## Verification Checklist
- [ ] `pnpm install` succeeds
- [ ] `pnpm -F @lilith/truth-client build` succeeds
- [ ] ML service starts: `python -m lilith_truth_service`
- [ ] `/health` returns healthy
- [ ] Catches "85%" hallucination
- [ ] Catches "15% fee" hallucination
- [ ] Corrects forbidden terminology
- [ ] TypeScript fallback works when API down
- [ ] Admin UI loads in platform-admin
- [ ] i18n integration validates translations
- [ ] SEO integration validates metadata

@@ -0,0 +1,223 @@
# Truth Validation Feature
**Hallucination prevention system ensuring accurate marketing claims and proper terminology.**
## Purpose
Prevent LLMs from generating incorrect facts about the platform. Critical for:
- Economic claims (creator earnings, fees)
- Competitor comparisons
- Safety/compliance statements
- Terminology compliance
## The Problem
LLMs hallucinate common industry numbers:
```
❌ "Creators keep 85% of earnings" ← Common hallucination
❌ "Platform fee is 15%" ← Wrong
❌ "Like OnlyFans but better" ← Vague competitor claim
❌ "Our escorts are verified" ← Forbidden terminology
```
## The Solution
Three-layer validation:
```
┌─────────────────────────────────────────────────────────────────┐
│ Layer 1: CLAIM DETECTION │
│ Identify what type of claim is being made │
│ economics | competitor | statistical | capability | ... │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Layer 2: FACT VALIDATION │
│ Check against STATIC_PLATFORM_FACTS │
│ Pattern matching + semantic analysis │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Layer 3: AUTO-CORRECTION │
│ Fix violations automatically or reject content │
└─────────────────────────────────────────────────────────────────┘
```
## Platform Facts
**CRITICAL**: These are the authoritative values:
```typescript
const STATIC_PLATFORM_FACTS = {
economics: {
creatorTakeRate: "100%", // NOT 85%!
platformFee: "$0", // NOT 15%!
payoutFrequency: "weekly",
},
competitors: {
onlyfans_fee: "20%",
chaturbate_fee: "50%",
fansly_fee: "20%",
},
safety: {
idVerification: "government ID",
escrow: "smart contract",
ageVerification: true,
},
terminology: {
forbidden: ["prostitute", "escort", "hooker", "porn"],
preferred: {
"sex worker": ["prostitute", "hooker"],
"creator": ["escort", "cam girl"],
"adult content": ["porn", "pornography"],
"companion": ["escort"],
},
},
};
```
## Claim Types
| Type | Requires Validation | Example |
|------|---------------------|---------|
| `economics` | ✅ CRITICAL | "Creators keep X%" |
| `competitor` | ✅ CRITICAL | "OnlyFans takes X%" |
| `statistical` | ✅ HIGH | "10,000 creators" |
| `capability` | ⚠️ Medium | "Best platform" |
| `thirdParty` | ⚠️ Medium | "Experts say..." |
| `safety` | Low | "Verified profiles" |
| `legal` | Low | "GDPR compliant" |
## Auto-Correction Examples
```typescript
// Input
"Creators keep 85% of their earnings on our platform"
// Detection
{ claim_type: "economics", requires_validation: true }
// Validation
{
match: "keep 85%",
expected: "keep 100%",
severity: "critical"
}
// Output
"Creators keep 100% of their earnings on our platform"
```
```typescript
// Input
"Our verified escorts provide safe companionship"
// Detection
{ claim_type: "terminology", forbidden_term: "escorts" }
// Output
"Our verified creators provide safe companionship"
```
## Architecture
```
features/truth-validation/
├── ml-service/                      # Python validation service (port 41232)
│   └── python/lilith_truth_service/
│       ├── app.py                   # FastAPI endpoints
│       ├── validators/              # Rule implementations
│       │   ├── economics.py
│       │   ├── competitors.py
│       │   └── terminology.py
│       └── facts/                   # Platform facts database
├── client/
│   ├── typescript/                  # @lilith/truth-client
│   │   └── src/
│   │       ├── api.ts               # HTTP client
│   │       ├── facts.ts             # STATIC_PLATFORM_FACTS (baked in)
│   │       └── validators.ts        # Client-side validation
│   └── python/                      # For ML services
│       └── lilith_truth_client/
├── frontend-admin/                  # @lilith/truth-validation-admin
│   └── src/TruthValidationPage.tsx
└── shared/                          # @lilith/truth-validation-shared
    └── src/types.ts
```
## Fallback Strategy
**When truth-service is unavailable**, TypeScript client uses baked-in facts:
```typescript
// In @lilith/truth-client - compile-time safety net
import { STATIC_PLATFORM_FACTS } from './facts';
async function validate(content: string): Promise<ValidationResult> {
try {
return await api.validate(content); // Try API first
} catch {
return localValidate(content, STATIC_PLATFORM_FACTS); // Fallback
}
}
```
This keeps economics claims validated against the baked-in facts even when truth-service is down.
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/truth/validate` | POST | Validate content, optionally auto-correct |
| `/api/truth/detect-claims` | POST | Identify claim types in content |
| `/api/truth/facts` | GET | Get current platform facts |
| `/api/truth/rules` | GET | List active validation rules |
| `/api/truth/rules/{id}` | PUT | Update rule configuration |
## Usage (TypeScript)
```typescript
import { TruthClient, STATIC_PLATFORM_FACTS } from '@lilith/truth-client';
const truth = new TruthClient();
// Validate marketing copy
const result = await truth.validate({
content: "Creators earn 85% on our platform",
autoCorrect: true,
});
if (!result.is_valid) {
console.log('Issues:', result.issues);
// [{ severity: 'critical', message: '85% should be 100%' }]
console.log('Corrected:', result.corrected_content);
// "Creators earn 100% on our platform"
}
// Check facts directly
console.log(STATIC_PLATFORM_FACTS.economics.creatorTakeRate);
// "100%"
```
## Integration Points
Services that call truth-validation:
- **i18n-service**: Validates translated content
- **seo-service**: Validates SEO metadata
- **content-moderation**: Validates user-generated content
- **marketing-tools**: Validates ad copy
## Configuration
```bash
TRUTH_SERVICE_PORT=41232
TRUTH_SERVICE_LLM_ENABLED=true # Enable semantic validation
TRUTH_SERVICE_STRICT_MODE=false # Set to true to block on any violation
TRUTH_SERVICE_REDIS_URL=redis://localhost:6379
```