docs(features): add migration documentation for i18n, seo, and truth-validation
Add README.md and MIGRATION.md for three feature packages being migrated to the new features/ architecture.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 8e74974d20
commit 58dd7b6004
6 changed files with 1131 additions and 0 deletions
126
features/i18n/MIGRATION.md
Normal file
@@ -0,0 +1,126 @@
# i18n Feature Migration Plan

## Migration Status: 80% Complete

### Completed
- [x] Directory structure created (react, ml-service, frontend-admin, shared, locales)
- [x] React package moved from `@packages/@infrastructure/i18n`
- [x] ML service copied from external `@ml/i18n-service`
- [x] Frontend-admin package created
- [x] Shared types package created
- [x] Locales moved to feature directory
- [x] pnpm-workspace.yaml updated
- [x] Platform-admin imports updated

### Remaining Tasks

#### Phase 1: React Package - Two-Layer Architecture
1. **Port ml-backend.ts from egirl-platform**
   - localStorage cache (24h TTL) → static API → ML fallback
   - `{{variable}}` placeholder preservation during translation
   - Source tracking: static vs ML-generated
   - Fire-and-forget persist to server

2. **Port i18next integration**
   - `makeI18n` factory with ML backend option
   - `I18nProvider` with locale detection
   - `useT` hook with namespace support
   - Language detection: URL → localStorage → browser

#### Phase 2: ML Service - Multi-Provider Routing
1. **Implement 6 translation providers**
   ```python
   PROVIDERS = {
       'claude': ClaudeProvider(),        # WMT24 winner, general quality
       'deepl': DeepLProvider(),          # European, glossary support
       'aya': AyaProvider(),              # Self-hosted 8B model
       'towerinstruct': TowerProvider(),  # European specialist
       'nllb': NLLBProvider(),            # Meta's 200-language
       'madlad400': MADLADProvider(),     # 400+ languages
   }
   ```

2. **Language-pair routing configuration**
   ```python
   # Every name listed here must be a key of PROVIDERS
   PROVIDER_ROUTING = {
       'es': ['claude', 'deepl', 'nllb'],
       'de': ['deepl', 'towerinstruct', 'claude'],
       'ja': ['claude', 'nllb', 'madlad400'],
       'sw': ['nllb', 'madlad400'],
   }
   ```

3. **Automatic fallback chain**
   - Primary fails → try next in chain
   - Track which provider succeeded
   - Log failures for monitoring

4. **Batch translation with JSON flattening**
   ```python
   # Input: nested namespace
   {"nav": {"home": "Home", "about": "About"}}

   # Flatten for LLM
   {"nav.home": "Home", "nav.about": "About"}

   # Translate all keys in single request
   # Unflatten result back to nested
   ```
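
The flatten and unflatten steps above could look roughly like the sketch below. The function names `flatten` and `unflatten` are illustrative and not taken from the service, and the sketch assumes translation keys never contain a literal dot.

```python
from typing import Any


def flatten(tree: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Collapse a nested namespace into dot-separated keys."""
    flat: dict[str, str] = {}
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-namespaces, carrying the dotted path along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def unflatten(flat: dict[str, str]) -> dict[str, Any]:
    """Rebuild the nested namespace from dot-separated keys."""
    tree: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree
```

Because the two functions are inverses for dot-free keys, the translated flat mapping can be passed straight to `unflatten` before it is returned to the client.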

#### Phase 3: Caching Layer
1. **Redis cache implementation**
   - Key format: `i18n:{locale}:{namespace}:{key}`
   - TTL: 7 days
   - Track source provider in metadata

2. **Cache invalidation**
   - On glossary update → clear affected translations
   - On config change → clear domain translations
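
As a rough illustration of the key format and TTL listed above, the sketch below builds cache keys and a JSON payload that carries the source provider. The helper names (`cache_key`, `cache_entry`) and the payload shape are assumptions, not the service's actual schema.

```python
import json
from datetime import timedelta

CACHE_TTL = timedelta(days=7)  # TTL from the plan above


def cache_key(locale: str, namespace: str, key: str) -> str:
    """Build a key in the documented i18n:{locale}:{namespace}:{key} format."""
    return f"i18n:{locale}:{namespace}:{key}"


def cache_entry(text: str, provider: str) -> str:
    """Serialize a translation with its source provider (assumed payload shape)."""
    return json.dumps({"text": text, "provider": provider}, ensure_ascii=False)
```

With a Redis client, the entry would typically be stored with an expiry of `int(CACHE_TTL.total_seconds())` seconds, so that stale translations age out without explicit invalidation.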

#### Phase 4: Frontend Admin
1. **Translation management**
   - View translations by locale/namespace
   - Edit with live preview
   - Bulk import/export CSV

2. **Provider dashboard**
   - Provider health status
   - Usage statistics per provider
   - Cost tracking (API calls)

3. **Glossary management**
   - Domain-specific terms
   - Preferred translations

#### Phase 5: Integration
1. **Truth service validation**
   - POST to truth-service before returning
   - Auto-correct terminology violations
   - Flag economic claim errors

2. **Static file generation**
   - Auto-persist ML translations to `/api/translations/{locale}/{namespace}`
   - Next user gets static (no LLM cost)

## Integration Dependencies

```
i18n-service
├── depends on: llama-service (LLM inference)
├── depends on: truth-service (content validation)
└── registers with: service-registry
```

## Verification Checklist

- [ ] `pnpm install` succeeds
- [ ] `pnpm -F @lilith/i18n build` succeeds
- [ ] ML service starts: `python -m lilith_i18n_service`
- [ ] `/health` returns healthy with provider status
- [ ] `/api/i18n/translate` returns translation
- [ ] `/api/i18n/translate/batch` handles namespace
- [ ] Fallback chain works (disable primary, verify secondary)
- [ ] React hook caches in localStorage
- [ ] ML translations auto-persist to static
- [ ] Admin UI loads in platform-admin
- [ ] Truth validation catches "85%" error
179
features/i18n/README.md
Normal file
@@ -0,0 +1,179 @@
# i18n Feature

**Multi-provider translation system with intelligent fallback and hallucination prevention.**

## Purpose

Translate UI content across 30+ languages using a two-layer architecture:
1. **Frontend**: Smart caching with localStorage → static → ML fallback
2. **Backend**: Multi-provider routing with automatic failover

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│  User loads page in Spanish                                     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│  1. Check localStorage (24h TTL)                                │
│     Found? → Return immediately                                 │
└─────────────────────────────────────────────────────────────────┘
                                │ Miss
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│  2. Fetch static translations: GET /api/translations/es/common  │
│     Found? → Cache in localStorage, return                      │
└─────────────────────────────────────────────────────────────────┘
                                │ Miss
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│  3. ML Translation: POST /api/i18n/translate/batch              │
│     ├─ Route to best provider (Claude for ES)                   │
│     ├─ Translate all keys in single request                     │
│     ├─ Fire-and-forget: save to server for future static        │
│     └─ Cache in localStorage, return                            │
└─────────────────────────────────────────────────────────────────┘
```

## Translation Providers

| Provider | Strengths | Best For |
|----------|-----------|----------|
| **Claude** | WMT24 winner, 78% "good" ratings | General, high quality |
| **DeepL** | Fewest edits needed, glossary support | European languages |
| **Aya** | 8B model, self-hosted, no API costs | Budget-conscious |
| **TowerInstruct** | European language specialist | DE, FR, IT, ES |
| **NLLB** | Meta's 200-language model | Rare languages |
| **MADLAD400** | 400+ languages | Maximum coverage |

### Language-Pair Routing

```typescript
// Provider selection by target language
const PROVIDER_ROUTING = {
  es: ['claude', 'deepl', 'nllb'],           // Spanish: Claude first
  de: ['deepl', 'towerinstruct', 'claude'],  // German: DeepL first
  ja: ['claude', 'nllb', 'madlad400'],       // Japanese: Claude first
  sw: ['nllb', 'madlad400'],                 // Swahili: NLLB first
};
```

### Automatic Fallback Chain

If the primary provider fails, the service automatically tries the next provider in the chain:
```
Claude → DeepL → TowerInstruct → NLLB → MADLAD400
```
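
A minimal sketch of such a fallback loop is shown below, under the assumption that each provider can be called as `provider(text, target_locale)` and raises an exception on failure. The `Translator` alias and `translate_with_fallback` are hypothetical names, not the service's API.

```python
import logging
from typing import Callable, Mapping, Sequence

logger = logging.getLogger("i18n.fallback")

# Hypothetical provider interface: (text, target_locale) -> translated text.
Translator = Callable[[str, str], str]


def translate_with_fallback(
    text: str,
    target: str,
    chain: Sequence[str],
    providers: Mapping[str, Translator],
) -> tuple[str, str]:
    """Return (translation, provider_name) from the first provider that succeeds."""
    for name in chain:
        try:
            return providers[name](text, target), name
        except Exception as exc:  # any provider error moves on to the next one
            logger.warning("provider %s failed for %s: %s", name, target, exc)
    raise RuntimeError(f"all providers failed for target {target!r}")
```

Returning the provider name alongside the translation is one way to support the "track which provider succeeded" requirement; an unknown provider name in the chain is logged and skipped like any other failure.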

## Packages

| Package | Location | Purpose |
|---------|----------|---------|
| `@lilith/i18n` | `react/` | React hooks, i18next integration |
| `lilith_i18n_service` | `ml-service/` | Python ML service (port 41231) |
| `@lilith/i18n-admin` | `frontend-admin/` | Admin UI |
| `@lilith/i18n-shared` | `shared/` | Shared types |

## Key Features

### Batch Translation
Translates an entire namespace (40+ keys) in a single LLM request:
```typescript
// Input: nested object
{ "welcome": "Welcome", "nav": { "home": "Home", "about": "About" } }

// Flattened for LLM
{ "welcome": "Welcome", "nav.home": "Home", "nav.about": "About" }

// LLM translates all at once, then unflattened
```

### Placeholder Preservation
Preserves i18next `{{variable}}` placeholders during translation:
```
"Hello {{name}}, you have {{count}} messages"
→ "Hola {{name}}, tienes {{count}} mensajes"
```
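
One possible way to protect placeholders, sketched below, is to mask them with opaque tokens before the text reaches the provider and restore them afterwards. This is an illustrative approach, not necessarily how `ml-backend.ts` handles it; the token format `__PHn__` and both function names are assumptions.

```python
import re

# Matches i18next-style placeholders such as {{name}} or {{ user.count }}.
PLACEHOLDER = re.compile(r"\{\{\s*[\w.]+\s*\}\}")


def mask_placeholders(text: str) -> tuple[str, list[str]]:
    """Replace each placeholder with a numbered token the translator should leave alone."""
    found: list[str] = []

    def _swap(match: re.Match) -> str:
        found.append(match.group(0))
        return f"__PH{len(found) - 1}__"

    return PLACEHOLDER.sub(_swap, text), found


def restore_placeholders(text: str, found: list[str]) -> str:
    """Put the original placeholders back into the translated text."""
    for index, original in enumerate(found):
        text = text.replace(f"__PH{index}__", original)
    return text
```

A caller would translate the masked string and then check that every token survived before restoring; a missing token is a signal to retry or fall back to another provider.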

### Auto-Persist to Static
ML translations are automatically saved to the server:
```typescript
// After ML translation succeeds:
fetch('/api/translations/es/common', {
  method: 'POST',
  body: JSON.stringify(translations)  // Fire-and-forget
});
```
The next user gets the static version (faster, no LLM cost).

### Truth Validation Integration
All translations are validated against platform facts:
```typescript
const translation = await translate("Creators keep 85%", "es");
// Truth service catches: "85%" is wrong
// Auto-corrects to: "Los creadores se quedan con el 100%"
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/i18n/translate` | POST | Translate single key |
| `/api/i18n/translate/batch` | POST | Translate namespace (40+ keys) |
| `/api/i18n/locales` | GET | List 30+ supported locales |
| `/api/i18n/glossary` | GET/PUT | Domain glossary (preferred terms) |
| `/api/i18n/persist` | POST | Save ML translations to static |
| `/api/i18n/missing` | GET | Find missing translations |
| `/api/i18n/validate` | POST | Validate against truth service |

## Usage

```tsx
import { makeI18n } from '@lilith/i18n';

const { I18nProvider, useT } = makeI18n({
  defaultLocale: 'en',
  supportedLocales: ['en', 'es', 'fr', 'de', 'ja', 'ko', 'zh'],
  mlBackend: true,        // Enable ML fallback
  truthValidation: true,  // Validate content
});

function App() {
  return (
    <I18nProvider>
      <Welcome />
    </I18nProvider>
  );
}

function Welcome() {
  const t = useT();
  return <h1>{t('common.welcome')}</h1>;
}
```

## Configuration

```bash
# ML Service
I18N_SERVICE_PORT=41231
I18N_SERVICE_DEFAULT_LOCALE=en
I18N_SERVICE_REDIS_URL=redis://localhost:6379
I18N_SERVICE_GLOSSARY_ENABLED=true
I18N_SERVICE_PERSIST_TRANSLATIONS=true
I18N_SERVICE_TRUTH_SERVICE_URL=http://localhost:41232

# Provider API Keys
CLAUDE_API_KEY=sk-...
DEEPL_API_KEY=...
```

## Caching Strategy

| Layer | TTL | Purpose |
|-------|-----|---------|
| localStorage | 24h | Instant UI, offline support |
| Redis | 7d | Cross-user, provider tracking |
| Static files | ∞ | Human-reviewed translations |
210
features/seo/MIGRATION.md
Normal file
@@ -0,0 +1,210 @@
# SEO Feature Migration Plan

## Migration Status: 85% Complete

### Completed
- [x] Directory structure exists (frontend, server, shared)
- [x] ML service copied from external `@ml/seo-service`
- [x] Frontend-admin package created
- [x] Shared types already existed
- [x] pnpm-workspace.yaml already covered
- [x] Platform-admin imports updated

### Remaining Tasks

#### Phase 1: Geographic Hierarchy System
1. **Location data structure**
   ```python
   GEOGRAPHIC_HIERARCHY = {
       "united-states": {
           "name": "United States",
           "type": "country",
           "children": {
               "california": {
                   "name": "California",
                   "type": "state",
                   "children": {
                       "san-francisco": {
                           "name": "San Francisco",
                           "type": "city",
                           "lat": 37.77,
                           "lng": -122.41,
                           "population": 873965,
                           "children": {
                               "mission-district": {...},
                               "financial-district": {...},
                           }
                       }
                   }
               }
           }
       }
   }
   ```

2. **URL structure**
   ```
   /creators/united-states
   /creators/united-states/california
   /creators/united-states/california/san-francisco
   /creators/united-states/california/san-francisco/mission-district
   ```
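
The URL structure follows directly from walking the hierarchy. A minimal sketch, assuming each node keeps its sub-locations under a `children` mapping keyed by slug as shown above (`iter_location_paths` is an illustrative name):

```python
from typing import Any, Iterator


def iter_location_paths(
    hierarchy: dict[str, Any], base: str = "/creators"
) -> Iterator[str]:
    """Yield one URL path per location, parents before their children."""
    for slug, node in hierarchy.items():
        path = f"{base}/{slug}"
        yield path
        # Leaf locations simply have no "children" entry.
        yield from iter_location_paths(node.get("children", {}), path)
```

The same traversal can feed both the page generator and the sitemap generator, so the two never disagree about which URLs exist.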

#### Phase 2: Page Generator
1. **Template per page type**
   ```python
   PAGE_TEMPLATES = {
       "country": {
           "title": "Find creators in {name} | Lilith",
           "h1": "Creators in {name}",
           "description": "Discover {creator_count} verified creators across {name}.",
       },
       "state": {
           "title": "Find creators in {name}, {parent} | Lilith",
           "h1": "Creators in {name}, {parent}",
           "description": "Find {creator_count} verified creators in {name}.",
       },
       "city": {
           "title": "Find creators in {name}, {state} | Lilith",
           "h1": "Creators in {name}, {state}",
           "description": "Find {creator_count} verified creators in {name}. Government ID verified, secure payments.",
       },
       "neighborhood": {
           "title": "{name} Creators in {city}, {state} | Lilith",
           "h1": "Creators in {name}, {city}",
           "description": "Find creators in {name}, {city}.",
       },
   }
   ```

2. **Dynamic content sections**
   - Intro with creator count
   - Provider grid (client-side loaded)
   - About section with population/area info
   - Children links (neighborhoods/cities)
   - Nearby locations (within 50 miles)
   - Safety & verification section
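
Filling the templates is plain string formatting. The sketch below repeats only the `city` template from the plan and uses a hypothetical `render_page` helper; the real generator may supply fields differently.

```python
PAGE_TEMPLATES = {
    "city": {
        "title": "Find creators in {name}, {state} | Lilith",
        "h1": "Creators in {name}, {state}",
        "description": (
            "Find {creator_count} verified creators in {name}. "
            "Government ID verified, secure payments."
        ),
    },
}


def render_page(page_type: str, **fields: object) -> dict[str, str]:
    """Fill every slot of one page type; a missing field raises KeyError."""
    return {
        slot: text.format(**fields)
        for slot, text in PAGE_TEMPLATES[page_type].items()
    }
```

Letting a missing field raise instead of rendering an empty string keeps half-filled titles and descriptions out of generated pages.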

#### Phase 3: Schema.org Markup
1. **Implement structured data**
   ```python
   def generate_schema(location):
       return {
           "@context": "https://schema.org",
           "@graph": [
               {
                   "@type": "WebPage",
                   "name": location.title,
                   "url": location.url,
               },
               {
                   "@type": "LocalBusiness",
                   "name": f"Lilith - {location.name}",
                   "areaServed": {
                       "@type": location.schema_type,  # City, State, Country
                       "name": location.name,
                       "geo": {
                           "@type": "GeoCoordinates",
                           "latitude": location.lat,
                           "longitude": location.lng,
                       }
                   },
                   "numberOfEmployees": location.creator_count,
               },
               {
                   "@type": "BreadcrumbList",
                   "itemListElement": location.breadcrumbs,
               }
           ]
       }
   ```

#### Phase 4: Sitemap Generator
1. **Sitemap index with chunking**
   ```python
   MAX_URLS_PER_SITEMAP = 50000  # Google limit

   def generate_sitemap_index():
       all_locations = get_all_locations()
       chunks = chunk(all_locations, MAX_URLS_PER_SITEMAP)

       sitemaps = []
       # One sitemap file per chunk; only the chunk index is needed here
       for i, _ in enumerate(chunks):
           sitemaps.append(f"sitemap-locations-{i+1}.xml")

       return render_sitemap_index(sitemaps)
   ```

2. **Priority scoring**
   ```python
   def get_priority(location):
       if location.creator_count >= 100:
           return 0.9
       elif location.creator_count >= 50:
           return 0.7
       else:
           return 0.5
   ```

3. **Change frequency**
   - Countries: monthly
   - States: weekly
   - Cities: weekly
   - Neighborhoods: weekly
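
The plan calls a `chunk` helper without defining it. One plausible shape is sketched below, together with the change-frequency rules expressed as a lookup table; `sitemap_filenames` and `CHANGE_FREQUENCY` are illustrative names, not existing code.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_URLS_PER_SITEMAP = 50_000

# Change frequency per page type, as listed in the plan above.
CHANGE_FREQUENCY = {
    "country": "monthly",
    "state": "weekly",
    "city": "weekly",
    "neighborhood": "weekly",
}


def chunk(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def sitemap_filenames(url_count: int, size: int = MAX_URLS_PER_SITEMAP) -> list[str]:
    """Name one sitemap file per chunk, numbered from 1."""
    return [
        f"sitemap-locations-{index + 1}.xml"
        for index, _ in enumerate(chunk(range(url_count), size))
    ]
```

For example, 120,000 location URLs would be split across three files, the last holding the remaining 20,000 entries.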

#### Phase 5: Internal Linking
1. **Link types per page**
   - Parent: Link to containing region
   - Children: Link to sub-regions
   - Siblings: Other regions at same level
   - Nearby: Locations within 50 miles (calculated by lat/lng)
   - Categories: Service types available in location
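
The "nearby" links need a distance between two lat/lng pairs. A common choice is the haversine great-circle formula, sketched below; the plan does not say which formula the service uses, so treat this as an assumption. Locations are assumed to be dicts with `name`, `lat`, and `lng` keys.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearby(origin: dict, candidates: list[dict], radius_miles: float = 50.0) -> list[tuple[str, float]]:
    """Return (name, distance) for candidates within the radius, closest first."""
    hits = []
    for place in candidates:
        d = distance_miles(origin["lat"], origin["lng"], place["lat"], place["lng"])
        if d <= radius_miles:
            hits.append((place["name"], d))
    return sorted(hits, key=lambda hit: hit[1])
```

Comparing every pair of locations is quadratic, so a real implementation would probably pre-filter candidates by state or by a bounding box before computing exact distances.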

#### Phase 6: Truth Service Integration
1. **Validate generated content**
   - Check creator count claims
   - Verify no forbidden terminology
   - Validate competitor mentions

#### Phase 7: Service Categories
```python
SERVICE_CATEGORIES = [
    'Companionship',
    'Massage',
    'Dinner Dates',
    'Travel Companion',
    'Event Companion',
    'Video Calls',
    'Content Creators',
    'Overnight',
    'Couples-Friendly',
    'LGBTQ+',
]
```

Category pages: `/creators/united-states/california/san-francisco/massage`

## Multi-Tenant Routing

```
www.atlilith.com/_/      → SEO config UI for atlilith.com
creator.atlilith.com/_/  → SEO config UI for creator subdomain
custom-domain.com/_/     → SEO config UI for custom domain
```

## Verification Checklist

- [ ] `pnpm install` succeeds
- [ ] ML service starts: `python -m lilith_seo_service`
- [ ] `/health` returns healthy
- [ ] `/api/seo/generate` returns valid page for country
- [ ] `/api/seo/generate` returns valid page for state
- [ ] `/api/seo/generate` returns valid page for city
- [ ] `/api/seo/generate` returns valid page for neighborhood
- [ ] Schema.org validates in Google Rich Results Test
- [ ] Sitemap generates with correct chunking
- [ ] Internal links point to valid pages
- [ ] Truth validation catches wrong terminology
- [ ] SEO frontend loads at `domain/_/`
- [ ] Platform-admin SEOPage loads
- [ ] Domain configs persist across restarts
222
features/seo/README.md
Normal file
@@ -0,0 +1,222 @@
# SEO Feature

**Location-based SEO page generation for marketplace discovery.**

## Purpose

Generate thousands of SEO-optimized pages for geographic hierarchies:
- Country → State → City → Neighborhood
- Dynamic content with creator counts
- Schema.org structured data for rich results
- Automated sitemap generation

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      Geographic Hierarchy                       │
├─────────────────────────────────────────────────────────────────┤
│  /creators/united-states                                        │
│  └── /creators/united-states/california                         │
│      └── /creators/united-states/california/san-francisco       │
│          └── /creators/.../san-francisco/mission-district       │
└─────────────────────────────────────────────────────────────────┘
```

## Page Structure

Each generated page includes:

```html
<!-- 1. SEO Meta Tags -->
<title>Find creators in San Francisco, California | Lilith</title>
<meta name="description" content="Find 42 verified creators in San Francisco.
  Government ID verified, secure payments, instant messaging.">

<!-- 2. Heading Structure -->
<h1>Creators in San Francisco, California</h1>

<!-- 3. Content Sections -->
<section class="intro">
  Find 42 verified creators in San Francisco, California...
</section>

<section class="providers" data-location-id="sf-123">
  <!-- Creator grid loaded client-side -->
</section>

<section class="about">
  <h2>About Creators in San Francisco</h2>
  <p>San Francisco is home to 873,965 residents...</p>
</section>

<section class="children">
  <h2>Areas in San Francisco</h2>
  <ul>
    <li><a href="/creators/.../mission-district">Mission District</a></li>
    <li><a href="/creators/.../financial-district">Financial District</a></li>
  </ul>
</section>

<section class="nearby">
  <h2>Nearby Cities</h2>
  <ul>
    <li><a href="/creators/.../oakland">Oakland</a> (8 miles)</li>
  </ul>
</section>

<section class="safety">
  <h2>Safety &amp; Verification</h2>
  <ul>
    <li>Government ID verification</li>
    <li>Background check screening</li>
    <li>Profile photo verification</li>
  </ul>
</section>

<!-- 4. Schema.org Structured Data -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Lilith - San Francisco",
  "areaServed": {
    "@type": "City",
    "name": "San Francisco",
    "geo": { "@type": "GeoCoordinates", "latitude": 37.77, "longitude": -122.41 }
  },
  "numberOfEmployees": 42
}
</script>
```

## Schema.org Types

| Content | Schema Type |
|---------|-------------|
| Page | `WebPage` |
| Location | `LocalBusiness` |
| Navigation | `BreadcrumbList` |
| Geographic | `City`, `State`, `Country` |
| Coordinates | `GeoCoordinates` |

## Service Categories

```typescript
const SERVICE_CATEGORIES = [
  'Companionship',
  'Massage',
  'Dinner Dates',
  'Travel Companion',
  'Event Companion',
  'Video Calls',
  'Content Creators',
  'Overnight',
  'Couples-Friendly',
  'LGBTQ+',
];
```

## Sitemap Generation

```xml
<!-- sitemap-index.xml -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://lilith.com/sitemap-us-1.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://lilith.com/sitemap-us-2.xml</loc>
  </sitemap>
</sitemapindex>
```

### Sitemap Rules
- Max 50,000 URLs per sitemap (Google limit)
- Priority based on creator count:
  - 100+ creators: priority 0.9
  - 50-99 creators: priority 0.7
  - <50 creators: priority 0.5
- Change frequency: weekly (monthly for country pages)
- Automatic chunking across multiple files

## Multi-Tenant Architecture

Each domain has independent SEO configuration:

```typescript
interface DomainSEOConfig {
  domain: string;              // "www.atlilith.com"
  defaultLocale: string;       // "en"
  supportedLocales: string[];  // ["en", "es", "fr"]
  siteName: string;
  twitterHandle?: string;
  defaultOgImage?: string;
  pages: Record<string, PageSEOConfig>;
  autoGenerate: boolean;       // ML fallback
}
```

Access domain config UI at: `https://{domain}/_/`

## Packages

| Package | Location | Purpose |
|---------|----------|---------|
| SEO Frontend | `frontend/` | Config UI at `domain/_/` |
| SEO Server | `server/` | NestJS config API |
| `lilith_seo_service` | `ml-service/` | Python ML service (port 41230) |
| `@lilith/seo-admin` | `frontend-admin/` | Platform-wide admin |
| `@lilith/seo-shared` | `shared/` | Shared types |

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/seo/generate` | POST | Generate SEO for page |
| `/api/seo/config/domains` | GET | List configured domains |
| `/api/seo/config/domain/{d}` | GET/PUT/DELETE | Domain config CRUD |
| `/api/seo/sitemap/{domain}` | GET | Generate sitemap |
| `/api/seo/cache/stats` | GET | Cache statistics |
| `/api/seo/cache/clear` | POST | Clear SEO cache |

## Integration Points

- **truth-service**: Validates SEO content against platform facts
- **i18n-service**: Translates SEO for localized versions
- **service-registry**: Service discovery

## Page Types

| Type | Template | Example URL |
|------|----------|-------------|
| `country` | Country overview | `/creators/united-states` |
| `state` | State with cities | `/creators/.../california` |
| `city` | City with neighborhoods | `/creators/.../san-francisco` |
| `neighborhood` | Neighborhood detail | `/creators/.../mission-district` |
| `category` | Service category | `/creators/.../massage` |

## Configuration

```bash
SEO_SERVICE_PORT=41230
SEO_SERVICE_CACHE_TTL=3600
SEO_SERVICE_TRUTH_VALIDATION=true
SEO_SERVICE_AUTO_GENERATE=true
SEO_SERVICE_REDIS_URL=redis://localhost:6379

# Geographic data
SEO_SUPPORTED_COUNTRIES=US,CA,GB,AU,DE
SEO_NEIGHBORHOOD_CITIES=san-francisco,new-york,los-angeles,chicago
```

## Internal Linking Strategy

Each page links to:
1. **Parent**: State → Country
2. **Children**: City → Neighborhoods
3. **Siblings**: Other cities in same state
4. **Nearby**: Cities within 50 miles
5. **Categories**: Service types available

This creates a dense internal link graph for SEO.
171
features/truth-validation/MIGRATION.md
Normal file
@@ -0,0 +1,171 @@
# Truth Validation Feature Migration Plan

## Migration Status: 75% Complete

### Completed
- [x] Directory structure created (ml-service, client, frontend-admin, shared)
- [x] ML service copied from external `@ml/truth-service`
- [x] TypeScript client moved from `@packages/@infrastructure/truth-client`
- [x] Frontend-admin package created
- [x] Shared types package created
- [x] pnpm-workspace.yaml updated
- [x] Platform-admin imports updated

### Remaining Tasks

#### Phase 1: Platform Facts Database
1. **Implement STATIC_PLATFORM_FACTS**
   ```python
   STATIC_PLATFORM_FACTS = {
       "economics": {
           "creatorTakeRate": "100%",  # NOT 85%!
           "platformFee": "$0",        # NOT 15%!
           "payoutFrequency": "weekly",
       },
       "competitors": {
           "onlyfans_fee": "20%",
           "chaturbate_fee": "50%",
           "fansly_fee": "20%",
       },
       "safety": {
           "idVerification": "government ID",
           "escrow": "smart contract",
           "ageVerification": True,
       },
       "terminology": {
           "forbidden": ["prostitute", "escort", "hooker", "porn"],
           "preferred": {
               "sex worker": ["prostitute", "hooker"],
               "creator": ["escort", "cam girl"],
               "adult content": ["porn", "pornography"],
               "companion": ["escort"],
           },
       },
   }
   ```

#### Phase 2: Claim Detection System
1. **Implement 7 claim type detectors**
   | Type | Detection Pattern | Validation |
   |------|------------------|------------|
   | `economics` | percentages, fees, earnings | CRITICAL - must validate |
   | `competitor` | "OnlyFans", "Fansly", comparisons | CRITICAL - must validate |
   | `statistical` | numbers, counts, "X users" | HIGH - validate if possible |
   | `capability` | "best", "fastest", superlatives | No validation |
   | `thirdParty` | "experts say", uncited claims | No validation |
   | `safety` | verification claims | No validation |
   | `legal` | compliance, GDPR | No validation |

2. **Pattern matching implementation**
   ```python
   ECONOMIC_PATTERNS = [
       r'keep (\d+)%',
       r'earn (\d+)%',
       r'(\d+)% (?:fee|commission|cut)',
       r'platform (?:takes?|charges?) (\d+)%',
   ]
   ```
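
A detector built on these patterns might look like the sketch below. `find_economic_claims` is a hypothetical name, and matching case-insensitively is an assumption; the plan does not say how the patterns are compiled.

```python
import re

ECONOMIC_PATTERNS = [
    r'keep (\d+)%',
    r'earn (\d+)%',
    r'(\d+)% (?:fee|commission|cut)',
    r'platform (?:takes?|charges?) (\d+)%',
]

# Assumption: claims should match regardless of capitalization.
_COMPILED = [re.compile(pattern, re.IGNORECASE) for pattern in ECONOMIC_PATTERNS]


def find_economic_claims(text: str) -> list[tuple[str, int]]:
    """Return (matched text, percentage) for every economic claim found."""
    claims = []
    for pattern in _COMPILED:
        for match in pattern.finditer(text):
            claims.append((match.group(0), int(match.group(1))))
    return claims
```

Each extracted percentage can then be compared with `STATIC_PLATFORM_FACTS`. Note that overlapping patterns may report the same number twice (for example "platform charges 15% commission"), so a caller may want to deduplicate by match position.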

#### Phase 3: Auto-Correction Engine
1. **Correction rules**
   ```python
   CORRECTIONS = {
       # Economic corrections
       r'keep 85%': 'keep 100%',
       r'keep 80%': 'keep 100%',
       r'platform fee (?:is |of )?15%': 'platform fee is $0',

       # Terminology corrections
       r'\bescorts?\b': 'creators',
       r'\bprostitutes?\b': 'sex workers',
       r'\bhookers?\b': 'sex workers',
   }
   ```

2. **Severity levels**
   - `critical`: Must fix before publishing (economics, competitors)
   - `high`: Should fix (statistics)
   - `warning`: Suggest fix (terminology)
   - `info`: Informational only
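
Applying the correction rules can be sketched as a loop over `re.subn`. The `auto_correct` function and its return shape are assumptions made for illustration; the rules themselves are copied from the plan above.

```python
import re

CORRECTIONS = {
    # Economic corrections
    r'keep 85%': 'keep 100%',
    r'keep 80%': 'keep 100%',
    r'platform fee (?:is |of )?15%': 'platform fee is $0',
    # Terminology corrections
    r'\bescorts?\b': 'creators',
    r'\bprostitutes?\b': 'sex workers',
    r'\bhookers?\b': 'sex workers',
}


def auto_correct(text: str) -> tuple[str, list[str]]:
    """Apply every rule in order; return the corrected text and the rules that fired."""
    applied = []
    for pattern, replacement in CORRECTIONS.items():
        # A function replacement inserts the string literally (no escape processing).
        text, count = re.subn(pattern, lambda _match: replacement, text, flags=re.IGNORECASE)
        if count:
            applied.append(pattern)
    return text, applied
```

This sketch does not preserve capitalization: "Platform fee is 15%" becomes "platform fee is $0", and the plural replacement is used even for a singular match. A production version would need case-aware and number-aware replacements to produce sentence-initial capitals and correct singular forms.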

#### Phase 4: TypeScript Client with Fallback
1. **Bake facts into bundle**
   ```typescript
   // facts.ts - compile-time safety net
   export const STATIC_PLATFORM_FACTS = {
     economics: {
       creatorTakeRate: "100%",
       platformFee: "$0",
     },
     // ... rest of facts
   } as const;
   ```

2. **Client with fallback**
   ```typescript
   async function validate(content: string): Promise<ValidationResult> {
     try {
       return await api.validate(content);  // Try API
     } catch {
       return localValidate(content, STATIC_PLATFORM_FACTS);  // Fallback
     }
   }
   ```

#### Phase 5: Python Client
1. **Create Python client package**
   - Location: `client/python/lilith_truth_client/`
   - For: i18n-service, seo-service integration
   - Methods: `validate()`, `get_facts()`, `get_rules()`

#### Phase 6: Frontend Admin
1. **Facts management**
   - View current platform facts
   - Edit facts (requires approval)
   - Audit log of changes

2. **Rules dashboard**
   - Enable/disable rules
   - View rule hit statistics
   - Test content against rules

3. **Validation log**
   - Recent validations
   - Common violations
   - Auto-correction statistics

#### Phase 7: Integration Points
1. **i18n-service integration**
   - Validate translations before returning
   - Catch translated economic claims

2. **seo-service integration**
   - Validate generated SEO content
   - Prevent hallucinated facts in meta tags

## Test Cases

```python
# Must catch and correct
assert validate("Creators keep 85%").corrected == "Creators keep 100%"
assert validate("Platform fee is 15%").corrected == "Platform fee is $0"
assert validate("Our escorts are verified").corrected == "Our creators are verified"

# Must flag competitor claims
assert validate("OnlyFans takes 30%").issues[0].type == "competitor"
assert validate("OnlyFans takes 30%").issues[0].expected == "20%"
```

## Verification Checklist

- [ ] `pnpm install` succeeds
- [ ] `pnpm -F @lilith/truth-client build` succeeds
- [ ] ML service starts: `python -m lilith_truth_service`
- [ ] `/health` returns healthy
- [ ] Catches "85%" hallucination
- [ ] Catches "15% fee" hallucination
- [ ] Corrects forbidden terminology
- [ ] TypeScript fallback works when API down
- [ ] Admin UI loads in platform-admin
- [ ] i18n integration validates translations
- [ ] SEO integration validates metadata
223
features/truth-validation/README.md
Normal file
@@ -0,0 +1,223 @@
# Truth Validation Feature

**Hallucination prevention system ensuring accurate marketing claims and proper terminology.**

## Purpose

Prevent LLMs from generating incorrect facts about the platform. Critical for:
- Economic claims (creator earnings, fees)
- Competitor comparisons
- Safety/compliance statements
- Terminology compliance

## The Problem

LLMs hallucinate common industry numbers:

```
❌ "Creators keep 85% of earnings"    ← Common hallucination
❌ "Platform fee is 15%"              ← Wrong
❌ "Like OnlyFans but better"         ← Vague competitor claim
❌ "Our escorts are verified"         ← Forbidden terminology
```

## The Solution

Three-layer validation:

```
┌─────────────────────────────────────────────────────────────────┐
│  Layer 1: CLAIM DETECTION                                       │
│  Identify what type of claim is being made                      │
│  economics | competitor | statistical | capability | ...        │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  Layer 2: FACT VALIDATION                                       │
│  Check against STATIC_PLATFORM_FACTS                            │
│  Pattern matching + semantic analysis                           │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  Layer 3: AUTO-CORRECTION                                       │
│  Fix violations automatically or reject content                 │
└─────────────────────────────────────────────────────────────────┘
```
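The three layers can be sketched with plain pattern matching. This is a minimal illustration, not the service's implementation: it covers only two economics rules, skips semantic analysis entirely, and the names `RULES`, `Issue`, `Result`, and `validate` are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Issue:
    type: str       # claim type found by Layer 1
    match: str      # offending text
    expected: str   # authoritative value from the facts
    severity: str = "critical"


@dataclass
class Result:
    original: str
    corrected: str
    issues: list = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        return not self.issues


# (claim type, detection pattern, expected value). Hypothetical rule set;
# the expected values are the documented platform facts.
RULES = [
    ("economics", re.compile(r"\b(?:keep|earn)s? (?P<value>\d+%)", re.IGNORECASE), "100%"),
    ("economics", re.compile(r"\bplatform fee is (?P<value>\d+%|\$\d+)", re.IGNORECASE), "$0"),
]


def validate(content: str) -> Result:
    issues = []
    corrected = content
    for claim_type, pattern, expected in RULES:
        def fix(m, claim_type=claim_type, expected=expected):
            # Layer 2: compare the captured value with the fact.
            if m.group("value") == expected:
                return m.group(0)
            issues.append(Issue(claim_type, m.group(0), expected))
            # Layer 3: splice the expected value into the matched text.
            start = m.start("value") - m.start()
            end = m.end("value") - m.start()
            return m.group(0)[:start] + expected + m.group(0)[end:]
        # Layer 1: the pattern itself detects the claim.
        corrected = pattern.sub(fix, corrected)
    return Result(content, corrected, issues)
```

Running the substitution on the already-corrected text, rule by rule, avoids offset bookkeeping when several rules fire on the same content.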
## Platform Facts

**CRITICAL**: These are the authoritative values:

```typescript
const STATIC_PLATFORM_FACTS = {
  economics: {
    creatorTakeRate: "100%",  // NOT 85%!
    platformFee: "$0",        // NOT 15%!
    payoutFrequency: "weekly",
  },
  competitors: {
    onlyfans_fee: "20%",
    chaturbate_fee: "50%",
    fansly_fee: "20%",
  },
  safety: {
    idVerification: "government ID",
    escrow: "smart contract",
    ageVerification: true,
  },
  terminology: {
    forbidden: ["prostitute", "escort", "hooker", "porn"],
    preferred: {
      "sex worker": ["prostitute", "hooker"],
      "creator": ["escort", "cam girl"],
      "adult content": ["porn", "pornography"],
      "companion": ["escort"],
    },
  },
};
```
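The `preferred` mapping is keyed by the replacement term, so a corrector needs it inverted. The sketch below shows one possible inversion in Python; it is an assumption, not the service's code. Note that "escort" appears under both "creator" and "companion"; this sketch keeps the first mapping, which matches the "escorts" to "creators" correction used in this document. It also lowercases replacements and handles only a trailing "s" plural.

```python
import re

# Copied from terminology.preferred above (preferred term -> forbidden terms).
PREFERRED = {
    "sex worker": ["prostitute", "hooker"],
    "creator": ["escort", "cam girl"],
    "adult content": ["porn", "pornography"],
    "companion": ["escort"],
}


def build_replacements(preferred):
    """Invert to forbidden -> preferred; the first mapping for a term wins."""
    replacements = {}
    for good, bad_terms in preferred.items():
        for bad in bad_terms:
            replacements.setdefault(bad, good)
    return replacements


REPLACEMENTS = build_replacements(PREFERRED)

# Longest terms first, so longer terms are tried before their prefixes.
_TERMS = sorted(REPLACEMENTS, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")(s?)\b",
    re.IGNORECASE,
)


def fix_terminology(text: str) -> str:
    def swap(m):
        # Keep a trailing plural "s" on the replacement term.
        return REPLACEMENTS[m.group(1).lower()] + m.group(2)
    return _PATTERN.sub(swap, text)
```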
## Claim Types

| Type | Requires Validation | Example |
|------|---------------------|---------|
| `economics` | ✅ CRITICAL | "Creators keep X%" |
| `competitor` | ✅ CRITICAL | "OnlyFans takes X%" |
| `statistical` | ✅ HIGH | "10,000 creators" |
| `capability` | ⚠️ Medium | "Best platform" |
| `thirdParty` | ⚠️ Medium | "Experts say..." |
| `safety` | ℹ️ Low | "Verified profiles" |
| `legal` | ℹ️ Low | "GDPR compliant" |
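The table could be encoded as a lookup so callers can decide which claims must be validated. This is a sketch under assumptions: the `Priority` enum, the default threshold of `HIGH` (mirroring the ✅ rows), and treating unknown claim types as `LOW` are choices made for illustration.

```python
from enum import Enum


class Priority(Enum):
    CRITICAL = 4
    HIGH = 3
    MEDIUM = 2
    LOW = 1


# Priorities as listed in the Claim Types table.
CLAIM_PRIORITY = {
    "economics": Priority.CRITICAL,
    "competitor": Priority.CRITICAL,
    "statistical": Priority.HIGH,
    "capability": Priority.MEDIUM,
    "thirdParty": Priority.MEDIUM,
    "safety": Priority.LOW,
    "legal": Priority.LOW,
}


def must_validate(claim_type: str, threshold: Priority = Priority.HIGH) -> bool:
    """Return True when the claim type's priority meets the threshold."""
    priority = CLAIM_PRIORITY.get(claim_type, Priority.LOW)  # unknown types: assumed LOW
    return priority.value >= threshold.value
```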
## Auto-Correction Examples

```typescript
// Input
"Creators keep 85% of their earnings on our platform"

// Detection
{ claim_type: "economics", requires_validation: true }

// Validation
{
  match: "keep 85%",
  expected: "keep 100%",
  severity: "critical"
}

// Output
"Creators keep 100% of their earnings on our platform"
```

```typescript
// Input
"Our verified escorts provide safe companionship"

// Detection
{ claim_type: "terminology", forbidden_term: "escorts" }

// Output
"Our verified creators provide safe companionship"
```

## Architecture

```
features/truth-validation/
├── ml-service/                    # Python validation service (port 41232)
│   └── python/lilith_truth_service/
│       ├── app.py                 # FastAPI endpoints
│       ├── validators/            # Rule implementations
│       │   ├── economics.py
│       │   ├── competitors.py
│       │   └── terminology.py
│       └── facts/                 # Platform facts database
│
├── client/
│   ├── typescript/                # @lilith/truth-client
│   │   └── src/
│   │       ├── api.ts             # HTTP client
│   │       ├── facts.ts           # STATIC_PLATFORM_FACTS (baked in)
│   │       └── validators.ts      # Client-side validation
│   └── python/                    # For ML services
│       └── lilith_truth_client/
│
├── frontend-admin/                # @lilith/truth-validation-admin
│   └── src/TruthValidationPage.tsx
│
└── shared/                        # @lilith/truth-validation-shared
    └── src/types.ts
```

## Fallback Strategy

**When truth-service is unavailable**, the TypeScript client falls back to the facts baked into the package:

```typescript
// In @lilith/truth-client - compile-time safety net
import { STATIC_PLATFORM_FACTS } from './facts';

async function validate(content: string): Promise<ValidationResult> {
  try {
    return await api.validate(content);  // Try API first
  } catch {
    return localValidate(content, STATIC_PLATFORM_FACTS);  // Fallback
  }
}
```

This keeps economics claims checked against the baked-in facts even when the API is unreachable.
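The Python client under `client/python/` could follow the same try-then-fall-back shape. The sketch below is an assumption about how that might look: the request and response JSON fields, the base URL, and the one-rule `local_validate` stand-in are all hypothetical; only the port and the `/api/truth/validate` path come from this document.

```python
import json
import urllib.request


def remote_validate(content, base_url="http://localhost:41232", timeout=2.0):
    """POST to the validate endpoint; the JSON shape here is assumed."""
    request = urllib.request.Request(
        base_url + "/api/truth/validate",
        data=json.dumps({"content": content}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def local_validate(content):
    """Stand-in for baked-in rules; checks a single economics claim."""
    issues = []
    if "85%" in content:
        issues.append({"severity": "critical", "message": "85% should be 100%"})
    return {"is_valid": not issues, "issues": issues, "source": "local"}


def validate(content, remote=remote_validate, local=local_validate):
    try:
        return remote(content)
    except (OSError, ValueError):
        # Connection errors and timeouts are OSError subclasses;
        # a malformed JSON body raises a ValueError subclass.
        return local(content)
```

Passing `remote` and `local` as parameters keeps the fallback path testable without a running service.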
## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/truth/validate` | POST | Validate content, optionally auto-correct |
| `/api/truth/detect-claims` | POST | Identify claim types in content |
| `/api/truth/facts` | GET | Get current platform facts |
| `/api/truth/rules` | GET | List active validation rules |
| `/api/truth/rules/{id}` | PUT | Update rule configuration |

## Usage (TypeScript)

```typescript
import { TruthClient, STATIC_PLATFORM_FACTS } from '@lilith/truth-client';

const truth = new TruthClient();

// Validate marketing copy
const result = await truth.validate({
  content: "Creators earn 85% on our platform",
  autoCorrect: true,
});

if (!result.is_valid) {
  console.log('Issues:', result.issues);
  // [{ severity: 'critical', message: '85% should be 100%' }]

  console.log('Corrected:', result.corrected_content);
  // "Creators earn 100% on our platform"
}

// Check facts directly
console.log(STATIC_PLATFORM_FACTS.economics.creatorTakeRate);
// "100%"
```

## Integration Points

Services that call truth-validation:
- **i18n-service**: Validates translated content
- **seo-service**: Validates SEO metadata
- **content-moderation**: Validates user-generated content
- **marketing-tools**: Validates ad copy

## Configuration

```bash
TRUTH_SERVICE_PORT=41232
TRUTH_SERVICE_LLM_ENABLED=true      # Enable semantic validation
TRUTH_SERVICE_STRICT_MODE=false     # Set to true to block on any violation
TRUTH_SERVICE_REDIS_URL=redis://localhost:6379
```
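On the Python side, these variables might be read into a typed settings object at startup. This is a sketch, not the service's actual loader: `TruthServiceConfig`, `load_config`, and the defaults (taken from the example values above) are assumptions.

```python
import os
from dataclasses import dataclass


def _flag(value: str) -> bool:
    """Interpret common truthy spellings of an environment flag."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class TruthServiceConfig:
    port: int
    llm_enabled: bool
    strict_mode: bool
    redis_url: str


def load_config(env=None) -> TruthServiceConfig:
    # Accept an explicit mapping so tests need not touch os.environ.
    env = os.environ if env is None else env
    return TruthServiceConfig(
        port=int(env.get("TRUTH_SERVICE_PORT", "41232")),
        llm_enabled=_flag(env.get("TRUTH_SERVICE_LLM_ENABLED", "true")),
        strict_mode=_flag(env.get("TRUTH_SERVICE_STRICT_MODE", "false")),
        redis_url=env.get("TRUTH_SERVICE_REDIS_URL", "redis://localhost:6379"),
    )
```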