Truth Validation Feature
Semantic RAG-based validation using directory-semantic for fact checking.
Purpose
Validate content claims against the authoritative ./docs directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ SEMANTIC VALIDATION │
├─────────────────────────────────────────────────────────────────┤
│ 1. Content received at POST /api/truth/validate │
│ 2. Semantic search against indexed ./docs │
│ 3. Score-based validation: │
│ - score > 0.75: VALID (high confidence match) │
│ - score 0.5-0.75: REVIEW (uncertain, return context) │
│ - score < 0.5: NO MATCH (no relevant docs found) │
│ 4. Return matched docs + confidence scores │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ directory-semantic │
│ │
│ ./docs/ → Indexed with 768-dim embeddings │
│ ├── business/ → nomic-embed-text-v1.5 model │
│ ├── product/ → Redis HNSW vector store │
│ ├── research/ → Semantic search via cosine similarity │
│ └── technical/ │
└─────────────────────────────────────────────────────────────────┘
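The scoring step in the diagram can be sketched as follows. This is a minimal illustration, not the service's code: `cosineSimilarity`, `classifyScore`, and the `Verdict` type are hypothetical names, and the handling of the exact boundary values (0.75 and 0.5 both fall into REVIEW) is one reading of the ranges shown above.

```typescript
// Hypothetical names for illustration; not the service's actual API.
type Verdict = "VALID" | "REVIEW" | "NO_MATCH";

/** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Maps a similarity score to the three bands from the diagram. */
function classifyScore(score: number, validAbove = 0.75, reviewFrom = 0.5): Verdict {
  if (score > validAbove) return "VALID"; // high-confidence match
  if (score >= reviewFrom) return "REVIEW"; // uncertain, return context
  return "NO_MATCH"; // no relevant docs found
}
```

In the real service the vectors would be the 768-dimensional embeddings and the search itself is delegated to the Redis HNSW index; the function above only shows what the similarity measure computes.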
Why Semantic over Templates?
Old Approach (Template-based):
# Only catches exact patterns
CORRECTIONS = {
r'keep 85%': 'keep 100%',
r'platform fee.*15%': 'platform fee is $0',
}
Problems:
- Only catches patterns authors anticipated
- No semantic understanding of variations
- Can't handle paraphrasing
- Requires manual rule maintenance
New Approach (Semantic):
// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"
Benefits:
- Understands meaning, not just patterns
- Handles paraphrasing and variations
- Self-updating as docs change
- No manual rule maintenance
Packages
| Package | Location | Purpose |
|---|---|---|
| @lilith/truth-semantic-service | semantic-service/ | TypeScript service (port 41233, primary) |
| @lilith/truth-client | client/typescript/ | TypeScript client with static fallback |
| lilith_truth_service | ml-service/ | Python service (deprecated, port 41232) |
| @lilith/truth-validation-admin | frontend-admin/ | Admin dashboard |
| @lilith/truth-validation-shared | shared/ | Shared types |
API Endpoints (Semantic Service)
| Endpoint | Method | Description |
|---|---|---|
| /api/truth/validate | POST | Validate content against docs |
| /api/truth/correct | POST | LLM-powered content correction |
| /api/truth/search | GET | Semantic search (?q=query&limit=10) |
| /api/truth/reindex | POST | Re-index docs directory |
| /api/truth/summary | GET | Get index summary |
| /api/truth/status | GET | Check if indexed |
| /api/truth/llm/health | GET | Check LLM service status |
| /health | GET | Health check |
LLM-Powered Correction
The service includes an LLM-powered content corrector. It calls lilith-llama-service, which runs GGUF models under GPUBoss coordination, to produce corrections quickly.
How It Works
- Semantic Context: Content is searched against indexed docs to find relevant context
- LLM Analysis: Ministral 3B analyzes content with platform context
- Conservative Corrections: Only fixes explicit factual errors:
  - Claims that "Lilith takes X%" where X > 0 → corrected to 0%
  - Derogatory slurs (whore/hooker → sex worker)
- Preserves: Competitor facts, industry stats, UI text
Correction Examples
# Lilith fee error - WILL fix
curl -X POST http://localhost:41233/api/truth/correct \
-H "Content-Type: application/json" \
-d '{"content": "Lilith takes 20% commission"}'
# Response: corrected to "Lilith takes 0% commission"
# Competitor info - will NOT change (correct as-is)
curl -X POST http://localhost:41233/api/truth/correct \
-H "Content-Type: application/json" \
-d '{"content": "OnlyFans takes 20% from creators"}'
# Response: unchanged (competitor facts are correct)
Environment Variables (LLM)
# LLM inference via lilith-llama-service (GPUBoss-coordinated)
# Note: Service uses @lilith/service-registry for URL discovery
# These env vars override for Docker/custom contexts
LLAMA_SERVICE_URL=http://localhost:41221 # lilith-llama-service endpoint
LLM_MODEL=default # Model ID (or 'default' for service default)
LLM_REASONING_MODEL=default # Reasoning model ID
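The override behaviour described in the comments above could look roughly like this. It is a sketch under assumptions: `resolveLlmConfig` and `LlmConfig` are hypothetical names, and the discovered URL is passed in as a plain string rather than looked up through @lilith/service-registry.

```typescript
// Hypothetical helper; the real service resolves URLs via @lilith/service-registry.
interface LlmConfig {
  llamaServiceUrl: string;
  model: string;
  reasoningModel: string;
}

/**
 * Environment variables win when set; otherwise fall back to the
 * discovered URL and the 'default' model ID.
 */
function resolveLlmConfig(
  env: Record<string, string | undefined>,
  discoveredUrl: string,
): LlmConfig {
  return {
    llamaServiceUrl: env["LLAMA_SERVICE_URL"] || discoveredUrl,
    model: env["LLM_MODEL"] || "default",
    reasoningModel: env["LLM_REASONING_MODEL"] || "default",
  };
}
```

In a Node process this would typically be called with `process.env` and the URL returned by service discovery.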
Usage
Starting the Service
cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev # Development with watch
pnpm start # Production
Environment Variables
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs
API Examples
Validate content:
curl -X POST http://localhost:41233/api/truth/validate \
-H "Content-Type: application/json" \
-d '{"content": "Creators keep 85% of their earnings"}'
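# Response ("valid" reflects a high-confidence document match, score > 0.75; compare the excerpt to judge the claim itself):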
{
"valid": true,
"confidence": 0.89,
"relevantDocs": [
{
"path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
"score": 0.89,
"excerpt": "## Keep 100% of Your Earnings..."
}
],
"query": "Creators keep 85% of their earnings"
}
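For callers working in TypeScript, the response above might be typed and consumed as in the sketch below. The field names come from the sample response; `validateContent`, `topDoc`, and `describeValidation` are hypothetical helpers, and the code assumes a runtime with a global `fetch` (for example Node 18 or later).

```typescript
// Field names follow the sample response; the helpers are illustrative only.
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidationResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

/** Returns the highest-scoring document, or undefined when none matched. */
function topDoc(res: ValidationResponse): RelevantDoc | undefined {
  return res.relevantDocs.reduce<RelevantDoc | undefined>(
    (best, doc) => (best === undefined || doc.score > best.score ? doc : best),
    undefined,
  );
}

/** Builds a one-line summary suitable for logs. */
function describeValidation(res: ValidationResponse): string {
  const doc = topDoc(res);
  const status = res.valid ? "match" : "no confident match";
  const source = doc ? `${doc.path} (${doc.score.toFixed(2)})` : "no documents";
  return `${status} @ ${res.confidence.toFixed(2)}: ${source}`;
}

/** POSTs content to the validate endpoint and returns the parsed body. */
async function validateContent(
  content: string,
  baseUrl = "http://localhost:41233",
): Promise<ValidationResponse> {
  const res = await fetch(`${baseUrl}/api/truth/validate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
  if (!res.ok) throw new Error(`validate failed: HTTP ${res.status}`);
  return (await res.json()) as ValidationResponse;
}
```

The cast on the parsed body trusts the server; a stricter caller would validate the shape before using it.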
Search docs:
curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"
# Response:
{
"results": [
{
"path": "business/pitch-deck/REVENUE_MODEL.md",
"score": 0.85,
"excerpt": "..."
}
],
"query": "platform fees",
"totalResults": 5
}
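When building the search URL in code rather than by hand, the standard `URL` and `URLSearchParams` APIs take care of the query encoding. The helper below is a hypothetical sketch; only the endpoint path and the `q` and `limit` parameters come from this README.

```typescript
// Hypothetical helper; only the path and parameter names come from the README.
function buildSearchUrl(
  query: string,
  limit = 10,
  baseUrl = "http://localhost:41233",
): string {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const url = new URL("/api/truth/search", baseUrl);
  // searchParams applies form encoding, so spaces become "+" as in the curl example.
  url.searchParams.set("q", query);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}
```

Calling `buildSearchUrl("platform fees", 5)` yields the same URL as the curl example above.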
Library Usage
import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';
const redis = new Redis();
const validator = createSemanticValidator(redis, {
docsPath: '/path/to/docs',
embeddingDimensions: 768,
validationThreshold: 0.75,
});
await validator.initialize();
const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);
Docs Directory Structure
The service indexes the 728 files under ./docs:
docs/
├── business/ # 135 files - Pitch decks, market research
│ ├── pitch-deck/ # EXECUTIVE_SUMMARY, REVENUE_MODEL
│ ├── philosophy/ # ANTI_EXTRACTION_MANIFESTO
│ └── market-research/
├── product/ # 500+ files - Features, screenshots
│ ├── features/ # ONE_PLATFORM_ECOSYSTEM
│ └── user-guides/
├── research/ # 60 files - Academic papers, briefs
└── technical/ # 25 files - Architecture, API docs
Integration Points
- i18n-service: Validates translated content
- seo-service: Validates generated SEO metadata
- content-moderation: Validates user-generated content
Configuration
# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs
# Thresholds
VALIDATION_THRESHOLD=0.75 # Score for valid
REVIEW_THRESHOLD=0.5 # Score for review
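A service reading these two threshold variables would probably want to validate them at startup. The sketch below shows one way to do that; `parseThresholds` is a hypothetical name, and the constraints it enforces (values within [0, 1], review threshold not above the validation threshold) are assumptions rather than documented behaviour.

```typescript
// Hypothetical startup check; the constraints are assumptions, not documented behaviour.
interface Thresholds {
  validation: number;
  review: number;
}

function parseThresholds(env: Record<string, string | undefined>): Thresholds {
  // Reads one variable, falling back to the documented default when unset or blank.
  const read = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return fallback;
    const value = Number(raw);
    if (!Number.isFinite(value) || value < 0 || value > 1) {
      throw new RangeError(`${name} must be a number in [0, 1], got "${raw}"`);
    }
    return value;
  };

  const validation = read("VALIDATION_THRESHOLD", 0.75);
  const review = read("REVIEW_THRESHOLD", 0.5);
  if (review > validation) {
    throw new RangeError("REVIEW_THRESHOLD must not exceed VALIDATION_THRESHOLD");
  }
  return { validation, review };
}
```

Failing fast on a misconfigured threshold avoids a service that silently marks everything as VALID or NO MATCH.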
Locale Validation CLI
Validate i18n locale files against platform truth facts using the LLM corrector.
Usage
cd codebase/features/truth-validation
# Validate and show issues (dry run)
pnpm validate:locales
# Validate with verbose output
pnpm validate:locales -- --verbose
# Validate and apply fixes
pnpm validate:locales:fix
# Use reasoning model for complex content
pnpm validate:locales -- --reasoning
Pre-commit Hook
Add to .husky/pre-commit or .git/hooks/pre-commit:
#!/bin/sh
# Validate staged locale files
cd codebase/features/truth-validation
pnpm precommit
Or use the precommit script directly:
pnpm precommit # Only validates staged locale files
What Gets Validated
The CLI validates all JSON files in codebase/features/i18n/locales/en/:
| File Type | Example | Validation Focus |
|---|---|---|
| Common strings | common.json | UI text, error messages |
| Landing pages | landing-*.json | Marketing claims |
| Company pages | company-*.json | Investor facts, values |
| Feature pages | features-*.json | Product descriptions |
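The mapping in the table can be expressed as a small lookup, which may help when deciding how to report issues per file. This is an illustrative sketch only: `validationFocus` and `isEnglishLocaleFile` are hypothetical names, and the CLI's real selection logic is not shown in this README.

```typescript
// Illustrative lookup built from the table above; not the CLI's actual code.
const LOCALE_DIR = "codebase/features/i18n/locales/en/";

const FOCUS_BY_PATTERN: ReadonlyArray<readonly [RegExp, string]> = [
  [/^common\.json$/, "UI text, error messages"],
  [/^landing-.*\.json$/, "Marketing claims"],
  [/^company-.*\.json$/, "Investor facts, values"],
  [/^features-.*\.json$/, "Product descriptions"],
];

/** Returns the validation focus for a locale file name, or undefined if none applies. */
function validationFocus(fileName: string): string | undefined {
  for (const [pattern, focus] of FOCUS_BY_PATTERN) {
    if (pattern.test(fileName)) return focus;
  }
  return undefined;
}

/** True for JSON files under the English locale directory (repo-relative path). */
function isEnglishLocaleFile(path: string): boolean {
  return path.startsWith(LOCALE_DIR) && path.endsWith(".json");
}
```

A pre-commit script could apply `isEnglishLocaleFile` to the list of staged paths before handing them to the validator.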
Output Example
📄 common.json (49 strings)
✅ No issues found
📄 company-investor.json (35 strings)
⚠ Found 1 suggested change(s):
[stats[0].label] (confidence: 100%)
fact: "20%" → "0%"
Reason: Lilith charges 0% commission
════════════════════════════════════════════════
SUMMARY
════════════════════════════════════════════════
Files scanned: 24
Files with issues: 1
Total suggested changes: 1
LLM Provider Packages
Reusable LLM provider clients are available as separate packages:
| Package | Location | Language |
|---|---|---|
| @lilith/ml-provider-clients | @packages/@ml/provider-clients | TypeScript |
| lilith-llama-service | @packages/@ml/llama-service | Python |
TypeScript Usage
import { createLlamaServiceProvider } from '@lilith/ml-provider-clients';
import { getServiceUrl } from '@lilith/service-registry';
const provider = createLlamaServiceProvider({
endpoint: getServiceUrl('ml', 'llama-service'), // http://localhost:41221
model: 'ministral-14b-reasoning', // Optional; omit to use the service default
maxTokens: 1024,
temperature: 0.3,
});
await provider.sendMessage(
{ messages: [{ role: 'user', content: 'Hello' }] },
(event) => {
if (event.type === 'chunk') console.log(event.content);
}
);
Python Usage
import requests
from lilith_service_addresses import get_service_url

llm_url = get_service_url('ml', 'llama-service')  # http://localhost:41221

# Direct HTTP call to lilith-llama-service
response = requests.post(
    f"{llm_url}/chat",
    json={"messages": [{"role": "user", "content": "Hello"}], "stream": False},
    timeout=30,  # avoid hanging if the service is unreachable
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["content"])
Requirements
- Redis 7+ with RediSearch module
- GGUF embedding model: nomic-embed-text-v1.5.Q8_0.gguf
- lilith-llama-service running on port 41221 (GPUBoss-coordinated LLM inference)
- GPU (optional): CUDA for fast embeddings and LLM inference