platform-codebase/features/truth-validation
2026-01-10 00:48:10 -08:00

Truth Validation Feature

Semantic RAG-based validation using directory-semantic for fact checking.

Purpose

Validate content claims against the authoritative ./docs directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.
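As a rough illustration of "embeddings and vector search", the sketch below ranks documents by cosine similarity to a query vector using a brute-force scan. It is not the service's code. The real index is a Redis HNSW vector store over 768-dimension embeddings, which approximates this ranking at scale. The DocVector type and the cosineSimilarity and topK functions are hypothetical. Real vectors would come from the embedding model rather than being written by hand.

```typescript
// Hypothetical sketch: brute-force cosine-similarity search over in-memory vectors.
// The actual service delegates this to a Redis HNSW index (approximate nearest neighbors).
interface DocVector {
  path: string;
  embedding: number[];
}

interface ScoredDoc {
  path: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every document against the query and keep the k best matches, highest first.
function topK(query: number[], docs: DocVector[], k: number): ScoredDoc[] {
  return docs
    .map((d) => ({ path: d.path, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan costs O(N·d) per query. That is fine for a few hundred documents. An HNSW index trades exactness for sub-linear lookups.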

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      SEMANTIC VALIDATION                        │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   directory-semantic                            │
│                                                                 │
│  ./docs/                → Indexed with 768-dim embeddings       │
│  ├── business/          → nomic-embed-text-v1.5 model           │
│  ├── product/           → Redis HNSW vector store               │
│  ├── research/          → Semantic search via cosine similarity │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
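The score bands in step 3 reduce to a small classification function. This is a minimal sketch under two assumptions. First, the best match's score decides the outcome. Second, boundary scores (exactly 0.5 or 0.75) fall in the REVIEW band, because the diagram does not say which band owns them. ValidationStatus, classifyScore, and classifyMatches are hypothetical names.

```typescript
// Hypothetical sketch of the score bands from the architecture diagram:
//   score > 0.75          -> VALID    (high-confidence match)
//   0.5 <= score <= 0.75  -> REVIEW   (uncertain; return context)
//   score < 0.5           -> NO_MATCH (no relevant docs)
type ValidationStatus = "VALID" | "REVIEW" | "NO_MATCH";

interface Thresholds {
  validation: number; // VALIDATION_THRESHOLD, 0.75 in this README
  review: number; // REVIEW_THRESHOLD, 0.5 in this README
}

const DEFAULT_THRESHOLDS: Thresholds = { validation: 0.75, review: 0.5 };

function classifyScore(score: number, t: Thresholds = DEFAULT_THRESHOLDS): ValidationStatus {
  if (score > t.validation) return "VALID";
  if (score >= t.review) return "REVIEW";
  return "NO_MATCH";
}

// Classify a set of matches by their best score; no matches at all is NO_MATCH.
function classifyMatches(scores: number[], t: Thresholds = DEFAULT_THRESHOLDS): ValidationStatus {
  if (scores.length === 0) return "NO_MATCH";
  return classifyScore(Math.max(...scores), t);
}
```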

Why Semantic over Templates?

Old Approach (Template-based):

# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}

Problems:

  • Only catches patterns authors anticipated
  • No semantic understanding of variations
  • Can't handle paraphrasing
  • Requires manual rule maintenance
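The paraphrasing problem is easy to reproduce. The sketch below ports the two CORRECTIONS rules above to TypeScript regular expressions. Case-insensitive matching is an assumption, since the Python snippet does not specify flags. The sketch then applies the rules to a literal phrasing and to a paraphrase of the same claim. The applyTemplates function and the sample sentences are hypothetical.

```typescript
// Illustration only: the template-based CORRECTIONS table ported to TypeScript.
// Case-insensitive matching is an assumption; the original rules show no flags.
const CORRECTIONS: Array<[RegExp, string]> = [
  [/keep 85%/i, "keep 100%"],
  [/platform fee.*15%/i, "platform fee is $0"],
];

// Apply every rule in order and report whether any rule fired.
function applyTemplates(text: string): { text: string; changed: boolean } {
  let out = text;
  for (const [pattern, replacement] of CORRECTIONS) {
    // A function replacer keeps "$0" literal instead of treating "$" as a substitution pattern.
    out = out.replace(pattern, () => replacement);
  }
  return { text: out, changed: out !== text };
}
```

Only the literal phrasing is rewritten. The paraphrase ("hold on to 85 percent") passes through untouched, which is the gap the semantic approach addresses.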

New Approach (Semantic):

// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"

Benefits:

  • Understands meaning, not just patterns
  • Handles paraphrasing and variations
  • Follows the docs themselves: changes take effect once the index is refreshed (POST /api/truth/reindex)
  • No manual rule maintenance

Packages

Package                          Location             Purpose
@lilith/truth-semantic-service   semantic-service/    TypeScript service (port 41233, primary)
@lilith/truth-client             client/typescript/   TypeScript client with static fallback
lilith_truth_service             ml-service/          Python service (deprecated, port 41232)
@lilith/truth-validation-admin   frontend-admin/      Admin dashboard
@lilith/truth-validation-shared  shared/              Shared types

API Endpoints (Semantic Service)

Endpoint               Method  Description
/api/truth/validate    POST    Validate content against docs
/api/truth/correct     POST    LLM-powered content correction
/api/truth/search      GET     Semantic search (?q=query&limit=10)
/api/truth/reindex     POST    Re-index the docs directory
/api/truth/summary     GET     Get index summary
/api/truth/status      GET     Check whether the docs are indexed
/api/truth/llm/health  GET     Check LLM service status
/health                GET     Health check
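A thin client over these endpoints could look like the sketch below. It assumes the request and response shapes shown in the API Examples section and a global fetch (Node 18+). The TruthApiClient class and its method names are hypothetical. They are not the API of the @lilith/truth-client package. The fetch implementation is injectable, so the client can be exercised without a running service.

```typescript
// Hypothetical minimal client for the semantic service HTTP API.
// Request/response shapes follow the README examples; this is not @lilith/truth-client.
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidateResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

interface SearchResponse {
  results: RelevantDoc[];
  query: string;
  totalResults: number;
}

// Structural subset of fetch, so a stub can replace it in tests.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<HttpResponse>;

class TruthApiClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(
    baseUrl = "http://localhost:41233",
    fetchImpl: FetchLike = (url, init) => fetch(url, init),
  ) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
  }

  // POST /api/truth/validate with a JSON body: { "content": "..." }
  async validate(content: string): Promise<ValidateResponse> {
    const res = await this.fetchImpl(`${this.baseUrl}/api/truth/validate`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ content }),
    });
    if (!res.ok) throw new Error(`validate failed: HTTP ${res.status}`);
    return (await res.json()) as ValidateResponse;
  }

  // GET /api/truth/search?q=...&limit=... (URLSearchParams encodes spaces as "+")
  async search(q: string, limit = 10): Promise<SearchResponse> {
    const params = new URLSearchParams({ q, limit: String(limit) });
    const res = await this.fetchImpl(`${this.baseUrl}/api/truth/search?${params}`);
    if (!res.ok) throw new Error(`search failed: HTTP ${res.status}`);
    return (await res.json()) as SearchResponse;
  }
}
```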

LLM-Powered Correction

The service includes an LLM-powered content corrector that sends requests to lilith-llama-service, which serves GGUF models under GPUBoss coordination.

How It Works

  1. Semantic Context: Content is searched against indexed docs to find relevant context
  2. LLM Analysis: Ministral 3B analyzes content with platform context
  3. Conservative Corrections: Only fixes two kinds of explicit errors:
    • Claims that "Lilith takes X%" where X > 0 → corrected to 0%
    • Derogatory slurs (whore/hooker → sex worker)
  4. Preserves: Competitor facts, industry statistics, and UI text
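Steps 1 to 3 amount to retrieval-augmented prompting. The highest-scoring excerpts and the correction rules are combined into a single prompt. The sketch below shows one way to assemble it. The prompt wording, the buildCorrectionPrompt name, and the default limit of three excerpts are assumptions. They are not the service's actual prompt.

```typescript
// Hypothetical sketch of assembling a RAG correction prompt (not the service's real prompt).
interface ContextDoc {
  path: string;
  score: number;
  excerpt: string;
}

// The correction and preservation rules described in this README.
const RULES = [
  "If the text claims Lilith takes a percentage above 0%, correct it to 0%.",
  'Replace derogatory slurs (e.g. "whore", "hooker") with "sex worker".',
  "Do not change competitor facts, industry statistics, or UI text.",
];

function buildCorrectionPrompt(content: string, docs: ContextDoc[], maxDocs = 3): string {
  // Keep only the highest-scoring excerpts to bound the prompt length.
  const context = [...docs]
    .sort((a, b) => b.score - a.score)
    .slice(0, maxDocs)
    .map((d) => `[${d.path}] ${d.excerpt}`)
    .join("\n");
  return [
    "You correct factual errors in platform content. Rules:",
    ...RULES.map((r, i) => `${i + 1}. ${r}`),
    "",
    "Reference documentation:",
    context || "(no relevant docs found)",
    "",
    "Content to check:",
    content,
    "",
    "Return the corrected content, or the original text unchanged if no rule applies.",
  ].join("\n");
}
```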

Correction Examples

# Lilith fee error - WILL fix
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "Lilith takes 20% commission"}'
# Response: corrected to "Lilith takes 0% commission"

# Competitor info - will NOT change (correct as-is)
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "OnlyFans takes 20% from creators"}'
# Response: unchanged (competitor facts are correct)

Environment Variables (LLM)

# LLM inference via lilith-llama-service (GPUBoss-coordinated)
# Note: Service uses @lilith/service-addresses for URL discovery
# These env vars override for Docker/custom contexts
LLAMA_SERVICE_URL=http://localhost:41221   # lilith-llama-service endpoint
LLM_MODEL=default                          # Model ID (or 'default' for service default)
LLM_REASONING_MODEL=default                # Reasoning model ID
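The note above describes a precedence rule: an explicit environment variable wins, and otherwise the address discovered through @lilith/service-addresses is used. Below is a sketch of that rule. It assumes the discovered URL is passed in as a plain string and that a blank variable counts as unset. resolveLlmConfig is a hypothetical name.

```typescript
// Hypothetical sketch: environment overrides take precedence over discovered defaults.
interface LlmConfig {
  serviceUrl: string;
  model: string;
  reasoningModel: string;
}

type Env = Record<string, string | undefined>;

// Treat unset or blank variables as "not provided".
function pick(env: Env, key: string): string | undefined {
  const value = env[key]?.trim();
  return value ? value : undefined;
}

function resolveLlmConfig(env: Env, discoveredUrl: string): LlmConfig {
  return {
    serviceUrl: pick(env, "LLAMA_SERVICE_URL") ?? discoveredUrl,
    model: pick(env, "LLM_MODEL") ?? "default",
    reasoningModel: pick(env, "LLM_REASONING_MODEL") ?? "default",
  };
}
```

In a service, this would be called with process.env and the discovered address.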

Usage

Starting the Service

cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev  # Development with watch
pnpm start  # Production

Environment Variables

TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs

API Examples

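Validate content. Note that "valid" means a high-confidence documentation match was found (score > 0.75), not that the claim agrees with the docs. In the response below, the matched excerpt says "Keep 100%", which contradicts the 85% claim.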

curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'

# Response:
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}

Search docs:

curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"

# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}

Library Usage

import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';

const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});

await validator.initialize();

const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);

Docs Directory Structure

The service indexes the ./docs directory, which contains 728 files:

docs/
├── business/           # 135 files - Pitch decks, market research
│   ├── pitch-deck/     # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/     # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/            # 500+ files - Features, screenshots
│   ├── features/       # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/           # 60 files - Academic papers, briefs
└── technical/          # 25 files - Architecture, API docs
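To compare the per-directory counts above with a local checkout, a short Node script can walk the docs tree and tally files by top-level folder. This is a sketch, not the service's indexer. The README does not state which files the indexer actually embeds, so the optional extension filter (for example [".md"]) is an assumption. listFiles and countByTopLevel are hypothetical names.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: recursively collect file paths under `dir`, relative to `root`.
function listFiles(root: string, dir: string = root): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFiles(root, full));
    } else if (entry.isFile()) {
      files.push(path.relative(root, full));
    }
  }
  return files;
}

// Tally files by top-level folder; files sitting directly in root count under ".".
// Pass extensions (e.g. [".md"]) to count only matching files.
function countByTopLevel(root: string, extensions?: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const rel of listFiles(root)) {
    if (extensions && !extensions.includes(path.extname(rel).toLowerCase())) continue;
    const parts = rel.split(path.sep);
    const top = parts.length > 1 ? parts[0] : ".";
    counts.set(top, (counts.get(top) ?? 0) + 1);
  }
  return counts;
}
```

Run countByTopLevel("./docs") from the repository root. The totals may differ from the figures above if screenshots or other non-text files are excluded from the index.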

Integration Points

  • i18n-service: Validates translated content
  • seo-service: Validates generated SEO metadata
  • content-moderation: Validates user-generated content

Configuration

# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs

# Thresholds
VALIDATION_THRESHOLD=0.75  # Score for valid
REVIEW_THRESHOLD=0.5       # Score for review
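These variables could be loaded and sanity-checked at startup along the lines of the sketch below. The defaults mirror the values in this README. Two behaviors are assumptions: DOCS_PATH is treated as required, and a REVIEW_THRESHOLD that is not below VALIDATION_THRESHOLD is rejected (both inferred from the score bands). loadSemanticConfig is a hypothetical name.

```typescript
// Hypothetical startup config loader for the semantic service.
interface SemanticConfig {
  port: number;
  redisUrl: string;
  docsPath: string;
  validationThreshold: number;
  reviewThreshold: number;
}

type Env = Record<string, string | undefined>;

// Parse a numeric variable, falling back when unset or blank.
function numberFromEnv(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) throw new Error(`${key} must be a number, got "${raw}"`);
  return value;
}

function loadSemanticConfig(env: Env): SemanticConfig {
  const docsPath = env["DOCS_PATH"];
  if (!docsPath) throw new Error("DOCS_PATH is required"); // assumption: no default path
  const config: SemanticConfig = {
    port: numberFromEnv(env, "TRUTH_SEMANTIC_PORT", 41233),
    redisUrl: env["REDIS_URL"] ?? "redis://localhost:6379",
    docsPath,
    validationThreshold: numberFromEnv(env, "VALIDATION_THRESHOLD", 0.75),
    reviewThreshold: numberFromEnv(env, "REVIEW_THRESHOLD", 0.5),
  };
  // The REVIEW band lies between the two thresholds, so it only exists if review < validation.
  if (!(config.reviewThreshold < config.validationThreshold)) {
    throw new Error("REVIEW_THRESHOLD must be lower than VALIDATION_THRESHOLD");
  }
  return config;
}
```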

Locale Validation CLI

Validate i18n locale files against platform truth facts using the LLM corrector.

Usage

cd codebase/features/truth-validation

# Validate and show issues (dry run)
pnpm validate:locales

# Validate with verbose output
pnpm validate:locales -- --verbose

# Validate and apply fixes
pnpm validate:locales:fix

# Use reasoning model for complex content
pnpm validate:locales -- --reasoning

Pre-commit Hook

Add to .husky/pre-commit or .git/hooks/pre-commit:

#!/bin/sh
# Validate staged locale files
cd codebase/features/truth-validation
pnpm precommit

Or use the precommit script directly:

pnpm precommit  # Only validates staged locale files
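Restricting validation to staged locale files starts with picking those files out of the git index. Below is a sketch of that step. It assumes the CLI's scope of JSON files under codebase/features/i18n/locales/en/ and a hook that runs inside the repository. filterLocaleFiles and stagedLocaleFiles are hypothetical and are not the actual precommit script.

```typescript
import { execFileSync } from "node:child_process";

const LOCALE_DIR = "codebase/features/i18n/locales/en/";

// Hypothetical filter: keep staged JSON files under the English locale directory.
function filterLocaleFiles(paths: string[]): string[] {
  return paths.filter((p) => p.startsWith(LOCALE_DIR) && p.endsWith(".json"));
}

// Ask git for staged files; --diff-filter=ACM keeps Added, Copied, and Modified files
// (deleted files cannot be validated). Paths are relative to the repository root.
function stagedLocaleFiles(): string[] {
  const out = execFileSync("git", ["diff", "--cached", "--name-only", "--diff-filter=ACM"], {
    encoding: "utf8",
  });
  return filterLocaleFiles(out.split("\n").filter((line) => line.length > 0));
}
```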

What Gets Validated

The CLI validates all JSON files in codebase/features/i18n/locales/en/:

File Type       Example           Validation Focus
Common strings  common.json       UI text, error messages
Landing pages   landing-*.json    Marketing claims
Company pages   company-*.json    Investor facts, values
Feature pages   features-*.json   Product descriptions

Output Example

📄 common.json (49 strings)
  ✅ No issues found

📄 company-investor.json (35 strings)
  ⚠ Found 1 suggested change(s):

  [stats[0].label] (confidence: 100%)
    fact: "20%" → "0%"
           Reason: Lilith charges 0% commission

════════════════════════════════════════════════
SUMMARY
════════════════════════════════════════════════
Files scanned: 24
Files with issues: 1
Total suggested changes: 1
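The bracketed key in the output (stats[0].label) suggests that nested locale JSON is flattened into dotted paths with array indices before each string is checked. Below is a sketch of that flattening, assuming this path format. flattenStrings is a hypothetical name. Non-string values are skipped because they carry no translatable text.

```typescript
// Hypothetical sketch: flatten nested locale JSON into [path, string] pairs,
// using dotted keys and [i] for array indices, e.g. "stats[0].label".
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

function flattenStrings(value: Json, prefix = ""): Array<[string, string]> {
  if (typeof value === "string") return [[prefix, value]];
  if (Array.isArray(value)) {
    return value.flatMap((item, i) => flattenStrings(item, `${prefix}[${i}]`));
  }
  if (value !== null && typeof value === "object") {
    return Object.entries(value).flatMap(([key, child]) =>
      flattenStrings(child, prefix ? `${prefix}.${key}` : key),
    );
  }
  return []; // numbers, booleans, and null carry no translatable text
}
```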

LLM Provider Packages

Reusable LLM provider clients are available as separate packages:

Package                      Location                        Language
@lilith/ml-provider-clients  @packages/@ml/provider-clients  TypeScript
lilith-llama-service         @packages/@ml/llama-service     Python

TypeScript Usage

import { createLlamaServiceProvider } from '@lilith/ml-provider-clients';
import { getServiceUrl } from '@lilith/service-addresses';

const provider = createLlamaServiceProvider({
  endpoint: getServiceUrl('ml', 'llama-service'),  // http://localhost:41221
  model: 'ministral-14b-reasoning',  // Optional; omit to use the service default
  maxTokens: 1024,
  temperature: 0.3,
});

await provider.sendMessage(
  { messages: [{ role: 'user', content: 'Hello' }] },
  (event) => {
    if (event.type === 'chunk') console.log(event.content);
  }
);

Python Usage

import requests
from lilith_service_addresses import get_service_url

llm_url = get_service_url('ml', 'llama-service')  # http://localhost:41221

# Direct HTTP call to lilith-llama-service
response = requests.post(
    f"{llm_url}/chat",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=60,  # seconds; raise for long generations
)
response.raise_for_status()  # surface HTTP errors before parsing the body
print(response.json()["content"])

Requirements

  • Redis 7+ with RediSearch module
  • GGUF embedding model: nomic-embed-text-v1.5.Q8_0.gguf
  • lilith-llama-service running on port 41221 (GPUBoss-coordinated LLM inference)
  • GPU (optional): CUDA for fast embeddings and LLM inference