Lilith 99d2c12d0b feat(truth-validation/semantic-service): ✨ Update LLM correction logic in llm-corrector.ts		2026-01-22 23:03:51 -08:00

Truth Validation Feature

Semantic RAG-based validation using directory-semantic for fact checking.

Purpose

Validate content claims against the authoritative ./docs directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.
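
The similarity measure behind the vector search can be illustrated in a few lines. This is a minimal sketch of cosine similarity over plain number arrays; `cosineSimilarity` is a hypothetical helper written for illustration, not part of the service's API, and the actual search is delegated to the vector index rather than computed like this.

```typescript
// Hypothetical helper: cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; higher means the vectors point in more similar directions.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error('vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In the real service the vectors would be 768-dimensional embeddings; the short arrays used to exercise this helper are only stand-ins.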

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      SEMANTIC VALIDATION                        │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   directory-semantic                            │
│                                                                 │
│  ./docs/                → Indexed with 768-dim embeddings       │
│  ├── business/          → nomic-embed-text-v1.5 model           │
│  ├── product/           → Redis HNSW vector store               │
│  ├── research/          → Semantic search via cosine similarity │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
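
The score bands in step 3 can be expressed as a small pure function. This is a sketch, not the service's code: `classifyScore` and the `Verdict` type are hypothetical names, and treating scores of exactly 0.5 and 0.75 as REVIEW is an assumption read off the "0.5-0.75" range in the diagram.

```typescript
// Hypothetical names; the thresholds default to the values shown in the diagram.
type Verdict = 'VALID' | 'REVIEW' | 'NO_MATCH';

function classifyScore(
  score: number,
  validThreshold = 0.75,
  reviewThreshold = 0.5,
): Verdict {
  if (score > validThreshold) return 'VALID'; // high confidence match
  if (score >= reviewThreshold) return 'REVIEW'; // uncertain, return context (boundary handling assumed)
  return 'NO_MATCH'; // no relevant docs found
}
```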

Why Semantic over Templates?

Old Approach (Template-based):

# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}

Problems:

Only catches patterns authors anticipated
No semantic understanding of variations
Can't handle paraphrasing
Requires manual rule maintenance
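
To make the paraphrasing problem concrete, here is the same template table ported to TypeScript. `applyTemplates` is a hypothetical helper for illustration only; it shows an exact phrase being corrected while a paraphrase of the same claim passes through untouched.

```typescript
// The two patterns are taken from the template example above.
const CORRECTIONS: Array<[RegExp, string]> = [
  [/keep 85%/g, 'keep 100%'],
  [/platform fee.*15%/g, 'platform fee is $0'],
];

// Hypothetical helper: apply every template in order.
function applyTemplates(text: string): string {
  let out = text;
  for (const [pattern, fix] of CORRECTIONS) {
    // A replacer function inserts the fix literally, so "$" is never
    // interpreted as a replacement pattern.
    out = out.replace(pattern, () => fix);
  }
  return out;
}

const exact = applyTemplates('Creators keep 85%');
const paraphrase = applyTemplates('Creators retain 85 percent');
```

Here `exact` is rewritten because it matches the first pattern, while `paraphrase` is returned unchanged even though it makes the same claim.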

New Approach (Semantic):

// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"

Benefits:

Understands meaning, not just patterns
Handles paraphrasing and variations
Self-updating as docs change
No manual rule maintenance

Packages

| Package | Location | Purpose |
| --- | --- | --- |
| `@lilith/truth-semantic-service` | `semantic-service/` | TypeScript service (port 41233, primary) |
| `@lilith/truth-client` | `client/typescript/` | TypeScript client with static fallback |
| `lilith_truth_service` | `ml-service/` | Python service (deprecated, port 41232) |
| `@lilith/truth-validation-admin` | `frontend-admin/` | Admin dashboard |
| `@lilith/truth-validation-shared` | `shared/` | Shared types |

API Endpoints (Semantic Service)

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/truth/validate` | POST | Validate content against docs |
| `/api/truth/correct` | POST | LLM-powered content correction |
| `/api/truth/search` | GET | Semantic search (`?q=query&limit=10`) |
| `/api/truth/reindex` | POST | Re-index docs directory |
| `/api/truth/summary` | GET | Get index summary |
| `/api/truth/status` | GET | Check if indexed |
| `/api/truth/llm/health` | GET | Check LLM service status |
| `/health` | GET | Health check |
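
The search endpoint takes its parameters in the query string, so callers need to encode them correctly. A minimal sketch using the standard `URL` API follows; `buildSearchUrl` is a hypothetical helper, and the default `limit` of 10 is an assumption taken from the example query string in the table.

```typescript
// Hypothetical helper: build a correctly encoded URL for GET /api/truth/search.
function buildSearchUrl(baseUrl: string, query: string, limit = 10): string {
  const url = new URL('/api/truth/search', baseUrl);
  url.searchParams.set('q', query); // spaces and reserved characters are encoded for us
  url.searchParams.set('limit', String(limit));
  return url.toString();
}
```

Building the URL this way avoids hand-escaping queries that contain spaces, `&`, or other reserved characters.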

LLM-Powered Correction

The service includes a content corrector that sends text to lilith-llama-service, which runs GPUBoss-coordinated GGUF models for LLM inference.

How It Works

Semantic Context: Content is searched against the indexed docs to find relevant context
LLM Analysis: Ministral 3B analyzes the content together with that platform context
Conservative Corrections: Only explicit errors are fixed:
  - Claims that "Lilith takes X%" where X > 0 → corrected to 0%
  - Derogatory slurs → neutral terms (whore/hooker → sex worker)
Preserves: Competitor facts, industry stats, and UI text are left unchanged

Correction Examples

# Lilith fee error - WILL fix
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "Lilith takes 20% commission"}'
# Response: corrected to "Lilith takes 0% commission"

# Competitor info - will NOT change (correct as-is)
curl -X POST http://localhost:41233/api/truth/correct \
  -H "Content-Type: application/json" \
  -d '{"content": "OnlyFans takes 20% from creators"}'
# Response: unchanged (competitor facts are correct)
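
The same request can be issued from TypeScript. The sketch below only builds the request so it can be inspected without a running service; `buildCorrectRequest` is a hypothetical helper, the base URL default mirrors the curl examples, and the response shape is not documented here, so the usage note prints it as-is rather than assuming field names.

```typescript
interface CorrectRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assemble the POST /api/truth/correct request.
function buildCorrectRequest(
  content: string,
  baseUrl = 'http://localhost:41233',
): CorrectRequest {
  return {
    url: `${baseUrl}/api/truth/correct`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ content }), // same payload as the curl examples
    },
  };
}

// Usage (requires the service to be running):
//   const { url, init } = buildCorrectRequest('Lilith takes 20% commission');
//   const response = await fetch(url, init);
//   console.log(await response.json());
```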

Environment Variables (LLM)

# LLM inference via lilith-llama-service (GPUBoss-coordinated)
# Note: Service uses @lilith/service-registry for URL discovery
# These env vars override for Docker/custom contexts
LLAMA_SERVICE_URL=http://localhost:41221   # lilith-llama-service endpoint
LLM_MODEL=default                          # Model ID (or 'default' for service default)
LLM_REASONING_MODEL=default                # Reasoning model ID

Usage

Starting the Service

cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev  # Development with watch
pnpm start  # Production

Environment Variables

TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs

API Examples

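Validate content (`valid` appears to be driven by the similarity score, so in this example a claim of 85% still returns `valid: true` because it closely matches a doc whose excerpt says 100%; compare the excerpt before treating a claim as correct):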

curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'

# Response:
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}
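
When consuming this response from TypeScript, it can help to check the shape at runtime before trusting it. The interfaces below are inferred from the example response above and are an assumption; the shared types package may define them differently. `isValidationResponse` is a hypothetical type guard.

```typescript
// Shapes inferred from the example response (assumption, not the published types).
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidationResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

function isRelevantDoc(value: unknown): value is RelevantDoc {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v['path'] === 'string' &&
    typeof v['score'] === 'number' &&
    typeof v['excerpt'] === 'string'
  );
}

// Hypothetical type guard for the /api/truth/validate response body.
function isValidationResponse(value: unknown): value is ValidationResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const docs = v['relevantDocs'];
  if (!Array.isArray(docs)) return false;
  return (
    typeof v['valid'] === 'boolean' &&
    typeof v['confidence'] === 'number' &&
    typeof v['query'] === 'string' &&
    docs.every(isRelevantDoc)
  );
}
```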

Search docs:

curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"

# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}

Library Usage

import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';

const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});

await validator.initialize();

const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);

Docs Directory Structure

The service indexes the ./docs directory, which currently contains 728 files:

docs/
├── business/           # 135 files - Pitch decks, market research
│   ├── pitch-deck/     # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/     # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/            # 500+ files - Features, screenshots
│   ├── features/       # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/           # 60 files - Academic papers, briefs
└── technical/          # 25 files - Architecture, API docs

Integration Points

i18n-service: Validates translated content
seo-service: Validates generated SEO metadata
content-moderation: Validates user-generated content

Configuration

# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs

# Thresholds
VALIDATION_THRESHOLD=0.75  # Score for valid
REVIEW_THRESHOLD=0.5       # Score for review
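
Because the two thresholds only make sense when the review threshold sits below the validation threshold, it is worth checking them at startup. A minimal sketch, assuming the variable names above; `loadThresholds` is a hypothetical helper and the fail-fast behavior is a design choice made here, not documented service behavior.

```typescript
interface Thresholds {
  validation: number;
  review: number;
}

// Hypothetical helper: read both thresholds, falling back to the documented defaults.
function loadThresholds(env: Record<string, string | undefined>): Thresholds {
  const validation = Number(env['VALIDATION_THRESHOLD'] ?? '0.75');
  const review = Number(env['REVIEW_THRESHOLD'] ?? '0.5');
  const inRange = (n: number): boolean => Number.isFinite(n) && n >= 0 && n <= 1;
  // Reject non-numeric values, values outside [0, 1], and inverted bands.
  if (!inRange(validation) || !inRange(review) || review >= validation) {
    throw new Error(`invalid thresholds: review=${review}, validation=${validation}`);
  }
  return { validation, review };
}
```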

Locale Validation CLI

Validate i18n locale files against platform truth facts using the LLM corrector.

Usage

cd codebase/features/truth-validation

# Validate and show issues (dry run)
pnpm validate:locales

# Validate with verbose output
pnpm validate:locales -- --verbose

# Validate and apply fixes
pnpm validate:locales:fix

# Use reasoning model for complex content
pnpm validate:locales -- --reasoning

Pre-commit Hook

Add to .husky/pre-commit or .git/hooks/pre-commit:

#!/bin/sh
# Validate staged locale files
cd codebase/features/truth-validation
pnpm precommit

Or use the precommit script directly:

pnpm precommit  # Only validates staged locale files

What Gets Validated

The CLI validates all JSON files in codebase/features/i18n/locales/en/:

| File Type | Example | Validation Focus |
| --- | --- | --- |
| Common strings | `common.json` | UI text, error messages |
| Landing pages | `landing-*.json` | Marketing claims |
| Company pages | `company-*.json` | Investor facts, values |
| Feature pages | `features-*.json` | Product descriptions |
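
The file-name patterns in the table can be turned into a lookup. This sketch is illustrative only: `validationFocus` is a hypothetical helper, and how the CLI actually classifies files is not described here.

```typescript
// Patterns and focus labels copied from the table above.
const FOCUS_BY_PATTERN: Array<[RegExp, string]> = [
  [/^common\.json$/, 'UI text, error messages'],
  [/^landing-.*\.json$/, 'Marketing claims'],
  [/^company-.*\.json$/, 'Investor facts, values'],
  [/^features-.*\.json$/, 'Product descriptions'],
];

// Hypothetical helper: return the validation focus for a locale file name,
// or undefined when the name matches none of the listed patterns.
function validationFocus(fileName: string): string | undefined {
  for (const [pattern, focus] of FOCUS_BY_PATTERN) {
    if (pattern.test(fileName)) return focus;
  }
  return undefined;
}
```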

Output Example

📄 common.json (49 strings)
  ✅ No issues found

📄 company-investor.json (35 strings)
  ⚠ Found 1 suggested change(s):

  [stats[0].label] (confidence: 100%)
    fact: "20%" → "0%"
           Reason: Lilith charges 0% commission

════════════════════════════════════════════════
SUMMARY
════════════════════════════════════════════════
Files scanned: 24
Files with issues: 1
Total suggested changes: 1
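
The bracketed key in the output (`stats[0].label`) suggests that each string is addressed by its path inside the locale JSON. One way such paths could be produced is sketched below; `flattenStrings` is a hypothetical helper and the CLI's actual traversal may differ.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical helper: collect every string value together with its key path,
// using dot notation for objects and [index] notation for arrays.
function flattenStrings(
  value: Json,
  path = '',
  out: Array<[string, string]> = [],
): Array<[string, string]> {
  if (typeof value === 'string') {
    out.push([path, value]);
  } else if (Array.isArray(value)) {
    value.forEach((item, i) => flattenStrings(item, `${path}[${i}]`, out));
  } else if (value !== null && typeof value === 'object') {
    for (const key of Object.keys(value)) {
      flattenStrings(value[key], path ? `${path}.${key}` : key, out);
    }
  }
  // Numbers, booleans, and null carry no text to validate, so they are skipped.
  return out;
}
```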

LLM Provider Packages

Reusable LLM provider clients are available as separate packages:

| Package | Location | Language |
| --- | --- | --- |
| `@lilith/ml-provider-clients` | `@packages/@ml/provider-clients` | TypeScript |
| `lilith-llama-service` | `@packages/@ml/llama-service` | Python |

TypeScript Usage

import { createLlamaServiceProvider } from '@lilith/ml-provider-clients';
import { getServiceUrl } from '@lilith/service-registry';

const provider = createLlamaServiceProvider({
  endpoint: getServiceUrl('ml', 'llama-service'),  // http://localhost:41221
  model: 'ministral-14b-reasoning',  // Optional; omit to use the service default
  maxTokens: 1024,
  temperature: 0.3,
});

await provider.sendMessage(
  { messages: [{ role: 'user', content: 'Hello' }] },
  (event) => {
    if (event.type === 'chunk') console.log(event.content);
  }
);
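
If the full reply is needed rather than per-chunk logging, the callback can accumulate chunks instead. This is a sketch: `createChunkCollector` and the `StreamEvent` shape are hypothetical, modeled only on the `type` and `content` fields used in the callback above; other event types, if the provider emits any, are ignored.

```typescript
// Assumed event shape, based on the callback above.
interface StreamEvent {
  type: string;
  content?: string;
}

// Hypothetical helper: gathers 'chunk' events into a single string.
function createChunkCollector() {
  const parts: string[] = [];
  return {
    // Does not rely on `this`, so it can be passed directly as a callback.
    onEvent(event: StreamEvent): void {
      if (event.type === 'chunk' && typeof event.content === 'string') {
        parts.push(event.content);
      }
    },
    text(): string {
      return parts.join('');
    },
  };
}

// Usage with the provider above:
//   const collector = createChunkCollector();
//   await provider.sendMessage({ messages: [...] }, collector.onEvent);
//   console.log(collector.text());
```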

Python Usage

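import requests
from lilith_service_addresses import get_service_url

llm_url = get_service_url('ml', 'llama-service')  # http://localhost:41221

# Direct, non-streaming HTTP call to lilith-llama-service
response = requests.post(
    f"{llm_url}/chat",
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": False,
    },
    timeout=30,  # Avoid hanging indefinitely if the service is unreachable
)
response.raise_for_status()  # Surface HTTP errors instead of parsing an error body
print(response.json()["content"])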

Requirements

Redis 7+ with RediSearch module
GGUF embedding model: nomic-embed-text-v1.5.Q8_0.gguf
lilith-llama-service running on port 41221 (GPUBoss-coordinated LLM inference)
GPU (optional): CUDA for fast embeddings and LLM inference