Truth Validation Feature

Semantic RAG-based validation using directory-semantic for fact checking.

Purpose

Validate content claims against the authoritative ./docs directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      SEMANTIC VALIDATION                        │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   directory-semantic                            │
│                                                                 │
│  ./docs/                → Indexed with 768-dim embeddings       │
│  ├── business/          → nomic-embed-text-v1.5 model           │
│  ├── product/           → Redis HNSW vector store               │
│  ├── research/          → Semantic search via cosine similarity │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
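The score bands in step 3 can be sketched as a small pure function. This is a minimal illustration, not the service's actual code: `classifyScore`, `ScoreThresholds`, and the `Verdict` labels are hypothetical names, and the handling of scores exactly at 0.75 or 0.5 is an assumption, since the diagram does not say which side of each boundary is inclusive.

```typescript
// Verdict labels taken from the architecture diagram.
type Verdict = "VALID" | "REVIEW" | "NO_MATCH";

interface ScoreThresholds {
  valid: number;  // diagram: score > 0.75
  review: number; // diagram: score 0.5-0.75
}

const DEFAULT_THRESHOLDS: ScoreThresholds = { valid: 0.75, review: 0.5 };

// Hypothetical helper: maps a similarity score to a verdict.
// Assumption: a score of exactly 0.75 or exactly 0.5 is treated as REVIEW.
function classifyScore(
  score: number,
  thresholds: ScoreThresholds = DEFAULT_THRESHOLDS,
): Verdict {
  if (Number.isNaN(score)) {
    throw new RangeError("score must be a number");
  }
  if (score > thresholds.valid) return "VALID";
  if (score >= thresholds.review) return "REVIEW";
  return "NO_MATCH";
}

console.log(classifyScore(0.89)); // VALID
console.log(classifyScore(0.6));  // REVIEW
console.log(classifyScore(0.2));  // NO_MATCH
```

Keeping the thresholds in a parameter rather than hard-coding them mirrors the fact that they are configurable through environment variables.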

Why Semantic over Templates?

Old Approach (Template-based):

# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}

Problems:

  • Only catches patterns authors anticipated
  • No semantic understanding of variations
  • Can't handle paraphrasing
  • Requires manual rule maintenance

New Approach (Semantic):

// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"

Benefits:

  • Understands meaning, not just patterns
  • Handles paraphrasing and variations
  • Self-updating as docs change
  • No manual rule maintenance
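To make "meaning, not just patterns" concrete: the architecture diagram states that documents are compared by cosine similarity between embedding vectors. The sketch below shows that metric on toy 3-dimensional vectors. It is only an illustration: the real embeddings are 768-dimensional and come from nomic-embed-text-v1.5, the actual search runs inside the Redis HNSW index, and `cosineSimilarity` is a hypothetical helper name.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// Returns a value in [-1, 1]; values near 1 mean the vectors point the same way.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new RangeError("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: a zero vector is treated as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors standing in for embeddings (not real model output).
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // 1 (same direction)
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0 (orthogonal)
```

Because the metric depends only on direction, two differently worded texts can score highly as long as their embeddings point the same way, which is what lets the service handle paraphrasing.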

Packages

Package                          Location            Purpose
@lilith/truth-semantic-service   semantic-service/   TypeScript service (port 41233)
@lilith/truth-client             client/typescript/  TypeScript client with static fallback
lilith_truth_service             ml-service/         Python service (legacy, port 41232)
@lilith/truth-validation-admin   frontend-admin/     Admin dashboard
@lilith/truth-validation-shared  shared/             Shared types

API Endpoints (Semantic Service)

Endpoint             Method  Description
/api/truth/validate  POST    Validate content against docs
/api/truth/search    GET     Semantic search (?q=query&limit=10)
/api/truth/reindex   POST    Re-index docs directory
/api/truth/summary   GET     Get index summary
/api/truth/status    GET     Check if indexed
/health              GET     Health check

Usage

Starting the Service

cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev  # Development with watch
pnpm start  # Production

Environment Variables

TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs

API Examples

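Validate content. Per the score bands in the architecture diagram, valid: true means a document matched with a score above 0.75, so compare the returned excerpt with the claim itself: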

curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'

# Response:
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}

Search docs:

curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"

# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}
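The two curl calls above could be wrapped in a small typed client along these lines. This is a sketch under stated assumptions: the interfaces are inferred only from the example responses shown here, so the real payloads may carry more fields. `FetchLike`, `buildSearchUrl`, `isValidateResult`, and `validateContent` are hypothetical names and not part of @lilith/truth-client.

```typescript
// Shapes inferred from the example responses; other fields are unknown.
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidateResult {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

// Minimal fetch-like signature (hypothetical) so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the search URL from the documented q and limit parameters.
function buildSearchUrl(baseUrl: string, query: string, limit = 10): string {
  return `${baseUrl}/api/truth/search?q=${encodeURIComponent(query)}&limit=${limit}`;
}

// Runtime check that a parsed body has the fields shown in the validate example.
function isValidateResult(body: unknown): body is ValidateResult {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return (
    typeof b["valid"] === "boolean" &&
    typeof b["confidence"] === "number" &&
    Array.isArray(b["relevantDocs"]) &&
    typeof b["query"] === "string"
  );
}

// POSTs content to /api/truth/validate and checks the response shape.
async function validateContent(
  fetchFn: FetchLike,
  baseUrl: string,
  content: string,
): Promise<ValidateResult> {
  const res = await fetchFn(`${baseUrl}/api/truth/validate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ content }),
  });
  if (!res.ok) throw new Error(`validate failed with HTTP ${res.status}`);
  const body = await res.json();
  if (!isValidateResult(body)) throw new Error("unexpected response shape");
  return body;
}

console.log(buildSearchUrl("http://localhost:41233", "platform fees", 5));
// http://localhost:41233/api/truth/search?q=platform%20fees&limit=5
```

Injecting the fetch function keeps the sketch testable without a running service. In a runtime that provides a global `fetch`, passing that function as `fetchFn` is the intended use.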

Library Usage

import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';

const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});

await validator.initialize();

const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);

Docs Directory Structure

The service indexes the 728 files under ./docs:

docs/
├── business/           # 135 files - Pitch decks, market research
│   ├── pitch-deck/     # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/     # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/            # 500+ files - Features, screenshots
│   ├── features/       # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/           # 60 files - Academic papers, briefs
└── technical/          # 25 files - Architecture, API docs

Integration Points

  • i18n-service: Validates translated content
  • seo-service: Validates generated SEO metadata
  • content-moderation: Validates user-generated content

Configuration

# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs

# Thresholds
VALIDATION_THRESHOLD=0.75  # Scores above this are VALID
REVIEW_THRESHOLD=0.5       # Scores from this up to VALIDATION_THRESHOLD are REVIEW
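A service could read these variables at startup roughly as follows. This is a minimal sketch, not the service's real loader: `loadTruthConfig`, `readNumber`, and `TruthConfig` are hypothetical names. The fallback defaults simply reuse the example values shown above, and treating DOCS_PATH as required is an assumption.

```typescript
interface TruthConfig {
  port: number;
  redisUrl: string;
  docsPath: string;
  validationThreshold: number;
  reviewThreshold: number;
}

// Same shape as process.env, passed in explicitly so the loader is easy to test.
type Env = Record<string, string | undefined>;

// Parses a numeric variable, falling back to a default when it is unset or blank.
function readNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    throw new Error(`${key} must be numeric, got "${raw}"`);
  }
  return value;
}

function loadTruthConfig(env: Env): TruthConfig {
  const docsPath = env["DOCS_PATH"];
  // Assumption: there is no sensible default for the docs directory.
  if (!docsPath) throw new Error("DOCS_PATH is required");

  const config: TruthConfig = {
    port: readNumber(env, "TRUTH_SEMANTIC_PORT", 41233),
    redisUrl: env["REDIS_URL"] ?? "redis://localhost:6379",
    docsPath,
    validationThreshold: readNumber(env, "VALIDATION_THRESHOLD", 0.75),
    reviewThreshold: readNumber(env, "REVIEW_THRESHOLD", 0.5),
  };

  // The REVIEW band sits below the VALID band, so the thresholds must be ordered.
  if (config.reviewThreshold > config.validationThreshold) {
    throw new Error("REVIEW_THRESHOLD must not exceed VALIDATION_THRESHOLD");
  }
  return config;
}

console.log(loadTruthConfig({ DOCS_PATH: "/path/to/docs" }).port); // 41233
```

In a Node.js process this would be called as `loadTruthConfig(process.env)`. Failing fast on a non-numeric or misordered threshold avoids silently classifying every score into the wrong band.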

Requirements

  • Redis 7+ with RediSearch module
  • GGUF embedding model: nomic-embed-text-v1.5.Q8_0.gguf
  • GPU (optional): CUDA for fast embeddings