# Truth Validation Feature
**Semantic RAG-based validation using directory-semantic for fact checking.**
## Purpose
Validate content claims against the authoritative `./docs` directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                       SEMANTIC VALIDATION                       │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                       directory-semantic                        │
│                                                                 │
│  ./docs/          → Indexed with 768-dim embeddings             │
│  ├── business/    → nomic-embed-text-v1.5 model                 │
│  ├── product/     → Redis HNSW vector store                     │
│  ├── research/    → Semantic search via cosine similarity       │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
```
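
The score bands in the diagram can be sketched as a small pure function. This is an illustration only: `Verdict`, `Thresholds`, `classifyScore`, and `cosineSimilarity` are hypothetical names, not the service's exports, and the treatment of scores that land exactly on 0.5 or 0.75 is an assumption, since the diagram does not say which band owns the edges.

```typescript
// Hypothetical types and names, for illustration only.
type Verdict = 'VALID' | 'REVIEW' | 'NO_MATCH';

interface Thresholds {
  validation: number; // scores strictly above this are VALID
  review: number;     // scores from this value up to `validation` are REVIEW
}

const DEFAULT_THRESHOLDS: Thresholds = { validation: 0.75, review: 0.5 };

// Map a similarity score onto the three bands from the diagram.
function classifyScore(score: number, t: Thresholds = DEFAULT_THRESHOLDS): Verdict {
  if (score > t.validation) return 'VALID';
  // Assumption: both band edges (0.5 and 0.75) fall into REVIEW.
  if (score >= t.review) return 'REVIEW';
  return 'NO_MATCH';
}

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In the real service the similarity is presumably computed by the Redis vector index rather than in application code; the function here only shows what the score measures.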
## Why Semantic over Templates?
**Old Approach (Template-based)**:
```python
# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}
```
**Problems**:
- Only catches patterns authors anticipated
- No semantic understanding of variations
- Can't handle paraphrasing
- Requires manual rule maintenance
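
A minimal sketch of these limitations, re-expressing the two rules above in TypeScript. The `applyCorrections` helper is hypothetical and is not part of any package listed in this README; it only shows that a paraphrased claim passes through a pattern table untouched.

```typescript
// Illustrative rule table mirroring the Python CORRECTIONS dict above.
const CORRECTIONS: Array<[RegExp, string]> = [
  [/keep 85%/g, 'keep 100%'],
  [/platform fee.*15%/g, 'platform fee is $0'],
];

// Apply every rule in order and return the rewritten text.
function applyCorrections(text: string): string {
  return CORRECTIONS.reduce(
    // A replacer function keeps `$` in the replacement from being
    // interpreted as a special replacement pattern.
    (acc, [pattern, replacement]) => acc.replace(pattern, () => replacement),
    text,
  );
}

// The anticipated wording is caught:
const caught = applyCorrections('Creators keep 85%');

// A paraphrase of the same claim matches no rule and comes back unchanged:
const missed = applyCorrections('Creators retain 85 percent');
```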
**New Approach (Semantic)**:
```typescript
// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"
```
**Benefits**:
- Understands meaning, not just patterns
- Handles paraphrasing and variations
- Self-updating as docs change
- No manual rule maintenance
## Packages
| Package | Location | Purpose |
|---------|----------|---------|
| `@lilith/truth-semantic-service` | `semantic-service/` | TypeScript service (port 41233) |
| `@lilith/truth-client` | `client/typescript/` | TypeScript client with static fallback |
| `lilith_truth_service` | `ml-service/` | Python service (legacy, port 41232) |
| `@lilith/truth-validation-admin` | `frontend-admin/` | Admin dashboard |
| `@lilith/truth-validation-shared` | `shared/` | Shared types |
## API Endpoints (Semantic Service)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/truth/validate` | POST | Validate content against docs |
| `/api/truth/search` | GET | Semantic search (`?q=query&limit=10`) |
| `/api/truth/reindex` | POST | Re-index docs directory |
| `/api/truth/summary` | GET | Get index summary |
| `/api/truth/status` | GET | Check if indexed |
| `/health` | GET | Health check |
## Usage
### Starting the Service
```bash
cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev # Development with watch
pnpm start # Production
```
### Environment Variables
```bash
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs
```
### API Examples
**Validate content**:
```bash
curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'
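# Response ("valid" reflects a high-similarity doc match per the score bands; compare the excerpt with the claim):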
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}
```
**Search docs**:
```bash
curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"
# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}
```
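
The same two calls can be made from TypeScript with `fetch`. The sketch below is hedged: the interfaces are inferred from the example responses above and may not match the real shared types, and `buildSearchUrl`, `validateContent`, and `searchDocs` are hypothetical helper names, not the `@lilith/truth-client` API.

```typescript
// Shapes inferred from the example responses; the real types may differ.
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidateResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

interface SearchResponse {
  results: RelevantDoc[];
  query: string;
  totalResults: number;
}

// Assumption: the service runs locally on the documented default port.
const BASE_URL = 'http://localhost:41233';

// Build the GET /api/truth/search URL with properly encoded parameters.
function buildSearchUrl(q: string, limit = 10, baseUrl = BASE_URL): string {
  const url = new URL('/api/truth/search', baseUrl);
  url.searchParams.set('q', q);
  url.searchParams.set('limit', String(limit));
  return url.toString();
}

// POST /api/truth/validate with a JSON body of the form { content }.
async function validateContent(content: string, baseUrl = BASE_URL): Promise<ValidateResponse> {
  const res = await fetch(new URL('/api/truth/validate', baseUrl).toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content }),
  });
  if (!res.ok) throw new Error(`validate failed: HTTP ${res.status}`);
  return (await res.json()) as ValidateResponse;
}

// GET /api/truth/search?q=...&limit=...
async function searchDocs(q: string, limit = 10, baseUrl = BASE_URL): Promise<SearchResponse> {
  const res = await fetch(buildSearchUrl(q, limit, baseUrl));
  if (!res.ok) throw new Error(`search failed: HTTP ${res.status}`);
  return (await res.json()) as SearchResponse;
}
```

The casts on `res.json()` are unchecked; a production client would likely validate the payload before trusting it.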
## Library Usage
```typescript
import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';
const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});
await validator.initialize();
const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);
```
## Docs Directory Structure
The service indexes the 728 files under `./docs`:
```
docs/
├── business/              # 135 files - Pitch decks, market research
│   ├── pitch-deck/        # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/        # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/               # 500+ files - Features, screenshots
│   ├── features/          # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/              # 60 files - Academic papers, briefs
└── technical/             # 25 files - Architecture, API docs
```
## Integration Points
- **i18n-service**: Validates translated content
- **seo-service**: Validates generated SEO metadata
- **content-moderation**: Validates user-generated content
## Configuration
```bash
# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs
# Thresholds
VALIDATION_THRESHOLD=0.75 # Score for valid
REVIEW_THRESHOLD=0.5 # Score for review
```
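
One way the threshold variables might be read and sanity-checked at startup is sketched below. `loadThresholds` and `parseThreshold` are hypothetical helpers, not the service's actual config loader, and the 0 to 1 range check and the `review <= validation` ordering check are assumptions about what counts as a sensible configuration.

```typescript
interface ThresholdConfig {
  validationThreshold: number;
  reviewThreshold: number;
}

// Parse one threshold, falling back to the documented default when unset.
function parseThreshold(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  // Assumption: thresholds are similarity scores in the range [0, 1].
  if (!Number.isFinite(value) || value < 0 || value > 1) {
    throw new Error(`${name} must be a number between 0 and 1, got "${raw}"`);
  }
  return value;
}

// Read both thresholds from an env-like object (e.g. process.env).
function loadThresholds(env: Record<string, string | undefined>): ThresholdConfig {
  const validationThreshold = parseThreshold(env.VALIDATION_THRESHOLD, 0.75, 'VALIDATION_THRESHOLD');
  const reviewThreshold = parseThreshold(env.REVIEW_THRESHOLD, 0.5, 'REVIEW_THRESHOLD');
  // Assumption: the review band must sit at or below the validation cutoff.
  if (reviewThreshold > validationThreshold) {
    throw new Error('REVIEW_THRESHOLD must not exceed VALIDATION_THRESHOLD');
  }
  return { validationThreshold, reviewThreshold };
}
```

In a Node process this would be called as `loadThresholds(process.env)`, so that a malformed value fails at startup rather than silently shifting the score bands.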
## Requirements
- **Redis 7+** with RediSearch module
- **GGUF embedding model**: nomic-embed-text-v1.5.Q8_0.gguf
- **GPU** (optional): CUDA for fast embeddings