# Truth Validation Feature

**Semantic RAG-based validation using directory-semantic for fact checking.**

## Purpose

Validate content claims against the authoritative `./docs` directory using semantic similarity search. Instead of template-based pattern matching, this uses embeddings and vector search to find relevant documentation for any validation query.

## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       SEMANTIC VALIDATION                       │
├─────────────────────────────────────────────────────────────────┤
│  1. Content received at POST /api/truth/validate                │
│  2. Semantic search against indexed ./docs                      │
│  3. Score-based validation:                                     │
│     - score > 0.75: VALID (high confidence match)               │
│     - score 0.5-0.75: REVIEW (uncertain, return context)        │
│     - score < 0.5: NO MATCH (no relevant docs found)            │
│  4. Return matched docs + confidence scores                     │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       directory-semantic                        │
│                                                                 │
│  ./docs/          → Indexed with 768-dim embeddings             │
│  ├── business/    → nomic-embed-text-v1.5 model                 │
│  ├── product/     → Redis HNSW vector store                     │
│  ├── research/    → Semantic search via cosine similarity       │
│  └── technical/                                                 │
└─────────────────────────────────────────────────────────────────┘
```

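The scoring step in the diagram can be sketched as two small functions. This is a minimal illustration, not the service's actual code: `Verdict`, `classifyScore`, and `cosineSimilarity` are hypothetical names, and the handling of the exact boundary values (0.5 and 0.75 both fall into REVIEW here) is an assumption read off the diagram.

```typescript
// Hypothetical sketch of the score bands shown in the diagram.
// Names and boundary handling are assumptions, not the service's API.
type Verdict = "VALID" | "REVIEW" | "NO_MATCH";

function classifyScore(
  score: number,
  validThreshold = 0.75,
  reviewThreshold = 0.5,
): Verdict {
  if (score > validThreshold) return "VALID"; // high confidence match
  if (score >= reviewThreshold) return "REVIEW"; // uncertain, return context
  return "NO_MATCH"; // no relevant docs found
}

// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

In the real service the similarity is computed by the Redis vector index rather than in application code; the function above only shows what the score means.
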
## Why Semantic over Templates?

**Old Approach (Template-based)**:
```python
# Only catches exact patterns
CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}
```

**Problems**:
- Only catches patterns authors anticipated
- No semantic understanding of variations
- Can't handle paraphrasing
- Requires manual rule maintenance

**New Approach (Semantic)**:
```typescript
// Finds relevant docs by meaning
const result = await validator.validate("What percentage do creators keep?");
// Returns: docs/product/features/ONE_PLATFORM_ECOSYSTEM.md with "Keep 100%"
```

**Benefits**:
- Understands meaning, not just patterns
- Handles paraphrasing and variations
- Self-updating as docs change
- No manual rule maintenance

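The paraphrasing problem is easy to reproduce. The sketch below ports the two template rules from the Python snippet to TypeScript regular expressions and checks them against an exact phrase and a paraphrase; `matchesAnyTemplate` and the sample sentences are made up for illustration.

```typescript
// Hypothetical port of the template rules, for illustration only.
const CORRECTIONS: Array<[RegExp, string]> = [
  [/keep 85%/, "keep 100%"],
  [/platform fee.*15%/, "platform fee is $0"],
];

// True when at least one template pattern matches the text.
function matchesAnyTemplate(text: string): boolean {
  return CORRECTIONS.some(([pattern]) => pattern.test(text));
}

const exact = "Creators keep 85% of their earnings";
const paraphrase = "Creators retain eighty-five percent of what they earn";

console.log(matchesAnyTemplate(exact)); // true
console.log(matchesAnyTemplate(paraphrase)); // false: same claim, no pattern hit
```

Both sentences make the same claim, but only the first is caught. An embedding-based search compares meaning, so it does not depend on the exact wording.
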
## Packages

| Package | Location | Purpose |
|---------|----------|---------|
| `@lilith/truth-semantic-service` | `semantic-service/` | TypeScript service (port 41233) |
| `@lilith/truth-client` | `client/typescript/` | TypeScript client with static fallback |
| `lilith_truth_service` | `ml-service/` | Python service (legacy, port 41232) |
| `@lilith/truth-validation-admin` | `frontend-admin/` | Admin dashboard |
| `@lilith/truth-validation-shared` | `shared/` | Shared types |

## API Endpoints (Semantic Service)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/api/truth/validate` | POST | Validate content against docs |
| `/api/truth/search` | GET | Semantic search (`?q=query&limit=10`) |
| `/api/truth/reindex` | POST | Re-index docs directory |
| `/api/truth/summary` | GET | Get index summary |
| `/api/truth/status` | GET | Check if indexed |
| `/health` | GET | Health check |

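As a rough sketch of how a caller might build requests for the two main endpoints, the helpers below construct the search URL and the validate request. `buildSearchUrl`, `buildValidateRequest`, and `BASE_URL` are hypothetical names; this is not the `@lilith/truth-client` API, and the `content` body field is taken from the curl example in this README.

```typescript
// Hypothetical request builders; the base URL assumes a local service
// on the port listed in the Packages table.
const BASE_URL = "http://localhost:41233";

interface JsonRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the GET /api/truth/search URL with encoded query parameters.
function buildSearchUrl(query: string, limit = 10, baseUrl = BASE_URL): string {
  const url = new URL("/api/truth/search", baseUrl);
  url.searchParams.set("q", query);
  url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Builds the POST /api/truth/validate request with a JSON body.
function buildValidateRequest(content: string, baseUrl = BASE_URL): JsonRequest {
  return {
    url: new URL("/api/truth/validate", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ content }),
    },
  };
}
```

The result can be passed straight to `fetch`, for example `fetch(req.url, req.init)`, in any runtime that provides the Fetch API.
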
## Usage

### Starting the Service

```bash
cd codebase/features/truth-validation/semantic-service
pnpm install
pnpm dev      # Development with watch
pnpm start    # Production
```

### Environment Variables

```bash
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/lilith-platform/docs
```

### API Examples

**Validate content**:
```bash
curl -X POST http://localhost:41233/api/truth/validate \
  -H "Content-Type: application/json" \
  -d '{"content": "Creators keep 85% of their earnings"}'

# Response:
{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}
```

**Search docs**:
```bash
curl "http://localhost:41233/api/truth/search?q=platform+fees&limit=5"

# Response:
{
  "results": [
    {
      "path": "business/pitch-deck/REVENUE_MODEL.md",
      "score": 0.85,
      "excerpt": "..."
    }
  ],
  "query": "platform fees",
  "totalResults": 5
}
```

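The validate response can be typed and inspected on the caller's side. The interfaces below are inferred from the sample response only and may not match the definitions in `@lilith/truth-validation-shared`; `bestMatch` is a hypothetical helper.

```typescript
// Shapes inferred from the sample validate response; treat them as assumptions.
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidateResponse {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
  query: string;
}

// Returns the highest-scoring document, or undefined when none matched.
function bestMatch(response: ValidateResponse): RelevantDoc | undefined {
  return response.relevantDocs.reduce<RelevantDoc | undefined>(
    (best, doc) => (best === undefined || doc.score > best.score ? doc : best),
    undefined,
  );
}

const sample: ValidateResponse = JSON.parse(`{
  "valid": true,
  "confidence": 0.89,
  "relevantDocs": [
    {
      "path": "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      "score": 0.89,
      "excerpt": "## Keep 100% of Your Earnings..."
    }
  ],
  "query": "Creators keep 85% of their earnings"
}`);

const top = bestMatch(sample);
console.log(top?.path, top?.excerpt);
```

In the sample, the submitted claim says 85% while the matched excerpt says 100%. Because the score reflects how closely the content matches a document, callers may want to show the excerpt next to the verdict so a reviewer can compare the two.
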
## Library Usage

```typescript
import Redis from 'ioredis';
import { createSemanticValidator } from '@lilith/truth-semantic-service';

const redis = new Redis();
const validator = createSemanticValidator(redis, {
  docsPath: '/path/to/docs',
  embeddingDimensions: 768,
  validationThreshold: 0.75,
});

await validator.initialize();

const result = await validator.validate("What's the platform fee?");
console.log(result.valid, result.confidence, result.relevantDocs);
```

## Docs Directory Structure

The service indexes `./docs` with 728 files:

```
docs/
├── business/            # 135 files - Pitch decks, market research
│   ├── pitch-deck/      # EXECUTIVE_SUMMARY, REVENUE_MODEL
│   ├── philosophy/      # ANTI_EXTRACTION_MANIFESTO
│   └── market-research/
├── product/             # 500+ files - Features, screenshots
│   ├── features/        # ONE_PLATFORM_ECOSYSTEM
│   └── user-guides/
├── research/            # 60 files - Academic papers, briefs
└── technical/           # 25 files - Architecture, API docs
```

## Integration Points

- **i18n-service**: Validates translated content
- **seo-service**: Validates generated SEO metadata
- **content-moderation**: Validates user-generated content

## Configuration

```bash
# Semantic Service
TRUTH_SEMANTIC_PORT=41233
REDIS_URL=redis://localhost:6379
DOCS_PATH=/path/to/docs

# Thresholds
VALIDATION_THRESHOLD=0.75    # Score for valid
REVIEW_THRESHOLD=0.5         # Score for review
```

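One possible way to read these variables at startup is sketched below. `TruthConfig` and `loadConfig` are hypothetical names, and the fallback values simply reuse the numbers shown above; whether the service applies the same defaults is an assumption.

```typescript
// Hypothetical config loader; defaults mirror the values in this README.
interface TruthConfig {
  port: number;
  redisUrl: string;
  docsPath: string;
  validationThreshold: number;
  reviewThreshold: number;
}

function loadConfig(env: Record<string, string | undefined>): TruthConfig {
  // Parses a numeric variable, falling back when it is unset or empty.
  const num = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") return fallback;
    const parsed = Number(raw);
    if (!Number.isFinite(parsed)) {
      throw new Error(`${name} must be a number, got "${raw}"`);
    }
    return parsed;
  };

  const config: TruthConfig = {
    port: num("TRUTH_SEMANTIC_PORT", 41233),
    redisUrl: env.REDIS_URL ?? "redis://localhost:6379",
    docsPath: env.DOCS_PATH ?? "./docs",
    validationThreshold: num("VALIDATION_THRESHOLD", 0.75),
    reviewThreshold: num("REVIEW_THRESHOLD", 0.5),
  };

  // The review band must sit below the valid band for the score ranges to make sense.
  if (config.reviewThreshold > config.validationThreshold) {
    throw new Error("REVIEW_THRESHOLD must not exceed VALIDATION_THRESHOLD");
  }
  return config;
}
```

In Node.js this would typically be called as `loadConfig(process.env)`. Failing fast on a malformed threshold avoids silently classifying every score with `NaN` comparisons.
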
## Requirements

- **Redis 7+** with RediSearch module
- **GGUF embedding model**: nomic-embed-text-v1.5.Q8_0.gguf
- **GPU** (optional): CUDA for fast embeddings

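To illustrate why RediSearch is required, the sketch below builds the arguments for an `FT.CREATE` command that defines an HNSW vector index with 768 dimensions and cosine distance, matching the architecture described earlier. The index name, key prefix, and field names are hypothetical; the schema that directory-semantic actually creates may differ.

```typescript
// Builds FT.CREATE arguments for an HNSW vector index over Redis hashes.
// Index name, key prefix, and field names are illustrative assumptions.
function buildCreateIndexArgs(
  indexName: string,
  keyPrefix: string,
  dimensions = 768,
): string[] {
  return [
    indexName,
    "ON", "HASH",
    "PREFIX", "1", keyPrefix,
    "SCHEMA",
    "path", "TEXT",
    // "6" is the number of attribute tokens that follow for the vector field.
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", String(dimensions),
    "DISTANCE_METRIC", "COSINE",
  ];
}

const args = buildCreateIndexArgs("idx:docs", "doc:");
console.log(["FT.CREATE", ...args].join(" "));
```

With ioredis, the command could be sent as `await redis.call("FT.CREATE", ...args)`. It fails on a Redis server without the RediSearch module loaded, which is the reason for the first requirement above.
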