platform-codebase/features/truth-validation/MIGRATION.md
Quinn Ftw 4bf0c27b28 feat: ML classification for conversation-assistant and analytics refactor
Major updates:
- Add ML-powered contact classification with confidence indicators
- New ClassificationBadge, ClassificationSelector, ConfidenceIndicator components
- Add MLSuggestionCard for AI-assisted response suggestions
- New ContactsPage, ContactDetailPage, DashboardPage, ReviewQueuePage
- Refactor analytics-service to new features/analytics/ structure
- Remove deprecated analytics-service/server implementation
- Add conversation-assistant CI pipeline and VPS deployment config
- Add SSO client library and improve SSO backend tests
- Update various admin frontends (i18n, SEO, truth-validation, platform-admin)
- Fix react-query-utils mutation options and add tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-29 17:13:54 -08:00


Truth Validation Feature Migration Plan

Migration Status: 100% Complete (core migration; integration with other services is listed under Future Enhancements)

Completed

  • Directory structure created (semantic-service, client, frontend-admin, shared)
  • Semantic service implemented with directory-semantic integration
  • SemanticValidator class with initialize/validate/search/reindex
  • HTTP API endpoints (validate, search, reindex, summary, status)
  • TypeScript client with STATIC_PLATFORM_FACTS fallback
  • Frontend-admin dashboard
  • Shared types package
  • pnpm-workspace.yaml updated (added features/truth-validation/semantic-service)
  • Platform-admin imports updated
  • Documentation updated for semantic approach
  • Dependencies installed and service builds successfully
  • Service starts and connects to Redis with RediSearch
  • Markdown support added to directory-semantic (.md, .mdx files)
  • Binary file filtering added (PNG, JPG, SVG, etc. now skipped)
  • Docs directory indexed correctly (244 files → 1468 chunks, text-only)
  • Semantic search returns meaningful text excerpts
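The TypeScript client's static fallback can be sketched as follows. This is a minimal illustration, not the actual client:

- `validateWithFallback`, `PostJson` and the result shape are hypothetical.
- The fallback facts mirror the Python `STATIC_PLATFORM_FACTS` shown later in this document.
- The transport is injected, so the fallback path can be exercised without a running service.

```typescript
// Hypothetical sketch of a truth-validation client with a static fallback.
// Names and shapes are illustrative, not the real client API.

// Assumed fallback facts, mirroring the Python STATIC_PLATFORM_FACTS.
const STATIC_PLATFORM_FACTS = {
  economics: {
    creatorTakeRate: "100%",
    platformFee: "$0",
  },
} as const;

type ValidateResult =
  | { source: "semantic"; body: unknown }
  | { source: "static"; facts: typeof STATIC_PLATFORM_FACTS };

// Injected transport: POSTs a JSON payload and resolves with the parsed body,
// rejecting on network errors or non-2xx responses.
type PostJson = (url: string, payload: unknown) => Promise<unknown>;

async function validateWithFallback(
  baseUrl: string,
  content: string,
  postJson: PostJson,
): Promise<ValidateResult> {
  try {
    const body = await postJson(`${baseUrl}/api/truth/validate`, { content });
    return { source: "semantic", body };
  } catch {
    // Semantic service unreachable or failing: fall back to hardcoded facts.
    return { source: "static", facts: STATIC_PLATFORM_FACTS };
  }
}
```

In practice `postJson` would wrap `fetch`: a POST with a JSON body that throws on a non-2xx status. When running locally, `baseUrl` would be `http://localhost:41233`.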

Changes Made to directory-semantic Package

The following files were updated in @transquinnftw/ml-directory-semantic:

  1. src/types.ts: Added 'markdown' to SupportedLanguage union type
  2. src/scanner/file-scanner.ts:
    • Added .md, .mdx to LANGUAGE_EXTENSIONS
    • Added BINARY_EXTENSIONS set with 40+ binary file types
    • Added binary filtering in scanRecursive() method
    • Added markdown: 0 to byLanguage initialization
    • Added 'markdown' to default languages list
  3. src/parser/tree-sitter-parser.ts:
    • Added markdown config (no tree-sitter parsing needed)
    • Added empty markdown symbol mapping
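A minimal sketch of how extension-based filtering could work in the scanner, assuming a lookup from extension to language plus a set of binary extensions:

- Only the names `LANGUAGE_EXTENSIONS`, `BINARY_EXTENSIONS` and `byLanguage` come from the change list above.
- Their contents here are a small illustrative subset.
- `extensionOf`, `classifyFile` and `countByLanguage` are hypothetical helpers, not the package's API.
- Paths are assumed to be POSIX-style.

```typescript
// Hypothetical sketch; the real union, maps and scanRecursive() logic differ.
type SupportedLanguage = "typescript" | "javascript" | "python" | "markdown";

const LANGUAGE_EXTENSIONS: Record<string, SupportedLanguage> = {
  ".ts": "typescript",
  ".tsx": "typescript",
  ".js": "javascript",
  ".py": "python",
  ".md": "markdown",  // new: Markdown support
  ".mdx": "markdown", // new: MDX treated as Markdown
};

// Small subset; the package's set has 40+ binary types.
const BINARY_EXTENSIONS = new Set([".png", ".jpg", ".jpeg", ".gif", ".svg", ".pdf", ".zip"]);

// Lowercased extension including the dot, or "" if there is none.
function extensionOf(path: string): string {
  const slash = path.lastIndexOf("/");
  const dot = path.lastIndexOf(".");
  // No dot in the file name, or a dotfile such as ".env".
  if (dot <= slash + 1) return "";
  return path.slice(dot).toLowerCase();
}

// Language to index the file as, or null to skip it.
function classifyFile(path: string): SupportedLanguage | null {
  const ext = extensionOf(path);
  if (BINARY_EXTENSIONS.has(ext)) return null; // skip images, archives, etc.
  return LANGUAGE_EXTENSIONS[ext] ?? null;     // skip unsupported extensions
}

// Every language counter starts at 0, including markdown.
function countByLanguage(paths: string[]): Record<SupportedLanguage, number> {
  const byLanguage: Record<SupportedLanguage, number> = {
    typescript: 0,
    javascript: 0,
    python: 0,
    markdown: 0,
  };
  for (const p of paths) {
    const lang = classifyFile(p);
    if (lang !== null) byLanguage[lang] += 1;
  }
  return byLanguage;
}
```

Checking binary extensions before the language map means an image is never indexed, even if an extension were accidentally added to both.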

How to Run

  1. Start Redis with RediSearch

    docker run -d --name redis-stack -p 6381:6379 redis/redis-stack:latest
    
  2. Run semantic service

    cd codebase/features/truth-validation/semantic-service
    REDIS_URL=redis://localhost:6381 npm run dev
    
  3. Test validation endpoint

    curl -X POST http://localhost:41233/api/truth/validate \
      -H 'Content-Type: application/json' \
      -d '{"content": "sex work platform"}'
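Because `/health` reports an `indexed` flag, a script can wait for indexing to finish before calling the validation endpoint. The sketch below polls `/health` until it returns `{ status: 'ok', indexed: true }`, the response listed in the verification checklist. `waitForReady`, `isReady` and the injected `getJson` transport are hypothetical names, and the retry count and delay are arbitrary defaults.

```typescript
// Hypothetical readiness helper; getJson is an injected transport that
// resolves with the parsed JSON body or rejects if the request fails.
type GetJson = (url: string) => Promise<unknown>;

// True only for a body shaped like { status: "ok", indexed: true }.
function isReady(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const b = body as { status?: unknown; indexed?: unknown };
  return b.status === "ok" && b.indexed === true;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `${baseUrl}/health` up to `attempts` times; resolves false if never ready.
async function waitForReady(
  baseUrl: string,
  getJson: GetJson,
  attempts = 30,
  delayMs = 1000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (isReady(await getJson(`${baseUrl}/health`))) return true;
    } catch {
      // Service not accepting connections yet; retry after the delay.
    }
    if (i < attempts - 1) await sleep(delayMs);
  }
  return false;
}
```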
    

Future Enhancements

  1. Update i18n-service to call semantic validation
  2. Update seo-service to call semantic validation
  3. Update frontend-admin to use new endpoints
  4. Fix empty file paths in search results (minor bug in result mapping)

Migration from Template to Semantic

Before (Template-Based)

# ml-service/python/lilith_truth_service/app.py
STATIC_PLATFORM_FACTS = {
    "economics": {
        "creatorTakeRate": "100%",
        "platformFee": "$0",
    },
}

CORRECTIONS = {
    r'keep 85%': 'keep 100%',
    r'platform fee.*15%': 'platform fee is $0',
}
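To make the coverage limitation of the template approach concrete, here is an illustrative TypeScript port that applies `CORRECTIONS`. This document does not show how the Python service actually applied these patterns. Case-insensitive global replacement and the `applyCorrections` name are assumptions.

```typescript
// Illustrative port of the template approach (not the original implementation).
const CORRECTIONS: Array<[string, string]> = [
  ["keep 85%", "keep 100%"],
  ["platform fee.*15%", "platform fee is $0"],
];

function applyCorrections(text: string): string {
  let out = text;
  for (const [pattern, replacement] of CORRECTIONS) {
    // A replacer function avoids JS `$`-pattern substitution in the replacement.
    out = out.replace(new RegExp(pattern, "gi"), () => replacement);
  }
  return out;
}
```

The exact phrase `keep 85%` is corrected, but a paraphrase such as `retain 85%` passes through unchanged. This is the "only anticipated patterns" gap that the semantic approach targets.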

After (Semantic)

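// semantic-service/src/semantic-validator.ts
// `redis` is a connected client for the RediSearch-enabled Redis instance
const validator = createSemanticValidator(redis, {
  docsPath: './docs',        // Uses actual documentation as facts
  embeddingDimensions: 768,  // Must match the embedding model (nomic-embed-text-v1.5 outputs 768 dims)
});

const result = await validator.validate("Creators keep 85%");
// Returns relevant docs with similarity scores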

Key Changes

| Aspect | Before | After |
|--------|--------|-------|
| Facts source | Hardcoded STATIC_PLATFORM_FACTS | ./docs directory (728 files; 244 text files indexed into 1468 chunks) |
| Matching | Regex patterns | Semantic similarity (768-dim embeddings) |
| Validation | Pattern match = valid | Score > 0.75 = valid |
| Maintenance | Manual rule updates | Auto-updates with docs |
| Coverage | Only anticipated patterns | Any semantic query |
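The Matching and Validation rows can be illustrated with a small in-memory sketch. It ranks chunks by cosine similarity to the query embedding, then buckets the best score. Assumptions:

- Scores are cosine similarities.
- The real service stores vectors in RediSearch rather than scanning an array.
- The 0.75 and 0.5 thresholds come from this document, but the `partial` and `unsupported` labels for the lower bands are hypothetical, as are all function names.

```typescript
// In-memory sketch of semantic matching; not the service's implementation.
interface Chunk {
  filePath: string;
  text: string;
  embedding: number[];
}

interface Match {
  filePath: string;
  text: string;
  score: number;
}

// "partial" and "unsupported" are hypothetical labels for the lower bands.
type Verdict = "valid" | "partial" | "unsupported";

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Top-k chunks by similarity, highest score first.
function topK(query: number[], chunks: Chunk[], k: number): Match[] {
  return chunks
    .map((c) => ({ filePath: c.filePath, text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Buckets the best score: > validAt is valid, > partialAt is partial.
function verdict(matches: Match[], validAt = 0.75, partialAt = 0.5): Verdict {
  const best = matches.length > 0 ? matches[0].score : 0;
  if (best > validAt) return "valid";
  if (best > partialAt) return "partial";
  return "unsupported";
}
```

With 768-dim embeddings from the model the arithmetic is the same; only the vector length changes.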

Verification Checklist

  • npm install succeeds (use npm: the pnpm workspace has broken external deps)
  • Embedding model file exists at /var/mnt/bigdisk/_/models/models/embeddings/nomic-ai/nomic-embed-text-v1.5.Q8_0.gguf
  • Redis with RediSearch running (docker run -d --name redis-stack -p 6381:6379 redis/redis-stack:latest)
  • Service starts with REDIS_URL=redis://localhost:6381 npm run dev
  • /health returns { status: 'ok', indexed: true }
  • /api/truth/validate returns relevant docs with text excerpts
  • /api/truth/search returns relevant docs with text excerpts
  • Score thresholds work correctly (0.75/0.5)
  • Reindex endpoint works (re-indexes 244 files → 1468 chunks)
  • Integration with other services (future work; not yet verified)