platform-codebase/features/ML_INTEGRATION.md
Lilith aa115d3637 📝 Update ML_INTEGRATION.md documentation
Update port references and service configurations.

2026-01-03 00:18:08 -08:00


ML Features Integration Plan

Overview

Three ML-powered features (translation, semantic validation, and SEO generation) work together to provide intelligent content management with semantic RAG-based validation:

┌─────────────────────────────────────────────────────────────────┐
│                   SEMANTIC RAG ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ./docs (728 files)                                             │
│  ├── business/          ─┐                                      │
│  ├── product/            │── Indexed with 768-dim embeddings    │
│  ├── research/           │   nomic-embed-text-v1.5 model        │
│  └── technical/         ─┘   Redis HNSW vector store            │
│                                                                 │
│  Content → Semantic Search → Score-Based Validation             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                  directory-semantic (ML)                        │
│                ~/Code/@packages/@ml/directory-semantic          │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐   ┌─────────────────┐   ┌───────────────┐
│  i18n-service │   │ truth-semantic  │   │  seo-service  │
│   Port 3300   │   │   Port 41233    │   │   Port 3014   │
│               │   │                 │   │               │
│  6 providers  │   │  Semantic RAG   │   │  Geographic   │
│  Auto-fallback│   │  Score-based    │   │  hierarchy    │
└───────┬───────┘   └────────┬────────┘   └───────┬───────┘
        │                    │                    │
        │          ┌─────────┴─────────┐          │
        └──────────►  validates both   ◄──────────┘
                   └───────────────────┘

Service Dependencies

| Service            | Port  | Depends On                    | Used By        |
|--------------------|-------|-------------------------------|----------------|
| directory-semantic | -     | Redis + RediSearch, GPU       | truth-semantic |
| truth-semantic     | 41233 | directory-semantic            | i18n, seo      |
| i18n-service       | 3300  | llama-service, truth-semantic | React apps     |
| seo-service        | 3014  | llama-service, truth-semantic | All frontends  |

Integration Flows

Flow 1: Translation with Semantic Validation

User requests translation
        │
        ▼
┌───────────────┐
│ i18n-service  │──── 1. Get translation from LLM
│               │◄─── llama-service returns translation
│               │
│               │──── 2. Semantic validate translation
│               │◄─── truth-semantic returns confidence
│               │
│               │──── 3. Return (flag if low confidence)
└───────────────┘
        │
        ▼
   React app displays
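
The three steps above could be wired together roughly as follows. This is a minimal sketch, not the service's actual code: the `Translator` and `Validator` function types, the `TranslationResult` shape, and the 0.75 flag threshold (borrowed from the score bands in Flow 3) are all assumptions made for illustration.

```typescript
// Hypothetical client types: the real llama-service and truth-semantic APIs may differ.
type Translator = (text: string, locale: string) => Promise<string>;
type Validator = (text: string) => Promise<{ confidence: number }>;

interface TranslationResult {
  text: string;
  confidence: number;
  needsReview: boolean; // true when confidence does not exceed the threshold
}

// Assumed threshold, taken from the "> 0.75: Valid" band used in Flow 3.
const REVIEW_THRESHOLD = 0.75;

// Pure helper: a translation is flagged unless its confidence exceeds the threshold.
function flagLowConfidence(text: string, confidence: number): TranslationResult {
  return { text, confidence, needsReview: !(confidence > REVIEW_THRESHOLD) };
}

async function translateWithValidation(
  source: string,
  locale: string,
  translate: Translator,
  validate: Validator,
): Promise<TranslationResult> {
  const translated = await translate(source, locale); // 1. get translation from LLM
  const { confidence } = await validate(translated);  // 2. semantic validation
  return flagLowConfidence(translated, confidence);   // 3. return, flagged if low
}
```

Passing the two clients in as functions keeps the flow testable without either service running.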

Flow 2: SEO Generation with Semantic Validation

User configures SEO
        │
        ▼
┌───────────────┐
│  seo-service  │──── 1. Generate metadata from LLM
│               │◄─── llama-service returns SEO
│               │
│               │──── 2. Semantic validate against docs
│               │◄─── truth-semantic returns relevant docs
│               │
│               │──── 3. Cache and return
└───────────────┘
        │
        ▼
   HTML <head> tags
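
A rough sketch of the generate, validate, and cache steps follows. The `SeoMetadata` and `RelevantDoc` shapes, the injected `generate` and `validate` functions, and the in-memory `Map` cache are hypothetical; the plan does not say how seo-service actually caches results.

```typescript
// Hypothetical shapes: the real seo-service payloads are not specified in this plan.
interface SeoMetadata { title: string; description: string }
interface RelevantDoc { path: string; score: number }
interface SeoResult { metadata: SeoMetadata; relevantDocs: RelevantDoc[] }

type SeoGenerator = (pageKey: string) => Promise<SeoMetadata>;
type DocValidator = (text: string) => Promise<RelevantDoc[]>;

// Returns a lookup function that runs the three flow steps once per page key.
function createSeoPipeline(generate: SeoGenerator, validate: DocValidator) {
  const cache = new Map<string, SeoResult>(); // assumed in-memory cache

  return async function getSeo(pageKey: string): Promise<SeoResult> {
    const cached = cache.get(pageKey);
    if (cached) return cached;

    const metadata = await generate(pageKey);               // 1. generate from LLM
    const relevantDocs = await validate(
      `${metadata.title} ${metadata.description}`,
    );                                                      // 2. validate against docs
    const result: SeoResult = { metadata, relevantDocs };
    cache.set(pageKey, result);                             // 3. cache and return
    return result;
  };
}
```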

Flow 3: Content Publishing

Creator writes content
        │
        ▼
┌─────────────────────┐
│ truth-semantic      │◄─── Semantic search for relevant facts
│                     │
│ Score-based:        │
│   > 0.75: Valid     │
│   0.5-0.75: Review  │
│   < 0.5: No match   │
└─────────────────────┘
        │
        ▼
┌─────────────────┐
│  i18n-service   │◄─── Translate to other locales
└─────────────────┘
        │
        ▼
   Published in all locales
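
The score bands in this flow can be expressed as a small gate in front of translation. This is a sketch under one stated assumption: the plan does not say what happens to content in the Review or No match bands, so here only Valid content proceeds to the other locales. The `Verdict` and `PublishPlan` names are hypothetical.

```typescript
type Verdict = "valid" | "review" | "no-match";

// Maps a similarity score to the three bands shown in the diagram.
function classifyScore(score: number): Verdict {
  if (score > 0.75) return "valid";
  if (score >= 0.5) return "review";
  return "no-match";
}

interface PublishPlan {
  verdict: Verdict;
  translateTo: string[]; // locales that still need a translation
}

// Assumption: only "valid" content moves on to i18n-service.
function planPublish(topScore: number, sourceLocale: string, locales: string[]): PublishPlan {
  const verdict = classifyScore(topScore);
  const translateTo =
    verdict === "valid" ? locales.filter((locale) => locale !== sourceLocale) : [];
  return { verdict, translateTo };
}
```

A score of exactly 0.75 lands in the review band, because the valid band is strictly greater than 0.75.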

Semantic Validation Details

How It Works

  1. Index docs directory on startup

    • 728 files, including 135 markdown files, 447 images, and 54 code files
    • Chunked and embedded with nomic-embed-text-v1.5 (768 dimensions)
    • Stored in Redis with HNSW indexing
  2. Validate content by semantic search

    • Content → embedding → KNN search → top matches
    • Returns relevant docs with similarity scores
  3. Score-based decisions

    • score > 0.75: Content matches docs = VALID
    • score 0.5-0.75: Uncertain, return context for review
    • score < 0.5: No matching documentation
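
To make step 2 concrete, here is a brute-force stand-in for the KNN search. It is only an illustration: the real store is a Redis HNSW index, which is approximate rather than exhaustive, and real embeddings have 768 dimensions, not the toy vectors a caller might pass here. The sketch assumes scores are cosine similarities; `IndexedChunk`, `Match`, and `topK` are hypothetical names.

```typescript
interface IndexedChunk { path: string; embedding: number[] }
interface Match { path: string; score: number }

// Cosine similarity between two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive nearest-neighbour search: score every chunk, keep the k best.
function topK(query: number[], index: IndexedChunk[], k: number): Match[] {
  return index
    .map((chunk) => ({ path: chunk.path, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

The highest score returned by such a search is what step 3 compares against the 0.75 and 0.5 thresholds.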

Example Validation

// Input: Marketing claim to validate
const result = await validator.validate("Creators keep 100% of earnings");

// Output: Matched against docs
{
  valid: true,
  confidence: 0.92,
  relevantDocs: [
    {
      path: "product/features/ONE_PLATFORM_ECOSYSTEM.md",
      score: 0.92,
      excerpt: "## Keep 100% of Your Earnings..."
    },
    {
      path: "business/pitch-deck/EXECUTIVE_SUMMARY.md",
      score: 0.87,
      excerpt: "...creators retain all earnings..."
    }
  ]
}
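
The output above suggests a result shape like the one below. The interface names are hypothetical. The derivation rule is also an assumption, inferred from the example, where `confidence` equals the best document score: `confidence` is the top score, and `valid` means that score exceeds 0.75.

```typescript
interface RelevantDoc {
  path: string;
  score: number;
  excerpt: string;
}

interface ValidationResult {
  valid: boolean;
  confidence: number;
  relevantDocs: RelevantDoc[];
}

// Assumed rule: confidence is the best match score; valid when it exceeds 0.75.
function buildValidationResult(docs: RelevantDoc[]): ValidationResult {
  const relevantDocs = [...docs].sort((a, b) => b.score - a.score);
  const confidence = relevantDocs[0]?.score ?? 0; // no matches means zero confidence
  return { valid: confidence > 0.75, confidence, relevantDocs };
}
```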

Deployment Order

  1. Redis with RediSearch - Vector store
  2. truth-semantic-service - Indexes docs, provides validation API
  3. i18n-service - Uses truth-semantic for translation validation
  4. seo-service - Uses truth-semantic for SEO validation
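
This order follows from the Service Dependencies table, and it can be derived mechanically with a topological sort. The sketch below encodes the table as a hypothetical `dependsOn` map (GPU is omitted because it is hardware, not a service). Note that directory-semantic and llama-service appear here because the table lists them as dependencies, even though the list above does not mention them.

```typescript
// Dependency map transcribed from the Service Dependencies table.
const dependsOn: Record<string, string[]> = {
  "directory-semantic": ["redis"],
  "truth-semantic": ["directory-semantic"],
  "i18n-service": ["llama-service", "truth-semantic"],
  "seo-service": ["llama-service", "truth-semantic"],
};

// Depth-first topological sort: every service appears after its dependencies.
function startOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    const current = state.get(name);
    if (current === "done") return;
    if (current === "visiting") throw new Error(`dependency cycle at ${name}`);
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  };

  for (const name of Object.keys(graph)) visit(name);
  return order;
}

console.log(startOrder(dependsOn).join(" -> "));
```

The cycle check matters in practice: if two services ever came to depend on each other, no valid start order would exist, and the function fails loudly instead of looping.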

Health Check Chain

GET /health on each service should verify:

truth-semantic-service:
  - Redis reachable
  - Embedding model loaded
  - Docs directory indexed

i18n-service:
  - llama-service reachable
  - truth-semantic reachable
  - Glossary loaded

seo-service:
  - llama-service reachable
  - truth-semantic reachable
  - Cache initialized
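
One way a `/health` handler might aggregate these checks is sketched below. The `Check` type, the `HealthReport` shape, and the "ok"/"degraded" status values are assumptions; the plan only lists what each service should verify, not the response format.

```typescript
// Each check is a hypothetical async probe, e.g. "Redis reachable".
type Check = () => Promise<boolean>;

interface HealthReport {
  status: "ok" | "degraded";
  checks: Record<string, boolean>;
}

// Runs every probe; a probe that throws is recorded as failed, not propagated.
async function runHealthChecks(checks: Record<string, Check>): Promise<HealthReport> {
  const results: Record<string, boolean> = {};
  for (const [name, check] of Object.entries(checks)) {
    try {
      results[name] = await check();
    } catch {
      results[name] = false;
    }
  }
  const status = Object.values(results).every(Boolean) ? "ok" : "degraded";
  return { status, checks: results };
}
```

For truth-semantic-service, the map passed in would hold three probes, one each for Redis, the embedding model, and the docs index.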

API Gateway Routing

# ML Services
location /api/i18n/ {
    proxy_pass http://i18n-service:3300/api/i18n/;
}

location /api/truth/ {
    proxy_pass http://truth-semantic-service:41233/api/truth/;
}

location /api/seo/ {
    proxy_pass http://seo-service:3014/api/seo/;
}

Monitoring

Each service exposes Prometheus metrics:

  • Request count/latency
  • Semantic search latency
  • Cache hit rates
  • Validation confidence distribution
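
As an illustration of the confidence distribution metric, the sketch below counts observations into cumulative buckets, the way Prometheus histograms do. It uses no client library, and the bucket bounds are an assumption chosen to line up with the 0.5 and 0.75 decision thresholds. A real service would more likely rely on a Prometheus client library.

```typescript
// Assumed bucket upper bounds, aligned with the validation thresholds.
const CONFIDENCE_BUCKETS = [0.5, 0.75, 1.0];

// Returns one cumulative count per upper bound, plus a final +Inf bucket.
function cumulativeBuckets(values: number[], bounds: number[]): number[] {
  const counts = bounds.map((le) => values.filter((v) => v <= le).length);
  counts.push(values.length); // the +Inf bucket holds every observation
  return counts;
}

console.log(cumulativeBuckets([0.92, 0.87, 0.6, 0.3], CONFIDENCE_BUCKETS)); // [ 1, 2, 4, 4 ]
```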

Dashboard in platform-admin shows:

  • Service health status
  • Docs index statistics
  • Validation activity
  • Confidence score distribution