platform-codebase/features/conversation-assistant/docs/ARCHITECTURE.md

Conversation Assistant Architecture

This document describes the architecture of the Conversation Assistant feature, an AI-powered system for generating iMessage responses and collecting training data.

System Overview

The Conversation Assistant enables AI-generated responses for iMessage conversations through a distributed architecture:

┌──────────────────────────────────────────────────────────────────────────┐
│                          macOS App (Swift)                               │
│  - Reads iMessage SQLite database (~/Library/Messages/chat.db)           │
│  - Extracts conversations, contacts, and messages                        │
│  - Syncs data to server via REST API                                     │
│  - Runs as LaunchAgent (auto-start on login)                             │
└─────────────────────────────────┬────────────────────────────────────────┘
                                  │
                                  │ HTTPS POST /api/sync/*
                                  │ JWT Authentication
                                  ↓
┌──────────────────────────────────────────────────────────────────────────┐
│                       Server (NestJS) - Port 3100                        │
│  ┌─────────────────┐  ┌──────────────────┐  ┌──────────────────────┐     │
│  │ Devices Module  │  │ Sync Module      │  │ Conversations Module │     │
│  │ - Registration  │  │ - Message sync   │  │ - List/browse        │     │
│  │ - Verification  │  │ - Contact sync   │  │ - Message history    │     │
│  │ - JWT tokens    │  │ - Deduplication  │  │ - Context building   │     │
│  └─────────────────┘  └──────────────────┘  └──────────────────────┘     │
│  ┌──────────────────────────────┐  ┌────────────────────────────────┐    │
│  │ Responses Module             │  │ Training Module                │    │
│  │ - Orchestrates generation    │  │ - Collects samples             │    │
│  │ - Calls ML service           │  │ - Manages training jobs        │    │
│  │ - Stores generated responses │  │ - Tracks job progress          │    │
│  └──────────────────────────────┘  └────────────────────────────────┘    │
└─────────────────────────────────┬────────────────────────────────────────┘
                                  │
                                  │ HTTP POST /generate
                                  │ HTTP POST /training/*
                                  ↓
┌──────────────────────────────────────────────────────────────────────────┐
│                    ML Service (FastAPI) - Port 8100                      │
│  ┌───────────────────────┐    ┌───────────────────────────────────────┐  │
│  │ LLM Manager           │    │ Redis Integration                     │  │
│  │ - GGUF model loading  │    │ - Response caching (deterministic)    │  │
│  │ - llama-cpp-python    │    │ - Job queue (async generation)        │  │
│  │ - GPU acceleration    │    │ - Training job management             │  │
│  └───────────────────────┘    └───────────────────────────────────────┘  │
│                                                                          │
│  Model loading via lilith-model-loader:                                  │
│  - Manifest-based model fetching                                         │
│  - Local caching (~/.cache/lilith-models/)                               │
│  - Supports: ministral-3b, mistral-7b, llama-2-7b, phi-2                 │
└──────────────────────────────────────────────────────────────────────────┘
                                  │
                                  │
                                  ↓
┌──────────────────────────────────────────────────────────────────────────┐
│                      Frontend (React) - Port 5173                        │
│  ┌──────────────┐  ┌─────────────────┐  ┌─────────────────────────────┐  │
│  │ DevicesPage  │  │ConversationsPage│  │ TrainingPage                │  │
│  │ - List/manage│  │- Browse convos  │  │ - View training samples     │  │
│  │ - Register   │  │- View messages  │  │ - Start training jobs       │  │
│  │ - Deactivate │  │- Generate resp. │  │ - Monitor job progress      │  │
│  └──────────────┘  └─────────────────┘  └─────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────┘

Data Flow

1. Device Registration Flow

macOS App                    Server                      User
    │                           │                          │
    │── POST /devices/register ─→│                          │
    │   {name, hardwareId,      │                          │
    │    platform, osVersion}   │                          │
    │                           │                          │
    │←── {deviceId, code,       │                          │
    │     expiresAt}            │                          │
    │                           │                          │
    │                           │←── User enters 6-digit ──│
    │                           │    code in settings UI   │
    │                           │                          │
    │── POST /devices/verify ──→│                          │
    │   {deviceId, code}        │                          │
    │                           │                          │
    │←── {token, expiresAt} ───│                          │
    │                           │                          │
    │   (Token stored in        │                          │
    │    macOS Keychain)        │                          │

Registration uses a 6-digit verification code that expires after 10 minutes. Because the server issues a token only after the code is verified, only devices the user has approved can sync messages.
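
The code lifecycle described above can be sketched as follows. This is a minimal illustration, not the server's actual implementation (which is NestJS): the names `PendingVerification`, `issue_code`, and `verify_code` are hypothetical, and only the 6-digit length and 10-minute expiry come from this document.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

CODE_TTL = timedelta(minutes=10)  # expiry stated in this document


@dataclass
class PendingVerification:
    """Hypothetical record mirroring verificationCode / codeExpiresAt."""
    code: str
    expires_at: datetime


def issue_code(now: datetime) -> PendingVerification:
    """Create a zero-padded 6-digit code that expires after CODE_TTL."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    return PendingVerification(code=code, expires_at=now + CODE_TTL)


def verify_code(pending: PendingVerification, submitted: str, now: datetime) -> bool:
    """Accept the code only if it has not expired and matches exactly."""
    if now >= pending.expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return secrets.compare_digest(pending.code, submitted)
```

Passing `now` explicitly keeps the expiry logic easy to test; a real service would likely also limit the number of verification attempts per device, which this sketch does not cover.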

2. Message Sync Flow

iMessage DB          macOS App              Server              PostgreSQL
     │                   │                     │                     │
     │── Read chat.db ──→│                     │                     │
     │   (Full Disk      │                     │                     │
     │    Access req.)   │                     │                     │
     │                   │                     │                     │
     │                   │── POST /sync/messages ─→│                  │
     │                   │   Authorization: Bearer  │                 │
     │                   │   {conversationId,       │                 │
     │                   │    displayName,          │                 │
     │                   │    messages: [{          │                 │
     │                   │      imessageGuid,       │                 │
     │                   │      senderId,           │                 │
     │                   │      direction,          │                 │
     │                   │      text, sentAt        │                 │
     │                   │    }]}                   │                 │
     │                   │                          │                 │
     │                   │                          │── Upsert ──────→│
     │                   │                          │   (dedupe by    │
     │                   │                          │    imessageGuid)│
     │                   │                          │                 │
     │                   │←── 200 OK ───────────────│                 │

Key characteristics:

  • Incremental sync: Only new messages since last sync are sent
  • Deduplication: iMessage GUIDs ensure no duplicate messages
  • Direction tracking: Messages tagged as incoming or outgoing

3. Response Generation Flow

Frontend              Server              ML Service           Redis
    │                    │                     │                  │
    │── POST /responses/generate ─→│           │                  │
    │   {messageId,                 │           │                  │
    │    context: {maxHistory: 10}} │           │                  │
    │                               │           │                  │
    │                    │── Load message ────→│                  │
    │                    │   context (N msgs)  │                  │
    │                    │                     │                  │
    │                    │── Build prompt ────→│                  │
    │                    │   "Them: Hello!"    │                  │
    │                    │   "Me: Hi!"         │                  │
    │                    │   "Them: How are you?"                 │
    │                    │   "Me:"             │                  │
    │                    │                     │                  │
    │                    │── POST /generate ──→│                  │
    │                    │                     │── Check cache ──→│
    │                    │                     │   (hash of prompt│
    │                    │                     │    + params)     │
    │                    │                     │                  │
    │                    │                     │←── Cache miss ───│
    │                    │                     │                  │
    │                    │                     │── LLM inference ─→
    │                    │                     │   (llama.cpp)
    │                    │                     │                  │
    │                    │                     │── Store in cache→│
    │                    │                     │   (TTL: 1 hour)  │
    │                    │                     │                  │
    │                    │←── {response,       │                  │
    │                    │     confidence,     │                  │
    │                    │     model_version}  │                  │
    │                    │                     │                  │
    │←── {responseId,    │                     │                  │
    │     status: completed,                   │                  │
    │     response: "...",                     │                  │
    │     confidence: 0.85}                    │                  │
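
The cache check in the flow above keys on a hash of the prompt plus generation parameters. One way to derive such a key is sketched below. The use of SHA-256 over canonical JSON and the function name `cache_key` are assumptions; the key prefix and the one-hour TTL are taken from this document.

```python
import hashlib
import json

CACHE_PREFIX = "conv-assistant:cache:"  # prefix listed under Redis Keys
CACHE_TTL_SECONDS = 3600                # 1 hour, as shown in the flow


def cache_key(prompt: str, params: dict) -> str:
    """Build a deterministic cache key from the prompt and generation params.

    Sorting keys and fixing separators makes the JSON canonical, so the same
    logical request always hashes to the same key regardless of dict order.
    """
    payload = json.dumps(
        {"prompt": prompt, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return CACHE_PREFIX + digest
```

Including the parameters in the hash matters: the same prompt with a different temperature or token limit should not be served a cached response generated under other settings.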

4. Training Sample Collection

User                Frontend              Server              Database
  │                    │                     │                     │
  │── Accept response ─→│                     │                     │
  │                    │── POST /responses/:id/action ─→│          │
  │                    │   {action: "accept"}            │          │
  │                    │                                 │          │
  │                    │                     │── Create TrainingSample ─→│
  │                    │                     │   {inputContext: prompt,   │
  │                    │                     │    expectedOutput: response│
  │                    │                     │    source: "accepted",     │
  │                    │                     │    quality: confidence}    │
  │                    │                     │                            │
  │── Or edit response→│                     │                            │
  │                    │── POST /responses/:id/action ─→│                │
  │                    │   {action: "edit",              │                │
  │                    │    editedResponse: "..."}      │                │
  │                    │                                 │                │
  │                    │                     │── Create TrainingSample ─→│
  │                    │                     │   {source: "edited",       │
  │                    │                     │    quality: 1.0}           │

Training samples are collected from:

  1. Accepted responses: AI responses the user approved as-is (quality score: the model's confidence)
  2. Edited responses: User-corrected responses (quality score: 1.0)
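
The mapping from a user action to a training sample can be sketched as below. The `TrainingSample` fields mirror the entity in this document, but the function `sample_from_action` and its error handling are hypothetical and only illustrate the two rules above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingSample:
    input_context: str    # the prompt sent to the model
    expected_output: str  # the text the model should have produced
    source: str           # "accepted" or "edited"
    quality: float        # 0.0-1.0


def sample_from_action(prompt: str, response: str, confidence: float,
                       action: str,
                       edited_response: Optional[str] = None) -> TrainingSample:
    """Turn an accept/edit action on a generated response into a sample."""
    if action == "accept":
        # Accepted as-is: the model's own confidence becomes the quality.
        return TrainingSample(prompt, response, "accepted", confidence)
    if action == "edit":
        if not edited_response:
            raise ValueError("edit action requires editedResponse")
        # User-corrected text is treated as ground truth.
        return TrainingSample(prompt, edited_response, "edited", 1.0)
    raise ValueError(f"unsupported action: {action!r}")
```

Giving edited responses the maximum quality reflects that the user wrote the final text, whereas an accepted response is only as trustworthy as the model's confidence in it.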

Database Schema

Entities

┌─────────────────────┐
│      Device         │
├─────────────────────┤
│ id (UUID)           │
│ name                │
│ hardwareId (unique) │
│ platform            │──────────────┐
│ osVersion           │              │
│ verificationCode    │              │
│ codeExpiresAt       │              │
│ verified            │              │
│ lastSyncAt          │              │
│ createdAt           │              │
│ updatedAt           │              │
└─────────────────────┘              │
                                     │
┌─────────────────────┐              │
│     Contact         │              │
├─────────────────────┤              │
│ id (UUID)           │              │
│ appleId             │              │
│ phoneNumber         │              │
│ email               │              │
│ displayName         │←─────────────┤
│ avatarHash          │              │
│ createdAt           │              │
│ updatedAt           │              │
└─────────────────────┘              │
                                     │
┌─────────────────────┐              │
│   Conversation      │              │
├─────────────────────┤              │
│ id (UUID)           │              │
│ imessageId (unique) │              │
│ displayName         │←─────────────┤
│ isGroup             │              │
│ lastMessageAt       │              │
│ messageCount        │              │
│ createdAt           │              │
│ updatedAt           │              │
└─────────┬───────────┘              │
          │                          │
          │ 1:N                      │
          ↓                          │
┌─────────────────────┐              │
│     Message         │              │
├─────────────────────┤              │
│ id (UUID)           │              │
│ conversationId (FK) │              │
│ imessageGuid        │              │
│ senderId            │──────────────┤
│ direction           │              │
│ messageType         │              │
│ text                │              │
│ sentAt              │              │
│ createdAt           │              │
└─────────┬───────────┘              │
          │                          │
          │ 1:N                      │
          ↓                          │
┌─────────────────────┐              │
│ GeneratedResponse   │              │
├─────────────────────┤              │
│ id (UUID)           │              │
│ messageId (FK)      │              │
│ prompt              │              │
│ response            │              │
│ confidence          │              │
│ modelVersion        │              │
│ status              │ (generating, completed, rejected)
│ generatedAt         │              │
│ rejectionReason     │              │
│ createdAt           │              │
└─────────────────────┘              │
                                     │
┌─────────────────────┐              │
│  TrainingSample     │              │
├─────────────────────┤              │
│ id (UUID)           │              │
│ inputContext        │              │
│ expectedOutput      │              │
│ source              │ (accepted, edited, manual)
│ quality (0.0-1.0)   │              │
│ createdAt           │              │
└─────────────────────┘              │
                                     │
┌─────────────────────┐              │
│   TrainingJob       │              │
├─────────────────────┤              │
│ id (UUID)           │              │
│ baseModel           │              │
│ status              │ (queued, training, completed, failed)
│ progress (0-100)    │              │
│ epochs              │              │
│ learningRate        │              │
│ sampleCount         │              │
│ outputPath          │              │
│ error               │              │
│ startedAt           │              │
│ completedAt         │              │
│ createdAt           │              │
└─────────────────────┘

Component Details

macOS App

Location: macos/

The Swift application runs as a background LaunchAgent:

  • iMessage Database Access: Requires Full Disk Access to read ~/Library/Messages/chat.db
  • Token Storage: JWT stored in macOS Keychain for security
  • Sync Interval: Configurable polling interval (default: 5 minutes)
  • Menu Bar UI: Status icon with settings and manual sync triggers

Installation:

./install.sh https://server-url.com

Server (NestJS)

Location: server/

Modules:

  • DevicesModule: Registration, verification, JWT auth
  • SyncModule: Message and contact sync endpoints
  • ConversationsModule: Browse conversations, build context
  • ResponsesModule: Orchestrate ML generation, store results
  • TrainingModule: Collect samples, manage training jobs

Key services:

  • DevicesService: Device lifecycle management
  • ConversationsService: Context building for prompts
  • ResponsesService: ML service integration

ML Service (FastAPI)

Location: ml-service/

Components:

  • LLMManager: Model loading via lilith-model-loader
  • RedisClient: Caching and job queue management
  • Endpoints: /generate, /training/*, /health

Model loading hierarchy:

  1. Environment variable ML_SERVICE_MODEL_PATH (direct file)
  2. Environment variable ML_SERVICE_MODEL_ID (manifest lookup)
  3. Default: ministral-3b-instruct
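
The precedence above amounts to a short lookup. The sketch below assumes a hypothetical `resolve_model` helper that returns the kind of source and its value; how the ML service actually hands the result to lilith-model-loader is not shown in this document.

```python
import os
from typing import Mapping, Tuple

DEFAULT_MODEL_ID = "ministral-3b-instruct"


def resolve_model(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return ("path", file) or ("manifest", model_id) by precedence."""
    path = env.get("ML_SERVICE_MODEL_PATH")
    if path:
        # 1. A direct GGUF file path wins over everything else.
        return ("path", path)
    model_id = env.get("ML_SERVICE_MODEL_ID")
    if model_id:
        # 2. Otherwise look the model up by ID in the manifest.
        return ("manifest", model_id)
    # 3. Fall back to the documented default model.
    return ("manifest", DEFAULT_MODEL_ID)
```

For example, `resolve_model({"ML_SERVICE_MODEL_ID": "phi-2"})` yields `("manifest", "phi-2")`, while setting both variables yields the direct path.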

Frontend (React)

Location: frontend/

Pages:

  • DevicesPage: Device management and registration codes
  • ConversationsPage: Browse synced conversations
  • ConversationDetailPage: View messages, generate responses
  • TrainingPage: Training sample review, job management

The frontend talks to the API through React Query hooks (@tanstack/react-query).

Configuration

Environment Variables

| Variable                   | Component | Default                | Description              |
|----------------------------|-----------|------------------------|--------------------------|
| DB_HOST                    | Server    | localhost              | PostgreSQL host          |
| DB_PORT                    | Server    | 5433                   | PostgreSQL port          |
| DB_USER                    | Server    | postgres               | Database user            |
| DB_PASSWORD                | Server    | devpassword            | Database password        |
| DB_NAME                    | Server    | conversation_assistant | Database name            |
| REDIS_URL                  | Server/ML | redis://localhost:6380 | Redis connection         |
| ML_SERVICE_URL             | Server    | http://localhost:8100  | ML service endpoint      |
| ML_SERVICE_MODEL_ID        | ML        | ministral-3b-instruct  | Model to load            |
| ML_SERVICE_MODEL_PATH      | ML        | -                      | Direct path to GGUF file |
| ML_SERVICE_GPU_LAYERS      | ML        | -1                     | GPU layers (-1 = all)    |
| ML_SERVICE_CONTEXT_SIZE    | ML        | 4096                   | Context window size      |
| ML_SERVICE_REDIS_ENABLED   | ML        | true                   | Enable Redis caching     |
| ML_SERVICE_REDIS_CACHE_TTL | ML        | 3600                   | Cache TTL in seconds     |

Redis Keys

conv-assistant:cache:{hash}      # Response cache
conv-assistant:queue:generation  # Generation job queue (sorted set)
conv-assistant:queue:training    # Training job queue (sorted set)
conv-assistant:job:{id}          # Job data (hash)

Prompt Format

Prompts sent to the ML service follow a conversation format:

Them: Hey, how's it going?
Me: Pretty good, just working on some code
Them: Nice! What are you building?
Me:

The model generates the continuation after the trailing Me: line. Stop sequences (\nThem:, \nMe:, \n\n) end generation before the model starts writing the next turn.
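
A minimal sketch of building this prompt and applying the stop sequences is shown below. The helper names `build_prompt` and `truncate_at_stop` are hypothetical, as is the `(direction, text)` tuple shape. In practice the inference library would normally be given the stop sequences directly, so the truncation helper only demonstrates their effect.

```python
from typing import List, Tuple

STOP_SEQUENCES = ("\nThem:", "\nMe:", "\n\n")


def build_prompt(history: List[Tuple[str, str]], max_history: int = 10) -> str:
    """Render (direction, text) pairs as Them:/Me: lines ending in 'Me:'."""
    recent = history[-max_history:] if max_history > 0 else []
    lines = [
        f"{'Me' if direction == 'outgoing' else 'Them'}: {text}"
        for direction, text in recent
    ]
    lines.append("Me:")  # the model continues from here
    return "\n".join(lines)


def truncate_at_stop(completion: str) -> str:
    """Cut the completion at the earliest stop sequence, if any."""
    cut = len(completion)
    for stop in STOP_SEQUENCES:
        index = completion.find(stop)
        if index != -1:
            cut = min(cut, index)
    return completion[:cut].strip()
```

Without the stop sequences, a base model tends to keep writing both sides of the conversation; cutting at the first speaker label keeps only the single reply that was asked for.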

Security Considerations

  1. Device Authentication: 6-digit codes expire in 10 minutes
  2. JWT Tokens: Access tokens expire after 7 days
  3. Full Disk Access: Required to read the iMessage database; this permission grants the app broad filesystem access
  4. Keychain Storage: Tokens stored in macOS Keychain
  5. HTTPS: Required in production for API communication
  6. No Message Content Logging: Only metadata logged (timestamps, counts)

Scaling Considerations

Current Architecture (Single Instance)

  • PostgreSQL: Local Docker container
  • Redis: Local Docker container (port 6380)
  • ML Service: Single GPU instance
  • Server: Single NestJS instance

Production Scaling

  1. Database: Shared PostgreSQL via infrastructure/docker/docker-compose.databases.yml
  2. Redis: Shared Redis instance across services
  3. ML Service: Multiple instances with load balancing (GPU required per instance)
  4. Async Generation: Use /generate/async for non-blocking UI

Training Pipeline

Current State

Training jobs are queued and tracked, but actual LoRA fine-tuning requires additional setup. The following is already in place:

  1. Training data is saved as JSONL files
  2. Job progress is tracked in Redis
  3. Samples include quality weights from confidence scores
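
As an illustration of the JSONL export, the sketch below writes one JSON object per line and carries the quality score along as a weight. The record field names (`prompt`, `completion`, `weight`) and the function `write_jsonl` are assumptions; this document does not specify the actual file layout.

```python
import io
import json
from typing import IO, Iterable, Mapping


def write_jsonl(samples: Iterable[Mapping], out: IO[str]) -> int:
    """Write training samples as JSON Lines and return the number written."""
    count = 0
    for sample in samples:
        record = {
            "prompt": sample["inputContext"],
            "completion": sample["expectedOutput"],
            "weight": sample["quality"],  # 0.0-1.0 quality score
        }
        # ensure_ascii=False keeps emoji and non-Latin text readable.
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    write_jsonl(
        [{"inputContext": "Them: Hi\nMe:", "expectedOutput": "Hello!", "quality": 0.85}],
        buffer,
    )
    print(buffer.getvalue(), end="")
```

JSON Lines suits this job because each sample is independent: files can be appended to, streamed, or split by line without parsing the whole dataset.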

Required for Full Training

pip install peft transformers accelerate

The ML service provides the job framework; running actual LoRA fine-tuning additionally requires integrating Hugging Face's peft library.

Directory Structure

conversation-assistant/
├── docker-compose.yml          # PostgreSQL + Redis for dev
├── .env.example                # Environment template
├── README.md                   # Quick start guide
├── LOGGING.md                  # Logging configuration
│
├── docs/
│   ├── ARCHITECTURE.md         # This file
│   ├── API.md                  # API reference
│   └── DEVELOPMENT.md          # Development guide
│
├── shared/                     # TypeScript types
│   ├── package.json
│   └── src/index.ts            # Re-exports from @lilith/types
│
├── server/                     # NestJS backend
│   ├── package.json
│   ├── tsconfig.json
│   ├── nest-cli.json
│   └── src/
│       ├── main.ts             # Entry point
│       ├── app.module.ts       # Root module
│       ├── data-source.ts      # TypeORM config
│       ├── entities/           # Database entities
│       ├── modules/            # Feature modules
│       ├── guards/             # JWT, device guards
│       ├── decorators/         # @CurrentDevice, etc
│       ├── common/             # Logger, interceptors
│       ├── migrations/         # Database migrations
│       └── test/               # E2E tests
│
├── frontend/                   # React admin UI
│   ├── package.json
│   ├── vite.config.ts
│   ├── vitest.config.ts
│   └── src/
│       ├── main.tsx
│       ├── App.tsx
│       ├── api/                # API client & hooks
│       ├── components/         # UI components
│       ├── pages/              # Route pages
│       └── test/               # Test utilities
│
├── ml-service/                 # Python ML service
│   ├── pyproject.toml
│   └── src/
│       ├── main.py             # FastAPI app
│       ├── llm.py              # LLM manager
│       ├── redis_client.py     # Redis integration
│       ├── models.py           # Pydantic models
│       ├── config.py           # Settings
│       └── logging_config.py   # Structured logging
│
└── macos/                      # Swift macOS app
    ├── Package.swift           # Swift package manifest
    ├── install.sh              # Installation script
    ├── uninstall.sh            # Removal script
    ├── deploy-remote.sh        # Remote deployment
    ├── INSTALL.md              # Installation guide
    ├── DEPLOYMENT.md           # Deployment guide
    └── Sources/                # Swift source code