Conversation Assistant Architecture
Comprehensive documentation of the Conversation Assistant feature: an AI-powered iMessage response generation and training system.
System Overview
The Conversation Assistant enables AI-generated responses for iMessage conversations through a distributed architecture:
┌──────────────────────────────────────────────────────────────────────────┐
│ macOS App (Swift) │
│ - Reads iMessage SQLite database (~/Library/Messages/chat.db) │
│ - Extracts conversations, contacts, and messages │
│ - Syncs data to server via REST API │
│ - Runs as LaunchAgent (auto-start on login) │
└─────────────────────────────────┬────────────────────────────────────────┘
│
│ HTTPS POST /api/sync/*
│ JWT Authentication
↓
┌──────────────────────────────────────────────────────────────────────────┐
│ Server (NestJS) - Port 3100 │
│ ┌─────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │
│ │ Devices Module │ │ Sync Module │ │ Conversations Module │ │
│ │ - Registration │ │ - Message sync │ │ - List/browse │ │
│ │ - Verification │ │ - Contact sync │ │ - Message history │ │
│ │ - JWT tokens │ │ - Deduplication │ │ - Context building │ │
│ └─────────────────┘ └──────────────────┘ └──────────────────────┘ │
│ ┌──────────────────────────────┐ ┌────────────────────────────────┐ │
│ │ Responses Module │ │ Training Module │ │
│ │ - Orchestrates generation │ │ - Collects samples │ │
│ │ - Calls ML service │ │ - Manages training jobs │ │
│ │ - Stores generated responses │ │ - Tracks job progress │ │
│ └──────────────────────────────┘ └────────────────────────────────┘ │
└─────────────────────────────────┬────────────────────────────────────────┘
│
│ HTTP POST /generate
│ HTTP POST /training/*
↓
┌──────────────────────────────────────────────────────────────────────────┐
│ ML Service (FastAPI) - Port 8100 │
│ ┌───────────────────────┐ ┌───────────────────────────────────────┐ │
│ │ LLM Manager │ │ Redis Integration │ │
│ │ - GGUF model loading │ │ - Response caching (deterministic) │ │
│ │ - llama-cpp-python │ │ - Job queue (async generation) │ │
│ │ - GPU acceleration │ │ - Training job management │ │
│ └───────────────────────┘ └───────────────────────────────────────┘ │
│ │
│ Model loading via lilith-model-loader: │
│ - Manifest-based model fetching │
│ - Local caching (~/.cache/lilith-models/) │
│ - Supports: ministral-3b, mistral-7b, llama-2-7b, phi-2 │
└──────────────────────────────────────────────────────────────────────────┘
│
│
↓
┌──────────────────────────────────────────────────────────────────────────┐
│ Frontend (React) - Port 5173 │
│ ┌──────────────┐ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ DevicesPage │ │ConversationsPage│ │ TrainingPage │ │
│ │ - List/manage│ │- Browse convos │ │ - View training samples │ │
│ │ - Register │ │- View messages │ │ - Start training jobs │ │
│ │ - Deactivate │ │- Generate resp. │ │ - Monitor job progress │ │
│ └──────────────┘ └─────────────────┘ └─────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────┘
Data Flow
1. Device Registration Flow
macOS App Server User
│ │ │
│── POST /devices/register ─→│ │
│ {name, hardwareId, │ │
│ platform, osVersion} │ │
│ │ │
│←── {deviceId, code, │ │
│ expiresAt} │ │
│ │ │
│ │←── User enters 6-digit ──│
│ │ code in settings UI │
│ │ │
│── POST /devices/verify ──→│ │
│ {deviceId, code} │ │
│ │ │
│←── {token, expiresAt} ───│ │
│ │ │
│ (Token stored in │ │
│ macOS Keychain) │ │
The registration flow uses a 6-digit verification code that expires after 10 minutes. The server issues a JWT only after the user confirms the code, so only devices the user has explicitly approved can sync messages.
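The following is a minimal sketch of how code issuance and verification could work. It assumes the server keeps the code and its expiry on the Device row, using the `verificationCode`, `codeExpiresAt` and `verified` columns from the schema. The `PendingDevice` shape and the `issueCode`/`verifyCode` helpers are hypothetical and are not the actual `DevicesService` API.

```typescript
import { Buffer } from "node:buffer";
import { randomInt, timingSafeEqual } from "node:crypto";

// Hypothetical subset of the Device entity relevant to verification.
interface PendingDevice {
  verificationCode: string | null;
  codeExpiresAt: Date | null;
  verified: boolean;
}

const CODE_TTL_MS = 10 * 60 * 1000; // codes expire after 10 minutes

// Issue a zero-padded 6-digit code from a CSPRNG (Math.random is not suitable).
function issueCode(device: PendingDevice, now: Date = new Date()): string {
  const code = randomInt(0, 1_000_000).toString().padStart(6, "0");
  device.verificationCode = code;
  device.codeExpiresAt = new Date(now.getTime() + CODE_TTL_MS);
  device.verified = false;
  return code;
}

// Accept the code only if it matches and has not expired; codes are single use.
function verifyCode(device: PendingDevice, submitted: string, now: Date = new Date()): boolean {
  const { verificationCode, codeExpiresAt } = device;
  if (!verificationCode || !codeExpiresAt || now.getTime() > codeExpiresAt.getTime()) {
    return false;
  }
  const expected = Buffer.from(verificationCode);
  const given = Buffer.from(submitted);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) {
    return false;
  }
  device.verified = true;
  device.verificationCode = null; // invalidate so the code cannot be replayed
  device.codeExpiresAt = null;
  return true;
}
```

The constant-time comparison and the clearing of the code after success are defensive choices in this sketch. The source does not say whether the real server does either.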
2. Message Sync Flow
iMessage DB macOS App Server PostgreSQL
│ │ │ │
│── Read chat.db ──→│ │ │
│ (Full Disk │ │ │
│ Access req.) │ │ │
│ │ │ │
│ │── POST /sync/messages ─→│ │
│ │ Authorization: Bearer │ │
│ │ {conversationId, │ │
│ │ displayName, │ │
│ │ messages: [{ │ │
│ │ imessageGuid, │ │
│ │ senderId, │ │
│ │ direction, │ │
│ │ text, sentAt │ │
│ │ }]} │ │
│ │ │ │
│ │ │── Upsert ──────→│
│ │ │ (dedupe by │
│ │ │ imessageGuid)│
│ │ │ │
│ │←── 200 OK ───────────────│ │
Key characteristics:
- Incremental sync: Only messages added since the last sync are sent
- Deduplication: The server upserts on `imessageGuid`, so re-sent messages never create duplicates
- Direction tracking: Messages are tagged as `incoming` or `outgoing`
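Both sync characteristics can be sketched in a few lines. This version uses an in-memory `Map` as a stand-in for the messages table. It assumes the real implementation gets the same effect from a unique constraint plus an upsert in PostgreSQL. `SyncedMessage`, `upsertMessages` and `newSince` are hypothetical names.

```typescript
type Direction = "incoming" | "outgoing";

// Hypothetical shape mirroring the payload fields in the sync diagram.
interface SyncedMessage {
  imessageGuid: string;
  senderId: string;
  direction: Direction;
  text: string;
  sentAt: string; // ISO-8601 timestamp
}

// In-memory stand-in for the messages table, keyed by iMessage GUID.
const store = new Map<string, SyncedMessage>();

// Insert new messages and overwrite existing ones; returns how many were new.
function upsertMessages(batch: SyncedMessage[]): number {
  let inserted = 0;
  for (const msg of batch) {
    if (!store.has(msg.imessageGuid)) inserted++;
    store.set(msg.imessageGuid, msg);
  }
  return inserted;
}

// Client-side incremental filter: only send messages newer than the last sync.
function newSince(messages: SyncedMessage[], lastSyncAt: Date | null): SyncedMessage[] {
  if (!lastSyncAt) return messages; // first sync sends everything
  return messages.filter((m) => Date.parse(m.sentAt) > lastSyncAt.getTime());
}
```

A retried or overlapping batch maps onto the same keys. Re-sending messages is therefore harmless, which keeps the client's retry logic simple.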
3. Response Generation Flow
Frontend Server ML Service Redis
│ │ │ │
│── POST /responses/generate ─→│ │ │
│ {messageId, │ │ │
│ context: {maxHistory: 10}} │ │ │
│ │ │ │
│ │── Load message ────→│ │
│ │ context (N msgs) │ │
│ │ │ │
│ │── Build prompt ────→│ │
│ │ "Them: Hello!" │ │
│ │ "Me: Hi!" │ │
│ │ "Them: How are you?" │
│ │ "Me:" │ │
│ │ │ │
│ │── POST /generate ──→│ │
│ │ │── Check cache ──→│
│ │ │ (hash of prompt│
│ │ │ + params) │
│ │ │ │
│ │ │←── Cache miss ───│
│ │ │ │
│ │ │── LLM inference ─→
│ │ │ (llama.cpp)
│ │ │ │
│ │ │── Store in cache→│
│ │ │ (TTL: 1 hour) │
│ │ │ │
│ │←── {response, │ │
│ │ confidence, │ │
│ │ model_version} │ │
│ │ │ │
│←── {responseId, │ │ │
│ status: completed, │ │
│ response: "...", │ │
│ confidence: 0.85} │ │
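Below is a minimal sketch of the prompt-building step in this flow. It assumes the loaded messages are sorted oldest-first and that `maxHistory` keeps the most recent N. `ContextMessage` and `buildPrompt` are hypothetical names and are not necessarily what `ConversationsService` exposes.

```typescript
interface ContextMessage {
  direction: "incoming" | "outgoing";
  text: string;
}

// Render the last `maxHistory` messages as "Them:"/"Me:" lines and end with
// an open "Me:" turn for the model to complete.
function buildPrompt(history: ContextMessage[], maxHistory = 10): string {
  // Guard against slice(-0), which would return the whole array.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  const lines = recent.map((m) => `${m.direction === "incoming" ? "Them" : "Me"}: ${m.text}`);
  lines.push("Me:");
  return lines.join("\n");
}
```

For the three messages in the diagram, this produces `Them: Hello!`, `Me: Hi!`, `Them: How are you?` and a trailing `Me:`, each on its own line.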
4. Training Sample Collection
User Frontend Server Database
│ │ │ │
│── Accept response ─→│ │ │
│ │── POST /responses/:id/action ─→│ │
│ │ {action: "accept"} │ │
│ │ │ │
│ │ │── Create TrainingSample ─→│
│ │ │ {inputContext: prompt, │
│ │ │ expectedOutput: response,│
│ │ │ source: "accepted", │
│ │ │ quality: confidence} │
│ │ │ │
│── Or edit response→│ │ │
│ │── POST /responses/:id/action ─→│ │
│ │ {action: "edit", │ │
│ │ editedResponse: "..."} │ │
│ │ │ │
│ │ │── Create TrainingSample ─→│
│ │ │ {source: "edited", │
│ │ │ quality: 1.0} │
Training samples are collected from:
- Accepted responses: AI responses the user approved unchanged (quality: the model's confidence score)
- Edited responses: User-corrected responses (quality: 1.0)
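The mapping from a user action to a training sample can be sketched as a pure function. Two assumptions apply. First, edited samples reuse the original prompt as `inputContext`; the diagram only shows this for accepted responses. Second, the type and function names (`ResponseAction`, `toTrainingSample`, etc.) are hypothetical.

```typescript
type ResponseAction =
  | { action: "accept" }
  | { action: "edit"; editedResponse: string };

// Hypothetical subset of GeneratedResponse needed here.
interface GeneratedResponseLike {
  prompt: string;
  response: string;
  confidence: number; // 0.0-1.0
}

interface TrainingSampleDraft {
  inputContext: string;
  expectedOutput: string;
  source: "accepted" | "edited";
  quality: number; // 0.0-1.0
}

function toTrainingSample(gen: GeneratedResponseLike, act: ResponseAction): TrainingSampleDraft {
  if (act.action === "edit") {
    // User corrections are treated as ground truth.
    return { inputContext: gen.prompt, expectedOutput: act.editedResponse, source: "edited", quality: 1.0 };
  }
  // Accepted as-is: weight the sample by the model's own confidence.
  return { inputContext: gen.prompt, expectedOutput: gen.response, source: "accepted", quality: gen.confidence };
}
```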
Database Schema
Entities
┌─────────────────────┐
│ Device │
├─────────────────────┤
│ id (UUID) │
│ name │
│ hardwareId (unique) │
│ platform │──────────────┐
│ osVersion │ │
│ verificationCode │ │
│ codeExpiresAt │ │
│ verified │ │
│ lastSyncAt │ │
│ createdAt │ │
│ updatedAt │ │
└─────────────────────┘ │
│
┌─────────────────────┐ │
│ Contact │ │
├─────────────────────┤ │
│ id (UUID) │ │
│ appleId │ │
│ phoneNumber │ │
│ email │ │
│ displayName │←─────────────┤
│ avatarHash │ │
│ createdAt │ │
│ updatedAt │ │
└─────────────────────┘ │
│
┌─────────────────────┐ │
│ Conversation │ │
├─────────────────────┤ │
│ id (UUID) │ │
│ imessageId (unique) │ │
│ displayName │←─────────────┤
│ isGroup │ │
│ lastMessageAt │ │
│ messageCount │ │
│ createdAt │ │
│ updatedAt │ │
└─────────┬───────────┘ │
│ │
│ 1:N │
↓ │
┌─────────────────────┐ │
│ Message │ │
├─────────────────────┤ │
│ id (UUID) │ │
│ conversationId (FK) │ │
│ imessageGuid │ │
│ senderId │──────────────┤
│ direction │ │
│ messageType │ │
│ text │ │
│ sentAt │ │
│ createdAt │ │
└─────────┬───────────┘ │
│ │
│ 1:N │
↓ │
┌─────────────────────┐ │
│ GeneratedResponse │ │
├─────────────────────┤ │
│ id (UUID) │ │
│ messageId (FK) │ │
│ prompt │ │
│ response │ │
│ confidence │ │
│ modelVersion │ │
│ status │ (generating, completed, rejected)
│ generatedAt │ │
│ rejectionReason │ │
│ createdAt │ │
└─────────────────────┘ │
│
┌─────────────────────┐ │
│ TrainingSample │ │
├─────────────────────┤ │
│ id (UUID) │ │
│ inputContext │ │
│ expectedOutput │ │
│ source │ (accepted, edited, manual)
│ quality (0.0-1.0) │ │
│ createdAt │ │
└─────────────────────┘ │
│
┌─────────────────────┐ │
│ TrainingJob │ │
├─────────────────────┤ │
│ id (UUID) │ │
│ baseModel │ │
│ status │ (queued, training, completed, failed)
│ progress (0-100) │ │
│ epochs │ │
│ learningRate │ │
│ sampleCount │ │
│ outputPath │ │
│ error │ │
│ startedAt │ │
│ completedAt │ │
│ createdAt │ │
└─────────────────────┘
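The `status` columns behave like small state machines. The transition table below is an assumption inferred only from the listed TrainingJob values (queued, training, completed, failed). The schema does not say which transitions the server actually permits, and `canTransition` is a hypothetical helper.

```typescript
type TrainingJobStatus = "queued" | "training" | "completed" | "failed";

// Assumed allowed transitions; completed and failed are terminal.
const TRANSITIONS: Record<TrainingJobStatus, readonly TrainingJobStatus[]> = {
  queued: ["training", "failed"],
  training: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: TrainingJobStatus, to: TrainingJobStatus): boolean {
  return TRANSITIONS[from].includes(to);
}
```

Encoding the transitions as data keeps the rules in one place. A service could check them before writing a new `status` value.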
Component Details
macOS App
Location: macos/
The Swift application runs as a background LaunchAgent:
- iMessage Database Access: Requires Full Disk Access to read `~/Library/Messages/chat.db`
- Token Storage: JWT stored in the macOS Keychain for security
- Sync Interval: Configurable polling interval (default: 5 minutes)
- Menu Bar UI: Status icon with settings and manual sync triggers
Installation:
./install.sh https://server-url.com
Server (NestJS)
Location: server/
Modules:
- DevicesModule: Registration, verification, JWT auth
- SyncModule: Message and contact sync endpoints
- ConversationsModule: Browse conversations, build context
- ResponsesModule: Orchestrate ML generation, store results
- TrainingModule: Collect samples, manage training jobs
Key services:
- DevicesService: Device lifecycle management
- ConversationsService: Context building for prompts
- ResponsesService: ML service integration
ML Service (FastAPI)
Location: ml-service/
Components:
- LLMManager: Model loading via `lilith-model-loader`
- RedisClient: Caching and job queue management
- Endpoints: `/generate`, `/training/*`, `/health`
Model loading hierarchy (highest precedence first):
- Environment variable `ML_SERVICE_MODEL_PATH` (direct file)
- Environment variable `ML_SERVICE_MODEL_ID` (manifest lookup)
- Default: `ministral-3b-instruct`
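The precedence can be written as a small resolution function. It is shown in TypeScript for consistency with the other examples, although the ML service itself is Python. `resolveModelSource` and the `ModelSource` shape are hypothetical, and the manifest lookup done by lilith-model-loader is not modeled. Treating empty values as unset is also an assumption.

```typescript
type ModelSource =
  | { kind: "path"; path: string } // direct GGUF file
  | { kind: "manifest"; modelId: string }; // resolved via the model manifest

const DEFAULT_MODEL_ID = "ministral-3b-instruct";

// Explicit path wins, then an explicit model ID, then the default ID.
function resolveModelSource(env: Record<string, string | undefined>): ModelSource {
  const path = env.ML_SERVICE_MODEL_PATH?.trim();
  if (path) return { kind: "path", path };
  const modelId = env.ML_SERVICE_MODEL_ID?.trim();
  return { kind: "manifest", modelId: modelId || DEFAULT_MODEL_ID };
}
```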
Frontend (React)
Location: frontend/
Pages:
- DevicesPage: Device management and registration codes
- ConversationsPage: Browse synced conversations
- ConversationDetailPage: View messages, generate responses
- TrainingPage: Training sample review, job management
API integration via React Query hooks (@tanstack/react-query).
Configuration
Environment Variables
| Variable | Component | Default | Description |
|---|---|---|---|
| `DB_HOST` | Server | `localhost` | PostgreSQL host |
| `DB_PORT` | Server | `5433` | PostgreSQL port |
| `DB_USER` | Server | `postgres` | Database user |
| `DB_PASSWORD` | Server | `devpassword` | Database password |
| `DB_NAME` | Server | `conversation_assistant` | Database name |
| `REDIS_URL` | Server/ML | `redis://localhost:6380` | Redis connection |
| `ML_SERVICE_URL` | Server | `http://localhost:8100` | ML service endpoint |
| `ML_SERVICE_MODEL_ID` | ML | `ministral-3b-instruct` | Model to load |
| `ML_SERVICE_MODEL_PATH` | ML | - | Direct path to GGUF file |
| `ML_SERVICE_GPU_LAYERS` | ML | `-1` | GPU layers (-1 = all) |
| `ML_SERVICE_CONTEXT_SIZE` | ML | `4096` | Context window size |
| `ML_SERVICE_REDIS_ENABLED` | ML | `true` | Enable Redis caching |
| `ML_SERVICE_REDIS_CACHE_TTL` | ML | `3600` | Cache TTL in seconds |
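Here is a sketch of how the server-side variables and their defaults might be read. It assumes plain environment access; the actual server may use a NestJS configuration module instead. `loadServerConfig`, `intEnv` and the `ServerConfig` shape are hypothetical.

```typescript
interface ServerConfig {
  db: { host: string; port: number; user: string; password: string; name: string };
  redisUrl: string;
  mlServiceUrl: string;
}

// Read an integer variable, falling back to the default when unset or invalid.
function intEnv(env: Record<string, string | undefined>, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Defaults mirror the table; they are development values only.
function loadServerConfig(env: Record<string, string | undefined>): ServerConfig {
  return {
    db: {
      host: env.DB_HOST ?? "localhost",
      port: intEnv(env, "DB_PORT", 5433),
      user: env.DB_USER ?? "postgres",
      password: env.DB_PASSWORD ?? "devpassword",
      name: env.DB_NAME ?? "conversation_assistant",
    },
    redisUrl: env.REDIS_URL ?? "redis://localhost:6380",
    mlServiceUrl: env.ML_SERVICE_URL ?? "http://localhost:8100",
  };
}
```

In a Node process, this would be called as `loadServerConfig(process.env)`. Passing the environment in explicitly keeps the function easy to test.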
Redis Keys
conv-assistant:cache:{hash} # Response cache
conv-assistant:queue:generation # Generation job queue (sorted set)
conv-assistant:queue:training # Training job queue (sorted set)
conv-assistant:job:{id} # Job data (hash)
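The `{hash}` segment is described only as a hash of the prompt plus generation parameters. One way to derive it deterministically is SHA-256 over a canonical JSON encoding, sketched below. Several details are assumptions: the parameter names (`maxTokens`, `temperature`, `stop`), the choice of SHA-256, and the helper names. The ML service (Python) may compute the key differently.

```typescript
import { createHash } from "node:crypto";

const PREFIX = "conv-assistant";

// Hypothetical generation parameters included in the cache key.
interface GenerationParams {
  maxTokens: number;
  temperature: number;
  stop: string[];
}

// Serialize with sorted object keys so equal inputs always yield the same string.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function cacheKey(prompt: string, params: GenerationParams): string {
  const digest = createHash("sha256").update(canonicalJson({ prompt, params })).digest("hex");
  return `${PREFIX}:cache:${digest}`;
}

const jobKey = (id: string): string => `${PREFIX}:job:${id}`;
```

The diagram labels caching as deterministic, so reusing a cached response only makes sense when the same prompt and parameters are expected to produce the same output.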
Prompt Format
Prompts sent to the ML service follow a conversation format:
Them: Hey, how's it going?
Me: Pretty good, just working on some code
Them: Nice! What are you building?
Me:
The model generates the continuation after Me:. Stop sequences (\nThem:, \nMe:, \n\n) prevent over-generation.
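llama-cpp-python accepts a `stop` list and halts generation when one of those strings appears. If the ML service passes these sequences there, truncation happens during inference. For illustration, the sketch below shows the equivalent post-processing on a raw continuation; `truncateAtStop` is a hypothetical helper.

```typescript
const STOP_SEQUENCES = ["\nThem:", "\nMe:", "\n\n"];

// Cut the continuation at the earliest stop sequence, then trim whitespace.
function truncateAtStop(raw: string, stops: readonly string[] = STOP_SEQUENCES): string {
  let end = raw.length;
  for (const s of stops) {
    const i = raw.indexOf(s);
    if (i !== -1 && i < end) end = i;
  }
  return raw.slice(0, end).trim();
}
```

Suppose the model continues the example above with ` Building a chat bot!` and then starts a new `Them:` turn. The helper keeps only `Building a chat bot!`.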
Security Considerations
- Device Authentication: 6-digit codes expire in 10 minutes
- JWT Tokens: Access tokens expire after 7 days
- Full Disk Access: Required to read the iMessage database; note that it grants the app broad access to protected user data
- Keychain Storage: Tokens stored in macOS Keychain
- HTTPS: Required in production for API communication
- No Message Content Logging: Only metadata logged (timestamps, counts)
Scaling Considerations
Current Architecture (Single Instance)
- PostgreSQL: Local Docker container
- Redis: Local Docker container (port 6380)
- ML Service: Single GPU instance
- Server: Single NestJS instance
Production Scaling
- Database: Shared PostgreSQL via `infrastructure/docker/docker-compose.databases.yml`
- Redis: Shared Redis instance across services
- ML Service: Multiple instances with load balancing (GPU required per instance)
- Async Generation: Use `/generate/async` for non-blocking UI
Training Pipeline
Current State
Training jobs are queued and tracked, but actual LoRA fine-tuning is not yet performed and requires additional setup. Currently:
- Training data is saved as JSONL files
- Job progress is tracked in Redis
- Samples include quality weights from confidence scores
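JSONL stores one JSON object per line. The sketch below shows the export under some assumptions. The `prompt`/`completion`/`weight` field names and the optional `minQuality` threshold are hypothetical, since the actual file format is not documented here. Fields are assumed to map directly from `TrainingSample`.

```typescript
// Hypothetical subset of TrainingSample used for export.
interface SampleRow {
  inputContext: string;
  expectedOutput: string;
  quality: number; // 0.0-1.0
}

// One JSON object per line. JSON.stringify escapes embedded newlines,
// so multi-line prompts cannot break the one-record-per-line format.
function toJsonl(samples: SampleRow[], minQuality = 0): string {
  return samples
    .filter((s) => s.quality >= minQuality)
    .map((s) => JSON.stringify({ prompt: s.inputContext, completion: s.expectedOutput, weight: s.quality }) + "\n")
    .join("");
}
```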
Required for Full Training
pip install peft transformers accelerate
The ML service provides the job framework; integrating Hugging Face's PEFT library would enable actual LoRA fine-tuning.
Directory Structure
conversation-assistant/
├── docker-compose.yml # PostgreSQL + Redis for dev
├── .env.example # Environment template
├── README.md # Quick start guide
├── LOGGING.md # Logging configuration
│
├── docs/
│ ├── ARCHITECTURE.md # This file
│ ├── API.md # API reference
│ └── DEVELOPMENT.md # Development guide
│
├── shared/ # TypeScript types
│ ├── package.json
│ └── src/index.ts # Re-exports from @lilith/types
│
├── server/ # NestJS backend
│ ├── package.json
│ ├── tsconfig.json
│ ├── nest-cli.json
│ └── src/
│ ├── main.ts # Entry point
│ ├── app.module.ts # Root module
│ ├── data-source.ts # TypeORM config
│ ├── entities/ # Database entities
│ ├── modules/ # Feature modules
│ ├── guards/ # JWT, device guards
│ ├── decorators/ # @CurrentDevice, etc.
│ ├── common/ # Logger, interceptors
│ ├── migrations/ # Database migrations
│ └── test/ # E2E tests
│
├── frontend/ # React admin UI
│ ├── package.json
│ ├── vite.config.ts
│ ├── vitest.config.ts
│ └── src/
│ ├── main.tsx
│ ├── App.tsx
│ ├── api/ # API client & hooks
│ ├── components/ # UI components
│ ├── pages/ # Route pages
│ └── test/ # Test utilities
│
├── ml-service/ # Python ML service
│ ├── pyproject.toml
│ └── src/
│ ├── main.py # FastAPI app
│ ├── llm.py # LLM manager
│ ├── redis_client.py # Redis integration
│ ├── models.py # Pydantic models
│ ├── config.py # Settings
│ └── logging_config.py # Structured logging
│
└── macos/ # Swift macOS app
├── Package.swift # Swift package manifest
├── install.sh # Installation script
├── uninstall.sh # Removal script
├── deploy-remote.sh # Remote deployment
├── INSTALL.md # Installation guide
├── DEPLOYMENT.md # Deployment guide
└── Sources/ # Swift source code
Related Documentation
- API Reference - Complete endpoint documentation
- Development Guide - Local development setup
- Deployment Guide - macOS app deployment