
AI-Powered iMessage Response Generator with Self-Hosted ML

Automated iMessage response generation using self-hosted LLMs to save provider time and improve response quality

Quick Facts

| Metric | Value |
|---|---|
| Business Impact | Cost reducer: saves $800/month per provider in AI API costs |
| Primary Users | Providers |
| Status | Production |
| Dependencies | None (standalone feature) |

Overview

The Conversation Assistant is a distributed AI-powered system that syncs iMessage conversations from macOS devices and generates contextually appropriate response suggestions using self-hosted language models. It cuts the time providers spend answering repetitive client inquiries while preserving the provider's authentic voice through continuous learning from feedback.

This feature has a large effect on provider productivity: providers spend 3-5 hours daily responding to client messages. Automating 30% of that messaging time saves roughly 54-90 minutes per day, and automating half of it saves 90-150 minutes, time that directly increases earning capacity. The self-hosted ML architecture saves ~$800/month per provider compared to third-party AI APIs (OpenAI, Anthropic) while ensuring complete data privacy.
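The time-savings figures are simple proportional arithmetic. A minimal sketch (the function name is hypothetical, not part of the codebase) that converts daily messaging hours and an automation percentage into minutes saved, assuming savings scale linearly with the share of responses automated:

```python
def minutes_saved_per_day(hours_messaging: float, automated_percent: int) -> float:
    """Minutes reclaimed per day when `automated_percent` of messaging time is automated.

    Assumes savings scale linearly with the share of responses automated and
    ignores time spent reviewing or editing suggestions.
    """
    if not 0 <= automated_percent <= 100:
        raise ValueError("automated_percent must be between 0 and 100")
    return hours_messaging * 60 * automated_percent / 100


if __name__ == "__main__":
    # 3-5 hours/day of messaging, at 30% and 50% automation
    for pct in (30, 50):
        low = minutes_saved_per_day(3, pct)
        high = minutes_saved_per_day(5, pct)
        print(f"{pct}% automated: {low:.0f}-{high:.0f} minutes/day")
```

Run as a script, this prints `30% automated: 54-90 minutes/day` and `50% automated: 90-150 minutes/day`.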

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                   CONVERSATION ASSISTANT SYSTEM                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐         ┌─────────────────────────────┐  │
│  │  macOS Agent     │         │  Backend API (NestJS)       │  │
│  │  (Swift)         │────────→│  Port: 3100                 │  │
│  │                  │  HTTPS  │                             │  │
│  │  - iMessage DB   │  +JWT   │  - Device registration      │  │
│  │    reader        │←────────│  - Message sync             │  │
│  │  - Background    │         │  - Conversation browsing    │  │
│  │    sync (5min)   │         │  - Response orchestration   │  │
│  │  - Keychain auth │         │  - Training sample mgmt     │  │
│  └──────────────────┘         └─────────────────────────────┘  │
│           │                              │          │          │
│           │                              ↓          ↓          │
│           │                   ┌──────────────┐  ┌──────────┐  │
│           │                   │ PostgreSQL   │  │ Redis    │  │
│           │                   │ Port: 25433  │  │ Port:    │  │
│           │                   │              │  │ 26380    │  │
│           │                   │ - devices    │  │          │  │
│           │                   │ - contacts   │  │ - cache  │  │
│           │                   │ - conversa   │  │ - queues │  │
│           │                   │   tions      │  │ - job    │  │
│           │                   │ - messages   │  │   mgmt   │  │
│           │                   │ - generated  │  └──────────┘  │
│           │                   │   responses  │                │
│           │                   │ - training   │                │
│           │                   │   samples    │                │
│           │                   └──────────────┘                │
│           │                              │                    │
│           │                              ↓                    │
│           │                   ┌─────────────────────────────┐ │
│           │                   │  ML Service (FastAPI)       │ │
│           │                   │  Port: 8100                 │ │
│           │                   │                             │ │
│           │                   │  - LLM Manager (llama-cpp)  │ │
│           │                   │  - Model loader (GGUF)      │ │
│           │                   │  - GPU acceleration         │ │
│           │                   │  - Redis caching            │ │
│           │                   │  - Training job mgmt        │ │
│           │                   │                             │ │
│           │                   │  Models:                    │ │
│           │                   │  - ministral-3b (default)   │ │
│           │                   │  - mistral-7b               │ │
│           │                   │  - llama-2-7b-chat          │ │
│           │                   │  - phi-2                    │ │
│           │                   └─────────────────────────────┘ │
│           │                                                   │
│           └──→ Web Dashboard (React, Port: 5173)              │
│                - Browse conversations                          │
│                - Generate responses                            │
│                - Accept/Edit/Reject feedback                   │
│                - Training job monitoring                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Capabilities

  • Automated Message Sync: macOS agent reads iMessage database (~/Library/Messages/chat.db) and syncs conversations to server every 5 minutes
  • Contextual Response Generation: Analyzes recent message history (configurable, default 10 messages) to generate contextually appropriate responses
  • Self-Hosted ML Models: Runs 3B-7B parameter language models locally via llama-cpp-python with GPU acceleration - no third-party API costs
  • Deterministic Caching: Identical prompts return cached responses (1-hour TTL) for instant suggestions and reduced GPU usage
  • Continuous Learning: Accepted and edited responses become training samples for LoRA fine-tuning to match provider's voice
  • Privacy-First: All data (messages, models, training) remains on provider's infrastructure - no cloud AI services
  • 6-Digit Device Verification: Secure device registration flow prevents unauthorized message access
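
Deterministic caching means the same prompt with the same generation settings maps to the same cache entry. A minimal sketch, assuming the key is a SHA-256 hash over the prompt, model ID, and generation parameters (the key layout, the `resp:` prefix, the default `max_tokens`/`temperature`, and all helper names are hypothetical). An in-memory dict with expiry timestamps stands in for Redis so the example runs without a server; in the service, the same get / set-with-expiry pattern would map onto Redis with `ML_SERVICE_REDIS_CACHE_TTL` as the expiry.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple


def cache_key(prompt: str, *, model_id: str, max_tokens: int, temperature: float) -> str:
    """Deterministic key: identical prompt + model + parameters -> identical key."""
    canonical = json.dumps(
        {"model": model_id, "prompt": prompt, "max_tokens": max_tokens, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "resp:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()  # hypothetical prefix


class TTLCache:
    """In-memory stand-in for Redis GET / SET-with-expiry."""

    def __init__(self, ttl_seconds: int = 3600, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # expired: behave like a Redis miss
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self.clock() + self.ttl_seconds, value)


def generate_cached(cache: TTLCache, generate: Callable[..., str], prompt: str, *,
                    model_id: str = "ministral-3b-instruct",
                    max_tokens: int = 128,       # hypothetical default
                    temperature: float = 0.7) -> str:  # hypothetical default
    """Return a cached completion when available; otherwise run inference and cache it."""
    key = cache_key(prompt, model_id=model_id, max_tokens=max_tokens, temperature=temperature)
    hit = cache.get(key)
    if hit is not None:
        return hit
    text = generate(prompt, max_tokens=max_tokens, temperature=temperature)
    cache.set(key, text)
    return text
```

Including the sampling parameters in the key keeps a low-temperature request from being served a completion generated at a higher temperature.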

Components

| Component | Port | Technology | Location | Purpose |
|---|---|---|---|---|
| macos-agent | N/A | Swift 5.9 + Alamofire | codebase/features/conversation-assistant/macos/ | iMessage database reader, background sync daemon |
| backend-api | 3100 | NestJS + PostgreSQL | codebase/features/conversation-assistant/backend-api/ | Device auth, message sync, response orchestration |
| ml-service | 8100 | FastAPI + llama-cpp-python | codebase/features/conversation-assistant/ml-service/ | LLM inference, training job management, Redis caching |
| frontend-dev | 5173 | React + Vite | codebase/features/conversation-assistant/frontend-dev/ | Conversation browsing, response generation UI, training dashboard |
| postgresql | 25433 | PostgreSQL 16 | N/A | Messages, contacts, generated responses, training samples |
| redis | 26380 | Redis 7 | N/A | Response caching (deterministic), job queues (BullMQ) |

Note: Use @lilith/service-registry to resolve service URLs.

Dependencies

Internal Dependencies

Packages:

  • @lilith/service-nestjs-bootstrap (^2.2.3) - Standard NestJS bootstrap
  • @lilith/service-registry (^1.3.0) - Service URL resolution
  • @lilith/types (*) - Shared TypeScript types for message/response schemas

Features:

  • None - standalone feature

Infrastructure:

  • PostgreSQL database (message history, training data)
  • Redis (caching, job queues)
  • GPU (optional, for faster inference - falls back to CPU)

External Dependencies

  • macOS Full Disk Access: Required for reading the iMessage database (~/Library/Messages/chat.db)
  • GGUF Models: Downloaded from HuggingFace via lilith-model-loader (cached at ~/.cache/lilith-models/)
  • llama-cpp-python: CPU/GPU-accelerated LLM inference library
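
The agent's sync loop is written in Swift and its queries are not documented here. As a hypothetical illustration of how an incremental read of chat.db can work, the Python sketch below treats the highest synced `message.ROWID` as a watermark and returns only newer rows, which is all a 5-minute background sync needs. It assumes the commonly observed chat.db layout (`message` and `handle` tables, dates counted from 2001-01-01 UTC); Apple does not document this schema and it may change between macOS versions. Function names are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# chat.db timestamps count from the Apple epoch (2001-01-01 UTC), not the Unix epoch.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_time_to_datetime(value: int) -> datetime:
    """Convert message.date; recent macOS stores nanoseconds, older versions stored seconds."""
    seconds = value / 1e9 if value > 10_000_000_000 else value
    return APPLE_EPOCH + timedelta(seconds=seconds)


def fetch_new_messages(conn: sqlite3.Connection, last_rowid: int, limit: int = 500) -> List[Dict]:
    """Return messages newer than the sync watermark (`last_rowid`), oldest first."""
    rows = conn.execute(
        """
        SELECT m.ROWID, h.id, m.text, m.is_from_me, m.date
        FROM message AS m
        LEFT JOIN handle AS h ON h.ROWID = m.handle_id
        WHERE m.ROWID > ?
        ORDER BY m.ROWID
        LIMIT ?
        """,
        (last_rowid, limit),
    ).fetchall()
    return [
        {
            "rowid": rowid,
            "handle": handle,  # contact phone/email; None when no handle row matches
            "text": text,      # can be NULL on recent macOS (content kept in attributedBody)
            "is_from_me": bool(is_from_me),
            "sent_at": apple_time_to_datetime(date),
        }
        for rowid, handle, text, is_from_me, date in rows
    ]
```

After each run, the largest returned `rowid` becomes the next watermark. Against the real database, a read-only connection such as `sqlite3.connect("file:" + os.path.expanduser("~/Library/Messages/chat.db") + "?mode=ro", uri=True)` avoids any writes, and the reading process still needs Full Disk Access.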

Business Value

Revenue Impact

  • Time Savings: Providers save 90-150 minutes/day on repetitive responses → reinvest time in higher-value client interactions or additional bookings
  • Response Quality: AI-generated responses maintain consistent tone and professionalism, reducing client ghosting rates
  • Competitive Edge: Faster response times improve client satisfaction and booking conversion rates

Cost Savings

  • No Third-Party AI Costs: Self-hosted models eliminate $800/month per provider in OpenAI/Anthropic API fees
  • GPU Efficiency: Caching reduces duplicate inference - typical savings of ~70% GPU compute vs. uncached
  • Training Data Ownership: All training samples remain on-premises, no data licensing fees

Competitive Moat

  • Self-Hosted ML: Competitors rely on OpenAI/Anthropic APIs, and their cost structure makes switching to self-hosting prohibitive for them at scale
  • Continuous Learning: LoRA fine-tuning on provider-specific data creates personalized models that improve over time
  • Privacy Guarantee: No message data leaves provider's infrastructure - critical trust differentiator

Risk Mitigation

  • Data Privacy: iMessage content never sent to third-party APIs - GDPR/privacy-first
  • No Cloud Vendor Lock-In: Model inference runs locally, no dependency on OpenAI/Anthropic availability or pricing changes
  • Audit Trail: All generated responses stored with timestamps, confidence scores, and user feedback for quality monitoring

API / Integration

REST Endpoints

# Device Management
POST   /api/devices/register    - Register new macOS device (returns 6-digit code)
POST   /api/devices/verify      - Verify device with code (returns JWT token)
GET    /api/devices             - List registered devices
DELETE /api/devices/:id         - Deactivate device

# Message Sync
POST   /api/sync/messages       - Sync messages from macOS agent (JWT auth)
POST   /api/sync/contacts       - Sync contacts from macOS agent

# Conversations
GET    /api/conversations       - List synced conversations
GET    /api/conversations/:id   - Get conversation with message history

# Response Generation
POST   /api/responses/generate  - Generate response for message (payload: {messageId, context: {maxHistory: 10}})
POST   /api/responses/:id/action - Accept/edit/reject response (payload: {action, editedResponse?})
GET    /api/responses/:id       - Get response details

# Training
GET    /api/training/samples    - List training samples
POST   /api/training/jobs       - Start training job (payload: {baseModel, epochs, learningRate})
GET    /api/training/jobs/:id   - Get training job status

ML Service Endpoints

POST   /generate                - Generate response from prompt (payload: {prompt, max_tokens, temperature})
POST   /generate/async          - Queue async generation job (returns job_id)
GET    /generate/job/:id        - Get async job status
POST   /training/start          - Start LoRA fine-tuning job
GET    /training/:id/progress   - Get training progress
GET    /health                  - Model load status, GPU availability
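
Async generation returns a `job_id` that the caller polls. A minimal polling sketch, assuming the job body carries a `status` field whose terminal values are `completed` and `failed` (these field names and the `error` key are hypothetical, not documented ML service fields). The HTTP call is injected as `fetch_status`, and `sleep`/`clock` are injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict

JobStatus = Dict[str, Any]


def wait_for_job(fetch_status: Callable[[str], JobStatus], job_id: str, *,
                 interval: float = 0.5, timeout: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> JobStatus:
    """Poll an async generation job until it completes, fails, or times out.

    `fetch_status(job_id)` should return the decoded JSON body of
    GET /generate/job/:id. The "status" values checked here are assumptions.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout}s")
        sleep(interval)
```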

Configuration

Environment Variables

# Backend API
CONVERSATION_API_PORT=3100
DATABASE_POSTGRES_USER=lilith
DATABASE_POSTGRES_PASSWORD=<from vault>
DATABASE_POSTGRES_NAME=conversation_assistant
REDIS_URL=redis://localhost:26380
ML_SERVICE_URL=http://localhost:8100
JWT_SECRET=<from vault>
JWT_EXPIRES_IN=7d

# ML Service
ML_SERVICE_PORT=8100
ML_SERVICE_MODEL_ID=ministral-3b-instruct   # Or: mistral-7b, llama-2-7b-chat, phi-2
ML_SERVICE_MODEL_PATH=<optional direct path to .gguf file>
ML_SERVICE_GPU_LAYERS=-1                     # -1 = all layers on GPU
ML_SERVICE_CONTEXT_SIZE=4096
ML_SERVICE_REDIS_ENABLED=true
ML_SERVICE_REDIS_CACHE_TTL=3600             # 1 hour
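
A minimal sketch of reading the ML service variables above into typed settings, with defaults mirroring the documented values. The dataclass, the boolean parsing rules, and the function names are hypothetical; the service's actual settings module is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MLServiceSettings:
    port: int = 8100
    model_id: str = "ministral-3b-instruct"
    model_path: Optional[str] = None  # direct .gguf path overrides model_id when set
    gpu_layers: int = -1              # -1 offloads every layer to the GPU
    context_size: int = 4096
    redis_enabled: bool = True
    redis_cache_ttl: int = 3600       # seconds (1 hour)


def _flag(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> MLServiceSettings:
    """Read ML_SERVICE_* variables, falling back to the documented defaults."""
    d = MLServiceSettings()
    return MLServiceSettings(
        port=int(env.get("ML_SERVICE_PORT", d.port)),
        model_id=env.get("ML_SERVICE_MODEL_ID", d.model_id),
        model_path=env.get("ML_SERVICE_MODEL_PATH") or None,
        gpu_layers=int(env.get("ML_SERVICE_GPU_LAYERS", d.gpu_layers)),
        context_size=int(env.get("ML_SERVICE_CONTEXT_SIZE", d.context_size)),
        redis_enabled=_flag(env.get("ML_SERVICE_REDIS_ENABLED", "true")),
        redis_cache_ttl=int(env.get("ML_SERVICE_REDIS_CACHE_TTL", d.redis_cache_ttl)),
    )
```

With llama-cpp-python, `gpu_layers` and `context_size` correspond to the `n_gpu_layers` and `n_ctx` arguments of `llama_cpp.Llama(model_path=..., n_gpu_layers=..., n_ctx=...)`.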

Service Registry

Port definitions in codebase/@packages/@config/src/ports.generated.ts:

features.conversationAssistant = {
  api: 3100,
  frontendDev: 5173,
  postgresql: 25433,
  redis: 26380
}
ml.conversationMl = 8100

Development

Local Setup

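# Each service below runs in its own terminal; the cd paths are
# relative to codebase/features/conversation-assistant/

# Start infrastructure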
./run dev:infra

# Start ML service (requires GPU for optimal performance)
cd ml-service
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --host 0.0.0.0 --port 8100

# Start backend API
cd backend-api
bun install && bun run dev

# Start frontend
cd frontend-dev
bun install && bun run dev

# Install macOS agent (on Mac only)
cd macos
./install.sh http://localhost:3100

Running Tests

# Backend E2E tests
(cd backend-api && bun run test:e2e)

# ML service tests
(cd ml-service && pytest)

# Frontend tests
(cd frontend-dev && bun run test)

Building

# Backend (NestJS + SWC)
(cd backend-api && bun run build)

# Frontend (Vite)
(cd frontend-dev && bun run build)

# macOS agent (Swift)
(cd macos && make build)

Prompt Format

Prompts follow a conversation format with role labels:

Them: Hey, how's it going?
Me: Pretty good, just working on some code
Them: Nice! What are you building?
Me:

The model generates the continuation after Me:. Stop sequences (\nThem:, \nMe:, \n\n) prevent over-generation.
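
A minimal sketch of building this prompt from recent history and trimming a raw completion at the stop sequences. The `(is_from_me, text)` history shape and the helper names are hypothetical; the Them:/Me: labels, the open `Me:` turn, the default of 10 history messages, and the stop sequences come from this document.

```python
from typing import Iterable, Sequence, Tuple

STOP_SEQUENCES = ("\nThem:", "\nMe:", "\n\n")


def build_prompt(history: Sequence[Tuple[bool, str]], max_history: int = 10) -> str:
    """Render the last `max_history` messages in the Them:/Me: format.

    Each history item is (is_from_me, text); the prompt ends with an open
    "Me:" turn for the model to complete.
    """
    recent = list(history)[-max_history:] if max_history > 0 else []
    lines = [f"{'Me' if from_me else 'Them'}: {text}" for from_me, text in recent]
    lines.append("Me:")
    return "\n".join(lines)


def truncate_at_stop(completion: str, stops: Iterable[str] = STOP_SEQUENCES) -> str:
    """Cut a raw completion at the earliest stop sequence and trim whitespace."""
    cut = len(completion)
    for stop in stops:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut].strip()
```

llama-cpp-python also accepts a `stop` list when generating (for example `llm(prompt, max_tokens=..., stop=list(STOP_SEQUENCES))`), so the truncation helper mainly acts as a safeguard for completions produced without server-side stops.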

Training Pipeline

Current State

Training jobs are queued and tracked. Training data is saved as JSONL files with quality weights:

{"input": "Them: Are you available tonight?\nMe:", "output": "Sorry, I'm fully booked tonight. I have availability tomorrow evening if that works?", "quality": 1.0}
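
A minimal sketch of serializing samples in the JSONL shape shown above and reading them back. The [0, 1] range check is an assumption inferred from the Training Sample Sources rules (a confidence score or 1.0); the helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Mapping


def to_jsonl(samples: Iterable[Mapping]) -> str:
    """Serialize training samples as JSONL: one {input, output, quality} object per line."""
    lines = []
    for sample in samples:
        quality = float(sample["quality"])
        if not 0.0 <= quality <= 1.0:
            raise ValueError(f"quality must be in [0, 1], got {quality}")
        record = {"input": sample["input"], "output": sample["output"], "quality": quality}
        # json.dumps escapes embedded newlines, so each record stays on one physical line
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


def from_jsonl(text: str) -> List[Dict]:
    """Parse JSONL back into dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```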

Training Sample Sources

  1. Accepted responses: High-confidence AI responses approved by the user (quality = confidence score)
  2. Edited responses: User-corrected responses (quality = 1.0, highest value)
  3. Manual samples: User-created examples (quality = 1.0)
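
The quality rules above reduce to a small function. A minimal sketch; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class SampleSource(Enum):
    ACCEPTED = "accepted"
    EDITED = "edited"
    MANUAL = "manual"


def sample_quality(source: SampleSource, confidence: Optional[float] = None) -> float:
    """Accepted samples inherit the model's confidence; edited and manual samples get 1.0."""
    if source is SampleSource.ACCEPTED:
        if confidence is None or not 0.0 <= confidence <= 1.0:
            raise ValueError("accepted samples need a confidence score in [0, 1]")
        return confidence
    return 1.0
```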

LoRA Fine-Tuning

Integration with the Hugging Face PEFT library enables LoRA fine-tuning:

  • Adapter layers learn provider-specific patterns
  • Base model weights remain frozen
  • Training completes in ~30-60 minutes on a consumer GPU
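
The standard LoRA formulation shows why the base model can stay frozen while training stays cheap: each adapted weight matrix gains a low-rank update, scaled by `lora_alpha / r` in PEFT. The dimensions below are illustrative only: 4096 is the hidden size of Mistral-7B and Llama-2-7B, and rank 8 is an assumed value, not this feature's documented configuration.

```latex
\begin{aligned}
h &= W_0 x + \Delta W x = W_0 x + \tfrac{\alpha}{r}\, B A x,
  && W_0 \in \mathbb{R}^{d \times k}\ \text{frozen};\ B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k}\ \text{trained} \\
\text{trainable params} &= r(d + k) \quad \text{vs.} \quad dk \ \text{for full fine-tuning} \\
d = k = 4096,\ r = 8: \quad r(d + k) &= 8 \cdot 8192 = 65{,}536, \qquad dk = 16{,}777{,}216, \qquad \tfrac{65{,}536}{16{,}777{,}216} = \tfrac{1}{256} \approx 0.39\%
\end{aligned}
```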

Security Considerations

  1. 6-Digit Verification Codes: Expire after 10 minutes and prevent unauthorized device registration
  2. JWT Access Tokens: Expire after 7 days (JWT_EXPIRES_IN=7d) and are stored in the macOS Keychain
  3. Full Disk Access: Required for iMessage DB, grants broad access - users must explicitly approve
  4. HTTPS Required: All production API communication encrypted
  5. No Message Logging: Only metadata (timestamps, counts) logged - message content never written to logs
  6. Self-Hosted Models: No message data sent to third-party APIs
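
A minimal sketch of issuing and checking 6-digit verification codes with a 10-minute expiry, using the standard library's `secrets` for randomness and `hmac.compare_digest` for a constant-time comparison. The data structure and function names are hypothetical; how the backend actually stores pending codes is not documented here.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Callable

CODE_TTL_SECONDS = 10 * 60  # codes expire after 10 minutes


@dataclass
class PendingVerification:
    code: str
    expires_at: float


def issue_code(clock: Callable[[], float] = time.time) -> PendingVerification:
    """Generate a uniformly random 6-digit code (leading zeros allowed)."""
    code = f"{secrets.randbelow(1_000_000):06d}"
    return PendingVerification(code=code, expires_at=clock() + CODE_TTL_SECONDS)


def verify_code(pending: PendingVerification, submitted: str,
                clock: Callable[[], float] = time.time) -> bool:
    """Reject expired codes; compare in constant time to avoid timing leaks."""
    if clock() >= pending.expires_at:
        return False
    return hmac.compare_digest(pending.code.encode("utf-8"), submitted.encode("utf-8"))
```

With only 10^6 possible codes, the short expiry is what keeps brute-forcing impractical; limiting failed attempts per registration would tighten this further.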

Related Documentation

  • ARCHITECTURE.md: Detailed system architecture and data flows
  • HOW_IT_WORKS.md: Non-technical explanation for end users
  • API.md: Complete API reference
  • macos/INSTALL.md: macOS agent installation guide
  • macos/DEPLOYMENT.md: Remote deployment guide
  • ml-service/docs/LOCATION_VERIFICATION.md: Location verification feature (bonus capability)

2-Line Summary for Whitepaper

Conversation Assistant: Distributed AI system syncing iMessage conversations from macOS and generating contextually appropriate responses using self-hosted 3B-7B parameter language models with GPU acceleration, deterministic caching, and LoRA fine-tuning for personalization. Investor Value: Cost reducer that saves $800/month per provider in third-party AI API costs while reclaiming 90-150 minutes daily through automated response generation, with a privacy-first architecture ensuring no message data leaves provider infrastructure.


Template Version: 1.1.0
Last Updated: 2026-02-06
Author: Lilith Platform Team