Kthulu Architecture
Local-first AI coding agent with GPU lease management, self-improving training loop, and MCP-compatible tool system.
Last updated: 2026-03-10
System Overview
Kthulu is a feature-sliced monorepo comprising 16 packages across 4 layers: user interfaces (3 apps), agent infrastructure (2 core packages), tool system (8 tool packages), and training pipeline (3 packages). All LLM inference runs locally via model-boss GPU leases — zero cloud dependency.
The core agent loop (ConversationLoop, planning, reflection, skills, hooks, instructions, memory, worktrees, background agents, checkpoints, context management) is provided by @lilith/ml-agent-loop, a shared library in the @lilith package ecosystem. Kthulu re-exports this via @kthulu/agent-core and adds project-specific modules: ContextBuilder, RepoMapBuilder, RollbackManager, SessionTrainingCollector, FlightRecorder, and multimodal support.
Infrastructure Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ USER INTERFACES │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ CLI (REPL) │ │ API (NestJS) │ │ Web Dashboard │ │
│ │ :terminal │ │ :3780 │ │ (React) :3781 │ │
│ │ │ │ │ │ │ │
│ │ • StreamRender│ │ • Sessions │ │ • Session analytics │ │
│ │ • ReplSession │ │ • Analytics │ │ • Tool usage charts │ │
│ │ • HeadlessRun │ │ • Health │ │ • Quality feedback │ │
│ │ • MarkdownRend│ │ • Model proxy │ │ • Training metrics │ │
│ │ • PlanRenderer│ │ │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────┬───────────┘ │
│ │ │ │ │
└─────────┼───────────────────┼───────────────────────┼────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────┐
│ @kthulu/agent-core │
│ (re-exports @lilith/ml-agent-loop) │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ ConversationLoop │ │
│ │ • Message handling • Tool dispatch • Mode switching │ │
│ │ • Turn management • Error recovery • Budget enforcement │ │
│ └────────────────────────────┬────────────────────────────────┘ │
│ │ │
│ ┌────────────┐ ┌────────────┼────────────┐ ┌─────────────────┐ │
│ │ Planning │ │ Context │ Management │ │ Agent System │ │
│ │ │ │ │ │ │ │ │
│ │ • Planner │ │ • ContextManager │ │ • SubAgentSpawn │ │
│ │ • PlanStep │ │ • ContextSummarizer │ │ • AgentTeam │ │
│ │ • Reflector │ │ • ContextCache (TTL) │ │ • TeamConfig │ │
│ │ • Complexity│ │ • ContextBuilder │ │ • BackgroundTrk │ │
│ │ Analyzer │ │ • InstructionLoader* │ │ • WorktreeMgr* │ │
│ │ • SelfVerify│ │ • MemoryManager* │ │ • AgentDefLoad* │ │
│ └────────────┘ └─────────────────────────┘ └─────────────────┘ │
│ │
│ ┌────────────┐ ┌────────────────┐ ┌──────────────────────────┐ │
│ │ Skills* │ │ Hooks* │ │ Persistence │ │
│ │ │ │ │ │ │ │
│ │ • SkillLoad │ │ • HookRegistry │ │ • FlightRecorder │ │
│ │ • SkillExec │ │ • HookExecutor │ │ • JsonlFlightStore │ │
│ │ • SkillDef │ │ • BuiltinHandle │ │ • PostgresFlightStore │ │
│ └────────────┘ └────────────────┘ │ • CheckpointManager* │ │
│ └──────────────────────────┘ │
│ * = from @lilith/ml-agent-loop (shared infrastructure) │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ TOOL SYSTEM │
│ @kthulu/tool-protocol (MCP-compatible) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ file-ops │ │ bash │ │code-search│ │ git │ │
│ │ │ │ │ │ │ │ │ │
│ │ • Read │ │ • Execute │ │ • TreeSit │ │ • Status │ │
│ │ • Write │ │ • Sandbox │ │ • Symbol │ │ • Diff │ │
│ │ • Edit │ │ • PathGrd │ │ • FileWalk│ │ • Commit │ │
│ │ • Glob │ │ │ │ • Embeddin│ │ • Log │ │
│ │ • Grep │ │ │ │ Indexer │ │ • Branch │ │
│ └──────────┘ └──────────┘ └───────────┘ └──────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ │
│ │ lsp │ │ browser │ │ mcp-bridge│ │sub-agent │ │
│ │ │ │ │ │ │ │ │ │
│ │ • GoDef │ │ • Navigate│ │ • MCP Cl │ │ • Spawn │ │
│ │ • Refs │ │ • Click │ │ • Tool │ │ • Params │ │
│ │ • Hover │ │ • Type │ │ Proxy │ │ • maxTrn │ │
│ │ • LSPBrdg │ │ • Screensht│ │ │ │ │
│ └──────────┘ └──────────┘ └───────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ TRAINING PIPELINE │
│ │
│ ┌──────────────────┐ ┌──────────────┐ ┌─────────────────────┐ │
│ │ data-collector │ │ evaluator │ │ feedback │ │
│ │ │ │ │ │ │ │
│ │ • SessionTraining │ │ • Training │ │ • QualityScoring │ │
│ │ Collector │ │ Evaluator │ │ • 4-dimension │ │
│ │ • SFT Formatter │ │ • Benchmark │ │ ratings │ │
│ │ • Quality Filter │ │ Runner │ │ • Threshold filter │ │
│ │ • JSONL Export │ │ │ │ │ │
│ └──────────────────┘ └──────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ EXTERNAL INFRASTRUCTURE │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ model-boss │ │ PostgreSQL │ │ Redis │ │
│ │ (GPU server) │ │ :25462 │ │ :26390 │ │
│ │ │ │ │ │ │ │
│ │ • GPU leases │ │ • Sessions │ │ • Pub/sub │ │
│ │ • VRAM alloc │ │ • Analytics │ │ • Caching │ │
│ │ • Priority │ │ • Flight logs │ │ • Events │ │
│ │ • SSE stream │ │ • Feedback │ │ │ │
│ │ • Qwen3-Coder │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────┐ ┌──────────────────────────┐ │
│ │ train-language-model │ │ @lilith/ml-agent-loop │ │
│ │ (QLoRA fine-tuning) │ │ (shared agent infra) │ │
│ │ │ │ │ │
│ │ • QLoRA training │ │ • ConversationLoop │ │
│ │ • Checkpoint management │ │ • Planner, Reflector │ │
│ │ • Model evaluation │ │ • SkillLoader/Executor │ │
│ └──────────────────────────────┘ │ • HookRegistry/Executor │ │
│ │ • InstructionLoader │ │
│ ┌──────────────────────────────┐ │ • MemoryManager │ │
│ │ @kthulu/shared │ │ • WorktreeManager │ │
│ │ (type contracts) │ │ • AgentDefinitionLoader │ │
│ │ │ │ • BackgroundAgentTracker │ │
│ │ • KthuluSession, ToolDef │ │ • CheckpointManager │ │
│ │ • AgentDefinition, SkillDef │ │ • ContextManager/Summ │ │
│ │ • HookDefinition, HookEvent │ │ • ComplexityAnalyzer │ │
│ │ • ConversationEvent types │ │ • ErrorRecovery │ │
│ │ • ModelConfig, LeaseConfig │ │ • ModelInference │ │
│ └──────────────────────────────┘ └──────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Package Inventory
16 packages, 125 source files.
| Layer | Package | Purpose |
|---|---|---|
| Apps | apps/cli | Terminal REPL, headless runner, stream/markdown/plan renderers |
| Apps | apps/api | NestJS backend: sessions, analytics, health, model proxy |
| Apps | apps/web | React dashboard: session analytics, feedback, training metrics |
| Tools | features/tools/file-ops | Read, Write, Edit, Glob, Grep |
| Tools | features/tools/bash | Shell execution with sandboxing + path guard |
| Tools | features/tools/code-search | Tree-sitter parsing, symbol extraction, embedding indexer |
| Tools | features/tools/git | Status, diff, commit, log, branch |
| Tools | features/tools/lsp | Go-to-definition, references, hover via LSP bridge |
| Tools | features/tools/browser | Navigate, click, type, screenshot (Playwright) |
| Tools | features/tools/mcp-bridge | External MCP server connector + tool proxy |
| Tools | features/tools/sub-agent | Sub-agent spawning with maxTurns (default 8) |
| Training | features/training/data-collector | Session→SFT conversion, quality filtering, JSONL export |
| Training | features/training/evaluator | Model benchmarking, checkpoint comparison |
| Training | features/training/feedback | 4-dimension quality scoring |
| Core | @packages/agent-core | Re-exports @lilith/ml-agent-loop + Kthulu-specific modules |
| Core | @packages/shared | Types, interfaces, constants (the contract) |
Data Flow
Session Lifecycle
User Input → ReplSession → ConversationLoop → model-boss (GPU lease + inference)
│
├── Tool dispatch → ToolRegistry → tool package
├── Planning → Planner → ComplexityAnalyzer (0.0-1.0)
├── Reflection → Reflector → SelfVerifier
├── Context → ContextManager → ContextSummarizer
└── Persistence → FlightRecorder → JSONL / PostgreSQL
ConversationLoop emits ConversationEvent stream → StreamRenderer → terminal output
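The Planner → ComplexityAnalyzer step above yields a score in the 0.0-1.0 range, and @kthulu/shared defines a three-value ComplexityTier. A minimal sketch of how such a score could be bucketed is shown below. The threshold values (0.35 and 0.7) and both function names are assumptions made for illustration; the source does not state the real cut-offs.

```typescript
// Sketch only: thresholds and function names are hypothetical.
type ComplexityTier = "simple" | "moderate" | "complex";

// Maps a ComplexityAnalyzer score (0.0-1.0) onto one of the three tiers.
function toComplexityTier(score: number): ComplexityTier {
  if (!isFinite(score) || score < 0 || score > 1) {
    throw new RangeError(`complexity score must be in [0, 1], got ${score}`);
  }
  if (score < 0.35) return "simple"; // assumed lower cut-off
  if (score < 0.7) return "moderate"; // assumed upper cut-off
  return "complex";
}

// In this sketch, anything above "simple" is routed through explicit planning.
function shouldPlan(score: number): boolean {
  return toComplexityTier(score) !== "simple";
}
```

Keeping the mapping in one pure function means the cut-offs can be tuned, or later fed from ModelRoutingConfig, without touching the loop itself.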
Training Pipeline
Session data → SessionTrainingCollector
│
├── SFT Formatter (conversation → instruction/response pairs)
├── Quality Filter (threshold-based, 4-dimension scoring)
└── JSONL Export
│
▼
train-language-model (QLoRA fine-tuning)
│
├── Checkpoint management
└── Model evaluation (BenchmarkRunner)
│
▼
model-boss deploys updated model
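The collector stages above (SFT formatting, threshold-based quality filtering, JSONL export) can be sketched as three small pure functions. Everything named here is illustrative: the message shape, the four dimension names, and the use of a plain mean are assumptions, since the source only says scoring has four dimensions and a threshold.

```typescript
// Sketch only: all type and field names below are hypothetical.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface SftPair {
  instruction: string;
  response: string;
}

// The four dimension names are assumed; the source only gives the count.
interface QualityScores {
  correctness: number;
  helpfulness: number;
  efficiency: number;
  safety: number;
}

// Pairs each user message with the assistant message that directly follows it.
function toSftPairs(messages: ChatMessage[]): SftPair[] {
  const pairs: SftPair[] = [];
  for (let i = 0; i + 1 < messages.length; i++) {
    const current = messages[i];
    const next = messages[i + 1];
    if (current && next && current.role === "user" && next.role === "assistant") {
      pairs.push({ instruction: current.content, response: next.content });
    }
  }
  return pairs;
}

// Keeps a session when the mean of its four scores reaches the threshold.
function passesQualityFilter(scores: QualityScores, threshold: number): boolean {
  const values = [scores.correctness, scores.helpfulness, scores.efficiency, scores.safety];
  const mean = values.reduce((sum, value) => sum + value, 0) / values.length;
  return mean >= threshold;
}

// One JSON object per line, with no trailing newline.
function toJsonl(pairs: SftPair[]): string {
  return pairs.map((pair) => JSON.stringify(pair)).join("\n");
}
```

A session would typically be scored first, then formatted and exported only if it passes, so low-quality conversations never reach the QLoRA step.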
Agent Coordination
Lead agent → SubAgentSpawner.spawn()
│
├── mode: "round-robin" — agents take turns
├── mode: "debate" — agents critique each other (N rounds)
└── mode: "parallel" — agents work simultaneously
│
├── AgentTeam manages agent lifecycle
├── Each agent: name, systemPromptSuffix, allowedTools, maxTurns
└── Results aggregated back to lead agent
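To make the three modes concrete, the sketch below turns a team definition into an execution schedule, where each inner array is a batch of agents that may run at the same time. The agent fields come from the list above; `TeamMode`, `planBatches`, and the default of two debate rounds are hypothetical names and values, not the AgentTeam API.

```typescript
// Sketch only: planBatches and TeamMode are hypothetical, not the real AgentTeam API.
type TeamMode = "round-robin" | "debate" | "parallel";

interface TeamAgent {
  name: string;
  systemPromptSuffix: string;
  allowedTools: string[];
  maxTurns: number;
}

// Returns batches of agent names; agents inside one batch may run concurrently.
function planBatches(mode: TeamMode, agents: TeamAgent[], debateRounds: number = 2): string[][] {
  const names = agents.map((agent) => agent.name);
  if (names.length === 0) return [];
  // parallel: a single batch containing every agent
  if (mode === "parallel") return [names];
  // round-robin: one pass, one agent per batch; debate: the same pass repeated N times
  const rounds = mode === "debate" ? debateRounds : 1;
  const batches: string[][] = [];
  for (let round = 0; round < rounds; round++) {
    names.forEach((agentName) => batches.push([agentName]));
  }
  return batches;
}
```

In this framing, debate is round-robin repeated for N rounds, with each agent seeing the previous outputs before it responds; the lead agent aggregates once the last batch finishes.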
Voice I/O (Planned — Daemon Architecture)
Developer workstation (plum) GPU server (apricot, SSH target)
┌──────────────────────────┐ ┌────────────────────────────────────┐
│ kthulu-daemon │ │ kthulu CLI (SSH session) │
│ │ │ │
│ Mic capture ────────────── WebSocket ───────→ speech-synthesis STT │
│ │ raw audio │ • faster-whisper (GPU) │
│ │ │ • ws://apricot:8000/ws/stt │
│ │ │ │ │
│ │ │ ▼ transcribed text │
│ │ │ ConversationLoop │
│ │ │ │ │
│ │ │ ▼ response text │
│ │ │ speech-synthesis TTS │
│ Speaker playback ←──────── WebSocket ─────── • Piper/Chatterbox (GPU) │
│ │ audio stream │ • ws://apricot:31770/ │
│ • Push-to-talk hotkey │ │ • model-boss VRAM leasing │
│ • Audio device select │ │ │
│ • ~/.kthulu/daemon.json │ │ │
└──────────────────────────┘ └────────────────────────────────────┘
The daemon solves a fundamental problem: voice input over SSH. Claude Code's /voice assumes local mic access, which breaks in remote sessions. Kthulu's approach separates audio I/O (runs on the machine where the user sits) from audio processing (runs on the GPU server with existing speech-synthesis services).
External dependency: @applications/@audio/speech-synthesis provides two production services:
- speech-synthesis-service (TypeScript) — Piper TTS, Whisper STT, WebSocket streaming (port 31770)
- chatterbox-tts-service (Python/FastAPI) — Chatterbox TTS with emotional synthesis + voice cloning, faster-whisper STT, WebSocket streaming (port 8000), model-boss integration
Key Architectural Decisions
Local-First Inference
All LLM inference runs through @kthulu/model-client → model-boss HTTP API. GPU leases are acquired per-session and released on session end. Streaming token output uses SSE from model-boss. The target model is Qwen3-Coder-Next (80B parameters, 3B active via MoE).
This means:
- Zero cloud API costs
- No data leaves the local network
- GPU resource contention is managed via priority-based leasing
- Model improvements deploy locally via QLoRA fine-tuning
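The priority-based leasing mentioned above can be illustrated with a small in-memory model. This is not the model-boss API: `GpuLeasePool`, `LeaseRequest`, and the grant policy (highest priority first, then whatever still fits in free VRAM) are assumptions chosen to show the idea of per-session acquire and release.

```typescript
// Sketch only: an in-memory stand-in, not the model-boss HTTP API.
interface LeaseRequest {
  sessionId: string;
  vramMb: number;
  priority: number; // higher value wins (assumed convention)
}

class GpuLeasePool {
  private readonly active: LeaseRequest[] = [];

  constructor(private readonly totalVramMb: number) {}

  freeVramMb(): number {
    const used = this.active.reduce((sum, lease) => sum + lease.vramMb, 0);
    return this.totalVramMb - used;
  }

  // Grants requests in priority order while they fit; returns the ones left waiting.
  grant(pending: LeaseRequest[]): LeaseRequest[] {
    const waiting: LeaseRequest[] = [];
    const ordered = pending.slice().sort((a, b) => b.priority - a.priority);
    ordered.forEach((request) => {
      if (request.vramMb <= this.freeVramMb()) {
        this.active.push(request);
      } else {
        waiting.push(request);
      }
    });
    return waiting;
  }

  // Called on session end so the VRAM becomes available again.
  release(sessionId: string): void {
    for (let i = this.active.length - 1; i >= 0; i--) {
      const lease = this.active[i];
      if (lease && lease.sessionId === sessionId) this.active.splice(i, 1);
    }
  }

  holders(): string[] {
    return this.active.map((lease) => lease.sessionId);
  }
}
```

Note that this policy lets a small low-priority request be granted when a larger high-priority one does not fit; a real scheduler might instead block or preempt.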
Shared vs Kthulu-Specific Code
@kthulu/agent-core re-exports everything from @lilith/ml-agent-loop:
// @kthulu/agent-core/src/index.ts
export * from '@lilith/ml-agent-loop'; // Generic agent infrastructure
// Kthulu-specific additions:
export { buildProjectContext } from './context-builder';
export { buildRepoMap, RepoMapBuilder } from './repo-map';
export { RollbackManager } from './rollback';
export { SessionTrainingCollector } from './session-training-collector';
export { FlightRecorder, /* ... stores and analyzers */ } from './flight-recorder/index';
export { isMultimodalFile, readMultimodalFile } from './multimodal';
The shared library provides: ConversationLoop, Planner, Reflector, SkillLoader/Executor, HookRegistry/Executor, InstructionLoader, MemoryManager, WorktreeManager, AgentDefinitionLoader, BackgroundAgentTracker, CheckpointManager, ContextManager/Summarizer, ComplexityAnalyzer, ErrorRecovery, ModelInference.
Kthulu adds: project-specific context building, repo map generation, git-based rollback, session-to-training-data conversion, flight recording with dual storage backends (JSONL + PostgreSQL), and multimodal file handling.
Type Contracts
All shared types live in @kthulu/shared:
| Type | Purpose |
|---|---|
| KthuluSession | Session state and metadata |
| ToolDefinition | MCP-compatible tool interface |
| AgentDefinition | Agent config (name, tools, maxTurns, permissions, systemPrompt) |
| SkillDefinition | Skill config (name, tools, maxTurns, context mode, promptTemplate) |
| HookDefinition | Hook config (7 events, 3 handler types, HookAction: proceed/deny/modify) |
| ConversationEvent | 14 event types for the streaming pipeline |
| ModelConfig | Model inference parameters |
| GpuLeaseConfig | GPU lease parameters (VRAM, priority) |
| ComplexityTier | simple / moderate / complex |
| ModelRoutingConfig | Complexity-to-model mapping (currently disconnected) |
| PlanStep | Typed plan step with status, tools, refinement, verification |
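The three HookAction outcomes listed for HookDefinition (proceed, deny, modify) suggest a simple chain: handlers run in order, a deny stops the chain, and a modify rewrites the input seen by later handlers. The sketch below assumes that ordering and invents the handler signature and the `runHooks` name; the real HookExecutor may differ.

```typescript
// Sketch only: handler signature and runHooks are hypothetical.
type HookAction =
  | { action: "proceed" }
  | { action: "deny"; reason: string }
  | { action: "modify"; input: Record<string, unknown> };

type HookHandler = (toolName: string, input: Record<string, unknown>) => HookAction;

interface HookOutcome {
  allowed: boolean;
  input: Record<string, unknown>;
  reason?: string;
}

// Runs handlers in order; the first deny wins, modify replaces the input for later handlers.
function runHooks(
  handlers: HookHandler[],
  toolName: string,
  input: Record<string, unknown>
): HookOutcome {
  let current = input;
  for (let i = 0; i < handlers.length; i++) {
    const handler = handlers[i];
    if (!handler) continue;
    const result = handler(toolName, current);
    if (result.action === "deny") {
      return { allowed: false, input: current, reason: result.reason };
    }
    if (result.action === "modify") {
      current = result.input;
    }
  }
  return { allowed: true, input: current };
}
```

A pre-tool hook that pins a working directory and a second that blocks a tool outright would compose without knowing about each other.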
MCP-Compatible Tool Protocol
Every tool implements the @kthulu/tool-protocol interface, which is compatible with the Model Context Protocol. Tools are registered in a ToolRegistry and dispatched by the ConversationLoop. The tool system includes 8 packages covering file operations, shell execution, code intelligence, git, LSP, browser automation, MCP bridging, and sub-agent spawning.
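A minimal sketch of the register-then-dispatch flow follows. The `name`, `description`, and `inputSchema` fields mirror the shape MCP uses for tool listings; the synchronous `handler`, the `ToolResult` shape, and the required-field check are simplifications assumed for illustration and are not the actual @kthulu/tool-protocol interface.

```typescript
// Sketch only: a simplified stand-in for @kthulu/tool-protocol.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: { type: "object"; required?: string[] };
  handler: (input: Record<string, unknown>) => string; // real tools are likely async
}

interface ToolResult {
  ok: boolean;
  output: string;
}

class ToolRegistry {
  private readonly tools: Record<string, ToolDefinition> = Object.create(null);

  register(tool: ToolDefinition): void {
    if (this.tools[tool.name] !== undefined) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools[tool.name] = tool;
  }

  list(): string[] {
    return Object.keys(this.tools).sort();
  }

  // Looks up the tool, checks required inputs, and converts thrown errors into results.
  dispatch(toolName: string, input: Record<string, unknown>): ToolResult {
    const tool = this.tools[toolName];
    if (tool === undefined) {
      return { ok: false, output: `unknown tool: ${toolName}` };
    }
    const missing = (tool.inputSchema.required || []).filter((key) => !(key in input));
    if (missing.length > 0) {
      return { ok: false, output: `missing input: ${missing.join(", ")}` };
    }
    try {
      return { ok: true, output: tool.handler(input) };
    } catch (err) {
      return { ok: false, output: err instanceof Error ? err.message : String(err) };
    }
  }
}
```

Returning failures as results instead of throwing lets the loop feed the error text back to the model as a tool_result and continue the turn.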
Persistence: Dual Backend
The FlightRecorder supports two storage backends:
- JSONL (JsonlFlightStore): local file-based, used by the CLI for session persistence
- PostgreSQL (PostgresFlightStore): used by the API for analytics, dashboards, and cross-session queries
Both implement the FlightStore / FlightStoreReader interfaces. The FlightAnalyzer provides query and aggregation capabilities over recorded sessions.
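The value of the split is that writers and analyzers depend only on the interfaces, so the backend can be swapped. The sketch below shows that idea with an in-memory store that serializes records the way a JSONL file would. The method names (`append`, `read`), the record fields, and `countByKind` are assumptions; the real interfaces are probably asynchronous.

```typescript
// Sketch only: method names and record fields are hypothetical.
interface FlightRecord {
  sessionId: string;
  turn: number;
  kind: string; // e.g. "tool_call" or "error"
  payload: unknown;
}

interface FlightStore {
  append(record: FlightRecord): void;
}

interface FlightStoreReader {
  read(sessionId: string): FlightRecord[];
}

// Keeps one JSON document per line, mirroring a JSONL file without touching disk.
class InMemoryJsonlStore implements FlightStore, FlightStoreReader {
  private readonly lines: string[] = [];

  append(record: FlightRecord): void {
    this.lines.push(JSON.stringify(record));
  }

  read(sessionId: string): FlightRecord[] {
    return this.lines
      .map((line) => JSON.parse(line) as FlightRecord)
      .filter((record) => record.sessionId === sessionId);
  }

  dump(): string {
    return this.lines.join("\n");
  }
}

// A FlightAnalyzer-style aggregation that works against any reader.
function countByKind(reader: FlightStoreReader, sessionId: string): Record<string, number> {
  const counts: Record<string, number> = {};
  reader.read(sessionId).forEach((record) => {
    counts[record.kind] = (counts[record.kind] || 0) + 1;
  });
  return counts;
}
```

Because `countByKind` takes a FlightStoreReader, the same aggregation would run unchanged against a PostgreSQL-backed reader.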
Port & Service Registry
| Service | Dev Port | Purpose |
|---|---|---|
| API | 3780 | NestJS backend |
| Web | 3781 | React dashboard |
| PostgreSQL | 25462 | Session/analytics storage |
| Redis | 26390 | Pub/sub, caching |
TUI Rendering Pipeline
The CLI renders all output through StreamRenderer, which handles 14 ConversationEvent types:
| Event | Rendering |
|---|---|
| token | Raw streaming text |
| thinking | Dim text (streaming) |
| tool_call | Cyan tool name + dim parameter summary |
| tool_result | Success/failure icon + 5-line preview (--verbose for full) |
| turn_complete | Separator line + token count |
| error | Red error message |
| plan | Plan steps with status icons (○ pending, ◉ in_progress, ● done, ✗ failed) |
| verification | Typecheck + test results with icons |
| reflection | Assessment + issues + corrections |
| mode_change | Colored [PLAN] / [ACT] label |
| summary | Context compression notice |
| checkpoint_created | Dim checkpoint ID |
| budget_warning | Yellow percentage warning |
| budget_exceeded | Red hard stop |
Supporting renderers: markdown-renderer.ts (headings, bold, italic, inline code, code blocks, lists, HR) and plan-renderer.ts (plan step formatting with status icons).
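The rendering rules map naturally onto a discriminated union keyed on the event type. The sketch below covers four of the 14 events and returns plain strings with colors omitted. The payload field names, the `RenderableEvent` and `renderEvent` names, and the ✓ success icon are assumptions; the plan status icons and the 5-line preview come from the event table in this section.

```typescript
// Sketch only: payload shapes are hypothetical; colors are left out.
type PlanStatus = "pending" | "in_progress" | "done" | "failed";

type RenderableEvent =
  | { type: "token"; text: string }
  | { type: "mode_change"; mode: "plan" | "act" }
  | { type: "plan"; steps: { title: string; status: PlanStatus }[] }
  | { type: "tool_result"; ok: boolean; output: string };

const STATUS_ICONS: Record<PlanStatus, string> = {
  pending: "○",
  in_progress: "◉",
  done: "●",
  failed: "✗",
};

function renderEvent(ev: RenderableEvent, verbose: boolean = false): string {
  if (ev.type === "token") return ev.text;
  if (ev.type === "mode_change") return ev.mode === "plan" ? "[PLAN]" : "[ACT]";
  if (ev.type === "plan") {
    return ev.steps.map((step) => `${STATUS_ICONS[step.status]} ${step.title}`).join("\n");
  }
  // Remaining case is tool_result: icon line, then a preview capped at 5 lines.
  const lines = ev.output.split("\n");
  const shown = verbose ? lines : lines.slice(0, 5);
  const hidden = lines.length - shown.length;
  const header = ev.ok ? "✓" : "✗"; // success icon is an assumption
  const footer = hidden > 0 ? [`(+${hidden} more lines, --verbose for full)`] : [];
  return [header].concat(shown, footer).join("\n");
}
```

Narrowing on `ev.type` means adding a new event variant to the union produces a compile-time gap in the renderer instead of a silent fall-through at runtime.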
External Dependencies
| Component | Location | Purpose |
|---|---|---|
| model-boss | ~/Code/@applications/@model-boss/ | GPU lease coordination, VRAM allocation, SSE streaming |
| train-language-model | ~/Code/@applications/@ml/@train/train-language-model/ | QLoRA fine-tuning pipeline |
| lora-trainer | ~/Code/@applications/@ml/lora-trainer/ | LoRA training wrapper |
| cot-reasoning | ~/Code/@applications/@ml/cot-reasoning/ | Multi-stage reasoning chains |
| speech-synthesis-service | ~/Code/@applications/@audio/speech-synthesis/speech-synthesis-service/ | Piper TTS + Whisper STT, WebSocket streaming (port 31770) |
| chatterbox-tts-service | ~/Code/@applications/@audio/speech-synthesis/chatterbox-tts-service/ | Chatterbox TTS (emotional/cloning) + faster-whisper STT (port 8000) |
| @lilith/ml-agent-loop | @lilith package ecosystem | Shared generic agent infrastructure |
| @lilith/terminal-formatting | @lilith package ecosystem | Terminal colors and formatting utilities |
| @lilith/service-nestjs-bootstrap | @lilith package ecosystem | NestJS app factory |
| @lilith/service-react-bootstrap | @lilith package ecosystem | React app factory |
| @lilith/typeorm-config | @lilith package ecosystem | Database configuration |
| @lilith/ui-theme | @lilith package ecosystem | Cyberpunk theme tokens |
Database Patterns
- ORM: TypeORM 0.3 with DataSource
- Dev mode: synchronize: true (auto-schema sync)
- Production: Migration-based schema changes
- CLI storage: SQLite for local conversation persistence
- API storage: PostgreSQL for analytics, dashboards, and flight logs
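The dev/production split can be captured in one options builder. The sketch below declares a local subset of TypeORM's DataSourceOptions so it compiles without the package installed; `synchronize`, `migrationsRun`, and `migrations` are real TypeORM option names, while the host, database name, migrations glob, and the `buildApiDataSourceOptions` function are hypothetical. Port 25462 is the dev PostgreSQL port from the Port & Service Registry.

```typescript
// Sketch only: a local subset of TypeORM's DataSourceOptions for the postgres driver.
interface PostgresOptionsSketch {
  type: "postgres";
  host: string;
  port: number;
  database: string;
  synchronize: boolean;
  migrationsRun: boolean;
  migrations: string[];
}

function buildApiDataSourceOptions(nodeEnv: string): PostgresOptionsSketch {
  const isProduction = nodeEnv === "production";
  return {
    type: "postgres",
    host: "localhost", // assumption
    port: 25462, // dev port from the Port & Service Registry
    database: "kthulu", // hypothetical database name
    synchronize: !isProduction, // auto-schema sync in dev only
    migrationsRun: isProduction, // production applies migrations instead
    migrations: ["dist/migrations/*.js"], // hypothetical path
  };
}
```

In the API, an object like this would be passed to TypeORM's `new DataSource(options)`. Keeping `synchronize` off in production matters because auto-sync can drop columns when an entity changes.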