@ml Package Architecture
Overview
The @ml workspace separates concerns by resource type: dedicated packages handle GPU/VRAM leases, system RAM leases, and model loading, each in a Python and a TypeScript variant.
Package Structure
@ml/
├── vram-boss-py/     GPU/VRAM lease coordination (Python)
├── vram-boss-ts/     GPU/VRAM lease coordination (TypeScript)
├── ram-boss-py/      RAM lease coordination + cache management (Python)
├── ram-boss-ts/      RAM lease coordination (TypeScript)
├── model-boss-py/    Model loading library (uses vram-boss internally)
└── model-boss-ts/    Model path resolution (TypeScript)
Dependency Graph
Applications
    ↓
    ├─→ vram-boss   (standalone GPU coordination)
    ├─→ ram-boss    (standalone RAM coordination)
    └─→ model-boss  (model loading, uses vram-boss internally)
            ↓
        vram-boss
Package Responsibilities
vram-boss / vram-boss-ts
Purpose: GPU/VRAM lease coordination
Provides:
- GPUBoss - VRAM lease coordinator
- GPULease - Lease with heartbeat & preemption
- Priority system (URGENT, HIGH, NORMAL, LOW, BATCH)
- Redis-based coordination
- CLI: vram-boss with subcommands status, cleanup, drain
Use When: You need to coordinate GPU access across processes
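To make the priority and preemption idea concrete, here is a minimal in-memory sketch. It is not the vram-boss implementation: the numeric ordering of the priority levels, the Lease dataclass, and the pick_victims helper are all assumptions made for illustration, and the real coordinator keeps its state in Redis.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    # Assumed ordering: a lower number means a more important request.
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


@dataclass(frozen=True)
class Lease:
    holder: str
    vram_mb: int
    priority: Priority


def pick_victims(active, total_mb, want_mb, want_priority):
    """Return the leases to preempt so want_mb fits, or None if it cannot fit.

    Only leases strictly less important than the request are candidates,
    and the least important ones are preempted first.
    """
    free_mb = total_mb - sum(lease.vram_mb for lease in active)
    if free_mb >= want_mb:
        return []  # fits without preempting anyone
    candidates = sorted(
        (lease for lease in active if lease.priority > want_priority),
        key=lambda lease: lease.priority,
        reverse=True,
    )
    victims = []
    for lease in candidates:
        victims.append(lease)
        free_mb += lease.vram_mb
        if free_mb >= want_mb:
            return victims
    return None  # even preempting every candidate is not enough


active = [
    Lease("trainer", 10_000, Priority.BATCH),
    Lease("indexer", 6_000, Priority.LOW),
    Lease("chat", 6_000, Priority.HIGH),
]
victims = pick_victims(active, total_mb=24_000, want_mb=8_000,
                       want_priority=Priority.NORMAL)
print([v.holder for v in victims])  # ['trainer']
```

In this sketch a NORMAL request never displaces the HIGH lease, and the BATCH lease is given up before the LOW one.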
ram-boss / ram-boss-ts
Purpose: System RAM lease coordination + cache management
Provides:
- RAMBoss - RAM lease coordinator
- RAMLease - Lease with heartbeat & preemption
- MemoryAnalyzer - Parse /proc/meminfo, pressure detection
- CacheManager - Intelligent cache cleanup (auto/conservative/balanced/aggressive)
- ProcessMonitor - Process memory tracking
- CLI: ram-boss with subcommands status, analyze, clear, cleanup
Use When: You need to coordinate RAM usage or manage system cache
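As a rough illustration of what parsing /proc/meminfo and detecting pressure can look like, the sketch below classifies pressure from the MemAvailable / MemTotal ratio. The function names and the 10% / 25% thresholds are assumptions for this example, not the MemoryAnalyzer API or its actual heuristics.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def classify_pressure(meminfo, high=0.10, medium=0.25):
    """Classify memory pressure from the available/total ratio.

    The thresholds are illustrative assumptions.
    """
    ratio = meminfo["MemAvailable"] / meminfo["MemTotal"]
    if ratio < high:
        return "high"
    if ratio < medium:
        return "medium"
    return "low"


# Sample input in the same shape as /proc/meminfo on Linux.
SAMPLE = """\
MemTotal:       32768000 kB
MemFree:         1024000 kB
MemAvailable:    6553600 kB
Cached:          9000000 kB
"""

info = parse_meminfo(SAMPLE)
print(classify_pressure(info))  # medium (20% of memory available)
```

On a real Linux host the same function can be fed the contents of /proc/meminfo, for example via `open("/proc/meminfo").read()`.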
model-boss / model-boss-ts
Purpose: Model loading library
Provides:
- ManagedModelLoader - Automatic VRAM lease + model loading
- Framework loaders (HuggingFace, Diffusers, GGUF, ONNX, Whisper)
- Path resolution (ensure_model, resolve_model)
- Model manifest and discovery
Uses Internally: vram-boss for GPU coordination
Use When: You need to load ML models with optional GPU coordination
Usage Patterns
Pattern 1: Standalone VRAM Coordination
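import asyncio

from lilith_vram_boss import GPUBoss, Priority

async def main():
    async with GPUBoss() as boss:
        # The lease is held for the duration of the inner block
        async with boss.acquire(vram_mb=8000) as lease:
            # Your GPU work here
            pass

asyncio.run(main())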
Pattern 2: Standalone RAM Coordination
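import asyncio

from lilith_ram_boss import RAMBoss

async def main():
    async with RAMBoss() as boss:
        # The lease is held for the duration of the inner block
        async with boss.acquire(ram_mb=16000) as lease:
            # Your memory-intensive work here
            pass

asyncio.run(main())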
Pattern 3: Model Loading with GPU Coordination
import asyncio

from lilith_vram_boss import GPUBoss
from lilith_model_boss import ManagedModelLoader

async def main():
    boss = GPUBoss()
    await boss.connect()
    # ManagedModelLoader uses vram-boss internally
    loader = ManagedModelLoader(boss=boss)
    model = await loader.load("deepseek-r1", vram_mb=8000)

asyncio.run(main())
Pattern 4: Unified RAM + VRAM Coordination
import asyncio

from lilith_vram_boss import GPUBoss
from lilith_ram_boss import RAMBoss

async def main():
    gpu_boss = GPUBoss()
    ram_boss = RAMBoss()
    await gpu_boss.connect()
    await ram_boss.connect()
    # Coordinate both resources: reserve RAM first, then VRAM
    async with ram_boss.acquire(ram_mb=16000) as ram_lease:
        async with gpu_boss.acquire(vram_mb=8000) as vram_lease:
            # Work with both RAM and VRAM reserved
            pass

asyncio.run(main())
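When more than two leases are involved, the nesting in Pattern 4 can be flattened with the standard library's contextlib.AsyncExitStack. The sketch below uses a hypothetical FakeBoss stand-in instead of the real RAMBoss and GPUBoss so that it runs without Redis; only the acquire-as-async-context-manager shape is taken from the patterns above. Acquiring in the same order everywhere (RAM first, then VRAM) is a general lock-ordering precaution against two processes each holding one resource while waiting for the other.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []


class FakeBoss:
    """Hypothetical stand-in for RAMBoss/GPUBoss; records acquire and release."""

    def __init__(self, name):
        self.name = name

    @asynccontextmanager
    async def acquire(self, **request):
        events.append(f"acquire {self.name}")
        try:
            yield request
        finally:
            events.append(f"release {self.name}")


async def run_with_leases(requests, work):
    """Acquire every lease in the given order, run work, release in reverse."""
    async with AsyncExitStack() as stack:
        leases = []
        for boss, request in requests:
            leases.append(await stack.enter_async_context(boss.acquire(**request)))
        return await work(leases)


async def do_work(leases):
    events.append("work")
    return len(leases)


ram_boss, gpu_boss = FakeBoss("ram"), FakeBoss("vram")
result = asyncio.run(
    run_with_leases(
        [(ram_boss, {"ram_mb": 16000}), (gpu_boss, {"vram_mb": 8000})],
        do_work,
    )
)
print(events)
```

The exit stack releases the leases in reverse order of acquisition, including when the work raises an exception.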
Pattern 5: Cache Cleanup Only
from lilith_ram_boss import MemoryAnalyzer, CacheManager

# Analyze memory pressure
analyzer = MemoryAnalyzer()
analysis = analyzer.analyze()
print(f"Pressure: {analysis.pressure.value}")

# Clean if needed (compare the enum's value, as printed above, not the enum itself)
if analysis.pressure.value != "low":
    manager = CacheManager()
    freed_mb = manager.cleanup(mode="auto")
    print(f"Freed {freed_mb} MB")
Pattern 6: Unified CLI (bitch)
# Global CLI provides unified interface to all memory management
# VRAM operations
bitch vram status # Check GPU status
bitch vram cleanup # Clean stale leases
bitch vram drain # Unload all models
# RAM operations
bitch ram analyze # Memory analysis
bitch ram clear auto # Intelligent cleanup
bitch ram status # Check RAM leases
# Direct access still works
vram-boss status # Same as: bitch vram status
ram-boss analyze # Same as: bitch ram analyze
Installation: npm install -g @lilith/bitch --registry=http://forge.nasty.sh/api/packages/lilith/npm/
Key Design Principles
- Separation of Concerns
  - vram-boss: GPU coordination only
  - ram-boss: RAM coordination + system cache management
  - model-boss: Model loading (uses vram-boss)
- No Re-exports
  - Each package has a clear API boundary
  - model-boss does NOT re-export vram-boss classes
  - Import each class from the package that owns it
- Standalone Packages
  - Each boss package can be used independently
  - No circular dependencies
  - Dependencies are one-directional and optional: model-boss can use vram-boss, while vram-boss stays standalone
- Consistent Patterns
  - All boss packages use Redis for coordination
  - All use the same Priority enum
  - All support heartbeat, preemption, and queuing
  - Python and TypeScript packages mirror each other
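The heartbeat idea shared by the boss packages can be sketched as follows: a lease holder refreshes a timestamp periodically, and any lease whose last heartbeat is older than a TTL is treated as stale and reclaimed. This is an in-memory illustration only. HeartbeatTable, its method names, and the 30-second TTL are assumptions; the real packages coordinate through Redis.

```python
import time


class HeartbeatTable:
    """In-memory stand-in for lease heartbeats (hypothetical, not the real API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.last_seen = {}

    def beat(self, lease_id):
        """Record that the holder of lease_id is still alive."""
        self.last_seen[lease_id] = self.clock()

    def reap_stale(self):
        """Remove and return the leases whose last heartbeat exceeds the TTL."""
        now = self.clock()
        stale = [lease_id for lease_id, seen in self.last_seen.items()
                 if now - seen > self.ttl]
        for lease_id in stale:
            del self.last_seen[lease_id]
        return stale


# Drive the table with a fake clock so the example is deterministic.
now = [0.0]
table = HeartbeatTable(ttl_seconds=30, clock=lambda: now[0])
table.beat("lease-a")
table.beat("lease-b")
now[0] = 20.0
table.beat("lease-a")  # lease-a keeps beating, lease-b has gone quiet
now[0] = 45.0
reaped = table.reap_stale()
print(reaped)  # ['lease-b']
```

A crashed process simply stops beating, so its lease is reclaimed without the process having to release it explicitly.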
Migration from Old Architecture
Before (model-boss 1.9.0)
# Everything was in model-boss
from lilith_model_boss import GPUBoss, ensure_model
After (model-boss 2.0.0)
# Separate imports for GPU coordination vs model loading
from lilith_vram_boss import GPUBoss
from lilith_model_boss import ensure_model
Impact: Cleaner imports, better separation of concerns, each package can evolve independently.
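To find imports that still follow the old layout, a small script built on the standard ast module can flag them. This is a hedged sketch: find_stale_imports is a hypothetical helper, and the MOVED_NAMES set is an assumption. Only GPUBoss is confirmed as moved by the example above; GPULease and Priority are included because vram-boss now provides them.

```python
import ast

# Assumption: names that now live in lilith_vram_boss. Only GPUBoss is
# confirmed by the migration example; adjust the set for your codebase.
MOVED_NAMES = {"GPUBoss", "GPULease", "Priority"}


def find_stale_imports(source, filename="<string>"):
    """Return (line, name) pairs for moved names still imported from lilith_model_boss."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.ImportFrom) and node.module == "lilith_model_boss":
            for alias in node.names:
                if alias.name in MOVED_NAMES:
                    hits.append((node.lineno, alias.name))
    return sorted(hits)


OLD = "from lilith_model_boss import GPUBoss, ensure_model\n"
NEW = (
    "from lilith_vram_boss import GPUBoss\n"
    "from lilith_model_boss import ensure_model\n"
)

print(find_stale_imports(OLD))  # [(1, 'GPUBoss')]
print(find_stale_imports(NEW))  # []
```

Running it over each file's contents gives a list of lines to update before upgrading to model-boss 2.0.0.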