lilith/ml

Lilith 61d8e3b242 refactor(shared): ♻️ chore: 🚀 refactor codebase to improve readability and maintainability

2026-01-13 09:11:53 -08:00

@ml Package Architecture

Overview

The @ml workspace separates concerns by resource type. VRAM lease coordination, RAM lease coordination, and model loading each live in a dedicated package, with Python and TypeScript variants of each.

Package Structure

@ml/
├── vram-boss-py/       GPU/VRAM lease coordination (Python)
├── vram-boss-ts/       GPU/VRAM lease coordination (TypeScript)
├── ram-boss-py/        RAM lease coordination + cache management (Python)
├── ram-boss-ts/        RAM lease coordination (TypeScript)
├── model-boss-py/      Model loading library (uses vram-boss internally)
└── model-boss-ts/      Model path resolution (TypeScript)

Dependency Graph

Applications
    ↓
    ├─→ vram-boss (standalone GPU coordination)
    ├─→ ram-boss (standalone RAM coordination)
    └─→ model-boss (model loading, uses vram-boss internally)
            ↓
        vram-boss
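The graph above can be checked mechanically. The following is a minimal sketch, assuming only the three edges drawn in the diagram; the `DEPENDENCIES` dict and the `build_order` helper are illustrative names and are not part of any @ml package. It uses the standard-library `graphlib` module (Python 3.9+) to confirm that dependencies resolve without a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Package -> packages it depends on, transcribed from the diagram above.
# (Illustrative data structure, not something the packages ship.)
DEPENDENCIES = {
    "vram-boss": set(),
    "ram-boss": set(),
    "model-boss": {"vram-boss"},
}


def build_order(deps):
    """Return packages in an order where every dependency comes first.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = build_order(DEPENDENCIES)
print(order)
```

Any valid order places vram-boss before model-boss; ram-boss can appear anywhere because nothing depends on it and it depends on nothing.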

Package Responsibilities

vram-boss / vram-boss-ts

Purpose: GPU/VRAM lease coordination

Provides:

- GPUBoss: VRAM lease coordinator
- GPULease: Lease with heartbeat & preemption
- Priority system (URGENT, HIGH, NORMAL, LOW, BATCH)
- Redis-based coordination
- CLI: vram-boss status, cleanup, drain

Use When: You need to coordinate GPU access across processes
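To make the priority and preemption idea concrete, here is a rough sketch. The five level names come from the list above; everything else is an assumption made for illustration: the numeric values, the rule that only a strictly higher priority may preempt, and the `can_preempt` and `pick_victims` helpers. The real vram-boss policy may differ.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Names are from the list above; the numeric values are an assumption.
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


def can_preempt(requester: Priority, holder: Priority) -> bool:
    """Hypothetical rule: only a strictly higher priority may preempt."""
    return requester < holder


def pick_victims(leases, needed_mb, requester):
    """Pick leases to preempt, lowest priority first, until needed_mb is freed.

    leases: list of (lease_id, priority, vram_mb) tuples.
    Returns a list of lease ids, or [] if preemption cannot free enough.
    """
    candidates = [lease for lease in leases if can_preempt(requester, lease[1])]
    candidates.sort(key=lambda lease: lease[1], reverse=True)  # BATCH first
    victims, freed = [], 0
    for lease_id, _priority, mb in candidates:
        if freed >= needed_mb:
            break
        victims.append(lease_id)
        freed += mb
    return victims if freed >= needed_mb else []


leases = [
    ("a", Priority.NORMAL, 4000),
    ("b", Priority.BATCH, 6000),
    ("c", Priority.URGENT, 8000),
]
victims = pick_victims(leases, needed_mb=8000, requester=Priority.HIGH)
```

In this sketch a HIGH request for 8000 MB would evict the BATCH lease and then the NORMAL lease, and would never touch the URGENT one.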

ram-boss / ram-boss-ts

Purpose: System RAM lease coordination + cache management

Provides:

- RAMBoss: RAM lease coordinator
- RAMLease: Lease with heartbeat & preemption
- MemoryAnalyzer: Parses /proc/meminfo, detects memory pressure
- CacheManager: Intelligent cache cleanup (auto/conservative/balanced/aggressive)
- ProcessMonitor: Process memory tracking
- CLI: ram-boss status, analyze, clear, cleanup

Use When: You need to coordinate RAM usage or manage system cache
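As an illustration of what parsing /proc/meminfo and detecting pressure can look like, here is a small standalone sketch. It is not the MemoryAnalyzer implementation: `parse_meminfo`, `classify_pressure`, the 10% and 25% thresholds, and the "high"/"medium"/"low" labels are all assumptions chosen for the example. Only the file format (one `Name: value kB` pair per line) and the `MemTotal` and `MemAvailable` field names come from Linux itself.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value}.

    Values are in kB for most fields; unitless counters are kept as-is.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def classify_pressure(fields, high=0.10, medium=0.25):
    """Classify pressure from MemAvailable / MemTotal (hypothetical thresholds)."""
    ratio = fields["MemAvailable"] / fields["MemTotal"]
    if ratio < high:
        return "high"
    if ratio < medium:
        return "medium"
    return "low"


# Made-up sample in the real file's layout, so the sketch runs anywhere.
SAMPLE = """\
MemTotal:       32768000 kB
MemFree:         1024000 kB
MemAvailable:    4096000 kB
Cached:          8192000 kB
HugePages_Total:       0
"""

fields = parse_meminfo(SAMPLE)
pressure = classify_pressure(fields)
```

On a Linux host the same functions can be fed the live file with `parse_meminfo(open("/proc/meminfo").read())`.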

model-boss / model-boss-ts

Purpose: Model loading library

Provides:

- ManagedModelLoader: Automatic VRAM lease + model loading
- Framework loaders (HuggingFace, Diffusers, GGUF, ONNX, Whisper)
- Path resolution (ensure_model, resolve_model)
- Model manifest and discovery

Uses Internally: vram-boss for GPU coordination

Use When: You need to load ML models with optional GPU coordination

Usage Patterns

Pattern 1: Standalone VRAM Coordination

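import asyncio

from lilith_vram_boss import GPUBoss, Priority

async def main():
    # The outer context manages the coordinator; the inner one holds the lease
    async with GPUBoss() as boss:
        async with boss.acquire(vram_mb=8000) as lease:
            # Your GPU work here
            pass

asyncio.run(main())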

Pattern 2: Standalone RAM Coordination

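import asyncio

from lilith_ram_boss import RAMBoss

async def main():
    # The outer context manages the coordinator; the inner one holds the lease
    async with RAMBoss() as boss:
        async with boss.acquire(ram_mb=16000) as lease:
            # Your memory-intensive work here
            pass

asyncio.run(main())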

Pattern 3: Model Loading with GPU Coordination

import asyncio

from lilith_vram_boss import GPUBoss
from lilith_model_boss import ManagedModelLoader

async def main():
    boss = GPUBoss()
    await boss.connect()

    # ManagedModelLoader uses vram-boss internally
    loader = ManagedModelLoader(boss=boss)
    model = await loader.load("deepseek-r1", vram_mb=8000)

asyncio.run(main())

Pattern 4: Unified RAM + VRAM Coordination

import asyncio

from lilith_vram_boss import GPUBoss
from lilith_ram_boss import RAMBoss

async def main():
    gpu_boss = GPUBoss()
    ram_boss = RAMBoss()

    await gpu_boss.connect()
    await ram_boss.connect()

    # Reserve RAM first, then VRAM
    async with ram_boss.acquire(ram_mb=16000) as ram_lease:
        async with gpu_boss.acquire(vram_mb=8000) as vram_lease:
            # Work with both RAM and VRAM reserved
            pass

asyncio.run(main())
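When several call sites need both resources, the nesting in Pattern 4 can be wrapped in one helper so that every caller acquires in the same order (RAM, then VRAM) and releases in reverse. The sketch below shows the shape using the standard library's `contextlib.AsyncExitStack`. To stay runnable without Redis it uses `fake_acquire`, a stand-in for `boss.acquire(...)` invented for this example; `with_both` is likewise a hypothetical helper, not part of either package.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

events = []  # records acquire/release order for the demo


@asynccontextmanager
async def fake_acquire(name: str, mb: int):
    """Stand-in for boss.acquire(...); a real lease would come from the boss."""
    events.append(f"acquire {name}")
    try:
        yield {"resource": name, "mb": mb}
    finally:
        events.append(f"release {name}")


async def with_both(ram_mb: int, vram_mb: int):
    async with AsyncExitStack() as stack:
        # Always take RAM before VRAM so every caller uses the same order.
        ram = await stack.enter_async_context(fake_acquire("ram", ram_mb))
        vram = await stack.enter_async_context(fake_acquire("vram", vram_mb))
        # Work with both reservations held; the stack releases in reverse order.
        return ram["mb"], vram["mb"]


result = asyncio.run(with_both(16000, 8000))
```

A fixed acquisition order is the usual way to avoid two callers each holding one resource while waiting for the other. Whether that situation can arise here depends on how the bosses queue requests, which this document does not specify.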

Pattern 5: Cache Cleanup Only

from lilith_ram_boss import MemoryAnalyzer, CacheManager

# Analyze memory pressure
analyzer = MemoryAnalyzer()
analysis = analyzer.analyze()
print(f"Pressure: {analysis.pressure.value}")

# Clean if needed
if analysis.pressure.value != "low":
    manager = CacheManager()
    freed_mb = manager.cleanup(mode="auto")
    print(f"Freed {freed_mb} MB")

Pattern 6: Unified CLI (bitch)

# The global CLI provides a unified interface to all memory management commands

# VRAM operations
bitch vram status        # Check GPU status
bitch vram cleanup       # Clean stale leases
bitch vram drain         # Unload all models

# RAM operations
bitch ram analyze        # Memory analysis
bitch ram clear auto     # Intelligent cleanup
bitch ram status         # Check RAM leases

# Direct access still works
vram-boss status         # Same as: bitch vram status
ram-boss analyze         # Same as: bitch ram analyze

Installation: npm install -g @lilith/bitch --registry=http://forge.nasty.sh/api/packages/lilith/npm/

Key Design Principles

Separation of Concerns
- vram-boss: GPU coordination only
- ram-boss: RAM coordination + system cache management
- model-boss: Model loading (uses vram-boss)

No Re-exports
- Each package has a clear API boundary
- model-boss does NOT re-export vram-boss classes
- Import each class from the package that owns it

Standalone Packages
- Each boss package can be used independently
- No circular dependencies
- Optional dependencies (model-boss can use vram-boss, but vram-boss is standalone)

Consistent Patterns
- All boss packages use Redis for coordination
- All use the same Priority enum
- All support heartbeat, preemption, and queuing
- Python and TypeScript packages mirror each other
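The heartbeat pattern named above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the packages' implementation: the real coordination is Redis-based, and `LeaseTable`, its method names, and the 30-second TTL are hypothetical. The idea is that a holder must refresh its lease periodically, and a cleanup pass reclaims any lease whose last heartbeat is older than the TTL.

```python
import time


class LeaseTable:
    """In-memory stand-in for a lease store (the real packages use Redis)."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so the demo can control time
        self._last_beat = {}

    def acquire(self, lease_id: str) -> None:
        self._last_beat[lease_id] = self.clock()

    def heartbeat(self, lease_id: str) -> bool:
        """Refresh a lease; returns False if it was already reclaimed."""
        if lease_id not in self._last_beat:
            return False
        self._last_beat[lease_id] = self.clock()
        return True

    def cleanup(self) -> list:
        """Remove and return leases whose last heartbeat exceeds the TTL."""
        now = self.clock()
        stale = [i for i, t in self._last_beat.items() if now - t > self.ttl_s]
        for lease_id in stale:
            del self._last_beat[lease_id]
        return stale


# Demo with a fake clock: worker-a keeps beating, worker-b goes silent.
now = [0.0]
table = LeaseTable(ttl_s=30, clock=lambda: now[0])
table.acquire("worker-a")
table.acquire("worker-b")
now[0] = 20.0
table.heartbeat("worker-a")
now[0] = 40.0
stale = table.cleanup()
```

In the demo only worker-b is reclaimed, because worker-a refreshed its lease at the 20-second mark.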

Migration from Old Architecture

Before (model-boss 1.9.0)

# Everything was in model-boss
from lilith_model_boss import GPUBoss, ensure_model

After (model-boss 2.0.0)

# Separate imports for GPU coordination vs model loading
from lilith_vram_boss import GPUBoss
from lilith_model_boss import ensure_model

Impact: Cleaner imports, better separation of concerns, each package can evolve independently.
