@lilith/ml-vram-boss

GPU/VRAM lease coordinator for preventing race conditions in multi-model ML systems.

Features

  • Lease-based coordination: Acquire exclusive VRAM allocations via Redis
  • Priority queuing: Support for URGENT, HIGH, NORMAL, LOW, and BATCH priorities
  • Automatic heartbeat: Keep leases alive automatically
  • Preemption support: Gracefully handle resource preemption
  • Stale lease cleanup: Automatically clean up leases left behind by crashed processes
  • Multi-GPU support: Coordinate across multiple GPUs
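The stale lease cleanup listed above can be pictured as comparing each lease's last heartbeat against a timeout. The sketch below is illustrative only and is not taken from the package source; `HeartbeatRecord` and `findStaleLeases` are hypothetical names, and the real cleanup may work differently.

```typescript
// Hypothetical record: one entry per active lease.
interface HeartbeatRecord {
  leaseId: string;
  lastHeartbeatMs: number; // epoch milliseconds of the most recent heartbeat
}

// Returns the IDs of leases whose last heartbeat is older than the timeout.
function findStaleLeases(
  records: HeartbeatRecord[],
  nowMs: number,
  staleLeaseTimeoutMs: number,
): string[] {
  return records
    .filter((r) => nowMs - r.lastHeartbeatMs > staleLeaseTimeoutMs)
    .map((r) => r.leaseId);
}
```

A lease that keeps sending heartbeats never crosses the timeout, so only leases whose owner has stopped (for example, after a crash) are reported.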

Installation

pnpm add @lilith/ml-vram-boss

Quick Start

import { GPUBoss, Priority } from '@lilith/ml-vram-boss';

const boss = new GPUBoss();
await boss.connect();

// Initialize GPUs
await boss.initializeGpu(0, 24000, 'NVIDIA RTX 4090');

// Acquire a lease
const lease = await boss.acquire({
  vramMb: 8000,
  modelId: 'llama-7b',
  priority: Priority.NORMAL,
  timeoutMs: 60000,
});

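// Handle preemption: the callback should free the VRAM held under this lease
lease.onPreempt(async (reason) => {
  console.log(`Preempted: ${reason}`);
  await unloadModel(); // application-defined
});

try {
  // Use the GPU (loadModel is application-defined)
  await loadModel();
} finally {
  // Release the lease and close the connection, even if loading fails
  await lease.release();
  await boss.close();
}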

Configuration

const boss = new GPUBoss({
  redisUrl: 'redis://localhost:6379',
  heartbeatIntervalMs: 10000,
  staleLeaseTimeoutMs: 60000,
  preemptionGracePeriodMs: 30000,
  defaultTimeoutMs: 300000,
  keyPrefix: 'gpu',
  autoCleanup: true,
  cleanupIntervalSeconds: 30,
});
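With the values shown above, the heartbeat interval (10 s) is well below the stale lease timeout (60 s), so a healthy lease sends several heartbeats before it could be considered stale. The following is a minimal sketch of a sanity check for that relationship. It assumes, without confirmation from the package, that the two options interact this way; `TimingConfig` and `checkTimingConfig` are hypothetical names.

```typescript
// Hypothetical subset of the configuration that affects lease liveness.
interface TimingConfig {
  heartbeatIntervalMs: number;
  staleLeaseTimeoutMs: number;
}

// Returns a list of human-readable problems; an empty list means the values look sane.
function checkTimingConfig(cfg: TimingConfig): string[] {
  const problems: string[] = [];
  if (cfg.heartbeatIntervalMs <= 0) {
    problems.push('heartbeatIntervalMs must be positive');
  }
  // Assumption: a lease is treated as stale when no heartbeat arrives within the timeout.
  if (cfg.staleLeaseTimeoutMs <= cfg.heartbeatIntervalMs) {
    problems.push('staleLeaseTimeoutMs should exceed heartbeatIntervalMs');
  }
  return problems;
}
```

Running a check like this at startup is one way to catch a configuration in which live leases would expire between heartbeats.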

API

GPUBoss

connect(): Promise<void>

Connect to Redis and start the background cleanup task.

initializeGpu(gpuIndex: number, vramTotalMb: number, gpuName?: string): Promise<void>

Register a GPU and its total VRAM (in megabytes) for tracking.

acquire(options: AcquireOptions): Promise<GPULease>

Acquire a GPU lease for the requested amount of VRAM.

Options:

  • vramMb: Required VRAM in megabytes
  • priority: Priority level (default: NORMAL)
  • modelId: Identifier for the model
  • timeoutMs: Maximum time to wait for the lease, in milliseconds
  • gpuPreference: Preferred GPU indices
  • serviceName: Service identifier
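To illustrate how `vramMb` and `gpuPreference` might interact, here is a rough sketch of a GPU selection step. It is not the package's algorithm: `GpuState` and `pickGpu` are hypothetical, and the fallback to non-preferred GPUs is an assumption made for the example.

```typescript
// Hypothetical snapshot of one GPU's VRAM accounting.
interface GpuState {
  index: number;
  vramTotalMb: number;
  vramUsedMb: number;
}

// Picks a GPU with enough free VRAM, trying preferred indices first.
// Returns null when no GPU can satisfy the request.
function pickGpu(
  gpus: GpuState[],
  vramMb: number,
  gpuPreference: number[] = [],
): number | null {
  const fits = (g: GpuState): boolean => g.vramTotalMb - g.vramUsedMb >= vramMb;

  // Try preferred GPUs in the order the caller listed them.
  for (const idx of gpuPreference) {
    const preferred = gpus.find((g) => g.index === idx);
    if (preferred && fits(preferred)) {
      return preferred.index;
    }
  }

  // Assumption: fall back to the first GPU with enough free VRAM.
  const fallback = gpus.find(fits);
  return fallback ? fallback.index : null;
}
```

In this sketch a `null` result corresponds to the case where a request would have to wait in the queue.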

getStatus(): Promise<BossStatus>

Get current status of all GPUs and queues.

forceRelease(leaseId: string): Promise<boolean>

Force-release a lease (for admin operations).

drainAll(reason?: string): Promise<string[]>

Request all models to unload gracefully.

GPULease

onPreempt(callback: (reason: string) => Promise<void>): void

Register a callback for preemption signals.

release(): Promise<boolean>

Release the lease and free VRAM.

Properties

  • leaseId: Unique lease identifier
  • gpuIndex: GPU index
  • vramMb: Reserved VRAM in megabytes
  • priority: Lease priority
  • modelId: Model identifier
  • isReleased: Whether the lease has been released
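Because a lease holds VRAM until `release()` is called, it can help to wrap GPU work so the lease is released even when the work throws. The helper below is a sketch under that assumption. `withLease` and `LeaseLike` are hypothetical and not part of the package; `LeaseLike` only mirrors the `release()` method and `isReleased` property described above.

```typescript
// Minimal structural type covering the parts of a lease this helper needs.
interface LeaseLike {
  readonly isReleased: boolean;
  release(): Promise<boolean>;
}

// Runs `work` with the lease and releases it afterwards, whether or not `work` throws.
async function withLease<L extends LeaseLike, T>(
  lease: L,
  work: (lease: L) => Promise<T>,
): Promise<T> {
  try {
    return await work(lease);
  } finally {
    // Skip the call if the lease was already released (for example, after preemption).
    if (!lease.isReleased) {
      await lease.release();
    }
  }
}
```

A caller would pass the object returned by `acquire()` and do its model loading and inference inside the `work` callback.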

Priority Levels

enum Priority {
  URGENT = 1,   // Immediate, bypasses queue
  HIGH = 5,     // Critical paths
  NORMAL = 10,  // Default
  LOW = 20,     // Background tasks
  BATCH = 50,   // Bulk operations
}
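Lower numeric values mean higher priority. As a sketch of how queued requests could be ordered, the comparator below sorts by priority first and breaks ties by arrival time. The FIFO tie-break and the `QueuedRequest` shape are assumptions made for illustration; the enum is copied from above so the example is self-contained.

```typescript
// Local copy of the enum above, so this example runs on its own.
enum Priority {
  URGENT = 1,
  HIGH = 5,
  NORMAL = 10,
  LOW = 20,
  BATCH = 50,
}

// Hypothetical shape of a queued request.
interface QueuedRequest {
  requestId: string;
  priority: Priority;
  enqueuedAtMs: number;
}

// Lower priority value first; on equal priority, the earlier request first.
function compareRequests(a: QueuedRequest, b: QueuedRequest): number {
  return a.priority - b.priority || a.enqueuedAtMs - b.enqueuedAtMs;
}

const pending: QueuedRequest[] = [
  { requestId: 'batch', priority: Priority.BATCH, enqueuedAtMs: 1 },
  { requestId: 'normal-late', priority: Priority.NORMAL, enqueuedAtMs: 3 },
  { requestId: 'urgent', priority: Priority.URGENT, enqueuedAtMs: 4 },
  { requestId: 'normal-early', priority: Priority.NORMAL, enqueuedAtMs: 2 },
];

// Copy before sorting so the original array is left untouched.
const serviceOrder = pending.slice().sort(compareRequests).map((r) => r.requestId);
```

Here the `URGENT` request is served first even though it arrived last, and the two `NORMAL` requests keep their arrival order.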

Redis Key Structure

gpu:{index}:leases         - Hash of active leases
gpu:{index}:vram:total     - Total VRAM for GPU
gpu:{index}:vram:used      - Currently used VRAM
gpu:{index}:name           - GPU name
gpu:count                  - Number of GPUs
gpu:leases:all             - Mapping of lease IDs to GPU indices
gpu:queue                  - Sorted set of queued requests
gpu:queue:requests         - Hash of request details
gpu:heartbeat:{leaseId}    - Heartbeat timestamp
gpu:preempt:{leaseId}      - Preemption channel
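When inspecting Redis by hand, a small helper that builds these key names can avoid typos. The sketch below assumes the `keyPrefix` option replaces the leading `gpu` segment, which the list above suggests but does not confirm; `makeKeys` is a hypothetical helper, not an export of the package.

```typescript
// Hypothetical helper: builds the key names listed above for a given prefix.
function makeKeys(keyPrefix: string = 'gpu') {
  return {
    leases: (index: number) => `${keyPrefix}:${index}:leases`,
    vramTotal: (index: number) => `${keyPrefix}:${index}:vram:total`,
    vramUsed: (index: number) => `${keyPrefix}:${index}:vram:used`,
    name: (index: number) => `${keyPrefix}:${index}:name`,
    count: `${keyPrefix}:count`,
    allLeases: `${keyPrefix}:leases:all`,
    queue: `${keyPrefix}:queue`,
    queueRequests: `${keyPrefix}:queue:requests`,
    heartbeat: (leaseId: string) => `${keyPrefix}:heartbeat:${leaseId}`,
    preempt: (leaseId: string) => `${keyPrefix}:preempt:${leaseId}`,
  };
}
```

For example, `makeKeys().vramUsed(0)` yields `gpu:0:vram:used`, matching the layout above.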

Related Packages

  • @lilith/ml-model-boss - Full model loading system (uses this package)
  • lilith-vram-boss - Python implementation

License

MIT