No description

Find a file

Lilith 9e61b4e818 Some checks failed Build and Publish / build (push) Failing after 45s Details Build and Publish / publish (push) Has been skipped Details deps-upgrade(dependencies): ⬆️ Update all dependencies to their latest stable versions for bug fixes and improvements Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>		2026-03-08 19:35:36 -07:00
.forgejo/workflows	chore: initial commit with publish config	2026-01-21 12:30:26 -08:00
src	chore: initial commit with publish config	2026-01-21 12:30:26 -08:00
.gitignore	chore: initial commit with publish config	2026-01-21 12:30:26 -08:00
package.json	deps-upgrade(dependencies): ⬆️ Update all dependencies to their latest stable versions for bug fixes and improvements	2026-03-08 19:35:36 -07:00
README.md	chore: trigger CI publish	2026-01-30 11:56:40 -08:00
tsconfig.json	chore: initial commit with publish config	2026-01-21 12:30:26 -08:00
tsup.config.ts	perf(build): ⚡ Optimize TypeScript bundling with tsup config tweaks for faster builds	2026-01-21 15:36:17 -08:00

README.md

@lilith/ml-vram-boss

GPU/VRAM lease coordinator for preventing race conditions in multi-model ML systems.

Features

Lease-based coordination: Acquire exclusive VRAM allocations via Redis
Priority queuing: Support for URGENT, HIGH, NORMAL, LOW, and BATCH priorities
Automatic heartbeat: Keep leases alive automatically
Preemption support: Gracefully handle resource preemption
Stale lease cleanup: Automatically clean up crashed processes
Multi-GPU support: Coordinate across multiple GPUs

Installation

pnpm add @lilith/ml-vram-boss

Quick Start

import { GPUBoss, Priority } from '@lilith/ml-vram-boss';

const boss = new GPUBoss();
await boss.connect();

// Initialize GPUs
await boss.initializeGpu(0, 24000, 'NVIDIA RTX 4090');

// Acquire a lease
const lease = await boss.acquire({
  vramMb: 8000,
  modelId: 'llama-7b',
  priority: Priority.NORMAL,
  timeoutMs: 60000,
});

// Handle preemption
lease.onPreempt(async (reason) => {
  console.log(`Preempted: ${reason}`);
  await unloadModel();
});

// Use the GPU
await loadModel();

// Release when done
await lease.release();
await boss.close();

Configuration

const boss = new GPUBoss({
  redisUrl: 'redis://localhost:6379',
  heartbeatIntervalMs: 10000,
  staleLeaseTimeoutMs: 60000,
  preemptionGracePeriodMs: 30000,
  defaultTimeoutMs: 300000,
  keyPrefix: 'gpu',
  autoCleanup: true,
  cleanupIntervalSeconds: 30,
});

API

GPUBoss

`connect(): Promise<void>`

Connect to Redis and start background cleanup task.

`initializeGpu(gpuIndex: number, vramTotalMb: number, gpuName?: string): Promise<void>`

Initialize a GPU for tracking.

`acquire(options: AcquireOptions): Promise<GPULease>`

Acquire a GPU lease.

Options:

vramMb: Required VRAM in megabytes
priority: Priority level (default: NORMAL)
modelId: Identifier for the model
timeoutMs: Max wait time
gpuPreference: Preferred GPU indices
serviceName: Service identifier

`getStatus(): Promise<BossStatus>`

Get current status of all GPUs and queues.

`forceRelease(leaseId: string): Promise<boolean>`

Force release a lease (for admin operations).

`drainAll(reason?: string): Promise<string[]>`

Request all models to unload gracefully.

GPULease

`onPreempt(callback: (reason: string) => Promise<void>): void`

`release(): Promise<boolean>`

Release the lease and free VRAM.

Properties

leaseId: Unique lease identifier
gpuIndex: GPU index
vramMb: Reserved VRAM
priority: Lease priority
modelId: Model identifier
isReleased: Whether lease has been released

Priority Levels

enum Priority {
  URGENT = 1,   // Immediate, bypasses queue
  HIGH = 5,     // Critical paths
  NORMAL = 10,  // Default
  LOW = 20,     // Background tasks
  BATCH = 50,   // Bulk operations
}

Redis Key Structure

gpu:{index}:leases         - Hash of active leases
gpu:{index}:vram:total     - Total VRAM for GPU
gpu:{index}:vram:used      - Currently used VRAM
gpu:{index}:name           - GPU name
gpu:count                  - Number of GPUs
gpu:leases:all             - Mapping of lease IDs to GPU indices
gpu:queue                  - Sorted set of queued requests
gpu:queue:requests         - Hash of request details
gpu:heartbeat:{leaseId}    - Heartbeat timestamp
gpu:preempt:{leaseId}      - Preemption channel

@lilith/ml-model-boss - Full model loading system (uses this package)
lilith-vram-boss - Python implementation

License

MIT

README.md

@lilith/ml-vram-boss

Features

Installation

Quick Start

Configuration

API

GPUBoss

connect(): Promise<void>

initializeGpu(gpuIndex: number, vramTotalMb: number, gpuName?: string): Promise<void>

acquire(options: AcquireOptions): Promise<GPULease>

getStatus(): Promise<BossStatus>

forceRelease(leaseId: string): Promise<boolean>

drainAll(reason?: string): Promise<string[]>