docs(prospector): add GPU cost-control feature (warm-up, meter, pause)
Presence-driven auto-warm (confirm toast), live uptime cost meter, pause=teardown to stop billing, GPU policy moved to settings config. Corrects the cost model: DO bills for droplet existence, not inference.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 50d5527345
commit bc47a2a72d
1 changed file with 103 additions and 0 deletions
docs/features/gpu-cost-control.md
Normal file
@ -0,0 +1,103 @@
# GPU cost-control: presence-driven warm-up, live meter, pause

> The OSS uncensored model runs on an **on-demand DigitalOcean (DO) GPU droplet** (see
> [`draft-engine.md`](./draft-engine.md)). The droplet bills **$3.39/hr for its
> existence** (`provision` → `teardown`), warm or idle, **not** for inference.
> This feature makes that spend visible, consent-gated, and operator-controllable.

## The cost reality (design premise)

DO charges from droplet creation to destruction. Therefore:

- **Warm-up** = `provision` + load weights to VRAM. **Billing starts at provision**,
  minutes before the model answers.
- **The only way to stop billing is `teardown`.** Routing work back to the
  `template` engine while keeping the droplet warm saves **nothing**.
- The cost meter tracks **droplet uptime (`provisioned_at` → now)**, not activity.

## Existing infrastructure (reused, not rebuilt)

`src/gpu/gpu.service.ts` already has the full lifecycle: `provision()`,
`teardown()`, `recordActivity()` (stamps `last_used_at`), and an
`idleTimeoutCheck()` that runs on a 60 s `@Interval` and tears the droplet down
after `GPU_IDLE_TIMEOUT_MINUTES` (default 30) without activity. `HostsView.tsx`
shows status plus a static `$3.39/hr` label. This feature adds **presence-driven
warm-up, a live cost meter, an explicit pause, and settings-backed config** on
top, with no lifecycle rewrite.

## Decisions (locked)

- **Pause = stop billing (teardown).** Resume re-provisions, which is a cold start
  that takes on the order of minutes to boot and reload weights. This is the only
  honest "stop the spend" control.
- **Auto-warm = on, with a confirm toast on first warm-up** ("GPU is starting:
  you're now billing $3.39/hr"). Consent without per-session friction.

## Design

### 1. Presence-driven warm-up

- The PWA emits a heartbeat while focused: `POST /prospector/gpu/heartbeat`
  (throttled, e.g. every 30 s; pauses on `visibilitychange` → hidden).
- First heartbeat with `gpu_auto_warm = true` and **no live droplet** →
  `provision()` and return `{ warming: true, justStarted: true }` so the PWA fires
  the one-time **confirm toast**.
- Subsequent heartbeats call `recordActivity()` to hold it warm. The existing idle
  sweep releases it once heartbeats stop and the idle window passes.

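The heartbeat branching described in the warm-up bullets can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not the actual `gpu.service.ts` code: `HeartbeatDeps`, `HeartbeatResult`, and `handleHeartbeat` are hypothetical names, the real `provision()` is likely asynchronous, and the response for non-first heartbeats is assumed here because the design only specifies the first-warm-up payload.

```ts
// Hypothetical dependency shape; the real service wiring is not shown in this doc.
interface HeartbeatDeps {
  autoWarm: boolean;              // value of the gpu_auto_warm setting
  hasLiveDroplet: () => boolean;  // true while a droplet exists (booting or warm)
  provision: () => void;          // stands in for GpuService.provision()
  recordActivity: () => void;     // stands in for GpuService.recordActivity()
}

interface HeartbeatResult {
  warming: boolean;      // a droplet exists or was just requested
  justStarted: boolean;  // true only on the heartbeat that triggered provision
}

function handleHeartbeat(deps: HeartbeatDeps): HeartbeatResult {
  if (deps.hasLiveDroplet()) {
    // Droplet already exists: just refresh last_used_at so the idle sweep keeps it.
    deps.recordActivity();
    return { warming: true, justStarted: false };
  }
  if (!deps.autoWarm) {
    // Auto-warm disabled: presence alone never starts billing.
    return { warming: false, justStarted: false };
  }
  // First heartbeat with no droplet: billing starts here, so flag it for the toast.
  deps.provision();
  return { warming: true, justStarted: true };
}
```

Keeping the decision separate from the HTTP handler makes the "toast fires exactly once per warm-up" behaviour easy to unit test with stubbed dependencies.
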
### 2. Live cost meter

`GET /prospector/gpu/status` payload (`gpu-status.ts#buildGpuStatus`) gains:

```ts
uptimeSeconds: number | null   // now - provisioned_at, null when no droplet
hourlyUsd: number              // from gpu_hourly_usd setting (default 3.39)
sessionCostUsd: number | null  // uptimeSeconds/3600 * hourlyUsd, null when no droplet
```

The Hosts view shows a live banner: **"🔴 GPU warm · 14m · $0.79 this session · $3.39/hr"**.
`hourlyUsd` moves out of the hardcoded `HOURLY_USD` constant into the payload.

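The meter arithmetic can be checked with a short sketch. The field names match the payload above; `buildCostFields` and `formatBanner` are hypothetical helpers for illustration (the real `buildGpuStatus` is not shown here), and the banner text is only an approximation of the final UI copy.

```ts
interface GpuCostFields {
  uptimeSeconds: number | null;
  hourlyUsd: number;
  sessionCostUsd: number | null;
}

// Derive the meter fields from droplet uptime only; activity is deliberately ignored.
function buildCostFields(
  provisionedAt: Date | null,
  now: Date,
  hourlyUsd = 3.39, // assumed default, mirroring gpu_hourly_usd
): GpuCostFields {
  if (provisionedAt === null) {
    return { uptimeSeconds: null, hourlyUsd, sessionCostUsd: null };
  }
  const elapsedMs = now.getTime() - provisionedAt.getTime();
  const uptimeSeconds = Math.max(0, Math.floor(elapsedMs / 1000));
  const sessionCostUsd = (uptimeSeconds / 3600) * hourlyUsd;
  return { uptimeSeconds, hourlyUsd, sessionCostUsd };
}

// Hypothetical banner formatter for the Hosts view.
function formatBanner(f: GpuCostFields): string {
  if (f.uptimeSeconds === null || f.sessionCostUsd === null) {
    return "GPU off, not billing";
  }
  const minutes = Math.floor(f.uptimeSeconds / 60);
  const cost = f.sessionCostUsd.toFixed(2);
  const rate = f.hourlyUsd.toFixed(2);
  return `GPU warm · ${minutes}m · $${cost} this session · $${rate}/hr`;
}
```

For a droplet provisioned 14 minutes ago, `uptimeSeconds` is 840 and the cost is 840 / 3600 × 3.39 ≈ 0.79, which matches the banner figures in the design.
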
### 3. Pause / resume

- **Pause** button → `POST /prospector/gpu/teardown` (existing). Meter stops; status
  shows "off, not billing".
- **Resume** → `POST /prospector/gpu/provision` (existing), cold start. UI labels it
  honestly: "Resume (cold start ~Nm)".
- A `gpu_max_session_minutes` cost-cap: the idle sweep also force-tears-down any
  droplet whose uptime exceeds the cap, regardless of activity. This is a hard
  ceiling so a stuck-warm droplet can't bill overnight.

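The combined idle and cost-cap check can be sketched as a pure decision function that the existing sweep could call on each tick. This is a sketch under assumptions: `decideSweep`, `DropletTimes`, `SweepPolicy`, and the decision labels are hypothetical, and the exact boundary comparison of the current `idleTimeoutCheck()` is not documented here.

```ts
type SweepDecision = "keep" | "teardown_idle" | "teardown_cap";

interface DropletTimes {
  provisionedAt: Date; // billing start
  lastUsedAt: Date;    // stamped by recordActivity()
}

interface SweepPolicy {
  idleTimeoutMinutes: number; // gpu_idle_timeout_minutes
  maxSessionMinutes: number;  // gpu_max_session_minutes
}

function decideSweep(d: DropletTimes, policy: SweepPolicy, now: Date): SweepDecision {
  const minutesSince = (t: Date): number => (now.getTime() - t.getTime()) / 60_000;

  // Hard cap first: uptime beyond the cap wins even if the droplet is busy.
  if (minutesSince(d.provisionedAt) > policy.maxSessionMinutes) {
    return "teardown_cap";
  }
  // Idle release (assumed ">=" boundary): no activity for the whole idle window.
  if (minutesSince(d.lastUsedAt) >= policy.idleTimeoutMinutes) {
    return "teardown_idle";
  }
  return "keep";
}
```

With the documented defaults of a 120-minute cap at $3.39/hr, a single session is bounded at roughly 2 × 3.39 = $6.78, plus at most one sweep interval of overshoot.
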
### 4. Control via config (settings, not env)

Move GPU policy from env into `prospector_settings` (migration 0013) so it's
editable in the PWA **and** over MCP (Plane-1 parity, see
[`ai-first-v4.md`](./ai-first-v4.md)):

| Setting | Default | Meaning |
|---|---|---|
| `gpu_auto_warm` | `true` | activity provisions the droplet |
| `gpu_idle_timeout_minutes` | `30` | release after this much inactivity (was env) |
| `gpu_max_session_minutes` | `120` | hard cost-cap auto-teardown |
| `gpu_hourly_usd` | `3.39` | meter rate (size-dependent) |
| `gpu_region` / `gpu_size` | `nyc2` / `gpu-h100x1-80gb` | provision params (were env) |

`idleTimeoutCheck()` reads these from settings instead of `ConfigService` env.

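One way to model these settings in code is a typed object with the table's defaults and a resolver that falls back to a default when a stored value is missing or invalid. This is a minimal sketch: the keys and defaults come from the table above, while `GpuSettings`, `GPU_SETTING_DEFAULTS`, `resolveGpuSettings`, and the validation rules are assumptions, not the actual `prospector_settings` schema.

```ts
interface GpuSettings {
  gpu_auto_warm: boolean;
  gpu_idle_timeout_minutes: number;
  gpu_max_session_minutes: number;
  gpu_hourly_usd: number;
  gpu_region: string;
  gpu_size: string;
}

// Defaults copied from the settings table.
const GPU_SETTING_DEFAULTS: GpuSettings = {
  gpu_auto_warm: true,
  gpu_idle_timeout_minutes: 30,
  gpu_max_session_minutes: 120,
  gpu_hourly_usd: 3.39,
  gpu_region: "nyc2",
  gpu_size: "gpu-h100x1-80gb",
};

// Assumed validation: numeric settings must be finite and positive.
const isPositive = (v: unknown): v is number =>
  typeof v === "number" && Number.isFinite(v) && v > 0;

const isNonEmpty = (v: unknown): v is string =>
  typeof v === "string" && v.length > 0;

function resolveGpuSettings(stored: Partial<GpuSettings>): GpuSettings {
  const out: GpuSettings = { ...GPU_SETTING_DEFAULTS };
  if (typeof stored.gpu_auto_warm === "boolean") out.gpu_auto_warm = stored.gpu_auto_warm;
  if (isPositive(stored.gpu_idle_timeout_minutes)) out.gpu_idle_timeout_minutes = stored.gpu_idle_timeout_minutes;
  if (isPositive(stored.gpu_max_session_minutes)) out.gpu_max_session_minutes = stored.gpu_max_session_minutes;
  if (isPositive(stored.gpu_hourly_usd)) out.gpu_hourly_usd = stored.gpu_hourly_usd;
  if (isNonEmpty(stored.gpu_region)) out.gpu_region = stored.gpu_region;
  if (isNonEmpty(stored.gpu_size)) out.gpu_size = stored.gpu_size;
  return out;
}
```

Falling back per key rather than rejecting the whole object means a single bad value written over MCP cannot disable the cost cap or the idle timeout.
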
## Endpoints summary

| Method | Route | Purpose | Status |
|---|---|---|---|
| `POST` | `/prospector/gpu/heartbeat` | presence ping → auto-warm / record activity | **new** |
| `GET` | `/prospector/gpu/status` | + `uptimeSeconds`, `hourlyUsd`, `sessionCostUsd` | **extend** |
| `POST` | `/prospector/gpu/provision` | warm / resume | exists |
| `POST` | `/prospector/gpu/teardown` | pause / stop billing | exists |

MCP parity (per `ai-first-v4.md` Phase 1): `prospector_gpu_status`,
`prospector_gpu_warm`, `prospector_gpu_pause`, and the new GPU settings via
`prospector_set_settings`, so an agent sees and controls the spend too.

## Invariant

Cost-control never changes send safety. The GPU being warm, cold, paused, or capped
only affects **whether a draft body comes from the model or the `template`
fallback**. Gate-2, the kill-switch, and the macsync outbox floor are untouched.
A cold/paused GPU degrades to `template`, never to an unsafe or placeholder send.

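The invariant implies that GPU state feeds only the choice of draft engine. A minimal sketch of that choice follows; `GpuState`, `DraftEngine`, `pickDraftEngine`, and the `"model"` label are hypothetical names, and treating a still-warming droplet as `template` is an assumption the design does not spell out.

```ts
// Hypothetical state labels; the real status enum is not shown in this doc.
type GpuState = "warm" | "warming" | "off";
type DraftEngine = "model" | "template";

// GPU state selects where the draft body comes from and nothing else.
// Send-safety checks are intentionally absent: they run regardless of engine.
function pickDraftEngine(state: GpuState): DraftEngine {
  return state === "warm" ? "model" : "template";
}
```

Because the function returns only an engine name, there is no path by which a paused or capped GPU could skip a safety gate.
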