lilith-platform.live/docs/DO_ARCHITECTURE.md
2026-06-28 13:58:30 -04:00

# DigitalOcean Architecture — what runs where, and how we know
**Status:** live as of 2026-06-28. Supersedes the dead homelan `black`/`apricot`
store tier (both DEAD 2026-06-27). Adult content is still **served only from
1984.is / vps-0 (`yuzu`)**; DigitalOcean holds durable state and private backend
services and has **no public hostname for the content surface**.
> **The one rule for "what's running":** no single dashboard is authoritative, and
> nothing should be answered from memory. Three sources of truth, in order of authority:
> 1. **Live DO truth** — `doctl` / the DO console (`ct` account). What is *actually* powered on and billing.
> 2. **`uvlava/terraform/do/terraform.tfstate`** — what Terraform created and manages (the cloud-resource view).
> 3. **`net-tools/data/mesh-hosts.json`** — the mesh/operator view (names, wg IPs, roles, ssh). What plum *talks to*.
>
> When they disagree, the live DO state wins and something has drifted — fix the
> IaC/mesh file, don't paper over it. Never hardcode an IP that one of these
> already holds.
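
The "live DO state wins" rule above can be checked mechanically instead of by eye. This is a minimal sketch, assuming you have already dumped one address per line from two of the sources (for example with `doctl compute droplet list --format PublicIPv4 --no-header` on one side and a `jq -r` query over `mesh-hosts.json` on the other). The helper name `drift_report` and the file names are hypothetical and not part of net-tools, and which mesh field corresponds to which DO column is left as an assumption.

```sh
# drift_report LIVE_FILE EXPECTED_FILE   (hypothetical helper)
# Compares two line-oriented inventories. Prints entries that appear on only
# one side and returns 1 if any drift is found, 0 if the two sides agree.
drift_report() {
  live_sorted="$(mktemp)" || return 2
  expected_sorted="$(mktemp)" || return 2
  # Collapse column padding, drop blank lines, then sort for comm(1).
  awk 'NF { $1 = $1; print }' "$1" | sort -u > "$live_sorted"
  awk 'NF { $1 = $1; print }' "$2" | sort -u > "$expected_sorted"
  report="$(
    comm -23 "$live_sorted" "$expected_sorted" | sed 's/^/live-only: /'
    comm -13 "$live_sorted" "$expected_sorted" | sed 's/^/expected-only: /'
  )"
  rm -f "$live_sorted" "$expected_sorted"
  [ -z "$report" ] && return 0
  printf '%s\n' "$report"
  return 1
}

# Usage (file names are placeholders):
#   drift_report live-ips.txt mesh-ips.txt || echo "drift: fix the IaC/mesh file"
```
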
---
## Account / project / region
| | |
|---|---|
| DO account | **`ct`** — TransQuinnFTW@pm.me (verified). PAT: `~/.vault/do-pat-ct.token` |
| Project | **`ct:prod`** (`ed8cdfb7-f6eb-4f92-a44e-2a03627d5baa`) — all rebuild resources grouped here |
| Region | **`nyc3`** (operator NYC-local). GDPR residency caveat for EU-subject PII is open (see recovery plan) |
| Other account | `mc` (magic-civilization) PAT also exists — **do not mix** with `ct` |

A second DO account, `mc`, exists for the magic-civilization project. Everything in
this doc refers to the **`ct`** account. Always confirm which account a `doctl` context
points at before acting.

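
One way to make the "confirm the context" step hard to skip is to wrap it in a guard. This is only a sketch: it assumes the `ct` account is saved as a doctl auth context literally named `ct`, and that `doctl auth list` marks the active context with a trailing `(current)`. Both `current_context` and `require_context` are hypothetical helper names.

```sh
# current_context: read `doctl auth list` output on stdin and print the name
# of the context flagged as "(current)". Prints nothing if none is flagged.
current_context() {
  awk '/\(current\)/ { print $1 }'
}

# require_context WANT: same stdin; fail loudly unless WANT is the active one.
require_context() {
  got="$(current_context)"
  if [ "$got" != "$1" ]; then
    echo "refusing: doctl context is '${got:-unknown}', expected '$1'" >&2
    return 1
  fi
}

# Usage:
#   doctl auth list | require_context ct && doctl compute droplet list
```
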
---
## Always-on droplets (DO, `ct:prod`, nyc3)
| Droplet | Mesh name | Size | Public IP | Private (VPC) | wg1 | Image | Role |
|---|---|---|---|---|---|---|---|
| **backend** | `lime` / `lilith-store-backend` | s-2vcpu-4gb | **209.38.51.98** (reserved IP) · droplet 165.227.96.183 | 10.20.0.2 | **10.9.0.5** | ubuntu-24.04 | quinn.api INTERNAL (`:3030`), MCP gateways (`:3910-3914`), LISTEN/NOTIFY + private workers, pgBouncer → Managed PG. **No public app ports** — reach via `ProxyJump yuzu` / wg. |
| **redroid** | `redroid` / `lilith-store-redroid` | s-2vcpu-4gb | 45.55.191.82 | 10.20.0.4 | **10.9.0.6** | ubuntu-22.04 | Containerized Android (binder/ashmem) for screening automation — Mr. Number + WhatsApp lookups. Services (adb:5555, ws-scrcpy:8000, wa-ui:8011) **mesh-only** (ufw + bind lo/wg). 20 GB persistent volume `redroid-data` for Google sign-in + paid-app state. |

**Not in this Terraform state but live and in the fleet:**

| Droplet | Size | Public IP | Role | Notes |
|---|---|---|---|---|
| **cocotte-forge** / `lilith-forge` | s-1vcpu-2gb | **134.199.243.61** | Forgejo (git `origin`, `:3000`/ssh `:2222`) + Verdaccio (`@lilith/*` npm, `:4873`) | Provisioned out-of-band; `digitalocean_droplet.forge` + DNS exist in `dns.tf`/`droplet.tf` but the droplet itself is **not yet imported** into state (`terraform import digitalocean_droplet.forge 580675125`). Admin creds: `~/.vault/forge-admin-quinn.*`. |

**Off-DO but part of the same mesh** (so the picture is complete):

| Host | Role | IP / wg |
|---|---|---|
| `yuzu` (vps / quinn-vps) | 1984 Hosting (Iceland) — **wg mesh hub** + **quinn production** (public content edge) | public 89.127.233.145 · wg 10.9.0.1 |
| `fennel` (plum) | This MacBook — sole authoring surface, mesh client, smart-lan-router | wg 10.9.0.3 |
| `apricot`, `pear` (`black`) | Homelan GPU/CPU — **DEAD 2026-06-27**, kept in mesh-hosts for recovery; `dx.hide_homelan=true` hides them from rendered configs | — |
---
## Managed services (not droplets — DO-hosted, always-on)
From `uvlava/terraform/do/terraform.tfstate`:
| Resource | Identity | Notes |
|---|---|---|
| **Managed PostgreSQL 16** | `db-s-1vcpu-2gb`, 1 node, nyc3 · host `private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com:25060` | Databases: **`quinn`**, **`quinn_admin`**. VPC-private only; trusted-sources = the `backend` droplet. Edge reaches it through backend's pgBouncer over wg. URI is a sensitive TF output (`pg_uri_private`). |
| **Spaces bucket** | `lilith-quinn-media` (nyc3, **private** ACL) | Durable media store. Deny-public policy; services use signed URLs. |
| **Spaces CDN** | origin `lilith-quinn-media.nyc3.digitaloceanspaces.com` → `…nyc3.cdn.digitaloceanspaces.com` | vps-0 edge-caches `/photos` from this CDN origin. |
| **Reserved IP** | `209.38.51.98` → backend droplet | Stable WireGuard endpoint for `lime`. |
| **VPC** | `store` (10.20.0.0/…, nyc3) | Private network for backend + redroid + PG. |
| **Firewalls** | `backend`, `redroid` | Inbound restricted to wg/SSH + admin IPs. |
| **Block volume** | `redroid-data` 20 GB, nyc3 | Attached to redroid for `/data` persistence. |
| **DNS zone** | `uvlava.com` (DO-hosted) | `forge.ct` / `npm.ct` / `backend.ct` / `db.ct` / apex. **NS delegation at joker.com PENDING** — records exist but don't resolve yet; use bare IPs until live. |
---
## What runs **on demand** (not always-on)
These are deliberately not running by default — provision/start only when needed:
| Thing | State | How it's invoked |
|---|---|---|
| **GPU droplet** (hybrid inference) | `gpu_enabled = false` in `variables.tf` — DO account **not GPU-allowlisted** (zero `gpu-*` sizes returned). TF resource is written but gated. | Request GPU access from DO, set `gpu_enabled=true`, `terraform apply`. Serverless inference is unaffected by this gate. |
| **Redroid screening sessions** | Droplet is always-on, but the **Android containers + lookups** are run per-request. | Mr. Number / WhatsApp MCP tools (`mcp__quinn-mr-number__*`, `mcp__quinn-whatsapp__*`) + the plum console tray, which opens a tunneled localhost console UI over the mesh. |
| **MCP gateways** (`:3910-3914` on backend) | Process-resident on `lime`; **one tenant per gateway**. Previously on `black`, now on DO. | Reached from Claude Code / Desktop over the mesh. Client URLs may be stale post-migration (tool code is DO-correct). |
| **Terraform-described store tier extras** | `.tf` files describe the *full* tier; some pieces are written-but-un-applied pending the GDPR region call + PG sizing. | `terraform plan` shows the delta between written and applied. |
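
The written-versus-applied delta in the last row can be turned into a scriptable signal. `terraform plan -detailed-exitcode` exits 0 for an empty plan, 2 when changes are pending, and 1 on error. The sketch below only translates that status; the function name `explain_plan_exit` and its messages are hypothetical.

```sh
# explain_plan_exit CODE: translate the exit status of
# `terraform plan -detailed-exitcode` into a one-line verdict.
explain_plan_exit() {
  case "$1" in
    0) echo "in sync: empty plan" ;;
    2) echo "delta: drift or written-but-un-applied resources" ;;
    *) echo "error: plan failed, state unknown"; return 1 ;;
  esac
}

# Usage (run inside the terraform/do directory):
#   terraform plan -detailed-exitcode >/dev/null; explain_plan_exit "$?"
```
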
---
## How to answer "what's running right now?"
Run these — never answer from memory (Anti-Hallucination Protocol):
```sh
# 1. LIVE truth: what DO is actually billing/running (ct account)
export DO_PAT="$(cat ~/.vault/do-pat-ct.token)"
doctl auth init -t "$DO_PAT"            # or: doctl --access-token "$DO_PAT" ...
doctl compute droplet list --format Name,PublicIPv4,PrivateIPv4,Region,Memory,Status,Tags
doctl databases list                    # Managed PG clusters
doctl compute volume list
doctl compute reserved-ip list
doctl compute cdn list

# 2. IaC truth: what Terraform manages (and drift vs live)
cd ~/Code/@projects/uvlava/terraform/do
export TF_VAR_do_token="$(cat ~/.vault/do-pat-ct.token)"
terraform plan                          # empty plan == state matches reality
terraform state list                    # every managed resource
terraform output                        # IPs, PG host, Spaces endpoint, wg address

# 3. Mesh/operator truth: names, wg IPs, roles, ssh targets
jq '.hosts[] | {name, class, role, wg, public}' \
  ~/Code/@projects/@tools/net-tools/data/mesh-hosts.json

# 4. Health probes (per host identity URL in mesh-hosts.json)
curl -fsS http://10.9.0.5:3030/healthz  # backend; expect "ok"
```
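
Step 4 in the block above can be wrapped so that a probe yields a clear verdict and exit status. This is a sketch under assumptions: it assumes every health endpoint returns the bare body `ok`, as in the backend example, and it must be run from a host on the mesh. The name `probe_healthz` and the `FETCH` override (there only so the check can be exercised without network access) are hypothetical.

```sh
# probe_healthz URL: succeed only if the endpoint answers with the body "ok".
# FETCH may name an alternative fetch command; the default is curl.
probe_healthz() {
  body="$(${FETCH:-curl -fsS --max-time 5} "$1" 2>/dev/null)" || {
    echo "DOWN $1"
    return 1
  }
  if [ "$body" = "ok" ]; then
    echo "UP $1"
  else
    echo "BAD $1 (got: $body)"
    return 1
  fi
}

# Usage:
#   probe_healthz http://10.9.0.5:3030/healthz
```
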
`doctl` is not currently installed on plum (`which doctl` → not found). Install
with `brew install doctl` before the live-truth step, or use the DO web console
for the `ct` account.
---
## Where the canonical definitions live
- **IaC (source of truth for cloud resources):** `~/Code/@projects/uvlava/terraform/do/`
— a **separate infranet repo** (`uvlava`), intentionally NOT in this v2 product
tree. `droplet.tf`, `database.tf`, `spaces.tf`, `network.tf`, `redroid.tf`,
`dns.tf`, `outputs.tf`, `variables.tf`; bootstrap in `cloud-init/`.
- **Mesh / fleet (source of truth for host identity & addressing):**
`~/Code/@projects/@tools/net-tools/data/mesh-hosts.json` — never hardcode mesh
IPs/MACs elsewhere; renderers derive `/etc/hosts` + `~/.ssh/config` from it.
- **Recovery rationale & homelan→cloud mapping:** `docs/EDGE_ISLAND_MODE.md` +
the `.claude/plans` recovery plan.
- **Secrets:** all under `~/.vault/` (0600). None live in any tree.
(`.gitignore` in uvlava blocks `*.tfstate`, `*.tfvars`, `.terraform/`.)
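
The 0600 claim for `~/.vault/` is easy to verify rather than assume. A small sketch that lists any regular file whose mode is not exactly 0600; the helper names `loose_secrets` and `vault_ok` are hypothetical.

```sh
# loose_secrets DIR: print every regular file under DIR whose permission
# bits are not exactly 0600. No output means the directory is clean.
loose_secrets() {
  find "$1" -type f ! -perm 600
}

# vault_ok DIR: succeed only when loose_secrets finds nothing.
vault_ok() {
  [ -z "$(loose_secrets "$1")" ]
}

# Usage:
#   vault_ok ~/.vault || loose_secrets ~/.vault
```
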
---
## Known drift / follow-ups (so the inventory stays honest)
- **forge droplet not in TF state** — import `digitalocean_droplet.forge` (id 580675125) so it's managed, not orphaned.
- **uvlava.com NS delegation pending** at joker.com — `*.ct.uvlava.com` records exist but don't resolve; use bare IPs (`134.199.243.61`, `209.38.51.98`) until delegated + Caddy/LE TLS is up.
- **MCP client URLs may be stale** post-black→DO migration — gateway code is DO-correct; repoint clients to `lime` (10.9.0.5:3910-3914).
- **Homelan hosts (`apricot`/`pear`) still in mesh-hosts** but DEAD — kept for recovery, hidden via `dx.hide_homelan=true`. Don't treat them as running.
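
The NS delegation follow-up above can be re-checked at any time without the DO console. The sketch below reads the output of `dig +short NS uvlava.com` and succeeds once at least one answer is a DigitalOcean nameserver (`ns1`/`ns2`/`ns3.digitalocean.com`). The helper name `ns_delegated` is hypothetical.

```sh
# ns_delegated: read `dig +short NS <zone>` output on stdin; succeed if any
# line is a DigitalOcean nameserver, fail on empty or foreign answers.
ns_delegated() {
  grep -Eiq '^ns[0-9]+\.digitalocean\.com\.?$'
}

# Usage:
#   dig +short NS uvlava.com | ns_delegated \
#     && echo "delegated: hostnames should resolve" \
#     || echo "still pending: keep using bare IPs"
```

Until this succeeds, the `*.ct.uvlava.com` records stay unreachable by name and the bare IPs listed above remain the working addresses.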