lilith-platform.live/docs/DO_ARCHITECTURE.md
2026-06-28 13:58:30 -04:00

DigitalOcean Architecture — what runs where, and how we know

Status: live as of 2026-06-28. Supersedes the dead homelan black/apricot store tier (both DEAD 2026-06-27). Adult content is still served only from 1984.is / vps-0 (yuzu); DigitalOcean holds durable state and private backend services and has no public hostname for the content surface.

The one rule for "what's running": there is no single dashboard you trust from memory. Three sources of truth, in order of authority:

  1. Live DO truth: doctl / the DO console (ct account). What is actually powered on and billing.
  2. uvlava/terraform/do/terraform.tfstate — what Terraform created and manages (the cloud-resource view).
  3. net-tools/data/mesh-hosts.json — the mesh/operator view (names, wg IPs, roles, ssh). What plum talks to.

When they disagree, the live DO state wins and something has drifted — fix the IaC/mesh file, don't paper over it. Never hardcode an IP that one of these already holds.
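One way to make disagreement between sources visible is to reduce each source to a list of resource names and compare the lists. The sketch below assumes two plain-text files with one name per line, already normalised to the same naming scheme (for example, droplet names rather than mesh names). The `report_drift` helper and the file layout are hypothetical, not part of the existing tooling.

```shell
# Hypothetical helper: compare a live list against a recorded list.
# Each argument is a file with one resource name per line.
report_drift() {
  local live_file=$1
  local recorded_file=$2
  local live_sorted recorded_sorted
  live_sorted=$(mktemp)
  recorded_sorted=$(mktemp)

  # comm needs sorted input; -u also drops duplicate names.
  sort -u "$live_file" > "$live_sorted"
  sort -u "$recorded_file" > "$recorded_sorted"

  # comm -23: names only in the first file (live but not recorded).
  comm -23 "$live_sorted" "$recorded_sorted" | sed 's/^/live-only: /'
  # comm -13: names only in the second file (recorded but not live).
  comm -13 "$live_sorted" "$recorded_sorted" | sed 's/^/recorded-only: /'

  rm -f "$live_sorted" "$recorded_sorted"
}
```

Empty output means the two lists agree. A `live-only` line is a resource that is running but missing from the IaC or mesh file, which by the rule above means the file needs fixing.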


Account / project / region

| Item | Value |
| --- | --- |
| DO account | ctTransQuinnFTW@pm.me (verified). PAT: ~/.vault/do-pat-ct.token |
| Project | ct:prod (ed8cdfb7-f6eb-4f92-a44e-2a03627d5baa); all rebuild resources grouped here |
| Region | nyc3 (operator NYC-local). GDPR residency caveat for EU-subject PII is open (see recovery plan) |
| Other account | mc (magic-civilization) PAT also exists; do not mix with ct |

A second DO account mc exists for the magic-civilization project. Everything in this doc is the ct account. Always confirm which account a doctl context points at before acting.
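The account check can be made mechanical with a small guard that runs before any mutating command. This is a sketch only: `require_do_account` is a hypothetical name, and it assumes the installed doctl supports `account get --format Email --no-header`.

```shell
# Hypothetical guard: succeed only if the active doctl credentials
# belong to the expected account email (passed as the first argument).
require_do_account() {
  local expected_email=$1
  local actual_email
  # `doctl account get` reports the account behind the current context.
  # If doctl fails, the result is empty and the comparison below fails.
  actual_email=$(doctl account get --format Email --no-header | tr -d '[:space:]')
  if [ "$actual_email" != "$expected_email" ]; then
    echo "wrong DO account: got '$actual_email', want '$expected_email'" >&2
    return 1
  fi
}
```

Called with the ct account's email as its argument, the guard returns non-zero when the context points at mc (or at nothing), so it can be chained as `require_do_account "$CT_EMAIL" && doctl ...`, where `CT_EMAIL` is a variable you set yourself.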


Always-on droplets (DO, ct:prod, nyc3)

| Droplet | Mesh name | Size | Public IP | Private (VPC) | wg1 | Image | Role |
| --- | --- | --- | --- | --- | --- | --- | --- |
| backend | lime / lilith-store-backend | s-2vcpu-4gb | 209.38.51.98 (reserved IP) · droplet 165.227.96.183 | 10.20.0.2 | 10.9.0.5 | ubuntu-24.04 | quinn.api INTERNAL (:3030), MCP gateways (:3910-3914), LISTEN/NOTIFY + private workers, pgBouncer → Managed PG. No public app ports; reach via ProxyJump yuzu / wg. |
| redroid | redroid / lilith-store-redroid | s-2vcpu-4gb | 45.55.191.82 | 10.20.0.4 | 10.9.0.6 | ubuntu-22.04 | Containerized Android (binder/ashmem) for screening automation: Mr. Number + WhatsApp lookups. Services (adb:5555, ws-scrcpy:8000, wa-ui:8011) mesh-only (ufw + bind lo/wg). 20 GB persistent volume redroid-data for Google sign-in + paid-app state. |

Not in this Terraform state but live and in the fleet:

| Droplet | Size | Public IP | Role | Notes |
| --- | --- | --- | --- | --- |
| cocotte-forge / lilith-forge | s-1vcpu-2gb | 134.199.243.61 | Forgejo (git origin, :3000/ssh :2222) + Verdaccio (@lilith/* npm, :4873) | Provisioned out-of-band; digitalocean_droplet.forge + DNS exist in dns.tf/droplet.tf but the droplet itself is not yet imported into state (terraform import digitalocean_droplet.forge 580675125). Admin creds: ~/.vault/forge-admin-quinn.*. |

Off-DO but part of the same mesh (so the picture is complete):

| Host | Role | IP / wg |
| --- | --- | --- |
| yuzu (vps / quinn-vps) | 1984 Hosting (Iceland): wg mesh hub + quinn production (public content edge) | public 89.127.233.145 · wg 10.9.0.1 |
| fennel (plum) | This MacBook: sole authoring surface, mesh client, smart-lan-router | wg 10.9.0.3 |
| apricot, pear (black) | Homelan GPU/CPU. DEAD 2026-06-27, kept in mesh-hosts for recovery; dx.hide_homelan=true hides them from rendered configs | |

Managed services (not droplets — DO-hosted, always-on)

From uvlava/terraform/do/terraform.tfstate:

| Resource | Identity | Notes |
| --- | --- | --- |
| Managed PostgreSQL 16 | db-s-1vcpu-2gb, 1 node, nyc3 · host private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com:25060 | Databases: quinn, quinn_admin. VPC-private only; trusted-sources = the backend droplet. Edge reaches it through backend's pgBouncer over wg. URI is a sensitive TF output (pg_uri_private). |
| Spaces bucket | lilith-quinn-media (nyc3, private ACL) | Durable media store. Deny-public policy; services use signed URLs. |
| Spaces CDN | origin lilith-quinn-media.nyc3.digitaloceanspaces.com…nyc3.cdn.digitaloceanspaces.com | vps-0 edge-caches /photos from this CDN origin. |
| Reserved IP | 209.38.51.98 → backend droplet | Stable WireGuard endpoint for lime. |
| VPC | store (10.20.0.0/…, nyc3) | Private network for backend + redroid + PG. |
| Firewalls | backend, redroid | Inbound restricted to wg/SSH + admin IPs. |
| Block volume | redroid-data, 20 GB, nyc3 | Attached to redroid for /data persistence. |
| DNS zone | uvlava.com (DO-hosted) | forge.ct / npm.ct / backend.ct / db.ct / apex. NS delegation at joker.com PENDING: records exist but don't resolve yet; use bare IPs until live. |
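Whether the NS delegation has gone live can be checked from any machine with `dig`. The sketch below is a hypothetical helper, `ns_delegated_to_do`, that reads NS records on stdin and succeeds only when every record is one of DigitalOcean's nameservers (ns1, ns2, ns3.digitalocean.com). It assumes one record per line, as produced by `dig +short NS`.

```shell
# Reads NS records (one per line) on stdin. Succeeds only if at least one
# record was seen and every record is a DigitalOcean nameserver.
ns_delegated_to_do() {
  local ns seen=0
  while IFS= read -r ns; do
    [ -n "$ns" ] || continue          # skip blank lines
    seen=1
    case "$ns" in
      # dig prints a trailing dot; accept both forms.
      ns[1-3].digitalocean.com|ns[1-3].digitalocean.com.) ;;
      *) return 1 ;;                   # some other registrar/provider NS
    esac
  done
  [ "$seen" -eq 1 ]                    # an empty answer is not delegation
}

# Usage (needs network access and dig):
#   dig +short NS uvlava.com | ns_delegated_to_do && echo "delegated to DO"
```

Until this check passes, the *.ct.uvlava.com names will not resolve publicly, so the bare IPs remain the way in.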

What runs on demand (not always-on)

These are deliberately not running by default — provision/start only when needed:

| Thing | State | How it's invoked |
| --- | --- | --- |
| GPU droplet (hybrid inference) | gpu_enabled = false in variables.tf; DO account not GPU-allowlisted (zero gpu-* sizes returned). TF resource is written but gated. | Request GPU access from DO, set gpu_enabled=true, terraform apply. Serverless inference is unaffected by this gate. |
| Redroid screening sessions | Droplet is always-on, but the Android containers + lookups are run per-request. | Mr. Number / WhatsApp MCP tools (mcp__quinn-mr-number__*, mcp__quinn-whatsapp__*) + the plum console tray, which opens a tunneled localhost console UI over the mesh. |
| MCP gateways (:3910-3914 on backend) | Process-resident on lime; one tenant per gateway. WAS on black, now DO. | Reached from Claude Code / Desktop over the mesh. Client URLs may be stale post-migration (tool code is DO-correct). |
| Terraform-described store tier extras | .tf files describe the full tier; some pieces are written-but-un-applied pending the GDPR region call + PG sizing. | terraform plan shows the delta between written and applied. |
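The "zero gpu-* sizes returned" condition behind the GPU gate can be re-checked before flipping gpu_enabled. A minimal sketch, assuming size slugs arrive one per line on stdin (for example from `doctl compute size list --format Slug --no-header`); `count_gpu_sizes` is a hypothetical helper name.

```shell
# Counts size slugs that start with "gpu-" in a list read from stdin.
# grep -c exits non-zero when the count is 0, so `|| true` keeps the
# function's status at success while still printing "0".
count_gpu_sizes() {
  grep -c '^gpu-' || true
}

# Usage (needs doctl and the ct token):
#   doctl compute size list --format Slug --no-header | count_gpu_sizes
```

A result of 0 means the account is still not GPU-allowlisted, and applying with gpu_enabled=true would be expected to fail on the size lookup.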

How to answer "what's running right now?"

Run these — never answer from memory (Anti-Hallucination Protocol):

# 1. LIVE truth — what DO is actually billing/running (ct account)
export DO_PAT="$(cat ~/.vault/do-pat-ct.token)"
doctl auth init -t "$DO_PAT"            # or: doctl --access-token "$DO_PAT" ...
doctl compute droplet list --format Name,PublicIPv4,PrivateIPv4,Region,Memory,Status,Tags
doctl databases list                    # Managed PG clusters
doctl compute volume list
doctl compute reserved-ip list
doctl compute cdn list

# 2. IaC truth — what Terraform manages (and drift vs live)
cd ~/Code/@projects/uvlava/terraform/do
export TF_VAR_do_token="$(cat ~/.vault/do-pat-ct.token)"
terraform plan                          # empty plan == state matches reality
terraform state list                    # every managed resource
terraform output                        # IPs, PG host, Spaces endpoint, wg address

# 3. Mesh/operator truth — names, wg IPs, roles, ssh targets
jq '.hosts[] | {name, class, role, wg, public}' \
  ~/Code/@projects/@tools/net-tools/data/mesh-hosts.json

# 4. Health probes (per host identity URL in mesh-hosts.json)
#    e.g. backend: http://10.9.0.5:3030/healthz  ->  "ok"

doctl is not currently installed on plum (which doctl → not found). Install with brew install doctl before the live-truth step, or use the DO web console for the ct account.
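The health-probe step (4) above is only described in a comment. One possible shape for it is sketched here; `probe_health` is a hypothetical helper, and the URL should come from mesh-hosts.json rather than being typed in (the name of the field holding it is not shown in this doc, so none is assumed).

```shell
# Hypothetical helper: probe one health URL.
# Healthy means curl connected within 5 seconds and the server did not
# return an HTTP error status (-f makes curl fail on 4xx/5xx).
probe_health() {
  local url=$1
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "$url ok"
  else
    echo "$url FAIL"
    return 1
  fi
}

# Usage (from a mesh-connected host):
#   probe_health "$BACKEND_HEALTH_URL"   # variable set from mesh-hosts.json
```

This checks reachability and status only. It does not compare the response body against the expected "ok" string, which would be a reasonable tightening for the backend endpoint.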


Where the canonical definitions live

  • IaC (source of truth for cloud resources): ~/Code/@projects/uvlava/terraform/do/ — a separate infranet repo (uvlava), intentionally NOT in this v2 product tree. droplet.tf, database.tf, spaces.tf, network.tf, redroid.tf, dns.tf, outputs.tf, variables.tf; bootstrap in cloud-init/.
  • Mesh / fleet (source of truth for host identity & addressing): ~/Code/@projects/@tools/net-tools/data/mesh-hosts.json — never hardcode mesh IPs/MACs elsewhere; renderers derive /etc/hosts + ~/.ssh/config from it.
  • Recovery rationale & homelan→cloud mapping: docs/EDGE_ISLAND_MODE.md + the .claude/plans recovery plan.
  • Secrets: all under ~/.vault/ (0600). None live in any tree. (.gitignore in uvlava blocks *.tfstate, *.tfvars, .terraform/.)
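The 0600 expectation on the vault can be audited with plain `find`. A small sketch, with `list_loose_secrets` as a hypothetical helper name:

```shell
# Lists regular files under a directory whose permission bits are not
# exactly 0600. Prints nothing when every file is compliant.
list_loose_secrets() {
  local dir=$1
  find "$dir" -type f ! -perm 600
}

# Usage:
#   list_loose_secrets ~/.vault
```

Any path it prints can be tightened with `chmod 600`. The exact-match form of `-perm` is used so that both looser modes (0644) and unusual ones (0400) are reported.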

Known drift / follow-ups (so the inventory stays honest)

  • forge droplet not in TF state — import digitalocean_droplet.forge (id 580675125) so it's managed, not orphaned.
  • uvlava.com NS delegation pending at joker.com — *.ct.uvlava.com records exist but don't resolve; use bare IPs (134.199.243.61, 209.38.51.98) until delegated + Caddy/LE TLS is up.
  • MCP client URLs may be stale post-black→DO migration — gateway code is DO-correct; repoint clients to lime (10.9.0.5:3910-3914).
  • Homelan hosts (apricot/pear) still in mesh-hosts but DEAD — kept for recovery, hidden via dx.hide_homelan=true. Don't treat them as running.
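To notice when Terraform-side drift opens or closes without reading the full plan, the exit status of `terraform plan -detailed-exitcode` can be used: Terraform documents 0 as no changes, 1 as an error, and 2 as changes present. The labels and the `classify_plan_status` helper below are hypothetical.

```shell
# Maps the exit status of `terraform plan -detailed-exitcode` to a label.
#   0 = plan succeeded, no changes
#   2 = plan succeeded, changes present
#   anything else = plan failed
classify_plan_status() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Usage (from the terraform/do directory, with TF_VAR_do_token exported):
#   terraform plan -detailed-exitcode >/dev/null
#   classify_plan_status "$?"
```

While the forge droplet remains un-imported, a "drift" result is the expected answer, since the resource is written in the .tf files but absent from state.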