prospector/docs/features/deploy.md
Natalie e7ed0ea951
docs(deploy): ct.prod runbook (hardened public prod / DMZ)
Document ct.prod as the hardened public prod host: public Caddy edge + app on
ct.prod, /internal + DB on VPC/mesh, lime internal-only. Add the exact ordered
operator runbook (terraform plan/apply -target for the ct.prod resources, wg1
join via citron, one-time DB role + trusted-source, apps.ftw.pw DNS as an
operator decision since ftw.pw is not DO-managed, deploy-server.sh, Caddy
install). Keep the lime DB/env/systemd mechanics as a legacy reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 12:17:25 -04:00

Deploy — Prospector prod on ct.prod (the hardened public DMZ host)

Topology (authoritative, 2026-06-30): ct.prod is the public prod host

The public sales edge does NOT live on lime. lime is the internal store/backend box and keeps zero public app ports. Prospector's prod target is ct.prod (com.uvlava.ct.prod) — a new, dedicated, hardened DO droplet (nyc3, store VPC, joins wg1) whose only job is to face the internet:

internet --(80/443)--> Caddy on ct.prod --(127.0.0.1:3210)--> NestJS app
ct.prod  --(store VPC 10.20.0.0/24)-----> DO Managed PG (lilith-store-pg, private)
ct.prod  --(wg1 mesh 10.9.0.0/24)-------> people / mac-sync / mr-number
  • Public name: apps.ftw.pw (Caddy + Let's Encrypt). ftw.pw is a SEPARATE zone, not DO-managed — see the DNS step below.
  • The app binds 127.0.0.1:3210 only. Caddy is the sole public listener and 403s /internal/* (the mac-sync inbound webhook + peers); macsync hits /internal/inbound over the mesh (http://10.9.0.10:3210/internal/inbound), never the public leg.
  • DB + mesh deps over private paths only. DO Managed PG over the VPC; people/mac-sync/mr-number over wg1. mac-sync runs on the operator's Mac (not lime, not ct.prod) — MACSYNC_BASE_URL/MACSYNC_DEVICE_ID are operator-set.
  • lime stays internal (mesh-only; no app/edge ports).
  • IaC: uvlava/terraform/do/ct_prod.tf (count-gated ct_prod_enabled; droplet + reserved IP + cloud firewall 80/443 public, 22+wg mesh-only). Hardened cloud-init cloud-init/ct-prod.yaml: ufw, fail2ban, unattended-upgrades, non-root deploy user, node20. Mesh entry: mesh-hosts.json host ct.prod, wg 10.9.0.10.
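The "app binds 127.0.0.1:3210 only" claim above can be spot-checked on the box. A minimal sketch, assuming `ss` from iproute2 is available on ct.prod; the function name `loopback_only` is hypothetical and not part of the deploy scripts. If the mesh path to :3210 is served by a separate listener, that listener will be reported too and should be reviewed on its own terms.

```shell
# loopback_only PORT: read `ss -ltnH` output on stdin and fail if anything
# listens on PORT on an address other than 127.0.0.1 or [::1].
loopback_only() {
  awk -v p="$1" '
    {
      addr = $4                          # "Local Address:Port" column
      n = split(addr, parts, ":")
      if (parts[n] != p) next            # different port, ignore
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host != "127.0.0.1" && host != "[::1]") { print "exposed: " addr; bad = 1 }
    }
    END { exit bad + 0 }
  '
}
# On ct.prod:  ss -ltnH | loopback_only 3210
```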

⚠️ ct.prod must be added as a TRUSTED SOURCE on the lilith-store-pg managed cluster (DO console → Databases → firewall) or migrations + the app's DB connect will time out.
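A quick way to tell a missing trusted-source entry from an app-level problem is a bare TCP probe from ct.prod to the cluster port. This is a sketch only: `probe_tcp` is a hypothetical helper, it relies on bash's `/dev/tcp` pseudo-device plus coreutils `timeout`, and `PG_HOST` is a placeholder for the cluster's private hostname.

```shell
# probe_tcp HOST PORT [TIMEOUT_SECONDS]: exit 0 if a TCP connect succeeds.
# Exit 124 means the connect timed out (the symptom of a missing trusted
# source); other non-zero codes usually mean refused or unresolvable.
probe_tcp() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
# On ct.prod:
#   probe_tcp "$PG_HOST" 25060 || echo "cannot reach PG: check trusted sources + VPC"
```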

Operator runbook — bring ct.prod live (in order)

All terraform here is plan/apply with -target so the rest of the shared store tier is never dragged in. ct_prod_enabled defaults false; the -var flips it on for this targeted apply only.
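Because a targeted apply is only safe if the plan touches exactly the intended resources, it can help to gate the apply on the plan summary. A minimal sketch, assuming the plan is run with `-no-color` and captured to a file; `plan_matches` is a hypothetical helper, not something in the repo.

```shell
# plan_matches EXPECTED_ADD: read `terraform plan -no-color` output on stdin;
# succeed only if the summary is "<EXPECTED_ADD> to add, 0 to change, 0 to destroy".
plan_matches() {
  grep -Eq "^Plan: $1 to add, 0 to change, 0 to destroy\.\$"
}
# terraform plan -no-color -var=ct_prod_enabled=true -target=... > plan.txt
# plan_matches 3 < plan.txt || echo "unexpected plan, do not apply"
```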

cd ~/Code/@ct/infra/uvlava/terraform/do
export TF_VAR_do_token="$(cat ~/.vault/do-pat-ct.token)"

# 1. Stand up ct.prod (droplet + reserved IP + cloud firewall) — ONLY these.
terraform plan  -var=ct_prod_enabled=true \
  -target=digitalocean_droplet.ct_prod \
  -target=digitalocean_reserved_ip.ct_prod \
  -target=digitalocean_firewall.ct_prod        # review: 3 to add, 0 change, 0 destroy
terraform apply -var=ct_prod_enabled=true \
  -target=digitalocean_droplet.ct_prod \
  -target=digitalocean_reserved_ip.ct_prod \
  -target=digitalocean_firewall.ct_prod
terraform output -raw ct_prod_public_ip        # = the reserved IP (only exists now)

# 2. Join ct.prod to wg1: copy /root/wg1.pub off the box, add it as a [Peer] on
#    the nyc3 hub (citron); append the citron [Peer] block to ct.prod's
#    /etc/wireguard/wg1.conf, then `systemctl start wg-quick@wg1`
#    (phase-b-mesh-join.sh automates this). Then set mesh-hosts.json ct.prod
#    wg_pubkey + public (= the reserved IP) and re-render (net sync).

# 3. Make ct.prod a trusted source on the managed PG cluster (DO console), then
#    create the prospector DB + role ONCE (secret-bearing; not in terraform):
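#    doctl takes the cluster ID (UUID), not the cluster name:
PG_ID="$(doctl databases list --format ID,Name --no-header | awk '$2=="lilith-store-pg"{print $1}')"
doctl databases db   create "$PG_ID" prospector
doctl databases user create "$PG_ID" prospector     # prints the password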
#    as doadmin on the prospector DB:
#      ALTER DATABASE prospector OWNER TO prospector;
#      GRANT ALL ON SCHEMA public TO prospector; ALTER SCHEMA public OWNER TO prospector;

# 4. DNS for apps.ftw.pw (operator decision — ftw.pw is NOT DO-managed):
#    (a) add  apps.ftw.pw  A  <ct.prod reserved IP>  at ftw.pw's registrar, OR
#    (b) delegate ftw.pw NS to DigitalOcean, then add a digitalocean_record in dns.tf.

# 5. Ship the app (over the mesh; fills /opt/prospector/.env, runs migrations).
cd ~/Code/@ct/@applications/prospector
./deploy/deploy-server.sh                       # SERVER_HOST defaults to 10.9.0.10 (mesh)
#    First run halts at the DB __SET_ME__ guard: fill PROSPECTOR_DB_* in
#    /opt/prospector/.env on ct.prod from step 3, then re-run deploy-server.sh.

# 6. Install the Caddy edge on ct.prod (public TLS for apps.ftw.pw).
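ssh root@10.9.0.10 'apt-get install -y caddy'   # install first so /etc/caddy exists
scp deploy/edge/apps.ftw.pw.Caddyfile root@10.9.0.10:/etc/caddy/Caddyfile
ssh root@10.9.0.10 'caddy validate --config /etc/caddy/Caddyfile && systemctl restart caddy'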
#    Verify: https://apps.ftw.pw/prospector/ loads; https://apps.ftw.pw/internal/inbound -> 403.
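The verification in step 6 can be scripted so it is repeatable after every deploy. A sketch under assumptions: "loads" is taken to mean HTTP 200, and `http_status` / `expect_status` are hypothetical helper names.

```shell
# http_status URL: print only the HTTP status code (redirects are not followed).
http_status() { curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"; }

# expect_status LABEL EXPECTED ACTUAL: report and return non-zero on mismatch.
expect_status() {
  if [ "$2" = "$3" ]; then
    echo "ok   $1 ($3)"
  else
    echo "FAIL $1: want $2, got $3"
    return 1
  fi
}
# From any internet-connected machine:
#   expect_status app      200 "$(http_status https://apps.ftw.pw/prospector/)"
#   expect_status internal 403 "$(http_status https://apps.ftw.pw/internal/inbound)"
```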

Legacy reference — the lime bootstrap (internal-only now)

The steps below were written for lime and remain accurate for the DB + env + systemd mechanics, which are identical on ct.prod (the deploy script does them). lime itself is now internal-only; the app + edge moved to ct.prod.

Probed 2026-06-29: lime = lilith-store-backend, Ubuntu 24.04, public 209.38.51.98 · wg 10.9.0.5 · VPC 10.20.0.2. Postgres 16 + pgbouncer fronts the DO Managed cluster. NestJS 11 needs Node 20+. SSH alias lime (root, ~/.ssh/id_ed25519_1984).

⚠️ These steps sudo-write a SHARED prod host. They were blocked under auto mode (correctly). Run them in a non-auto session, or grant a Bash(ssh ct.prod *) permission rule, or run them yourself.

1. Node 20 on the droplet

ssh lime 'curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash - && sudo apt-get install -y nodejs && node -v'

(mac-sync uses Bun, so a system Node bump is safe for it.)

2. Create the two DBs — on the DO Managed Postgres cluster

There is no local Postgres. The droplet's pgbouncer (:6432) fronts a DO Managed Postgres cluster: private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com:25060 (holds the live quinn DB). So people + prospector are new databases on that managed cluster (additive — does NOT touch quinn):

  • Via Terraform IaC (the DO infra is Terraform-managed in uvlava/terraform/do). The DBs + dedicated users are already declared (pg_databases += people/prospector; digitalocean_database_user.{people,prospector}). Just apply:
    cd ~/Code/@projects/uvlava/terraform/do
    TF_VAR_do_token=<your DO token> terraform apply   # additive: +2 dbs, +2 users, 0 destroy
    terraform output -raw people_db_password
    terraform output -raw prospector_db_password
    terraform output -raw pg_host        # private cluster host for the .env
    
  • Services connect directly to the managed endpoint over SSL (skip the shared pgbouncer to avoid touching live pooling): *_DB_HOST=private-lilith-store-pg-..., *_DB_PORT=25060, *_DB_SSL=true. (Optionally add [databases] entries to /etc/pgbouncer/pgbouncer.ini + reload to pool them, but that touches shared infra.)
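To test the direct managed-endpoint path by hand, the same `*_DB_*` values can be turned into a libpq connection URI for `psql`. A minimal sketch: `pg_uri` is a hypothetical helper, `sslmode=require` is assumed to be the psql-side equivalent of `*_DB_SSL=true`, and the password is left to `PGPASSWORD` so it never lands in the URI.

```shell
# pg_uri PREFIX: print a libpq URI built from <PREFIX>_DB_HOST/PORT/NAME/USER.
pg_uri() {
  eval "host=\${$1_DB_HOST} port=\${$1_DB_PORT} name=\${$1_DB_NAME} user=\${$1_DB_USER}"
  printf 'postgresql://%s@%s:%s/%s?sslmode=require\n' "$user" "$host" "$port" "$name"
}
# With the PROSPECTOR_DB_* variables exported:
#   PGPASSWORD="$PROSPECTOR_DB_PASSWORD" psql "$(pg_uri PROSPECTOR)" -c 'select 1'
```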

3. Apply migrations

# prospector
for f in 0001_prospector 0002_drafts 0003_corrections; do
  # ON_ERROR_STOP makes psql exit non-zero on the first failing statement
  ssh lime "sudo -u postgres psql -v ON_ERROR_STOP=1 -d prospector" < "migrations/$f.sql"
done
# people (from the cocottetech repo)
ssh lime "sudo -u postgres psql -v ON_ERROR_STOP=1 -d people" < <people-service>/migrations/0001_people.sql
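The loop above can be generalised so that new migration files are picked up in filename order and the run stops at the first failure. This is a sketch, not the deploy script's actual logic: `apply_migrations` is a hypothetical name, the runner is whatever psql invocation reaches the target database, and it assumes migrations are safe to feed in lexical order (it does not track which ones were already applied).

```shell
# apply_migrations DIR RUNNER...: feed each DIR/[0-9]*.sql to RUNNER on stdin,
# in lexical (glob) order, stopping at the first failure.
apply_migrations() {
  dir="$1"; shift
  for f in "$dir"/[0-9]*.sql; do
    [ -e "$f" ] || { echo "no migrations in $dir" >&2; return 1; }
    echo "applying $(basename "$f")"
    "$@" < "$f" || { echo "failed: $f" >&2; return 1; }
  done
}
# apply_migrations migrations ssh lime "sudo -u postgres psql -v ON_ERROR_STOP=1 -d prospector"
```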

4. Ship the built code

Build locally, rsync dist + manifests, install prod deps on the droplet:

npm run build && npm run build -w @prospector/mcp-prospector
rsync -az --delete dist package.json package-lock.json migrations lime:/opt/prospector/
ssh lime 'cd /opt/prospector && npm ci --omit=dev'
# people-service likewise to /opt/people-service

5. Env on the droplet (/opt/prospector/.env)

NODE_ENV=production
PROSPECTOR_API_PORT=3210
PROSPECTOR_DB_HOST=private-lilith-store-pg-do-user-28217120-0.l.db.ondigitalocean.com
# DO managed cluster (direct, SSL); comment kept on its own line because systemd's EnvironmentFile= does not strip trailing comments
PROSPECTOR_DB_PORT=25060
PROSPECTOR_DB_SSL=true
PROSPECTOR_DB_NAME=prospector
PROSPECTOR_DB_USER=prospector
PROSPECTOR_DB_PASSWORD=<from doctl databases user create>
PROSPECTOR_SERVICE_TOKEN=<strong-token>
PEOPLE_BASE_URL=http://127.0.0.1:3061
PEOPLE_SERVICE_TOKEN=<people-token>
# legacy lime layout: mac-sync on this same droplet (on ct.prod this value is operator-set, see the topology section)
MACSYNC_BASE_URL=http://127.0.0.1:3201
MACSYNC_SERVICE_TOKEN=<macsync-token>
MACSYNC_DEVICE_ID=<device>
MRNUMBER_BASE_URL=https://my.transquinnftw.com
MRNUMBER_SERVICE_TOKEN=<mr-token>

(people-service gets its own /opt/people-service/.env with PEOPLE_DB_* + PEOPLE_SERVICE_TOKEN.)
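Before starting the services it is worth linting the .env for values that were never filled in. A minimal sketch, assuming the file is plain KEY=VALUE lines; `env_problems` is a hypothetical helper. It flags empty values, `<placeholder>` values, the `__SET_ME__` sentinel the deploy script guards on, and trailing `# comments`, which systemd's EnvironmentFile= would treat as part of the value.

```shell
# env_problems FILE: print the keys with suspicious values; exit 1 if any.
env_problems() {
  awk -F= '
    /^[ \t]*(#|$)/ { next }                    # skip comment and blank lines
    {
      key = $1
      val = substr($0, length($1) + 2)         # everything after the first "="
      if (val == "" || val ~ /^<.*>$/ || val ~ /__SET_ME__/ || val ~ /[ \t]#/) {
        print key; bad = 1
      }
    }
    END { exit bad + 0 }
  ' "$1"
}
# On the droplet:  env_problems /opt/prospector/.env && echo "env looks filled in"
```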

6. systemd units (/etc/systemd/system/{prospector,people-service}.service)

[Service]
WorkingDirectory=/opt/prospector
EnvironmentFile=/opt/prospector/.env
ExecStart=/usr/bin/node dist/main.js
Restart=always
User=root
[Install]
WantedBy=multi-user.target

sudo systemctl enable --now people-service prospector

Then check both services: curl localhost:3061/health and curl localhost:3210/health.
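A service may take a few seconds to bind after `systemctl enable --now`, so a single curl can fail spuriously. A small retry wrapper helps; this is a generic sketch and `retry` is a hypothetical helper name.

```shell
# retry TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  tries="$1"; delay="$2"; shift 2
  i=1
  while ! "$@"; do
    [ "$i" -ge "$tries" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}
# retry 10 2 curl -fsS http://localhost:3061/health
# retry 10 2 curl -fsS http://localhost:3210/health
```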

7. Wire mac-sync → prospector webhook

In the @mac-sync server (same droplet): on a new inbound, fire-and-forget POST http://127.0.0.1:3210/internal/inbound with Authorization: Bearer $PROSPECTOR_SERVICE_TOKEN, body {handle, channel:'imessage', text, occurredAt, hasCallSignal?}. Env-gated (PROSPECTOR_WEBHOOK_URL/token) so macsync runs standalone if unset. (Redo cleanly — the earlier agent left partial edits in @mac-sync.)
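Until the mac-sync side is wired, the endpoint can be smoke-tested by hand with a payload of the same shape. A sketch under assumptions: the ISO 8601 UTC format for occurredAt is a guess (the source does not specify it), the optional hasCallSignal field is omitted, and `json_escape` / `inbound_payload` are hypothetical helpers that only escape backslashes and double quotes, so keep test text on one line.

```shell
# json_escape STRING: escape backslashes and double quotes for a JSON string.
json_escape() { printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'; }

# inbound_payload HANDLE TEXT [OCCURRED_AT]: print the JSON body.
inbound_payload() {
  printf '{"handle":"%s","channel":"imessage","text":"%s","occurredAt":"%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
}
# curl -fsS -X POST http://127.0.0.1:3210/internal/inbound \
#   -H "Authorization: Bearer $PROSPECTOR_SERVICE_TOKEN" \
#   -H 'Content-Type: application/json' \
#   -d "$(inbound_payload '+15550100' 'hello')"
```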

8. Point the dev UI at prod (over the mesh)

web/.env.local:

PROSPECTOR_API_URL=http://10.9.0.5:3210
PROSPECTOR_SERVICE_TOKEN=<the prod PROSPECTOR_SERVICE_TOKEN>

Restart npm run dev -w @prospector/web. The vite proxy injects the token; the panel now shows real prod decisions.

Verify (go-live)

/health both services → real inbound (or prospector_submit_inbound) → appears in prospector/activity → kill-switch flip persists → dev UI shows it over the mesh.

Post-migration notes (2026-06-29 unification)

  • Run new migrations: for f in migrations/0006_bilingual.sql ; do ssh lime "sudo -u postgres psql -d prospector" < $f ; done
  • Bilingual fields now live in prospect_drafts (original/translated/detected_lang); Triage, Detail, and Reports show both versions when present (data comes from macsync inbound, and later from classifier translations).
  • MCP (@packages/mcp-prospector) now exposes full tools (prospector_* + legacy mappings for cockpit parity): list, thread, draft, send, mr, pastebin, reports, markets, classify, submit, held, activity, etc. Use with PROSPECTOR_BASE_URL + TOKEN. Replaces LP mcp-prospector.
  • UI fused: Triage = designs/main-view + inbox-ops + LP Stream; Reports = 4 reports + engine subs (Experiments/Patterns/Actions); Queue = queued-tasks + owed/backfill; etc. PWA install in Control.
  • LP can now drop prospector (see MIGRATION-PLAN in session plan file for removal list + proxies during cutover).
  • Rebuild/redeploy mcp + app after changes.