atlilith/INFRA.md
Natalie 4365c8a47f docs(@projects/@atlilith): update infrastructure documentation for lan-to-lan migration
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-10 03:15:02 -07:00


@atlilith — Infrastructure Design

Status: Design phase · Date: 2026-05-16 · Companion to: DESIGN.md


1. Hosts at a glance

| Host | Type | Role | Network | OS |
| --- | --- | --- | --- | --- |
| plum | Mac mini (Apple Silicon) | Workstation + macOS-only peers | LAN: plum.lan | macOS |
| apricot | Linux box (home lab) | Dev environment + LAN-only services | LAN: 10.0.0.13 | Bluefin/Bootc |
| black | Linux box (home lab) | LAN tooling host (Forgejo, Verdaccio, ai-engine worker, dev DBs, mac-sync DB) | LAN-only: 10.0.0.11 via WireGuard mesh (no public IP) | Linux (Bluefin) |
| vps-0 | Hetzner VPS (alias quinn-vps) | Public app tier + cache | Public IP, reaches black via SSH reverse tunnel | Linux |

Why this split (verified from users/transquinnftw/app.manifest.yaml)

  • plum runs the macOS-only peer services (mail-sync = Proton Bridge wrapper, mac-sync = iMessage bidirectional sync). Cannot move off Mac.
  • apricot is the only writer for the codebase (auto-commit-service ACS gates concurrent edits). Hosts dev DBs + dev frontends.
  • black is the data + LAN tooling host. Runs platform.api (the V3 name), the gateway/data layer for all authenticated reads/writes, backed by its own Postgres (currently :5432 + :25435 dev tier + :25436 quinn_macsync for the mac-sync ingest). Also hosts forge.black.lan (Forgejo) and npm.black.lan (Verdaccio), both routed via a host-nginx Docker container alongside the system nginx, plus quinn-ai-auto-respond.service (cut over from apricot 2026-05-15), marketplace-api, the quinn-mail-autoresponder/-notifier/-digest workers, and quinn-m-orchestrator-tunnel.service, which maintains the SSH link from black → vps-0 for the public-info cache. Black is the data crown jewel.
  • vps-0 (quinn-vps, 89.127.233.145) is the public-internet face: production web UIs (quinn.{www,sso,my,m,ai,admin,data,vip} frontends + the org/brand sites cocotte.maison, sansonnet.maison, adulttherapytour.com siblings) and a cache server for the public-information subset of platform.api (the canonical V3 name). Also hosts docker-mailserver for transquinnftw.com at /opt/quinn-mailserver and the defensive-coms nginx redirects for .com → .maison aliases. Private/authenticated data does not live on vps-0; it sits behind platform.api on black. V2 and V3 are expected to run side by side (per DESIGN.md §8 Phase 5.1: V3 picks port ranges that don't collide with V2). V2's existing quinn-*-api systemd units and local Postgres on vps-0 keep serving Quinn's traffic; V3 stands up its parallel platform.api on black for new Providers and gradually attracts surfaces. V2 is decommissioned only when V3 hits parity (DESIGN.md §11 Success Criteria #1, #6).
  • (No separate vps-quinn host — that name in the manifest is just an alias for vps-0.)

2. Topology — ASCII

                              ┌──────────────────────────────────────┐
                              │           PUBLIC INTERNET            │
                              └────────────┬─────────────────────────┘
                                           │
                            ┌──────────────┼──────────────────────┐
                            │              │                      │
                            ▼              ▼                      ▼
              ┌──────────────────┐ ┌──────────────┐ ┌──────────────────────┐
              │  atlilith.com    │ │  quinn.*     │ │  cocotte.maison      │
              │  (marketing,     │ │  (Quinn's    │ │  sansonnet.maison    │
              │   SSO root,      │ │   instance,  │ │  adulttherapytour    │
              │   waitlist)      │ │   personal)  │ │  ftw.pw, etc.        │
              └────────┬─────────┘ └──────┬───────┘ └─────────┬────────────┘
                       │                  │                   │
                       ▼                  ▼                   ▼
              ┌─────────────────────────────────────────────────────────┐
              │           CADDY / EDGE ROUTER (per host)                │
              │  TLS termination · domain → service routing · waf       │
              └─────┬──────────────────────┬────────────────────────────┘
                    │                      │
                    ▼                      ▼
        ┌─────────────────────────┐   ┌────────────────────────────────────┐
        │ VPS-0 (public)          │   │ BLACK (10.0.0.11) — LAN prod core  │
        │ "Quinn's app + cache"   │   │ "AUTHORITATIVE PRODUCTION DBs"     │
        │                         │   │                                    │
        │ App tier:               │   │ Edge:                              │
        │ ┌─────────────────────┐ │   │ ┌────────────────────────────────┐ │
        │ │ quinn.www, platform.api│ │   │ │ atlilith.www (public)          │ │
        │ │ quinn.sso, quinn.my │ │   │ │ waitlist-api                   │ │
        │ │ quinn.m, quinn.ai   │ │   │ │ docker-mailserver + Rspamd     │ │
        │ │ quinn.admin, vip    │ │   │ │   (inbound SMTP for atlilith)  │ │
        │ │ quinn.data SPA      │ │   │ └────────────────────────────────┘ │
        │ │ mail-autoresponder  │ │   │ Authoritative DBs:                 │
        │ │ ai-engine, m-sync   │ │   │ ┌────────────────────────────────┐ │
        │ └──────────┬──────────┘ │   │ │ pg :25435  quinn.db (unified)  │ │
        │            │            │   │ │   ← vps-0 reaches via          │ │
        │ Local cache tier:       │   │ │     SSH -R 25435 reverse tunnel│ │
        │ ┌─────────────────────┐ │   │ │ pg :25433  quinn.m-db          │ │
        │ │ timescaledb :25434  │ │   │ │   (messenger, imessage-sync)   │ │
        │ │   (analytics writes)│ │   │ │ pg :25436  mac-sync icloud     │ │
        │ │ redis :26379        │ │   │ │   (read-only ingest mirror)    │ │
        │ │   (queue + sessions)│ │◀─┐│ └────────────────────────────────┘ │
        │ │ minio (object hot)  │ │  ││ Object/cold:                       │
        │ └─────────────────────┘ │  ││ ┌────────────────────────────────┐ │
        └─────────┬───────────────┘  ││ │ minio :9000 (cold/backup tier) │ │
                  │ SSH -R tunnel    ││ └────────────────────────────────┘ │
                  │ to black:25435   ││ Workers:                           │
                  │ + black:25433    ││ ┌────────────────────────────────┐ │
                  └──────────────────┘│ │ quinn.hotel-scout (systemd)    │ │
                                      │ │ minio cold replication target  │ │
                                      │ └────────────────────────────────┘ │
                                      └─────┬──────────────────────────────┘
                                            │ LAN
                            ┌───────────────┴──────────────┬────────────┐
                            ▼                              ▼            ▼
            ┌─────────────────────────────┐  ┌────────────────┐  ┌─────────────────┐
            │ APRICOT (10.0.0.13)         │  │ PLUM (Mac)     │  │ Forgejo / git   │
            │ "dev + LAN tools"           │  │ "macOS peers"  │  │ (on BLACK,      │
            │                             │  │                │  │  forge.black.lan)│
            │                             │  │                │  └─────────────────┘
            │ Dev DBs (full local stack): │  │ mail-sync :4444│
            │  pg :25435 quinn.db (dev)   │  │  Proton Bridge │
            │  pg :25433 quinn.m-db (dev) │  │ mac-sync :3201 │
            │  pg :25436 mac-sync (dev)   │  │  iMessage sync │
            │  timescaledb :25434 (dev)   │  │ @ml/knowledge- │
            │  redis :26379 (dev)         │  │  platform      │
            │  minio :9000 (dev)          │  │  (Crystal TUI) │
            │  mailpit :1025/:8025 (dev)  │  │ @agents/* MCP  │
            │                             │  │  servers       │
            │ Dev frontends/APIs:         │  └────────────────┘
            │  *.apricot.lan (Caddy)      │
            │                             │
            │ ACS (auto-commit-service)   │
            │ Forgejo (self-host git)     │
            └─────────────────────────────┘

3. Databases — who lives where

Authoritative production DBs — black (LAN, 10.0.0.11)

┌──────────────────────────────────────────────────────────────┐
│  black  (AUTHORITATIVE PRODUCTION DBs)                       │
│                                                              │
│  postgres:25435  ─── platform.db (was quinn.db, unified)     │
│      ├── users, orgs, org_members          ← tenancy core    │
│      ├── providers, profiles, attributes   ← profile system  │
│      ├── bookings, payments, reviews       ← marketplace     │
│      ├── client_intel, trust_records       ← safety          │
│      └── audit_log                                           │
│      ▲ vps-0 apps reach this via SSH -R 25435 reverse tunnel │
│                                                              │
│  postgres:25433  ─── messenger.db (iMessage threads)         │
│      ├── threads, messages, contacts                         │
│      └── send_queue (writes from m-sync via tunnel)          │
│                                                              │
│  postgres:25436  ─── mac-sync.db (raw iCloud, read-only)     │
│      └── (mac-sync peer on plum is the writer; mirrored      │
│           here for read access from vps-0/black)             │
│                                                              │
│  minio:9000      ─── object storage (cold tier, photo backup)│
│  docker-mailserver ─ inbound SMTP for atlilith.com           │
│  systemd workers ─── quinn.hotel-scout (hourly timer)        │
└──────────────────────────────────────────────────────────────┘

Public app tier + local cache — vps-0

┌──────────────────────────────────────────────────────────────┐
│  vps-0  (Public app tier — DBs are CACHES, not authoritative)│
│                                                              │
│  timescaledb:25434 ── analytics.db (org-analytics events)    │
│      ├── visitor_events (org_id partitioned, hot writes)     │
│      ├── funnels, conversions                                │
│      └── per-org rollups (continuous aggregates)             │
│      ▼ Cold rollups periodically flushed to black            │
│                                                              │
│  redis:26379  ──────── cache + queue                         │
│      ├── analytics ingestion queue (before flush to ts-db)   │
│      ├── BullMQ jobs (queue-worker feature)                  │
│      ├── session cache (SSO JWT validation)                  │
│      └── HTTP response cache for hot reads                   │
│                                                              │
│  minio:9000   ──────── object storage (hot tier)             │
│      └── replicates → black:9000 (cold)                      │
│                                                              │
│  App processes for quinn.* (no persistent state of their own)│
└──────────────────────────────────────────────────────────────┘

Why this split (vps-0 cache, black authoritative):

  • vps-0 is replaceable — if it dies, spin up a new VPS, redeploy from git, point DNS. Caches rebuild from black.
  • black is the data crown jewel — kept on a controlled LAN host, harder to attack from public internet.
  • vps-0 → black uses a persistent SSH reverse tunnel (-R 25435:localhost:25435) initiated from black, so a compromised vps-0 cannot open its own path back into the LAN.
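
As a rough illustration of the tunnel described above, the sketch below builds the ssh command line that black would run. The ports and the vps-0 host alias come from this document. The function name and the two `-o` options (`ExitOnForwardFailure`, `ServerAliveInterval`) are assumptions added for robustness, not a description of the actual unit file.

```python
import shlex


def reverse_tunnel_argv(remote_host: str, ports: list[int]) -> list[str]:
    """Build the ssh argv that exposes local ports on remote_host via -R.

    Hypothetical helper: the real tunnel service may use different options.
    """
    argv = [
        "ssh",
        "-N",                              # forwarding only, no remote command
        "-o", "ExitOnForwardFailure=yes",  # exit if a forward cannot be set up
        "-o", "ServerAliveInterval=30",    # notice a dead link so a supervisor can restart
    ]
    for port in ports:
        # -R <port on remote>:<host seen from the initiator>:<port on initiator>
        argv += ["-R", f"{port}:localhost:{port}"]
    argv.append(remote_host)
    return argv


cmd = reverse_tunnel_argv("vps-0", [25435, 25433])
print(shlex.join(cmd))
```

Because the connection is opened from black, vps-0 only ever sees a listening port on its own loopback interface; it holds no credentials for the LAN side.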

Dev DB tier (apricot)

┌──────────────────────────────────────────────────────────────┐
│  apricot  (Dev — full local stack)                           │
│                                                              │
│  postgres:25435  ─── platform.db (dev, seeded)               │
│  postgres:25433  ─── messenger.db (dev)                      │
│  postgres:25436  ─── mac-sync.db (dev, mirror of plum's)     │
│  timescaledb:25434 ─ analytics.db (dev)                      │
│  redis:26379     ─── queue + cache (dev)                     │
│  minio:9000      ─── object storage (dev)                    │
│  mailpit:1025/8025 ─ dev SMTP capture (visible UI)           │
└──────────────────────────────────────────────────────────────┘

Plum-resident state (NOT in any pg)

┌──────────────────────────────────────────────────────────────┐
│  plum  (macOS-only)                                          │
│                                                              │
│  ~/.local/share/mail-sync/mail-sync.db    ── SQLite send Q   │
│  ~/.local/share/mac-sync/mac-sync.db      ── SQLite ingest Q │
│  ~/.local/share/knowledge-platform/*.db   ── Crystal TUI db  │
│                                                              │
│  (These are local-only queues. Source of truth eventually    │
│   lands in vps-quinn pg via HTTP push.)                      │
└──────────────────────────────────────────────────────────────┘

4. Service distribution by host

plum — macOS-only peers

| Service | Port | Reason it's here |
| --- | --- | --- |
| mail-sync | 4444 | Wraps Proton Bridge SMTP (Mac-only app) |
| mac-sync server | 3201 | Reads iMessage from macOS APIs; ad-hoc bun process. Lifecycle: see ~/Code/@applications/@mac-sync/ |
| @ml/knowledge-platform (incl. Crystal) | varies | Already runs here; GPU work if any |
| @applications/@agents/* | varies | Claude SDK agents (assistant, companion, prospector, voice, etc.) |

apricot — dev box & LAN tooling

| Service | Port | Reason it's here |
| --- | --- | --- |
| All @features/* dev servers | 3020-3039 | Bun + Vite dev mode |
| All @apps/* dev frontends | 5110-5200 | Vite HMR |
| Postgres (dev) | 25433-25436 | Local dev DB |
| TimescaleDB (dev) | 25434 | Analytics dev |
| Redis (dev) | 26379 | Queue dev |
| MinIO (dev) | 9000/9001 | S3 dev |
| Mailpit (dev) | 1025/8025 | SMTP capture |
| ACS (auto-commit-service) | | Serializes git commits (apricot is sole writer) |
| Forgejo | | Moved: lives on forge.black.lan (black), not apricot |

black — LAN tooling, dev DBs, worker host

| Service | Port / Address | Notes |
| --- | --- | --- |
| Forgejo | forge.black.lan:2222 (ssh), :80/:443 (HTTP) | Self-hosted git, single source of truth for repos |
| Verdaccio | npm.black.lan (canonical) | Private npm registry for @lilith/* packages |
| host-nginx (Docker, nginx:alpine, host networking) | 80/443 | Owns all LAN hostname routing (Verdaccio, Forgejo, etc.). Config at /bigdisk/nginx/nginx.conf. |
| System nginx 1.24.0 (Ubuntu) | varies | Only handles next.* staging apps, not the LAN registry routing |
| quinn-ai-auto-respond.service | (systemd) | TS draft-pipeline calling apricot:8210 model-boss; cut over from apricot 2026-05-15 |
| Postgres (dev) | :25435 | Dev tier used by apricot for some flows |
| Postgres (mac-sync) | :25436, DB quinn_macsync (was quinn_icloud, renamed 2026-05-17) | Schema is macsync.*. Plum's mac-sync server is the writer. |
| dnsmasq | :53 | Wildcard DNS for *.black.lan and *.apricot.lan (migrated off .local 2026-05-16) |
| MinIO (cold) | 9000 | Backup target (planned) |

vps-0 — Quinn's public app tier + local cache

All quinn.* deployed domains (apps) + local cache layer (TimescaleDB, Redis, MinIO-hot):

| Domain | Service | Port |
| --- | --- | --- |
| quinn.www | Provider website (transquinnftw.com) | 5120→443 |
| platform.api | API gateway (Hono) | 3030→internal |
| quinn.sso | SSO + device-link | 3025→443 |
| quinn.my | Provider portal | 5174→443 |
| quinn.m | Messenger UI | 5175→443 |
| quinn.ai | AI assistant | 5176→443 |
| quinn.admin | Admin panel | 5121→443 |
| quinn.data | Analytics dashboard | 5111→443 |
| quinn.vip | VIP messaging | 5178→443 |
| quinn.ai-engine | LLM inference worker | (internal) |
| quinn.mail-autoresponder | Auto-respond engine | (internal) |
| quinn.hotel-scout | Tour booking automation | (internal) |
| quinn.price-watcher | Price monitoring | (internal) |
| quinn.m-orchestrator | Background worker | 3803 (health) |
| quinn.my-orchestrator | Background worker | (health) |

| Service | Port | Notes |
| --- | --- | --- |
| TimescaleDB (quinn.analytics.db) | 25434 | Analytics writes hot path |
| Redis (quinn.analytics.redis) | 26379 | Queue, BullMQ, session cache |
| MinIO (hot) | 9000 | Active object storage; replicates to black |
| quinn.www, platform.api, quinn.sso, quinn.my, quinn.m, quinn.ai, quinn.admin, quinn.data, quinn.vip | various | All app processes |
| quinn.mail-autoresponder, quinn.m-sync, quinn.m-api | 3028/3030/3100/3105 | Background workers + APIs |
| pgBouncer | :6432 | Transaction-mode pooler in front of vps-0's prod Postgres (apricot dev for quinn.my tunnels here) |
| Postgres (prod, in current practice) | :5432 (behind pgBouncer) | Production data for quinn.* apps. The V3 design (this doc) wants this on black; not yet migrated. |
| docker-mailserver for transquinnftw.com | 25/465/587/993 | At /opt/quinn-mailserver. (Was once planned on black; reality is vps-0.) |
| cocotte.maison + sansonnet.maison brand sites | 443 (LE) | Live 2026-05-17. Defensive .com aliases (cocottehouse.com, maisonsansonnet.com) handled by the defensive-coms nginx config: 301 → canonical .maison via transquinnftw.com cert SANs. |

NOTE: quinn.ai-engine is not on vps-0 — it runs as quinn-ai-auto-respond.service on black (see Section 4 black table).


5. Network & routing

TLS termination

  • vps-0 → Caddy → quinn.* services. Caddy auto-issues Let's Encrypt certs per subdomain.
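  • black → Caddy → atlilith.com, www.atlilith.com (design target; per Section 11, black is LAN-only today and atlilith.com DNS is not yet pointed). The brand sites cocotte.maison and sansonnet.maison are currently served from vps-0.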
  • apricot → local Caddy → *.apricot.lan for dev. Unified mkcert wildcard cert at infrastructure/certs/_wildcard.apricot.lan.{crt,key} with 5 SAN patterns covers every dev hostname (2026-05-17). Caddy (local_tls) snippet imported by every site block — adding a new dev subdomain that fits an existing SAN pattern needs zero cert work.
  • plum ↔ apricot/black: LAN. mail-sync is called via MAIL_SYNC_BASE_URL=http://plum.lan:4444; mac-sync listens at plum.lan:3201 and writes to apricot/black PG.
  • vps-0 → black: SSH reverse tunnel initiated from black (ssh -R 25435:localhost:25435 ... -R 25433:localhost:25433 vps-0). Apps on vps-0 connect to localhost:25435 and reach black's PG. Tunnel-initiator-from-LAN means vps-0 cannot pivot back into LAN if compromised.
  • black ↔ apricot: LAN; restic backups push from black → apricot mirror.

DNS

  • atlilith.com → black (LAN edge via public IP) for marketing/SSO root
  • quinn.* domains → vps-0 (Hetzner public IP) for Quinn's app instance
  • {provider}.* domains → future per-provider VPS (Phase 9+ when onboarding a 2nd provider)
  • *.apricot.lan / *.black.lan → dnsmasq on black (wildcard, migrated from .local on 2026-05-16). Plum is reached as plum.lan via mDNS / direct A.

6. Per-tenant data isolation strategy

V3 must handle multiple providers + multiple orgs without cross-tenant leakage. Two options:

Option A: row-level tenancy in one shared DB (V3 default)

  • One platform.db shared by all tenants
  • Every queryable row has user_id (Person owner) or org_id (Org owner)
  • API layer enforces WHERE user_id = $session.user_id OR org_id IN (SELECT org_id FROM org_members WHERE user_id = $session.user_id)
  • Postgres RLS (row-level security) policies as defense-in-depth

Option B — DB-per-tenant (defer, only if scale demands)

  • Separate Postgres DB per Org (or per Person at large scale)
  • Better blast radius isolation, harder cross-tenant analytics
  • Not needed until ~100+ providers

V3 ships with Option A. Migration to Option B (if ever) is a future Phase.
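
To make the Option A filter concrete, here is a minimal sketch that runs the same WHERE clause against an in-memory SQLite database. SQLite stands in for Postgres only so the example is self-contained; the column layout and the sample rows are assumptions, and Postgres RLS (the defense-in-depth layer) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org_members (org_id INTEGER, user_id INTEGER);
    CREATE TABLE bookings (id INTEGER PRIMARY KEY, user_id INTEGER, org_id INTEGER);
    INSERT INTO org_members VALUES (10, 1);      -- user 1 belongs to org 10
    INSERT INTO bookings VALUES (100, 1, NULL);  -- owned by user 1
    INSERT INTO bookings VALUES (101, NULL, 10); -- owned by org 10
    INSERT INTO bookings VALUES (102, 2, NULL);  -- owned by user 2
    INSERT INTO bookings VALUES (103, NULL, 20); -- owned by org 20
""")

# Same shape as the API-layer filter: rows owned by the session user,
# or by any org the session user is a member of.
TENANT_SCOPE = """
    SELECT id FROM bookings
    WHERE user_id = :uid
       OR org_id IN (SELECT org_id FROM org_members WHERE user_id = :uid)
    ORDER BY id
"""


def visible_booking_ids(session_user_id: int) -> list[int]:
    rows = conn.execute(TENANT_SCOPE, {"uid": session_user_id}).fetchall()
    return [row[0] for row in rows]


print(visible_booking_ids(1))  # [100, 101]
print(visible_booking_ids(2))  # [102]
```

Binding the session user as a query parameter, rather than interpolating it into the SQL string, keeps the tenancy filter from becoming an injection point.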


7. Onboarding a new provider (future, Phase 9+)

When merche biche (or any new provider) onboards:

  1. Person record created in platform.db (no Org needed)
  2. DNS: new {provider}.com (their public site) → vps-0 (or new VPS if traffic justifies)
  3. App deployment: deployments/@domains/{provider}.* config files generated from templates
  4. No DB migration: row-level tenancy handles the new rows naturally
  5. Optional Org: if provider is an agency (like Cocotte) or wants org-level tooling, they create an Org and become its owner

No code changes per onboarding. Templates + DNS only.
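
One way step 3 could look is sketched below: a per-provider services.yaml header rendered from a fixed template. The `deployment:` keys (id, name, domain) follow the schema listed in Section 12.2; the template text, the function name, and the example provider values are hypothetical.

```python
from string import Template

# Hypothetical template: only the deployment header is shown.
SERVICES_TEMPLATE = Template(
    "deployment:\n"
    "  id: $provider_id\n"
    "  name: $display_name\n"
    "  domain: $domain\n"
)


def render_services_yaml(provider_id: str, display_name: str, domain: str) -> str:
    """Fill the template; substitute() raises KeyError if a field is missing."""
    return SERVICES_TEMPLATE.substitute(
        provider_id=provider_id,
        display_name=display_name,
        domain=domain,
    )


rendered = render_services_yaml("example-provider", "Example Provider", "provider.example")
print(rendered)
```

The rendered file would still need to pass `./run manifest validate` before being committed.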


8. Failure & backup

| Component | Backup strategy | RPO | RTO |
| --- | --- | --- | --- |
| platform.db (black pg :25435) | Nightly logical dumps → restic on apricot; WAL archive → minio | 1 hour | 1 hour |
| messenger.db (black pg :25433) | Same as above | 1 hour | 1 hour |
| analytics.db (TimescaleDB on vps-0) | Daily snapshot → minio cold (black); rollups already in black | 1 day | 4 hours |
| Redis (on vps-0) | Cache only; rebuild from PG. No backup needed. | N/A | minutes |
| mail-sync.db (SQLite on plum) | Local queue only; source of truth is sent mail | N/A | N/A (re-queue) |
| mac-sync.db (SQLite on plum) | Same; iMessage is source of truth on macOS | N/A | N/A |
| MinIO objects | Replicated vps-0 (hot) → black (cold) | continuous | 1 hour |
| Forgejo (code) | Daily push to GitHub mirror | 1 day | 1 hour |
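
The RPO column can be checked mechanically. The sketch below flags any component whose newest backup is older than its RPO; the RPO values come from the table above, while the function name and the sample timestamps are purely illustrative.

```python
from datetime import datetime, timedelta

# RPO targets taken from the table above (components without a backup are omitted).
RPO = {
    "platform.db": timedelta(hours=1),
    "messenger.db": timedelta(hours=1),
    "analytics.db": timedelta(days=1),
    "forgejo": timedelta(days=1),
}


def rpo_violations(last_backup: dict[str, datetime], now: datetime) -> list[str]:
    """Return components with no recorded backup or one older than their RPO."""
    late = []
    for name, rpo in RPO.items():
        taken = last_backup.get(name)
        if taken is None or now - taken > rpo:
            late.append(name)
    return late


# Illustrative timestamps, not real backup data.
now = datetime(2026, 6, 10, 12, 0)
last_backup = {
    "platform.db": now - timedelta(minutes=20),
    "messenger.db": now - timedelta(hours=3),
    "analytics.db": now - timedelta(hours=30),
    "forgejo": now - timedelta(hours=2),
}
print(rpo_violations(last_backup, now))  # ['messenger.db', 'analytics.db']
```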

Catastrophic host loss

  • vps-0 gone → public web UIs + cache offline (transquinnftw.com, cocotte/sansonnet, ATT, all quinn.* UIs go dark). Provision new VPS, restore TLS certs from LE, redeploy from forge.black.lan, re-warm the cache from platform.api. Data is safe on black. Tunnel from black needs to reconnect to the new vps-0 IP. ~2-4 hour RTO.
  • black gone → biggest hit. platform.api offline — vps-0 UIs can still serve cached public info but every authenticated request fails. Registry/Forgejo offline (blocks bun install for @lilith/* and any redeploy — deploys MUST ship bundled artifacts per feedback_no_verdaccio_on_vps.md); ai-engine auto-reply stops; dev tier down. Restore PG from latest backup of /bigdisk/, bring services back up in dependency order: postgres → platform.api → workers → registry. ~4-8 hour RTO.
  • Both gone → restore from restic on apricot; bring up replacement hosts. ~24 hour RTO.
  • plum gone → no outbound mail (mail-sync), no new iMessage sync. Replace Mac, restore from Time Machine. Receive-side keeps working via SMTP inbound on black. ~hours to days depending on Mac availability.
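
The "dependency order" for restoring black can be derived rather than memorized. The sketch below feeds a dependency map to the standard-library topological sorter. Modelling each stage as waiting on the previous one is an assumption made to reproduce the postgres → platform.api → workers → registry sequence; the real dependency graph may be looser.

```python
from graphlib import TopologicalSorter

# service -> set of services that must be up first (assumed linear chain)
DEPENDS_ON = {
    "postgres": set(),
    "platform.api": {"postgres"},
    "workers": {"platform.api"},
    "registry": {"workers"},
}


def restart_order(graph: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every service follows its dependencies.

    Raises graphlib.CycleError if the map contains a dependency cycle.
    """
    return list(TopologicalSorter(graph).static_order())


print(restart_order(DEPENDS_ON))
# ['postgres', 'platform.api', 'workers', 'registry']
```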

9. Open infra questions

  1. Cutover sequencing (end-state, not Phase 5). V2 (vps-0-hosted quinn-*-api + local Postgres) and V3 (black-hosted platform.api) run side by side per design. When V3 hits parity, retire V2 surfaces. Open: which V3 surface lands FIRST — a brand-new feature (lowest risk, no parity question), or a parallel port of an existing V2 surface that proves the cutover mechanic? Decide before Phase 6.
  2. black as edge for atlilith.com: continue (works today), or move public marketing to vps-0 too (one less public edge to manage, and less public traffic on the LAN router)?
  3. Per-provider VPSes: when onboarding merche biche or another provider, do they share vps-0 or get their own VPS? Cost vs blast-radius tradeoff.
  4. plum as single point of failure: if plum is offline, no outbound mail (mail-sync), no new iMessage sync (mac-sync). Worth investing in HA macOS hosting (cumbersome) or accepting the dependency?
  5. GPU work: knowledge-platform / agents may want GPU. apricot has consumer GPU; black doesn't; vps-0 doesn't. Where does GPU-heavy work run — buy a GPU-VPS, push to apricot via queue, or use external (Anthropic API)?
  6. Tailscale vs WireGuard vs SSH-tunnel: the current setup uses an SSH -R reverse tunnel + LAN. Standardize on Tailscale mesh for any-host-to-any-host private routing?
  7. PG read replicas on vps-0: instead of every read traversing the SSH tunnel, run a streaming-replica PG on vps-0 for read-heavy queries? Trade-off: more state on vps-0 vs faster reads.

10. Sources & verification

  • v2 manifest: ~/Code/@projects/@lilith/lilith-platform.live/infrastructure/app.manifest.yaml
  • v2 ports registry: ~/Code/@projects/@lilith/lilith-platform.live/infrastructure/ports.yaml
  • Host roles per CLAUDE.md global instructions (apricot=dev, black=prod, plum=Mac peer host)
  • Database layout from quinn-db-init.sql, pg-services.yml, compose.quinn-db.yml

11. Correction log against observed lilith-platform.live state (2026-05-17)

This doc is the V3 design target. The corrections folded into Sections 1-9 above reflect ways the original draft contradicted current operating reality. Summary:

  • Forgejo + Verdaccio live on black, not apricot. Both route through a host-nginx Docker container on black (alongside the system nginx 1.24.0). See .live-side memory reference_black_infra_design.md.
  • quinn-ai-auto-respond.service runs on black, not vps-0 — cut over 2026-05-15. Uses TS draft-pipeline-ts/ calling model-boss at apricot.lan:8210.
  • mac-sync server port is 3201, not 3100. DB renamed quinn_icloud → quinn_macsync on 2026-05-17 (schema macsync.*).
  • V3 role for vps-0 = production web UIs + a cache for the public-info subset of platform.api. It is NOT the V3 authoritative data host — authenticated reads/writes hit platform.api on black. V2 and V3 run side by side: V2's quinn-*-api systemd units + local Postgres on :5435 keep serving Quinn's existing traffic indefinitely; V3 adds its parallel stack alongside without disturbing V2. Decommissioning V2 is end-state (DESIGN.md §11 Success Criteria #6), not a Phase 5 task.
  • docker-mailserver for transquinnftw.com is on vps-0 at /opt/quinn-mailserver, not black.
  • black is LAN-only. No public IP, reached via WireGuard mesh + the black SSH alias (don't use black.lan — only the configured alias has key auth). atlilith.com hosting is aspirational; DNS not yet pointed.
  • Cocotte + Sansonnet are live on vps-0 with LE certs (2026-05-17). Canonical .maison serves content; defensive .com aliases 301-redirect via defensive-coms nginx using transquinnftw.com cert SANs. Brand registry source: deployments/@domains/quinn.www/scripts/agency-brands.conf in .live.
  • Dev TLS unified: one mkcert wildcard with 5 SAN patterns covers all *.apricot.lan dev hosts via a Caddy (local_tls) snippet. Refresh script at infrastructure/scripts/dev-cert-refresh.sh (in .live).
  • DNS migrated .local → .lan on 2026-05-16. All host references (npm.black.lan, forge.black.lan, m.quinn.apricot.lan, etc.) use .lan. Stale .local references in ~/.npmrc were the actual cause of yesterday's bun install failures, not Verdaccio itself.
  • Deploys to VPS must ship bundled artifacts. No Verdaccio on VPS, no remote npm install at deploy time (feedback_no_verdaccio_on_vps.md). Resolve dependencies on apricot, rsync the resulting node_modules.
  • Never broadcast-terminate the runtime (p+kill against node/bun) on any host running Claude Code (apricot, plum). It kills the agent. Use manage-apps stop or kill <PID> against a specific process.

When V3 build-out begins, decide whether to enforce the original design (prod on black via tunnel) or codify current practice (prod on vps-0, black as tooling). The two diverge most sharply in Section 3 and Section 8.


12. Manifest tooling — @lilith/service-registry driven

12.1 Why filesystem-visible manifests at all

V3 keeps service definitions on disk in @platform/deployments/@domains/<host>/services.yaml (per-deployment) and @platform/deployments/shared-services/*.yaml (shared infra), referencing ports from @platform/infrastructure/ports.yaml. Reason: legibility for AI agents. An LLM exploring the repo sees deployments/@domains/sso.atlilith.com/services.yaml and immediately understands the topology — a DB-only design is opaque until queried.

12.2 Schema — owned by @lilith/service-registry v1.4.0

The deployment YAMLs conform to the schema defined in @lilith/service-registry (package source: ~/Code/@packages/@ts/@service/service-registry/):

  • deployment: — id, name, feature, domain, description
  • orchestration: — dependencies, entryPoints, lifecycle (keepAlive, autostart)
  • services: — list with { id, type, port, source, repo, entrypoint, env, healthCheck, dependencies, devDependencies, devSkip } and source: external for cross-repo refs
  • routing: — path-based rules (/api/ → bff (proxy), / → frontend)
  • deployments: — per-env { dev: {host, domain, proxy, config, start, stop, status}, production: {...} }

The master ports.yaml conforms to the package's PortsConfig interface (infrastructure: / platform: / features: / services: / ml: / apps: top-level), with each feature → { api, postgresql, redis, frontend, ... } map.

12.3 Validation

./run manifest validate invokes @platform/scripts/validate-manifest.ts, a 35-line wrapper around buildDeploymentRegistry({ strict: true }) from @lilith/service-registry. Strict mode catches:

  • Port collisions across all deployments
  • Missing dependency references
  • Schema conformance issues

Always run after any change to deployment YAMLs or ports.yaml. The validator replaces the hand-rolled manifest.ts regex parser from Phase 5's first pass.
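
To show what the first two strict-mode checks mean in practice, here is a small Python sketch over an already-parsed set of deployments. It is not the @lilith/service-registry implementation: the function names, the dict shape, and the sample data are assumptions, and it treats ports as one global namespace.

```python
from collections import defaultdict


def find_port_collisions(deployments: dict[str, list[dict]]) -> dict[int, list[str]]:
    """Map each port claimed by more than one service to its claimants."""
    owners = defaultdict(list)
    for dep_id, services in deployments.items():
        for svc in services:
            if svc.get("port") is not None:
                owners[svc["port"]].append(f"{dep_id}/{svc['id']}")
    return {port: names for port, names in owners.items() if len(names) > 1}


def find_missing_dependencies(deployments: dict[str, list[dict]]) -> list[tuple[str, str]]:
    """List (service, dependency) pairs where the dependency id is not defined anywhere."""
    known = {svc["id"] for services in deployments.values() for svc in services}
    missing = []
    for dep_id, services in deployments.items():
        for svc in services:
            for dep in svc.get("dependencies", []):
                if dep not in known:
                    missing.append((f"{dep_id}/{svc['id']}", dep))
    return missing


# Hypothetical sample: my-bff reuses sso-api's port, and platform-db is never defined.
sample = {
    "quinn.sso": [{"id": "sso-api", "port": 3025, "dependencies": ["platform-db"]}],
    "quinn.my": [
        {"id": "my-frontend", "port": 5174, "dependencies": ["sso-api"]},
        {"id": "my-bff", "port": 3025},
    ],
}
print(find_port_collisions(sample))       # {3025: ['quinn.sso/sso-api', 'quinn.my/my-bff']}
print(find_missing_dependencies(sample))  # [('quinn.sso/sso-api', 'platform-db')]
```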

12.4 OS-level enforcement (deferred)

Direct edits to deployment YAMLs are still possible; convention is the only enforcement. The defense-in-depth target:

  1. Create unix user atlilith-manifest (uid ~1100) on apricot
  2. chown -R atlilith-manifest:lilith the deployment YAMLs + ports.yaml; mode 644/755
  3. NOPASSWD sudoers entry scoped to a tool wrapper that calls @lilith/service-registry write APIs
  4. Add Claude harness hook in ~/.claude/hooks/ that refuses Edit/Write/Bash targeting deployments/@domains/*/services.yaml, deployments/shared-services/*.yaml, or infrastructure/ports.yaml with a hint to use the tool

This is not in place yet. Implement when the deployment count grows enough that drift risk justifies the setup cost.
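
The path check at the core of step 4 could be as small as the sketch below. The three glob patterns are the ones listed above; the function name is hypothetical, and how a harness hook receives the target path and reports a refusal is left out because it is not specified here.

```python
from fnmatch import fnmatchcase

# Paths that should only change through the manifest tool (patterns from step 4).
PROTECTED_PATTERNS = (
    "deployments/@domains/*/services.yaml",
    "deployments/shared-services/*.yaml",
    "infrastructure/ports.yaml",
)


def is_protected(repo_relative_path: str) -> bool:
    """True if the path matches a protected manifest pattern.

    fnmatch's '*' also matches '/', so nested paths are caught as well,
    which errs on the side of refusing.
    """
    return any(fnmatchcase(repo_relative_path, pattern) for pattern in PROTECTED_PATTERNS)


print(is_protected("deployments/@domains/sso.atlilith.com/services.yaml"))  # True
print(is_protected("infrastructure/ports.yaml"))                            # True
print(is_protected("README.md"))                                            # False
```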

12.5 Why not pure DB

Considered and rejected: store services + ports in the platform DB only. Reasons against:

  • LLM agents can't see DB state without a tool call; FS layout is read-on-sight
  • Bootstrap problem: the DB has to exist + be running before manifests can be read, but manifests are needed to START the DB
  • Diff/review of manifest changes happens in git, not in DB migrations
  • Disaster recovery: filesystem manifests are restorable from git; DB tables need separate backup paths

Filesystem wins on every dimension that matters for agent-driven development.

12.6 What replaced what (history)

| First-pass Phase 5 artifact | Replaced by |
| --- | --- |
| users/transquinnftw/app.manifest.yaml | per-deployment YAMLs in deployments/@domains/ |
| @platform/infrastructure/.env.ports (shell-exportable mirror) | @lilith/service-addresses.getServicePort() at runtime |
| @platform/scripts/manifest.ts (hand-rolled regex validator) | @platform/scripts/validate-manifest.ts (calls @lilith/service-registry) |
| ports.yaml with gateways:/apis:/frontends: keys | ports.yaml with infrastructure:/platform:/features:/services:/ml:/apps: (matches PortsConfig interface) |