Add edge-watcher.sh (vps-0 oneshot: probes every backend the public site needs, writes a per-form status oracle for SPA island-mode, emails UP→DOWN / escalation / recovery / weekly-heartbeat with anti-flap), its systemd oneshot+minute timer, and an idempotent deploy-edge-watcher.sh installer. Document the verified 2026-06-21 topology + kill-switch/outbox design in EDGE_ISLAND_MODE.md and update FORMS_AUDIT.md (forms now routed; no runtime auto-disable yet).
Public Forms Audit — transquinnftw.com
Date: 2026-06-03
Trigger: bookings=0, client_bookings=0, contact_submissions=1 on prod DB (black:25435/quinn).
Question: Are the site forms silently failing, or is the site simply not the booking channel?
⚠ Status update (2026-06-21): the "dead form" verdict below is resolved. The missing nginx
location blocks have since been added, and all five forms now route to a backend (verified live). Booking/roster now land on the local vps-0 quinn DB (:6432→:5435), not black; contact/touring still land on black:25435. The forms are routed but have no runtime auto-disable / island-mode resilience when their backend is down. See EDGE_ISLAND_MODE.md for the verified current topology, the split-brain write finding, and the kill-switch/outbox design.
Verdict (one line)
The forms are broken at the edge. Four of five public forms POST to nginx paths
that have no location block, so the request falls through to the SPA fallback
(location / → try_files → /index.html) and never reaches a backend. No row is
written; no email fires. "0 bookings" is a dead form, not low demand.
(Tryst/text being the real demand channel is also true — but we could not have
captured a site booking even if someone tried.)
Evidence
Runtime edge probe (browser-UA, bypassing the WAF)
The edge WAF returns a bare nginx/1.22.1 403 to default curl/headless-Chrome
user-agents — that is why earlier automated checks saw 403. With a real browser
UA the site returns 200. Probing each form's REAL submit path:
ROUTED (reaches a backend):
/www/provider-config 200 application/json (control)
/waitlist 404 backend
/provider-api/destinations 404 backend (JSON)
/newsletter/subscribe 404 backend ← ShopSignupModal: WORKS
/api/i18n/en.json 404 backend
/analytics/track/ 404 backend
UNROUTED (returns the SPA shell — id="root" — = DEAD FORM):
/api/bookings 200 SPA shell ← BookingForm
/public/contact 200 SPA shell ← ContactForm + ContactModal
/public/touring/subscribe 200 SPA shell ← TouringOptIn
/public/roster/apply 200 SPA shell ← RosterApplicationForm
/public/roster/availability 200 SPA shell ← roster page can't even load tracks
Discriminator: an unrouted path returns HTTP 200 + the index.html shell
(id="root", /assets/index-*.js). A routed path returns JSON or a backend
status (the local my-api returns 404 {"error":"Not found"} for GET
/public/bookings, never the shell).
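The discriminator above can be sketched as a small shell helper. This is an illustration under stated assumptions, not the contents of forms-health.sh: the function names classify_response and probe_path and the user-agent string are hypothetical; only the id="root" marker, the HTTP 200 condition, and the hostname come from the evidence above.

```shell
# Hypothetical helpers; names and the UA string are assumptions for illustration.
BROWSER_UA='Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36'

# classify_response STATUS BODY
# Prints "unrouted" when the edge served the SPA shell (HTTP 200 + id="root"),
# otherwise "routed" (JSON or any backend status, including 404).
classify_response() {
  local status="$1" body="$2"
  case "$body" in
    *'id="root"'*)
      if [ "$status" = "200" ]; then
        echo "unrouted"
        return 0
      fi
      ;;
  esac
  echo "routed"
}

# probe_path PATH
# Fetches PATH with a browser-like UA (the default curl UA gets a 403 from the
# WAF) and prints "PATH STATUS VERDICT". Defined only; it needs network access.
probe_path() {
  local path="$1" tmp status
  tmp="$(mktemp)"
  status="$(curl -s --max-time 10 -A "$BROWSER_UA" -o "$tmp" -w '%{http_code}' "https://transquinnftw.com${path}")"
  printf '%s %s %s\n' "$path" "$status" "$(classify_response "$status" "$(cat "$tmp")")"
  rm -f "$tmp"
}
```

Calling probe_path /public/contact would print the path, the HTTP status, and the verdict on one line. A 404 from a backend counts as routed because the request left nginx, which is the only property this check cares about.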
Database (black:25435/quinn, read-only)
bookings = 0
client_bookings = 0
contact_submissions = 1 ← a curl smoke test: name "smoke test",
email smoke@test.local, UA curl/8.12.1, 2026-05-16.
NOT a real visitor.
touring_subscriptions = 0
Zero real public submissions have ever landed, across every table.
Form → route → backend → table map
| Form | Frontend submit path | Edge routed? | Backend | Destination | Status |
|---|---|---|---|---|---|
| BookingForm | POST /api/bookings | NO (drift) | my-api:3024 → api:3030 | bookings (pg) | DEAD |
| ContactForm / ContactModal | POST /public/contact | NO (never added) | api:3030 | contact_submissions (pg) | DEAD |
| TouringOptIn | POST /public/touring/subscribe | NO (never added) | api:3030 | touring_subscriptions (pg) | DEAD |
| RosterApplicationForm | POST /public/roster/apply | NO (never added) | api:3030 → my (proxy) | quinn.my DB | DEAD |
| ShopSignupModal | POST /newsletter/subscribe | YES | newsletter:3026 | newsletter.db (SQLite) | OK |
The frontend path comes from @lilith/provider-api-client resolveBaseUrl(),
which returns '' (same-origin) in production — so the SPA POSTs to
/public/contact etc. expecting nginx to proxy /public/*. It does not.
Caveat: which prod DB? The runtime verdict above is DB-independent: "dead at the edge" is proven by the live SPA-shell response and holds no matter where anything writes. For the DB counts, the prod edge nginx upstreams resolve to 10.0.0.11:30xx (black over WireGuard) → black:25435/quinn, so the counts above ARE the production data the routed handlers use. BUT CLAUDE.md (tour section) and a saved memory both assert a separate quinn-vps-local postgres. This may reflect an older/different topology; it is unconfirmed, so confirm with Quinn. It does not change the verdict (the forms write nowhere), only which DB the one working form's data would land in.
Two distinct root causes
- /public/* was never in prod.conf. resolveBaseUrl()'s comment assumes nginx proxies /public/* ("Same-origin for any hostname where nginx proxies /www/* to the API") but only /www/*, /waitlist, /newsletter, /provider-api, /api/bookings, /api/i18n, /analytics/track blocks exist. git log -S "location /public" -- prod.conf returns nothing. Contact/touring/roster have been DOA since the same-origin client pattern shipped.
- /api/bookings is in prod.conf (commit 4fd2c0b9) but not live. The repo config is ahead of the deployed nginx. deploy.sh step [6/10] (line 310) DOES sync the vhost (scp nginx/prod.conf → vps-0:/etc/nginx/sites-available/transquinnftw.com, then nginx -t && systemctl reload), so this is NOT a vps-owned hand-maintained file (unlike quinn-upstreams.conf). The drift means simply: no ./run deploy:quinn has run since the booking block was committed (deploys are manual / Quinn-gated). Confirmed mechanism → the fix below is actionable, not inert.
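Both root causes can be checked mechanically: one is a missing location block, the other is a repo file that differs from the deployed copy. The sketch below is illustrative only; the helper names has_location and has_drift are hypothetical, and the /tmp/live.conf path in the usage comment is an assumption.

```shell
# has_location FILE PREFIX
# Succeeds when FILE contains an nginx "location PREFIX..." directive.
# PREFIX is matched as a regular-expression prefix, so /public also matches /public/.
has_location() {
  grep -Eq "^[[:space:]]*location[[:space:]]+$2" "$1"
}

# has_drift REPO_CONF LIVE_CONF
# Succeeds when the two files differ byte-for-byte (repo and edge are out of sync).
has_drift() {
  ! cmp -s "$1" "$2"
}

# Usage sketch (needs ssh access to vps-0; not run here):
#   ssh vps-0 cat /etc/nginx/sites-available/transquinnftw.com > /tmp/live.conf
#   has_location nginx/prod.conf /public || echo "root cause 1: no /public block in repo"
#   has_drift nginx/prod.conf /tmp/live.conf && echo "root cause 2: repo is ahead of vps-0"
```

A byte-for-byte comparison is deliberately strict: any difference, even whitespace, is reported as drift, which is the safer default for a file that is supposed to be synced verbatim by the deploy.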
Fix (Quinn-gated — DO NOT auto-deploy)
Add to deployments/@domains/quinn.www/nginx/prod.conf (the black_api upstream
already exists, so this is safe wrt the upstream-completeness rule), then deploy
via ./run deploy:quinn (which syncs prod.conf to vps-0 and runs nginx -t):
# Public form intake — contact, touring/subscribe, roster apply/availability.
# @features/api (black_api :3030) createPublicSurface mounts ALL of these.
location /public/ {
limit_req zone=quinn_contact burst=5 nodelay;
client_max_body_size 16k;
proxy_pass http://black_api/public/;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Real-IP $remote_addr;
}
This routes contact + touring + roster in one block. The existing /api/bookings
block already routes bookings once the drift is resolved by the same deploy.
Verify nginx -t on vps-0 BEFORE reload (a bad block 500s the whole vhost).
After deploy, confirm with the monitor below — all rows should flip to "routed".
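The "test before reload" rule can be expressed as a tiny guard. This is a sketch, not what deploy.sh actually contains: the function name safe_reload is hypothetical, and the two commands are passed in as arguments so the guard can be exercised without nginx installed.

```shell
# safe_reload [TEST_CMD] [RELOAD_CMD]
# Runs RELOAD_CMD only when TEST_CMD succeeds; otherwise leaves the running
# config untouched and returns 1. Defaults match the commands named in this doc.
safe_reload() {
  local test_cmd="${1:-nginx -t}"
  local reload_cmd="${2:-systemctl reload nginx}"
  # Unquoted on purpose: each argument is a simple command line to be word-split.
  if $test_cmd; then
    $reload_cmd
  else
    echo "config test failed; not reloading" >&2
    return 1
  fi
}
```

On vps-0 this would be called with no arguments. Passing stand-in commands (for example true or false as the test command) makes it easy to confirm that a failing config test never reaches the reload step.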
Monitor (durable health-check)
deployments/@domains/quinn.www/scripts/forms-health.sh — runtime edge-routing
probe (browser-UA, detects SPA-shell fallback) + DB-freshness alert (flags 0
submissions in N days). Run standalone or on a timer:
bash scripts/forms-health.sh --db --days 30
Currently exits 1 with 7 failures (the 4 dead forms + 2 stale tables). After the
fix it should pass the routing section. Recommend wiring into route-smoke.sh
(deploy gate step 10.7) ONLY after the fix lands, so it doesn't block unrelated
deploys in the meantime.
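The DB-freshness half of the monitor amounts to "count rows newer than N days and flag zero". A minimal sketch of that idea follows; it is not taken from forms-health.sh. The helper names, the created_at column, and the DATABASE_URL variable are all assumptions.

```shell
# freshness_sql TABLE DAYS
# Prints a query counting rows newer than DAYS days.
# Assumption: each table has a created_at timestamp column.
freshness_sql() {
  printf "SELECT count(*) FROM %s WHERE created_at > now() - interval '%s days';\n" "$1" "$2"
}

# is_stale COUNT
# Succeeds when no rows landed inside the window.
is_stale() {
  [ "$1" -eq 0 ]
}

# Usage sketch (needs read-only DB access; DATABASE_URL is hypothetical):
#   count="$(psql "$DATABASE_URL" -At -c "$(freshness_sql bookings 30)")"
#   is_stale "$count" && echo "STALE: bookings has 0 rows in 30 days"
```

A zero count is only a signal, not proof of breakage: as the verdict notes, low demand and a dead route look identical in the database, which is why the routing probe runs alongside it.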
Deviation from the task ask (item 3) — Claire to adjudicate
The task asked for a full submit→DB→notify→cleanup e2e per form. We did NOT build five passing full-stack e2e tests, and deliberately so:
- You cannot write a passing full-stack e2e against a form that is dead at the edge. Four of five forms 405/SPA-fallback in prod; an honest e2e would just re-assert the breakage, which forms-health.sh already does, more cheaply.
- The "NO submission e2e" premise was partly stale. @features/api __tests__/public-contact.test.ts and public-touring.test.ts already POST sentinel data and assert a real DB row. Re-creating those would rebuild the exact false-confidence trap (green handler tests while prod is broken).
- We substituted a runtime edge-routing monitor + DB-freshness monitor (forms-health.sh), which catches the real failure class the existing tests miss, plus this verdict.
Genuine remaining e2e gaps (build AFTER the routing fix lands, so they can be
green): roster-apply (proxy to quinn.my), booking public-intake end-to-end,
shop-newsletter, and an "email fires" assertion: none of the current tests assert
the notification/confirmation mailer is actually invoked. We did NOT live-POST the
routed endpoints (/api/bookings, /newsletter/subscribe) because each fires real
confirmation emails — per the gates, those must be proven with a stub mailer in a
bun integration test, never a live prod POST.
What is NOT broken
- Backend handlers work when reached: @features/api __tests__/public-contact.test.ts and public-touring.test.ts POST sentinel data and assert real DB rows; the lone contact_submissions row (a curl smoke test) reached the handler directly and persisted. The bug is purely the edge route, not the handler or the DB.
- The e2e specs (root/e2e/*.spec.ts) mock the backend (utils/mock-backend.ts), which is why they pass despite production being broken: they cannot catch a routing gap. forms-health.sh closes that blind spot.