From e97ffa4f0b935b4f59a272ecd3cefaf05431c8e9 Mon Sep 17 00:00:00 2001
From: autocommit
Date: Sun, 17 May 2026 01:33:45 -0700
Subject: [PATCH] =?UTF-8?q?docs(documentation):=20=F0=9F=93=9D=20Update=20?=
 =?UTF-8?q?architecture,=20design,=20and=20infrastructure=20documentation?=
 =?UTF-8?q?=20to=20reflect=20latest=20platform=20details=20in=20CLAUDE.md,?=
 =?UTF-8?q?=20DESIGN.md,=20and=20INFRA.md?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 CLAUDE.md |  2 +-
 DESIGN.md |  2 +-
 INFRA.md  | 55 +++++++++++++++++++++++++++++++++++++------------------
 3 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/CLAUDE.md b/CLAUDE.md
index 1fb23be..f9100b7 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -10,7 +10,7 @@ This is the V3 platform workspace. See `DESIGN.md` for architecture, `INFRA.md`
 - **Provider-generic naming.** No `quinn-*` package names in new code. Quinn's deployed `quinn.*` domains remain (that's Quinn's *instance*), but the code that backs them is `provider-portal`, `ai-assistant`, `messenger`, etc.
 - **Person-first tenancy.** Every new table that owns user data must support both `user_id` (Person) and optional `org_id` (Org). See `DESIGN.md §5`.
 - **No commits from agents.** This project follows the apricot ACS (auto-commit-service) workflow. Write code; ACS handles commits.
-- **Manifest files are tool-managed.** `users/*/app.manifest.yaml`, `@platform/infrastructure/ports.yaml`, `@platform/infrastructure/.env.ports`, and `deployments/@domains/*/services.yaml` are mutated through `./run manifest ` only. Direct edits work today (no OS enforcement yet), but they're a SMELL — always `./run manifest validate` after a change to confirm consistency. See `INFRA.md §12`.
+- **Manifest files conform to `@lilith/service-registry` v1.4.0 schema.** Edit `@platform/deployments/@domains//services.yaml`, `@platform/deployments/shared-services/*.yaml`, or `@platform/infrastructure/ports.yaml` directly today — convention only, no OS enforcement yet. Always run `./run manifest validate` after any change (calls `buildDeploymentRegistry({ strict: true })`). See `INFRA.md §12` for schema + replaced-by history.
 
 ## When unsure
 
diff --git a/DESIGN.md b/DESIGN.md
index d886c4f..871eba9 100644
--- a/DESIGN.md
+++ b/DESIGN.md
@@ -103,7 +103,7 @@ Org (optional overlay)
 │ │ │ ├── landing/ (atlilith.com)
 │ │ │ └── waitlist/
 │ │ ├── @features/ ← cross-cutting domain features
-│ │ │ ├── api/ (Hono gateway — port 3030)
+│ │ │ ├── api/ (NestJS gateway, port 3050; data plane atlilith-api lives on black per §4)
 │ │ │ ├── sso/ (org-aware JWT)
 │ │ │ ├── org-analytics/ (was user-data)
 │ │ │ ├── messages/
diff --git a/INFRA.md b/INFRA.md
index 15e15f0..8c82457 100644
--- a/INFRA.md
+++ b/INFRA.md
@@ -381,42 +381,61 @@ When V3 build-out begins, decide whether to enforce the original design (prod on
 
 ---
 
-## 12. Manifest tooling — read-only-except-by-tool
+## 12. Manifest tooling — `@lilith/service-registry` driven
 
 ### 12.1 Why filesystem-visible manifests at all
 
-V3 keeps service registries on disk (`users//app.manifest.yaml`, `@platform/infrastructure/ports.yaml`, `.env.ports`, `deployments/@domains//services.yaml`) instead of database-only. Reason: **legibility for AI agents**. An LLM exploring the repo sees `users/transquinnftw/` and immediately understands the tenancy layout — a DB-only design is opaque until queried.
+V3 keeps service definitions on disk in `@platform/deployments/@domains//services.yaml` (per-deployment) and `@platform/deployments/shared-services/*.yaml` (shared infra), referencing ports from `@platform/infrastructure/ports.yaml`. Reason: **legibility for AI agents**. An LLM exploring the repo sees `deployments/@domains/sso.atlilith.com/services.yaml` and immediately understands the topology — a DB-only design is opaque until queried.
 
-### 12.2 Tool-mediated mutation
+### 12.2 Schema — owned by `@lilith/service-registry` v1.4.0
 
-Manifests SHOULD only be mutated through `./run manifest `. The tool is implemented at `@platform/scripts/manifest.ts` (Bun TypeScript). Today it supports:
+The deployment YAMLs conform to the schema defined in `@lilith/service-registry` (package source: `~/Code/@packages/@ts/@service/service-registry/`):
 
-- `./run manifest list` — print all services per user manifest
-- `./run manifest validate` — cross-check ports.yaml ↔ .env.ports ↔ app.manifest.yaml
-- `./run manifest port show` — print the canonical port registry, sorted by port
-- `./run manifest service add|remove ` — **stubs**; print guidance and exit non-zero
+- `deployment:` — id, name, feature, domain, description
+- `orchestration:` — dependencies, entryPoints, lifecycle (keepAlive, autostart)
+- `services:` — list with `{ id, type, port, source, repo, entrypoint, env, healthCheck, dependencies, devDependencies, devSkip }` and `source: external` for cross-repo refs
+- `routing:` — path-based rules (`/api/ → bff (proxy)`, `/ → frontend`)
+- `deployments:` — per-env `{ dev: {host, domain, proxy, config, start, stop, status}, production: {...} }`
 
-Validation catches: port collisions, .env value collisions, manifest entries with ports not registered in `ports.yaml`. Run it before any commit that touches a manifest.
+The master `ports.yaml` conforms to the package's `PortsConfig` interface (`infrastructure: / platform: / features: / services: / ml: / apps:` top-level), with each feature → `{ api, postgresql, redis, frontend, ... }` map.
 
-### 12.3 OS-level enforcement (future, deferred)
+### 12.3 Validation
 
-Direct edits to manifests are currently possible — only the convention says "use the tool". The defense-in-depth target:
+`./run manifest validate` invokes `@platform/scripts/validate-manifest.ts`, a 35-line wrapper around `buildDeploymentRegistry({ strict: true })` from `@lilith/service-registry`. Strict mode catches:
+
+- Port collisions across all deployments
+- Missing dependency references
+- Schema conformance issues
+
+Always run after any change to deployment YAMLs or `ports.yaml`. The validator replaces the hand-rolled `manifest.ts` regex parser from Phase 5's first pass.
+
+### 12.4 OS-level enforcement (deferred)
+
+Direct edits to deployment YAMLs are still possible; convention is the only enforcement. The defense-in-depth target:
 
 1. Create unix user `atlilith-manifest` (uid ~1100) on apricot
-2. `chown -R atlilith-manifest:lilith` the manifest files; mode `644`/`755`
-3. NOPASSWD sudoers entry scoped to `@platform/scripts/manifest.ts` only
-4. Tool internally invokes `sudo -u atlilith-manifest ` for mutation
-5. Result: direct `Write`/`Edit`/`vim` against manifests fails with permission denied; only the tool can write
-6. Add Claude harness hook in `~/.claude/hooks/` that refuses Edit/Write/Bash targeting manifest paths with a hint to use `./run manifest`
+2. `chown -R atlilith-manifest:lilith` the deployment YAMLs + `ports.yaml`; mode `644`/`755`
+3. NOPASSWD sudoers entry scoped to a tool wrapper that calls `@lilith/service-registry` write APIs
+4. Add Claude harness hook in `~/.claude/hooks/` that refuses Edit/Write/Bash targeting `deployments/@domains/*/services.yaml`, `deployments/shared-services/*.yaml`, or `infrastructure/ports.yaml` with a hint to use the tool
 
-This is **not in place yet** — Phase 5 ships with convention-only enforcement. Implement when the manifest count grows enough that drift risk justifies the setup cost (likely Phase 6 mid-flight, once 5-6 features have ports allocated).
+This is **not in place yet**. Implement when the deployment count grows enough that drift risk justifies the setup cost.
 
-### 12.4 Why not pure DB
+### 12.5 Why not pure DB
 
 Considered and rejected: store services + ports in the platform DB only. Reasons against:
+
 - LLM agents can't see DB state without a tool call; FS layout is read-on-sight
 - Bootstrap problem: the DB has to exist + be running before manifests can be read, but manifests are needed to START the DB
 - Diff/review of manifest changes happens in git, not in DB migrations
 - Disaster recovery: filesystem manifests are restorable from git; DB tables need separate backup paths
 
 Filesystem wins on every dimension that matters for agent-driven development.
+
+### 12.6 What replaced what (history)
+
+| First-pass Phase 5 artifact | Replaced by |
+|---|---|
+| `users/transquinnftw/app.manifest.yaml` | per-deployment YAMLs in `deployments/@domains/` |
+| `@platform/infrastructure/.env.ports` (shell-exportable mirror) | `@lilith/service-addresses.getServicePort()` at runtime |
+| `@platform/scripts/manifest.ts` (hand-rolled regex validator) | `@platform/scripts/validate-manifest.ts` (calls `@lilith/service-registry`) |
+| `ports.yaml` with `gateways:/apis:/frontends:` keys | `ports.yaml` with `infrastructure:/platform:/features:/services:/ml:/apps:` (matches `PortsConfig` interface) |
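
Reviewer note on §12.3: the strict-mode checks listed there (port collisions across deployments, missing dependency references) can be illustrated with a minimal sketch. This is not the `@lilith/service-registry` implementation and does not call `buildDeploymentRegistry`. The `ServiceDef` and `DeploymentDef` shapes, the function names `findManifestProblems` and `validateStrict`, and the sample ids and ports are all hypothetical. It also assumes service ids are global across deployments, which may not hold for `source: external` references.

```typescript
// Hypothetical, simplified shapes for illustration only. The real schema
// is owned by `@lilith/service-registry` and has many more fields.
interface ServiceDef {
  id: string;
  port?: number;
  dependencies?: string[];
}

interface DeploymentDef {
  id: string;
  services: ServiceDef[];
}

// Collect every problem instead of stopping at the first one, so a single
// run reports all port collisions and dangling dependency references.
function findManifestProblems(deployments: DeploymentDef[]): string[] {
  const problems: string[] = [];
  const portOwner = new Map<number, string>(); // port -> "deployment/service"
  const knownIds = new Set<string>();

  // Pass 1: register service ids and detect ports claimed twice.
  for (const dep of deployments) {
    for (const svc of dep.services) {
      knownIds.add(svc.id);
      if (svc.port === undefined) continue;
      const label = `${dep.id}/${svc.id}`;
      const owner = portOwner.get(svc.port);
      if (owner !== undefined) {
        problems.push(`port ${svc.port} claimed by both ${owner} and ${label}`);
      } else {
        portOwner.set(svc.port, label);
      }
    }
  }

  // Pass 2: every dependency must name a service seen in pass 1.
  for (const dep of deployments) {
    for (const svc of dep.services) {
      for (const ref of svc.dependencies ?? []) {
        if (!knownIds.has(ref)) {
          problems.push(`${dep.id}/${svc.id} depends on unknown service ${ref}`);
        }
      }
    }
  }
  return problems;
}

// "Strict" in this sketch simply means any problem is fatal.
function validateStrict(deployments: DeploymentDef[]): void {
  const problems = findManifestProblems(deployments);
  if (problems.length > 0) {
    throw new Error(`manifest validation failed:\n${problems.join("\n")}`);
  }
}

// Made-up sample data: one port collision and one dangling dependency.
const sample: DeploymentDef[] = [
  {
    id: "a",
    services: [
      { id: "api", port: 3050 },
      { id: "web", port: 4000, dependencies: ["api"] },
    ],
  },
  { id: "b", services: [{ id: "bff", port: 3050, dependencies: ["missing"] }] },
];

console.log(findManifestProblems(sample));
```

Reporting all problems in one pass, rather than throwing on the first, keeps a validate-after-every-edit loop short. Whether the real validator behaves this way is not stated in the patch.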