conventions/programming_general/infra_manifest.yaml
Natalie 7a32fa18fc infra_manifest v0.7.0: deployment model rules (manage-apps, systemd, mesh)
Capture the deployment/supervision model now implemented by @quinn/manage-apps:
- manage_apps_orchestrator: manage-apps auto-discovers .infra.yaml (no registry);
  retire per-app app.manifest.yaml and hand-rolled start/deploy ssh scripts.
- systemd_supervision: standing cloud services run as systemd units (not
  foreground ssh / PID files); deploy installs the unit, manage-apps drives it.
- mesh_host_resolution: service.host is an ssh alias from net-tools host-apply;
  internal traffic rides the WG mesh (no auth on-mesh, no public app ports).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 03:30:10 -04:00

apiVersion: conventions/v1
version: 0.7.0
updated: "2026-06-30"
name: infra_manifest
title: Infra manifest (.infra.yaml — per-project + producer-level shared infra)
scope: general
status: draft
summary: "Every deployable project declares its infrastructure in a root .infra.yaml (single `service`); `service.host` must be a host in net-tools mesh-hosts.json. A PRODUCER root may also carry a .infra.yaml describing shared-infra TOPOLOGY via `droplets` — physical hosts each running many co-located services (e.g. @quinn/.infra.yaml — one services droplet for all forges + registries + DNS + edge, plus an MCP droplet). The infra-net reconciler reads every .infra*.yaml; a future infra-apply renders the DO parts."
appliesTo: ["@ct/**", "@mc/**", "@quinn/**", "@*/.infra.yaml"]
rules:
  - id: own_db
    level: must
    text: A project needing a database declares its own logical DB + dedicated user on the shared managed cluster (data-sourced), never reusing another service's creds.
    rationale: own-DB-per-service + credential separation.
  - id: http_coupling
    level: must
    text: Cross-service dependencies are HTTP only (declared in depends_on), never shared databases.
  - id: gpu_ondemand
    level: should
    text: GPU workloads are on-demand — provision, keep warm while the queue is deep, release on idle. Never a standing GPU.
  - id: cloud_provider
    level: must
    text: "Standing cloud hosts run on DigitalOcean (region nyc3 by default — operator-local; fra1/ams3 only if EU PII residency wins the GDPR call), managed by the uvlava terraform at @ct/infra/uvlava/terraform/do/. `provider: digitalocean` in the manifest. Today all droplets share ONE DO account (PATs ~/.vault/do_pat_*); per-producer DO accounts are the target, not yet real."
    rationale: One declared cloud provider keeps IaC, billing, and the mesh reconciler coherent; nyc3 co-locates droplets + managed PG + Spaces.
  - id: droplet_naming
    level: must
    text: "DO droplets are named reverse-DNS. TWO tiers: (1) GLOBAL shared services with NO producer segment — `com.uvlava.<role>` (e.g. com.uvlava.dns = DNS authority/resolver, com.uvlava.wg = WG mesh hub); (2) PRODUCER hosts — `com.uvlava.<producer>.<role>`, `<producer>` ∈ {ct, mc, quinn}, `<role>` is the function (services, artifacts, redroid, gpu). Operator-shared producer infra is `quinn.*` (com.uvlava.quinn.artifacts = forges+registries); per-producer app/data hosts are `<producer>.*` (com.uvlava.ct.services, com.uvlava.ct.redroid). The DO `name` is ForceNew in the provider: set it once at create, rename LIVE via `doctl compute droplet-action rename`, and keep `lifecycle.ignore_changes = [name]` so a label change never destroys the box."
    rationale: Stable, sortable, ownership-legible names that survive rebuilds and never trigger a destructive terraform replace.
  - id: host_in_mesh
    level: must
    text: "`service.host` is a host name from net-tools mesh-hosts.json (lime, fennel, redroid, …) — the infra-net reconciler validates this and regenerates the mesh-hosts services map from all .infra.yaml."
  - id: shared_infra_topology
    level: should
    text: "Shared metal owned by the operator is declared once at the producer root (@quinn/.infra.yaml) via `droplets` — each droplet lists the co-located services it runs (forges, npm/pypi/swift registries, DNS, reverse-proxy, MCP). Logical per-producer forges (ct/mc/quinn) co-locate on one services droplet rather than one droplet each; tag each service with its `producer`. On provision, register each droplet's `hosts` in mesh-hosts.json."
    rationale: One services droplet (forges + registries + DNS + edge) + one MCP droplet is cheaper and simpler than a droplet per producer, while keeping forges logically per-producer.
  - id: env_variants
    level: should
    text: "Default manifest is `.infra.yaml` (prod, environment defaults to prod). A distinct non-prod deployment lives in a sibling `.infra.<env>.yaml` (currently only `.infra.dev.yaml`) with the same schema + `environment` set. One project may thus appear as multiple services (e.g. prod on a DO droplet + a local mac instance). Keep run-only/access config (passcodes, bind addresses) out of the manifest — it is not mesh infra."
  - id: manage_apps_orchestrator
    level: must
    text: "`@quinn/manage-apps` (~/Code/@quinn/@packages/manage-apps) is the canonical service orchestrator — it AUTO-DISCOVERS every `.infra.yaml` by walking the producer tree (no central registry) and drives start/stop/status/deploy. A new deployable service = drop a `.infra.yaml`; never hand-roll start/deploy ssh scripts or a per-app `app.manifest.yaml` (that legacy format is retired in favour of `.infra.yaml`)."
    rationale: One declarative manifest, one orchestrator, zero registration — the same `.infra.yaml` the net-tools infra-net reconciler reads for mesh/DNS.
  - id: systemd_supervision
    level: must
    text: "Standing services on cloud hosts run as **systemd units** (declared via `service.systemd_unit`), never as foreground ssh sessions or /tmp PID-tracked processes, so they come back after a host reboot and are restarted after a crash. The `service.deploy` script installs and enables the unit; manage-apps drives it via `ssh <host> systemctl …`. PID/background mode is for local-mac dev only."
    rationale: systemd is the supervisor; PID files do not survive a restart. Matches the global rule 'long-running jobs → systemd, not foreground ssh'.
  - id: mesh_host_resolution
    level: should
    text: "`service.host` resolves to an ssh alias from net-tools `host-apply` (~/.ssh/config rendered from mesh-hosts.json): manage-apps runs `ssh <host> …` and does NOT embed IPs or `-i <key>`. Internal service-to-service traffic rides the WireGuard mesh (10.9.0.0/24). On-mesh peers skip auth, so app ports are never exposed publicly."
    rationale: net-tools owns SSH config + the mesh; manage-apps owns runtime. One source of truth for host addressing; the mesh is the private plane.
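# Illustrative sketch only (not part of the convention): a per-project .infra.yaml
# that would satisfy the rules above. Everything named here is hypothetical unless
# noted: the project `example-api`, the cluster `shared-pg`, the runtime, port,
# unit name, deploy script path, and the dependency `example-auth`. `lime` is one
# of the mesh-hosts.json names cited in host_in_mesh; that it is a DigitalOcean
# host is an assumption.
#
#   apiVersion: infra/v1
#   project: example-api
#   provider: digitalocean               # cloud_provider
#   database:                            # own_db: own logical DB + dedicated user
#     cluster: shared-pg
#     name: example_api
#     user: example_api
#   service:
#     host: lime                         # host_in_mesh: a mesh host name, never an IP
#     runtime: node
#     port: 8080
#     systemd_unit: example-api.service  # systemd_supervision
#     deploy: scripts/deploy.sh          # repo-relative; installs and enables the unit
#   depends_on:                          # http_coupling: HTTP-only dependencies
#     - example-auth
#
# A dev variant (env_variants) could sit beside it as .infra.dev.yaml. `fennel` is
# the mac host the schema mentions; the other values are again hypothetical. No
# systemd_unit is set because PID/background mode is allowed for local-mac dev.
#
#   apiVersion: infra/v1
#   project: example-api
#   environment: dev
#   provider: mac
#   service:
#     host: fennel
#     runtime: node
#     port: 8080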
providesFile:
  path: .infra.yaml # plus optional .infra.<env>.yaml siblings (same schema)
  schema:
    $schema: "https://json-schema.org/draft/2020-12/schema"
    title: ProjectInfraManifest
    type: object
    additionalProperties: false
    required: [apiVersion, project, provider]
    properties:
      apiVersion: { type: string, const: "infra/v1", description: "Manifest contract version (independent of the convention's own version)." }
      project: { type: string }
      environment: { type: string, enum: [dev, prod], default: prod, description: "Deployment environment. Omitted = prod. A project may carry one manifest per environment (.infra.yaml + .infra.dev.yaml)." }
      provider: { type: string, enum: [digitalocean, mac, bare-metal, local], description: "Where it physically runs: digitalocean droplet, a mac (e.g. fennel), bare-metal, or local." }
      database:
        type: object
        additionalProperties: false
        required: [cluster, name, user]
        properties:
          cluster: { type: string, description: "Shared managed cluster, data-sourced and not owned here." }
          name: { type: string }
          user: { type: string }
      service:
        type: object
        additionalProperties: false
        properties:
          host: { type: string, description: "A host name from net-tools mesh-hosts.json (lime, fennel, redroid, …)." }
          runtime: { type: string }
          port: { type: integer }
          systemd_unit: { type: string, description: "systemd unit name. manage-apps drives it via `ssh <host> systemctl …` (start/stop/status); the host resolves as an ssh alias from host-apply's ~/.ssh/config." }
          deploy: { type: string, description: "Repo-relative deploy script (ships + builds + installs/enables the unit). manage-apps `deploy` runs it locally; the script handles ssh/rsync." }
      gpu:
        type: object
        additionalProperties: false
        properties:
          mode: { type: string, enum: [on-demand] }
          droplet: { type: string }
      depends_on:
        type: array
        items: { type: string }
        description: Other services consumed over HTTP.
      droplets:
        type: array
        description: "Producer-level shared-infra topology: physical droplets each hosting MANY co-located services. Used by a producer-root manifest (e.g. @quinn/.infra.yaml) that owns shared metal — distinct from a single project's `service`. Logical per-producer endpoints (ct-forge/mc-forge/quinn-forge) may co-locate on one droplet."
        items:
          type: object
          additionalProperties: false
          required: [name, services]
          properties:
            name: { type: string, pattern: "^com\\.uvlava\\.((ct|mc|quinn)\\.)?[a-z0-9-]+$", description: "Reverse-DNS droplet name: global com.uvlava.<role> (e.g. com.uvlava.dns) OR producer com.uvlava.<producer>.<role> (see rule droplet_naming). Rename live via doctl; name is ForceNew in terraform." }
            role: { type: string }
            provider: { type: string, enum: [digitalocean, mac, bare-metal, local] }
            hosts: { type: array, items: { type: string }, description: "mesh-hosts.json names this droplet registers on provision." }
            services:
              type: array
              items:
                type: object
                additionalProperties: false
                required: [name, kind]
                properties:
                  name: { type: string }
                  kind: { type: string, description: "forgejo | npm-registry | pypi-registry | swiftpm-registry | dns | reverse-proxy | mcp | ..." }
                  producer: { type: string, description: "Which producer this service belongs to (ct/mc/quinn), when shared host serves multiple." }
                  port: { type: integer }
                  domain: { type: string }
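# Illustrative sketch only (not the real @quinn/.infra.yaml): a producer-root
# manifest that uses `droplets` to declare shared topology. The droplet names
# follow rule droplet_naming and match the `name` pattern; the project name,
# `hosts` entries, service names, ports, and domains are hypothetical
# placeholders chosen to show the shape.
#
#   apiVersion: infra/v1
#   project: quinn-shared-infra          # hypothetical
#   provider: digitalocean
#   droplets:
#     - name: com.uvlava.quinn.artifacts # producer tier: com.uvlava.<producer>.<role>
#       role: artifacts
#       provider: digitalocean
#       hosts: [artifacts]               # hypothetical mesh-hosts.json name
#       services:
#         - name: ct-forge               # logical per-producer forge, co-located
#           kind: forgejo
#           producer: ct
#           port: 3000                   # hypothetical
#           domain: forge.example.com    # hypothetical
#         - name: npm
#           kind: npm-registry
#           producer: quinn
#     - name: com.uvlava.dns             # global tier: no producer segment
#       role: dns
#       provider: digitalocean
#       hosts: [dns]                     # hypothetical
#       services:
#         - name: dns
#           kind: dns
#
# Each droplet needs `name` and `services`, and each service needs `name` and
# `kind`; the top-level `apiVersion`, `project`, and `provider` stay required
# even when the manifest carries only `droplets`.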