Capture the deployment/supervision model now implemented by @quinn/manage-apps:

- manage_apps_orchestrator: manage-apps auto-discovers .infra.yaml (no registry); retire per-app app.manifest.yaml and hand-rolled start/deploy ssh scripts.
- systemd_supervision: standing cloud services run as systemd units (not foreground ssh / PID files); deploy installs the unit, manage-apps drives it.
- mesh_host_resolution: service.host is an ssh alias from net-tools host-apply; internal traffic rides the WG mesh (no auth on-mesh, no public app ports).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
114 lines
9.6 KiB
YAML
apiVersion: conventions/v1
version: 0.7.0
updated: "2026-06-30"
name: infra_manifest
title: Infra manifest (.infra.yaml — per-project + producer-level shared infra)
scope: general
status: draft
summary: "Every deployable project declares its infrastructure in a root .infra.yaml (single `service`); `service.host` must be a host in net-tools mesh-hosts.json. A PRODUCER root may also carry a .infra.yaml describing shared-infra TOPOLOGY via `droplets` — physical hosts each running many co-located services (e.g. @quinn/.infra.yaml — one services droplet for all forges + registries + DNS + edge, plus an MCP droplet). The infra-net reconciler reads every .infra*.yaml; a future infra-apply renders the DO parts."
appliesTo: ["@ct/**", "@mc/**", "@quinn/**", "@*/.infra.yaml"]
rules:
  - id: own_db
    level: must
    text: A project needing a database declares its own logical DB + dedicated user on the shared managed cluster (data-sourced), never reusing another service's creds.
    rationale: own-DB-per-service + credential separation.
  - id: http_coupling
    level: must
    text: Cross-service dependencies are HTTP only (declared in depends_on), never shared databases.
  - id: gpu_ondemand
    level: should
    text: GPU workloads are on-demand — provision, keep warm while the queue is deep, release on idle. Never a standing GPU.
  - id: cloud_provider
    level: must
    text: "Standing cloud hosts run on DigitalOcean (region nyc3 by default — operator-local; fra1/ams3 only if EU PII residency wins the GDPR call), managed by the uvlava terraform at @ct/infra/uvlava/terraform/do/. `provider: digitalocean` in the manifest. Today all droplets share ONE DO account (PATs ~/.vault/do_pat_*); per-producer DO accounts are the target, not yet real."
    rationale: One declared cloud provider keeps IaC, billing, and the mesh reconciler coherent; nyc3 co-locates droplets + managed PG + Spaces.
  - id: droplet_naming
    level: must
    text: "DO droplets are named reverse-DNS. TWO tiers: (1) GLOBAL shared services with NO producer segment — `com.uvlava.<role>` (e.g. com.uvlava.dns = DNS authority/resolver, com.uvlava.wg = WG mesh hub); (2) PRODUCER hosts — `com.uvlava.<producer>.<role>`, `<producer>` ∈ {ct, mc, quinn}, `<role>` is the function (services, artifacts, redroid, gpu). Operator-shared producer infra is `quinn.*` (com.uvlava.quinn.artifacts = forges+registries); per-producer app/data hosts are `<producer>.*` (com.uvlava.ct.services, com.uvlava.ct.redroid). The DO `name` is ForceNew in the provider: set it once at create, rename LIVE via `doctl compute droplet-action rename`, and keep `lifecycle.ignore_changes = [name]` so a label change never destroys the box."
    rationale: Stable, sortable, ownership-legible names that survive rebuilds and never trigger a destructive terraform replace.
  - id: host_in_mesh
    level: must
    text: "`service.host` is a host name from net-tools mesh-hosts.json (lime, fennel, redroid, …) — the infra-net reconciler validates this and regenerates the mesh-hosts services map from all .infra.yaml."
  - id: shared_infra_topology
    level: should
    text: "Shared metal owned by the operator is declared once at the producer root (@quinn/.infra.yaml) via `droplets` — each droplet lists the co-located services it runs (forges, npm/pypi/swift registries, DNS, reverse-proxy, MCP). Logical per-producer forges (ct/mc/quinn) co-locate on one services droplet rather than one droplet each; tag each service with its `producer`. On provision, register each droplet's `hosts` in mesh-hosts.json."
    rationale: One services droplet (forges + registries + DNS + edge) + one MCP droplet is cheaper and simpler than a droplet per producer, while keeping forges logically per-producer.
  - id: env_variants
    level: should
    text: "Default manifest is `.infra.yaml` (prod, environment defaults to prod). A distinct non-prod deployment lives in a sibling `.infra.<env>.yaml` (currently only `.infra.dev.yaml`) with the same schema + `environment` set. One project may thus appear as multiple services (e.g. prod on a DO droplet + a local mac instance). Keep run-only/access config (passcodes, bind addresses) out of the manifest — it is not mesh infra."
  - id: manage_apps_orchestrator
    level: must
    text: "`@quinn/manage-apps` (~/Code/@quinn/@packages/manage-apps) is the canonical service orchestrator — it AUTO-DISCOVERS every `.infra.yaml` by walking the producer tree (no central registry) and drives start/stop/status/deploy. A new deployable service = drop a `.infra.yaml`; never hand-roll start/deploy ssh scripts or a per-app `app.manifest.yaml` (that legacy format is retired in favour of `.infra.yaml`)."
    rationale: One declarative manifest, one orchestrator, zero registration — the same `.infra.yaml` the net-tools infra-net reconciler reads for mesh/DNS.
  - id: systemd_supervision
    level: must
    text: "Standing services on cloud hosts run as **systemd units** (declared via `service.systemd_unit`), never as foreground ssh or /tmp PID-tracked processes, so they come back after a host reboot and can be restarted after a crash. The `service.deploy` script installs/enables the unit; manage-apps drives it via `ssh <host> systemctl …`. PID/background mode is for local-mac dev only."
    rationale: systemd is the supervisor; PID files die on restart. Matches the global rule 'long-running jobs → systemd, not foreground ssh'.
  - id: mesh_host_resolution
    level: should
    text: "`service.host` resolves to an ssh alias from net-tools `host-apply` (~/.ssh/config rendered from mesh-hosts.json). manage-apps runs `ssh <host> …`; it does NOT embed IPs or `-i <key>`. Internal service-to-service traffic rides the WireGuard mesh (10.9.0.0/24): on-mesh peers skip auth, and no app port is publicly exposed."
    rationale: net-tools owns SSH config + the mesh; manage-apps owns runtime. One source of truth for host addressing; the mesh is the private plane.
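# Example (illustrative sketch only, not a real project): a minimal per-project
# .infra.yaml that would satisfy the rules above. The project, cluster, unit,
# script, runtime, port, and dependency names are all hypothetical; `lime` is
# used purely as a placeholder for a mesh-hosts.json name.
#
#   apiVersion: infra/v1
#   project: example-api                  # hypothetical project name
#   provider: digitalocean                # cloud_provider
#   database:                             # own_db: own logical DB + dedicated user
#     cluster: shared-pg                  # hypothetical cluster name
#     name: example_api
#     user: example_api
#   service:
#     host: lime                          # host_in_mesh / mesh_host_resolution
#     runtime: node                       # hypothetical runtime label
#     port: 8080                          # hypothetical; reachable on-mesh only
#     systemd_unit: example-api.service   # systemd_supervision
#     deploy: scripts/deploy.sh           # hypothetical repo-relative script
#   depends_on:                           # http_coupling
#     - example-auth                      # hypothetical upstream service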
providesFile:
  path: .infra.yaml # plus optional .infra.<env>.yaml siblings (same schema)
  schema:
    $schema: "https://json-schema.org/draft/2020-12/schema"
    title: ProjectInfraManifest
    type: object
    additionalProperties: false
    required: [apiVersion, project, provider]
    properties:
      apiVersion: { type: string, const: "infra/v1", description: "Manifest contract version (independent of the convention's own version)." }
      project: { type: string }
      environment: { type: string, enum: [dev, prod], default: prod, description: "Deployment environment. Omitted = prod. A project may carry one manifest per environment (.infra.yaml + .infra.dev.yaml)." }
      provider: { type: string, enum: [digitalocean, mac, bare-metal, local], description: "Where it physically runs: digitalocean droplet, a mac (e.g. fennel), bare-metal, or local." }
      database:
        type: object
        additionalProperties: false
        required: [cluster, name, user]
        properties:
          # Quoted: an unquoted comma inside a flow mapping would split this description into two entries.
          cluster: { type: string, description: "Shared managed cluster — data-sourced, not owned here." }
          name: { type: string }
          user: { type: string }
      service:
        type: object
        additionalProperties: false
        properties:
          host: { type: string, description: "A host name from net-tools mesh-hosts.json (lime, fennel, redroid, …)." }
          runtime: { type: string }
          port: { type: integer }
          systemd_unit: { type: string, description: "systemd unit name. manage-apps drives it via `ssh <host> systemctl …` (start/stop/status); the host resolves as an ssh alias from host-apply's ~/.ssh/config." }
          deploy: { type: string, description: "Repo-relative deploy script (ships + builds + installs/enables the unit). manage-apps `deploy` runs it locally; the script handles ssh/rsync." }
      gpu:
        type: object
        additionalProperties: false
        properties:
          mode: { type: string, enum: [on-demand] }
          droplet: { type: string }
      depends_on:
        type: array
        items: { type: string }
        description: Other services consumed over HTTP.
      droplets:
        type: array
        description: "Producer-level shared-infra topology: physical droplets each hosting MANY co-located services. Used by a producer-root manifest (e.g. @quinn/.infra.yaml) that owns shared metal — distinct from a single project's `service`. Logical per-producer endpoints (ct-forge/mc-forge/quinn-forge) may co-locate on one droplet."
        items:
          type: object
          additionalProperties: false
          required: [name, services]
          properties:
            name: { type: string, pattern: "^com\\.uvlava\\.((ct|mc|quinn)\\.)?[a-z0-9-]+$", description: "Reverse-DNS droplet name: global com.uvlava.<role> (e.g. com.uvlava.dns) OR producer com.uvlava.<producer>.<role> (see rule droplet_naming). Rename live via doctl; name is ForceNew in terraform." }
            role: { type: string }
            provider: { type: string, enum: [digitalocean, mac, bare-metal, local] }
            hosts: { type: array, items: { type: string }, description: "mesh-hosts.json names this droplet registers on provision." }
            services:
              type: array
              items:
                type: object
                additionalProperties: false
                required: [name, kind]
                properties:
                  name: { type: string }
                  kind: { type: string, description: "forgejo | npm-registry | pypi-registry | swiftpm-registry | dns | reverse-proxy | mcp | ..." }
                  producer: { type: string, description: "Which producer this service belongs to (ct/mc/quinn), when shared host serves multiple." }
                  port: { type: integer }
                  domain: { type: string }
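# Example (illustrative sketch only): a producer-root manifest using `droplets`,
# shaped to fit the schema above. The droplet name follows the droplet_naming
# pattern; the project name, mesh host name, service names, and port are
# hypothetical and do not describe the real @quinn/.infra.yaml.
#
#   apiVersion: infra/v1
#   project: quinn-shared-infra            # hypothetical project name
#   provider: digitalocean
#   droplets:
#     - name: com.uvlava.quinn.artifacts   # com.uvlava.<producer>.<role>
#       role: artifacts
#       provider: digitalocean
#       hosts: [artifacts]                 # hypothetical mesh-hosts.json name
#       services:
#         - name: ct-forge                 # logical per-producer forge
#           kind: forgejo
#           producer: ct
#         - name: quinn-npm                # hypothetical service name
#           kind: npm-registry
#           producer: quinn
#           port: 4873                     # hypothetical port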