Cloud DX Handoff — DigitalOcean ephemeral fleet + self-hosted Forgejo
Purpose. Replicate, for cocotte, the on-demand cloud build/test/compute setup proven on Magic Civilization (`~/Code/@projects/@magic-civilization/infra/`). Move heavy work off the laptop onto disposable DigitalOcean droplets; a small self-hosted Forgejo is the off-laptop git origin. Pay-per-use, tear down when idle.
Written 2026-06-27 after building it end-to-end on MC. Implemented the same day in this repo: `run`, `scripts/run/{forge,dist}.sh`, `infra/{packer,terraform/test-fleet}/`, plus updates to `.gitignore` and INFRA.md §10. The Gotchas section is the real value; each one cost real iterations. Read it before you start.
See also the live integration notes in INFRA.md §10 (references the lilith lineage manifest/run patterns for consistency).
Architecture (3 layers + origin)
| Layer | Role |
|---|---|
| Forgejo origin | small always-on droplet, holds the source (off-laptop git remote) |
| Golden image | Packer bakes toolchain + warm clone → a DO snapshot (workers boot ready in ~30s) |
| Fleet | Terraform: N ephemeral workers from the snapshot; `workers=0` when idle = ~$0 |
| Dispatch | `./run` verbs that ssh work onto a worker + stream results/artifacts back |
Reference implementation — copy from MC, then adapt
| MC file | What it is | cocotte action |
|---|---|---|
| `infra/terraform/test-fleet/` | DO provider, golden-image auto-discovery (`data.digitalocean_images` by name), project grouping, mocked-provider test suite (`terraform test`, no token/spend) | copy near-verbatim |
| `infra/packer/golden-image.pkr.hcl` + `provision.sh` | bakes the image | copy; swap the toolchain (cocotte = Python/uv/FastAPI + node, not Rust/Godot) |
| `scripts/run/dist.sh` | `dist:{check,up,sim,test,build,render,sync,down}` + `dist:{publish,fetch,models}` (build-once-load-many, see below) | copy; swap the build/test commands |
| `scripts/run/forge.sh` | `forge:{up,down,dns}` lifecycle | copy verbatim |
| `scripts/cloud-bringup.sh` | one-shot human-run bring-up | copy; adjust sizes/scene |
The whole thing is provider-pluggable: dispatch + cloud-init + outputs are provider-neutral; only versions.tf/main.tf/variables.tf + the Packer builder are DO-specific.
Build once, load many (artifact Space — added 2026-06-28 on MC)
Without this, fan-out means N workers each rebuilding the same thing. Instead: build the deployable artifact once, publish it to a DO Space, and let the rest fetch it (keyed by git sha). On MC this is the Linux `.so` + wasm; on cocotte it's whatever your runners consume (built wheels / a uv venv tarball / a bundled image / model files).
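A minimal sketch of the fetch-or-build decision, written in Python for illustration only (MC's `dist.sh` is shell). `ArtifactStore`, `MemoryStore`, `publish`, `fetch_or_build` and the `builds/<sha>/artifact.tar` object name are hypothetical stand-ins; in practice the store would be `rclone` talking to the Space. It shows the two properties that matter: one build per sha, and graceful degradation to a local build when there is no store (no creds), the store is unreachable, or the sha was never published.

```python
from typing import Callable, Dict, Optional, Protocol, Tuple


class ArtifactStore(Protocol):
    """Hypothetical interface over the artifact Space (e.g. rclone + DO Spaces)."""

    def exists(self, key: str) -> bool: ...
    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...


class MemoryStore:
    """In-memory ArtifactStore, only for demonstrating the flow."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._blobs

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data


def artifact_key(sha: str) -> str:
    # One prefix per commit, so a worker on sha X never loads sha Y's build.
    return f"builds/{sha}/artifact.tar"


def publish(store: ArtifactStore, sha: str, build: Callable[[], bytes]) -> str:
    """Build once and upload under the commit's key (the dist:publish role)."""
    key = artifact_key(sha)
    store.put(key, build())
    return key


def fetch_or_build(
    store: Optional[ArtifactStore], sha: str, build: Callable[[], bytes]
) -> Tuple[bytes, str]:
    """Return (artifact, source) where source is "cache" or "build"."""
    if store is not None:  # None models "no Spaces creds on this worker"
        key = artifact_key(sha)
        try:
            if store.exists(key):
                return store.get(key), "cache"
        except OSError:
            pass  # Space unreachable: fall through to a local build
    return build(), "build"
```

With one `publish` per sha, any number of `fetch_or_build` calls on workers hit the cache and never invoke `build`; a sha nobody published still works, just slower.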
- Space: one DO Space (e.g. `cocotte-artifacts`). A DO Spaces subscription ($5/mo, 250 GB) covers all your Spaces, so a second Space adds ~$0 base. Account-wide S3 keys live in `~/.vault/do-spaces-*.{access,secret}`. `rclone` is baked into the golden image (`provision.sh`); the dispatch passes the Spaces creds as `RCLONE_S3_*` env over ssh, never stored on the worker, never on argv.
- Verbs (MC `scripts/run/dist.sh`, copy the shape): `dist:publish` builds + uploads `builds/<sha>/`; `dist:sync` does `git pull` → fetch the prebuilt artifact if published for that sha, else build; `dist:models {push,pull,ls}` shares model files. Degrades gracefully to build-on-worker when creds/cache are absent.
- Complements a compile cache (sccache / pip wheel cache): those cache intermediate build steps; the Space caches the final artifact.
- ⚠️ `ssh -n` defeats a heredoc: `-n` redirects stdin from `/dev/null`, so an `ssh … bash -s <<'EOF'` remote script silently gets empty stdin and no-ops (exit 0). Use `-n` only for inline-command ssh, never for heredoc-stdin ssh.
- ⚠️ Dispatch ssh must pass `-i <fleet-key>` explicitly; don't rely on the key being agent-loaded, or you'll hit intermittent `publickey` failures.
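One way to satisfy both ssh rules at once, sketched in Python for clarity (MC's dispatch is shell and may pass the env differently): build the argv without `-n`, with an explicit `-i` plus `IdentitiesOnly=yes`, and send the Spaces creds as `export RCLONE_S3_*` lines inside the stdin script, so they never touch argv on either machine or the worker's disk. The helper names, the nyc3 endpoint and the remote body are assumptions.

```python
import shlex
import subprocess
from typing import Dict, List, Tuple


def ssh_stdin_dispatch(
    host: str, key_path: str, env: Dict[str, str], body: str
) -> Tuple[List[str], str]:
    """Return (argv, stdin_script) for an `ssh ... bash -s` dispatch."""
    # No -n: it would replace stdin with /dev/null and the script would no-op.
    # IdentitiesOnly=yes makes ssh offer only the -i key, not whatever the agent holds.
    argv = ["ssh", "-i", key_path, "-o", "IdentitiesOnly=yes", host, "bash", "-s"]
    # Secrets ride inside the script on stdin, never in argv.
    exports = [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
    script = "\n".join(["set -euo pipefail", *exports, body]) + "\n"
    return argv, script


def spaces_env(access_key: str, secret_key: str) -> Dict[str, str]:
    # rclone reads S3 backend options from RCLONE_S3_* env vars; region is assumed.
    return {
        "RCLONE_S3_PROVIDER": "DigitalOcean",
        "RCLONE_S3_ENDPOINT": "nyc3.digitaloceanspaces.com",
        "RCLONE_S3_ACCESS_KEY_ID": access_key,
        "RCLONE_S3_SECRET_ACCESS_KEY": secret_key,
    }


def fetch_body(sha: str) -> str:
    # Hypothetical remote step: copy builds/<sha>/ down if it was published.
    src = shlex.quote(f":s3:cocotte-artifacts/builds/{sha}/")
    return (
        f'if [ -n "$(rclone lsf {src} 2>/dev/null)" ]; then\n'
        f"  rclone copy {src} ./artifact/\n"
        "else\n"
        "  echo 'no published artifact for this sha; build on the worker' >&2\n"
        "fi"
    )


def dispatch(argv: List[str], script: str) -> subprocess.CompletedProcess:
    # The script goes in via stdin; check=True surfaces remote failures.
    return subprocess.run(argv, input=script, text=True, check=True)
```

Usage would look like `dispatch(*ssh_stdin_dispatch(host, os.path.expanduser("~/.ssh/id_cocotte_fleet"), spaces_env(ak, sk), fetch_body(sha)))`. The same rules carry over to a shell heredoc: `ssh -i "$KEY" host bash -s <<'EOF'`, with no `-n`.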
⚠️ Gotchas (learned the hard way — each cost hours)
1. DO account tier restricts size AND count (new accounts).
   - `droplet_limit` starts low (3); raise it via a support ticket (we got 10).
   - Large + CPU-Optimized sizes are locked: `s-8vcpu-16gb` (non-amd), `c-4` and `c-8` all return `422 "size restricted / open a ticket"`. `s-8vcpu-16gb-amd` works in nyc3 (8 vCPU AMD, $0.167/hr) and is the beefy sweet spot until you file the tier ticket.
   - The `/v2/sizes` `available` flag LIES (it claimed `c-4` was available; create still 422'd). Test-create + destroy to confirm a size before committing.
2. Powering off a droplet does NOT stop billing on DO (unlike AWS: DO bills allocated resources, not running ones). Only destroy stops it. "Park overnight" = power-off → snapshot → destroy; restore = create-from-snapshot. See `forge.sh` `forge:down`/`forge:up`.
3. The AI-agent exfil hard-deny. An agent (Claude Code) cannot push/clone your private repo onto a fresh cloud box: it's classified as data exfiltration, and `permissions.allow` does NOT clear it (it's a hard-deny). Two fixes:
   - You run the source push / build yourself (human-initiated clears it), OR
   - Add an `autoMode` trust block to `.claude/settings.local.json` BY HAND (the agent can't self-edit this; that's the anti-injection point) declaring the forge + DO project as the owner's trusted infra. Then the agent can run packer/terraform/git. Template at the bottom.

   Always keep credentials out of argv: pass them via env (`PKR_VAR_*`, `TF_VAR_*`), never as `-var creds=...` on the command line (the plaintext password is a second exfil signal and leaks to `ps`).
4. apt dpkg-lock race on fresh droplets. cloud-init runs its own apt at boot; your provisioner collides → `Could not get lock /var/lib/dpkg/lock-frontend` → exit 100. Fix at the top of provisioning:
   ```bash
   # Block until cloud-init (including its boot-time apt run) has finished.
   cloud-init status --wait >/dev/null 2>&1 || true
   # Belt and braces: let apt wait up to 600 s for the dpkg lock instead of failing.
   apt-get -o DPkg::Lock::Timeout=600 update -y
   ```
5. Build user needs passwordless sudo. Dev-setup scripts install system packages via `sudo apt-get`. A bare `useradd` user has no sudo, so the node/etc. install fails. Add:
   ```bash
   echo "$BUILD_USER ALL=(ALL) NOPASSWD:ALL" > "/etc/sudoers.d/90-$BUILD_USER"
   chmod 440 "/etc/sudoers.d/90-$BUILD_USER"
   visudo -cf "/etc/sudoers.d/90-$BUILD_USER"   # syntax-check: a malformed drop-in can break sudo entirely
   ```
6. Packer DO builder has no `project` parameter → its transient build droplet lands in the account default project. Fix: make your project the account default with `PATCH /v2/projects/{id}` `{"is_default":true}` (reversible). Persistent fleet droplets get assigned explicitly via `digitalocean_project_resources` in Terraform.
7. Reserved/floating IPs cost ~$4/mo while DETACHED (i.e. exactly when the box is down), which defeats the savings. Skip them; read the dynamic IP from a vault file that `forge:up` refreshes.
8. The golden image is size-agnostic: build it on any size, run workers on the beefy size. Build size ≠ worker size. (One constraint: a DO snapshot can only be restored onto sizes whose disk is at least as large as the build droplet's, so build on a size with a disk no larger than the workers'.)
9. macOS coordinator (if you dispatch from a Mac): `tools/*` that use `realpath -m` need GNU coreutils (`brew install coreutils`; Homebrew installs it as `grealpath` unless its `gnubin` directory is on `PATH`), or run dispatch from Linux.
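For the Packer default-project gotcha, the `PATCH` call can be built with the standard library alone. This is a minimal sketch; `make_default_project_request` and `set_default_project` are hypothetical helpers, and the project ID is a placeholder you would take from `GET /v2/projects`. To revert, run it again with the previous default project's ID.

```python
import json
import urllib.request

DO_API = "https://api.digitalocean.com/v2"


def make_default_project_request(token: str, project_id: str) -> urllib.request.Request:
    """Build PATCH /v2/projects/{id} with body {"is_default": true}."""
    return urllib.request.Request(
        f"{DO_API}/projects/{project_id}",
        data=json.dumps({"is_default": True}).encode(),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",  # token read from the vault, never argv
            "Content-Type": "application/json",
        },
    )


def set_default_project(token: str, project_id: str) -> dict:
    """Send the request and return the decoded JSON response."""
    req = make_default_project_request(token, project_id)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the shape testable offline, in the same zero-spend spirit as `./run dist:check`.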
Runbook (cocotte)
0. Tooling: `brew install hashicorp/tap/terraform hashicorp/tap/packer shellcheck` on the laptop.
1. DO account: create an account and a read/write API token. File a support ticket to (a) raise the droplet limit and (b) unlock larger/CPU-Optimized sizes if you want >8 vCPU. Create a DO Project for cocotte and make it the account default (gotcha #6).
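2. Vault the secrets (never in repo/argv):
   ```bash
   mkdir -p ~/.vault && chmod 700 ~/.vault
   # umask 077 in a subshell: the token file is created mode 600, never briefly world-readable
   ( umask 077 && echo '<token>' > ~/.vault/do_pat_cocotte ) && chmod 600 ~/.vault/do_pat_cocotte
   ```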
3. Forge droplet + Forgejo: spin up an `s-1vcpu-1gb` Ubuntu droplet (~$6/mo) and install Forgejo (single Go binary + sqlite + systemd). Key detail: `app.ini` must be owned by the git run-user (Forgejo writes its `INTERNAL_TOKEN` into it on first start), and it must set `INSTALL_LOCK = true` (under `[security]`). Create an admin + a repo. Store the IP + admin creds in `~/.vault/cocotte_forge_creds`. (Copy MC's install sequence.)
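4. Push source to the forge (you, by hand, because of the exfil gate): push an orphan snapshot of the current tree (avoids dragging bloated `.git` history):
   ```bash
   cd <cocotte-repo>
   URL="http://<admin>:<pass>@<forge-ip>:3000/<org>/<repo>.git"
   TREE=$(git rev-parse 'main^{tree}')   # quoted so zsh EXTENDED_GLOB can't treat ^ as a glob operator
   COMMIT=$(git commit-tree "$TREE" -m "snapshot for cloud build")   # no -p: parentless (orphan) commit
   # zsh: brace the var! Unbraced, $COMMIT:refs is parsed as the :r modifier.
   # A later snapshot is unrelated history, so re-pushing one over main needs --force.
   git -c http.postBuffer=524288000 push "$URL" "${COMMIT}:refs/heads/main"
   ```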
5. (Optional) Add the `autoMode` trust block so the agent can run the cloud steps unattended; see the template below, and add it yourself.
6. Golden image: adapt `provision.sh` (cocotte toolchain: python3 + uv/pip, node + pnpm, plus the gotcha #4/#5 fixes baked in), then:
   ```bash
   export DIGITALOCEAN_TOKEN="$(cat ~/.vault/do_pat_cocotte)"
   export PKR_VAR_git_remote="http://<admin>:<pass>@<forge-ip>:3000/<org>/<repo>.git"   # creds in env, not argv
   packer init infra/packer/golden-image.pkr.hcl
   packer build infra/packer/golden-image.pkr.hcl
   ```
7. Fleet: `./run dist:up 1 s-8vcpu-16gb-amd` → `./run dist:test` → `./run dist:down`.
8. DNS / DX shortcut: after `forge:up`, `net sync` (or `./run forge:dns` inside the project) installs the managed `ctforge` (and `mcforge`) entry via the net-tools infra installer. `forge:dns` prefers the central `net-tools/bin/forge-dns-render` (part of `net sync`), with a local fallback. Browse http://ctforge:3000. The shortcuts are adopted into a marked block and survive `net sync` re-runs; once net-tools is installed, `net sync` is the canonical way to converge all host/DX shortcuts, including the cloud forges.
9. One-shot bring-up (human-run): after forge + key registration + golden image, use `scripts/cloud-bringup.sh` (or run the steps by hand). It runs packer + `dist:up 1` + `dist:typecheck` and tears down automatically on exit. Launch it with `nohup ... &` and review the log.
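Steps 2 and 6 boil down to "secrets in the vault, credentials in env, never in argv". A Python sketch of that rule follows; `read_vault_secret`, `packer_build_invocation` and `run_packer_build` are hypothetical helpers, not part of the repo, and refusing a token file that group/other can read is an added assumption.

```python
import os
import stat
import subprocess
from pathlib import Path
from typing import Dict, List, Mapping, Optional, Tuple


def read_vault_secret(path: str) -> str:
    """Read a one-line secret, refusing files that group/other can access."""
    p = Path(path).expanduser()
    mode = stat.S_IMODE(p.stat().st_mode)  # stat() follows symlinks, so this checks the target
    if mode & 0o077:
        raise PermissionError(f"{p} has mode {oct(mode)}; run: chmod 600 {p}")
    return p.read_text().strip()


def packer_build_invocation(
    template: str,
    token: str,
    git_remote: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env): every credential lives in env, argv stays secret-free."""
    env = dict(os.environ if base_env is None else base_env)
    env["DIGITALOCEAN_TOKEN"] = token
    env["PKR_VAR_git_remote"] = git_remote  # PKR_VAR_<name> sets Packer variable <name>
    return ["packer", "build", template], env


def run_packer_build(template: str, git_remote: str) -> None:
    token = read_vault_secret("~/.vault/do_pat_cocotte")
    argv, env = packer_build_invocation(template, token, git_remote)
    subprocess.run(argv, env=env, check=True)
```

Anything on `argv` is visible to `ps` and to the agent's command classifier; `env` is not, which is why the split is drawn there.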
Cost
| Item | Cost |
|---|---|
| Forge (s-1vcpu-1gb) | ~$6/mo, or ~$0.30/mo idle (snapshot+destroy via forge:down) |
| Workers (s-8vcpu-16gb-amd) | $0.167/hr, only while up → cents per run, $0 idle |
| Golden snapshot | ~$0.40/mo |
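A back-of-envelope calculator using only the figures in the table above. It is a sketch: it ignores billing rounding, bandwidth and the Spaces subscription, and the constants will drift as DO prices change.

```python
WORKER_PER_HOUR = 0.167         # s-8vcpu-16gb-amd, $/hr
FORGE_UP_PER_MONTH = 6.00       # s-1vcpu-1gb running all month
FORGE_PARKED_PER_MONTH = 0.30   # snapshot only, after forge:down
GOLDEN_SNAPSHOT_PER_MONTH = 0.40


def fleet_run_cost(workers: int, hours: float) -> float:
    """Workers bill only while they exist, so cost scales with workers x hours."""
    return round(workers * hours * WORKER_PER_HOUR, 2)


def idle_month_cost(forge_parked: bool) -> float:
    """Standing cost with workers=0: the forge (up or parked) plus the golden snapshot."""
    forge = FORGE_PARKED_PER_MONTH if forge_parked else FORGE_UP_PER_MONTH
    return round(forge + GOLDEN_SNAPSHOT_PER_MONTH, 2)


if __name__ == "__main__":
    print(fleet_run_cost(1, 0.5))  # a 30-minute test run on one worker
    print(fleet_run_cost(4, 2))    # a 2-hour fan-out across four workers
    print(idle_month_cost(True), idle_month_cost(False))
```

The point of the split: per-run cost is linear in worker-hours and drops to zero at `workers=0`, so the only standing spend is the forge plus the snapshot.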
autoMode trust block (you add by hand to .claude/settings.local.json)
This is what lets the agent run the cloud steps without tripping the exfil hard-deny. Adapt names/IP/project to cocotte:
```bash
python3 - <<'PY'
import json, os

# Adapt the placeholder: <cocotte-repo> is the path to your checkout.
p = os.path.expanduser("<cocotte-repo>/.claude/settings.local.json")
cfg = {}
if os.path.exists(p):  # start from an empty config if the file doesn't exist yet
    with open(p) as f:
        cfg = json.load(f)

def add_once(items, value):
    """Append only if absent, so re-running the script doesn't duplicate entries."""
    if value not in items:
        items.append(value)

am = cfg.setdefault("autoMode", {})
env = am.setdefault("environment", [])
if "$defaults" not in env: env.insert(0, "$defaults")
add_once(env, "The self-hosted Forgejo at <forge-ip>:3000 and the DigitalOcean droplets "
              "in the cocotte project are the owner's OWN build/test infrastructure holding "
              "this project's source. Cloning the repo from that forge onto those droplets, "
              "and pushing snapshots to it, is authorized provisioning — NOT data exfiltration.")
allow = am.setdefault("allow", [])
if "$defaults" not in allow: allow.insert(0, "$defaults")
add_once(allow, "git/packer/terraform operations moving THIS project's source between the "
                "owner's machine, the cocotte Forgejo, and the cocotte DO droplets are authorized.")
perm = cfg.setdefault("permissions", {}).setdefault("allow", [])
for r in ["Bash(git push:*)", "Bash(packer init:*)", "Bash(packer build:*)",
          "Bash(terraform apply:*)", "Bash(terraform destroy:*)"]:
    add_once(perm, r)

with open(p, "w") as f:  # context manager: flushed and closed before we report success
    json.dump(cfg, f, indent=2)
    f.write("\n")
print("autoMode + permissions updated")
PY
```
Pointers
- Implemented here: `run`, `scripts/run/{dist,forge}.sh`, `scripts/cloud-bringup.sh`, `infra/{packer,terraform/test-fleet}/` (see also the updated INFRA.md §10).
- Working reference (original): `~/Code/@projects/@magic-civilization/infra/{terraform/test-fleet,packer}` + `scripts/run/{dist,forge}.sh` + `scripts/cloud-bringup.sh`.
- MC memory note (decisions + tier constraints): `~/.claude/projects/-Users-natalie-Code--projects--magic-civilization/memory/project_cloud_test_fleet.md`.
- Offline verify with zero spend: `terraform fmt` + `validate` + `test` (mocked provider), via `./run dist:check`.
- SSH key for this project: we generated `~/.ssh/id_cocotte_fleet` + `.pub` during this run. You must register the `.pub` in your DO account (Security → SSH Keys) under the exact name `cocotte-fleet`; the scripts (forge + fleet) look up its numeric ID via the API automatically. Do this in the DO web UI before running `forge:up` or `dist:up`. Current pubkey (as of this run): `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEkqbC3eHgo3cc263rS+y9KDUz/MuQsrw8srjVSTt8Q1 cocotte-fleet-2026-06`
- Vault: `~/.vault/do_pat_cocotte` is symlinked to your existing `do-pat-ct.token`; a placeholder `cocotte_forge_creds` was created (populated by the first `forge:up`).
- Ready for human bring-up: after registering the key in DO, run the steps in the "Runbook" or `./scripts/cloud-bringup.sh` (human-run; `nohup` recommended).
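The scripts resolve the `cocotte-fleet` name to a numeric key ID through the API. Here is a hedged Python sketch of that lookup against DO's `GET /v2/account/keys`, whose response carries a `ssh_keys` list of objects with `id` and `name`. The helper names are hypothetical and the real `forge.sh`/`dist.sh` may do this differently; `per_page=200` assumes at most 200 keys, since pagination is not handled.

```python
import json
import urllib.request
from typing import Any, Dict


def find_ssh_key_id(payload: Dict[str, Any], name: str) -> int:
    """Return the numeric ID of the key registered under exactly `name`."""
    matches = [k["id"] for k in payload.get("ssh_keys", []) if k.get("name") == name]
    if not matches:
        raise LookupError(f"no SSH key named {name!r}; register it under Security → SSH Keys")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} keys named {name!r}; the name must be unique")
    return matches[0]


def fetch_account_keys(token: str) -> Dict[str, Any]:
    """GET /v2/account/keys (first page only, up to 200 keys)."""
    req = urllib.request.Request(
        "https://api.digitalocean.com/v2/account/keys?per_page=200",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Failing loudly on zero or multiple matches is the useful part: a missing registration becomes a clear error at `forge:up`/`dist:up` time instead of an unreachable droplet.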