auto-commit-service/CLAUDE.md
autocommit 2bfd968f1a docs(docs): 📝 Add branch synchronization requirements and failure modes documentation to CLAUDE.md
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-09 19:22:47 -07:00

9.4 KiB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

# Install
uv pip install -e .              # basic
uv pip install -e ".[dev]"       # with test deps

# Tests
pytest                            # unit/smoke tests (GPU tests excluded by default)
pytest tests/test_daemon.py -v    # single file
pytest tests/test_daemon.py::TestDaemon::test_start -v  # single test
pytest -m gpu -v                  # GPU integration tests (needs model-boss coordinator running)
pytest --cov=auto_commit_service  # with coverage

# Lint/type check
ruff check src/ tests/
ruff format src/ tests/
mypy src/

# Run service
python -m auto_commit_service     # direct entry
commits start 5m -R               # CLI: daemon with 5m cycle, recursive discovery
commits status --all              # check all running daemons

# Systemd
systemctl --user restart auto-commit-applications.service
journalctl --user -u auto-commit-applications -f

Architecture

Periodic sweep + synchronous queue-mediated inference. The daemon loops every cycle_interval_seconds (default 180s), processes repos sequentially, and each LLM call blocks on model-boss coordinator's queue. The commits-tray macOS app can also act as a remote commit agent: with --commit-local, it scans repos on the local Mac, forwards diffs to apricot's /generate-message endpoint for LLM inference, commits and pushes locally, and reports back via /record-commit — no second daemon needed on the Mac.

Execution flow

CommitDaemon.start() — main loop
  for each dirty repo:
    PipelineCommitProcessor.commit_repo()
      → Pipeline orchestrator (11 stages):
        PreFilter → Discover → Retrieve(RAG) → Group → Analyze(14B) → Format(3B)
        → Commit → Push → VersionDetect → PublishVerify → Recover
      → Each LLM stage calls MultiModelLlamaClient._chat()
        → InferenceClient.chat() → POST coordinator:8210/v1/chat/completions
        → model-boss coordinator queues and executes on GPU
  sleep(cycle_interval_seconds)

Two-model LLM pipeline

All inference routes through model-boss coordinator (port 8210). No direct model loading.

  • Reasoning (ministral-14b-reasoning): Analyzes diffs, groups files, understands changes
  • Instruct (ministral-3b-instruct): Formats commit messages from analysis
  • Recovery (claude:sonnet via model-boss): Two-phase recovery for git failures — Claude diagnoses, ACS executes the plan locally

Key modules

Module Role
scheduler/daemon.py Main loop, cycle orchestration, repo discovery
scheduler/pipeline_processor.py Per-repo processing, monorepo submodule handling
pipeline/orchestrator.py Creates the 11-stage pipeline chain
pipeline/stages/ Individual pipeline stages
pipeline/init.py Global ML provider initialization (must call before pipeline)
llm/multi_model_client.py Routes inference to model-boss via InferenceClient
recovery/handlers.py Error classification → recovery strategy routing
recovery/claude_fallback.py Two-phase Claude recovery (diagnose via model-boss, execute locally)
database/ Async SQLite (aiosqlite) for commit/cycle/error history
cli/ Typer CLI (commits command) for multi-daemon management
app.py FastAPI factory with 20+ monitoring/control endpoints
config.py All settings with AUTO_COMMIT_ env prefix

External dependencies

  • model-boss coordinator (port 8210): GPU model management, inference queue, VRAM scheduling
  • rag-retrieval (optional): Contextual retrieval for commit analysis
  • git: All operations via asyncio.create_subprocess_exec (no shell)

Multi-host sync (plum ↔ apricot)

The same ~68 repos are checked out on two machines. ACS keeps them in sync and up to date in both directions. Internalize this model before touching anything that pushes, pulls, or discovers repos.

Forgejo is the hub — there is no direct host-to-host git. Both hosts push to and pull from the same origin (forge.nasty.sh:2222 / forge.black.local). plum talks to apricot only over HTTP (port 8200) for LLM message generation and commit recording — never git-to-git.

Roles

Host What runs LLM How
apricot (Fedora, primary) ACS daemon, auto-commit-applications.service (port 8200) local (model-boss :8210) Full 11-stage pipeline per repo each cycle
plum (macOS MacBook) commits-tray --commit-local LaunchAgent none — forwards to apricot Scans local repos, gets messages from apricot's /generate-message, commits+pushes locally, reports via /record-commit

Stay-up-to-date (the pull half) — runs every cycle, even with nothing to commit

  • apricot: pre_cycle_sync() (git/operations.py) per repo — orphan-recover → fetch → if behind and cleangit pull --rebase. Dirty trees are never pulled or stashed (other agents may be mid-edit); the dirty changes commit this cycle, push, and the next cycle pulls clean. Gated by pre_cycle_pull=True (config.py, default on).
  • plum: _push_if_safe() (tray/local_agent.py) — git fetch --quiet first, then if clean-but-behind → git merge --ff-only. This path runs even when a repo has no local changes, so plum's clean checkouts self-heal toward origin.

Stay-in-sync (the push half)

  • apricot: pipeline COMMIT → PUSH; on rejection, git pull --rebase then retry (pipeline/stages/push.py).
  • plum: secret prefilter strips denylisted files (tray/prefilter.py) → stage allowed paths only (never blanket git add -A) → message from apricot → commit → _push_if_safe push. Repos with a non-empty staging index are skipped (don't clobber in-progress manual work).

Divergence (both ahead and behind)

Neither host force-anything. Both hand off to Claude Code recovery — apricot via recovery/claude_fallback.py, plum via _invoke_claude_recovery (with a stall cooldown so a stuck repo isn't retried every cycle). Recovery commands are allowlisted: no --force, --hard, --no-verify.

Branch/remote are config-driven — and the two hosts MUST track the same branch

git_remote / git_branch come from settings/per-directory overrides; each repo syncs whatever branch its checkout tracks. Don't assume master.

Invariant — the hub only reconciles same-branch. Forgejo never merges main into master. If apricot's checkout is on main and plum's is on master, each host commits to a different branch, both push/pull cleanly against the hub, and the two branches diverge permanently — no error, no recovery, just silent drift. The pull/push halves above keep two checkouts together only when they track the same origin/<branch>. Verify branch parity across hosts (git rev-parse --abbrev-ref HEAD on each) before trusting that a repo is in sync; "ahead 0 / behind 0 vs upstream" on each host is not sufficient when the upstreams differ.

Known failure mode — keep these distinct

  1. Data repos (the ~68 monitored checkouts): self-heal via the pull/push halves above. Drift only if pre_cycle_pull is off (apricot) or --commit-local is absent from plum's LaunchAgent — --commit-local is OFF by default for safety, so a plum tray launched without it neither commits nor fast-forwards, and its checkouts silently fall behind.
  2. The ACS tooling itself on plum (the commits-tray code plum executes): this is plum's local checkout of this repo, which can drift far behind apricot (the source of the past "~1000-commit-stale" tray). It does not self-heal like a data repo, because the stale code is what would do the healing. Mitigation pattern: have plum run apricot's current tray code over SSH rather than its own local copy. Don't conflate this with data-repo sync.

Configuration

All settings via env vars prefixed AUTO_COMMIT_ or ~/.config/commits/startup-config.json.

Key settings: REASONING_MODEL_ID, INSTRUCT_MODEL_ID, CYCLE_INTERVAL_SECONDS, CLAUDE_FALLBACK_ENABLED, CLAUDE_RECOVERY_MODEL.

Per-directory git identity and push behavior via directory_overrides in config.

Testing

  • asyncio_mode = "auto" — all async tests run automatically
  • GPU tests require model-boss coordinator running, marked @pytest.mark.gpu
  • Fixtures: temp_git_repo (creates temp git repo), mock_settings (unit), gpu_settings (integration)
  • ruff: line-length 100, select E,F,I,N,W,UP,B,C4,SIM,RUF,PTH,ERA
  • mypy: strict mode, Python 3.11+

Data paths

Path Contents
~/.cache/commits/auto_commit.db SQLite: commits, cycles, errors, repo status
~/.cache/commits/activity.jsonl Activity log (JSON Lines)
~/.cache/commits/auto-commit.log Rotated log file
~/.config/commits/startup-config.json Daemon registry config
~/.config/commits/daemons.json Running daemon instances

Important notes

  • Never commit from this repo — ACS itself handles all commits across the workspace
  • Pipeline stages access ML providers via globals initialized by init_ml_providers() — must be called before pipeline execution
  • ACS uses default_priority="batch" (lowest) in model-boss queue, so interactive requests preempt it
  • Recovery commands are validated against an allowlist (no --force, --hard, --no-verify)