auto-commit-service/docs/architecture.md

# Auto-Commit Service Architecture

## Overview

The auto-commit service monitors git repositories for uncommitted changes, generates commit messages for them using a local LLM (llama-service), and then commits and pushes the result.

## Monitoring Scope

### What Gets Monitored

The service monitors **git repositories**, not individual packages.

| Metric | Count | Notes |
|--------|-------|-------|
| Git repos in @packages | 58 | Excludes node_modules |
| Git repos in @applications | 10 | @audio, @image, @lilith, @ml |
| **Total monitored** | **68** | |

### Package vs Repo Distinction

```
@packages/                    # Workspace root
├── @nestjs/                  # 1 git repo
│   ├── .git/
│   ├── auth/                 # package: @lilith/nestjs-auth
│   ├── bootstrap/            # package: @lilith/nestjs-bootstrap
│   └── health/               # package: @lilith/nestjs-health
└── @eslint/
    ├── config-base/          # 1 git repo, 1 package
    │   └── .git/
    └── config-react/         # 1 git repo, 1 package
        └── .git/
```

- **114 npm packages** (`package.json` files)
- **26 Python packages** (`pyproject.toml` files)
- **59 git repos** (`.git` directories) - this is what gets monitored

Git commits happen at the repo level, so monitoring repos (not packages) is correct.

## Configured Base Paths

```python
repos_base_paths = [
    "/var/home/lilith/Code/@packages",
    "/var/home/lilith/Code/@applications/@audio",
    "/var/home/lilith/Code/@applications/@image",
    "/var/home/lilith/Code/@applications/@lilith",
    "/var/home/lilith/Code/@applications/@ml",
]
```

## Discovery Process

1. For each base path, recursively find `.git` directories
2. Filter out excluded patterns: `node_modules`, `.venv`, `dist`, `build`, `__pycache__`
3. Respect `recursive_depth` limit (default: 4)
4. Deduplicate repos found in multiple paths
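
The discovery steps above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the function name `discover_repos`, the choice to count the base path as depth 0, and the decision to keep descending after a repo is found are all assumptions.

```python
import os
from pathlib import Path

# Directory names that are never descended into (from step 2 above).
EXCLUDED_DIRS = {"node_modules", ".venv", "dist", "build", "__pycache__"}


def discover_repos(base_paths, recursive_depth=4, excluded=EXCLUDED_DIRS):
    """Return sorted, de-duplicated repo roots found under base_paths."""
    found = set()  # A set de-duplicates repos reachable from several base paths.
    for base in base_paths:
        base = Path(base).resolve()
        if not base.is_dir():
            continue
        for root, dirs, _files in os.walk(base):
            root_path = Path(root)
            # Assumption: the base path itself counts as depth 0.
            depth = len(root_path.relative_to(base).parts)
            if ".git" in dirs:
                found.add(root_path)
            if depth >= recursive_depth:
                dirs[:] = []  # Depth limit reached: do not descend further.
            else:
                # Prune in place so os.walk skips excluded names and .git internals.
                dirs[:] = [d for d in dirs if d not in excluded and d != ".git"]
    return sorted(found)
```

Pruning `dirs` in place keeps the walk from ever entering `node_modules` and similar trees, which matters more for speed than filtering the results afterwards.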

## Service Dependencies

```
┌─────────────────────┐
│  auto-commit-service│ Port 8200
│  (scheduler/daemon) │
└─────────┬───────────┘
          │ HTTP
          ▼
┌─────────────────────┐
│   llama-http        │ Port 10010
│   (LLM inference)   │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│ ministral-14b       │ reasoning model (analyze)
│ ministral-3b        │ instruct model (format)
└─────────────────────┘
```

The service uses a multi-model approach:
- **Reasoning model** (ministral-14b): Deep analysis of code changes
- **Instruct model** (ministral-3b): Fast commit message formatting
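
A rough sketch of how the two models might be chained is shown here. The `complete(model_id, prompt)` callable is a hypothetical stand-in for whatever HTTP call the service makes to the inference server, and the prompts are illustrative only; neither is taken from the actual implementation.

```python
from typing import Callable

# Hypothetical signature: complete(model_id, prompt) -> generated text.
Complete = Callable[[str, str], str]

REASONING_MODEL = "ministral-14b"  # analyze stage
INSTRUCT_MODEL = "ministral-3b"    # format stage


def generate_commit_message(diff: str, complete: Complete) -> str:
    """Analyze a diff with the reasoning model, then format with the instruct model."""
    analysis = complete(
        REASONING_MODEL,
        f"Explain what this change does and why:\n\n{diff}",
    )
    message = complete(
        INSTRUCT_MODEL,
        f"Write a one-line conventional commit message for:\n\n{analysis}",
    )
    return message.strip()
```

Passing `complete` in as a parameter keeps the sketch independent of the inference server's request format, which this document does not specify.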

## Cycle Flow

The service uses a **per-repo atomic workflow**:

```
┌─────────────────────────────────────────┐
│              CYCLE LOOP                 │
├─────────────────────────────────────────┤
│  repo-a: pipeline → push → done         │
│  repo-b: pipeline → push → done         │
│  repo-c: no changes → skip              │
│  repo-d: pipeline → push → done         │
│                 ↓                       │
│         All repos processed             │
│                 ↓                       │
│         Persist commit history          │
│                 ↓                       │
│           Sleep X seconds               │
│                 ↓                       │
│            Next cycle                   │
└─────────────────────────────────────────┘
```

### Pipeline Stages

For each repo with uncommitted changes, a 6-stage pipeline processes the working directory changes:

```
┌─────────────────────────────────────────────────────────────────────┐
│                         COMMIT PIPELINE                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. DETECT     Find changed files in working directory              │
│       ↓        (uncommitted changes, not yet git-staged)            │
│                                                                     │
│  2. GROUP      Cluster related files into logical commit batches    │
│       ↓        (LLM groups by feature/purpose)                      │
│                                                                     │
│  3. ANALYZE    LLM reads each batch's diff to understand changes    │
│       ↓        (what does this code change do?)                     │
│                                                                     │
│  4. FORMAT     Generate commit message from analysis                │
│       ↓        (conventional commit format with emoji)              │
│                                                                     │
│  5. COMMIT     git add + git commit for each batch                  │
│       ↓        (files are staged and committed here)                │
│                                                                     │
│  6. PUSH       Push commits to remote                               │
│                (with conflict resolution if needed)                 │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

**Terminology note**: "Analyzing commit 189/283" in logs means the LLM is analyzing the 189th batch of uncommitted changes. These are not yet git-staged or committed - that happens in stage 5.

### Per-Repo Processing

For each repo:
1. Check `git status --porcelain` for uncommitted working directory changes
2. Skip if no changes
3. Run pipeline: detect → group → analyze → format → commit → push
4. Move to next repo
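
Steps 1 and 2 might look something like the sketch below. The helper names `parse_porcelain` and `changed_files` are hypothetical. The parser assumes the default (v1) porcelain format and does not unquote paths that git wraps in double quotes.

```python
import subprocess


def parse_porcelain(output: str) -> list[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # Columns 0-1 hold the XY status code, column 2 is a space.
        if " -> " in path:  # Renames are reported as "old -> new"; keep the new path.
            path = path.split(" -> ", 1)[1]
        paths.append(path)
    return paths


def changed_files(repo_path: str) -> list[str]:
    """Run git in repo_path and return changed paths; an empty list means skip."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)
```

A repo whose `changed_files` result is empty would be skipped; otherwise the list feeds the detect stage of the pipeline.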

### Cycle Completion

When all repos have been processed:
- Log summary (committed, failed, unchanged)
- Persist commit history
- Sleep for `cycle_interval_seconds` (default: 60)
- Start next cycle

### Why Per-Repo Atomic?

- **Sloppy-atomic**: Each repo is a self-contained unit; its commit and push finish before the next repo starts
- **Progress visible**: Changes appear on the remote as each repo is processed
- **Fail-isolated**: A failure in one repo doesn't block the others
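
The fail-isolation property and the cycle summary can be illustrated with a short sketch. The names `run_cycle`, `has_changes`, and `process_repo` are hypothetical, and the real service may distinguish more outcomes than the three counted here.

```python
from collections import Counter
from typing import Callable, Iterable


def run_cycle(
    repos: Iterable[str],
    has_changes: Callable[[str], bool],
    process_repo: Callable[[str], None],
) -> Counter:
    """Process repos one at a time and return counts per outcome."""
    summary = Counter(committed=0, failed=0, unchanged=0)
    for repo in repos:
        try:
            if not has_changes(repo):
                summary["unchanged"] += 1
                continue
            process_repo(repo)  # Full pipeline for this repo, ending in push.
            summary["committed"] += 1
        except Exception:
            # Fail-isolated: record the failure and move on to the next repo.
            summary["failed"] += 1
    return summary
```

A daemon loop would call `run_cycle`, log the returned summary, persist history, and then sleep for `cycle_interval_seconds` before starting again.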

## Data Persistence

Commit history is persisted to survive daemon restarts:

| File | Location | Purpose |
|------|----------|---------|
| History | `~/.cache/commits/history.json` | Last 100 commits (hash, repo, timestamp) |
| Activity | `~/.cache/commits/activity.jsonl` | Detailed activity log |
| Database | `~/.cache/commits/auto_commit.db` | SQLite for structured queries |

**Important**: History is only persisted when a cycle completes. If the daemon is interrupted mid-cycle (stuck hook, crash, etc.), commits made during that cycle won't appear in history.
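
An end-of-cycle history write could be sketched as below. It assumes `history.json` holds a JSON list of commit objects, which this document does not confirm, and `persist_history` is a hypothetical name. The sketch writes to a temporary file and renames it, so an interruption during the write itself would leave the previous file intact.

```python
import json
import os
import tempfile
from pathlib import Path

MAX_ENTRIES = 100  # Matches the "last 100 commits" limit in the table above.


def persist_history(path: Path, new_commits: list[dict]) -> list[dict]:
    """Append commits to the history file, keeping only the newest MAX_ENTRIES."""
    history = json.loads(path.read_text()) if path.exists() else []
    history = (history + new_commits)[-MAX_ENTRIES:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(history, handle, indent=2)
    os.replace(tmp_name, path)
    return history
```

Calling a function like this after each repo rather than once per cycle would narrow the window in which commits can go unrecorded, at the cost of more frequent writes.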

## API Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| `/health` | GET | Service health check |
| `/status` | GET | Current daemon status, last cycle results |
| `/repos` | GET | List all monitored repositories |
| `/trigger` | POST | Manually trigger a commit cycle |
| `/enable` | POST | Enable the daemon |
| `/disable` | POST | Disable the daemon |
| `/report/commits` | GET | View commit history |
| `/report/summary` | GET | Comprehensive daemon report |
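
A small client for these endpoints might look like the following sketch. It assumes the service listens on `localhost` at port 8200 (the port shown in the dependency diagram) and returns JSON bodies; the response format is not specified here, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8200"  # Assumed host; port taken from the diagram.


def build_request(endpoint: str, method: str = "GET") -> urllib.request.Request:
    """Build a request for one of the endpoints in the table above."""
    data = b"" if method == "POST" else None  # POST endpoints get an empty body.
    return urllib.request.Request(f"{BASE_URL}{endpoint}", data=data, method=method)


def call(endpoint: str, method: str = "GET", timeout: float = 10.0):
    """Send the request and decode the body, assuming the service replies with JSON."""
    with urllib.request.urlopen(build_request(endpoint, method), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the daemon running, `call("/status")` would fetch the current status and `call("/trigger", "POST")` would request a manual cycle.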

## Configuration

Key settings in `AutoCommitSettings`:

| Setting | Default | Description |
|---------|---------|-------------|
| `cycle_interval_seconds` | 60 | Time between commit cycles |
| `llama_model_id` | qwen2.5-1.5b-instruct | Model for commit messages |
| `recursive_depth` | 4 | Max depth for repo discovery |
| `git_remote` | origin | Remote to push to |
| `git_branch` | master | Branch to push |
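
For illustration, the defaults above can be expressed as a plain dataclass. This is a stand-in rather than the real `AutoCommitSettings`, which may use a different base class and has more fields than the key ones listed; `push_command` is a hypothetical helper added only to show how the git settings combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoCommitSettings:
    """Illustrative stand-in holding only the defaults from the table above."""

    cycle_interval_seconds: int = 60
    llama_model_id: str = "qwen2.5-1.5b-instruct"
    recursive_depth: int = 4
    git_remote: str = "origin"
    git_branch: str = "master"

    def push_command(self) -> list[str]:
        """Hypothetical helper: the git push invocation these settings imply."""
        return ["git", "push", self.git_remote, self.git_branch]
```

For example, `AutoCommitSettings(git_branch="main")` would override the branch while leaving every other default unchanged.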

## Related Scripts

Existing scripts in `@packages/scripts/` provide similar functionality:

| Script | Purpose |
|--------|---------|
| `git/git-repo-status.sh` | Check status across all repos |
| `git/commit-all-dirty.sh` | Simple bulk commit (no LLM) |
| `git/git-push-all.sh` | Push all repos |

The auto-commit service is the "AI-powered" version that generates better commit messages via LLM, while the scripts provide simpler manual alternatives.