lilith/ml

Lilith f8de8b7b1d	2026-01-13 11:57:54 -08:00
fix(reason): 🐛 Fix meta-reasoning CoT leak ("I need to analyze", "Since no files")

CRITICAL: Third CoT leak discovered - meta-reasoning with first-person and causal explanations.

Real-world leak: "Since no files were changed and this is a small change set for a single commit, I need to analyze what kind of work was done."

Root causes:
1. First-person reasoning patterns not caught: "I need to", "I should", "I must"
2. Causal/meta reasoning not caught: "Since no files", "Since the", "Since this"
3. Fallback cleaning didn't re-check for remaining reasoning in cleaned text

Fixes:
- Added pattern: `r'i\s+(need|should|will|must)\s+to'` (first-person)
- Added pattern: `r'since\s+(no|the|this)'` (meta-reasoning/causal)
- Added pattern: `r'no\s+files\s+(were|are)'` (meta about changes)
- Enhanced fallback cleaning: verify cleaned text has no remaining reasoning

Pattern evolution across 3 commits:
1. Initial patterns: "let's", "step by step", numbered lists
2. Added: "let me" variant (space between let and me)
3. Added: first-person reasoning, meta-reasoning, double-check cleaned text

All 5 CoT regression tests passing:
- test_extract_commit_message_with_reasoning ✓
- test_extract_commit_message_regression_cot_leak ✓
- test_extract_commit_message_numbered_lists ✓
- test_extract_commit_message_let_me_think ✓
- test_extract_commit_message_meta_reasoning ✓

Version: 0.1.3 → 0.1.4
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
src/lilith_auto_commit_pipeline	fix(reason): 🐛 Fix meta-reasoning CoT leak ("I need to analyze", "Since no files")	2026-01-13 11:57:54 -08:00
tests	fix(reason): 🐛 Fix meta-reasoning CoT leak ("I need to analyze", "Since no files")	2026-01-13 11:57:54 -08:00
.gitattributes	feat(@ml/auto-commit-pipeline): ✨ Add pipeline-based auto-commit package	2026-01-13 09:05:39 -08:00
.gitignore	feat(@ml/auto-commit-pipeline): ✨ Add pipeline-based auto-commit package	2026-01-13 09:05:39 -08:00
LICENSE	feat(@ml/auto-commit-pipeline): ✨ Add pipeline-based auto-commit package	2026-01-13 09:05:39 -08:00
pyproject.toml	fix(reason): 🐛 Fix meta-reasoning CoT leak ("I need to analyze", "Since no files")	2026-01-13 11:57:54 -08:00
README.md	feat(@ml/auto-commit-pipeline): ✨ Add pipeline-based auto-commit package	2026-01-13 09:05:39 -08:00

README.md

Lilith Auto-Commit Pipeline

Pipeline-based auto-commit service with RAG (Retrieval-Augmented Generation) and CoT (Chain-of-Thought) capabilities.

Overview

This package provides a clean, maintainable pipeline architecture for automated git commits with intelligent message generation:

RAG Integration: Retrieves project conventions and codebase context for context-aware commit messages
CoT Reasoning: Uses chain-of-thought reasoning to generate high-quality, convention-following messages
Stage-Based Architecture: 7 independent, testable stages following @imajin pipeline methodology
SOLID Principles: Single responsibility, dependency inversion, open/closed design

Architecture

DiscoverChanges → RetrieveContext → GroupFiles → ReasonMessage → CreateCommit → Push → [Recover]
      ↓                 ↓              ↓             ↓               ↓           ↓
  (git status)     (RAG query)   (semantic)    (CoT over       (git commit) (git push)
                                                RAG results)

Pipeline Stages

DiscoverChangesStage: Detect changes via git status and git diff
RetrieveContextStage: RAG retrieval of conventions + codebase context
GroupFilesStage: Semantic file grouping using ML
ReasonCommitMessageStage: CoT reasoning for commit messages
CreateCommitStage: Create git commits
PushCommitStage: Push to remote with retry logic
RecoverErrorStage: Error recovery (optional)
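
The stage sequence above can be sketched as a chain of small objects that share one context. This is a minimal illustration under stated assumptions, not the package's implementation: the `Stage` protocol, the `Context` dataclass, `EchoStage`, and `run_pipeline` are hypothetical names, and the real stages are built on lilith-pipeline-framework, whose API is not shown in this README.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional, Protocol


@dataclass
class Context:
    """Hypothetical shared state handed from stage to stage."""
    log: list = field(default_factory=list)
    error: Optional[Exception] = None


class Stage(Protocol):
    """Hypothetical stage interface: one name, one async run method."""
    name: str

    async def run(self, ctx: Context) -> Context: ...


@dataclass
class EchoStage:
    """Stand-in stage that only records that it ran (or fails on demand)."""
    name: str
    fail: bool = False

    async def run(self, ctx: Context) -> Context:
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        ctx.log.append(self.name)
        return ctx


async def run_pipeline(stages, recover: Optional[Stage] = None) -> Context:
    """Run stages in order; on the first failure, hand off to the optional recovery stage."""
    ctx = Context()
    for stage in stages:
        try:
            ctx = await stage.run(ctx)
        except Exception as exc:
            ctx.error = exc
            if recover is not None:
                ctx = await recover.run(ctx)
            break
    return ctx


STAGE_NAMES = [
    "DiscoverChanges", "RetrieveContext", "GroupFiles",
    "ReasonMessage", "CreateCommit", "Push",
]
ctx = asyncio.run(
    run_pipeline([EchoStage(n) for n in STAGE_NAMES], recover=EchoStage("Recover"))
)
print(ctx.log)
```

Because each stage only touches the shared context, a single stage can be exercised in a test by constructing a `Context` by hand and calling `run` on it.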

Installation

cd /var/home/lilith/Code/@packages/@ml/auto-commit-pipeline-py
pip install -e .

# Or with dev dependencies
pip install -e ".[dev]"

Quick Start

Basic Usage

import asyncio

from lilith_auto_commit_pipeline import (
    create_auto_commit_orchestrator,
    AutoCommitRequest,
    AutoCommitPipelineContext,
)

# Assuming ML provider and RAG backends are configured
orchestrator = create_auto_commit_orchestrator(
    ml_provider=ml_provider,
    semantic_search=semantic_search,
    knowledge_graph=knowledge_graph,
)

# Create request
request = AutoCommitRequest(
    repo_path="/path/to/repo",
    repo_name="my-repo",
    enable_rag=True,
    enable_cot=True,
)


async def main() -> None:
    # Execute pipeline (await is only valid inside a coroutine)
    context = AutoCommitPipelineContext(request=request)
    result = await orchestrator.execute(context)

    # Check results
    print(f"Commits: {result.commit_hashes}")
    print(f"Push success: {result.push_success}")


asyncio.run(main())

With Integration

import asyncio

from lilith_agent_ml import LlamacppMLProvider
from lilith_agent_ml_knowledge import SemanticSearch, KnowledgeGraph
from lilith_auto_commit_pipeline import (
    create_auto_commit_orchestrator,
    AutoCommitRequest,
    AutoCommitPipelineContext,
)

# Initialize ML provider (Llamacpp with Ministral-14B)
ml_provider = LlamacppMLProvider(
    model_path="path/to/ministral-14b.gguf",
    context_size=4096,
)

# Initialize RAG backends
# (redis_client is an already-connected Redis client, created elsewhere)
semantic_search = SemanticSearch(redis_client=redis_client)
knowledge_graph = KnowledgeGraph(redis_client=redis_client)

# Create orchestrator
orchestrator = create_auto_commit_orchestrator(
    ml_provider=ml_provider,
    semantic_search=semantic_search,
    knowledge_graph=knowledge_graph,
)


async def main() -> None:
    # Execute (await is only valid inside a coroutine)
    context = AutoCommitPipelineContext(
        request=AutoCommitRequest(
            repo_path="/var/home/lilith/Code/@packages",
            repo_name="@packages",
        )
    )
    result = await orchestrator.execute(context)


asyncio.run(main())

RAG Integration

How RAG Works

The RetrieveContextStage retrieves two types of context:

Project Conventions (from semantic search):
- Searches for COMMIT_CONVENTIONS.md, CONTRIBUTING.md
- Uses semantic similarity to find relevant conventions
- Returns top 5 convention documents with relevance scores
Codebase Context (from knowledge graph):
- Queries knowledge graph for related files/components
- Provides understanding of code relationships
- Helps determine scope and affected modules
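
The convention lookup described above amounts to ranking documents by similarity to a query and keeping the top five. The sketch below shows that idea only. It is an assumption-laden stand-in: `embed`, `cosine`, and `top_k` are hypothetical helpers, the bag-of-words "embedding" is a toy, and the actual `SemanticSearch` backend and its API are not shown in this README.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real backend would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, docs: dict, k: int = 5) -> list:
    """Return up to k (name, score) pairs, highest score first."""
    q = embed(query)
    scored = [(name, cosine(q, embed(body))) for name, body in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical document bodies, for illustration only.
docs = {
    "COMMIT_CONVENTIONS.md": "commit message conventions type scope emoji description",
    "CONTRIBUTING.md": "how to contribute and commit message style",
    "LICENSE": "MIT license text",
}
for name, score in top_k("commit message conventions", docs):
    print(f"{name} (score: {score:.2f})")
```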

Example RAG Query

Query: "commit message conventions for @packages"
Results:
  1. COMMIT_CONVENTIONS.md (score: 0.95)
  2. packages/README.md - Commit section (score: 0.82)
  3. CONTRIBUTING.md (score: 0.78)

CoT Integration

How CoT Works

The ReasonCommitMessageStage uses extended thinking to reason about commit messages:

Analyze change type: feat, fix, chore, refactor, docs, test
Determine scope: Component/module affected
Follow conventions: Match project-specific style
Choose emoji: Select appropriate emoji
Write description: Concise but descriptive
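
Once the type, scope, and description are decided, assembling the message is mechanical. The following sketch assumes the `type(scope): emoji description` format used in this README. `format_commit_message` is a hypothetical helper, and the emoji table contains only the two pairings that appear on this page (✨ for feat, 🐛 for fix); a project's conventions would supply the rest.

```python
ALLOWED_TYPES = ("feat", "fix", "chore", "refactor", "docs", "test")

# Only the pairings visible on this page; other types are left unmapped (assumption).
EMOJI = {"feat": "✨", "fix": "🐛"}


def format_commit_message(change_type: str, scope: str, description: str) -> str:
    """Build 'type(scope): emoji Description' from already-decided parts."""
    if change_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown change type: {change_type!r}")
    subject = description.strip().rstrip(".")
    subject = subject[:1].upper() + subject[1:]
    parts = [f"{change_type}({scope}):"]
    emoji = EMOJI.get(change_type)
    if emoji:
        parts.append(emoji)
    parts.append(subject)
    return " ".join(parts)


print(format_commit_message("feat", "agent-ml-knowledge", "add vector similarity search"))
# prints: feat(agent-ml-knowledge): ✨ Add vector similarity search
```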

Example CoT Reasoning

Thinking:
1. Changed files are in @ml/agent-ml/knowledge/src/semantic/
2. This is adding new functionality (vector search)
3. Project conventions use format: type(scope): emoji description
4. Scope is "agent-ml-knowledge"
5. This is a feat, use ✨ emoji

Final Message:
feat(agent-ml-knowledge): ✨ Add vector similarity search
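
Only the final message should reach `git commit`; the thinking must be stripped first. The sketch below illustrates one way to do that, using patterns adapted from the latest commit message on this page (word boundaries added). It is not the package's code: `extract_commit_message` and `looks_like_reasoning` here are hypothetical, and the `Final Message:` marker is assumed from the example above.

```python
import re
from typing import Optional

# Adapted from the patterns listed in the commit message, with word boundaries added.
REASONING_PATTERNS = [
    r"\blet'?s\b",
    r"\blet\s+me\b",
    r"\bstep\s+by\s+step\b",
    r"^\s*\d+\.\s",  # numbered reasoning steps
    r"\bi\s+(need|should|will|must)\s+to\b",
    r"\bsince\s+(no|the|this)\b",
    r"\bno\s+files\s+(were|are)\b",
]
_REASONING = re.compile(
    "|".join(f"(?:{p})" for p in REASONING_PATTERNS),
    re.IGNORECASE | re.MULTILINE,
)


def looks_like_reasoning(text: str) -> bool:
    return bool(_REASONING.search(text))


def extract_commit_message(raw: str) -> Optional[str]:
    """Return the commit message, or None if only reasoning text remains."""
    _, found, tail = raw.partition("Final Message:")
    candidate = tail if found else raw
    kept = [
        line.strip()
        for line in candidate.splitlines()
        if line.strip() and not looks_like_reasoning(line)
    ]
    cleaned = "\n".join(kept)
    # Re-check the cleaned text so nothing slips through across line joins.
    if not cleaned or looks_like_reasoning(cleaned):
        return None
    return cleaned


raw = (
    "Thinking:\n"
    "1. Changed files are in @ml/agent-ml/knowledge/src/semantic/\n"
    "2. This is adding new functionality (vector search)\n"
    "\n"
    "Final Message:\n"
    "feat(agent-ml-knowledge): ✨ Add vector similarity search"
)
print(extract_commit_message(raw))
```

Returning `None` instead of a partly cleaned string lets the caller fall back to a safe default message rather than committing leaked reasoning.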

Configuration

AutoCommitRequest Options

AutoCommitRequest(
    repo_path="/path/to/repo",      # Required
    repo_name="repo-name",          # Required
    branch=None,                    # Auto-detected if None
    remote="origin",                # Git remote name
    enable_rag=True,                # Enable RAG context retrieval
    enable_cot=True,                # Enable CoT reasoning
    enable_push=True,               # Enable pushing to remote
    enable_recovery=True,           # Enable error recovery
)
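
To make the effect of the flags concrete, the sketch below models the options as a plain dataclass and derives a stage plan from them. Everything here is illustrative: `RequestSketch` and `planned_stages` are hypothetical, the real `AutoCommitRequest` is presumably a pydantic model, and whether a disabled stage is skipped or runs as a no-op is an assumption, not something this README states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestSketch:
    """Stand-in mirroring the options and values listed above."""
    repo_path: str
    repo_name: str
    branch: Optional[str] = None
    remote: str = "origin"
    enable_rag: bool = True
    enable_cot: bool = True
    enable_push: bool = True
    enable_recovery: bool = True


def planned_stages(req: RequestSketch) -> list:
    """Assumed gating: a disabled feature drops its stage from the plan."""
    stages = ["DiscoverChangesStage"]
    if req.enable_rag:
        stages.append("RetrieveContextStage")
    stages.append("GroupFilesStage")
    # enable_cot is assumed to change how this stage reasons, not whether it runs.
    stages.append("ReasonCommitMessageStage")
    stages.append("CreateCommitStage")
    if req.enable_push:
        stages.append("PushCommitStage")
    if req.enable_recovery:
        stages.append("RecoverErrorStage")
    return stages


local_only = RequestSketch(repo_path="/path/to/repo", repo_name="repo-name", enable_push=False)
print(planned_stages(local_only))
```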

Integration with Existing Auto-Commit Service

Migration Path

Phase 1: Use new pipeline in parallel

# In auto-commit-service/processor.py
from lilith_auto_commit_pipeline import create_auto_commit_orchestrator

orchestrator = create_auto_commit_orchestrator(...)
result = await orchestrator.execute(context)

Phase 2: Replace old processor logic
Phase 3: Remove old implementation

Development

Run Tests

pytest tests/
pytest --cov=lilith_auto_commit_pipeline --cov-report=term-missing

Type Checking

mypy --strict src/

Linting

ruff check src/

Benefits

Code Quality

✅ Single Responsibility: Each stage has one job
✅ Open/Closed: Add new stages without modifying existing
✅ Dependency Inversion: Stages depend on abstractions
✅ Testability: Each stage independently testable

Features

✅ Better commit messages via RAG (conventions + codebase context)
✅ Intelligent file grouping via ML-based semantic analysis, with CoT reasoning for the commit messages
✅ Clean error handling via optional recovery stage
✅ Maintainable: Pipeline flow is explicit and traceable

Operations

✅ Drop-in replacement for existing service
✅ Gradual migration path
✅ Feature flag support
✅ Comprehensive logging and observability

Dependencies

Required

lilith-pipeline-framework - Pipeline orchestration
pydantic - Data models and validation
redis[hiredis] - RAG knowledge base

Optional

pytest - Testing
mypy - Type checking
ruff - Linting

Used By

@applications/@ml/auto-commit-service - Daemon service wrapper

License

MIT License

Contributing

This is an internal Lilith package. For issues or contributions, contact the ML team.