CRITICAL: Third CoT leak discovered - meta-reasoning with first-person and causal explanations.

Real-world leak: "Since no files were changed and this is a small change set for a single commit, I need to analyze what kind of work was done."

Root causes:
1. First-person reasoning patterns not caught: "I need to", "I should", "I must"
2. Causal/meta reasoning not caught: "Since no files", "Since the", "Since this"
3. Fallback cleaning didn't re-check for remaining reasoning in cleaned text

Fixes:
- Added pattern: `r'i\s+(need|should|will|must)\s+to'` (first-person)
- Added pattern: `r'since\s+(no|the|this)'` (meta-reasoning/causal)
- Added pattern: `r'no\s+files\s+(were|are)'` (meta about changes)
- Enhanced fallback cleaning: verify cleaned text has no remaining reasoning

Pattern evolution across 3 commits:
1. Initial patterns: "let's", "step by step", numbered lists
2. Added: "let me" variant (space between let and me)
3. Added: first-person reasoning, meta-reasoning, double-check cleaned text

All 5 CoT regression tests passing:
- test_extract_commit_message_with_reasoning ✓
- test_extract_commit_message_regression_cot_leak ✓
- test_extract_commit_message_numbered_lists ✓
- test_extract_commit_message_let_me_think ✓
- test_extract_commit_message_meta_reasoning ✓

Version: 0.1.3 → 0.1.4

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Lilith Auto-Commit Pipeline
Pipeline-based auto-commit service with RAG (Retrieval-Augmented Generation) and CoT (Chain-of-Thought) capabilities.
Overview
This package provides a clean, maintainable pipeline architecture for automated git commits with intelligent message generation:
- RAG Integration: Retrieves project conventions and codebase context for context-aware commit messages
- CoT Reasoning: Uses chain-of-thought reasoning to generate high-quality, convention-following messages
- Stage-Based Architecture: 7 independent, testable stages following @imajin pipeline methodology
- SOLID Principles: Single responsibility, dependency inversion, open/closed design
Architecture
DiscoverChanges → RetrieveContext → GroupFiles → ReasonMessage → CreateCommit → Push → [Recover]
       ↓                 ↓               ↓             ↓               ↓          ↓
 (git status)       (RAG query)     (semantic)     (CoT over     (git commit) (git push)
                                                    RAG results)
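The flow above can be read as an ordered list of stages that each receive a shared context and hand it to the next one. The following is a minimal sketch of that idea only. The names PipelineContext, Stage, and run_pipeline are hypothetical stand-ins and are not the API of lilith-pipeline-framework, and the two stages use placeholder logic.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class PipelineContext:
    """Hypothetical shared state passed from stage to stage."""
    changed_files: list[str] = field(default_factory=list)
    groups: list[list[str]] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)


class Stage(Protocol):
    """Hypothetical stage interface: one name, one async run method."""
    name: str

    async def run(self, ctx: PipelineContext) -> PipelineContext: ...


class DiscoverChanges:
    name = "DiscoverChanges"

    async def run(self, ctx: PipelineContext) -> PipelineContext:
        # Placeholder data; a real stage would inspect the repository.
        ctx.changed_files = ["src/a.py", "README.md"]
        return ctx


class GroupFiles:
    name = "GroupFiles"

    async def run(self, ctx: PipelineContext) -> PipelineContext:
        # Placeholder grouping: one group per file instead of semantic grouping.
        ctx.groups = [[path] for path in ctx.changed_files]
        return ctx


async def run_pipeline(stages: list[Stage], ctx: PipelineContext) -> PipelineContext:
    """Run stages in order and record which ones completed."""
    for stage in stages:
        ctx = await stage.run(ctx)
        ctx.trace.append(stage.name)
    return ctx


result = asyncio.run(run_pipeline([DiscoverChanges(), GroupFiles()], PipelineContext()))
print(result.trace, result.groups)
```

Because each stage only touches the context, a new stage can be added to the list without editing the existing ones.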
Pipeline Stages
- DiscoverChangesStage: Detect changes via git status and git diff
- RetrieveContextStage: RAG retrieval of conventions + codebase context
- GroupFilesStage: Semantic file grouping using ML
- ReasonCommitMessageStage: CoT reasoning for commit messages
- CreateCommitStage: Create git commits
- PushCommitStage: Push to remote with retry logic
- RecoverErrorStage: Error recovery (optional)
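As an illustration of the first stage, change discovery can be built on the machine-readable output of git status --porcelain. This is a sketch under assumptions: the Change type and both function names are hypothetical, only the v1 porcelain format is handled, and quoted paths (names with special characters) are not unquoted.

```python
import subprocess
from typing import NamedTuple


class Change(NamedTuple):
    status: str  # two-letter XY code from git, with padding spaces stripped
    path: str


def parse_porcelain(output: str) -> list[Change]:
    """Parse `git status --porcelain` (v1) lines of the form 'XY path'."""
    changes = []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # too short to hold 'XY' + space + path
        status, path = line[:2], line[3:]
        # Renames and copies are reported as 'old -> new'; keep the new path.
        if status[0] in "RC" and " -> " in path:
            path = path.split(" -> ", 1)[1]
        changes.append(Change(status.strip(), path))
    return changes


def discover_changes(repo_path: str) -> list[Change]:
    """Run git in repo_path and return the parsed working-tree changes."""
    completed = subprocess.run(
        ["git", "-C", repo_path, "status", "--porcelain"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_porcelain(completed.stdout)


sample = " M src/a.py\n?? new.txt\nR  old.py -> new.py\n"
for change in parse_porcelain(sample):
    print(change.status, change.path)
```

Keeping the parser separate from the subprocess call means it can be tested with sample strings, without a repository.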
Installation
cd /var/home/lilith/Code/@packages/@ml/auto-commit-pipeline-py
pip install -e .
# Or with dev dependencies
pip install -e ".[dev]"
Quick Start
Basic Usage
from lilith_auto_commit_pipeline import (
create_auto_commit_orchestrator,
AutoCommitRequest,
AutoCommitPipelineContext,
)
# Assuming ML provider and RAG backends are configured
orchestrator = create_auto_commit_orchestrator(
ml_provider=ml_provider,
semantic_search=semantic_search,
knowledge_graph=knowledge_graph,
)
# Create request
request = AutoCommitRequest(
repo_path="/path/to/repo",
repo_name="my-repo",
enable_rag=True,
enable_cot=True,
)
# Execute pipeline (await must run inside an async function, e.g. one started with asyncio.run)
context = AutoCommitPipelineContext(request=request)
result = await orchestrator.execute(context)
# Check results
print(f"Commits: {result.commit_hashes}")
print(f"Push success: {result.push_success}")
With Integration
from lilith_agent_ml import LlamacppMLProvider
from lilith_agent_ml_knowledge import SemanticSearch, KnowledgeGraph
from lilith_auto_commit_pipeline import (
    create_auto_commit_orchestrator,
    AutoCommitRequest,
    AutoCommitPipelineContext,
)
# Initialize ML provider (Llamacpp with Ministral-14B)
ml_provider = LlamacppMLProvider(
model_path="path/to/ministral-14b.gguf",
context_size=4096,
)
# Initialize RAG backends (redis_client is assumed to be an already-configured Redis client)
semantic_search = SemanticSearch(redis_client=redis_client)
knowledge_graph = KnowledgeGraph(redis_client=redis_client)
# Create orchestrator
orchestrator = create_auto_commit_orchestrator(
ml_provider=ml_provider,
semantic_search=semantic_search,
knowledge_graph=knowledge_graph,
)
# Execute (inside an async function)
context = AutoCommitPipelineContext(
request=AutoCommitRequest(
repo_path="/var/home/lilith/Code/@packages",
repo_name="@packages",
)
)
result = await orchestrator.execute(context)
RAG Integration
How RAG Works
The RetrieveContextStage retrieves two types of context:
- Project Conventions (from semantic search):
  - Searches for COMMIT_CONVENTIONS.md, CONTRIBUTING.md
  - Uses semantic similarity to find relevant conventions
  - Returns top 5 convention documents with relevance scores
- Codebase Context (from knowledge graph):
  - Queries knowledge graph for related files/components
  - Provides understanding of code relationships
  - Helps determine scope and affected modules
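The "top 5 by semantic similarity" step can be pictured as ranking document embeddings by cosine similarity to a query embedding. The sketch below is an in-memory stand-in with toy two-dimensional vectors. It does not reflect how SemanticSearch is implemented on top of Redis, and cosine and top_k are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings, invented for illustration only.
toy_docs = {
    "COMMIT_CONVENTIONS.md": [1.0, 0.0],
    "CONTRIBUTING.md": [0.6, 0.8],
    "LICENSE": [0.0, 1.0],
}
ranked = top_k([1.0, 0.0], toy_docs, k=2)
print([name for name, _ in ranked])
```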
Example RAG Query
Query: "commit message conventions for @packages"
Results:
1. COMMIT_CONVENTIONS.md (score: 0.95)
2. packages/README.md - Commit section (score: 0.82)
3. CONTRIBUTING.md (score: 0.78)
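One way such scored results could be turned into prompt context is to drop low-scoring hits and concatenate the rest under a size budget. This is a sketch, not the package's behavior: RetrievedDoc, build_context_block, the 0.8 threshold, and the character budget are all assumptions, and the document texts are placeholders.

```python
from dataclasses import dataclass


@dataclass
class RetrievedDoc:
    source: str
    score: float
    text: str


def build_context_block(docs: list[RetrievedDoc], min_score: float = 0.8,
                        max_chars: int = 2000) -> str:
    """Keep docs scoring at least min_score, best first, within max_chars."""
    kept = sorted((d for d in docs if d.score >= min_score),
                  key=lambda d: d.score, reverse=True)
    parts: list[str] = []
    used = 0
    for doc in kept:
        snippet = f"[{doc.source} | score {doc.score:.2f}]\n{doc.text.strip()}"
        if used + len(snippet) > max_chars:
            break  # stop once the budget would be exceeded
        parts.append(snippet)
        used += len(snippet)
    return "\n\n".join(parts)


# Scores mirror the example results above; the texts are placeholders.
example_docs = [
    RetrievedDoc("COMMIT_CONVENTIONS.md", 0.95, "Use type(scope): emoji description"),
    RetrievedDoc("packages/README.md - Commit section", 0.82, "(placeholder text)"),
    RetrievedDoc("CONTRIBUTING.md", 0.78, "(placeholder text)"),
]
block = build_context_block(example_docs)
print(block)
```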
CoT Integration
How CoT Works
The ReasonCommitMessageStage uses extended thinking to reason about commit messages:
- Analyze change type: feat, fix, chore, refactor, docs, test
- Determine scope: Component/module affected
- Follow conventions: Match project-specific style
- Choose emoji: Select appropriate emoji
- Write description: Concise but descriptive
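The five steps above could be handed to the model as an explicit, numbered prompt. The sketch below shows one possible prompt layout. The function name, the wording of the instructions, and the "Final Message:" marker convention are assumptions for illustration, not the prompt the stage actually uses.

```python
COT_STEPS = [
    "Analyze the change type: feat, fix, chore, refactor, docs, or test.",
    "Determine the scope: the component or module affected.",
    "Follow the project conventions given below.",
    "Choose an appropriate emoji.",
    "Write a concise but descriptive description.",
]


def build_cot_prompt(changed_files: list[str], conventions: str) -> str:
    """Assemble a chain-of-thought prompt from files, conventions, and steps."""
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(COT_STEPS, start=1))
    files = "\n".join(f"- {path}" for path in changed_files)
    return (
        "You are writing a git commit message.\n\n"
        f"Project conventions:\n{conventions}\n\n"
        f"Changed files:\n{files}\n\n"
        f"Think through these steps in order:\n{steps}\n\n"
        "Then write 'Final Message:' followed by the commit message on the next line."
    )


prompt = build_cot_prompt(
    ["@ml/agent-ml/knowledge/src/semantic/search.py"],  # hypothetical file name
    "type(scope): emoji description",
)
print(prompt)
```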
Example CoT Reasoning
Thinking:
1. Changed files are in @ml/agent-ml/knowledge/src/semantic/
2. This is adding new functionality (vector search)
3. Project conventions use format: type(scope): emoji description
4. Scope is "agent-ml-knowledge"
5. This is a feat, use ✨ emoji
Final Message:
feat(agent-ml-knowledge): ✨ Add vector similarity search
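Since the model output contains both the reasoning and the message, only the final line should reach git. The sketch below separates the two using the "Final Message:" marker shown above, then re-checks the candidate against the three regexes quoted in the commit note at the top of this page. The function names are hypothetical, case-insensitive matching is an assumption, and the package's real extraction logic may differ.

```python
import re
from typing import Optional

# Patterns quoted from the commit note; IGNORECASE is an assumption.
# As written, the first pattern requires a trailing "to", so it matches
# "I need to" but not a bare "I should".
REASONING_PATTERNS = [
    re.compile(r"i\s+(need|should|will|must)\s+to", re.IGNORECASE),
    re.compile(r"since\s+(no|the|this)", re.IGNORECASE),
    re.compile(r"no\s+files\s+(were|are)", re.IGNORECASE),
]


def looks_like_reasoning(text: str) -> bool:
    """True if any known reasoning pattern occurs in the text."""
    return any(pattern.search(text) for pattern in REASONING_PATTERNS)


def extract_final_message(raw: str, marker: str = "Final Message:") -> Optional[str]:
    """Return the first non-empty line after the marker, or None if unusable."""
    _, found, tail = raw.partition(marker)
    if not found:
        return None  # no marker: refuse rather than guess
    for line in tail.splitlines():
        candidate = line.strip()
        if candidate:
            # Re-check the cleaned text so leaked reasoning is rejected.
            return None if looks_like_reasoning(candidate) else candidate
    return None


raw_output = (
    "Thinking:\n"
    "1. Changed files are in @ml/agent-ml/knowledge/src/semantic/\n"
    "Final Message:\n"
    "feat(agent-ml-knowledge): ✨ Add vector similarity search\n"
)
print(extract_final_message(raw_output))
```

Returning None instead of a best guess lets the caller fall back to a default message rather than commit leaked reasoning.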
Configuration
AutoCommitRequest Options
AutoCommitRequest(
repo_path="/path/to/repo", # Required
repo_name="repo-name", # Required
branch=None, # Auto-detected if None
remote="origin", # Git remote name
enable_rag=True, # Enable RAG context retrieval
enable_cot=True, # Enable CoT reasoning
enable_push=True, # Enable pushing to remote
enable_recovery=True, # Enable error recovery
)
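To make the effect of the flags concrete, the sketch below mirrors the options above in a plain dataclass and derives which stages would run. It is illustrative only: RequestSketch is not the real AutoCommitRequest, and how the orchestrator actually treats a disabled flag is not documented here. In particular, the sketch assumes enable_cot changes how ReasonMessage works rather than removing the stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSketch:
    """Stand-in with the same option names and defaults as listed above."""
    repo_path: str
    repo_name: str
    branch: Optional[str] = None
    remote: str = "origin"
    enable_rag: bool = True
    enable_cot: bool = True
    enable_push: bool = True
    enable_recovery: bool = True


def planned_stages(request: RequestSketch) -> list[str]:
    """List stage names in pipeline order, skipping stages gated off by flags."""
    stages = ["DiscoverChanges"]
    if request.enable_rag:
        stages.append("RetrieveContext")
    stages.append("GroupFiles")
    # Assumption: a message is always needed, so this stage is never skipped.
    stages.append("ReasonMessage")
    stages.append("CreateCommit")
    if request.enable_push:
        stages.append("Push")
    if request.enable_recovery:
        stages.append("Recover")
    return stages


local_only = RequestSketch("/path/to/repo", "my-repo", enable_push=False, enable_recovery=False)
print(planned_stages(local_only))
```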
Integration with Existing Auto-Commit Service
Migration Path
- Phase 1: Use new pipeline in parallel
    # In auto-commit-service/processor.py
    from lilith_auto_commit_pipeline import create_auto_commit_orchestrator
    orchestrator = create_auto_commit_orchestrator(...)
    result = await orchestrator.execute(context)
- Phase 2: Replace old processor logic
- Phase 3: Remove old implementation
Development
Run Tests
pytest tests/
pytest --cov=lilith_auto_commit_pipeline --cov-report=term-missing
Type Checking
mypy --strict src/
Linting
ruff check src/
Benefits
Code Quality
- ✅ Single Responsibility: Each stage has one job
- ✅ Open/Closed: Add new stages without modifying existing ones
- ✅ Dependency Inversion: Stages depend on abstractions
- ✅ Testability: Each stage independently testable
Features
- ✅ Better commit messages via RAG (conventions + codebase context)
- ✅ Intelligent file grouping via semantic ML analysis, with CoT reasoning for commit messages
- ✅ Clean error handling via optional recovery stage
- ✅ Maintainable: Pipeline flow is explicit and traceable
Operations
- ✅ Drop-in replacement for existing service
- ✅ Gradual migration path
- ✅ Feature flag support
- ✅ Comprehensive logging and observability
Dependencies
Required
- lilith-pipeline-framework - Pipeline orchestration
- pydantic - Data models and validation
- redis[hiredis] - RAG knowledge base
Optional
- pytest - Testing
- mypy - Type checking
- ruff - Linting
Used By
- @applications/@ml/auto-commit-service - Daemon service wrapper
License
MIT License
Contributing
This is an internal Lilith package. For issues or contributions, contact the ML team.