platform-codebase/features/conversation-assistant/LOGGING.md
Quinn Ftw f6abcaf662 fix(dating-autopilot): replace vm2 with acorn for syntax validation
The E2E tests were using vm2 to execute generated code, which caused
unhandled rejections because browser APIs (setTimeout, etc.) weren't
mocked. This was incorrectly ignored.

Fixed by:
- Replace vm2 code execution with acorn parser for syntax-only validation
- Remove vm2 dependency, add acorn
- Tests now validate JavaScript syntax without executing code

All 139 tests pass with zero errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-28 18:35:36 -08:00

7.5 KiB

Structured Logging Implementation

This document describes the structured logging implementation across all conversation-assistant services.

Overview

We implement comprehensive structured logging with:

  • JSON format for production (machine-readable, log aggregation friendly)
  • Human-readable format for development (colored console output)
  • Request correlation via request IDs
  • Contextual information (device IDs, user IDs, model versions)

Services

1. NestJS Server (server/)

Implementation:

  • Custom Logger service extending NestJS logger interface
  • AsyncLocalStorage for request context propagation
  • Logging interceptor for automatic request/response logging

Key Files:

  • src/common/logger.service.ts - Core logger implementation
  • src/common/logging.interceptor.ts - HTTP request logging

Configuration:

// Production (JSON output)
NODE_ENV=production

// Development (pretty console)
NODE_ENV=development

Usage Example:

import { createLogger } from '../../common';

class MyService {
  private readonly logger = createLogger(MyService.name);

  async myMethod() {
    this.logger.logWithData('info', 'Processing request', {
      userId: '123',
      operation: 'create',
    });
  }
}

Log Structure:

{
  "timestamp": "2025-12-28T10:30:45.123Z",
  "level": "info",
  "message": "Device registration successful",
  "context": "DevicesService",
  "requestId": "a1b2c3d4",
  "deviceId": "device-123",
  "platform": "macos"
}

2. ML Service (ml-service/)

Implementation:

  • structlog for structured logging
  • Custom logging configuration with JSON/text formats
  • Request ID middleware for correlation
  • Context variables for request tracking

Key Files:

  • src/logging_config.py - Logging configuration
  • src/main.py - Request logging middleware

Configuration:

# Environment variables
ML_SERVICE_LOG_LEVEL=INFO     # DEBUG, INFO, WARNING, ERROR, CRITICAL
ML_SERVICE_LOG_FORMAT=json    # json or text

Usage Example:

from .logging_config import get_logger

logger = get_logger(__name__)

logger.info("Model loaded successfully",
           model_id="ministral-3b",
           gpu_layers=-1,
           context_size=4096)

Log Structure:

{
  "timestamp": "2025-12-28T10:30:45.123456",
  "level": "info",
  "event": "Generation completed",
  "logger": "src.main",
  "request_id": "a1b2c3d4-e5f6-7890",
  "tokens_used": 128,
  "confidence": 0.85,
  "duration": "1.23s"
}

3. Frontend (frontend/)

Implementation:

  • Enhanced ErrorBoundary with structured error logging
  • Console logging with structured context
  • Hook for external error reporting services

Key Files:

  • src/components/ErrorBoundary.tsx - Error boundary with structured logging

Log Structure:

{
  "timestamp": "2025-12-28T10:30:45.123Z",
  "level": "error",
  "message": "React component error",
  "error": {
    "name": "TypeError",
    "message": "Cannot read property 'x' of undefined",
    "stack": "..."
  },
  "componentStack": "...",
  "userAgent": "Mozilla/5.0...",
  "url": "http://localhost:5173/devices"
}

Logged Events

NestJS Server

DevicesService:

  • Device registration (new/existing)
  • Verification attempts (success/failure)
  • Rate limiting triggers
  • Device lockouts

ResponsesService:

  • Generation requests
  • ML service calls (with timing)
  • Cache hits/misses
  • Generation failures

TrainingService:

  • Training job creation
  • Sample selection
  • ML service submission
  • Progress polling
  • Job completion/failure

ML Service

Lifecycle:

  • Service startup/shutdown
  • Model loading/unloading
  • Redis connection status

Generation:

  • Request receipt
  • Cache hits/misses
  • LLM generation (with timing)
  • Response caching
  • Errors

Training:

  • Job processing
  • Data preparation
  • Progress updates
  • Completion/failure

Request Correlation

All logs within a request context include the same request_id:

NestJS:

// Automatically set by LoggingInterceptor
// Propagated via AsyncLocalStorage
requestId: "a1b2c3d4-e5f6-7890"

ML Service:

# Set by LoggingRoute middleware
# Propagated via structlog contextvars
request_id: "a1b2c3d4-e5f6-7890"

Log Aggregation

For production deployments, structured JSON logs can be aggregated using:

  • ELK Stack: Elasticsearch, Logstash, Kibana
  • Grafana Loki: Log aggregation for Grafana
  • CloudWatch Logs: AWS native solution
  • DataDog: Commercial APM solution

Example Logstash configuration:

input {
  file {
    path => "/var/log/conversation-assistant/*.log"
    codec => json
  }
}

filter {
  if [level] == "error" {
    mutate {
      add_tag => [ "error" ]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "conversation-assistant-%{+YYYY.MM.dd}"
  }
}

Querying Logs

NestJS (Development Console)

# Follow logs in development
npm run start:dev | grep "DevicesService"

# Filter by level
npm run start:dev | grep "ERROR"

ML Service (Development)

# Text format (development)
ML_SERVICE_LOG_FORMAT=text python -m src.main

# JSON format (production)
ML_SERVICE_LOG_FORMAT=json python -m src.main | jq .

Production (JSON)

# Filter by request ID
cat app.log | jq 'select(.request_id == "abc123")'

# Filter by error level
cat app.log | jq 'select(.level == "error")'

# Extract specific fields
cat app.log | jq '{timestamp, message, error}'

# Group by context
cat app.log | jq 'select(.context == "ResponsesService")'

Performance Considerations

  • Development: Colored console output has minimal overhead
  • Production: JSON serialization adds ~1-2ms per log entry
  • Request context: AsyncLocalStorage/contextvars have negligible overhead
  • Log volume: Adjust log levels in production (INFO/WARNING instead of DEBUG)

Best Practices

  1. Use appropriate log levels:

    • DEBUG: Detailed diagnostic information
    • INFO: General informational messages
    • WARNING: Warning messages (degraded but working)
    • ERROR: Error messages (operation failed)
  2. Include context:

    logger.logWithData('info', 'Operation completed', {
      userId: user.id,
      operation: 'create',
      duration: elapsedMs,
    });
    
  3. Log timing for operations:

    const start = Date.now();
    // ... operation ...
    const duration = Date.now() - start;
    logger.logWithData('info', 'Operation timing', { duration });
    
  4. Don't log sensitive data:

    • Avoid logging passwords, tokens, personal data
    • Hash or redact sensitive fields
  5. Use structured data over string interpolation:

    // Good
    logger.logWithData('info', 'User logged in', { userId });
    
    // Avoid
    logger.log(`User ${userId} logged in`);
    

Monitoring & Alerts

Set up alerts based on log patterns:

  • Error rate spike: More than 10 errors/minute
  • ML service latency: Generation > 5 seconds
  • Cache miss rate: < 50% cache hit rate
  • Training failures: Any training job failure

Example alert query (for log aggregation system):

level:error AND context:ResponsesService
| count by 1m
| alert if count > 10

Future Enhancements

  • Add distributed tracing (OpenTelemetry)
  • Implement log sampling for high-volume endpoints
  • Add performance metrics (Prometheus)
  • Integrate with error tracking service (Sentry)
  • Add log rotation for local development
  • Implement audit logging for sensitive operations