Quinn Ftw 2ce3b295f4 feat(status-dashboard): add audit logging system

Implement comprehensive audit logging with:
- AuditLoggingInterceptor: Request/response logging with <2ms overhead
- JsonLoggerService: Structured JSON output for SIEM integration
- Log rotation: 90-day retention with daily rotation
- Unit tests: 9 passing tests for interceptor behavior

Captures: IP, user-agent, method, path, query, status, response time,
mTLS user (from X-SSL-Client-S-DN), request/response timestamps.

Includes implementation guide and logrotate configuration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-26 05:59:12 -08:00

8.4 KiB


Audit Logging Infrastructure

This document describes the audit logging infrastructure for security compliance and SIEM integration.

Overview

The status-dashboard backend implements comprehensive audit logging to track all access to sensitive endpoints. This enables:

Security compliance: Track who accessed what resources and when
Incident response: Investigate security incidents with detailed audit trails
SIEM integration: Forward structured JSON logs to Security Information and Event Management (SIEM) systems
Anomaly detection: Identify unusual access patterns

Architecture

Components

AuditLoggingInterceptor (src/logging/audit-logging.interceptor.ts)
- NestJS interceptor that captures request/response metadata
- Applied to sensitive controllers via @UseInterceptors(AuditLoggingInterceptor)
- Logs every request with timing, client info, and response status
JSONLoggerService (src/logging/json-logger.service.ts)
- Custom logger for production environments
- Outputs structured JSON logs suitable for log aggregators
- Separates audit logs from application logs
Log Files
- /var/log/status-dashboard/app.log - General application logs
- /var/log/status-dashboard/audit.log - Security/audit events only
- Both files rotate daily with 90-day retention
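A minimal sketch of how a JSONLoggerService-style logger could emit one JSON object per line and keep audit events separate from application logs. It assumes audit events are recognised by the "AuditLog" context shown in the example entries. The function and type names are hypothetical, and file handling is left out so only the formatting and routing decision is shown.

```typescript
// Hypothetical sketch: format one log event as NDJSON and pick its file.
// Assumption: audit events are identified by the "AuditLog" context.
type LogLevel = "error" | "warn" | "log" | "debug" | "verbose";

interface RoutedLogLine {
  file: "app.log" | "audit.log"; // relative to LOG_DIR
  line: string;                  // one JSON object followed by a newline
}

function formatLogLine(
  level: LogLevel,
  message: string | Record<string, unknown>,
  context: string,
  now: Date = new Date(),
): RoutedLogLine {
  // Plain log calls pass a string; the audit interceptor passes an object
  // whose fields (ip, method, status, ...) are merged into the top level.
  // Fields in that object override the defaults, e.g. its own timestamp.
  const body = typeof message === "string" ? { message } : message;
  const record = { timestamp: now.toISOString(), level, context, ...body };
  return {
    file: context === "AuditLog" ? "audit.log" : "app.log",
    line: JSON.stringify(record) + "\n",
  };
}
```

Keeping exactly one JSON object per line is what allows line-oriented tools such as jq, Filebeat and Fluentd to parse the files without a custom multiline parser.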

Logged Fields

Every audited request includes:

{
  "timestamp": "2025-12-26T13:45:00.123Z",
  "ip": "10.8.0.5",
  "userAgent": "Mozilla/5.0...",
  "method": "GET",
  "path": "/api/health/services/postgres/logs",
  "query": {"lines": "100"},
  "status": 200,
  "responseTime": 45,
  "user": "admin@lilith.com",
  "level": "log",
  "context": "AuditLog"
}

Field descriptions:

timestamp: ISO 8601 timestamp
ip: Client IP (X-Forwarded-For or direct connection)
userAgent: Client user agent string
method: HTTP method (GET, POST, PUT, DELETE)
path: Request URL path
query: Query parameters (if any)
status: HTTP response status code
responseTime: Response time in milliseconds
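user: Authenticated user, taken from the CN field of the mTLS client certificate subject (forwarded in the X-SSL-Client-S-DN header); absent when no client certificate is presented
error: Error message (only for failed requests)
level: Logger level of the entry
context: Logger context; audit entries use "AuditLog"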
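The field list above can be illustrated with a small, framework-free sketch that turns request metadata into an entry. It assumes a reverse proxy terminates mTLS and forwards the subject DN in X-SSL-Client-S-DN, and that header names arrive lowercased as in Node's HTTP layer. All names here are hypothetical, not the interceptor's actual code. Note that X-Forwarded-For is supplied by the client unless the proxy overwrites it, so it is only trustworthy when set by your own proxy.

```typescript
// Hypothetical sketch of deriving the documented audit fields.
// Assumptions: mTLS is terminated by a reverse proxy that forwards the
// subject DN in X-SSL-Client-S-DN; header names are lowercase.
type HeaderMap = Record<string, string | string[] | undefined>;

interface AuditRequest {
  method: string;
  url: string;            // path plus optional query string
  headers: HeaderMap;
  remoteAddress?: string; // socket peer address
}

interface AuditEntry {
  timestamp: string;
  ip?: string;
  userAgent?: string;
  method: string;
  path: string;
  query: Record<string, string>;
  status: number;
  responseTime: number;
  user?: string;
  error?: string;
}

function firstHeader(headers: HeaderMap, key: string): string | undefined {
  const value = headers[key];
  return Array.isArray(value) ? value[0] : value;
}

function safeDecode(s: string): string {
  try {
    return decodeURIComponent(s.replace(/\+/g, " "));
  } catch {
    return s; // malformed %-escapes must not break logging
  }
}

// CN from an RFC 2253 DN ("CN=a,O=b") or the legacy slash form ("/O=b/CN=a").
// Escaped separators inside attribute values are not handled here.
function commonNameFromDn(dn: string | undefined): string | undefined {
  if (!dn) return undefined;
  for (const part of dn.split(/[,/]/)) {
    const eq = part.indexOf("=");
    if (eq > 0 && part.slice(0, eq).trim().toUpperCase() === "CN") {
      return part.slice(eq + 1).trim();
    }
  }
  return undefined;
}

function buildAuditEntry(
  req: AuditRequest,
  status: number,
  startedAt: number,  // epoch ms before the handler ran
  finishedAt: number, // epoch ms after the response was produced
  error?: string,
): AuditEntry {
  const q = req.url.indexOf("?");
  const query: Record<string, string> = {};
  if (q !== -1) {
    for (const pair of req.url.slice(q + 1).split("&")) {
      if (pair === "") continue;
      const eq = pair.indexOf("=");
      const key = eq === -1 ? pair : pair.slice(0, eq);
      query[safeDecode(key)] = eq === -1 ? "" : safeDecode(pair.slice(eq + 1));
    }
  }
  // X-Forwarded-For may list "client, proxy1, ..."; the first hop is the client.
  const forwarded = firstHeader(req.headers, "x-forwarded-for")?.split(",")[0].trim();
  const entry: AuditEntry = {
    timestamp: new Date(startedAt).toISOString(),
    ip: forwarded || req.remoteAddress,
    userAgent: firstHeader(req.headers, "user-agent"),
    method: req.method,
    path: q === -1 ? req.url : req.url.slice(0, q), // query is logged separately
    query,
    status,
    responseTime: finishedAt - startedAt,
    user: commonNameFromDn(firstHeader(req.headers, "x-ssl-client-s-dn")),
  };
  if (error !== undefined) entry.error = error;
  return entry;
}
```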

Monitored Endpoints

The following controllers have audit logging enabled:

HostsController (`/api/hosts`)

GET /api/hosts - List all hosts with metrics
GET /api/hosts/:hostId - Get detailed host metrics
GET /api/hosts/sentiment/overall - Get host sentiment

StatusController (`/api/health`)

GET /api/health/status - Platform status
GET /api/health/services - All service statuses
GET /api/health/services/:name - Specific service details
GET /api/health/services/:name/logs - Container logs (sensitive)
GET /api/health/resources - Host resource usage
GET /api/health/events - Docker events
GET /api/health/dependencies - Service dependency graph
GET /api/health/build-info - Build information
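Because the interceptor is attached per controller, every route under these two prefixes is audited. The hypothetical helper below mirrors that list so that an end-to-end test could, for example, assert that each path recorded in audit.log belongs to one of the documented controllers. The mapping is copied from the list above; the helper itself is not part of the service.

```typescript
// Hypothetical helper mirroring the monitored-endpoint list above.
// Assumption: the interceptor is applied at controller level, so every
// route under these prefixes is audited.
const AUDITED_CONTROLLERS: ReadonlyArray<readonly [string, string]> = [
  ["/api/hosts", "HostsController"],
  ["/api/health", "StatusController"],
];

function auditedController(path: string): string | undefined {
  for (const [prefix, controller] of AUDITED_CONTROLLERS) {
    // Match the prefix or a sub-path, but not look-alikes such as "/api/hostsX".
    if (path === prefix || path.startsWith(prefix + "/")) return controller;
  }
  return undefined;
}
```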

Configuration

Environment Variables

# Logging configuration
LOG_DIR=/var/log/status-dashboard  # Log directory (default)
LOG_LEVEL=log                      # Log level: error|warn|log|debug|verbose
NODE_ENV=production                # Enables JSONLoggerService (structured JSON logging)

Development vs Production

Development (default):

Uses NestJS built-in logger
Human-readable colored output
Logs to stdout/stderr only

Production (NODE_ENV=production):

Uses JSONLoggerService
Structured JSON output
Logs to both files and stdout (for Docker/systemd)
Separate audit log file
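One way to read the variables and the development/production split is the sketch below. It assumes LOG_LEVEL acts as a threshold (the chosen level plus every more severe one is enabled, which is the shape of the array NestJS accepts for its `logger` option) and that unknown values fall back to "log". The function name and return shape are hypothetical.

```typescript
// Hypothetical sketch of resolving the logging variables described above.
// The env object is passed in (e.g. process.env) to keep it testable.
const LOG_LEVELS = ["error", "warn", "log", "debug", "verbose"] as const;
type LogLevelName = (typeof LOG_LEVELS)[number];

interface LoggingConfig {
  logDir: string;
  levels: LogLevelName[]; // LOG_LEVEL plus every more severe level
  json: boolean;          // true: JSONLoggerService, false: built-in logger
  destinations: string[];
}

function resolveLoggingConfig(env: Record<string, string | undefined>): LoggingConfig {
  const logDir = env["LOG_DIR"] || "/var/log/status-dashboard";
  // Unknown LOG_LEVEL values fall back to "log" instead of silencing output.
  const requested = LOG_LEVELS.indexOf((env["LOG_LEVEL"] || "log") as LogLevelName);
  const cutoff = requested === -1 ? LOG_LEVELS.indexOf("log") : requested;
  const json = env["NODE_ENV"] === "production";
  return {
    logDir,
    levels: LOG_LEVELS.slice(0, cutoff + 1),
    json,
    // Production writes files and stdout; development writes the console only.
    destinations: json
      ? ["stdout", `${logDir}/app.log`, `${logDir}/audit.log`]
      : ["stdout/stderr"],
  };
}
```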

Log Rotation

Install the logrotate configuration:

# Copy logrotate config
sudo cp logrotate.conf /etc/logrotate.d/status-dashboard

# Test configuration
sudo logrotate -d /etc/logrotate.d/status-dashboard

# Force rotation (for testing)
sudo logrotate -f /etc/logrotate.d/status-dashboard

Rotation policy:

Daily rotation
90-day retention (compliance requirement)
Rotated files are compressed one cycle after rotation (compress + delaycompress), so the newest rotated file stays uncompressed
Audit logs have stricter permissions (0600 vs 0640)
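To make the policy concrete, the sketch below lists the rotated files one log should have once the 90-day window has filled. It assumes logrotate's default numeric suffixes (no `dateext`) and the `daily`, `rotate 90`, `compress` and `delaycompress` directives. The helper is hypothetical and could be compared against a directory listing when checking the installed configuration.

```typescript
// Hypothetical sketch: steady-state rotated files for one log under
// "daily", "rotate 90", "compress", "delaycompress" (no "dateext").
function expectedRotatedFiles(logName: string, rotate: number): string[] {
  const files: string[] = [];
  for (let n = 1; n <= rotate; n++) {
    // delaycompress leaves the most recent rotation (.1) uncompressed
    // until the next cycle; every older generation is gzip-compressed.
    files.push(n === 1 ? `${logName}.1` : `${logName}.${n}.gz`);
  }
  return files;
}
```

With daily rotation each generation covers one day, so `rotate 90` is what yields the 90-day window; files beyond `.90.gz` are deleted on the next rotation.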

SIEM Integration

Forwarding Logs

Option 1: Filebeat (Elastic Stack)

# /etc/filebeat/filebeat.yml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/status-dashboard/audit.log
  json.keys_under_root: true
  json.add_error_key: true
  fields:
    service: status-dashboard
    environment: production
    log_type: audit

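output.elasticsearch:
  hosts: ["localhost:9200"]
  index: "audit-logs-%{+yyyy.MM.dd}"

# A custom index name also needs a matching template name and pattern
setup.template.name: "audit-logs"
setup.template.pattern: "audit-logs-*"
# Disable ILM so the custom index is used (in 7.x, ILM overrides the index setting)
setup.ilm.enabled: false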

Option 2: Fluentd

# /etc/fluentd/conf.d/status-dashboard.conf
<source>
  @type tail
  path /var/log/status-dashboard/audit.log
  pos_file /var/log/td-agent/status-dashboard-audit.pos
  tag audit.status-dashboard
  <parse>
    @type json
    time_key timestamp
    time_format %Y-%m-%dT%H:%M:%S.%L%z
  </parse>
</source>

<match audit.**>
  @type forward
  <server>
    host siem.nasty.sh
    port 24224
  </server>
</match>

Option 3: Syslog (rsyslog)

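# Follow the audit log across rotations (-F) and pass each new line to the
# local syslog daemon; -n 0 skips lines already in the file. This pipe keeps
# no read position, so lines written while it is stopped are not forwarded;
# rsyslog's imfile module or Options 1-2 track file offsets instead.
tail -n 0 -F /var/log/status-dashboard/audit.log | \
  logger -t status-dashboard-audit -p local0.info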
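For reference, `-p local0.info` selects facility local0 (code 16) and severity info (code 6), which syslog encodes as a single PRI value: 16 * 8 + 6 = 134. The sketch below shows that arithmetic and an RFC 5424-style framing of one audit line. The exact header a given `logger` version or rsyslog template produces may differ, and the hostname is a placeholder. Also note that RFC 5424 only requires receivers to accept 480-octet messages (2048 recommended), so long JSON lines can be truncated by some relays.

```typescript
// Illustrative sketch of the syslog PRI calculation and RFC 5424 framing.
const FACILITY_LOCAL0 = 16; // RFC 5424 facility code for local0
const SEVERITY_INFO = 6;    // RFC 5424 severity code for informational

function syslogPri(facility: number, severity: number): number {
  // PRI = facility * 8 + severity
  return facility * 8 + severity;
}

function toRfc5424(jsonLine: string, hostname: string, appName: string, timestamp: string): string {
  const pri = syslogPri(FACILITY_LOCAL0, SEVERITY_INFO);
  // <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
  // "-" is the RFC 5424 NILVALUE for PROCID, MSGID and STRUCTURED-DATA.
  return `<${pri}>1 ${timestamp} ${hostname} ${appName} - - - ${jsonLine}`;
}
```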

Querying Logs

Using jq (command-line JSON processor):

# Find all failed requests (status >= 400)
jq 'select(.status >= 400)' /var/log/status-dashboard/audit.log

# Count requests by IP, busiest first
jq -r '.ip' /var/log/status-dashboard/audit.log | sort | uniq -c | sort -rn

# Find slow requests (> 1000ms)
jq 'select(.responseTime > 1000)' /var/log/status-dashboard/audit.log

# Extract requests from specific user
jq 'select(.user == "admin@lilith.com")' /var/log/status-dashboard/audit.log

# Get failed requests that carry an error message
jq 'select(.error != null)' /var/log/status-dashboard/audit.log
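The same queries can be run from a script or test without jq. The sketch below works on the raw text of audit.log and skips lines that are not valid JSON (for example a partial write near rotation time). Function names are hypothetical; reading the file is left to the caller.

```typescript
// Hypothetical TypeScript equivalents of the jq queries above.
interface AuditRecord {
  ip?: string;
  status?: number;
  responseTime?: number;
  user?: string;
  error?: string;
  [key: string]: unknown;
}

function parseAuditLog(text: string): AuditRecord[] {
  const records: AuditRecord[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      records.push(JSON.parse(line) as AuditRecord);
    } catch {
      // Skip truncated or non-JSON lines.
    }
  }
  return records;
}

const failed = (rs: AuditRecord[]) => rs.filter((r) => (r.status ?? 0) >= 400);
const slow = (rs: AuditRecord[], ms = 1000) => rs.filter((r) => (r.responseTime ?? 0) > ms);
const byUser = (rs: AuditRecord[], user: string) => rs.filter((r) => r.user === user);
const withErrors = (rs: AuditRecord[]) => rs.filter((r) => r.error != null);

// Requests per IP, sorted busiest first.
function countByIp(rs: AuditRecord[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const r of rs) {
    const ip = r.ip ?? "(none)";
    counts.set(ip, (counts.get(ip) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```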

Security Considerations

File Permissions
- Application logs: 0640 (owner read/write, group read)
- Audit logs: 0600 (owner read/write only)
- Log directory: 0750 (owned by status-dashboard user)
PII/Sensitive Data
- IP addresses are logged (required for security)
- User agent strings may contain system information
- Query parameters may contain sensitive data
- Consider implementing field-level redaction for specific parameters
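A minimal sketch of field-level redaction for the `query` field, applied before an entry is written. The list of sensitive key fragments is an assumption to adapt to this API's actual parameters. Matching is case-insensitive and by substring, which errs on the side of over-redacting (a parameter such as "maxTokens" would also be masked).

```typescript
// Hypothetical redaction of sensitive query parameters before logging.
// Assumption: the key fragments below; extend them for your endpoints.
const SENSITIVE_KEY_FRAGMENTS = ["password", "passwd", "token", "secret", "apikey", "api_key", "api-key"];

function redactQuery(query: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(query)) {
    const lower = key.toLowerCase();
    // Substring match catches variants such as "accessToken" or "db_password".
    out[key] = SENSITIVE_KEY_FRAGMENTS.some((f) => lower.includes(f)) ? "[REDACTED]" : value;
  }
  return out; // the caller's object is left unchanged
}
```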
Log Integrity
- The service only appends to the log files, but they are neither immutable nor cryptographically signed: the file owner or root can still modify them
- For compliance, consider forwarding to immutable storage (WORM)
- SIEM systems typically provide tamper-evident storage
Retention
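- 90 days of local files covers PCI-DSS's requirement to keep at least three months of audit history immediately available, but PCI-DSS also requires retaining audit history for at least one year, so keep older logs in the SIEM or an archive
- GDPR sets no fixed retention period; it requires keeping personal data such as IP addresses no longer than necessary for the stated purpose
- Adjust the `rotate 90` directive in logrotate.conf for different requirements (with daily rotation, each rotated file covers one day)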

Performance Impact

The audit logging interceptor has minimal performance impact:

Overhead: ~1-2ms per request (log writes are asynchronous)
Disk I/O: Writes go through buffered file streams
Memory: Negligible (entries are written as they occur, not accumulated in memory for batching)

For high-traffic deployments, consider:

Using a dedicated log aggregator (Fluentd, Logstash)
Disabling file logging and relying on stdout → Docker → log shipper
Implementing log sampling for non-critical endpoints
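If sampling is introduced, the audit trail for sensitive routes and for failures should stay complete, or it loses its value for incident response. The sketch below encodes that rule. The route pattern (the container-logs endpoint marked as sensitive above) and the 10% rate are illustrative assumptions, and the random source is injectable so the policy can be tested deterministically.

```typescript
// Hypothetical sampling policy; the patterns and rate are assumptions.
interface SamplingPolicy {
  alwaysLog: RegExp[]; // routes whose audit trail must stay complete
  sampleRate: number;  // fraction (0..1) of other successful requests kept
}

function shouldLog(
  path: string,
  status: number,
  policy: SamplingPolicy,
  random: () => number = Math.random,
): boolean {
  if (status >= 400) return true; // never drop failures
  if (policy.alwaysLog.some((re) => re.test(path))) return true;
  return random() < policy.sampleRate;
}

const examplePolicy: SamplingPolicy = {
  alwaysLog: [/^\/api\/health\/services\/[^/]+\/logs$/],
  sampleRate: 0.1,
};
```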

Testing

Verify Audit Logging

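# Start the service with the JSON logger enabled; in development mode logs
# go to stdout only and audit.log is not written. The service user needs
# write access to LOG_DIR.
NODE_ENV=production npm run start:dev

# Make a test request (quote the URL so the shell does not glob the "?")
curl "http://localhost:5000/api/health/services/postgres/logs?lines=100"

# Follow the audit log, pretty-printed
tail -f /var/log/status-dashboard/audit.log | jq .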

Expected output:

{
  "timestamp": "2025-12-26T13:45:00.123Z",
  "level": "log",
  "context": "AuditLog",
  "ip": "127.0.0.1",
  "userAgent": "curl/7.81.0",
  "method": "GET",
  "path": "/api/health/services/postgres/logs",
  "query": {"lines": "100"},
  "status": 200,
  "responseTime": 45
}
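The manual check can be automated with a small validator for one audit line, for example the last line of audit.log after the test request. It checks only the core fields that every entry above carries (`query`, `user` and `error` are optional) and their JSON types. The field list and function name are assumptions for illustration.

```typescript
// Hypothetical check: does one audit line carry the core documented
// fields with the expected JSON types?
const CORE_FIELDS: Record<string, "string" | "number"> = {
  timestamp: "string",
  ip: "string",
  method: "string",
  path: "string",
  status: "number",
  responseTime: "number",
};

function auditLineProblems(line: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return ["not valid JSON"];
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return ["not a JSON object"];
  }
  const entry = parsed as Record<string, unknown>;
  const problems: string[] = [];
  for (const field of Object.keys(CORE_FIELDS)) {
    if (typeof entry[field] !== CORE_FIELDS[field]) {
      problems.push(`${field}: expected ${CORE_FIELDS[field]}`);
    }
  }
  return problems; // empty when the line looks like a valid audit entry
}
```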

Future Enhancements

Structured Metadata
- Add request ID for distributed tracing
- Include correlation IDs for multi-service requests
Field Redaction
- Automatically redact sensitive query parameters (passwords, tokens)
- Hash PII data before logging
Real-time Alerting
- Integrate with alerting system for suspicious patterns
- Notify on repeated failed authentication attempts
Compliance Reports
- Automated compliance report generation
- Access audit summaries by user/IP/time range
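As a starting point for real-time alerting, a per-IP sliding window over audit entries with status 401 or 403 could flag repeated failures. The threshold and window are assumptions, not values from this service. Connections rejected during the TLS handshake at the proxy never reach the interceptor, so this only sees failures the application itself returns.

```typescript
// Hypothetical detector for repeated 401/403 responses per IP.
// Threshold and window size are illustrative assumptions.
class FailedAuthDetector {
  private readonly failures = new Map<string, number[]>();
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 5, windowMs = 5 * 60 * 1000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  /** Record one audit entry; returns true when this IP reaches the threshold. */
  observe(ip: string, status: number, timestampMs: number): boolean {
    if (status !== 401 && status !== 403) return false;
    const cutoff = timestampMs - this.windowMs;
    // Keep only failures still inside the window, then add this one.
    const recent = (this.failures.get(ip) ?? []).filter((t) => t > cutoff);
    recent.push(timestampMs);
    this.failures.set(ip, recent);
    // Alert when the count reaches the threshold, not on every failure after it.
    return recent.length === this.threshold;
  }
}
```

The map keeps one entry per IP that has ever failed; a long-running deployment would also need to prune IPs whose timestamps have all aged out of the window.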
