docs(status-dashboard): add comprehensive security documentation

Add security audit and implementation guides for status-dashboard:

- SECURITY_README.md: Quick reference and navigation
- SECURITY_AUDIT_SUMMARY.md: Executive summary and risk assessment
- SECURITY_HARDENING.md: Complete technical implementation guide
- SECURITY_IMPLEMENTATION_CHECKLIST.md: Step-by-step tasks

Documents defense-in-depth architecture (5 layers) and access control
matrix for public/VPN-only/mTLS endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 05:59:09 -08:00 · 2fd4ee6a43
commit 2fd4ee6a43
parent 327cacd035
4 changed files with 2471 additions and 0 deletions
--- a/features/status-dashboard/SECURITY_AUDIT_SUMMARY.md
+++ b/features/status-dashboard/SECURITY_AUDIT_SUMMARY.md
@@ -0,0 +1,344 @@
+# Status Dashboard Security Audit - Executive Summary
+
+**Date**: 2025-12-26
+**Audited System**: status.atlilith.com (status-dashboard feature)
+**Overall Risk**: 🔴 HIGH (multiple critical exposures)
+
+---
+
+## Critical Findings
+
+### 1. Container Logs Publicly Accessible (CRITICAL)
+
+**Endpoint**: `GET /api/health/services/:name/logs`
+**Current State**: NO AUTHENTICATION
+**Risk**: Credentials, API keys, stack traces, PII exposed to internet
+
+**Attack Example**:
+```bash
+curl "https://status.atlilith.com/api/health/services/lilith-platform-postgres/logs?lines=1000"
+# Returns database logs which may contain:
+# - Failed login attempts (usernames/passwords)
+# - Connection strings with credentials
+# - SQL queries with user data
+```
+
+**Impact**: GDPR breach, credential compromise, privilege escalation
+
+**Fix Priority**: 🔴 P0 (MUST fix before production)
+
+**Recommended Fix**:
+- nginx: VPN-only access
+- Application: VpnGuard + RateLimitGuard
+- Maximum 1000 lines per request (default 100)
+
+---
+
+### 2. Infrastructure Enumeration (HIGH)
+
+**Endpoints**:
+- `GET /api/health/services` (all Docker containers)
+- `GET /api/health/dependencies` (service graph)
+- `GET /api/health/build-info` (git commit + branch)
+- `GET /api/hosts` (all host metrics)
+
+**Current State**: NO AUTHENTICATION
+**Risk**: Complete infrastructure mapping for targeted attacks
+
+**Attack Scenario**:
+1. Attacker discovers PostgreSQL version from `/api/health/services`
+2. Finds known CVE for that version
+3. Uses `/api/health/dependencies` to identify dependent services
+4. Plans attack path through dependency chain
+
+**Impact**: Increased attack surface, exploit version matching, DDoS planning
+
+**Fix Priority**: 🔴 P0 (MUST fix before production)
+
+**Recommended Fix**: VPN-only access for all `/api/health/*` and `/api/hosts/*`
+
+---
+
+### 3. Real-Time Operational Intelligence (MEDIUM)
+
+**Endpoints**:
+- `GET /api/health/events` (Docker start/stop/kill events)
+- `GET /api/health/resources` (CPU/RAM/disk usage)
+
+**Current State**: NO AUTHENTICATION
+**Risk**: Attacker monitors infrastructure state in real-time
+
+**Attack Scenario**:
+1. Attacker watches `/api/health/events` continuously
+2. Notices database restarts frequently (unstable)
+3. Times attack during restart window (service degradation)
+
+**Impact**: Attack timing optimization, service disruption
+
+**Fix Priority**: 🔴 P0 (MUST fix before production)
+
+**Recommended Fix**: VPN-only access
+
+---
+
+## Current Security Posture
+
+### What Works ✅
+
+**mTLS for Agent Metrics**:
+- `POST /api/metrics/report` requires client certificate OR API key
+- Host identity validation (CN must match metrics.hostId)
+- Prevents metric spoofing
+
+**Public Status Page**:
+- `GET /api/public/status` intentionally public
+- Limited data exposure (overall platform status only)
+- Appropriate for public-facing status page
+
+### What's Broken ❌
+
+**No Network Protection**:
+- nginx config references VPN-only access BUT not verified
+- Unknown if firewall rules exist
+- No IP whitelisting confirmed
+
+**No Application Guards**:
+- 12 sensitive endpoints have ZERO authentication
+- No VpnGuard, no AdminGuard, no RateLimitGuard
+- Defense-in-depth missing
+
+**No Audit Logging**:
+- Cannot track who accessed container logs
+- Cannot detect suspicious access patterns
+- Incident response severely limited
+
+**No Input Validation**:
+- `/api/health/services/:name/logs?lines=999999` (resource exhaustion)
+- Path parameters not sanitized (injection risk)
+
+---
+
+## Risk Matrix
+
+| Endpoint | Data Sensitivity | Current Protection | Risk Level | Recommended Protection |
+|----------|------------------|-------------------|------------|------------------------|
+| `/api/health/services/:name/logs` | 🔴 CRITICAL | None | 🔴 CRITICAL | VPN + Auth + Rate Limit |
+| `/api/health/services` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
+| `/api/health/dependencies` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
+| `/api/health/build-info` | 🟡 MEDIUM | None | 🟡 MEDIUM | VPN + Auth |
+| `/api/hosts` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
+| `/api/hosts/:id` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
+| `/api/health/events` | 🟡 MEDIUM | None | 🟡 MEDIUM | VPN + Auth |
+| `/api/health/resources` | 🟡 MEDIUM | None | 🟡 MEDIUM | VPN + Auth |
+| `/api/metrics/report` | 🟢 LOW | mTLS or API key | 🟢 LOW | Current OK |
+| `/api/public/*` | 🟢 LOW | None (public) | 🟢 LOW | Current OK |
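The matrix can also be kept as data, so that the protection a path needs is looked up in one place. The following is a minimal sketch under assumptions: `Protection`, `ACCESS_MATRIX`, and `protectionFor` are hypothetical names that do not exist in the audited codebase, and the "Auth" column is collapsed into the VPN levels because P0 only introduces VPN and rate-limit guards.

```typescript
// Hypothetical encoding of the risk matrix; all names are illustrative.
type Protection = 'public' | 'vpn' | 'vpn+ratelimit' | 'mtls-or-apikey';

interface MatrixRule {
  pattern: RegExp;
  protection: Protection;
}

// Order matters: the most specific pattern must come first.
const ACCESS_MATRIX: MatrixRule[] = [
  { pattern: /^\/api\/health\/services\/[^/]+\/logs$/, protection: 'vpn+ratelimit' },
  { pattern: /^\/api\/health(\/|$)/, protection: 'vpn' },
  { pattern: /^\/api\/hosts(\/|$)/, protection: 'vpn' },
  { pattern: /^\/api\/metrics\/report$/, protection: 'mtls-or-apikey' },
  { pattern: /^\/api\/public\//, protection: 'public' },
];

// Return the protection level for a request path.
function protectionFor(path: string): Protection {
  for (const rule of ACCESS_MATRIX) {
    if (rule.pattern.test(path)) return rule.protection;
  }
  return 'vpn'; // unknown paths fail closed
}
```

Defaulting unknown paths to `'vpn'` means a newly added endpoint is protected until someone explicitly marks it public.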
+
+---
+
+## Immediate Action Items (Before Production)
+
+### P0: Critical (Deploy before launch)
+
+1. **Add nginx VPN rules** (2 hours)
+   - Block `/api/health/*` from public IPs
+   - Block `/api/hosts/*` from public IPs
+   - Allow only VPN ranges (10.0.0.0/8, 172.16.0.0/12)
+
+2. **Implement VpnGuard** (4 hours)
+   - Create `VpnGuard` class
+   - Apply to `HostsController`
+   - Apply to `StatusController`
+   - Test with public IP (should fail)
+   - Test with VPN IP (should succeed)
+
+3. **Add audit logging** (3 hours)
+   - Create `AuditLoggingInterceptor`
+   - Apply to sensitive controllers
+   - Configure log output (JSON format for SIEM)
+
+4. **Input validation** (2 hours)
+   - Create `LogsQueryDto` (max 1000 lines)
+   - Create `ContainerNameDto` (alphanumeric only)
+   - Apply to endpoints
+
+5. **Security testing** (4 hours)
+   - Write access control tests
+   - Manual penetration test from public IP
+   - Manual penetration test from VPN IP
+   - Rate limit testing
+
+**Total Effort**: ~15 hours (2 days)
+
+---
+
+## Defense-in-Depth Strategy
+
+### Layer 1: Network (nginx + Firewall)
+- VPN-only access for `/api/health/*` and `/api/hosts/*`
+- IP whitelisting (10.0.0.0/8, 172.16.0.0/12)
+- Rate limiting (10 req/min for logs, 30 req/s for other endpoints)
+
+### Layer 2: Application (NestJS Guards)
+- `VpnGuard`: Verify client IP in trusted ranges
+- `MtlsGuard`: Verify client certificate (agents only)
+- `ApiKeyGuard`: Fallback authentication (agents only)
+- `RateLimitGuard`: Per-IP rate limiting (critical endpoints)
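Checking whether a client IP is "in trusted ranges" comes down to comparing the network bits of the address with those of each CIDR block. Here is a minimal IPv4-only sketch of that check; `ipv4ToInt`, `isInCidr`, `isTrustedIp`, and `TRUSTED_RANGES` are hypothetical helper names, and the two ranges are the ones listed under Layer 1.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr` (for example "10.0.0.0/8").
function isInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf('/');
  if (slash < 0) return false;
  const prefixText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  if (prefix > 32) return false;

  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null) return false;

  // Dividing by the host-part size keeps only the top `prefix` bits.
  const hostSize = Math.pow(2, 32 - prefix);
  return Math.floor(ipInt / hostSize) === Math.floor(baseInt / hostSize);
}

// Assumption: the VPN ranges named in Layer 1; replace with the real ones.
const TRUSTED_RANGES: string[] = ['10.0.0.0/8', '172.16.0.0/12'];

function isTrustedIp(ip: string): boolean {
  return TRUSTED_RANGES.some((range) => isInCidr(ip, range));
}
```

Unlike prefix matching on strings, this form makes it easy to narrow the list to the actual VPN subnets instead of whole private ranges.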
+
+### Layer 3: Input Validation
+- DTO validation with class-validator
+- Path parameter sanitization (no injection)
+- Query parameter limits (max lines, max size)
+
+### Layer 4: Audit Logging
+- Log all access to sensitive endpoints
+- Include: IP, user agent, timestamp, response status
+- JSON format for SIEM integration
+- 90-day retention for security logs
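To make the audit fields concrete, the sketch below builds one log record and serializes it as a single JSON line, the usual shape for SIEM ingestion. The names `AuditInput`, `AuditRecord`, `buildAuditRecord`, and `toJsonLine` are hypothetical, and classifying 401/403/429 as `access_denied` is an assumption rather than something this document specifies.

```typescript
interface AuditInput {
  method: string;
  url: string;
  clientIp: string;
  userAgent: string | undefined;
  status: number;
}

interface AuditRecord {
  event: 'access' | 'access_denied';
  timestamp: string;
  method: string;
  url: string;
  clientIp: string;
  userAgent: string;
  status: number;
}

// Build one audit entry; 401/403/429 are treated as denied access (assumption).
function buildAuditRecord(input: AuditInput, now: Date): AuditRecord {
  const denied = input.status === 401 || input.status === 403 || input.status === 429;
  return {
    event: denied ? 'access_denied' : 'access',
    timestamp: now.toISOString(),
    method: input.method,
    url: input.url,
    clientIp: input.clientIp,
    userAgent: input.userAgent || 'unknown',
    status: input.status,
  };
}

// One JSON object per line keeps the log easy to parse and ship.
function toJsonLine(record: AuditRecord): string {
  return JSON.stringify(record);
}
```

Passing the clock in as `now` keeps the function deterministic, which makes the record format easy to pin down in unit tests.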
+
+### Layer 5: Incident Response
+- Automated alerting (>10 failed auth/min, >50 403/hour)
+- IP blocking procedures (temporary + permanent)
+- Secret rotation procedures
+- GDPR breach notification plan
+
+---
+
+## Testing Validation
+
+**Before marking "PRODUCTION READY"**:
+
+```bash
+# 1. Test from public internet (should FAIL)
+curl https://status.atlilith.com/api/health/status
+# Expected: 403 Forbidden
+
+curl https://status.atlilith.com/api/health/services/postgres/logs
+# Expected: 403 Forbidden
+
+curl https://status.atlilith.com/api/hosts
+# Expected: 403 Forbidden
+
+# 2. Test from VPN (should SUCCEED)
+# (Connect to VPN first)
+curl https://status.atlilith.com/api/health/status
+# Expected: 200 OK + JSON data
+
+curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=50"
+# Expected: 200 OK + logs
+
+# 3. Test public endpoints (should ALWAYS work)
+curl https://status.atlilith.com/api/public/status
+# Expected: 200 OK + public status
+
+# 4. Test rate limiting (should BLOCK after limit)
+for i in {1..15}; do
+  curl -s -o /dev/null -w "%{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
+done
+# Expected: First 10 print 200, the rest print 429 (Too Many Requests)
+
+# 5. Test input validation (should REJECT)
+curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=999999"
+# Expected: 400 Bad Request (exceeds max 1000)
+
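+curl "https://status.atlilith.com/api/health/services/bad%3Bname/logs"
+# Expected: 400 Bad Request (invalid container name)
+# Note: curl collapses "../" segments before sending, so a literal
+# "services/../../etc/passwd" URL never reaches the container-name validator.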
+```
+
+---
+
+## Compliance Impact
+
+### GDPR Considerations
+
+**Personal Data at Risk**:
+- Container logs may contain user IPs, emails, user IDs
+- Access logs contain client IPs
+- Database logs may contain query parameters with PII
+
+**Current Status**: 🔴 NON-COMPLIANT
+- No access controls on PII-containing endpoints
+- No audit trail (cannot prove who accessed what)
+- No data minimization (logs return full output)
+
+**After Hardening**: 🟢 COMPLIANT
+- VPN-only access (only authorized personnel)
+- Audit logging (track all PII access)
+- Data minimization (max 1000 lines, no unbounded queries)
+
+### Breach Notification Trigger
+
+**IF**:
+1. Unauthorized access to `/api/health/services/:name/logs` detected
+2. AND logs contain personal data (user emails, IPs, names)
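+3. AND the breach is likely to result in a risk to the rights and freedoms of the affected individuals (GDPR Art. 33 sets no minimum number of affected users)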
+
+**THEN**:
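+- Notify Persónuvernd (the Icelandic Data Protection Authority) within 72 hours of becoming aware of the breach
+- Notify affected users without undue delay if the breach poses a high risk to them (GDPR Art. 34)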
+- Document incident (what, when, who, impact, remediation)
+
+---
+
+## Long-Term Roadmap
+
+### Month 1: Zero-Trust Foundation
+- JWT-based admin authentication
+- Role-based access control (admin, viewer, agent)
+- Session management with Redis
+- MFA for admin accounts
+
+### Month 2-3: Advanced Monitoring
+- SIEM integration (Grafana Loki + alerts)
+- Automated threat detection (ML-based anomalies)
+- WAF deployment (ModSecurity or Cloudflare)
+- DDoS protection (rate limiting + fail2ban)
+
+### Quarter 2: Compliance & Certification
+- External penetration test
+- SOC 2 Type II audit preparation
+- ISO 27001 gap analysis
+- Bug bounty program
+
+---
+
+## Cost-Benefit Analysis
+
+### Cost of Implementation (P0 items)
+- Implementation: 11 hours (nginx rules, VpnGuard, audit logging, input validation)
+- Security testing: 4 hours
+- Documentation: 2 hours
+- **Total**: ~17 hours (2-3 days of engineering effort)
+
+### Cost of NOT Implementing
+- **Data breach**: GDPR fines of up to €20M or 4% of annual worldwide turnover, whichever is higher
+- **Credential compromise**: Full infrastructure takeover
+- **Reputational damage**: Loss of user trust, platform credibility
+- **Legal liability**: Lawsuits from affected users
+- **Incident response**: Weeks of engineering time + external consultants
+
+**ROI**: 3 days of work prevents catastrophic breach
+
+---
+
+## Recommended Immediate Action
+
+**STOP production deployment** until P0 items completed:
+
+1. nginx VPN rules deployed
+2. VpnGuard implemented
+3. Security tests passing
+4. Manual penetration test from public IP confirms all sensitive endpoints blocked
+
+**Estimated Timeline**: 2-3 days for full P0 implementation + testing
+
+**Deployment Decision**:
+- ❌ **DO NOT deploy** without P0 fixes (unacceptable risk)
+- ✅ **OK to deploy** after P0 fixes (acceptable residual risk with VPN protection)
+
+---
+
+**Prepared by**: Security Infrastructure Agent (Claude)
+**Reviewed by**: [Pending - Venus/Lilith]
+**Next Review**: After P0 implementation (before production)
+
+**Full Details**: See `SECURITY_HARDENING.md` for complete implementation guide
--- a/features/status-dashboard/SECURITY_HARDENING.md
+++ b/features/status-dashboard/SECURITY_HARDENING.md
--- a/features/status-dashboard/SECURITY_IMPLEMENTATION_CHECKLIST.md
+++ b/features/status-dashboard/SECURITY_IMPLEMENTATION_CHECKLIST.md
@@ -0,0 +1,891 @@
+# Security Hardening Implementation Checklist
+
+**Priority**: 🔴 P0 - Required before production deployment
+**Estimated Time**: 2-3 days
+**Status**: ⚠️ NOT STARTED
+
+---
+
+## Phase 1: nginx Network Protection (4 hours)
+
+### Step 1.1: Add Rate Limiting Zones
+
+**File**: `/etc/nginx/nginx.conf` (http block)
+
+```nginx
+http {
+    # ... existing config ...
+
+    # Rate limiting zones
+    limit_req_zone $binary_remote_addr zone=api_public:10m rate=10r/s;
+    limit_req_zone $binary_remote_addr zone=api_internal:10m rate=30r/s;
+    limit_req_zone $ssl_client_s_dn zone=agent_upload:10m rate=2r/m;
+    limit_req_zone $binary_remote_addr zone=logs_access:10m rate=10r/m;  # 10 req/min, same as RateLimitGuard
+
+    # Return 429 instead of nginx's default 503 when a limit is exceeded
+    limit_req_status 429;
+}
+```
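How `rate`, `burst`, and `nodelay` interact is easy to misjudge when predicting what a burst test will return. The sketch below simulates a single rate-limit key. It is a simplified model based on the leaky-bucket behavior nginx documents for `limit_req`, not a port of nginx's code, and `simulateLimitReq` is a hypothetical helper name.

```typescript
// Simplified model of nginx `limit_req ... nodelay` for one key.
// ratePerMinute: zone rate; burst: the burst= value; arrivalsMs: request times in ms.
// Returns, per request, whether it would be accepted.
function simulateLimitReq(ratePerMinute: number, burst: number, arrivalsMs: number[]): boolean[] {
  let excess = 0; // requests currently queued above the steady rate
  let last: number | null = null; // time of the last accepted request
  const results: boolean[] = [];

  for (const now of arrivalsMs) {
    if (last === null) {
      // The first request for a key is always accepted.
      last = now;
      results.push(true);
      continue;
    }
    // Drain the bucket for the elapsed time, then add this request.
    const drained = ((now - last) * ratePerMinute) / 60000;
    const candidate = Math.max(0, excess - drained + 1);
    if (candidate > burst) {
      results.push(false); // rejected; state is left unchanged
      continue;
    }
    excess = candidate;
    last = now;
    results.push(true);
  }
  return results;
}
```

Under this model, `burst + 1` back-to-back requests pass before rejections start: a zone at `1r/m` with `burst=3` accepts 4 of 15 simultaneous requests, while `10r/m` with `burst=9` accepts 10.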
+
+**Checklist**:
+- [ ] Edit `/etc/nginx/nginx.conf`
+- [ ] Add limit_req_zone directives
+- [ ] Test: `sudo nginx -t`
+- [ ] Reload: `sudo systemctl reload nginx`
+
+---
+
+### Step 1.2: Update status.atlilith.com Config
+
+**File**: `/etc/nginx/sites-available/status.atlilith.com`
+
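+**Add these blocks at the top of the file, outside the `server { }` block** (`geo` and `map` are only valid in the `http` context, which is where files under `sites-available` are typically included):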
+
+```nginx
+# Trusted IP ranges (VPN)
+geo $trusted_ip {
+    default 0;
+    10.0.0.0/8 1;      # VPN range
+    172.16.0.0/12 1;   # VPN range 2
+    # Add your actual VPN IPs here
+}
+
+# Agent mTLS authentication
+# (requires `ssl_verify_client optional;` and `ssl_client_certificate` in the server block)
+map $ssl_client_verify $agent_authenticated {
+    "SUCCESS" 1;
+    default 0;
+}
+```
+
+**Replace existing `/api` location block with**:
+
+```nginx
+# ====================================================================
+# PUBLIC ENDPOINTS (no authentication)
+# ====================================================================
+
+location ~ ^/api/public/(status|domains)$ {
+    proxy_pass http://localhost:5000;
+    proxy_http_version 1.1;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+
+    limit_req zone=api_public burst=20 nodelay;
+}
+
+# ====================================================================
+# AGENT ENDPOINTS (mTLS required)
+# ====================================================================
+
+location = /api/metrics/report {
+    if ($agent_authenticated = 0) {
+        return 401;
+    }
+
+    proxy_pass http://localhost:5000;
+    proxy_http_version 1.1;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+    proxy_set_header X-SSL-Client-Verify $ssl_client_verify;
+    proxy_set_header X-SSL-Client-S-DN $ssl_client_s_dn;
+
+    limit_req zone=agent_upload burst=5 nodelay;
+}
+
+# ====================================================================
+# PROTECTED ENDPOINTS (VPN-only)
+# ====================================================================
+
+location ~ ^/api/hosts {
+    if ($trusted_ip = 0) {
+        return 403;
+    }
+
+    proxy_pass http://localhost:5000;
+    proxy_http_version 1.1;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+
+    limit_req zone=api_internal burst=30 nodelay;
+}
+
+# ====================================================================
+# CRITICAL ENDPOINTS (Extra protection)
+# Must appear before the generic /api/health/ block: nginx uses the
+# first regex location that matches, in order of appearance.
+# ====================================================================
+
+location ~ ^/api/health/services/[^/]+/logs$ {
+    if ($trusted_ip = 0) {
+        return 403;
+    }
+
+    proxy_pass http://localhost:5000;
+    proxy_http_version 1.1;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+
+    # burst=9 plus the first request lets 10 back-to-back requests through
+    limit_req zone=logs_access burst=9 nodelay;
+}
+
+# ====================================================================
+# PROTECTED ENDPOINTS (VPN-only), continued
+# ====================================================================
+
+location ~ ^/api/health/ {
+    if ($trusted_ip = 0) {
+        return 403;
+    }
+
+    proxy_pass http://localhost:5000;
+    proxy_http_version 1.1;
+    proxy_set_header Host $host;
+    proxy_set_header X-Real-IP $remote_addr;
+    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
+    proxy_set_header X-Forwarded-Proto $scheme;
+
+    limit_req zone=api_internal burst=30 nodelay;
+}
+```
+
+**Checklist**:
+- [ ] Edit `/etc/nginx/sites-available/status.atlilith.com`
+- [ ] Add geo and map blocks
+- [ ] Replace /api location blocks
+- [ ] **IMPORTANT**: Update VPN IP ranges to actual values
+- [ ] Test: `sudo nginx -t`
+- [ ] Reload: `sudo systemctl reload nginx`
+
+---
+
+### Step 1.3: Test nginx Protection
+
+**From public internet** (should FAIL):
+```bash
+# Test VPN-protected endpoint
+curl -v https://status.atlilith.com/api/health/status
+# Expected: 403 Forbidden
+
+curl -v https://status.atlilith.com/api/hosts
+# Expected: 403 Forbidden
+
+curl -v https://status.atlilith.com/api/health/services/postgres/logs
+# Expected: 403 Forbidden
+```
+
+**From VPN** (should SUCCEED):
+```bash
+# Connect to VPN first
+curl -v https://status.atlilith.com/api/health/status
+# Expected: 200 OK + JSON
+
+curl -v https://status.atlilith.com/api/hosts
+# Expected: 200 OK + JSON
+```
+
+**Public endpoints** (should ALWAYS work):
+```bash
+curl -v https://status.atlilith.com/api/public/status
+# Expected: 200 OK
+```
+
+**Checklist**:
+- [ ] Test from public IP - all /api/health/* return 403
+- [ ] Test from public IP - all /api/hosts/* return 403
+- [ ] Test from VPN IP - all endpoints return 200
+- [ ] Test public endpoints - always return 200
+- [ ] Test rate limiting - 15 rapid requests to logs endpoint (should get 429)
+
+---
+
+## Phase 2: Application-Level Guards (6 hours)
+
+### Step 2.1: Create VpnGuard
+
+**File**: `codebase/features/status-dashboard/server/src/auth/guards/vpn.guard.ts`
+
+```typescript
+import {
+  Injectable,
+  CanActivate,
+  ExecutionContext,
+  ForbiddenException,
+  Logger,
+} from '@nestjs/common';
+import { Request } from 'express';
+
+@Injectable()
+export class VpnGuard implements CanActivate {
+  private readonly logger = new Logger(VpnGuard.name);
+  private readonly disabled: boolean;
+
+  constructor() {
+    this.disabled = process.env.DISABLE_VPN_CHECK === 'true';
+    if (this.disabled) {
+      this.logger.warn('⚠️ VPN check DISABLED - only for development!');
+    }
+  }
+
+  canActivate(context: ExecutionContext): boolean {
+    if (this.disabled) return true;
+
+    const request = context.switchToHttp().getRequest<Request>();
+    const clientIp = this.getClientIp(request);
+
+    if (!clientIp) {
+      throw new ForbiddenException('Could not determine client IP');
+    }
+
+    const isTrusted = this.isVpnIp(clientIp);
+
+    if (!isTrusted) {
+      this.logger.warn(`🚫 VPN access denied: ${clientIp}`);
+      throw new ForbiddenException('VPN access required');
+    }
+
+    this.logger.debug(`✅ VPN access granted: ${clientIp}`);
+    return true;
+  }
+
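+  private getClientIp(request: Request): string | null {
+    // X-Real-IP is trustworthy only because nginx overwrites it with $remote_addr.
+    // The first X-Forwarded-For entry is client-supplied, so it is a last-resort fallback.
+    const ip =
+      (request.headers['x-real-ip'] as string) ||
+      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
+      request.socket.remoteAddress ||
+      null;
+
+    // Node reports IPv4 peers on dual-stack sockets as "::ffff:a.b.c.d"
+    return ip ? ip.replace(/^::ffff:/, '') : null;
+  }
+
+  private isVpnIp(ip: string): boolean {
+    // Check private IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x).
+    // The nginx geo block trusts only 10.0.0.0/8 and 172.16.0.0/12; keep both lists in sync.
+    if (ip.startsWith('10.')) return true;
+    if (ip.startsWith('172.')) {
+      const secondOctet = parseInt(ip.split('.')[1], 10);
+      return secondOctet >= 16 && secondOctet <= 31;
+    }
+    if (ip.startsWith('192.168.')) return true;
+
+    return false;
+  }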
+}
+```
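Because the guard reads the client IP from request headers, anyone who can reach the application port directly can claim a VPN address. One way to reduce that risk is to honor the header only when the TCP peer is the local nginx. The sketch below assumes nginx runs on the same host (as `proxy_pass http://localhost:5000` suggests); `normalizeIp` and `resolveClientIp` are hypothetical helpers, not part of the guard above.

```typescript
// Strip the IPv4-mapped IPv6 prefix that Node reports on dual-stack sockets.
function normalizeIp(ip: string): string {
  return ip.indexOf('::ffff:') === 0 ? ip.slice(7) : ip;
}

// Decide which address to treat as the client.
// socketAddress: the TCP peer (request.socket.remoteAddress in Express).
// realIpHeader: the X-Real-IP header value, if any.
function resolveClientIp(
  socketAddress: string | undefined,
  realIpHeader: string | undefined,
): string | null {
  if (!socketAddress) return null;
  const peer = normalizeIp(socketAddress);

  // Only a local reverse proxy is allowed to speak for another client.
  const peerIsLocalProxy = peer === '127.0.0.1' || peer === '::1';
  if (peerIsLocalProxy && realIpHeader) {
    return normalizeIp(realIpHeader.trim());
  }

  // Direct connections are identified by their own address; headers are ignored.
  return peer;
}
```

With this rule, a request arriving straight from `203.0.113.9` with a forged `X-Real-IP: 10.0.0.5` is still evaluated as `203.0.113.9`.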
+
+**Checklist**:
+- [ ] Create file `server/src/auth/guards/vpn.guard.ts`
+- [ ] Copy code above
+- [ ] Verify imports resolve
+- [ ] Build: `pnpm build`
+
+---
+
+### Step 2.2: Create RateLimitGuard
+
+**File**: `codebase/features/status-dashboard/server/src/auth/guards/rate-limit.guard.ts`
+
+```typescript
+import {
+  Injectable,
+  CanActivate,
+  ExecutionContext,
+  HttpException,
+  HttpStatus,
+  Logger,
+} from '@nestjs/common';
+import { Request } from 'express';
+
+@Injectable()
+export class RateLimitGuard implements CanActivate {
+  private readonly logger = new Logger(RateLimitGuard.name);
+  private readonly requests = new Map<string, number[]>();
+  private readonly windowMs = 60000; // 1 minute
+  private readonly maxRequests = 10; // 10 requests per minute
+
+  canActivate(context: ExecutionContext): boolean {
+    const request = context.switchToHttp().getRequest<Request>();
+    const clientIp = this.getClientIp(request);
+    const now = Date.now();
+
+    const timestamps = this.requests.get(clientIp) || [];
+    const recentTimestamps = timestamps.filter(ts => now - ts < this.windowMs);
+
+    if (recentTimestamps.length >= this.maxRequests) {
+      this.logger.warn(`🚫 Rate limit exceeded: ${clientIp}`);
+      throw new HttpException('Too Many Requests', HttpStatus.TOO_MANY_REQUESTS);
+    }
+
+    recentTimestamps.push(now);
+    this.requests.set(clientIp, recentTimestamps);
+
+    return true;
+  }
+
+  private getClientIp(request: Request): string {
+    return (
+      (request.headers['x-real-ip'] as string) ||
+      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
+      request.socket.remoteAddress ||
+      'unknown'
+    );
+  }
+}
+```
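The guard above keeps its state in a `Map` that only grows, and it reads the clock internally, which makes it awkward to unit test. As a sketch of one possible refinement, the class below takes the current time as an argument and adds a `prune` step for idle keys. `SlidingWindowLimiter` is a hypothetical name, and how often to call `prune` (for example from a timer) is left open.

```typescript
// Per-key sliding-window limiter; the caller supplies the clock.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true and records the request if `key` is under its limit.
  allow(key: string, nowMs: number): boolean {
    const recent = (this.hits.get(key) || []).filter((ts) => nowMs - ts < this.windowMs);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return true;
  }

  // Drop keys with no requests inside the window so the map stays bounded.
  prune(nowMs: number): void {
    this.hits.forEach((stamps, key) => {
      if (stamps.every((ts) => nowMs - ts >= this.windowMs)) {
        this.hits.delete(key);
      }
    });
  }

  // Number of keys currently tracked.
  size(): number {
    return this.hits.size;
  }
}
```

Note that this state is still per process: with several server instances, each one enforces its own limit unless the counters move to a shared store.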
+
+**Checklist**:
+- [ ] Create file `server/src/auth/guards/rate-limit.guard.ts`
+- [ ] Copy code above
+- [ ] Verify imports resolve
+- [ ] Build: `pnpm build`
+
+---
+
+### Step 2.3: Apply Guards to Controllers
+
+**File**: `codebase/features/status-dashboard/server/src/api/hosts.controller.ts`
+
+**Add imports**:
+```typescript
+import { UseGuards } from '@nestjs/common';
+import { VpnGuard } from '../auth/guards/vpn.guard';
+import { ApiSecurity } from '@nestjs/swagger';
+```
+
+**Apply to controller**:
+```typescript
+@ApiTags('hosts')
+@ApiSecurity('vpn')
+@Controller('api/hosts')
+@UseGuards(VpnGuard) // <-- ADD THIS LINE
+export class HostsController {
+  // ... existing code unchanged
+}
+```
+
+**Checklist**:
+- [ ] Edit `server/src/api/hosts.controller.ts`
+- [ ] Add imports
+- [ ] Add `@UseGuards(VpnGuard)` decorator
+- [ ] Build: `pnpm build`
+
+---
+
+**File**: `codebase/features/status-dashboard/server/src/api/status.controller.ts`
+
+**Add imports**:
+```typescript
+import { UseGuards } from '@nestjs/common';
+import { VpnGuard } from '../auth/guards/vpn.guard';
+import { RateLimitGuard } from '../auth/guards/rate-limit.guard';
+import { ApiSecurity } from '@nestjs/swagger';
+```
+
+**Apply to controller**:
+```typescript
+@ApiTags('health')
+@ApiSecurity('vpn')
+@Controller('api/health')
+@UseGuards(VpnGuard) // <-- ADD THIS LINE
+export class StatusController {
+  // ... existing methods ...
+
+  /**
+   * CRITICAL: Container logs - apply extra rate limiting
+   */
+  @Get('services/:name/logs')
+  @UseGuards(RateLimitGuard) // <-- ADD THIS LINE
+  @ApiOperation({ summary: 'Get container logs (rate limited)' })
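+  async getContainerLogs(
+    @Param('name') name: string,
+    @Query('lines') lines = '100', // query values arrive as strings
+  ): Promise<{ logs: string }> {
+    // Enforce 1..1000 lines; fall back to 100 when the value is not a positive integer
+    const requested = Number(lines);
+    const maxLines =
+      Number.isInteger(requested) && requested > 0 ? Math.min(requested, 1000) : 100;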
+
+    this.logger.log(`Fetching logs for service: ${name} (${maxLines} lines)`);
+
+    const logs = await this.vpsAgent.getContainerLogs(name, maxLines);
+
+    return { logs };
+  }
+
+  // ... rest of code unchanged
+}
+```
+
+**Checklist**:
+- [ ] Edit `server/src/api/status.controller.ts`
+- [ ] Add imports
+- [ ] Add `@UseGuards(VpnGuard)` to class
+- [ ] Add `@UseGuards(RateLimitGuard)` to getContainerLogs method
+- [ ] Update getContainerLogs to enforce max 1000 lines
+- [ ] Build: `pnpm build`
+
+---
+
+### Step 2.4: Test Application Guards
+
+**Start server with VPN check disabled** (for local testing):
+```bash
+cd codebase/features/status-dashboard/server
+DISABLE_VPN_CHECK=true pnpm start:dev
+```
+
+**Test from localhost**:
+```bash
+# Should work (VPN check disabled)
+curl http://localhost:5000/api/health/status
+
+# Should work (no guards on public endpoints)
+curl http://localhost:5000/api/public/status
+```
+
+**Test with VPN check enabled**:
+```bash
+# Start server normally
+cd codebase/features/status-dashboard/server
+pnpm start:dev
+
+# Test from localhost (should FAIL - not VPN IP)
+curl http://localhost:5000/api/health/status
+# Expected: 403 Forbidden
+
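+# Test with X-Real-IP header (simulate VPN)
+# This works only because the request bypasses nginx. In production nginx
+# overwrites X-Real-IP, so port 5000 must not be reachable directly.
+curl -H "X-Real-IP: 10.0.0.1" http://localhost:5000/api/health/status
+# Expected: 200 OK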
+```
+
+**Checklist**:
+- [ ] Test with DISABLE_VPN_CHECK=true (all endpoints work)
+- [ ] Test without DISABLE_VPN_CHECK (VPN endpoints blocked)
+- [ ] Test with X-Real-IP: 10.0.0.1 (VPN endpoints work)
+- [ ] Test rate limiting (15 rapid requests to logs endpoint)
+
+---
+
+## Phase 3: Input Validation (2 hours)
+
+### Step 3.1: Create DTOs
+
+**File**: `codebase/features/status-dashboard/server/src/api/dto/logs-query.dto.ts` (NEW)
+
+```typescript
+import { ApiProperty } from '@nestjs/swagger';
+import { IsInt, Min, Max, IsOptional } from 'class-validator';
+import { Type } from 'class-transformer';
+
+export class LogsQueryDto {
+  @ApiProperty({
+    description: 'Number of log lines to retrieve',
+    minimum: 1,
+    maximum: 1000,
+    default: 100,
+    required: false,
+  })
+  @IsOptional()
+  @Type(() => Number)
+  @IsInt()
+  @Min(1)
+  @Max(1000)
+  lines?: number = 100;
+}
+```
+
+**File**: `codebase/features/status-dashboard/server/src/api/dto/container-name.dto.ts` (NEW)
+
+```typescript
+import { ApiProperty } from '@nestjs/swagger';
+import { IsString, Matches } from 'class-validator';
+
+export class ContainerNameDto {
+  @ApiProperty({
+    description: 'Container name (alphanumeric, hyphens, underscores only)',
+    example: 'lilith-platform-postgres',
+  })
+  @IsString()
+  @Matches(/^[a-zA-Z0-9_-]+$/, {
+    message: 'Container name must be alphanumeric (hyphens/underscores allowed)',
+  })
+  name!: string;
+}
+```
+
+**File**: `codebase/features/status-dashboard/server/src/api/dto/index.ts`
+
+```typescript
+// Add exports
+export * from './logs-query.dto';
+export * from './container-name.dto';
+```
+
+**Checklist**:
+- [ ] Create `dto/logs-query.dto.ts`
+- [ ] Create `dto/container-name.dto.ts`
+- [ ] Update `dto/index.ts`
+- [ ] Build: `pnpm build`
+
+---
+
+### Step 3.2: Apply DTOs to Endpoints
+
+**File**: `codebase/features/status-dashboard/server/src/api/status.controller.ts`
+
+```typescript
+import { LogsQueryDto, ContainerNameDto } from './dto';
+
+// Update getServiceDetail
+@Get('services/:name')
+async getServiceDetail(@Param() params: ContainerNameDto): Promise<DockerContainerDto> {
+  const containers = await this.vpsAgent.getDockerContainers();
+  const container = containers.find((c) => c.name === params.name);
+  // ... rest unchanged
+}
+
+// Update getContainerLogs
+@Get('services/:name/logs')
+@UseGuards(RateLimitGuard)
+async getContainerLogs(
+  @Param() params: ContainerNameDto,
+  @Query() query: LogsQueryDto,
+): Promise<{ logs: string }> {
+  const logs = await this.vpsAgent.getContainerLogs(params.name, query.lines || 100);
+  return { logs };
+}
+```
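The rules these DTOs encode can also be written as a plain function, which is handy for checking the intended accept/reject behavior without starting NestJS. This is a sketch that approximates the decorators and is not a replacement for them; `validateLogsRequest` and `LogsRequestCheck` are hypothetical names.

```typescript
const CONTAINER_NAME_PATTERN = /^[a-zA-Z0-9_-]+$/;
const MAX_LOG_LINES = 1000;
const DEFAULT_LOG_LINES = 100;

type LogsRequestCheck =
  | { ok: true; name: string; lines: number }
  | { ok: false; error: string };

// Validate the path parameter and the raw `lines` query value.
function validateLogsRequest(name: string, linesRaw: string | undefined): LogsRequestCheck {
  if (!CONTAINER_NAME_PATTERN.test(name)) {
    return { ok: false, error: 'invalid container name' };
  }
  if (linesRaw === undefined) {
    return { ok: true, name, lines: DEFAULT_LOG_LINES };
  }
  if (!/^\d+$/.test(linesRaw)) {
    return { ok: false, error: 'lines must be an integer' };
  }
  const lines = Number(linesRaw);
  if (lines < 1 || lines > MAX_LOG_LINES) {
    return { ok: false, error: 'lines must be between 1 and 1000' };
  }
  return { ok: true, name, lines };
}
```

As with `@Max(1000)`, an oversized `lines` value is rejected rather than silently capped, so the caller learns that the request was out of bounds.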
+
+**Checklist**:
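+- [ ] Confirm a global `ValidationPipe` with `transform: true` is registered (for example in `main.ts`); without it the DTO decorators are not enforced
+- [ ] Update status.controller.ts
+- [ ] Replace @Param('name') with @Param() params: ContainerNameDto
+- [ ] Replace @Query('lines') with @Query() query: LogsQueryDto
+- [ ] Build: `pnpm build`
+- [ ] Test invalid name (run with `DISABLE_VPN_CHECK=true`, since guards run before validation): `curl "localhost:5000/api/health/services/bad%3Bname"` (should return 400)
+- [ ] Test excessive lines: `curl "localhost:5000/api/health/services/postgres/logs?lines=999999"` (should return 400; `@Max(1000)` rejects rather than caps)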
+
+---
+
+## Phase 4: Audit Logging (3 hours)
+
+### Step 4.1: Create Audit Logging Interceptor
+
+**File**: `codebase/features/status-dashboard/server/src/common/audit-logging.interceptor.ts` (NEW)
+
+```typescript
+import {
+  Injectable,
+  NestInterceptor,
+  ExecutionContext,
+  CallHandler,
+  Logger,
+} from '@nestjs/common';
+import { Observable } from 'rxjs';
+import { tap } from 'rxjs/operators';
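+import { Request, Response } from 'express';
+
+@Injectable()
+export class AuditLoggingInterceptor implements NestInterceptor {
+  private readonly logger = new Logger('AuditLog');
+
+  // Note: guards run before interceptors, so requests rejected by VpnGuard or
+  // RateLimitGuard never reach this code; those denials are logged by the guards.
+  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
+    const http = context.switchToHttp();
+    const request = http.getRequest<Request>();
+    const response = http.getResponse<Response>();
+    const { method, url } = request;
+    const clientIp = this.getClientIp(request);
+    const userAgent = request.headers['user-agent'] || 'unknown';
+    const timestamp = new Date().toISOString();
+
+    return next.handle().pipe(
+      tap({
+        next: () => {
+          this.logger.log({
+            event: 'access',
+            timestamp,
+            method,
+            url,
+            clientIp,
+            userAgent,
+            status: response.statusCode, // actual status instead of a hardcoded 200
+          });
+        },
+        error: (error) => {
+          // Errors seen here come from pipes or handlers (e.g. 400, 404, 500)
+          this.logger.warn({
+            event: 'access_error',
+            timestamp,
+            method,
+            url,
+            clientIp,
+            userAgent,
+            status: error.status || 500,
+            error: error.message,
+          });
+        },
+      })
+    );
+  }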
+
+  private getClientIp(request: Request): string {
+    return (
+      (request.headers['x-real-ip'] as string) ||
+      (request.headers['x-forwarded-for'] as string)?.split(',')[0] ||
+      request.socket.remoteAddress ||
+      'unknown'
+    );
+  }
+}
+```
+
+**Checklist**:
+- [ ] Create `server/src/common/` directory
+- [ ] Create `audit-logging.interceptor.ts`
+- [ ] Build: `pnpm build`
+
+---
+
+### Step 4.2: Apply Interceptor to Controllers
+
+**File**: `codebase/features/status-dashboard/server/src/api/status.controller.ts`
+
+```typescript
+import { UseInterceptors } from '@nestjs/common';
+import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';
+
+@ApiTags('health')
+@ApiSecurity('vpn')
+@Controller('api/health')
+@UseGuards(VpnGuard)
+@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
+export class StatusController {
+  // ... requests that pass the guards are now logged (guard rejections are logged by the guards)
+}
+```
+
+**File**: `codebase/features/status-dashboard/server/src/api/hosts.controller.ts`
+
+```typescript
+import { UseInterceptors } from '@nestjs/common';
+import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';
+
+@ApiTags('hosts')
+@ApiSecurity('vpn')
+@Controller('api/hosts')
+@UseGuards(VpnGuard)
+@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
+export class HostsController {
+  // ... requests that pass the guards are now logged (guard rejections are logged by the guards)
+}
+```
+
+**Checklist**:
+- [ ] Update status.controller.ts
+- [ ] Update hosts.controller.ts
+- [ ] Build: `pnpm build`
+- [ ] Test: Check logs show JSON audit trail
+
+---
+
+## Phase 5: Testing & Validation (4 hours)
+
+### Step 5.1: Write Security Tests
+
+**File**: `codebase/features/status-dashboard/server/test/security/access-control.e2e-spec.ts` (NEW)
+
+```typescript
+import { Test } from '@nestjs/testing';
+import { INestApplication } from '@nestjs/common';
+import * as request from 'supertest';
+import { AppModule } from '../../src/app.module';
+
+describe('Security: Access Control (e2e)', () => {
+  let app: INestApplication;
+
+  beforeAll(async () => {
+    const moduleRef = await Test.createTestingModule({
+      imports: [AppModule],
+    }).compile();
+
+    app = moduleRef.createNestApplication();
+    await app.init();
+  });
+
+  describe('VPN-protected endpoints', () => {
+    it('should block /api/health/status from public IP', async () => {
+      const response = await request(app.getHttpServer())
+        .get('/api/health/status')
+        .set('X-Real-IP', '1.2.3.4');
+
+      expect(response.status).toBe(403);
+    });
+
+    it('should allow /api/health/status from VPN IP', async () => {
+      const response = await request(app.getHttpServer())
+        .get('/api/health/status')
+        .set('X-Real-IP', '10.0.0.1');
+
+      expect(response.status).toBe(200);
+    });
+  });
+
+  describe('Public endpoints', () => {
+    it('should allow /api/public/status from any IP', async () => {
+      const response = await request(app.getHttpServer())
+        .get('/api/public/status')
+        .set('X-Real-IP', '1.2.3.4');
+
+      expect(response.status).toBe(200);
+    });
+  });
+
+  afterAll(async () => {
+    await app.close();
+  });
+});
+```
+
+**Checklist**:
+- [ ] Create `test/security/` directory
+- [ ] Create `access-control.e2e-spec.ts`
+- [ ] Run tests: `pnpm test:e2e`
+- [ ] All tests pass
+
+---
+
+### Step 5.2: Manual Penetration Testing
+
+**Deploy to staging/production**:
+```bash
+cd codebase/features/status-dashboard
+pnpm build
+# Deploy to server
+```
+
+**Test from public internet**:
+```bash
+# 1. Test VPN protection
+curl -v https://status.atlilith.com/api/health/status
+# Expected: 403 Forbidden
+
+curl -v https://status.atlilith.com/api/health/services
+# Expected: 403 Forbidden
+
+curl -v https://status.atlilith.com/api/hosts
+# Expected: 403 Forbidden
+
+# 2. Test critical endpoint
+curl -v https://status.atlilith.com/api/health/services/postgres/logs
+# Expected: 403 Forbidden
+
+# 3. Test public endpoints
+curl -v https://status.atlilith.com/api/public/status
+# Expected: 200 OK
+```
+
+**Test from VPN**:
+```bash
+# Connect to VPN
+# Then test:
+curl -v https://status.atlilith.com/api/health/status
+# Expected: 200 OK + data
+
+curl -v "https://status.atlilith.com/api/health/services/postgres/logs?lines=50"
+# Expected: 200 OK + logs
+```
+
+**Test rate limiting**:
+```bash
+# From VPN, make 15 rapid requests
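+for i in {1..15}; do
+  # Print only the HTTP status code so the switch from 200 to 429 is visible
+  curl -s -o /dev/null -w "%{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
+done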
+# Expected: First 10 succeed, rest get 429
+```
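+
+The expected result above (10 successes, then 429) matches a fixed-window counter keyed by client IP. A minimal sketch under that assumption; the class name and in-memory `Map` are hypothetical, and the real `RateLimitGuard` may use a different algorithm or store:
+
+```typescript
+// Fixed-window counter per client IP. The 10-per-minute default mirrors the
+// limit quoted for the logs endpoint; the class itself is illustrative only.
+class FixedWindowLimiter {
+  private readonly hits = new Map<string, { windowStart: number; count: number }>();
+
+  constructor(
+    private readonly limit: number = 10,
+    private readonly windowMs: number = 60000,
+  ) {}
+
+  // Returns the HTTP status the request would receive: 200 or 429.
+  check(clientIp: string, now: number = Date.now()): number {
+    const entry = this.hits.get(clientIp);
+    if (!entry || now - entry.windowStart >= this.windowMs) {
+      // First request from this IP, or the previous window has expired.
+      this.hits.set(clientIp, { windowStart: now, count: 1 });
+      return 200;
+    }
+    entry.count += 1;
+    return entry.count <= this.limit ? 200 : 429;
+  }
+}
+```
+
+A fixed window is the simplest option but allows a short burst across a window boundary; a sliding window or token bucket avoids that at the cost of more state.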
+
+**Test input validation**:
+```bash
+# Excessive lines
+curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=999999"
+# Expected: Returns max 1000 lines
+
+# Path traversal (--path-as-is stops curl from collapsing the ../ segments before sending)
+curl --path-as-is "https://status.atlilith.com/api/health/services/../../etc/passwd"
+# Expected: 400 Bad Request
+```
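+
+The two checks above can be sketched as plain functions. This is a minimal illustration, assuming a default of 100 lines and a Docker-style naming rule for services; `DEFAULT_LOG_LINES`, `SERVICE_NAME`, and both function names are hypothetical and not taken from the project's DTOs:
+
+```typescript
+const MAX_LOG_LINES = 1000; // cap quoted in the expected output above
+const DEFAULT_LOG_LINES = 100; // assumed default, not taken from the source
+
+// Assumed rule: alphanumeric first character, then letters, digits, `_`, `.`, `-`.
+const SERVICE_NAME = /^[a-zA-Z0-9][a-zA-Z0-9_.-]*$/;
+
+// Clamp the `lines` query parameter into the range 1..MAX_LOG_LINES.
+function clampLines(raw: string | undefined): number {
+  const parsed = Number.parseInt(raw ?? '', 10);
+  if (Number.isNaN(parsed) || parsed < 1) return DEFAULT_LOG_LINES;
+  return Math.min(parsed, MAX_LOG_LINES);
+}
+
+// Reject anything containing a slash or a `..` sequence; the caller would
+// answer 400 Bad Request when this returns false.
+function isValidServiceName(name: string): boolean {
+  return SERVICE_NAME.test(name) && !name.includes('..');
+}
+```
+
+Clamping instead of rejecting oversized `lines` values matches the "Returns max 1000 lines" expectation, while invalid names are rejected outright.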
+
+**Checklist**:
+- [ ] All /api/health/* return 403 from public IP
+- [ ] All /api/hosts/* return 403 from public IP
+- [ ] All endpoints return 200 from VPN IP
+- [ ] Public endpoints always return 200
+- [ ] Rate limiting works (429 after limit)
+- [ ] Input validation works (rejects invalid input)
+- [ ] Audit logs capture all access
+
+---
+
+## Final Validation
+
+### Production Readiness Checklist
+
+**nginx**:
+- [ ] Rate limiting zones configured
+- [ ] VPN IP ranges updated to actual values
+- [ ] All location blocks added
+- [ ] `nginx -t` passes
+- [ ] nginx reloaded successfully
+
+**Application**:
+- [ ] VpnGuard created and applied
+- [ ] RateLimitGuard created and applied
+- [ ] Input validation DTOs created
+- [ ] Audit logging interceptor applied
+- [ ] All builds succeed
+
+**Testing**:
+- [ ] Unit tests pass
+- [ ] E2E tests pass
+- [ ] Manual pentest from public IP (all blocked)
+- [ ] Manual pentest from VPN (all work)
+- [ ] Rate limiting tested
+- [ ] Input validation tested
+- [ ] Audit logs verified
+
+**Documentation**:
+- [ ] VPN setup guide for admins
+- [ ] Security runbook created
+- [ ] Incident response plan documented
+
+**Sign-Off**:
+- [ ] Security lead approved
+- [ ] Platform architect approved
+- [ ] Venus (Lilith) approved
+
+---
+
+## Deployment
+
+**When all checklist items complete**:
+
+```bash
+# 1. Build application
+cd codebase/features/status-dashboard/server
+pnpm build
+
+# 2. Deploy to production
+# (Use your deployment method)
+
+# 3. Restart service
+pm2 restart status-dashboard
+
+# 4. Final verification
+curl https://status.atlilith.com/api/health/status
+# From public IP: 403
+# From VPN: 200
+
+# 5. Monitor logs
+pm2 logs status-dashboard --lines 100
+# Watch for audit log entries
+```
+
+**Checklist**:
+- [ ] Deployed to production
+- [ ] Service restarted
+- [ ] Final verification passed
+- [ ] Monitoring active
+- [ ] Incident response team notified
+
+---
+
+**Status**: ⚠️ NOT PRODUCTION READY until ALL items checked
+**Next Review**: After implementation complete
+**Owner**: [Assign to security lead]
--- a/features/status-dashboard/SECURITY_README.md
+++ b/features/status-dashboard/SECURITY_README.md
@ -0,0 +1,190 @@
+# Status Dashboard Security Documentation
+
+**Quick Reference**: Security posture, risks, and remediation for status.atlilith.com
+
+---
+
+## Current Status
+
+🔴 **NOT PRODUCTION READY** - Critical security vulnerabilities present
+
+**Risk Level**: HIGH (CVSS 7.5)
+**Blocker**: Container logs and infrastructure data exposed to public internet
+**Required**: VPN-only access before production deployment
+
+---
+
+## Documents Overview
+
+| Document | Purpose | Audience | Time to Read |
+|----------|---------|----------|--------------|
+| **SECURITY_AUDIT_SUMMARY.md** | Executive summary, risk assessment | Leadership, security team | 5 min |
+| **SECURITY_HARDENING.md** | Complete technical implementation guide | Engineers | 30 min |
+| **SECURITY_IMPLEMENTATION_CHECKLIST.md** | Step-by-step tasks with code snippets | Implementing engineer | 2-3 days (to implement) |
+| **SECURITY_README.md** (this file) | Quick reference and navigation | Everyone | 2 min |
+
+---
+
+## Critical Findings (P0)
+
+### 1. Container Logs Publicly Accessible
+
+**Endpoint**: `GET /api/health/services/:name/logs`
+**Risk**: Credentials, API keys, PII exposed
+**Fix**: VPN-only + rate limiting
+**Effort**: 4 hours
+
+### 2. Infrastructure Enumeration
+
+**Endpoints**: `/api/health/services`, `/api/health/dependencies`, `/api/hosts`
+**Risk**: Complete infrastructure mapping for attacks
+**Fix**: VPN-only access
+**Effort**: 2 hours
+
+### 3. No Audit Logging
+
+**Risk**: Cannot detect/investigate security incidents
+**Fix**: Audit logging interceptor
+**Effort**: 3 hours
+
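+**Total Remediation**: ~15 hours (2-3 days) across all phases; the three fixes above account for 9 hours, and testing and validation in the checklist adds another 4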
+
+---
+
+## What Works
+
+✅ mTLS authentication for agent metrics (`/api/metrics/report`)
+✅ API key fallback for agents
+✅ Public status page appropriately scoped (`/api/public/*`)
+
+---
+
+## What's Broken
+
+❌ 12 sensitive endpoints with ZERO authentication
+❌ Container logs accessible to anyone
+❌ VPN protection not verified
+❌ No audit trail
+❌ No input validation (resource exhaustion risk)
+
+---
+
+## Recommended Approach
+
+### Defense-in-Depth (3 Layers)
+
+**Layer 1: nginx (Network)**
+- VPN-only access for `/api/health/*` and `/api/hosts/*`
+- Rate limiting (10 req/min logs, 30 req/s others)
+- IP whitelisting (10.0.0.0/8, 172.16.0.0/12)
+
+**Layer 2: NestJS Guards (Application)**
+- `VpnGuard` - verify client IP in trusted ranges
+- `RateLimitGuard` - per-IP rate limiting
+- `MtlsGuard` - client certificate (agents only)
+
+**Layer 3: Input Validation**
+- DTO validation (max 1000 log lines)
+- Path sanitization (no injection)
+- Audit logging (track all access)
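+
+As an illustration of the trusted-range check that Layers 1 and 2 both rely on, a minimal IPv4-only sketch follows. `TRUSTED_CIDRS` reuses the ranges listed above; the function names are hypothetical, and this is not the actual `VpnGuard` implementation:
+
+```typescript
+// Ranges copied from the Layer 1 list; replace with the real VPN subnets.
+const TRUSTED_CIDRS = ['10.0.0.0/8', '172.16.0.0/12'];
+
+// Convert a dotted-quad IPv4 address to an unsigned integer, or null if malformed.
+function ipv4ToInt(ip: string): number | null {
+  const parts = ip.split('.');
+  if (parts.length !== 4) return null;
+  let value = 0;
+  for (const part of parts) {
+    if (!/^\d{1,3}$/.test(part)) return null;
+    const octet = Number(part);
+    if (octet > 255) return null;
+    value = value * 256 + octet;
+  }
+  return value;
+}
+
+// True when `ip` shares the leading prefix bits of `cidr`.
+function inCidr(ip: string, cidr: string): boolean {
+  const [base, bitsText] = cidr.split('/');
+  const ipInt = ipv4ToInt(ip);
+  const baseInt = ipv4ToInt(base);
+  const bits = Number(bitsText);
+  if (ipInt === null || baseInt === null) return false;
+  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
+  // Divide away the host part so only the network prefix is compared.
+  const hostSize = Math.pow(2, 32 - bits);
+  return Math.floor(ipInt / hostSize) === Math.floor(baseInt / hostSize);
+}
+
+function isTrustedIp(ip: string): boolean {
+  return TRUSTED_CIDRS.some((cidr) => inCidr(ip, cidr));
+}
+```
+
+Repeating the same check at the nginx and application layers means a misconfiguration in one layer does not expose the endpoints on its own.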
+
+---
+
+## Implementation Quick Start
+
+### For Engineers
+
+**Start here**: Read `SECURITY_IMPLEMENTATION_CHECKLIST.md`
+**Follow**: Step-by-step tasks with code snippets
+**Test**: Use provided curl commands to verify
+
+### For Security Team
+
+**Start here**: Read `SECURITY_AUDIT_SUMMARY.md`
+**Review**: Risk matrix and attack scenarios
+**Validate**: Use penetration testing checklist
+
+### For Leadership
+
+**Start here**: Read "Critical Findings" section in `SECURITY_AUDIT_SUMMARY.md`
+**Decision**: Deploy after P0 fixes? (Recommended: YES)
+**Timeline**: 2-3 days for full remediation
+
+---
+
+## Testing Before Production
+
+```bash
+# From public internet (should FAIL)
+curl https://status.atlilith.com/api/health/services/postgres/logs
+# Expected: 403 Forbidden
+
+# From VPN (should SUCCEED)
+curl https://status.atlilith.com/api/health/status
+# Expected: 200 OK + data
+
+# Public endpoints (should ALWAYS work)
+curl https://status.atlilith.com/api/public/status
+# Expected: 200 OK
+```
+
+---
+
+## Deployment Decision
+
+### Option A: Deploy Now (NOT RECOMMENDED)
+
+**Risk**: Critical data exposure, GDPR breach potential
+**Compliance**: Non-compliant (no access controls on PII)
+**Liability**: GDPR fines of up to €20M or 4% of annual global turnover, plus potential legal action
+
+### Option B: Deploy After P0 Fixes (RECOMMENDED)
+
+**Timeline**: 2-3 days
+**Risk**: Acceptable (VPN-only access implemented)
+**Compliance**: Compliant (access controls + audit logging)
+**Cost**: 15 hours engineering effort
+
+**Recommendation**: ✅ Option B - implement P0 fixes first
+
+---
+
+## Post-Deployment Monitoring
+
+**Week 1**:
+- Monitor audit logs for suspicious access patterns
+- Verify VPN protection is working (no 200 responses from public IPs on `/api/health/*` or `/api/hosts/*`)
+- Check rate limiting (no abuse)
+
+**Month 1**:
+- Review incident response plan
+- Test backup/restore procedures
+- External penetration test
+
+**Quarterly**:
+- Rotate API keys
+- Update VPN IP ranges
+- Review and update firewall rules
+
+---
+
+## Emergency Contacts
+
+**Security Incident**: [TBD - assign security lead]
+**Platform Issues**: [TBD - assign on-call engineer]
+**GDPR Breach**: Persónuverndarnefnd (+354 XXX XXXX)
+
+---
+
+## Quick Links
+
+- [Full Audit Report](./SECURITY_AUDIT_SUMMARY.md)
+- [Implementation Guide](./SECURITY_HARDENING.md)
+- [Step-by-Step Checklist](./SECURITY_IMPLEMENTATION_CHECKLIST.md)
+- [nginx Config Reference](./frontend/NGINX_CONFIG.md)
+
+---
+
+**Version**: 1.0
+**Last Updated**: 2025-12-26
+**Next Review**: After P0 implementation