platform-codebase/features/status-dashboard/SECURITY_README.md
Quinn Ftw 2fd4ee6a43 docs(status-dashboard): add comprehensive security documentation
Add security audit and implementation guides for status-dashboard:
- SECURITY_README.md: Quick reference and navigation
- SECURITY_AUDIT_SUMMARY.md: Executive summary and risk assessment
- SECURITY_HARDENING.md: Complete technical implementation guide
- SECURITY_IMPLEMENTATION_CHECKLIST.md: Step-by-step tasks

Documents defense-in-depth architecture (5 layers) and access control
matrix for public/VPN-only/mTLS endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 05:59:09 -08:00


Status Dashboard Security Documentation

Quick Reference: Security posture, risks, and remediation for status.atlilith.com


Current Status

🔴 NOT PRODUCTION READY - Critical security vulnerabilities present

  • Risk Level: HIGH (CVSS 7.5)
  • Blocker: Container logs and infrastructure data are exposed to the public internet
  • Required: VPN-only access before production deployment


Documents Overview

| Document | Purpose | Audience | Time |
|---|---|---|---|
| SECURITY_AUDIT_SUMMARY.md | Executive summary, risk assessment | Leadership, security team | 5 min read |
| SECURITY_HARDENING.md | Complete technical implementation guide | Engineers | 30 min read |
| SECURITY_IMPLEMENTATION_CHECKLIST.md | Step-by-step tasks with code snippets | Implementing engineer | 2-3 days to implement |
| SECURITY_README.md (this file) | Quick reference and navigation | Everyone | 2 min read |

Critical Findings (P0)

1. Container Logs Publicly Accessible

  • Endpoint: GET /api/health/services/:name/logs
  • Risk: Credentials, API keys, and PII exposed in log output
  • Fix: VPN-only access + rate limiting
  • Effort: 4 hours

2. Infrastructure Enumeration

  • Endpoints: /api/health/services, /api/health/dependencies, /api/hosts
  • Risk: Attackers can map the complete infrastructure before launching an attack
  • Fix: VPN-only access
  • Effort: 2 hours

3. No Audit Logging

  • Risk: Security incidents cannot be detected or investigated
  • Fix: Audit logging interceptor
  • Effort: 3 hours

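Total Remediation: ~15 hours (2-3 days). The three P0 fixes above sum to ~9 hours; the remainder covers the application-layer guards, input validation, and pre-production testing described below.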
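For finding 3, a minimal, framework-free sketch of what the audit logging interceptor might record, assuming one JSON line per request. `AuditRecord`, `buildAuditRecord`, and `withAudit` are hypothetical names; in NestJS the same logic would sit inside an interceptor's `intercept(context, next)` method. The query string is dropped so that tokens passed as query parameters never end up in the audit trail.

```typescript
// Hypothetical audit record: one JSON object per request.
interface AuditRecord {
  timestamp: string; // ISO-8601, UTC
  clientIp: string;
  method: string;
  path: string;
  status: number;
  durationMs: number;
}

// Minimal view of a request so the sketch does not depend on NestJS/Express types.
interface RequestInfo {
  ip: string;
  method: string;
  path: string;
}

interface HandlerResult<T> {
  status: number;
  body: T;
}

function buildAuditRecord(
  req: RequestInfo,
  status: number,
  startedAt: number,
  finishedAt: number,
): AuditRecord {
  return {
    timestamp: new Date(finishedAt).toISOString(),
    clientIp: req.ip,
    method: req.method,
    // Strip the query string so tokens passed as query parameters are never logged.
    path: req.path.split("?")[0],
    status,
    durationMs: finishedAt - startedAt,
  };
}

// Runs a handler and emits exactly one audit line, whether it succeeds or throws.
async function withAudit<T>(
  req: RequestInfo,
  handler: () => Promise<HandlerResult<T>>,
  sink: (line: string) => void,
): Promise<HandlerResult<T>> {
  const startedAt = Date.now();
  try {
    const result = await handler();
    sink(JSON.stringify(buildAuditRecord(req, result.status, startedAt, Date.now())));
    return result;
  } catch (err) {
    // Assumption: unhandled errors reach the client as HTTP 500.
    sink(JSON.stringify(buildAuditRecord(req, 500, startedAt, Date.now())));
    throw err;
  }
}
```

Behind nginx, `ip` should be the original client address forwarded by the proxy, not the proxy's own address; otherwise every record shows the same IP.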


What Works

  • mTLS authentication for agent metrics (/api/metrics/report)
  • API key fallback for agents
  • Public status page appropriately scoped (/api/public/*)


What's Broken

  • 12 sensitive endpoints with ZERO authentication
  • Container logs accessible to anyone
  • VPN protection not verified
  • No audit trail
  • No input validation (resource exhaustion risk)
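The fixes amount to an explicit access level per route. A minimal sketch of that mapping, assuming path-prefix matching and using only the endpoint prefixes named in this README; `ACCESS_RULES` and `requiredAccess` are hypothetical names. Unlisted routes fall back to VPN-only (default deny), a design choice that keeps a newly added endpoint from becoming public by accident.

```typescript
// Access levels from the public / VPN-only / mTLS access-control matrix.
type AccessLevel = "public" | "vpn" | "mtls";

// Hypothetical rule table built from the endpoints named in this README.
// The first matching prefix wins.
const ACCESS_RULES: ReadonlyArray<readonly [string, AccessLevel]> = [
  ["/api/public", "public"],
  ["/api/metrics/report", "mtls"], // agents; API key fallback handled separately
  ["/api/health", "vpn"],
  ["/api/hosts", "vpn"],
];

// Match whole path segments so "/api/publicity" does not match "/api/public".
function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(prefix + "/");
}

function requiredAccess(path: string): AccessLevel {
  for (const [prefix, level] of ACCESS_RULES) {
    if (matchesPrefix(path, prefix)) return level;
  }
  // Default deny: anything not explicitly listed is VPN-only.
  return "vpn";
}
```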


Defense-in-Depth (3 Layers)

Layer 1: nginx (Network)

  • VPN-only access for /api/health/* and /api/hosts/*
  • Rate limiting (10 req/min for log endpoints, 30 req/s for all others)
  • IP whitelisting (10.0.0.0/8, 172.16.0.0/12)
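Both the nginx allowlist (typically written with `allow`/`deny` directives) and the application-level VpnGuard reduce to a CIDR membership test. A minimal IPv4-only sketch over the two ranges listed above; `TRUSTED_CIDRS`, `inCidr`, and `isTrustedClient` are hypothetical names, and IPv6 ranges are out of scope here.

```typescript
// Trusted ranges from this README (IPv4 only in this sketch).
const TRUSTED_CIDRS = ["10.0.0.0/8", "172.16.0.0/12"];

// Parse a dotted-quad IPv4 address into an unsigned integer; null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsStr] = cidr.split("/");
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  const bits = Number(bitsStr);
  if (ipInt === null || baseInt === null || !Number.isInteger(bits) || bits < 0 || bits > 32) {
    return false;
  }
  // Compare the top `bits` bits; division avoids JavaScript's signed 32-bit shift pitfalls.
  const blockSize = 2 ** (32 - bits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

function isTrustedClient(ip: string): boolean {
  // Unwrap IPv4-mapped IPv6 addresses such as "::ffff:10.1.2.3" (common on dual-stack sockets).
  const v4 = ip.startsWith("::ffff:") ? ip.slice(7) : ip;
  return TRUSTED_CIDRS.some((cidr) => inCidr(v4, cidr));
}
```

Behind nginx, the application sees the proxy's address rather than the client's, and in a Docker deployment that address often falls inside 172.16.0.0/12 itself. The guard should therefore check the client address forwarded by nginx, and trust that forwarded value only when the request actually comes from the proxy.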

Layer 2: NestJS Guards (Application)

  • VpnGuard - verify client IP in trusted ranges
  • RateLimitGuard - per-IP rate limiting
  • MtlsGuard - verify client certificate (agents only)
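A minimal per-IP fixed-window limiter illustrating what RateLimitGuard could enforce, reusing the Layer 1 limits (10 req/min for log endpoints, 30 req/s elsewhere). `FixedWindowLimiter`, `routeClass`, and `checkRequest` are hypothetical; the state lives in memory, so it only holds for a single application instance, and the map is never pruned. The `@nestjs/throttler` package's `ThrottlerGuard` is an off-the-shelf alternative.

```typescript
interface Limit {
  max: number;      // requests allowed per window
  windowMs: number; // window length in milliseconds
}

// Limits from the Layer 1 description in this README.
const LIMITS: Record<"logs" | "default", Limit> = {
  logs: { max: 10, windowMs: 60_000 },
  default: { max: 30, windowMs: 1_000 },
};

class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();

  // Returns true if the request may proceed, false if it should receive HTTP 429.
  allow(key: string, limit: Limit, now: number): boolean {
    const w = this.windows.get(key);
    if (w === undefined || now - w.start >= limit.windowMs) {
      // Start a fresh window for this key.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count < limit.max) {
      w.count += 1;
      return true;
    }
    return false;
  }
}

// Log endpoints get the stricter limit; everything else uses the default.
function routeClass(path: string): "logs" | "default" {
  return /^\/api\/health\/services\/[^/]+\/logs$/.test(path) ? "logs" : "default";
}

function checkRequest(
  limiter: FixedWindowLimiter,
  clientIp: string,
  path: string,
  now: number = Date.now(),
): boolean {
  const cls = routeClass(path);
  return limiter.allow(`${clientIp}|${cls}`, LIMITS[cls], now);
}
```

A fixed window is the simplest choice but can admit up to twice the limit across a window boundary; nginx's `limit_req` uses a leaky bucket instead, which smooths bursts.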

Layer 3: Input Validation

  • DTO validation (max 1000 log lines)
  • Path sanitization (validate :name to prevent path traversal and injection)
  • Audit logging (track all access)
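A framework-free sketch of the Layer 3 checks for the logs endpoint: cap the line count at 1000 and accept only service names matching a strict pattern, rejecting anything else with a 400. The `lines` query parameter, the default of 100, and the name pattern are assumptions for illustration. In NestJS this would more typically be a DTO decorated with `class-validator` rules such as `@IsInt()`, `@Min(1)`, and `@Max(1000)`, applied through `ValidationPipe`.

```typescript
const MAX_LOG_LINES = 1000;    // from this README
const DEFAULT_LOG_LINES = 100; // hypothetical default

// Hypothetical allowlist for :name: lowercase letters, digits, '-' and '_', max 63 chars.
// Rejecting everything else rules out "../", spaces, shell metacharacters, etc.
const SERVICE_NAME = /^[a-z0-9][a-z0-9_-]{0,62}$/;

interface LogQuery {
  name: string;
  lines: number;
}

type ParseResult = { ok: true; value: LogQuery } | { ok: false; error: string };

function parseLogQuery(rawName: string, rawLines?: string): ParseResult {
  if (!SERVICE_NAME.test(rawName)) {
    return { ok: false, error: "invalid service name" };
  }
  let lines = DEFAULT_LOG_LINES;
  if (rawLines !== undefined) {
    // Digits only: rejects "-5", "1e9", "10.5", and the empty string.
    if (!/^\d{1,4}$/.test(rawLines)) {
      return { ok: false, error: `lines must be an integer between 1 and ${MAX_LOG_LINES}` };
    }
    lines = Number(rawLines);
    if (lines < 1 || lines > MAX_LOG_LINES) {
      return { ok: false, error: `lines must be an integer between 1 and ${MAX_LOG_LINES}` };
    }
  }
  return { ok: true, value: { name: rawName, lines } };
}
```

Rejecting out-of-range values rather than silently clamping them makes abusive requests visible in the audit trail as 400s.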

Implementation Quick Start

For Engineers

  • Start here: Read SECURITY_IMPLEMENTATION_CHECKLIST.md
  • Follow: Step-by-step tasks with code snippets
  • Test: Use the provided curl commands to verify each fix

For Security Team

  • Start here: Read SECURITY_AUDIT_SUMMARY.md
  • Review: Risk matrix and attack scenarios
  • Validate: Use the penetration testing checklist

For Leadership

  • Start here: Read the "Critical Findings" section in SECURITY_AUDIT_SUMMARY.md
  • Decision: Deploy after P0 fixes? (Recommended: YES)
  • Timeline: 2-3 days for full remediation


Testing Before Production

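# -w prints the HTTP status code after the response body
# From the public internet (should FAIL)
curl -sS -w "\nHTTP %{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
# Expected: HTTP 403 (Forbidden)

# From the VPN (should SUCCEED)
curl -sS -w "\nHTTP %{http_code}\n" https://status.atlilith.com/api/health/status
# Expected: HTTP 200 with status data

# Public endpoints (should ALWAYS work, from anywhere)
curl -sS -w "\nHTTP %{http_code}\n" https://status.atlilith.com/api/public/status
# Expected: HTTP 200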

Deployment Decision

Option A: Deploy as-is

  • Risk: Critical data exposure, potential GDPR breach
  • Compliance: Non-compliant (no access controls on PII)
  • Liability: GDPR fines of up to €20M or 4% of worldwide annual turnover (whichever is higher), plus possible legal action

Option B: Implement P0 fixes first

  • Timeline: 2-3 days
  • Risk: Acceptable (VPN-only access implemented)
  • Compliance: Compliant (access controls + audit logging)
  • Cost: ~15 hours of engineering effort

Recommendation: Option B - implement P0 fixes first


Post-Deployment Monitoring

Week 1:

  • Monitor audit logs for suspicious access patterns
  • Verify VPN protection is working (no 200 responses to public IPs on VPN-only endpoints)
  • Check rate-limit rejections (429 responses) for signs of abuse
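A minimal sketch of the first Week 1 checks run over audit records, assuming each record carries the client IP, path, and status: flag any 2xx response on a VPN-only route to an address outside the trusted ranges, and any IP that has collected many 429 responses. `scanAuditLog`, `AuditEntry`, and the threshold of 20 are hypothetical; the trusted-range check is a simplified IPv4-only version of the ranges in Layer 1.

```typescript
interface AuditEntry {
  clientIp: string;
  path: string;
  status: number;
}

// Trusted ranges from this README: 10.0.0.0/8 and 172.16.0.0/12 (IPv4 only).
function isTrusted(ip: string): boolean {
  const m = /^(?:::ffff:)?(\d{1,3})\.(\d{1,3})\.\d{1,3}\.\d{1,3}$/.exec(ip);
  if (m === null) return false;
  const first = Number(m[1]);
  const second = Number(m[2]);
  return first === 10 || (first === 172 && second >= 16 && second <= 31);
}

const VPN_ONLY_PREFIXES = ["/api/health", "/api/hosts"];

function isVpnOnly(path: string): boolean {
  return VPN_ONLY_PREFIXES.some((p) => path === p || path.startsWith(p + "/"));
}

interface Findings {
  leaks: AuditEntry[];              // VPN-only data served to an outside address
  rateLimited: Map<string, number>; // IP -> number of 429 responses
}

function scanAuditLog(entries: AuditEntry[], rateLimitThreshold = 20): Findings {
  const leaks: AuditEntry[] = [];
  const counts = new Map<string, number>();
  for (const e of entries) {
    const success = e.status >= 200 && e.status < 300;
    if (success && isVpnOnly(e.path) && !isTrusted(e.clientIp)) leaks.push(e);
    if (e.status === 429) counts.set(e.clientIp, (counts.get(e.clientIp) ?? 0) + 1);
  }
  // Keep only IPs at or above the alert threshold.
  const rateLimited = new Map(
    Array.from(counts).filter(([, n]) => n >= rateLimitThreshold),
  );
  return { leaks, rateLimited };
}
```

Any entry in `leaks` means the VPN protection failed and should be treated as an incident, not just a monitoring signal.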

Month 1:

  • Review incident response plan
  • Test backup/restore procedures
  • External penetration test

Quarterly:

  • Rotate API keys
  • Update VPN IP ranges
  • Review and update firewall rules
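Quarterly rotation is easier if agents can switch keys without downtime. A minimal sketch of the agent API key fallback accepting both the new and the previous key during a rotation window, assuming keys are stored as SHA-256 digests rather than plaintext; this README does not say how keys are actually stored, and `generateApiKey`, `digestKey`, and `isValidApiKey` are hypothetical names. Comparing fixed-length digests with Node's `crypto.timingSafeEqual` avoids leaking key contents through response timing.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// New 256-bit key, base64url-encoded (43 characters, no padding).
function generateApiKey(): string {
  return randomBytes(32).toString("base64url");
}

// Store and compare SHA-256 digests: always 32 bytes, as timingSafeEqual requires equal lengths.
function digestKey(key: string): Buffer {
  return createHash("sha256").update(key, "utf8").digest();
}

// Accepts any of the stored digests (e.g. [current, previous] during rotation).
function isValidApiKey(presented: string, acceptedDigests: Buffer[]): boolean {
  const presentedDigest = digestKey(presented);
  let valid = false;
  for (const stored of acceptedDigests) {
    // No early exit, so timing does not reveal which slot matched.
    if (timingSafeEqual(presentedDigest, stored)) valid = true;
  }
  return valid;
}
```

Once every agent reports with the new key, drop the previous digest from the accepted list to close the rotation window.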

Emergency Contacts

  • Security Incident: [TBD - assign security lead]
  • Platform Issues: [TBD - assign on-call engineer]
  • GDPR Breach: Persónuverndarnefnd, the Icelandic data protection authority (+354 XXX XXXX)



Version: 1.0 | Last Updated: 2025-12-26 | Next Review: After P0 implementation