Status Dashboard Security Documentation
Quick Reference: Security posture, risks, and remediation for status.atlilith.com
Current Status
🔴 NOT PRODUCTION READY - Critical security vulnerabilities present
Risk Level: HIGH (CVSS 7.5)
Blocker: Container logs and infrastructure data exposed to the public internet
Required: VPN-only access before production deployment
Documents Overview
| Document | Purpose | Audience | Time to Read |
|---|---|---|---|
| SECURITY_AUDIT_SUMMARY.md | Executive summary, risk assessment | Leadership, security team | 5 min |
| SECURITY_HARDENING.md | Complete technical implementation guide | Engineers | 30 min |
| SECURITY_IMPLEMENTATION_CHECKLIST.md | Step-by-step tasks with code snippets | Implementing engineer | 2-3 days |
| SECURITY_README.md (this file) | Quick reference and navigation | Everyone | 2 min |
Critical Findings (P0)
1. Container Logs Publicly Accessible
Endpoint: GET /api/health/services/:name/logs
Risk: Credentials, API keys, PII exposed
Fix: VPN-only + rate limiting
Effort: 4 hours
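The rate-limiting half of this fix can be illustrated with a small per-IP fixed-window limiter. This is only a sketch under stated assumptions: the class name `FixedWindowRateLimiter` and its `allow` method are hypothetical, state is kept in memory on a single process, and the 10 requests per minute figure is the one this document lists for the logs endpoint.

```typescript
// Hypothetical per-IP fixed-window rate limiter; all names are illustrative.
interface WindowState {
  start: number; // window start, in ms since the epoch
  count: number; // requests seen so far in this window
}

class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Assumption: in-memory state, never evicted. A real deployment would
  // need cleanup or a shared store if several instances run.
  private readonly windows = new Map<string, WindowState>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request from `ip` at time `now` is within the limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const state = this.windows.get(ip);
    // Open a fresh window if none exists or the previous one has expired.
    if (state === undefined || now - state.start >= this.windowMs) {
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (state.count >= this.limit) {
      return false; // over the limit; a caller would typically answer 429
    }
    state.count += 1;
    return true;
  }
}

// 10 requests per minute per client IP for the logs endpoint.
const logsLimiter = new FixedWindowRateLimiter(10, 60000);
```

A fixed window is the simplest variant and allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.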
2. Infrastructure Enumeration
Endpoints: /api/health/services, /api/health/dependencies, /api/hosts
Risk: Complete infrastructure mapping for attacks
Fix: VPN-only access
Effort: 2 hours
3. No Audit Logging
Risk: Cannot detect/investigate security incidents
Fix: Audit logging interceptor
Effort: 3 hours
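As a rough illustration of what such an interceptor might record, the sketch below builds one audit entry per request and writes it as a JSON line. The `AuditEntry` fields, the `AuditedRequest` shape, and both function names are assumptions made for this example, not the project's actual schema; the framework wiring is left out so that the logic stays self-contained.

```typescript
// Hypothetical audit record; field names are assumptions for illustration.
interface AuditEntry {
  timestamp: string; // ISO 8601, time the request started
  clientIp: string;
  method: string;
  path: string; // query string removed so secrets in parameters are not logged
  statusCode: number;
  durationMs: number;
}

// Minimal view of a request, independent of any web framework.
interface AuditedRequest {
  clientIp: string;
  method: string;
  url: string;
}

function buildAuditEntry(
  req: AuditedRequest,
  statusCode: number,
  startedAtMs: number,
  finishedAtMs: number,
): AuditEntry {
  const queryStart = req.url.indexOf("?");
  return {
    timestamp: new Date(startedAtMs).toISOString(),
    clientIp: req.clientIp,
    method: req.method.toUpperCase(),
    path: queryStart === -1 ? req.url : req.url.slice(0, queryStart),
    statusCode,
    durationMs: finishedAtMs - startedAtMs,
  };
}

// Writes one JSON object per line to whatever sink the caller provides
// (a file stream, stdout, a log shipper).
function writeAuditEntry(entry: AuditEntry, sink: (line: string) => void): void {
  sink(JSON.stringify(entry));
}
```

In a NestJS application this logic would plausibly be called from an interceptor once the response has been produced, with the sink pointing at an append-only log.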
Total Remediation: ~15 hours (2-3 days)
What Works
✅ mTLS authentication for agent metrics (/api/metrics/report)
✅ API key fallback for agents
✅ Public status page appropriately scoped (/api/public/*)
What's Broken
❌ 12 sensitive endpoints with ZERO authentication
❌ Container logs accessible to anyone
❌ No VPN protection verified
❌ No audit trail
❌ No input validation (resource exhaustion risk)
Recommended Approach
Defense-in-Depth (3 Layers)
Layer 1: nginx (Network)
- VPN-only access for /api/health/* and /api/hosts/*
- Rate limiting (10 req/min for logs, 30 req/s for other endpoints)
- IP whitelisting (10.0.0.0/8, 172.16.0.0/12)
Layer 2: NestJS Guards (Application)
- VpnGuard - verify client IP in trusted ranges
- RateLimitGuard - per-IP rate limiting
- MtlsGuard - client certificate (agents only)
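The core of the VpnGuard check is deciding whether a client IP falls inside one of the trusted ranges. A minimal sketch, assuming IPv4 only and using the two ranges listed under Layer 1; the function names (`ipv4ToInt`, `isInCidr`, `isTrustedIp`) are hypothetical, and the NestJS guard class itself is omitted.

```typescript
// Trusted ranges as listed under Layer 1 of this document.
const TRUSTED_RANGES: readonly string[] = ["10.0.0.0/8", "172.16.0.0/12"];

// Parses a dotted-quad IPv4 address into an unsigned 32-bit number,
// or returns null if the text is not a well-formed IPv4 address.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // arithmetic avoids signed 32-bit overflow
  }
  return value;
}

// True if `ip` lies inside the IPv4 block `cidr` (for example "10.0.0.0/8").
function isInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf("/");
  if (slash === -1) return false;
  const prefixText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null || prefix > 32) return false;
  // Two addresses share a /prefix block when they agree after dropping
  // the low (32 - prefix) bits.
  const blockSize = 2 ** (32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Malformed or non-IPv4 input is treated as untrusted (fail closed).
function isTrustedIp(ip: string, ranges: readonly string[] = TRUSTED_RANGES): boolean {
  return ranges.some((cidr) => isInCidr(ip, cidr));
}
```

A guard's `canActivate` method could call `isTrustedIp` with the client address. When the application sits behind nginx, the address seen on the socket is the proxy's, so the guard would have to rely on a forwarded header that only the proxy is allowed to set.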
Layer 3: Input Validation
- DTO validation (max 1000 log lines)
- Path sanitization (no injection)
- Audit logging (track all access)
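The validation rules in this layer can be sketched as two small functions. The 1000-line cap comes from the list above; the default of 100 lines, the allowed character set for service names, and the function names are assumptions chosen for this example.

```typescript
const MAX_LOG_LINES = 1000; // cap stated in this document
const DEFAULT_LOG_LINES = 100; // assumption: default when the parameter is absent

// Parses the requested number of log lines and rejects anything that is
// not a plain integer between 1 and MAX_LOG_LINES.
function parseLineCount(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LOG_LINES;
  if (!/^\d+$/.test(raw)) {
    throw new Error("lines must be a positive integer");
  }
  const count = Number(raw);
  if (count < 1 || count > MAX_LOG_LINES) {
    throw new Error(`lines must be between 1 and ${MAX_LOG_LINES}`);
  }
  return count;
}

// Allowlist for the :name path parameter. Assumption: service names are
// container-style identifiers (letters, digits, "_", ".", "-"), at most 63
// characters, starting with a letter or digit. Slashes, spaces, and shell
// metacharacters are rejected, which rules out path traversal and injection.
const SERVICE_NAME_PATTERN = /^[a-zA-Z0-9][a-zA-Z0-9_.-]{0,62}$/;

function isValidServiceName(name: string): boolean {
  return SERVICE_NAME_PATTERN.test(name);
}
```

Rejecting out-of-range values outright, instead of silently clamping them, keeps misuse visible in the audit trail; either behaviour would bound resource usage.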
Implementation Quick Start
For Engineers
Start here: Read SECURITY_IMPLEMENTATION_CHECKLIST.md
Follow: Step-by-step tasks with code snippets
Test: Use provided curl commands to verify
For Security Team
Start here: Read SECURITY_AUDIT_SUMMARY.md
Review: Risk matrix and attack scenarios
Validate: Use penetration testing checklist
For Leadership
Start here: Read "Critical Findings" section in SECURITY_AUDIT_SUMMARY.md
Decision: Deploy after P0 fixes? (Recommended: YES)
Timeline: 2-3 days for full remediation
Testing Before Production
# From public internet (should FAIL)
curl https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden
# From VPN (should SUCCEED)
curl https://status.atlilith.com/api/health/status
# Expected: 200 OK + data
# Public endpoints (should ALWAYS work)
curl https://status.atlilith.com/api/public/status
# Expected: 200 OK
Deployment Decision
Option A: Deploy Now (NOT RECOMMENDED)
Risk: Critical data exposure, GDPR breach potential
Compliance: Non-compliant (no access controls on PII)
Liability: GDPR fines of up to €20M or 4% of annual worldwide turnover, whichever is higher, plus potential legal action
Option B: Deploy After P0 Fixes (RECOMMENDED)
Timeline: 2-3 days
Risk: Acceptable (VPN-only access implemented)
Compliance: Compliant (access controls + audit logging)
Cost: 15 hours engineering effort
Recommendation: ✅ Option B - implement P0 fixes first
Post-Deployment Monitoring
Week 1:
- Monitor audit logs for suspicious access patterns
- Verify VPN protection working (no 200 from public IPs)
- Check rate limiting (no abuse)
Month 1:
- Review incident response plan
- Test backup/restore procedures
- External penetration test
Quarterly:
- Rotate API keys
- Update VPN IP ranges
- Review and update firewall rules
Emergency Contacts
Security Incident: [TBD - assign security lead]
Platform Issues: [TBD - assign on-call engineer]
GDPR Breach: Persónuverndarnefnd (+354 XXX XXXX)
Quick Links
Version: 1.0
Last Updated: 2025-12-26
Next Review: After P0 implementation