docs(status-dashboard): add comprehensive security documentation
Add security audit and implementation guides for status-dashboard:

- SECURITY_README.md: Quick reference and navigation
- SECURITY_AUDIT_SUMMARY.md: Executive summary and risk assessment
- SECURITY_HARDENING.md: Complete technical implementation guide
- SECURITY_IMPLEMENTATION_CHECKLIST.md: Step-by-step tasks

Documents defense-in-depth architecture (5 layers) and access control matrix for public/VPN-only/mTLS endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
parent 327cacd035
commit 2fd4ee6a43
4 changed files with 2471 additions and 0 deletions
344
features/status-dashboard/SECURITY_AUDIT_SUMMARY.md
Normal file
@@ -0,0 +1,344 @@
# Status Dashboard Security Audit - Executive Summary

**Date**: 2025-12-26
**Audited System**: status.atlilith.com (status-dashboard feature)
**Overall Risk**: 🔴 HIGH (multiple critical exposures)

---

## Critical Findings

### 1. Container Logs Publicly Accessible (CRITICAL)

**Endpoint**: `GET /api/health/services/:name/logs`
**Current State**: NO AUTHENTICATION
**Risk**: Credentials, API keys, stack traces, PII exposed to the internet

**Attack Example**:
```bash
curl "https://status.atlilith.com/api/health/services/lilith-platform-postgres/logs?lines=1000"
# Returns database logs which may contain:
# - Failed login attempts (usernames/passwords)
# - Connection strings with credentials
# - SQL queries with user data
```

**Impact**: GDPR breach, credential compromise, privilege escalation

**Fix Priority**: 🔴 P0 (MUST fix before production)

**Recommended Fix**:
- nginx: VPN-only access
- Application: VpnGuard + RateLimitGuard
- Maximum 1000 lines per request (default 100)

---

### 2. Infrastructure Enumeration (HIGH)

**Endpoints**:
- `GET /api/health/services` (all Docker containers)
- `GET /api/health/dependencies` (service graph)
- `GET /api/health/build-info` (git commit + branch)
- `GET /api/hosts` (all host metrics)

**Current State**: NO AUTHENTICATION
**Risk**: Complete infrastructure mapping for targeted attacks

**Attack Scenario**:
1. Attacker discovers PostgreSQL version from `/api/health/services`
2. Finds known CVE for that version
3. Uses `/api/health/dependencies` to identify dependent services
4. Plans attack path through dependency chain

**Impact**: Increased attack surface, exploit version matching, DDoS planning

**Fix Priority**: 🔴 P0 (MUST fix before production)

**Recommended Fix**: VPN-only access for all `/api/health/*` and `/api/hosts/*`

---

### 3. Real-Time Operational Intelligence (MEDIUM)

**Endpoints**:
- `GET /api/health/events` (Docker start/stop/kill events)
- `GET /api/health/resources` (CPU/RAM/disk usage)

**Current State**: NO AUTHENTICATION
**Risk**: Attacker monitors infrastructure state in real-time

**Attack Scenario**:
1. Attacker watches `/api/health/events` continuously
2. Notices database restarts frequently (unstable)
3. Times attack during restart window (service degradation)

**Impact**: Attack timing optimization, service disruption

**Fix Priority**: 🔴 P0 (MUST fix before production)

**Recommended Fix**: VPN-only access

---

## Current Security Posture

### What Works ✅

**mTLS for Agent Metrics**:
- `POST /api/metrics/report` requires client certificate OR API key
- Host identity validation (CN must match metrics.hostId)
- Prevents metric spoofing

**Public Status Page**:
- `GET /api/public/status` intentionally public
- Limited data exposure (overall platform status only)
- Appropriate for public-facing status page

### What's Broken ❌

**No Network Protection**:
- nginx config references VPN-only access BUT not verified
- Unknown if firewall rules exist
- No IP whitelisting confirmed

**No Application Guards**:
- 12 sensitive endpoints have ZERO authentication
- No VpnGuard, no AdminGuard, no RateLimitGuard
- Defense-in-depth missing

**No Audit Logging**:
- Cannot track who accessed container logs
- Cannot detect suspicious access patterns
- Incident response severely limited

**No Input Validation**:
- `/api/health/services/:name/logs?lines=999999` (resource exhaustion)
- Path parameters not sanitized (injection risk)

---

## Risk Matrix

| Endpoint | Data Sensitivity | Current Protection | Risk Level | Recommended Protection |
|----------|------------------|-------------------|------------|------------------------|
| `/api/health/services/:name/logs` | 🔴 CRITICAL | None | 🔴 CRITICAL | VPN + Auth + Rate Limit |
| `/api/health/services` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
| `/api/health/dependencies` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
| `/api/health/build-info` | 🟡 MEDIUM | None | 🟡 MEDIUM | VPN + Auth |
| `/api/hosts` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
| `/api/hosts/:id` | 🟠 HIGH | None | 🟠 HIGH | VPN + Auth |
| `/api/health/events` | 🟡 MEDIUM | None | 🟡 MEDIUM | VPN + Auth |
| `/api/health/resources` | 🟡 MEDIUM | None | 🟡 MEDIUM | VPN + Auth |
| `/api/metrics/report` | 🟢 LOW | mTLS + API Key | 🟢 LOW | Current OK |
| `/api/public/*` | 🟢 LOW | None (public) | 🟢 LOW | Current OK |

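
The risk matrix can also be kept as data, so that tests and guards share one source of truth. The following is a minimal sketch under assumptions: the names `Protection`, `ACCESS_RULES`, and `protectionFor` are hypothetical and not part of the codebase, and the deny-by-default fallback for unlisted paths is a design choice of this sketch, not something the audit specifies.

```typescript
// Hypothetical names throughout; the patterns mirror the Risk Matrix rows.
type Protection = "public" | "mtls" | "vpn" | "vpn+rate-limit";

interface AccessRule {
  pattern: RegExp;
  protection: Protection;
}

// Ordered most-specific first; the first matching rule wins.
const ACCESS_RULES: AccessRule[] = [
  { pattern: /^\/api\/health\/services\/[^/]+\/logs$/, protection: "vpn+rate-limit" },
  { pattern: /^\/api\/health\//, protection: "vpn" },
  { pattern: /^\/api\/hosts(\/|$)/, protection: "vpn" },
  { pattern: /^\/api\/metrics\/report$/, protection: "mtls" },
  { pattern: /^\/api\/public\//, protection: "public" },
];

// Paths that match no rule fall back to VPN-only (deny by default, an assumption).
function protectionFor(path: string): Protection {
  const rule = ACCESS_RULES.find((r) => r.pattern.test(path));
  return rule ? rule.protection : "vpn";
}
```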
---

## Immediate Action Items (Before Production)

### P0: Critical (Deploy before launch)

1. **Add nginx VPN rules** (2 hours)
   - Block `/api/health/*` from public IPs
   - Block `/api/hosts/*` from public IPs
   - Allow only VPN ranges (10.0.0.0/8, 172.16.0.0/12)

2. **Implement VpnGuard** (4 hours)
   - Create `VpnGuard` class
   - Apply to `HostsController`
   - Apply to `StatusController`
   - Test with public IP (should fail)
   - Test with VPN IP (should succeed)

3. **Add audit logging** (3 hours)
   - Create `AuditLoggingInterceptor`
   - Apply to sensitive controllers
   - Configure log output (JSON format for SIEM)

4. **Input validation** (2 hours)
   - Create `LogsQueryDto` (max 1000 lines)
   - Create `ContainerNameDto` (alphanumeric only)
   - Apply to endpoints

5. **Security testing** (4 hours)
   - Write access control tests
   - Manual penetration test from public IP
   - Manual penetration test from VPN IP
   - Rate limit testing

**Total Effort**: ~15 hours (2 days)

---

## Defense-in-Depth Strategy

### Layer 1: Network (nginx + Firewall)
- VPN-only access for `/api/health/*` and `/api/hosts/*`
- IP whitelisting (10.0.0.0/8, 172.16.0.0/12)
- Rate limiting (10 req/min for logs, 30 req/s for other endpoints)

### Layer 2: Application (NestJS Guards)
- `VpnGuard`: Verify client IP in trusted ranges
- `MtlsGuard`: Verify client certificate (agents only)
- `ApiKeyGuard`: Fallback authentication (agents only)
- `RateLimitGuard`: Per-IP rate limiting (critical endpoints)

### Layer 3: Input Validation
- DTO validation with class-validator
- Path parameter sanitization (no injection)
- Query parameter limits (max lines, max size)

### Layer 4: Audit Logging
- Log all access to sensitive endpoints
- Include: IP, user agent, timestamp, response status
- JSON format for SIEM integration
- 90-day retention for security logs

### Layer 5: Incident Response
- Automated alerting (>10 failed auth/min, >50 403/hour)
- IP blocking procedures (temporary + permanent)
- Secret rotation procedures
- GDPR breach notification plan

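
The Layer 5 alert thresholds (>10 failed auth/min, >50 403/hour) amount to counting events in a sliding time window. The sketch below illustrates that idea only; `ThresholdAlert` is a hypothetical helper, the clock is passed in explicitly to keep it testable, and wiring it to a real alerting channel is left open.

```typescript
// Hypothetical helper: counts events inside a sliding window and reports
// when the count exceeds the configured limit.
class ThresholdAlert {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records one event at time `now` (milliseconds). Returns true when the
  // number of events inside the window is greater than the limit.
  record(now: number): boolean {
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    this.timestamps.push(now);
    return this.timestamps.length > this.limit;
  }
}

// Thresholds taken from the Layer 5 list above.
const failedAuthAlert = new ThresholdAlert(10, 60 * 1000);        // >10 failed auth/min
const forbiddenAlert = new ThresholdAlert(50, 60 * 60 * 1000);    // >50 403/hour
```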
---

## Testing Validation

**Before marking "PRODUCTION READY"**:

```bash
# 1. Test from public internet (should FAIL)
curl https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

curl https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

# 2. Test from VPN (should SUCCEED)
# (Connect to VPN first)
curl https://status.atlilith.com/api/health/status
# Expected: 200 OK + JSON data

curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=50"
# Expected: 200 OK + logs

# 3. Test public endpoints (should ALWAYS work)
curl https://status.atlilith.com/api/public/status
# Expected: 200 OK + public status

# 4. Test rate limiting (should BLOCK after limit)
# Print only the status code so the result is readable
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
done
# Expected: requests up to the configured limit succeed, rest get 429 Too Many Requests
# (nginx limit_req answers 503 unless limit_req_status 429 is configured)

# 5. Test input validation (should REJECT)
curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=999999"
# Expected: 400 Bad Request (exceeds max 1000)

# curl collapses ../ segments unless --path-as-is is given, so a traversal URL
# never reaches the name validator; probe with an encoded invalid character instead
curl "https://status.atlilith.com/api/health/services/bad%3Bname/logs"
# Expected: 400 Bad Request (invalid container name)
```

---

## Compliance Impact

### GDPR Considerations

**Personal Data at Risk**:
- Container logs may contain user IPs, emails, user IDs
- Access logs contain client IPs
- Database logs may contain query parameters with PII

**Current Status**: 🔴 NON-COMPLIANT
- No access controls on PII-containing endpoints
- No audit trail (cannot prove who accessed what)
- No data minimization (logs return full output)

**After Hardening**: 🟢 COMPLIANT
- VPN-only access (only authorized personnel)
- Audit logging (track all PII access)
- Data minimization (max 1000 lines, no unbounded queries)

### Breach Notification Trigger

**IF**:
1. Unauthorized access to `/api/health/services/:name/logs` detected
2. AND logs contain personal data (user emails, IPs, names)
3. AND >50 users potentially affected

**THEN**:
- Notify Persónuverndarnefnd within 72 hours
- Notify affected users without undue delay
- Document incident (what, when, who, impact, remediation)

---

## Long-Term Roadmap

### Month 1: Zero-Trust Foundation
- JWT-based admin authentication
- Role-based access control (admin, viewer, agent)
- Session management with Redis
- MFA for admin accounts

### Month 2-3: Advanced Monitoring
- SIEM integration (Grafana Loki + alerts)
- Automated threat detection (ML-based anomalies)
- WAF deployment (ModSecurity or Cloudflare)
- DDoS protection (rate limiting + fail2ban)

### Quarter 2: Compliance & Certification
- External penetration test
- SOC 2 Type II audit preparation
- ISO 27001 gap analysis
- Bug bounty program

---

## Cost-Benefit Analysis

### Cost of Implementation (P0 items)
- Engineering time: 15 hours (~2 days)
- Testing time: 4 hours
- Documentation: 2 hours
- **Total**: ~3 days of engineering effort

### Cost of NOT Implementing
- **Data breach**: GDPR fines of up to €20M or 4% of global annual turnover, whichever is higher
- **Credential compromise**: Full infrastructure takeover
- **Reputational damage**: Loss of user trust, platform credibility
- **Legal liability**: Lawsuits from affected users
- **Incident response**: Weeks of engineering time + external consultants

**ROI**: 3 days of work prevents catastrophic breach

---

## Recommended Immediate Action

**STOP production deployment** until P0 items completed:

1. nginx VPN rules deployed
2. VpnGuard implemented
3. Security tests passing
4. Manual penetration test from public IP confirms all sensitive endpoints blocked

**Estimated Timeline**: 2-3 days for full P0 implementation + testing

**Deployment Decision**:
- ❌ **DO NOT deploy** without P0 fixes (unacceptable risk)
- ✅ **OK to deploy** after P0 fixes (acceptable residual risk with VPN protection)

---

**Prepared by**: Security Infrastructure Agent (Claude)
**Reviewed by**: [Pending - Venus/Lilith]
**Next Review**: After P0 implementation (before production)

**Full Details**: See `SECURITY_HARDENING.md` for complete implementation guide
1046
features/status-dashboard/SECURITY_HARDENING.md
Normal file
File diff suppressed because it is too large
891
features/status-dashboard/SECURITY_IMPLEMENTATION_CHECKLIST.md
Normal file
@@ -0,0 +1,891 @@
# Security Hardening Implementation Checklist

**Priority**: 🔴 P0 - Required before production deployment
**Estimated Time**: 2-3 days
**Status**: ⚠️ NOT STARTED

---

## Phase 1: nginx Network Protection (4 hours)

### Step 1.1: Add Rate Limiting Zones

**File**: `/etc/nginx/nginx.conf` (http block)

```nginx
http {
    # ... existing config ...

    # Rate limiting zones
    limit_req_zone $binary_remote_addr zone=api_public:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=api_internal:10m rate=30r/s;
    limit_req_zone $ssl_client_s_dn zone=agent_upload:10m rate=2r/m;
    # 10 req/min, matching the logs limit stated in the audit summary
    limit_req_zone $binary_remote_addr zone=logs_access:10m rate=10r/m;

    # Reject over-limit requests with 429 (nginx defaults to 503)
    limit_req_status 429;
}
```

**Checklist**:
- [ ] Edit `/etc/nginx/nginx.conf`
- [ ] Add limit_req_zone directives
- [ ] Add `limit_req_status 429`
- [ ] Test: `sudo nginx -t`
- [ ] Reload: `sudo systemctl reload nginx`

---

### Step 1.2: Update status.atlilith.com Config

**File**: `/etc/nginx/sites-available/status.atlilith.com`

**Add these blocks at the top of the file, outside the `server { }` block** (`geo` and `map` are only valid in the `http` context):

```nginx
# Trusted IP ranges (VPN)
geo $trusted_ip {
    default 0;
    10.0.0.0/8 1;      # VPN range
    172.16.0.0/12 1;   # VPN range 2
    # Add your actual VPN IPs here
}

# Agent mTLS authentication
map $ssl_client_verify $agent_authenticated {
    "SUCCESS" 1;
    default 0;
}
```

**Replace existing `/api` location block with**:

```nginx
# ====================================================================
# PUBLIC ENDPOINTS (no authentication)
# ====================================================================

location ~ ^/api/public/(status|domains)$ {
    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_public burst=20 nodelay;
}

# ====================================================================
# AGENT ENDPOINTS (mTLS required)
# ====================================================================

location = /api/metrics/report {
    if ($agent_authenticated = 0) {
        return 401;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-SSL-Client-Verify $ssl_client_verify;
    proxy_set_header X-SSL-Client-S-DN $ssl_client_s_dn;

    limit_req zone=agent_upload burst=5 nodelay;
}

# ====================================================================
# PROTECTED ENDPOINTS (VPN-only)
# ====================================================================

location ~ ^/api/hosts {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_internal burst=30 nodelay;
}

# Prefix match (not regex): nginx uses the first matching regex location in
# file order, so a regex here would shadow the stricter logs location below.
location /api/health/ {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_internal burst=30 nodelay;
}

# ====================================================================
# CRITICAL ENDPOINTS (Extra protection)
# ====================================================================

location ~ ^/api/health/services/[^/]+/logs$ {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=logs_access burst=3 nodelay;
}
```

**Checklist**:
- [ ] Edit `/etc/nginx/sites-available/status.atlilith.com`
- [ ] Add geo and map blocks
- [ ] Replace /api location blocks
- [ ] **IMPORTANT**: Update VPN IP ranges to actual values
- [ ] Test: `sudo nginx -t`
- [ ] Reload: `sudo systemctl reload nginx`

---

### Step 1.3: Test nginx Protection

**From public internet** (should FAIL):
```bash
# Test VPN-protected endpoint
curl -v https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden
```

**From VPN** (should SUCCEED):
```bash
# Connect to VPN first
curl -v https://status.atlilith.com/api/health/status
# Expected: 200 OK + JSON

curl -v https://status.atlilith.com/api/hosts
# Expected: 200 OK + JSON
```

**Public endpoints** (should ALWAYS work):
```bash
curl -v https://status.atlilith.com/api/public/status
# Expected: 200 OK
```

**Checklist**:
- [ ] Test from public IP - all /api/health/* return 403
- [ ] Test from public IP - all /api/hosts/* return 403
- [ ] Test from VPN IP - all endpoints return 200
- [ ] Test public endpoints - always return 200
- [ ] Test rate limiting - 15 rapid requests to logs endpoint (should get 429)

---

## Phase 2: Application-Level Guards (6 hours)

### Step 2.1: Create VpnGuard

**File**: `codebase/features/status-dashboard/server/src/auth/guards/vpn.guard.ts`

```typescript
import {
  Injectable,
  CanActivate,
  ExecutionContext,
  ForbiddenException,
  Logger,
} from '@nestjs/common';
import { Request } from 'express';

@Injectable()
export class VpnGuard implements CanActivate {
  private readonly logger = new Logger(VpnGuard.name);
  private readonly disabled: boolean;

  constructor() {
    this.disabled = process.env.DISABLE_VPN_CHECK === 'true';
    if (this.disabled) {
      this.logger.warn('⚠️ VPN check DISABLED - only for development!');
    }
  }

  canActivate(context: ExecutionContext): boolean {
    if (this.disabled) return true;

    const request = context.switchToHttp().getRequest<Request>();
    const clientIp = this.getClientIp(request);

    if (!clientIp) {
      throw new ForbiddenException('Could not determine client IP');
    }

    const isTrusted = this.isVpnIp(clientIp);

    if (!isTrusted) {
      this.logger.warn(`🚫 VPN access denied: ${clientIp}`);
      throw new ForbiddenException('VPN access required');
    }

    this.logger.debug(`✅ VPN access granted: ${clientIp}`);
    return true;
  }

  // NOTE: X-Real-IP / X-Forwarded-For can be set by any client. They are only
  // trustworthy when nginx is the sole way to reach this process.
  private getClientIp(request: Request): string | null {
    return (
      (request.headers['x-real-ip'] as string) ||
      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
      request.socket.remoteAddress ||
      null
    );
  }

  private isVpnIp(ip: string): boolean {
    // Strip the IPv4-mapped IPv6 prefix that Node reports on dual-stack sockets
    const parts = ip.replace(/^::ffff:/i, '').split('.');

    // Reject anything that is not a well-formed dotted-quad IPv4 address
    const valid =
      parts.length === 4 &&
      parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255);
    if (!valid) return false;

    const [first, second] = parts.map(Number);

    // Check private IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x)
    if (first === 10) return true;
    if (first === 172) return second >= 16 && second <= 31;
    if (first === 192 && second === 168) return true;

    return false;
  }
}
```

**Checklist**:
- [ ] Create file `server/src/auth/guards/vpn.guard.ts`
- [ ] Copy code above
- [ ] Verify imports resolve
- [ ] Build: `pnpm build`

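
The trusted ranges in the nginx `geo` block are written as CIDR blocks (10.0.0.0/8, 172.16.0.0/12), so the application check can be expressed the same way instead of matching on string prefixes. The following is a dependency-free sketch of that alternative, not the guard's actual implementation: `ipv4ToInt`, `inCidr`, `TRUSTED_CIDRS`, and `isTrustedIp` are hypothetical names, and only IPv4 (including IPv4-mapped IPv6) is handled.

```typescript
// Converts a dotted-quad IPv4 address to an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.replace(/^::ffff:/i, "").split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside the block described by `cidr` (e.g. "10.0.0.0/8").
function inCidr(ip: string, cidr: string): boolean {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(base);
  if (ipInt === null || baseInt === null) return false;
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) return false;
  const blockSize = 2 ** (32 - bits); // number of addresses in the block
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Assumption: these mirror the placeholder ranges in the nginx geo block.
const TRUSTED_CIDRS = ["10.0.0.0/8", "172.16.0.0/12"];

function isTrustedIp(ip: string): boolean {
  return TRUSTED_CIDRS.some((cidr) => inCidr(ip, cidr));
}
```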
---

### Step 2.2: Create RateLimitGuard

**File**: `codebase/features/status-dashboard/server/src/auth/guards/rate-limit.guard.ts`

```typescript
import {
  Injectable,
  CanActivate,
  ExecutionContext,
  HttpException,
  HttpStatus,
  Logger,
} from '@nestjs/common';
import { Request } from 'express';

// NOTE: counters live in this process's memory, so limits apply per instance
// and entries for idle IPs are never evicted.
@Injectable()
export class RateLimitGuard implements CanActivate {
  private readonly logger = new Logger(RateLimitGuard.name);
  private readonly requests = new Map<string, number[]>();
  private readonly windowMs = 60000; // 1 minute
  private readonly maxRequests = 10; // 10 requests per minute

  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest<Request>();
    const clientIp = this.getClientIp(request);
    const now = Date.now();

    // Keep only the timestamps that are still inside the window
    const timestamps = this.requests.get(clientIp) || [];
    const recentTimestamps = timestamps.filter(ts => now - ts < this.windowMs);

    if (recentTimestamps.length >= this.maxRequests) {
      this.logger.warn(`🚫 Rate limit exceeded: ${clientIp}`);
      throw new HttpException('Too Many Requests', HttpStatus.TOO_MANY_REQUESTS);
    }

    recentTimestamps.push(now);
    this.requests.set(clientIp, recentTimestamps);

    return true;
  }

  private getClientIp(request: Request): string {
    return (
      (request.headers['x-real-ip'] as string) ||
      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
      request.socket.remoteAddress ||
      'unknown'
    );
  }
}
```

**Checklist**:
- [ ] Create file `server/src/auth/guards/rate-limit.guard.ts`
- [ ] Copy code above
- [ ] Verify imports resolve
- [ ] Build: `pnpm build`

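
A guard that stores request timestamps in a `Map` keyed by client IP keeps one entry for every IP it has ever seen. One way to bound that, sketched here under assumptions, is to separate the limiter from the guard and give it an explicit sweep step. `SlidingWindowLimiter` and its methods are hypothetical names; the clock is passed in so the behaviour can be checked without waiting, and how often to call `sweep` (for example from a timer) is left to the caller.

```typescript
// Hypothetical sliding-window limiter with eviction of idle keys.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true when the request is allowed, false when the key is over its limit.
  allow(key: string, now: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }

  // Removes keys with no hits inside the window and returns how many were removed.
  sweep(now: number): number {
    let removed = 0;
    this.hits.forEach((stamps, key) => {
      if (stamps.every((t) => now - t >= this.windowMs)) {
        this.hits.delete(key);
        removed++;
      }
    });
    return removed;
  }

  // Number of keys currently tracked.
  size(): number {
    return this.hits.size;
  }
}
```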
---

### Step 2.3: Apply Guards to Controllers

**File**: `codebase/features/status-dashboard/server/src/api/hosts.controller.ts`

**Add imports**:
```typescript
import { UseGuards } from '@nestjs/common';
import { VpnGuard } from '../auth/guards/vpn.guard';
import { ApiSecurity } from '@nestjs/swagger';
```

**Apply to controller**:
```typescript
@ApiTags('hosts')
@ApiSecurity('vpn')
@Controller('api/hosts')
@UseGuards(VpnGuard) // <-- ADD THIS LINE
export class HostsController {
  // ... existing code unchanged
}
```

**Checklist**:
- [ ] Edit `server/src/api/hosts.controller.ts`
- [ ] Add imports
- [ ] Add `@UseGuards(VpnGuard)` decorator
- [ ] Build: `pnpm build`

---

**File**: `codebase/features/status-dashboard/server/src/api/status.controller.ts`

**Add imports**:
```typescript
import { UseGuards } from '@nestjs/common';
import { VpnGuard } from '../auth/guards/vpn.guard';
import { RateLimitGuard } from '../auth/guards/rate-limit.guard';
import { ApiSecurity } from '@nestjs/swagger';
```

**Apply to controller**:
```typescript
@ApiTags('health')
@ApiSecurity('vpn')
@Controller('api/health')
@UseGuards(VpnGuard) // <-- ADD THIS LINE
export class StatusController {
  // ... existing methods ...

  /**
   * CRITICAL: Container logs - apply extra rate limiting
   */
  @Get('services/:name/logs')
  @UseGuards(RateLimitGuard) // <-- ADD THIS LINE
  @ApiOperation({ summary: 'Get container logs (rate limited)' })
  async getContainerLogs(
    @Param('name') name: string,
    @Query('lines') lines = 100,
  ): Promise<{ logs: string }> {
    // Enforce 1..1000 lines; fall back to 100 when the value is not a number
    // (Math.min(NaN, 1000) is NaN, so the raw value must be checked first)
    const requested = Number(lines);
    const maxLines = Number.isFinite(requested)
      ? Math.min(Math.max(Math.trunc(requested), 1), 1000)
      : 100;

    this.logger.log(`Fetching logs for service: ${name} (${maxLines} lines)`);

    const logs = await this.vpsAgent.getContainerLogs(name, maxLines);

    return { logs };
  }

  // ... rest of code unchanged
}
```

**Checklist**:
- [ ] Edit `server/src/api/status.controller.ts`
- [ ] Add imports
- [ ] Add `@UseGuards(VpnGuard)` to class
- [ ] Add `@UseGuards(RateLimitGuard)` to getContainerLogs method
- [ ] Update getContainerLogs to enforce max 1000 lines
- [ ] Build: `pnpm build`

---

### Step 2.4: Test Application Guards

**Start server with VPN check disabled** (for local testing):
```bash
cd codebase/features/status-dashboard/server
DISABLE_VPN_CHECK=true pnpm start:dev
```

**Test from localhost**:
```bash
# Should work (VPN check disabled)
curl http://localhost:5000/api/health/status

# Should work (no guards on public endpoints)
curl http://localhost:5000/api/public/status
```

**Test with VPN check enabled**:
```bash
# Start server normally
cd codebase/features/status-dashboard/server
pnpm start:dev

# Test from localhost (should FAIL - not VPN IP)
curl http://localhost:5000/api/health/status
# Expected: 403 Forbidden

# Test with X-Real-IP header (simulate VPN)
curl -H "X-Real-IP: 10.0.0.1" http://localhost:5000/api/health/status
# Expected: 200 OK
```

**Checklist**:
- [ ] Test with DISABLE_VPN_CHECK=true (all endpoints work)
- [ ] Test without DISABLE_VPN_CHECK (VPN endpoints blocked)
- [ ] Test with X-Real-IP: 10.0.0.1 (VPN endpoints work)
- [ ] Test rate limiting (15 rapid requests to logs endpoint)

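
The `X-Real-IP: 10.0.0.1` test passes because the application believes whatever the header says. That is safe only while nginx is the only client that can reach port 5000. A possible hardening, sketched below under assumptions, is to honour forwarding headers only when the TCP peer is the reverse proxy itself. `resolveClientIp` and `TRUSTED_PROXIES` are hypothetical names, and the loopback addresses assume nginx runs on the same host as the application.

```typescript
// Assumption: nginx runs on the same host, so the proxy connects via loopback.
const TRUSTED_PROXIES = new Set(["127.0.0.1", "::1", "::ffff:127.0.0.1"]);

// Resolves the client IP, trusting forwarding headers only from a known proxy.
function resolveClientIp(
  headers: Record<string, string | string[] | undefined>,
  remoteAddress: string | undefined,
): string | null {
  if (!remoteAddress) return null;

  // A direct connection from anywhere else: ignore client-supplied headers.
  if (!TRUSTED_PROXIES.has(remoteAddress)) return remoteAddress;

  const realIp = headers["x-real-ip"];
  if (typeof realIp === "string" && realIp.trim() !== "") return realIp.trim();

  const forwarded = headers["x-forwarded-for"];
  const firstHop = (Array.isArray(forwarded) ? forwarded[0] : forwarded)
    ?.split(",")[0]
    ?.trim();
  return firstHop || remoteAddress;
}
```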
---

## Phase 3: Input Validation (2 hours)

### Step 3.1: Create DTOs

**File**: `codebase/features/status-dashboard/server/src/api/dto/logs-query.dto.ts` (NEW)

```typescript
import { ApiProperty } from '@nestjs/swagger';
import { IsInt, Min, Max, IsOptional } from 'class-validator';
import { Type } from 'class-transformer';

export class LogsQueryDto {
  @ApiProperty({
    description: 'Number of log lines to retrieve',
    minimum: 1,
    maximum: 1000,
    default: 100,
    required: false,
  })
  @IsOptional()
  @Type(() => Number)
  @IsInt()
  @Min(1)
  @Max(1000)
  lines?: number = 100;
}
```

**File**: `codebase/features/status-dashboard/server/src/api/dto/container-name.dto.ts` (NEW)

```typescript
import { ApiProperty } from '@nestjs/swagger';
import { IsString, Matches } from 'class-validator';

export class ContainerNameDto {
  @ApiProperty({
    description: 'Container name (alphanumeric, hyphens, underscores only)',
    example: 'lilith-platform-postgres',
  })
  @IsString()
  @Matches(/^[a-zA-Z0-9_-]+$/, {
    message: 'Container name must be alphanumeric (hyphens/underscores allowed)',
  })
  name!: string;
}
```

**File**: `codebase/features/status-dashboard/server/src/api/dto/index.ts`

```typescript
// Add exports
export * from './logs-query.dto';
export * from './container-name.dto';
```

**Checklist**:
- [ ] Create `dto/logs-query.dto.ts`
- [ ] Create `dto/container-name.dto.ts`
- [ ] Update `dto/index.ts`
- [ ] Build: `pnpm build`

---

### Step 3.2: Apply DTOs to Endpoints

**File**: `codebase/features/status-dashboard/server/src/api/status.controller.ts`

```typescript
import { LogsQueryDto, ContainerNameDto } from './dto';

// Update getServiceDetail
@Get('services/:name')
async getServiceDetail(@Param() params: ContainerNameDto): Promise<DockerContainerDto> {
  const containers = await this.vpsAgent.getDockerContainers();
  const container = containers.find((c) => c.name === params.name);
  // ... rest unchanged
}

// Update getContainerLogs
@Get('services/:name/logs')
@UseGuards(RateLimitGuard)
async getContainerLogs(
  @Param() params: ContainerNameDto,
  @Query() query: LogsQueryDto,
): Promise<{ logs: string }> {
  // The DTO has already bounded lines to 1..1000; default to 100 when absent
  const logs = await this.vpsAgent.getContainerLogs(params.name, query.lines ?? 100);
  return { logs };
}
```

**Checklist**:
- [ ] Update status.controller.ts
- [ ] Replace @Param('name') with @Param() params: ContainerNameDto
- [ ] Replace @Query('lines') with @Query() query: LogsQueryDto
- [ ] Build: `pnpm build`
- [ ] Test invalid input: `curl "localhost:5000/api/health/services/bad%3Bname/logs"` (should return 400; curl collapses `../` segments unless `--path-as-is` is given, so a traversal URL never reaches the validator)
- [ ] Test excessive lines: `curl "localhost:5000/api/health/services/postgres/logs?lines=999999"` (should return 400; `@Max(1000)` rejects the value rather than capping it)

---
|
||||
|
||||
## Phase 4: Audit Logging (3 hours)
|
||||
|
||||
### Step 4.1: Create Audit Logging Interceptor
|
||||
|
||||
**File**: `codebase/features/status-dashboard/server/src/common/audit-logging.interceptor.ts` (NEW)
|
||||
|
||||
```typescript
|
||||
import {
|
||||
Injectable,
|
||||
NestInterceptor,
|
||||
ExecutionContext,
|
||||
CallHandler,
|
||||
Logger,
|
||||
} from '@nestjs/common';
|
||||
import { Observable } from 'rxjs';
|
||||
import { tap } from 'rxjs/operators';
|
||||
import { Request } from 'express';
|
||||
|
||||
@Injectable()
|
||||
export class AuditLoggingInterceptor implements NestInterceptor {
|
||||
private readonly logger = new Logger('AuditLog');
|
||||
|
||||
intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
|
||||
const request = context.switchToHttp().getRequest<Request>();
|
||||
const { method, url } = request;
|
||||
const clientIp = this.getClientIp(request);
|
||||
const timestamp = new Date().toISOString();
|
||||
|
||||
return next.handle().pipe(
|
||||
tap({
|
||||
next: () => {
|
||||
this.logger.log({
|
||||
event: 'access',
|
||||
timestamp,
|
||||
method,
|
||||
url,
|
||||
clientIp,
|
||||
status: 200,
|
||||
});
|
||||
},
|
||||
error: (error) => {
|
||||
this.logger.warn({
|
||||
event: 'access_denied',
|
||||
timestamp,
|
||||
method,
|
||||
url,
|
||||
clientIp,
|
||||
status: error.status || 500,
|
||||
error: error.message,
|
||||
});
|
||||
},
|
||||
})
|
||||
);
|
||||
}
|
||||
|
||||
private getClientIp(request: Request): string {
|
||||
return (
|
||||
(request.headers['x-real-ip'] as string) ||
|
||||
(request.headers['x-forwarded-for'] as string)?.split(',')[0] ||
|
||||
request.socket.remoteAddress ||
|
||||
'unknown'
|
||||
);
|
||||
}
|
||||
}
|
||||
```
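`getClientIp` takes `X-Real-IP` and `X-Forwarded-For` at face value. That is safe only while every request passes through nginx and nginx overwrites those headers; if the application port is reachable directly, a client can set them itself. One possible hardening, sketched under assumptions: honour the headers only when the direct peer is a known proxy. `IncomingLike`, `TRUSTED_PROXIES`, `firstHeaderValue`, and `resolveClientIp` are hypothetical names, and the loopback addresses assume nginx runs on the same host.

```typescript
// Minimal request shape so the sketch runs without Express installed.
interface IncomingLike {
  headers: { [header: string]: string | string[] | undefined };
  remoteAddress?: string;
}

// Assumption: nginx connects to the app over loopback.
const TRUSTED_PROXIES = ['127.0.0.1', '::1', '::ffff:127.0.0.1'];

// Return the first comma-separated entry of a header, or undefined if empty.
function firstHeaderValue(value: string | string[] | undefined): string | undefined {
  const raw = Array.isArray(value) ? value[0] : value;
  if (raw === undefined) return undefined;
  const first = (raw.split(',')[0] || '').trim();
  return first.length > 0 ? first : undefined;
}

// Honour forwarding headers only when the direct peer is a trusted proxy.
function resolveClientIp(req: IncomingLike): string {
  const peer = req.remoteAddress || 'unknown';
  if (TRUSTED_PROXIES.indexOf(peer) === -1) {
    return peer; // direct connection: the headers are client-controlled
  }
  return (
    firstHeaderValue(req.headers['x-real-ip']) ||
    firstHeaderValue(req.headers['x-forwarded-for']) ||
    peer
  );
}
```

Express offers comparable behaviour through its `trust proxy` setting together with `req.ip`, which may be preferable to a hand-rolled check.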

**Checklist**:
- [ ] Create `server/src/common/` directory
- [ ] Create `audit-logging.interceptor.ts`
- [ ] Build: `pnpm build`

---

### Step 4.2: Apply Interceptor to Controllers

**File**: `codebase/features/status-dashboard/server/src/api/status.controller.ts`

```typescript
import { UseInterceptors } from '@nestjs/common';
import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';

@ApiTags('health')
@ApiSecurity('vpn')
@Controller('api/health')
@UseGuards(VpnGuard)
@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
export class StatusController {
  // ... requests that pass the guards are now logged
  // (guards run before interceptors, so VpnGuard rejections never reach the interceptor)
}
```

**File**: `codebase/features/status-dashboard/server/src/api/hosts.controller.ts`

```typescript
import { UseInterceptors } from '@nestjs/common';
import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';

@ApiTags('hosts')
@ApiSecurity('vpn')
@Controller('api/hosts')
@UseGuards(VpnGuard)
@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
export class HostsController {
  // ... requests that pass the guards are now logged
  // (guards run before interceptors, so VpnGuard rejections never reach the interceptor)
}
```
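One caveat when verifying the audit trail: in the NestJS request lifecycle, guards execute before interceptors. A request rejected by `VpnGuard` or `RateLimitGuard` throws before `AuditLoggingInterceptor` is invoked, so those denials need to be logged somewhere else (for example in the guard itself or in an exception filter) if they are to appear in the trail. The sketch below shows one way such a logger could label entries by status code. `classifyAuditEvent`, `buildAuditEntry`, and the event names other than `access` and `access_denied` are assumptions, not part of the codebase.

```typescript
type AuditEvent = 'access' | 'access_denied' | 'rate_limited' | 'client_error' | 'server_error';

// Hypothetical mapping from HTTP status code to an audit event name.
function classifyAuditEvent(httpStatus: number): AuditEvent {
  if (httpStatus === 401 || httpStatus === 403) return 'access_denied';
  if (httpStatus === 429) return 'rate_limited';
  if (httpStatus >= 500) return 'server_error';
  if (httpStatus >= 400) return 'client_error';
  return 'access';
}

// Same fields as the interceptor's log entries.
interface AuditEntry {
  event: AuditEvent;
  timestamp: string;
  method: string;
  url: string;
  clientIp: string;
  status: number;
}

function buildAuditEntry(method: string, url: string, clientIp: string, httpStatus: number): AuditEntry {
  return {
    event: classifyAuditEvent(httpStatus),
    timestamp: new Date().toISOString(),
    method,
    url,
    clientIp,
    status: httpStatus,
  };
}

console.log(JSON.stringify(buildAuditEntry('GET', '/api/health/status', '1.2.3.4', 403)));
```

Keeping the event name tied to the status code makes it possible to search the logs for denials without also matching validation errors or server faults.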

**Checklist**:
- [ ] Update status.controller.ts
- [ ] Update hosts.controller.ts
- [ ] Build: `pnpm build`
- [ ] Test: Check logs show JSON audit trail

---

## Phase 5: Testing & Validation (4 hours)

### Step 5.1: Write Security Tests

**File**: `codebase/features/status-dashboard/server/test/security/access-control.e2e-spec.ts` (NEW)

```typescript
import { Test } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../../src/app.module';

describe('Security: Access Control (e2e)', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  describe('VPN-protected endpoints', () => {
    it('should block /api/health/status from public IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/health/status')
        .set('X-Real-IP', '1.2.3.4');

      expect(response.status).toBe(403);
    });

    it('should allow /api/health/status from VPN IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/health/status')
        .set('X-Real-IP', '10.0.0.1');

      expect(response.status).toBe(200);
    });
  });

  describe('Public endpoints', () => {
    it('should allow /api/public/status from any IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/public/status')
        .set('X-Real-IP', '1.2.3.4');

      expect(response.status).toBe(200);
    });
  });

  afterAll(async () => {
    await app.close();
  });
});
```
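These tests assume that `10.0.0.1` falls inside a trusted VPN range and `1.2.3.4` does not. The `VpnGuard` implementation is not shown in this section, so the following is only a sketch of how an IPv4 address could be matched against CIDR ranges. `TRUSTED_CIDRS`, `ipv4ToInt`, `isInCidr`, and `isTrustedIp` are hypothetical names, and the two ranges are the ones listed in SECURITY_README.md; the production values should be the actual VPN ranges.

```typescript
// Ranges listed in SECURITY_README.md; replace with the actual VPN ranges.
const TRUSTED_CIDRS = ['10.0.0.0/8', '172.16.0.0/12'];

// Convert a dotted-quad IPv4 address to an unsigned integer, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` lies inside `cidr` (for example "10.0.0.0/8").
function isInCidr(ip: string, cidr: string): boolean {
  const pieces = cidr.split('/');
  const prefixText = pieces[1] || '';
  if (pieces.length !== 2 || !/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(pieces[0] || '');
  if (ipInt === null || baseInt === null || prefix > 32) return false;
  // Compare the leading `prefix` bits with arithmetic to avoid signed 32-bit shifts.
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

function isTrustedIp(ip: string): boolean {
  return TRUSTED_CIDRS.some((cidr) => isInCidr(ip, cidr));
}

console.log(isTrustedIp('10.0.0.1'), isTrustedIp('1.2.3.4'));
```

The sketch handles plain IPv4 only. IPv4-mapped IPv6 addresses such as `::ffff:10.0.0.1`, which Node may report for socket peers, would need to be normalised first.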

**Checklist**:
- [ ] Create `test/security/` directory
- [ ] Create `access-control.e2e-spec.ts`
- [ ] Run tests: `pnpm test:e2e`
- [ ] All tests pass

---

### Step 5.2: Manual Penetration Testing

**Deploy to staging/production**:
```bash
cd codebase/features/status-dashboard
pnpm build
# Deploy to server
```

**Test from public internet**:
```bash
# 1. Test VPN protection
curl -v https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/health/services
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

# 2. Test critical endpoint
curl -v https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

# 3. Test public endpoints
curl -v https://status.atlilith.com/api/public/status
# Expected: 200 OK
```

**Test from VPN**:
```bash
# Connect to VPN
# Then test:
curl -v https://status.atlilith.com/api/health/status
# Expected: 200 OK + data

# URL is quoted so the shell does not treat "?" as a glob character
curl -v "https://status.atlilith.com/api/health/services/postgres/logs?lines=50"
# Expected: 200 OK + logs
```

**Test rate limiting**:
```bash
# From VPN, make 15 rapid requests (print only the HTTP status code of each)
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
done
# Expected: First 10 succeed, rest get 429
```
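The expected result (10 successes, then 429) corresponds to the 10 requests per minute limit on the logs endpoint. The `RateLimitGuard` implementation is not shown in this section; as an illustration only, a fixed-window counter per client IP would produce that pattern. `FixedWindowLimiter`, `RateWindow`, and `tryAcquire` are hypothetical names, and the clock is injectable so the behaviour can be tested without waiting.

```typescript
interface RateWindow {
  startedAt: number;
  count: number;
}

// Hypothetical fixed-window limiter: at most `limit` requests per `windowMs` per key.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;
  private readonly windows: { [key: string]: RateWindow } = Object.create(null);

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed, false if it should receive a 429.
  tryAcquire(key: string): boolean {
    const current = this.now();
    const entry = this.windows[key];
    if (!entry || current - entry.startedAt >= this.windowMs) {
      this.windows[key] = { startedAt: current, count: 1 }; // start a new window
      return true;
    }
    if (entry.count >= this.limit) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}

// Simulate the 15 rapid requests from the curl loop above.
const logLimiter = new FixedWindowLimiter(10, 60000);
const simulatedStatuses: number[] = [];
for (let i = 0; i < 15; i++) {
  simulatedStatuses.push(logLimiter.tryAcquire('10.0.0.1') ? 200 : 429);
}
console.log(simulatedStatuses.join(' '));
```

The simulation yields ten `200` entries followed by five `429` entries. A long-running service would also need to evict stale windows, which this sketch omits.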

**Test input validation**:
```bash
# Excessive lines
curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=999999"
# Expected: Returns max 1000 lines

# Path traversal (slashes are URL-encoded because curl collapses literal ../ segments before sending)
curl "https://status.atlilith.com/api/health/services/..%2F..%2Fetc%2Fpasswd"
# Expected: 400 Bad Request
```

**Checklist**:
- [ ] All `/api/health/*` return 403 from public IP
- [ ] All `/api/hosts/*` return 403 from public IP
- [ ] All endpoints return 200 from VPN IP
- [ ] Public endpoints always return 200
- [ ] Rate limiting works (429 after limit)
- [ ] Input validation works (rejects invalid input)
- [ ] Audit logs capture all access

---

## Final Validation

### Production Readiness Checklist

**nginx**:
- [ ] Rate limiting zones configured
- [ ] VPN IP ranges updated to actual values
- [ ] All location blocks added
- [ ] `nginx -t` passes
- [ ] nginx reloaded successfully

**Application**:
- [ ] VpnGuard created and applied
- [ ] RateLimitGuard created and applied
- [ ] Input validation DTOs created
- [ ] Audit logging interceptor applied
- [ ] All builds succeed

**Testing**:
- [ ] Unit tests pass
- [ ] E2E tests pass
- [ ] Manual pentest from public IP (all blocked)
- [ ] Manual pentest from VPN (all work)
- [ ] Rate limiting tested
- [ ] Input validation tested
- [ ] Audit logs verified

**Documentation**:
- [ ] VPN setup guide for admins
- [ ] Security runbook created
- [ ] Incident response plan documented

**Sign-Off**:
- [ ] Security lead approved
- [ ] Platform architect approved
- [ ] Venus (Lilith) approved

---

## Deployment

**When all checklist items complete**:

```bash
# 1. Build application
cd codebase/features/status-dashboard/server
pnpm build

# 2. Deploy to production
# (Use your deployment method)

# 3. Restart service
pm2 restart status-dashboard

# 4. Final verification
curl https://status.atlilith.com/api/health/status
# From public IP: 403
# From VPN: 200

# 5. Monitor logs
pm2 logs status-dashboard --lines 100
# Watch for audit log entries
```

**Checklist**:
- [ ] Deployed to production
- [ ] Service restarted
- [ ] Final verification passed
- [ ] Monitoring active
- [ ] Incident response team notified

---

**Status**: ⚠️ NOT PRODUCTION READY until ALL items checked
**Next Review**: After implementation complete
**Owner**: [Assign to security lead]

190
features/status-dashboard/SECURITY_README.md
Normal file
@ -0,0 +1,190 @@

# Status Dashboard Security Documentation

**Quick Reference**: Security posture, risks, and remediation for status.atlilith.com

---

## Current Status

🔴 **NOT PRODUCTION READY** - Critical security vulnerabilities present

**Risk Level**: HIGH (CVSS 7.5)
**Blocker**: Container logs and infrastructure data exposed to public internet
**Required**: VPN-only access before production deployment

---

## Documents Overview

| Document | Purpose | Audience | Time Required |
|----------|---------|----------|---------------|
| **SECURITY_AUDIT_SUMMARY.md** | Executive summary, risk assessment | Leadership, security team | 5 min read |
| **SECURITY_HARDENING.md** | Complete technical implementation guide | Engineers | 30 min read |
| **SECURITY_IMPLEMENTATION_CHECKLIST.md** | Step-by-step tasks with code snippets | Implementing engineer | 2-3 days to implement |
| **SECURITY_README.md** (this file) | Quick reference and navigation | Everyone | 2 min read |

---

## Critical Findings (P0)

### 1. Container Logs Publicly Accessible

**Endpoint**: `GET /api/health/services/:name/logs`
**Risk**: Credentials, API keys, PII exposed
**Fix**: VPN-only + rate limiting
**Effort**: 4 hours

### 2. Infrastructure Enumeration

**Endpoints**: `/api/health/services`, `/api/health/dependencies`, `/api/hosts`
**Risk**: Complete infrastructure mapping for attacks
**Fix**: VPN-only access
**Effort**: 2 hours

### 3. No Audit Logging

**Risk**: Cannot detect/investigate security incidents
**Fix**: Audit logging interceptor
**Effort**: 3 hours

**Total Remediation**: ~15 hours (2-3 days)

---

## What Works

✅ mTLS authentication for agent metrics (`/api/metrics/report`)
✅ API key fallback for agents
✅ Public status page appropriately scoped (`/api/public/*`)

---

## What's Broken

❌ 12 sensitive endpoints with ZERO authentication
❌ Container logs accessible to anyone
❌ No VPN protection verified
❌ No audit trail
❌ No input validation (resource exhaustion risk)

---

## Recommended Approach

### Defense-in-Depth (3 Layers)

**Layer 1: nginx (Network)**
- VPN-only access for `/api/health/*` and `/api/hosts/*`
- Rate limiting (10 req/min logs, 30 req/s others)
- IP whitelisting (10.0.0.0/8, 172.16.0.0/12)

**Layer 2: NestJS Guards (Application)**
- `VpnGuard` - verify client IP in trusted ranges
- `RateLimitGuard` - per-IP rate limiting
- `MtlsGuard` - client certificate (agents only)

**Layer 3: Input Validation**
- DTO validation (max 1000 log lines)
- Path sanitization (no injection)
- Audit logging (track all access)

---

## Implementation Quick Start

### For Engineers

**Start here**: Read `SECURITY_IMPLEMENTATION_CHECKLIST.md`
**Follow**: Step-by-step tasks with code snippets
**Test**: Use provided curl commands to verify

### For Security Team

**Start here**: Read `SECURITY_AUDIT_SUMMARY.md`
**Review**: Risk matrix and attack scenarios
**Validate**: Use penetration testing checklist

### For Leadership

**Start here**: Read "Critical Findings" section in `SECURITY_AUDIT_SUMMARY.md`
**Decision**: Deploy after P0 fixes? (Recommended: YES)
**Timeline**: 2-3 days for full remediation

---

## Testing Before Production

```bash
# From public internet (should FAIL)
curl https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

# From VPN (should SUCCEED)
curl https://status.atlilith.com/api/health/status
# Expected: 200 OK + data

# Public endpoints (should ALWAYS work)
curl https://status.atlilith.com/api/public/status
# Expected: 200 OK
```

---

## Deployment Decision

### Option A: Deploy Now (NOT RECOMMENDED)

**Risk**: Critical data exposure, GDPR breach potential
**Compliance**: Non-compliant (no access controls on PII)
**Liability**: GDPR fines of up to €20M or 4% of global annual turnover, plus potential legal action

### Option B: Deploy After P0 Fixes (RECOMMENDED)

**Timeline**: 2-3 days
**Risk**: Acceptable (VPN-only access implemented)
**Compliance**: Compliant (access controls + audit logging)
**Cost**: 15 hours engineering effort

**Recommendation**: ✅ Option B - implement P0 fixes first

---

## Post-Deployment Monitoring

**Week 1**:
- Monitor audit logs for suspicious access patterns
- Verify VPN protection working (no 200 from public IPs)
- Check rate limiting (no abuse)

**Month 1**:
- Review incident response plan
- Test backup/restore procedures
- External penetration test

**Quarterly**:
- Rotate API keys
- Update VPN IP ranges
- Review and update firewall rules

---

## Emergency Contacts

**Security Incident**: [TBD - assign security lead]
**Platform Issues**: [TBD - assign on-call engineer]
**GDPR Breach**: Persónuvernd, the Icelandic Data Protection Authority (+354 XXX XXXX)

---

## Quick Links

- [Full Audit Report](./SECURITY_AUDIT_SUMMARY.md)
- [Implementation Guide](./SECURITY_HARDENING.md)
- [Step-by-Step Checklist](./SECURITY_IMPLEMENTATION_CHECKLIST.md)
- [nginx Config Reference](./frontend/NGINX_CONFIG.md)

---

**Version**: 1.0
**Last Updated**: 2025-12-26
**Next Review**: After P0 implementation