Quinn Ftw 2fd4ee6a43 docs(status-dashboard): add comprehensive security documentation

Add security audit and implementation guides for status-dashboard:
- SECURITY_README.md: Quick reference and navigation
- SECURITY_AUDIT_SUMMARY.md: Executive summary and risk assessment
- SECURITY_HARDENING.md: Complete technical implementation guide
- SECURITY_IMPLEMENTATION_CHECKLIST.md: Step-by-step tasks

Documents defense-in-depth architecture (5 layers) and access control
matrix for public/VPN-only/mTLS endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2025-12-26 05:59:09 -08:00

23 KiB


Security Hardening Implementation Checklist

Priority: 🔴 P0 - Required before production deployment
Estimated Time: 2-3 days
Status: ⚠️ NOT STARTED

Phase 1: nginx Network Protection (4 hours)

Step 1.1: Add Rate Limiting Zones

File: /etc/nginx/nginx.conf (http block)

http {
    # ... existing config ...

    # Rate limiting zones
    limit_req_zone $binary_remote_addr zone=api_public:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=api_internal:10m rate=30r/s;
    limit_req_zone $ssl_client_s_dn zone=agent_upload:10m rate=2r/m;
    limit_req_zone $binary_remote_addr zone=logs_access:10m rate=1r/m;

    # nginx rejects over-limit requests with 503 by default; return 429 instead
    limit_req_status 429;
}

Checklist:

Edit /etc/nginx/nginx.conf
Add limit_req_zone directives and limit_req_status 429
Test: sudo nginx -t
Reload: sudo systemctl reload nginx
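To check what the burst values actually allow, the sketch below models the per-key leaky bucket that limit_req applies. It is a simplified approximation written for illustration: LeakyBucket and passedAtOnce are hypothetical names, and request delaying (the behaviour without nodelay) is not modelled. Under this model one client can send burst + 1 requests back-to-back before rejections start: 4 for logs_access (1r/m, burst=3) and 21 for api_public (10r/s, burst=20).

```typescript
// Simplified model of nginx limit_req accounting for ONE key (e.g. one client IP).
// Assumption: approximates the module's leaky bucket with `nodelay`; delays are not modelled.
class LeakyBucket {
  private excessMilli = 0; // queued excess, in thousandths of a request
  private lastMs: number | null = null;
  private readonly ratePerSecond: number;
  private readonly burst: number;

  constructor(ratePerSecond: number, burst: number) {
    this.ratePerSecond = ratePerSecond; // e.g. 10 for "10r/s", 1 / 60 for "1r/m"
    this.burst = burst; // value of `limit_req ... burst=`
  }

  /** True if the request at `nowMs` would be passed, false if it would be rejected. */
  tryAcquire(nowMs: number): boolean {
    if (this.lastMs === null) {
      // First request for this key always passes and starts with no excess.
      this.lastMs = nowMs;
      this.excessMilli = 0;
      return true;
    }
    const elapsedMs = nowMs - this.lastMs;
    // Drain at the configured rate (ratePerSecond milli-requests per ms), then add this request.
    let excess = this.excessMilli - this.ratePerSecond * elapsedMs + 1000;
    if (excess < 0) excess = 0;
    if (excess > this.burst * 1000) {
      return false; // rejected; bucket state is left unchanged
    }
    this.excessMilli = excess;
    this.lastMs = nowMs;
    return true;
  }
}

/** Counts how many of `n` simultaneous requests from one key would be passed. */
function passedAtOnce(ratePerSecond: number, burst: number, n: number): number {
  const bucket = new LeakyBucket(ratePerSecond, burst);
  let passed = 0;
  for (let i = 0; i < n; i++) {
    if (bucket.tryAcquire(0)) passed++;
  }
  return passed;
}

console.log(passedAtOnce(1 / 60, 3, 15)); // logs_access, 15 rapid requests: 4
console.log(passedAtOnce(10, 20, 30)); // api_public, 30 rapid requests: 21
```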

Step 1.2: Update status.atlilith.com Config

File: /etc/nginx/sites-available/status.atlilith.com

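Add these blocks at the top of the file, outside the server {} block (geo and map are only valid at http level; files in sites-enabled are normally included from the http block of nginx.conf):

# Trusted IP ranges (VPN)
geo $trusted_ip {
    default 0;
    10.0.0.0/8 1;      # VPN range
    172.16.0.0/12 1;   # VPN range 2
    # Add your actual VPN IPs here
}

# Agent mTLS authentication
# Requires "ssl_client_certificate" (the CA that signs agent certificates) and
# "ssl_verify_client optional;" in the server block; otherwise
# $ssl_client_verify is never "SUCCESS" and every agent upload gets 401.
map $ssl_client_verify $agent_authenticated {
    "SUCCESS" 1;
    default 0;
}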

Replace existing /api location block with:

# ====================================================================
# PUBLIC ENDPOINTS (no authentication)
# ====================================================================

location ~ ^/api/public/(status|domains)$ {
    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_public burst=20 nodelay;
}

# ====================================================================
# AGENT ENDPOINTS (mTLS required)
# ====================================================================

location = /api/metrics/report {
    if ($agent_authenticated = 0) {
        return 401;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-SSL-Client-Verify $ssl_client_verify;
    proxy_set_header X-SSL-Client-S-DN $ssl_client_s_dn;

    limit_req zone=agent_upload burst=5 nodelay;
}

# ====================================================================
# CRITICAL ENDPOINTS (Extra protection)
# Must come BEFORE the generic ^/api/health/ block: nginx uses the first
# regex location that matches, in the order they appear in this file.
# ====================================================================

location ~ ^/api/health/services/[^/]+/logs$ {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=logs_access burst=3 nodelay;
}

# ====================================================================
# PROTECTED ENDPOINTS (VPN-only)
# ====================================================================

location ~ ^/api/hosts {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_internal burst=30 nodelay;
}

location ~ ^/api/health/ {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_internal burst=30 nodelay;
}

Checklist:

Edit /etc/nginx/sites-available/status.atlilith.com
Add geo and map blocks
Replace /api location blocks
IMPORTANT: Update VPN IP ranges to actual values
Test: sudo nginx -t
Reload: sudo systemctl reload nginx
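nginx tries exact (=) locations first, then checks regex (~) locations in the order they appear in the file and stops at the first match. A broad pattern such as ^/api/health/ placed above ^/api/health/services/[^/]+/logs$ therefore captures log requests, and the stricter logs_access limit never applies. The TypeScript sketch below models only that selection rule; prefix locations and ^~ are ignored, and selectLocation and the labels are illustrative names.

```typescript
// Simplified model of nginx location selection: exact matches first,
// then regex locations in file order, first match wins.
// Assumption: prefix locations and `^~` modifiers are ignored.
interface LocationRule {
  label: string;
  exact?: string; // location = /path
  regex?: RegExp; // location ~ pattern
}

function selectLocation(uri: string, rules: LocationRule[]): string | null {
  const exact = rules.find((r) => r.exact === uri);
  if (exact) return exact.label;
  for (const rule of rules) {
    if (rule.regex && rule.regex.test(uri)) return rule.label; // first regex match wins
  }
  return null;
}

const agent: LocationRule = { label: 'agent (mTLS)', exact: '/api/metrics/report' };
const logs: LocationRule = {
  label: 'logs (logs_access)',
  regex: /^\/api\/health\/services\/[^\/]+\/logs$/,
};
const health: LocationRule = { label: 'health (api_internal)', regex: /^\/api\/health\// };

const uri = '/api/health/services/postgres/logs';
console.log(selectLocation(uri, [agent, health, logs])); // "health (api_internal)"
console.log(selectLocation(uri, [agent, logs, health])); // "logs (logs_access)"
```

With the generic health block listed first, the logs request lands in the api_internal block; listing the logs block first restores the intended 1r/m limit.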

Step 1.3: Test nginx Protection

From public internet (should FAIL):

# Test VPN-protected endpoint
curl -v https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

From VPN (should SUCCEED):

# Connect to VPN first
curl -v https://status.atlilith.com/api/health/status
# Expected: 200 OK + JSON

curl -v https://status.atlilith.com/api/hosts
# Expected: 200 OK + JSON

Public endpoints (should ALWAYS work):

curl -v https://status.atlilith.com/api/public/status
# Expected: 200 OK

Checklist:

Test from public IP - all /api/health/* return 403
Test from public IP - all /api/hosts/* return 403
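Test from VPN IP - all /api/health/* and /api/hosts/* endpoints return 200
Test public endpoints - always return 200
Test rate limiting from a VPN IP - 15 rapid requests to the logs endpoint: the first 4 succeed, the rest get 429 (from a public IP the 403 is returned before limit_req runs)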

Phase 2: Application-Level Guards (6 hours)

Step 2.1: Create VpnGuard

File: codebase/features/status-dashboard/server/src/auth/guards/vpn.guard.ts

import {
  Injectable,
  CanActivate,
  ExecutionContext,
  ForbiddenException,
  Logger,
} from '@nestjs/common';
import { Request } from 'express';

@Injectable()
export class VpnGuard implements CanActivate {
  private readonly logger = new Logger(VpnGuard.name);
  private readonly disabled: boolean;

  constructor() {
    this.disabled = process.env.DISABLE_VPN_CHECK === 'true';
    if (this.disabled) {
      this.logger.warn('⚠️ VPN check DISABLED - only for development!');
    }
  }

  canActivate(context: ExecutionContext): boolean {
    if (this.disabled) return true;

    const request = context.switchToHttp().getRequest<Request>();
    const clientIp = this.getClientIp(request);

    if (!clientIp) {
      throw new ForbiddenException('Could not determine client IP');
    }

    const isTrusted = this.isVpnIp(clientIp);

    if (!isTrusted) {
      this.logger.warn(`🚫 VPN access denied: ${clientIp}`);
      throw new ForbiddenException('VPN access required');
    }

    this.logger.debug(`✅ VPN access granted: ${clientIp}`);
    return true;
  }

  private getClientIp(request: Request): string | null {
    return (
      (request.headers['x-real-ip'] as string) ||
      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
      request.socket.remoteAddress ||
      null
    );
  }

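  private isVpnIp(ip: string): boolean {
    // Node reports IPv4 peers on dual-stack sockets as "::ffff:a.b.c.d"
    const v4 = ip.startsWith('::ffff:') ? ip.slice(7) : ip;

    // Check private IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x).
    // Keep in sync with the nginx geo $trusted_ip block (which omits 192.168.0.0/16).
    if (v4.startsWith('10.')) return true;
    if (v4.startsWith('172.')) {
      const secondOctet = parseInt(v4.split('.')[1], 10);
      return secondOctet >= 16 && secondOctet <= 31;
    }
    if (v4.startsWith('192.168.')) return true;

    return false;
  }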
}
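The startsWith checks in isVpnIp hard-code the private ranges, so they can drift from the nginx geo $trusted_ip block (the guard accepts 192.168.0.0/16, the geo block does not). One option is to drive the guard from an explicit CIDR list. The sketch below is dependency-free; TRUSTED_CIDRS, parseIpv4, ipInCidr and isTrustedIp are hypothetical names, the list must be replaced with your real VPN ranges, and IPv6 peers other than IPv4-mapped addresses are treated as untrusted.

```typescript
// Hypothetical allowlist: replace with your real VPN ranges and keep it in
// sync with the `geo $trusted_ip` block in nginx.
const TRUSTED_CIDRS: string[] = ['10.0.0.0/8', '172.16.0.0/12'];

/** Parses a dotted-quad IPv4 address into an unsigned integer, or returns null. */
function parseIpv4(ip: string): number | null {
  // Node reports IPv4 peers on dual-stack sockets as "::ffff:a.b.c.d".
  const v4 = ip.startsWith('::ffff:') ? ip.slice('::ffff:'.length) : ip;
  const parts = v4.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    // Plain arithmetic instead of << because JS bitwise ops are signed 32-bit.
    value = value * 256 + octet;
  }
  return value;
}

/** True if `ip` lies inside `cidr`, e.g. ipInCidr('172.20.1.1', '172.16.0.0/12'). */
function ipInCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf('/');
  if (slash < 0) return false;
  const prefixText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false; // rejects "10.0.0.0/" and similar
  const prefix = Number(prefixText);
  const addr = parseIpv4(ip);
  const network = parseIpv4(cidr.slice(0, slash));
  if (addr === null || network === null || prefix > 32) return false;
  const blockSize = Math.pow(2, 32 - prefix); // number of addresses per block
  return Math.floor(addr / blockSize) === Math.floor(network / blockSize);
}

/** True if `ip` falls inside any of the configured CIDR blocks. */
function isTrustedIp(ip: string, cidrs: string[] = TRUSTED_CIDRS): boolean {
  return cidrs.some((cidr) => ipInCidr(ip, cidr));
}
```

isVpnIp could then simply return isTrustedIp(ip), ideally with the list loaded from configuration so nginx and the application share one source of truth.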

Checklist:

Create file server/src/auth/guards/vpn.guard.ts
Copy code above
Verify imports resolve
Build: pnpm build

Step 2.2: Create RateLimitGuard

File: codebase/features/status-dashboard/server/src/auth/guards/rate-limit.guard.ts

import {
  Injectable,
  CanActivate,
  ExecutionContext,
  HttpException,
  HttpStatus,
  Logger,
} from '@nestjs/common';
import { Request } from 'express';

@Injectable()
export class RateLimitGuard implements CanActivate {
  private readonly logger = new Logger(RateLimitGuard.name);
  private readonly requests = new Map<string, number[]>();
  private readonly windowMs = 60000; // 1 minute
  private readonly maxRequests = 10; // 10 requests per minute

  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest<Request>();
    const clientIp = this.getClientIp(request);
    const now = Date.now();

    const timestamps = this.requests.get(clientIp) || [];
    const recentTimestamps = timestamps.filter(ts => now - ts < this.windowMs);

    if (recentTimestamps.length >= this.maxRequests) {
      this.logger.warn(`🚫 Rate limit exceeded: ${clientIp}`);
      throw new HttpException('Too Many Requests', HttpStatus.TOO_MANY_REQUESTS);
    }

    recentTimestamps.push(now);
    this.requests.set(clientIp, recentTimestamps);

    return true;
  }

  private getClientIp(request: Request): string {
    return (
      (request.headers['x-real-ip'] as string) ||
      (request.headers['x-forwarded-for'] as string)?.split(',')[0] ||
      request.socket.remoteAddress ||
      'unknown'
    );
  }
}
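RateLimitGuard never deletes entries, so its Map grows by one key per distinct client address, and its counts live in a single process (they reset on restart and are not shared if pm2 runs several instances). The framework-free sketch below keeps the same sliding-window rule (at most maxRequests hits per windowMs per key) and adds a sweep that drops idle keys. SlidingWindowLimiter, tryConsume, sweep and the injectable clock are illustrative names, not NestJS APIs.

```typescript
// Sliding-window limiter: at most `maxRequests` hits per `windowMs` per key.
// Assumption: a single process holding the counts in memory.
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests = 10, windowMs = 60_000, now: () => number = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now; // injectable clock, handy for tests
  }

  /** Records a hit for `key`; returns false (without recording it) when over the limit. */
  tryConsume(key: string): boolean {
    const t = this.now();
    const recent = (this.hits.get(key) ?? []).filter((ts) => t - ts < this.windowMs);
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent); // store the pruned list; rejected hits are not counted
      return false;
    }
    recent.push(t);
    this.hits.set(key, recent);
    return true;
  }

  /** Removes keys with no hits inside the window; run periodically (e.g. via setInterval). */
  sweep(): void {
    const t = this.now();
    this.hits.forEach((stamps, key) => {
      if (stamps.every((ts) => t - ts >= this.windowMs)) this.hits.delete(key);
    });
  }

  /** Number of keys currently held in memory. */
  get size(): number {
    return this.hits.size;
  }
}
```

Inside the guard, canActivate would call tryConsume(clientIp) and throw the 429 HttpException when it returns false, with sweep() scheduled on an interval. For multi-instance deployments, a shared store or the @nestjs/throttler package is the more robust choice.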

Checklist:

Create file server/src/auth/guards/rate-limit.guard.ts
Copy code above
Verify imports resolve
Build: pnpm build

Step 2.3: Apply Guards to Controllers

File: codebase/features/status-dashboard/server/src/api/hosts.controller.ts

Add imports:

import { UseGuards } from '@nestjs/common';
import { VpnGuard } from '../auth/guards/vpn.guard';
import { ApiSecurity } from '@nestjs/swagger';

Apply to controller:

@ApiTags('hosts')
@ApiSecurity('vpn')
@Controller('api/hosts')
@UseGuards(VpnGuard) // <-- ADD THIS LINE
export class HostsController {
  // ... existing code unchanged
}

Checklist:

Edit server/src/api/hosts.controller.ts
Add imports
Add @UseGuards(VpnGuard) decorator
Build: pnpm build

File: codebase/features/status-dashboard/server/src/api/status.controller.ts

Add imports:

import { UseGuards } from '@nestjs/common';
import { VpnGuard } from '../auth/guards/vpn.guard';
import { RateLimitGuard } from '../auth/guards/rate-limit.guard';
import { ApiSecurity } from '@nestjs/swagger';

Apply to controller:

@ApiTags('health')
@ApiSecurity('vpn')
@Controller('api/health')
@UseGuards(VpnGuard) // <-- ADD THIS LINE
export class StatusController {
  // ... existing methods ...

  /**
   * CRITICAL: Container logs - apply extra rate limiting
   */
  @Get('services/:name/logs')
  @UseGuards(RateLimitGuard) // <-- ADD THIS LINE
  @ApiOperation({ summary: 'Get container logs (rate limited)' })
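  async getContainerLogs(
    @Param('name') name: string,
    @Query('lines') lines?: string, // query parameters arrive as strings
  ): Promise<{ logs: string }> {
    // Enforce 1-1000 lines; fall back to 100 for missing or non-numeric input
    const requested = Number(lines ?? 100);
    const maxLines =
      Number.isFinite(requested) && requested >= 1
        ? Math.min(Math.floor(requested), 1000)
        : 100;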

    this.logger.log(`Fetching logs for service: ${name} (${maxLines} lines)`);

    const logs = await this.vpsAgent.getContainerLogs(name, maxLines);

    return { logs };
  }

  // ... rest of code unchanged
}

Checklist:

Edit server/src/api/status.controller.ts
Add imports
Add @UseGuards(VpnGuard) to class
Add @UseGuards(RateLimitGuard) to getContainerLogs method
Update getContainerLogs to enforce max 1000 lines
Build: pnpm build

Step 2.4: Test Application Guards

Start server with VPN check disabled (for local testing):

cd codebase/features/status-dashboard/server
DISABLE_VPN_CHECK=true pnpm start:dev

Test from localhost:

# Should work (VPN check disabled)
curl http://localhost:5000/api/health/status

# Should work (no guards on public endpoints)
curl http://localhost:5000/api/public/status

Test with VPN check enabled:

# Start server normally
cd codebase/features/status-dashboard/server
pnpm start:dev

# Test from localhost (should FAIL - not VPN IP)
curl http://localhost:5000/api/health/status
# Expected: 403 Forbidden

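# Test with X-Real-IP header (simulate VPN)
# This works only because the guard trusts X-Real-IP; in production the app
# must be reachable solely through nginx, which overwrites the header
curl -H "X-Real-IP: 10.0.0.1" http://localhost:5000/api/health/status
# Expected: 200 OK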

Checklist:

Test with DISABLE_VPN_CHECK=true (all endpoints work)
Test without DISABLE_VPN_CHECK (VPN endpoints blocked)
Test with X-Real-IP: 10.0.0.1 (VPN endpoints work)
Test rate limiting (15 rapid requests to logs endpoint: the first 10 succeed, the remaining 5 return 429)
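getClientIp falls back to the first X-Forwarded-For entry when X-Real-IP is absent. Because nginx's $proxy_add_x_forwarded_for appends the peer address to whatever the client sent, that first entry is client-controlled; the right-most entry is the one nginx actually observed. A sketch of right-to-left parsing, assuming nginx is the only trusted proxy (clientIpFromForwardedFor and trustedHops are hypothetical names):

```typescript
/**
 * Returns the client address as seen by the nearest trusted proxy.
 * With trustedHops = 1 (nginx only) that is the right-most X-Forwarded-For
 * entry, because nginx appends $remote_addr to any client-supplied value.
 * Returns null when the header is missing or has too few entries.
 */
function clientIpFromForwardedFor(header: string | undefined, trustedHops = 1): string | null {
  if (!header) return null;
  const entries = header
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  const index = entries.length - trustedHops;
  return index >= 0 ? entries[index] ?? null : null;
}
```

Express has this behaviour built in: app.set('trust proxy', 'loopback') makes req.ip skip loopback proxies, and NestExpressApplication exposes the same set method. Binding the server with app.listen(5000, '127.0.0.1') also keeps clients from bypassing nginx and supplying their own headers.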

Phase 3: Input Validation (2 hours)

Step 3.1: Create DTOs

File: codebase/features/status-dashboard/server/src/api/dto/logs-query.dto.ts (NEW)

import { ApiProperty } from '@nestjs/swagger';
import { IsInt, Min, Max, IsOptional } from 'class-validator';
import { Type } from 'class-transformer';

export class LogsQueryDto {
  @ApiProperty({
    description: 'Number of log lines to retrieve',
    minimum: 1,
    maximum: 1000,
    default: 100,
    required: false,
  })
  @IsOptional()
  @Type(() => Number)
  @IsInt()
  @Min(1)
  @Max(1000)
  lines?: number = 100;
}

File: codebase/features/status-dashboard/server/src/api/dto/container-name.dto.ts (NEW)

import { ApiProperty } from '@nestjs/swagger';
import { IsString, Matches } from 'class-validator';

export class ContainerNameDto {
  @ApiProperty({
    description: 'Container name (alphanumeric, hyphens, underscores only)',
    example: 'lilith-platform-postgres',
  })
  @IsString()
  @Matches(/^[a-zA-Z0-9_-]+$/, {
    message: 'Container name must be alphanumeric (hyphens/underscores allowed)',
  })
  name!: string;
}

File: codebase/features/status-dashboard/server/src/api/dto/index.ts

// Add exports
export * from './logs-query.dto';
export * from './container-name.dto';

Checklist:

Create dto/logs-query.dto.ts
Create dto/container-name.dto.ts
Update dto/index.ts
Build: pnpm build
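Taken together, the two DTOs enforce two rules: a container name must match ^[a-zA-Z0-9_-]+$, and lines must be an integer from 1 to 1000 (default 100). A dependency-free mirror of those rules is handy for quick unit checks; isValidContainerName and parseLines are hypothetical helpers and do not replace class-validator. Traversal strings such as ../../etc/passwd fail because dots and slashes are outside the allowed set.

```typescript
// Mirrors ContainerNameDto: letters, digits, hyphens and underscores only.
const CONTAINER_NAME = /^[a-zA-Z0-9_-]+$/;

function isValidContainerName(name: string): boolean {
  return CONTAINER_NAME.test(name);
}

// Mirrors LogsQueryDto: optional integer in [1, 1000], default 100.
// Query strings arrive as text, which is why the DTO needs @Type(() => Number).
function parseLines(raw: string | undefined): number | null {
  if (raw === undefined) return 100; // default when the parameter is absent
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1 || n > 1000) return null; // the DTO would reject this
  return n;
}

for (const name of ['lilith-platform-postgres', '../../etc/passwd', 'postgres;id', '']) {
  console.log(JSON.stringify(name), isValidContainerName(name));
}
```

parseLines('999999') returns null rather than 1000: @Max(1000) makes the ValidationPipe reject the request with 400 Bad Request instead of capping it.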

Step 3.2: Apply DTOs to Endpoints

File: codebase/features/status-dashboard/server/src/api/status.controller.ts

import { LogsQueryDto, ContainerNameDto } from './dto';

// Update getServiceDetail
@Get('services/:name')
async getServiceDetail(@Param() params: ContainerNameDto): Promise<DockerContainerDto> {
  const containers = await this.vpsAgent.getDockerContainers();
  const container = containers.find((c) => c.name === params.name);
  // ... rest unchanged
}

// Update getContainerLogs
@Get('services/:name/logs')
@UseGuards(RateLimitGuard)
async getContainerLogs(
  @Param() params: ContainerNameDto,
  @Query() query: LogsQueryDto,
): Promise<{ logs: string }> {
  const logs = await this.vpsAgent.getContainerLogs(params.name, query.lines || 100);
  return { logs };
}

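Checklist:

Update status.controller.ts
Replace @Param('name') with @Param() params: ContainerNameDto
Replace @Query('lines') with @Query() query: LogsQueryDto
Confirm main.ts registers app.useGlobalPipes(new ValidationPipe({ transform: true })); without it the DTO decorators are not enforced and the lines default is not applied
Build: pnpm build
Test invalid input: curl -H "X-Real-IP: 10.0.0.1" "localhost:5000/api/health/services/..%2F..%2Fetc%2Fpasswd" (should return 400; a literal ../ is removed by curl before sending)
Test excessive lines: curl -H "X-Real-IP: 10.0.0.1" "localhost:5000/api/health/services/postgres/logs?lines=999999" (should return 400, since @Max(1000) rejects rather than caps)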

Phase 4: Audit Logging (3 hours)

Step 4.1: Create Audit Logging Interceptor

File: codebase/features/status-dashboard/server/src/common/audit-logging.interceptor.ts (NEW)

import {
  Injectable,
  NestInterceptor,
  ExecutionContext,
  CallHandler,
  Logger,
} from '@nestjs/common';
import { Observable } from 'rxjs';
import { tap } from 'rxjs/operators';
import { Request } from 'express';

@Injectable()
export class AuditLoggingInterceptor implements NestInterceptor {
  private readonly logger = new Logger('AuditLog');

  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const request = context.switchToHttp().getRequest<Request>();
    const { method, url } = request;
    const clientIp = this.getClientIp(request);
    const timestamp = new Date().toISOString();

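    // Guards run before interceptors, so requests rejected by VpnGuard or
    // RateLimitGuard never reach this point; the guards log those denials.
    return next.handle().pipe(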
      tap({
        next: () => {
          this.logger.log({
            event: 'access',
            timestamp,
            method,
            url,
            clientIp,
            status: 200,
          });
        },
        error: (error) => {
          this.logger.warn({
            event: 'access_error', // validation or handler errors (e.g. 400, 500)
            timestamp,
            method,
            url,
            clientIp,
            status: error.status || 500,
            error: error.message,
          });
        },
      })
    );
  }

  private getClientIp(request: Request): string {
    return (
      (request.headers['x-real-ip'] as string) ||
      (request.headers['x-forwarded-for'] as string)?.split(',')[0] ||
      request.socket.remoteAddress ||
      'unknown'
    );
  }
}

Checklist:

Create server/src/common/ directory
Create audit-logging.interceptor.ts
Build: pnpm build

Step 4.2: Apply Interceptor to Controllers

File: codebase/features/status-dashboard/server/src/api/status.controller.ts

import { UseInterceptors } from '@nestjs/common';
import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';

@ApiTags('health')
@ApiSecurity('vpn')
@Controller('api/health')
@UseGuards(VpnGuard)
@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
export class StatusController {
  // ... all access now logged
}

File: codebase/features/status-dashboard/server/src/api/hosts.controller.ts

import { UseInterceptors } from '@nestjs/common';
import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';

@ApiTags('hosts')
@ApiSecurity('vpn')
@Controller('api/hosts')
@UseGuards(VpnGuard)
@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
export class HostsController {
  // ... all access now logged
}

Checklist:

Update status.controller.ts
Update hosts.controller.ts
Build: pnpm build
Test: Check logs show JSON audit trail

Phase 5: Testing & Validation (4 hours)

Step 5.1: Write Security Tests

File: codebase/features/status-dashboard/server/test/security/access-control.e2e-spec.ts (NEW)

import { Test } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../../src/app.module';

describe('Security: Access Control (e2e)', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  describe('VPN-protected endpoints', () => {
    it('should block /api/health/status from public IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/health/status')
        .set('X-Real-IP', '1.2.3.4');

      expect(response.status).toBe(403);
    });

    it('should allow /api/health/status from VPN IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/health/status')
        .set('X-Real-IP', '10.0.0.1');

      expect(response.status).toBe(200);
    });
  });

  describe('Public endpoints', () => {
    it('should allow /api/public/status from any IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/public/status')
        .set('X-Real-IP', '1.2.3.4');

      expect(response.status).toBe(200);
    });
  });

  afterAll(async () => {
    await app.close();
  });
});

Checklist:

Create test/security/ directory
Create access-control.e2e-spec.ts
Run tests: pnpm test:e2e
All tests pass

Step 5.2: Manual Penetration Testing

Deploy to staging/production:

cd codebase/features/status-dashboard
pnpm build
# Deploy to server

Test from public internet:

# 1. Test VPN protection
curl -v https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/health/services
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

# 2. Test critical endpoint
curl -v https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

# 3. Test public endpoints
curl -v https://status.atlilith.com/api/public/status
# Expected: 200 OK

Test from VPN:

# Connect to VPN
# Then test:
curl -v https://status.atlilith.com/api/health/status
# Expected: 200 OK + data

curl -v "https://status.atlilith.com/api/health/services/postgres/logs?lines=50"
# Expected: 200 OK + logs

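Test rate limiting:

# From VPN, make 15 rapid requests and print only the status codes
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
done
# Expected: at most the first 4 return 200 (nginx logs_access: 1r/m, burst=3),
# the rest return 429 (nginx returns 503 unless limit_req_status 429 is set)

Test input validation (from VPN; wait at least a minute after the rate-limit test, otherwise the logs endpoint still answers 429):

# Excessive lines
curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=999999"
# Expected: 400 Bad Request (@Max(1000) rejects the value instead of capping it)

# Invalid container name (curl removes a literal ../ before sending, so test
# with a character outside the allowed set instead)
curl "https://status.atlilith.com/api/health/services/postgres%3Bid"
# Expected: 400 Bad Request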

Checklist:

All /api/health/* return 403 from public IP
All /api/hosts/* return 403 from public IP
All /api/health/* and /api/hosts/* endpoints return 200 from VPN IP
Public endpoints always return 200
Rate limiting works (429 after limit)
Input validation works (rejects invalid input)
Audit logs capture all successful access (denials are logged by VpnGuard/RateLimitGuard)

Final Validation

Production Readiness Checklist

nginx:

Rate limiting zones configured
VPN IP ranges updated to actual values
All location blocks added
nginx -t passes
nginx reloaded successfully

Application:

VpnGuard created and applied
RateLimitGuard created and applied
Input validation DTOs created
Audit logging interceptor applied
All builds succeed

Testing:

Unit tests pass
E2E tests pass
Manual pentest from public IP (all blocked)
Manual pentest from VPN (all work)
Rate limiting tested
Input validation tested
Audit logs verified

Documentation:

VPN setup guide for admins
Security runbook created
Incident response plan documented

Sign-Off:

Security lead approved
Platform architect approved
Venus (Lilith) approved

Deployment

When all checklist items complete:

# 1. Build application
cd codebase/features/status-dashboard/server
pnpm build

# 2. Deploy to production
# (Use your deployment method)

# 3. Restart service
pm2 restart status-dashboard

# 4. Final verification
curl https://status.atlilith.com/api/health/status
# From public IP: 403
# From VPN: 200

# 5. Monitor logs
pm2 logs status-dashboard --lines 100
# Watch for audit log entries

Checklist:

Deployed to production
Service restarted
Final verification passed
Monitoring active
Incident response team notified

Status: ⚠️ NOT PRODUCTION READY until ALL items checked
Next Review: After implementation complete
Owner: [Assign to security lead]
