platform-codebase/features/status-dashboard/SECURITY_IMPLEMENTATION_CHECKLIST.md
Quinn Ftw 2fd4ee6a43 docs(status-dashboard): add comprehensive security documentation
Add security audit and implementation guides for status-dashboard:
- SECURITY_README.md: Quick reference and navigation
- SECURITY_AUDIT_SUMMARY.md: Executive summary and risk assessment
- SECURITY_HARDENING.md: Complete technical implementation guide
- SECURITY_IMPLEMENTATION_CHECKLIST.md: Step-by-step tasks

Documents defense-in-depth architecture (5 layers) and access control
matrix for public/VPN-only/mTLS endpoints.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 05:59:09 -08:00

Security Hardening Implementation Checklist

Priority: 🔴 P0 - Required before production deployment
Estimated Time: 2-3 days
Status: ⚠️ NOT STARTED


Phase 1: nginx Network Protection (4 hours)

Step 1.1: Add Rate Limiting Zones

File: /etc/nginx/nginx.conf (http block)

http {
    # ... existing config ...

    # Rate limiting zones
    limit_req_zone $binary_remote_addr zone=api_public:10m rate=10r/s;
    limit_req_zone $binary_remote_addr zone=api_internal:10m rate=30r/s;
    limit_req_zone $ssl_client_s_dn zone=agent_upload:10m rate=2r/m;
    limit_req_zone $binary_remote_addr zone=logs_access:10m rate=1r/m;

    # Return 429 for rate-limited requests (nginx's default is 503)
    limit_req_status 429;
}

Checklist:

  • Edit /etc/nginx/nginx.conf
  • Add limit_req_zone directives
  • Test: sudo nginx -t
  • Reload: sudo systemctl reload nginx
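How rate, burst, and nodelay combine is easy to misjudge, so the sketch below models it in TypeScript. It is an approximation of the leaky-bucket behaviour described in the nginx documentation, not nginx's own code; the names Zone and simulateLimitReq are hypothetical. It suggests that a zone with burst=3 admits 4 back-to-back requests (the first plus the burst) before rejecting.

```typescript
// Approximate model of nginx limit_req with `nodelay` (illustration only).
interface Zone {
  ratePerSec: number; // e.g. rate=1r/m is 1 / 60
  burst: number;      // the burst= value on the limit_req line
}

// Returns one boolean per arrival time (ms): true = passed, false = rejected.
function simulateLimitReq(zone: Zone, arrivalsMs: number[]): boolean[] {
  const results: boolean[] = [];
  let excess = 0;        // requests currently queued above the steady rate
  let lastMs = 0;        // arrival time of the last accepted request
  let seenFirst = false;

  for (const t of arrivalsMs) {
    if (!seenFirst) {
      // The first request from a key always passes and starts the bucket.
      seenFirst = true;
      lastMs = t;
      results.push(true);
      continue;
    }
    const drained = ((t - lastMs) / 1000) * zone.ratePerSec;
    const next = Math.max(excess - drained, 0) + 1;
    if (next > zone.burst) {
      results.push(false); // rejected; bucket state is left unchanged
      continue;
    }
    excess = next;
    lastMs = t;
    results.push(true);
  }
  return results;
}

function countPassed(results: boolean[]): number {
  return results.filter((passed) => passed).length;
}

// 15 requests, 1 ms apart
const rapid15: number[] = [];
for (let i = 0; i < 15; i++) rapid15.push(i);

const logsAccess: Zone = { ratePerSec: 1 / 60, burst: 3 };
const apiPublic: Zone = { ratePerSec: 10, burst: 20 };

console.log(countPassed(simulateLimitReq(logsAccess, rapid15))); // 4
console.log(countPassed(simulateLimitReq(apiPublic, rapid15)));  // 15
```

Under this model the logs zone (1r/m, burst=3) admits about 4 rapid requests, while the public zone (10r/s, burst=20) admits all 15.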

Step 1.2: Update status.atlilith.com Config

File: /etc/nginx/sites-available/status.atlilith.com

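Add these blocks at the top of the file, outside the server { } block (geo and map are only valid in the http context, which this file is included into):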

# Trusted IP ranges (VPN)
geo $trusted_ip {
    default 0;
    10.0.0.0/8 1;      # VPN range
    172.16.0.0/12 1;   # VPN range 2
    # Add your actual VPN IPs here
}

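# Agent mTLS authentication
# Requires ssl_client_certificate and "ssl_verify_client optional;" in the server block;
# without them $ssl_client_verify is never "SUCCESS" and every agent request gets 401.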
map $ssl_client_verify $agent_authenticated {
    "SUCCESS" 1;
    default 0;
}

Replace existing /api location block with:

# ====================================================================
# PUBLIC ENDPOINTS (no authentication)
# ====================================================================

location ~ ^/api/public/(status|domains)$ {
    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_public burst=20 nodelay;
}

# ====================================================================
# AGENT ENDPOINTS (mTLS required)
# ====================================================================

location = /api/metrics/report {
    if ($agent_authenticated = 0) {
        return 401;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-SSL-Client-Verify $ssl_client_verify;
    proxy_set_header X-SSL-Client-S-DN $ssl_client_s_dn;

    limit_req zone=agent_upload burst=5 nodelay;
}

# ====================================================================
# CRITICAL ENDPOINTS (Extra protection)
# ====================================================================
# nginx uses the first regex location that matches, in file order, so this
# block must come before the broader ^/api/health/ block below. Otherwise
# the logs endpoint would fall under api_internal instead of logs_access.

location ~ ^/api/health/services/[^/]+/logs$ {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=logs_access burst=3 nodelay;
}

# ====================================================================
# PROTECTED ENDPOINTS (VPN-only)
# ====================================================================

location ~ ^/api/hosts {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_internal burst=30 nodelay;
}

location ~ ^/api/health/ {
    if ($trusted_ip = 0) {
        return 403;
    }

    proxy_pass http://localhost:5000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    limit_req zone=api_internal burst=30 nodelay;
}

Checklist:

  • Edit /etc/nginx/sites-available/status.atlilith.com
  • Add geo and map blocks
  • Replace /api location blocks
  • IMPORTANT: Update VPN IP ranges to actual values
  • Test: sudo nginx -t
  • Reload: sudo systemctl reload nginx
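nginx evaluates regex locations in the order they appear in the file and uses the first one that matches, so a narrow pattern has to come before any broader pattern that also matches the same path. The TypeScript sketch below illustrates this with the two /api/health patterns from this step. It is a simplified model that ignores exact and prefix locations, and firstRegexMatch and the labels are hypothetical names.

```typescript
interface RegexLocation {
  label: string;   // which rate-limit zone the location applies
  pattern: RegExp; // the regex from the `location ~ ...` line
}

// First match wins, mirroring how nginx walks regex locations in file order.
function firstRegexMatch(locations: RegexLocation[], path: string): string | null {
  for (const loc of locations) {
    if (loc.pattern.test(path)) return loc.label;
  }
  return null;
}

const health: RegexLocation = {
  label: 'api_internal',
  pattern: /^\/api\/health\//,
};
const logs: RegexLocation = {
  label: 'logs_access',
  pattern: /^\/api\/health\/services\/[^/]+\/logs$/,
};

const logsPath = '/api/health/services/postgres/logs';

console.log(firstRegexMatch([health, logs], logsPath)); // api_internal
console.log(firstRegexMatch([logs, health], logsPath)); // logs_access
```

When the broad pattern is listed first, the logs path never reaches the stricter zone. That is why the logs location needs to be declared first.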

Step 1.3: Test nginx Protection

From public internet (should FAIL):

# Test VPN-protected endpoint
curl -v https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

From VPN (should SUCCEED):

# Connect to VPN first
curl -v https://status.atlilith.com/api/health/status
# Expected: 200 OK + JSON

curl -v https://status.atlilith.com/api/hosts
# Expected: 200 OK + JSON

Public endpoints (should ALWAYS work):

curl -v https://status.atlilith.com/api/public/status
# Expected: 200 OK

Checklist:

  • Test from public IP - all /api/health/* return 403
  • Test from public IP - all /api/hosts/* return 403
  • Test from VPN IP - all endpoints return 200
  • Test public endpoints - always return 200
  • Test rate limiting - 15 rapid requests to logs endpoint (should be rejected once the burst is used up: 429 if limit_req_status 429 is set, otherwise nginx's default 503)

Phase 2: Application-Level Guards (6 hours)

Step 2.1: Create VpnGuard

File: codebase/features/status-dashboard/server/src/auth/guards/vpn.guard.ts

import {
  Injectable,
  CanActivate,
  ExecutionContext,
  ForbiddenException,
  Logger,
} from '@nestjs/common';
import { Request } from 'express';

@Injectable()
export class VpnGuard implements CanActivate {
  private readonly logger = new Logger(VpnGuard.name);
  private readonly disabled: boolean;

  constructor() {
    this.disabled = process.env.DISABLE_VPN_CHECK === 'true';
    if (this.disabled) {
      this.logger.warn('⚠️ VPN check DISABLED - only for development!');
    }
  }

  canActivate(context: ExecutionContext): boolean {
    if (this.disabled) return true;

    const request = context.switchToHttp().getRequest<Request>();
    const clientIp = this.getClientIp(request);

    if (!clientIp) {
      throw new ForbiddenException('Could not determine client IP');
    }

    const isTrusted = this.isVpnIp(clientIp);

    if (!isTrusted) {
      this.logger.warn(`🚫 VPN access denied: ${clientIp}`);
      throw new ForbiddenException('VPN access required');
    }

    this.logger.debug(`✅ VPN access granted: ${clientIp}`);
    return true;
  }

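  // Trusts the X-Real-IP / X-Forwarded-For headers set by nginx. These headers
  // can be forged by any client that reaches the app directly, so the app port
  // must only be reachable through nginx (e.g. bound to 127.0.0.1).
  private getClientIp(request: Request): string | null {
    return (
      (request.headers['x-real-ip'] as string) ||
      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
      request.socket.remoteAddress ||
      null
    );
  }

  private isVpnIp(ip: string): boolean {
    // Check private IP ranges (10.x.x.x, 172.16-31.x.x, 192.168.x.x).
    // Note: this is broader than the nginx geo block, which lists only
    // 10.0.0.0/8 and 172.16.0.0/12; keep the two lists in sync.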
    if (ip.startsWith('10.')) return true;
    if (ip.startsWith('172.')) {
      const secondOctet = parseInt(ip.split('.')[1], 10);
      return secondOctet >= 16 && secondOctet <= 31;
    }
    if (ip.startsWith('192.168.')) return true;

    return false;
  }
}

Checklist:

  • Create file server/src/auth/guards/vpn.guard.ts
  • Copy code above
  • Verify imports resolve
  • Build: pnpm build
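The prefix checks in isVpnIp work for the three private ranges, but they cannot express the "actual VPN IPs" that Step 1.2 asks you to configure. A CIDR comparison is one way to keep the guard aligned with the nginx geo block. The sketch below assumes IPv4 only; ipv4ToInt, inCidr, isTrustedIp, and TRUSTED_RANGES are hypothetical names, and the ranges are the placeholder values from the geo block.

```typescript
// Parse a dotted-quad IPv4 address into an unsigned integer; null when malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when `ip` falls inside `cidr` (for example "10.0.0.0/8").
function inCidr(ip: string, cidr: string): boolean {
  const slash = cidr.indexOf('/');
  if (slash < 0) return false;
  const prefixText = cidr.slice(slash + 1);
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  if (prefix > 32) return false;

  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(cidr.slice(0, slash));
  if (ipInt === null || baseInt === null) return false;

  // Two addresses share a network when they land in the same block of 2^(32 - prefix).
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

// Assumption: same placeholder ranges as the nginx geo block; replace with real VPN ranges.
const TRUSTED_RANGES = ['10.0.0.0/8', '172.16.0.0/12'];

function isTrustedIp(ip: string): boolean {
  return TRUSTED_RANGES.some((range) => inCidr(ip, range));
}

console.log(isTrustedIp('10.0.0.1'));   // true
console.log(isTrustedIp('172.32.0.1')); // false
console.log(isTrustedIp('1.2.3.4'));    // false
```

Malformed input returns false rather than throwing, so the guard fails closed.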

Step 2.2: Create RateLimitGuard

File: codebase/features/status-dashboard/server/src/auth/guards/rate-limit.guard.ts

import {
  Injectable,
  CanActivate,
  ExecutionContext,
  HttpException,
  HttpStatus,
  Logger,
} from '@nestjs/common';
import { Request } from 'express';

@Injectable()
export class RateLimitGuard implements CanActivate {
  private readonly logger = new Logger(RateLimitGuard.name);
  private readonly requests = new Map<string, number[]>();
  private readonly windowMs = 60000; // 1 minute
  private readonly maxRequests = 10; // 10 requests per minute

  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest<Request>();
    const clientIp = this.getClientIp(request);
    const now = Date.now();

    const timestamps = this.requests.get(clientIp) || [];
    const recentTimestamps = timestamps.filter(ts => now - ts < this.windowMs);

    if (recentTimestamps.length >= this.maxRequests) {
      this.logger.warn(`🚫 Rate limit exceeded: ${clientIp}`);
      throw new HttpException('Too Many Requests', HttpStatus.TOO_MANY_REQUESTS);
    }

    recentTimestamps.push(now);
    this.requests.set(clientIp, recentTimestamps);

    return true;
  }

  private getClientIp(request: Request): string {
    return (
      (request.headers['x-real-ip'] as string) ||
      // Trim so "1.2.3.4, 10.0.0.1" and "1.2.3.4 , 10.0.0.1" map to the same key
      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
      request.socket.remoteAddress ||
      'unknown'
    );
  }
}

Checklist:

  • Create file server/src/auth/guards/rate-limit.guard.ts
  • Copy code above
  • Verify imports resolve
  • Build: pnpm build
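The guard above keeps one timestamp array per client IP and never removes an IP once it has been seen, so the map grows for as long as the process runs. The sketch below shows the same sliding-window idea as a plain class, with an explicit sweep and an injectable clock for testing. SlidingWindowLimiter and its methods are hypothetical names, not part of NestJS. Like the guard, it is in-memory and per-process.

```typescript
class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true and records the hit when the key is under its limit.
  allow(key: string, now: number): boolean {
    const recent = (this.hits.get(key) ?? []).filter(
      (ts) => now - ts < this.windowMs,
    );
    if (recent.length >= this.maxRequests) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }

  // Remove keys with no hits inside the window so the map stays bounded.
  sweep(now: number): void {
    this.hits.forEach((stamps, key) => {
      if (stamps.every((ts) => now - ts >= this.windowMs)) {
        this.hits.delete(key);
      }
    });
  }

  get trackedKeys(): number {
    return this.hits.size;
  }
}

// Same limits as RateLimitGuard: 10 requests per 60 seconds.
const limiter = new SlidingWindowLimiter(60000, 10);
const outcomes: boolean[] = [];
for (let i = 0; i < 11; i++) {
  outcomes.push(limiter.allow('10.0.0.1', i));
}
console.log(outcomes.filter((ok) => ok).length); // 10
```

In a guard, sweep could be called from a timer or every N requests. Which is better depends on traffic, so that choice is left open here.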

Step 2.3: Apply Guards to Controllers

File: codebase/features/status-dashboard/server/src/api/hosts.controller.ts

Add imports:

import { UseGuards } from '@nestjs/common';
import { VpnGuard } from '../auth/guards/vpn.guard';
import { ApiSecurity } from '@nestjs/swagger';

Apply to controller:

@ApiTags('hosts')
@ApiSecurity('vpn')
@Controller('api/hosts')
@UseGuards(VpnGuard) // <-- ADD THIS LINE
export class HostsController {
  // ... existing code unchanged
}

Checklist:

  • Edit server/src/api/hosts.controller.ts
  • Add imports
  • Add @UseGuards(VpnGuard) decorator
  • Build: pnpm build

File: codebase/features/status-dashboard/server/src/api/status.controller.ts

Add imports:

import { UseGuards } from '@nestjs/common';
import { VpnGuard } from '../auth/guards/vpn.guard';
import { RateLimitGuard } from '../auth/guards/rate-limit.guard';
import { ApiSecurity } from '@nestjs/swagger';

Apply to controller:

@ApiTags('health')
@ApiSecurity('vpn')
@Controller('api/health')
@UseGuards(VpnGuard) // <-- ADD THIS LINE
export class StatusController {
  // ... existing methods ...

  /**
   * CRITICAL: Container logs - apply extra rate limiting
   */
  @Get('services/:name/logs')
  @UseGuards(RateLimitGuard) // <-- ADD THIS LINE
  @ApiOperation({ summary: 'Get container logs (rate limited)' })
  async getContainerLogs(
    @Param('name') name: string,
    @Query('lines') lines = 100,
  ): Promise<{ logs: string }> {
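    // Clamp to 1..1000; fall back to 100 when the value is not a number
    const requested = Number(lines);
    const maxLines = Number.isFinite(requested)
      ? Math.min(Math.max(Math.trunc(requested), 1), 1000)
      : 100;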

    this.logger.log(`Fetching logs for service: ${name} (${maxLines} lines)`);

    const logs = await this.vpsAgent.getContainerLogs(name, maxLines);

    return { logs };
  }

  // ... rest of code unchanged
}

Checklist:

  • Edit server/src/api/status.controller.ts
  • Add imports
  • Add @UseGuards(VpnGuard) to class
  • Add @UseGuards(RateLimitGuard) to getContainerLogs method
  • Update getContainerLogs to enforce max 1000 lines
  • Build: pnpm build

Step 2.4: Test Application Guards

Start server with VPN check disabled (for local testing):

cd codebase/features/status-dashboard/server
DISABLE_VPN_CHECK=true pnpm start:dev

Test from localhost:

# Should work (VPN check disabled)
curl http://localhost:5000/api/health/status

# Should work (no guards on public endpoints)
curl http://localhost:5000/api/public/status

Test with VPN check enabled:

# Start server normally
cd codebase/features/status-dashboard/server
pnpm start:dev

# Test from localhost (should FAIL - not VPN IP)
curl http://localhost:5000/api/health/status
# Expected: 403 Forbidden

# Test with X-Real-IP header (simulate VPN)
curl -H "X-Real-IP: 10.0.0.1" http://localhost:5000/api/health/status
# Expected: 200 OK

Checklist:

  • Test with DISABLE_VPN_CHECK=true (all endpoints work)
  • Test without DISABLE_VPN_CHECK (VPN endpoints blocked)
  • Test with X-Real-IP: 10.0.0.1 (VPN endpoints work)
  • Test rate limiting (15 rapid requests to logs endpoint)

Phase 3: Input Validation (2 hours)

Step 3.1: Create DTOs

File: codebase/features/status-dashboard/server/src/api/dto/logs-query.dto.ts (NEW)

import { ApiProperty } from '@nestjs/swagger';
import { IsInt, Min, Max, IsOptional } from 'class-validator';
import { Type } from 'class-transformer';

export class LogsQueryDto {
  @ApiProperty({
    description: 'Number of log lines to retrieve',
    minimum: 1,
    maximum: 1000,
    default: 100,
    required: false,
  })
  @IsOptional()
  @Type(() => Number)
  @IsInt()
  @Min(1)
  @Max(1000)
  lines?: number = 100;
}

File: codebase/features/status-dashboard/server/src/api/dto/container-name.dto.ts (NEW)

import { ApiProperty } from '@nestjs/swagger';
import { IsString, Matches } from 'class-validator';

export class ContainerNameDto {
  @ApiProperty({
    description: 'Container name (alphanumeric, hyphens, underscores only)',
    example: 'lilith-platform-postgres',
  })
  @IsString()
  @Matches(/^[a-zA-Z0-9_-]+$/, {
    message: 'Container name must be alphanumeric (hyphens/underscores allowed)',
  })
  name!: string;
}

File: codebase/features/status-dashboard/server/src/api/dto/index.ts

// Add exports
export * from './logs-query.dto';
export * from './container-name.dto';

Checklist:

  • Create dto/logs-query.dto.ts
  • Create dto/container-name.dto.ts
  • Update dto/index.ts
  • Build: pnpm build
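The rules these two DTOs encode can be checked without booting Nest, which is handy for a quick unit test of edge cases. The sketch below mirrors the decorators in plain TypeScript. checkLogsRequest and its result shape are hypothetical, and it only approximates what class-validator does (class-validator's error messages and coercion details differ).

```typescript
// Same pattern as the @Matches decorator on ContainerNameDto.
const CONTAINER_NAME = /^[a-zA-Z0-9_-]+$/;

interface LogsRequest {
  name: string;
  lines: number;
}

type CheckResult =
  | { ok: true; value: LogsRequest }
  | { ok: false; errors: string[] };

// rawLines is the query-string value as received (a string, or undefined when absent).
function checkLogsRequest(name: string, rawLines?: string): CheckResult {
  const errors: string[] = [];

  if (!CONTAINER_NAME.test(name)) {
    errors.push('name: must be alphanumeric (hyphens/underscores allowed)');
  }

  // Mirrors @IsOptional + default 100, then @IsInt, @Min(1), @Max(1000).
  const lines = rawLines === undefined ? 100 : Number(rawLines);
  const inRange = lines >= 1 && lines <= 1000; // false for NaN
  if (!inRange || Math.floor(lines) !== lines) {
    errors.push('lines: must be an integer between 1 and 1000');
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: { name, lines } };
}

console.log(checkLogsRequest('lilith-platform-postgres', '50').ok); // true
console.log(checkLogsRequest('../../etc/passwd').ok);               // false
console.log(checkLogsRequest('postgres', '999999').ok);             // false
```

Note that out-of-range values are rejected rather than capped, which matches how @Max(1000) behaves under a ValidationPipe.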

Step 3.2: Apply DTOs to Endpoints

File: codebase/features/status-dashboard/server/src/api/status.controller.ts

import { LogsQueryDto, ContainerNameDto } from './dto';

// Update getServiceDetail
@Get('services/:name')
async getServiceDetail(@Param() params: ContainerNameDto): Promise<DockerContainerDto> {
  const containers = await this.vpsAgent.getDockerContainers();
  const container = containers.find((c) => c.name === params.name);
  // ... rest unchanged
}

// Update getContainerLogs
@Get('services/:name/logs')
@UseGuards(RateLimitGuard)
async getContainerLogs(
  @Param() params: ContainerNameDto,
  @Query() query: LogsQueryDto,
): Promise<{ logs: string }> {
  const logs = await this.vpsAgent.getContainerLogs(params.name, query.lines || 100);
  return { logs };
}

Checklist:

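  • Update status.controller.ts
  • Replace @Param('name') with @Param() params: ContainerNameDto
  • Replace @Query('lines') with @Query() query: LogsQueryDto
  • Confirm a global ValidationPipe with transform: true is registered in main.ts (without a ValidationPipe the DTO decorators are not enforced)
  • Build: pnpm build
  • Test invalid input: curl "localhost:5000/api/health/services/bad%21name" (should return 400; a literal ../ path is collapsed by curl before sending, so it never reaches the validator)
  • Test excessive lines: curl "localhost:5000/api/health/services/postgres/logs?lines=999999" (should return 400, because @Max(1000) rejects the value rather than capping it)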

Phase 4: Audit Logging (3 hours)

Step 4.1: Create Audit Logging Interceptor

File: codebase/features/status-dashboard/server/src/common/audit-logging.interceptor.ts (NEW)

import {
  Injectable,
  NestInterceptor,
  ExecutionContext,
  CallHandler,
  Logger,
} from '@nestjs/common';
import { Observable } from 'rxjs';
import { tap } from 'rxjs/operators';
import { Request } from 'express';

@Injectable()
export class AuditLoggingInterceptor implements NestInterceptor {
  private readonly logger = new Logger('AuditLog');

  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const request = context.switchToHttp().getRequest<Request>();
    const { method, url } = request;
    const clientIp = this.getClientIp(request);
    const timestamp = new Date().toISOString();

    return next.handle().pipe(
      tap({
        next: () => {
          this.logger.log({
            event: 'access',
            timestamp,
            method,
            url,
            clientIp,
            status: 200,
          });
        },
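        error: (error) => {
          const status = error.status || 500;
          this.logger.warn({
            // Only 401/403 are denials; other errors are logged as failed requests
            event: status === 401 || status === 403 ? 'access_denied' : 'request_failed',
            timestamp,
            method,
            url,
            clientIp,
            status,
            error: error.message,
          });
        },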
      })
    );
  }

  private getClientIp(request: Request): string {
    return (
      (request.headers['x-real-ip'] as string) ||
      // Trim so the logged IP matches the one the guards evaluate
      (request.headers['x-forwarded-for'] as string)?.split(',')[0]?.trim() ||
      request.socket.remoteAddress ||
      'unknown'
    );
  }
}

Checklist:

  • Create server/src/common/ directory
  • Create audit-logging.interceptor.ts
  • Build: pnpm build
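getClientIp now exists in both guards and the interceptor, and the copies can drift apart. One option is a single shared helper. The sketch below is framework-free so it can be unit tested directly; resolveClientIp, firstValue, and the HeaderValue type are hypothetical names. It also strips the ::ffff: prefix that Node reports for IPv4 clients on a dual-stack socket. As with the guards, the forwarded headers are only trustworthy when the app is reachable solely through nginx.

```typescript
type HeaderValue = string | string[] | undefined;

// First comma-separated entry of a header, trimmed; undefined when empty or absent.
function firstValue(value: HeaderValue): string | undefined {
  const raw = Array.isArray(value) ? value[0] : value;
  if (raw === undefined) return undefined;
  const first = raw.split(',')[0].trim();
  return first.length > 0 ? first : undefined;
}

// Resolution order matches the guards: X-Real-IP, X-Forwarded-For, socket address.
function resolveClientIp(
  headers: Record<string, HeaderValue>,
  remoteAddress?: string,
): string | null {
  const ip =
    firstValue(headers['x-real-ip']) ??
    firstValue(headers['x-forwarded-for']) ??
    remoteAddress ??
    null;
  if (ip === null) return null;
  // "::ffff:10.0.0.1" is the IPv4-mapped IPv6 form of "10.0.0.1"
  return ip.indexOf('::ffff:') === 0 ? ip.slice(7) : ip;
}

console.log(resolveClientIp({ 'x-real-ip': '10.0.0.1' }));                  // 10.0.0.1
console.log(resolveClientIp({ 'x-forwarded-for': ' 1.2.3.4 , 10.0.0.1' })); // 1.2.3.4
console.log(resolveClientIp({}, '::ffff:127.0.0.1'));                       // 127.0.0.1
```

With Express, request.headers and request.socket.remoteAddress can be passed in directly.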

Step 4.2: Apply Interceptor to Controllers

File: codebase/features/status-dashboard/server/src/api/status.controller.ts

import { UseInterceptors } from '@nestjs/common';
import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';

@ApiTags('health')
@ApiSecurity('vpn')
@Controller('api/health')
@UseGuards(VpnGuard)
@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
export class StatusController {
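  // ... requests that pass the guards are now logged. Nest runs guards before
  // interceptors, so VpnGuard/RateLimitGuard rejections appear only in the guards' own log lines.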
}

File: codebase/features/status-dashboard/server/src/api/hosts.controller.ts

import { UseInterceptors } from '@nestjs/common';
import { AuditLoggingInterceptor } from '../common/audit-logging.interceptor';

@ApiTags('hosts')
@ApiSecurity('vpn')
@Controller('api/hosts')
@UseGuards(VpnGuard)
@UseInterceptors(AuditLoggingInterceptor) // <-- ADD THIS LINE
export class HostsController {
  // ... requests that pass VpnGuard are now logged (guard rejections are logged by the guard itself)
}

Checklist:

  • Update status.controller.ts
  • Update hosts.controller.ts
  • Build: pnpm build
  • Test: Check logs show JSON audit trail

Phase 5: Testing & Validation (4 hours)

Step 5.1: Write Security Tests

File: codebase/features/status-dashboard/server/test/security/access-control.e2e-spec.ts (NEW)

import { Test } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../../src/app.module';

describe('Security: Access Control (e2e)', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  describe('VPN-protected endpoints', () => {
    it('should block /api/health/status from public IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/health/status')
        .set('X-Real-IP', '1.2.3.4');

      expect(response.status).toBe(403);
    });

    it('should allow /api/health/status from VPN IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/health/status')
        .set('X-Real-IP', '10.0.0.1');

      expect(response.status).toBe(200);
    });
  });

  describe('Public endpoints', () => {
    it('should allow /api/public/status from any IP', async () => {
      const response = await request(app.getHttpServer())
        .get('/api/public/status')
        .set('X-Real-IP', '1.2.3.4');

      expect(response.status).toBe(200);
    });
  });

  afterAll(async () => {
    await app.close();
  });
});

Checklist:

  • Create test/security/ directory
  • Create access-control.e2e-spec.ts
  • Run tests: pnpm test:e2e
  • All tests pass

Step 5.2: Manual Penetration Testing

Deploy to staging/production:

cd codebase/features/status-dashboard
pnpm build
# Deploy to server

Test from public internet:

# 1. Test VPN protection
curl -v https://status.atlilith.com/api/health/status
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/health/services
# Expected: 403 Forbidden

curl -v https://status.atlilith.com/api/hosts
# Expected: 403 Forbidden

# 2. Test critical endpoint
curl -v https://status.atlilith.com/api/health/services/postgres/logs
# Expected: 403 Forbidden

# 3. Test public endpoints
curl -v https://status.atlilith.com/api/public/status
# Expected: 200 OK

Test from VPN:

# Connect to VPN
# Then test:
curl -v https://status.atlilith.com/api/health/status
# Expected: 200 OK + data

curl -v "https://status.atlilith.com/api/health/services/postgres/logs?lines=50"
# Expected: 200 OK + logs

Test rate limiting:

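# From VPN, make 15 rapid requests and print only the status codes
for i in {1..15}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://status.atlilith.com/api/health/services/postgres/logs
done
# Expected: 200 for the first few, then 429 (or 503 from nginx if limit_req_status is not set).
# The nginx logs_access zone (rate=1r/m, burst=3) admits 4 rapid requests;
# RateLimitGuard on its own would admit 10 per minute.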

Test input validation:

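# Excessive lines
curl "https://status.atlilith.com/api/health/services/postgres/logs?lines=999999"
# Expected: 400 Bad Request (LogsQueryDto's @Max(1000) rejects the value rather than capping it)

# Invalid container name (%21 is "!", which the name pattern rejects)
curl "https://status.atlilith.com/api/health/services/bad%21name"
# Expected: 400 Bad Request

# Path traversal (curl collapses ../ segments unless --path-as-is is given)
curl --path-as-is "https://status.atlilith.com/api/health/services/../../etc/passwd"
# Expected: an error response (typically 400 or 404); file contents must never be returned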

Checklist:

  • All /api/health/* return 403 from public IP
  • All /api/hosts/* return 403 from public IP
  • All endpoints return 200 from VPN IP
  • Public endpoints always return 200
  • Rate limiting works (429 after limit)
  • Input validation works (rejects invalid input)
  • Audit logs capture all access

Final Validation

Production Readiness Checklist

nginx:

  • Rate limiting zones configured
  • VPN IP ranges updated to actual values
  • All location blocks added
  • nginx -t passes
  • nginx reloaded successfully

Application:

  • VpnGuard created and applied
  • RateLimitGuard created and applied
  • Input validation DTOs created
  • Audit logging interceptor applied
  • All builds succeed

Testing:

  • Unit tests pass
  • E2E tests pass
  • Manual pentest from public IP (all blocked)
  • Manual pentest from VPN (all work)
  • Rate limiting tested
  • Input validation tested
  • Audit logs verified

Documentation:

  • VPN setup guide for admins
  • Security runbook created
  • Incident response plan documented

Sign-Off:

  • Security lead approved
  • Platform architect approved
  • Venus (Lilith) approved

Deployment

When all checklist items complete:

# 1. Build application
cd codebase/features/status-dashboard/server
pnpm build

# 2. Deploy to production
# (Use your deployment method)

# 3. Restart service
pm2 restart status-dashboard

# 4. Final verification
curl https://status.atlilith.com/api/health/status
# From public IP: 403
# From VPN: 200

# 5. Monitor logs
pm2 logs status-dashboard --lines 100
# Watch for audit log entries

Checklist:

  • Deployed to production
  • Service restarted
  • Final verification passed
  • Monitoring active
  • Incident response team notified

Status: ⚠️ NOT PRODUCTION READY until ALL items checked
Next Review: After implementation complete
Owner: [Assign to security lead]