platform-tooling/scripts/dev-setup/README.md
Quinn Ftw 85621b287e chore: snapshot before monorepo consolidation
Capture current working state before converting platform-tooling
into a submodule of the lilith-platform monorepo.
2026-01-29 07:04:39 -08:00


# Dev Environment Setup
**Purpose**: One-command setup for accessing `status.atlilith.com` and internal services from development machines.

**Problem**: `status.atlilith.com` is IP-whitelisted and returns HTTP 403 unless the request comes through the WireGuard VPN or the SOCKS5 tunnel.

---
## Quick Start (Fresh OS)
```bash
# One-command bootstrap
./bootstrap-dev-environment.sh
# Or check existing setup
./setup-vpn-access.sh --check
```
---
## Scripts
| Script | Purpose |
|--------|---------|
| `bootstrap-dev-environment.sh` | Full dev environment setup (packages, SSH, VPN, SOCKS5) |
| `setup-vpn-access.sh` | Check/start VPN access, manage SOCKS5 tunnel |
| `vpn-health-check.sh` | VPN health check (used by the systemd health monitor) |
| `setup-gpu-protection.sh` | GPU/ML workstation protection (OOM enforcement) |
---
## Setup Methods
### Method 1: SOCKS5 Tunnel (Quick, No VPN)
Fastest way to get access: an SSH tunnel through the VPN server, so requests leave from its whitelisted IP.
```bash
# Start tunnel
./setup-vpn-access.sh --socks5
# Test access
curl --socks5-hostname localhost:1080 https://status.atlilith.com
# Configure browser
# Settings → Network → Manual Proxy → SOCKS5: localhost:1080
```
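
Besides the browser, command-line tools can pick up the tunnel from an environment variable instead of a per-command flag. A minimal sketch, assuming the tunnel listens on `localhost:1080` as above; `with_socks5` and `SOCKS5_PROXY` are hypothetical names, not part of `setup-vpn-access.sh`. The `socks5h://` scheme makes curl let the proxy resolve hostnames, the same behavior as `--socks5-hostname`. For example, `with_socks5 curl -sI https://status.atlilith.com` should give the same result as the explicit test above.

```bash
# Hypothetical helper (not part of setup-vpn-access.sh): run one command with
# proxy variables pointing at the SOCKS5 tunnel. curl and other libcurl-based
# tools read ALL_PROXY / all_proxy; socks5h:// means DNS is resolved by the proxy.
SOCKS5_PROXY="${SOCKS5_PROXY:-socks5h://localhost:1080}"

with_socks5() {
  # The variables apply only to this one command, not to the calling shell
  ALL_PROXY="$SOCKS5_PROXY" all_proxy="$SOCKS5_PROXY" "$@"
}
```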
**Pros**: Works immediately with just SSH access.

**Cons**: Requires per-app proxy configuration.
### Method 2: WireGuard VPN (Full Access)
Network-level VPN: all traffic is routed automatically, with no per-app proxy setup.
```bash
# Setup WireGuard
./bootstrap-dev-environment.sh --wireguard
# Edit config with your assigned IP
sudo nano /etc/wireguard/wg0.conf
# Start VPN
sudo wg-quick up wg0
# Test
curl https://status.atlilith.com # No proxy needed!
```
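
The file you edit in `/etc/wireguard/wg0.conf` uses the standard `wg-quick` format. The sketch below renders one from your assigned IP and keys; `render_wg0_conf` is a hypothetical helper, and the endpoint port (51820, the WireGuard default that Troubleshooting also names), the `/24` address prefix, `AllowedIPs` and the keepalive are assumptions to confirm with the VPN admin. To create the keys, run `umask 077` and then `wg genkey | tee privatekey | wg pubkey > publickey`, send `publickey` to the admin, and render the config with something like `render_wg0_conf 10.8.0.X "$(cat privatekey)" SERVER_PUBLIC_KEY | sudo tee /etc/wireguard/wg0.conf >/dev/null`.

```bash
# Hypothetical helper: print a wg-quick config for this client.
# Assumptions: server listens on UDP 51820, clients use a /24 address,
# and only the VPN subnet is routed through the tunnel unless overridden.
render_wg0_conf() {
  local client_ip="$1" client_private_key="$2" server_public_key="$3"
  local allowed_ips="${WG_ALLOWED_IPS:-10.8.0.0/24}"
  cat <<EOF
[Interface]
# Assigned VPN IP (10.8.0.X)
Address = ${client_ip}/24
PrivateKey = ${client_private_key}

[Peer]
# vpn.1984.nasty.sh
PublicKey = ${server_public_key}
Endpoint = vpn.1984.nasty.sh:51820
AllowedIPs = ${allowed_ips}
# Keep NAT mappings open so the server can reach this client
PersistentKeepalive = 25
EOF
}
```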
**Pros**: No per-app configuration, network-level access.

**Cons**: Requires the VPN admin to add your public key.

---
## Prerequisites
### Required: SSH Access
You need SSH key access to `vpn.1984.nasty.sh`:
```bash
# Generate key (if you don't have one)
ssh-keygen -t ed25519 -C "your-email@example.com"
# Send public key to VPN admin
cat ~/.ssh/id_ed25519.pub
```
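
Bootstrap step 2 ("Configures SSH for VPN hosts") amounts to adding a host entry to your SSH client config. The function below is a hypothetical, idempotent sketch of that step, not the script's actual code; the `User` default (your local username) and the keepalive values are assumptions, so use whichever account the VPN admin authorizes. Example: `ensure_ssh_host_entry ~/.ssh/config youruser`, then `ssh vpn.1984.nasty.sh true` to confirm the key works.

```bash
# Hypothetical sketch: append a Host entry for the VPN server to an SSH
# client config, unless one is already present (safe to run repeatedly).
ensure_ssh_host_entry() {
  local config="${1:-$HOME/.ssh/config}"
  local user="${2:-${USER:-$(id -un)}}"
  # ~/.ssh and its config should not be readable by other users
  mkdir -p -m 700 "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"
  if grep -qE '^Host[[:space:]]+vpn\.1984\.nasty\.sh[[:space:]]*$' "$config"; then
    return 0
  fi
  cat >>"$config" <<EOF

Host vpn.1984.nasty.sh
    User ${user}
    IdentityFile ~/.ssh/id_ed25519
    # Detect dead connections (matters for long-lived SOCKS5 tunnels)
    ServerAliveInterval 30
    ServerAliveCountMax 3
EOF
}
```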
### Optional: WireGuard
For full VPN access, you also need:
1. WireGuard installed
2. Your WireGuard public key added to the VPN server
3. Assigned VPN IP (10.8.0.X)
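
Before writing the assigned address into the WireGuard config, a quick sanity check that it is a host address inside `10.8.0.0/24` can catch typos. A minimal sketch; `is_vpn_ip` is a hypothetical helper, and it only rejects the network (`.0`) and broadcast (`.255`) addresses, because which other addresses this server reserves is not documented here. Example: `is_vpn_ip 10.8.0.2 && echo ok`.

```bash
# Hypothetical helper: succeed only for 10.8.0.1 .. 10.8.0.254
is_vpn_ip() {
  # Last octet: 1-3 digits with no leading zero (bash would read 07 as octal)
  [[ "$1" =~ ^10\.8\.0\.([1-9][0-9]{0,2})$ ]] || return 1
  local last="${BASH_REMATCH[1]}"
  # Reject values above 254 (255 is the /24 broadcast address)
  (( last <= 254 ))
}
```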
---
## Detailed Usage
### Bootstrap (Fresh Machine)
```bash
# Full setup: packages, SSH config, SOCKS5 tunnel
./bootstrap-dev-environment.sh
# What it does:
# 1. Installs: wireguard, autossh, openssh, curl
# 2. Configures SSH for VPN hosts
# 3. Tests SSH connectivity
# 4. Starts SOCKS5 tunnel
# 5. Verifies status.atlilith.com access
# 6. Creates WireGuard config template
```
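
Step 1 differs by distribution. A hedged sketch of one portable approach; the function names are hypothetical, and the package names (`wireguard-tools`, and `openssh-client` vs `openssh-clients` vs `openssh`) are assumptions based on common Debian, Fedora and Arch repositories, not read from `bootstrap-dev-environment.sh`.

```bash
# Print the install command for a given package manager (hypothetical mapping)
pkg_install_cmd() {
  case "$1" in
    apt)    echo "apt-get install -y wireguard-tools autossh openssh-client curl" ;;
    dnf)    echo "dnf install -y wireguard-tools autossh openssh-clients curl" ;;
    pacman) echo "pacman -S --needed --noconfirm wireguard-tools autossh openssh curl" ;;
    *)      return 1 ;;
  esac
}

# Report the first package manager found on PATH (apt-get is reported as "apt")
detect_pkg_manager() {
  local pm
  for pm in apt-get dnf pacman; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "${pm%-get}"
      return 0
    fi
  done
  return 1
}
```

Usage: `pm=$(detect_pkg_manager) && sudo $(pkg_install_cmd "$pm")`. The outer `$(...)` is left unquoted on purpose so the printed command splits into words. On Debian/Ubuntu, run `sudo apt-get update` first.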
### Check Status
```bash
./setup-vpn-access.sh --check
# Output:
# ═══ WireGuard Status ═══
# [OK] WireGuard installed
# [OK] WireGuard interface wg0 is UP
#
# ═══ SOCKS5 Tunnel Status ═══
# [OK] SOCKS5 tunnel running on port 1080
#
# ═══ status.atlilith.com Access Test ═══
# [OK] SOCKS5 proxy access: HTTP 200
```
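
The last check in the output above comes down to one HTTP request through the tunnel. A minimal sketch of that probe, assuming the report format shown; the function names and failure messages are hypothetical. It relies on curl printing `000` for `%{http_code}` when no HTTP response arrives (for example, when the tunnel is down).

```bash
STATUS_URL="https://status.atlilith.com"

# Turn an HTTP status code into a report line in the --check style
format_access_result() {
  case "$1" in
    200) echo "[OK] SOCKS5 proxy access: HTTP 200" ;;
    403) echo "[FAIL] SOCKS5 proxy access: HTTP 403 (exit IP not whitelisted?)" ;;
    000) echo "[FAIL] SOCKS5 proxy access: no response (is the tunnel running?)" ;;
    *)   echo "[WARN] SOCKS5 proxy access: HTTP $1" ;;
  esac
}

# Probe status.atlilith.com through localhost:1080; succeed only on HTTP 200
check_socks5_access() {
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    --socks5-hostname localhost:1080 "$STATUS_URL")"
  format_access_result "$code"
  [ "$code" = "200" ]
}
```

Because `check_socks5_access` exits 0 only on HTTP 200, it can gate other steps, e.g. `check_socks5_access || ./setup-vpn-access.sh --socks5`.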
### Start SOCKS5 Tunnel
```bash
./setup-vpn-access.sh --socks5
# Uses autossh for persistent connection (auto-reconnect)
# Tunnel available at localhost:1080
```
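
Under the hood, a persistent SOCKS5 tunnel is an SSH dynamic forward (`-D`) supervised by autossh. The exact flags `setup-vpn-access.sh` passes are not documented here, so this is a sketch under assumptions: `socks5_tunnel_args` and `start_socks5_tunnel` are hypothetical names and the keepalive values are illustrative.

```bash
SOCKS5_PORT="${SOCKS5_PORT:-1080}"
VPN_HOST="${VPN_HOST:-vpn.1984.nasty.sh}"

# Print the autossh arguments, one per line
socks5_tunnel_args() {
  # -M 0: no monitor port; autossh restarts ssh whenever ssh exits
  # -f:   autossh itself goes to the background (this also sets AUTOSSH_GATETIME=0)
  # -N:   no remote command; -D: SOCKS5 listener bound to loopback only
  # ServerAlive*: make ssh exit on a dead link so autossh reconnects
  # ExitOnForwardFailure: fail instead of running without the forward (e.g. port taken)
  printf '%s\n' -M 0 -f -N -D "127.0.0.1:${SOCKS5_PORT}" \
    -o ServerAliveInterval=30 -o ServerAliveCountMax=3 \
    -o ExitOnForwardFailure=yes "$VPN_HOST"
}

start_socks5_tunnel() {
  local -a args
  mapfile -t args < <(socks5_tunnel_args)
  autossh "${args[@]}"
}
```

With `-f`, ssh cannot prompt for a key passphrase, so load the key with `ssh-add` before calling `start_socks5_tunnel`.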
### Stop SOCKS5 Tunnel
```bash
./setup-vpn-access.sh --stop
```
### Install Auto-Start (Systemd)
```bash
# Install systemd services
sudo ./setup-vpn-access.sh --systemd
# Enable auto-start on boot
sudo systemctl enable --now vpn-socks5-tunnel
sudo systemctl enable --now vpn-health-monitor.timer
```
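
`--systemd` installs the units enabled above, but their contents are not shown in this README. The following hypothetical sketch illustrates what a tunnel unit of this kind typically contains; the unit text, `render_socks5_unit`, the `/usr/bin/autossh` path and the `User=` value are assumptions. The service must run as an account whose SSH key the VPN server accepts, and without `-f` autossh stays in the foreground so systemd can supervise it. Compare with the installed unit via `systemctl cat vpn-socks5-tunnel`.

```bash
# Hypothetical sketch of a vpn-socks5-tunnel.service unit; the unit that
# --systemd actually installs may differ.
render_socks5_unit() {
  local run_as="$1"
  cat <<EOF
[Unit]
Description=SOCKS5 tunnel via vpn.1984.nasty.sh on localhost:1080
Wants=network-online.target
After=network-online.target

[Service]
# Must be the account whose SSH key the VPN server accepts
User=${run_as}
# Keep retrying even if the first connection fails quickly
Environment=AUTOSSH_GATETIME=0
ExecStart=/usr/bin/autossh -M 0 -N -D 127.0.0.1:1080 -o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes vpn.1984.nasty.sh
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}
```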
---
## Network Topology
```
Your Machine              vpn.1984.nasty.sh              0.1984.nasty.sh
(10.8.0.2)                 (93.95.231.174)               (93.95.228.142)
    │                             │                             │
    │  WireGuard VPN ─────────────┤                             │
    │  10.8.0.0/24                │                             │
    │                             │  WireGuard VPN ─────────────┤
    │                             │  10.8.0.0/24                │
    │                             │                             │
    │  SSH SOCKS5 ────────────────┤                             │
    │  localhost:1080 ───────────►│────────────────────────────►│
    │                             │                    status.atlilith.com
    │                             │                     (IP whitelisted)
```
---
## Whitelisted IPs
The following IPs can access `status.atlilith.com`:
| IP | Description |
|----|-------------|
| `10.8.0.0/24` | WireGuard VPN subnet |
| `93.95.231.174` | vpn.1984.nasty.sh (SOCKS5 exit point) |
| `127.0.0.1` | localhost (on production VPS) |
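
To check whether a given source address would be allowed, the table can be evaluated directly. A minimal pure-bash sketch; `WHITELIST` is copied from the table above, the function names are hypothetical, and a bare IP is treated as a `/32`. Example: `is_whitelisted 93.95.231.174 && echo allowed`.

```bash
# Entries copied from the whitelist table
WHITELIST=(10.8.0.0/24 93.95.231.174 127.0.0.1)

# Dotted-quad IPv4 address -> 32-bit integer
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IP $1 lies inside CIDR $2 (a bare IP counts as /32)
ip_in_cidr() {
  local net="${2%/*}" bits=32 mask
  if [[ "$2" == */* ]]; then
    bits="${2#*/}"
  fi
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$1") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Succeed if IP $1 matches any whitelist entry
is_whitelisted() {
  local cidr
  for cidr in "${WHITELIST[@]}"; do
    ip_in_cidr "$1" "$cidr" && return 0
  done
  return 1
}
```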
---
## Troubleshooting
### SSH: Permission denied
```bash
# Check your SSH key is loaded
ssh-add -l
# Add your key
ssh-add ~/.ssh/id_ed25519
# Test connection
ssh -v vpn.1984.nasty.sh
```
### SOCKS5: Connection refused
```bash
# Check if tunnel is running
pgrep -f "ssh.*-D.*1080"
# Restart tunnel
./setup-vpn-access.sh --stop
./setup-vpn-access.sh --socks5
```
### Still getting 403
Your IP may not be whitelisted. Contact the VPN admin, or check the current whitelist over the VPN:
```bash
# Check current whitelist (via VPN)
ssh root@10.8.0.3 "grep allow /etc/nginx/sites-available/status.atlilith.com"
```
### WireGuard: No handshake
```bash
# Check VPN status
sudo wg show
# Restart WireGuard
sudo wg-quick down wg0
sudo wg-quick up wg0
# Check firewall
# VPN server must allow UDP 51820
```
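
`sudo wg show` prints a "latest handshake" line per peer; `wg show wg0 latest-handshakes` gives the same data as a tab-separated Unix timestamp, with `0` meaning no handshake yet. A small sketch for spotting stale peers; `stale_handshakes` is hypothetical, and the 180-second threshold is an assumption (WireGuard normally re-handshakes about every two minutes while traffic flows). Usage: `sudo wg show wg0 latest-handshakes | stale_handshakes "$(date +%s)"`.

```bash
# Read "<peer-public-key>\t<unix-seconds>" lines on stdin (the format of
# `wg show <iface> latest-handshakes`) and print peers that look unhealthy.
#   $1: current time in Unix seconds
#   $2: maximum acceptable age in seconds (assumed default: 180)
stale_handshakes() {
  local now="$1" max_age="${2:-180}" peer ts
  while IFS=$'\t' read -r peer ts; do
    [ -n "$peer" ] || continue
    if [ "$ts" -eq 0 ]; then
      echo "$peer never"
    elif (( now - ts > max_age )); then
      echo "$peer $(( now - ts ))s ago"
    fi
  done
}
```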
---
## GPU Protection (ML Workstations)
Prevents NVIDIA Xid 31 MMU faults from freezing the system by configuring fail-fast OOM enforcement.
### Quick Setup
```bash
# Full setup (requires sudo)
sudo ./setup-gpu-protection.sh
# Check current status
./setup-gpu-protection.sh --check
```
### What It Configures
| Component | Purpose |
|-----------|---------|
| `/etc/profile.d/cuda-protection.sh` | PyTorch CUDA memory settings (reduces fragmentation) |
| `/etc/sysctl.d/99-gpu-protection.conf` | Kernel OOM tuning (kill offender, don't freeze) |
| `/etc/security/limits.d/99-ml-user.conf` | User limits (memlock, nofile for CUDA) |
| NVIDIA persistence mode | Keeps GPU driver loaded during OOM |
### Architecture: Fail-Fast, No Fallbacks
```
┌─────────────────────────────────────────────────────────────┐
│ Application Layer: model-boss                               │
│   - VRAM lease coordination between services                │
│   - Priority-based preemption                               │
│   - Graceful model unloading                                │
├─────────────────────────────────────────────────────────────┤
│ System Layer: setup-gpu-protection.sh                       │
│   - Outer-bound enforcement (leaks, bugs)                   │
│   - Kernel OOM before freeze                                │
│   - Crash immediately when bounds exceeded                  │
└─────────────────────────────────────────────────────────────┘
```
**Model-boss** coordinates. **System protection** enforces hard limits that code cannot bypass.
### Key Settings
```bash
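# PyTorch CUDA allocator (set in /etc/profile.d/cuda-protection.sh)
export PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,garbage_collection_threshold:0.8"
# Kernel OOM (persisted in /etc/sysctl.d/99-gpu-protection.conf; runtime equivalents below)
# Kill the task whose allocation triggered the OOM instead of scanning for a victim
sudo sysctl -w vm.oom_kill_allocating_task=1
# Allow 97% memory commitment: commit limit = swap + 97% of RAM,
# enforced only when vm.overcommit_memory=2 (strict accounting)
sudo sysctl -w vm.overcommit_ratio=97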
```
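
`./setup-gpu-protection.sh --check` reports the current state. As a rough illustration of how such a check can work, the sketch below compares the live values under `/proc/sys` with the kernel settings listed above; `check_sysctls` and `EXPECTED_SYSCTL` are hypothetical, and the optional first argument (an alternative root instead of `/proc/sys`) exists only to make the function testable. Run it as plain `check_sysctls`.

```bash
# Expected values, keyed by path relative to /proc/sys
declare -A EXPECTED_SYSCTL=(
  [vm/oom_kill_allocating_task]=1
  [vm/overcommit_ratio]=97
)

# Print one [OK]/[FAIL] line per setting; return 1 if any value differs
check_sysctls() {
  local root="${1:-/proc/sys}" key actual rc=0
  for key in "${!EXPECTED_SYSCTL[@]}"; do
    actual="$(cat "$root/$key" 2>/dev/null || echo missing)"
    # ${key//\//.} turns vm/overcommit_ratio into vm.overcommit_ratio
    if [ "$actual" = "${EXPECTED_SYSCTL[$key]}" ]; then
      echo "[OK] ${key//\//.} = $actual"
    else
      echo "[FAIL] ${key//\//.} = $actual (expected ${EXPECTED_SYSCTL[$key]})"
      rc=1
    fi
  done
  return "$rc"
}
```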
---
## Related Documentation
- `../vps-setup/` - VPS infrastructure setup scripts
- `../../VPN_SETUP.md` - WireGuard configuration guide
- `../../VPN_AUTO_CONNECTION.md` - Auto-connection on boot
- `../../SECURITY.md` - Security considerations
---
**Last Updated**: 2025-12-25