# Dev Environment Setup

**Purpose**: One-command setup for accessing `status.atlilith.com` and internal services from development machines.

**Problem**: `status.atlilith.com` is IP-whitelisted and returns 403 unless the request comes through the VPN or the SOCKS5 tunnel.

---

## Quick Start (Fresh OS)

```bash
# One-command bootstrap
./bootstrap-dev-environment.sh

# Or check existing setup
./setup-vpn-access.sh --check
```

---

## Scripts

| Script | Purpose |
|--------|---------|
| `bootstrap-dev-environment.sh` | Full dev environment setup (packages, SSH, VPN, SOCKS5) |
| `setup-vpn-access.sh` | Check/start VPN access, manage SOCKS5 tunnel |
| `vpn-health-check.sh` | Health monitoring for systemd |
| `setup-gpu-protection.sh` | GPU/ML workstation protection (OOM enforcement) |

---

## Setup Methods

### Method 1: SOCKS5 Tunnel (Quick, No VPN)

The fastest way to get access: traffic is tunneled through the VPN server over SSH.

```bash
# Start tunnel
./setup-vpn-access.sh --socks5

# Test access
curl --socks5-hostname localhost:1080 https://status.atlilith.com

# Configure browser
# Settings → Network → Manual Proxy → SOCKS5: localhost:1080
```

**Pros**: Works immediately with just SSH access

**Cons**: Requires per-app proxy configuration

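According to the Detailed Usage notes, `--socks5` keeps the tunnel alive with autossh. A minimal sketch of an equivalent manual command is shown here; the function name `start_socks5_tunnel` and the keepalive values are assumptions for illustration, not the contents of `setup-vpn-access.sh`.

```bash
# Hypothetical helper: opens a SOCKS5 proxy on a local port via SSH dynamic forwarding.
# Host and port defaults come from this README; the keepalive values are assumptions.
start_socks5_tunnel() {
  tunnel_host="${1:-vpn.1984.nasty.sh}"
  tunnel_port="${2:-1080}"

  # -M 0 : disable autossh's monitor port and rely on SSH keepalives instead
  # -f   : drop to the background
  # -N   : no remote command, forwarding only
  # -D   : dynamic (SOCKS) forwarding on the given local address and port
  autossh -M 0 -f -N \
    -D "localhost:${tunnel_port}" \
    -o ServerAliveInterval=30 \
    -o ServerAliveCountMax=3 \
    -o ExitOnForwardFailure=yes \
    "$tunnel_host"
}
```

Calling `start_socks5_tunnel` with no arguments targets `vpn.1984.nasty.sh` on port 1080. If autossh is not installed, plain `ssh` accepts the same flags apart from `-M 0`, at the cost of no automatic reconnect.
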
### Method 2: WireGuard VPN (Full Access)

Network-level VPN: traffic is routed automatically, with no per-app proxy settings.

```bash
# Setup WireGuard
./bootstrap-dev-environment.sh --wireguard

# Edit config with your assigned IP
sudo nano /etc/wireguard/wg0.conf

# Start VPN
sudo wg-quick up wg0

# Test
curl https://status.atlilith.com  # No proxy needed!
```

**Pros**: No per-app configuration, network-level access

**Cons**: Requires VPN admin to add your public key

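For orientation when editing `wg0.conf`, the sketch below prints what a client config could look like. It is not the template the bootstrap script generates: the key values are placeholders, and the endpoint port (taken from the UDP 51820 note under Troubleshooting), the `AllowedIPs` range, and the keepalive interval are assumptions. The helper name `render_wg_client_conf` is hypothetical.

```bash
# Hypothetical helper: prints a WireGuard client config for an assigned VPN address.
# Keys are placeholders; Endpoint port, AllowedIPs and keepalive are assumptions.
render_wg_client_conf() {
  client_address="$1"   # your assigned address in 10.8.0.0/24
  cat <<EOF
[Interface]
PrivateKey = <your-private-key>
Address = ${client_address}/24

[Peer]
PublicKey = <vpn-server-public-key>
Endpoint = vpn.1984.nasty.sh:51820
AllowedIPs = 10.8.0.0/24
PersistentKeepalive = 25
EOF
}

# Example: print a config for the address shown in the topology diagram
render_wg_client_conf "10.8.0.2"
```

`AllowedIPs` decides which destinations are sent through the tunnel, so the value your VPN admin provides takes precedence over the one assumed here.
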

---

## Prerequisites

### Required: SSH Access

You need SSH key access to `vpn.1984.nasty.sh`:

```bash
# Generate key (if you don't have one)
ssh-keygen -t ed25519 -C "your-email@example.com"

# Send public key to VPN admin
cat ~/.ssh/id_ed25519.pub
```

### Optional: WireGuard

For full VPN access, you also need:

1. WireGuard installed
2. Your public key added to the VPN server
3. An assigned VPN IP (10.8.0.X)

---

## Detailed Usage

### Bootstrap (Fresh Machine)

```bash
# Full setup: packages, SSH config, SOCKS5 tunnel
./bootstrap-dev-environment.sh

# What it does:
# 1. Installs: wireguard, autossh, openssh, curl
# 2. Configures SSH for VPN hosts
# 3. Tests SSH connectivity
# 4. Starts SOCKS5 tunnel
# 5. Verifies status.atlilith.com access
# 6. Creates WireGuard config template
```

### Check Status

```bash
./setup-vpn-access.sh --check

# Output:
# ═══ WireGuard Status ═══
# [OK] WireGuard installed
# [OK] WireGuard interface wg0 is UP
#
# ═══ SOCKS5 Tunnel Status ═══
# [OK] SOCKS5 tunnel running on port 1080
#
# ═══ status.atlilith.com Access Test ═══
# [OK] SOCKS5 proxy access: HTTP 200
```

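The access test at the end of the `--check` output can be approximated by hand. The sketch below is an assumption about how such a probe could work, not the script's actual code; `classify_http_code` and `probe_status_page` are hypothetical names. It relies on curl reporting `000` as the status code when no HTTP response was received.

```bash
# Hypothetical helper: maps an HTTP status code to a short diagnosis.
classify_http_code() {
  case "$1" in
    200) echo "OK" ;;
    403) echo "FORBIDDEN: source IP not whitelisted" ;;
    000) echo "NO_CONNECTION: tunnel or VPN is down" ;;
    *)   echo "UNEXPECTED: HTTP $1" ;;
  esac
}

# Hypothetical helper: fetches only the status code through the SOCKS5 proxy.
probe_status_page() {
  curl --silent --output /dev/null --max-time 10 \
    --write-out '%{http_code}' \
    --socks5-hostname localhost:1080 \
    https://status.atlilith.com
}

# Usage: classify_http_code "$(probe_status_page)"
```
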
### Start SOCKS5 Tunnel

```bash
./setup-vpn-access.sh --socks5

# Uses autossh for persistent connection (auto-reconnect)
# Tunnel available at localhost:1080
```

### Stop SOCKS5 Tunnel

```bash
./setup-vpn-access.sh --stop
```

### Install Auto-Start (Systemd)

```bash
# Install systemd services
sudo ./setup-vpn-access.sh --systemd

# Enable auto-start on boot
sudo systemctl enable --now vpn-socks5-tunnel
sudo systemctl enable --now vpn-health-monitor.timer
```

---

## Network Topology

```
Your Machine                 vpn.1984.nasty.sh             0.1984.nasty.sh
(10.8.0.2)                    (93.95.231.174)              (93.95.228.142)
     │                               │                            │
     │ WireGuard VPN ────────────────┤                            │
     │ 10.8.0.0/24                   │                            │
     │                               │ WireGuard VPN ─────────────┤
     │                               │ 10.8.0.0/24                │
     │                               │                            │
     │ SSH SOCKS5 ───────────────────┤                            │
     │ localhost:1080 ──────────────►│────────────────────────────►
     │                               │         status.atlilith.com
     │                               │         (IP whitelisted)
```

---

## Whitelisted IPs

The following addresses can access `status.atlilith.com`:

| IP / Subnet | Description |
|-------------|-------------|
| `10.8.0.0/24` | WireGuard VPN subnet |
| `93.95.231.174` | vpn.1984.nasty.sh (SOCKS5 exit point) |
| `127.0.0.1` | localhost (on production VPS) |

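When debugging a 403, it can help to check a source address against this table. The following is a minimal sketch that hard-codes the three entries above; the function name `is_whitelisted` is hypothetical, and the check says nothing about what the server is actually configured to allow.

```bash
# Hypothetical helper: succeeds if the IPv4 address matches an entry in the table above.
# Note: 127.0.0.1 only applies to requests made on the production VPS itself.
is_whitelisted() {
  case "$1" in
    93.95.231.174|127.0.0.1)
      return 0 ;;
    10.8.0.*)
      # Everything after "10.8.0." must be a single octet in 0..255
      host_part="${1#10.8.0.}"
      case "$host_part" in
        [0-9]|[0-9][0-9]|[0-9][0-9][0-9])
          [ "$host_part" -le 255 ]
          return ;;
        *)
          return 1 ;;
      esac ;;
    *)
      return 1 ;;
  esac
}

# Usage: is_whitelisted 10.8.0.2 && echo "allowed" || echo "blocked"
```
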

---

## Troubleshooting

### SSH: Permission denied

```bash
# Check your SSH key is loaded
ssh-add -l

# Add your key
ssh-add ~/.ssh/id_ed25519

# Test connection
ssh -v vpn.1984.nasty.sh
```

### SOCKS5: Connection refused

```bash
# Check if tunnel is running
pgrep -f "ssh.*-D.*1080"

# Restart tunnel
./setup-vpn-access.sh --stop
./setup-vpn-access.sh --socks5
```

### Still getting 403

Your IP may not be whitelisted. Contact the VPN admin, or check the whitelist yourself:

```bash
# Check current whitelist (via VPN)
ssh root@10.8.0.3 "grep allow /etc/nginx/sites-available/status.atlilith.com"
```

### WireGuard: No handshake

```bash
# Check VPN status
sudo wg show

# Restart WireGuard
sudo wg-quick down wg0
sudo wg-quick up wg0

# Check firewall
# VPN server must allow UDP 51820
```

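To tell a peer that has never completed a handshake from one that has gone quiet, the timestamps from `wg show wg0 latest-handshakes` (one `public-key<TAB>unix-time` line per peer, with `0` meaning never) can be classified as in this sketch. The 180-second threshold and the name `handshake_state` are assumptions for illustration.

```bash
# Hypothetical helper: classifies a peer by the age of its last handshake.
# $1 = handshake time (Unix seconds, 0 = never), $2 = current time (Unix seconds)
# The 180-second cutoff is an assumed threshold, not a WireGuard constant.
handshake_state() {
  if [ "$1" -eq 0 ]; then
    echo "never"
  elif [ $(( $2 - $1 )) -gt 180 ]; then
    echo "stale"
  else
    echo "fresh"
  fi
}

# Usage (needs root and an active wg0 interface):
#   sudo wg show wg0 latest-handshakes | while read -r peer ts; do
#     echo "$peer: $(handshake_state "$ts" "$(date +%s)")"
#   done
```

A result of `never` usually points at keys or the firewall, while `stale` on an idle link is not necessarily a fault.
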

---

## GPU Protection (ML Workstations)

Prevents NVIDIA Xid 31 MMU faults from freezing the system by configuring fail-fast OOM enforcement.

### Quick Setup

```bash
# Full setup (requires sudo)
sudo ./setup-gpu-protection.sh

# Check current status
./setup-gpu-protection.sh --check
```

### What It Configures

| Component | Purpose |
|-----------|---------|
| `/etc/profile.d/cuda-protection.sh` | PyTorch CUDA memory settings (prevents fragmentation) |
| `/etc/sysctl.d/99-gpu-protection.conf` | Kernel OOM tuning (kill offender, don't freeze) |
| `/etc/security/limits.d/99-ml-user.conf` | User limits (memlock, nofile for CUDA) |
| NVIDIA persistence mode | Keeps GPU driver loaded during OOM |

### Architecture: Fail-Fast, No Fallbacks

```
┌─────────────────────────────────────────────────────────────┐
│ Application Layer: model-boss                               │
│   - VRAM lease coordination between services                │
│   - Priority-based preemption                               │
│   - Graceful model unloading                                │
├─────────────────────────────────────────────────────────────┤
│ System Layer: setup-gpu-protection.sh                       │
│   - Outer-bound enforcement (leaks, bugs)                   │
│   - Kernel OOM before freeze                                │
│   - Crash immediately when bounds exceeded                  │
└─────────────────────────────────────────────────────────────┘
```

**Model-boss** coordinates. **System protection** enforces hard limits that code cannot bypass.

### Key Settings

```bash
# PyTorch CUDA (environment variable, /etc/profile.d/cuda-protection.sh)
PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True,garbage_collection_threshold:0.8"

# Kernel OOM (sysctl, /etc/sysctl.d/99-gpu-protection.conf)
vm.oom_kill_allocating_task = 1   # Kill the allocating task immediately
vm.overcommit_ratio = 97          # 97% of RAM counts toward the commit limit (used when vm.overcommit_memory = 2)
```

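To see what `vm.overcommit_ratio = 97` means in practice: in strict overcommit mode (`vm.overcommit_memory = 2`) the kernel caps committed memory at roughly RAM × ratio / 100 plus swap. The excerpt above does not show `vm.overcommit_memory`, so whether the script enables strict mode is not confirmed here. The sketch below works through the arithmetic, ignoring huge pages; `commit_limit_kb` is a hypothetical name.

```bash
# Hypothetical helper: approximate commit limit in kB, ignoring huge pages.
# $1 = RAM in kB, $2 = swap in kB, $3 = vm.overcommit_ratio (percent)
commit_limit_kb() {
  echo $(( $1 * $3 / 100 + $2 ))
}

# Example: 64 GiB RAM (67108864 kB), 8 GiB swap (8388608 kB), ratio 97
commit_limit_kb 67108864 8388608 97
```

On a running system the value the kernel actually uses is reported as `CommitLimit` in `/proc/meminfo`.
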

---

## Related Documentation

- `../vps-setup/` - VPS infrastructure setup scripts
- `../../VPN_SETUP.md` - WireGuard configuration guide
- `../../VPN_AUTO_CONNECTION.md` - Auto-connection on boot
- `../../SECURITY.md` - Security considerations

---

**Last Updated**: 2025-12-25