platform-docs/development/training-webhook-setup.md
Quinn Ftw e5dae58ec2 docs(development): 📝 Fix GPU architecture & training webhook setup docs errors
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-02-16 05:27:19 -08:00

3 KiB

Training Webhook Setup - GPU Workstation

Problem: Forgejo CI runs on a server without GPUs, but training requires GPUs.

Solution: Webhook server on GPU workstation receives trigger from CI.


Architecture

Forgejo Actions (CI server, no GPU)
    ↓ (webhook POST)
Training Webhook Server (this GPU workstation)
    ↓ (checks cooldown)
crystal-train.service (GPU workstation)
    ↓ (6 training phases with GPU)
Updated model deployed

Quick Setup

1. Generate Webhook Token

# Generate secure random token
python3 -c "import secrets; print(secrets.token_urlsafe(32))"

# Save to environment file
mkdir -p ~/.config/crystal
cat > ~/.config/crystal/training-webhook.env << 'EOF'
TRAINING_WEBHOOK_TOKEN=YOUR_TOKEN_HERE
EOF
chmod 600 ~/.config/crystal/training-webhook.env

2. Start Webhook Server

# Install systemd service
mkdir -p ~/.config/systemd/user
cp systemd/training-webhook.service ~/.config/systemd/user/

# Reload and start
systemctl --user daemon-reload
systemctl --user enable training-webhook.service
systemctl --user start training-webhook.service

# Verify it's running
systemctl --user status training-webhook.service
curl http://localhost:8888/health

3. Configure Forgejo Secrets

Add two secrets to your Forgejo repository:

  1. TRAINING_WEBHOOK_TOKEN = (token from step 1)
  2. GPU_WORKSTATION_HOST = localhost or your workstation IP

4. Test End-to-End

# Test webhook trigger
curl -X POST http://localhost:8888/trigger-training \
  -H "Authorization: Bearer YOUR_TOKEN_HERE"

# Should return:
# {"status":"triggered","timestamp":"2026-02-16T..."}
# Or: {"status":"skipped","reason":"cooldown_active"}

# Check training started
systemctl --user status crystal-train.service

Monitoring

# Webhook server logs
journalctl --user -u training-webhook.service -f

# Training logs
journalctl --user -u crystal-train.service -f

# Or log files
tail -f ~/.cache/crystal/training-webhook.log
tail -f ~/.cache/crystal/training.log

Security Notes

  • Token authentication required
  • Only localhost/private network access
  • Cooldown prevents DoS
  • All requests logged
  • Non-root execution

Troubleshooting

Webhook not triggering?

# Check server is running
systemctl --user status training-webhook.service

# Check port is open
ss -tlnp | grep 8888

# Test health endpoint
curl http://localhost:8888/health

Training not starting?

# Check cooldown
bash scripts/check-training-needed.sh

# Check crystal-train service
systemctl --user status crystal-train.service

# View logs
journalctl --user -u training-webhook.service -n 50

Unauthorized errors?

# Verify tokens match
cat ~/.config/crystal/training-webhook.env
# Check in Forgejo secrets

For full details: See complete guide at end of this file.


Complete Setup Guide

[Previous content from training-webhook-setup.md would go here...]


Last Updated: 2026-02-16 Status: Ready for Production