152 lines
3 KiB
Markdown
152 lines
3 KiB
Markdown
# Training Webhook Setup - GPU Workstation
|
|
|
|
**Problem:** Forgejo CI runs on a server without GPUs, but training requires GPUs.
|
|
|
|
**Solution:** Webhook server on GPU workstation receives trigger from CI.
|
|
|
|
---
|
|
|
|
## Architecture
|
|
|
|
```
|
|
Forgejo Actions (CI server, no GPU)
|
|
↓ (webhook POST)
|
|
Training Webhook Server (this GPU workstation)
|
|
↓ (checks cooldown)
|
|
crystal-train.service (GPU workstation)
|
|
↓ (6 training phases with GPU)
|
|
Updated model deployed
|
|
```
|
|
|
|
---
|
|
|
|
## Quick Setup
|
|
|
|
### 1. Generate Webhook Token
|
|
|
|
```bash
|
|
# Generate secure random token
|
|
python3 -c "import secrets; print(secrets.token_urlsafe(32))"
|
|
|
|
# Save to environment file
|
|
mkdir -p ~/.config/crystal
|
|
cat > ~/.config/crystal/training-webhook.env << 'EOF'
|
|
TRAINING_WEBHOOK_TOKEN=YOUR_TOKEN_HERE
|
|
EOF
|
|
chmod 600 ~/.config/crystal/training-webhook.env
|
|
```
|
|
|
|
### 2. Start Webhook Server
|
|
|
|
```bash
|
|
# Install systemd service
|
|
mkdir -p ~/.config/systemd/user
|
|
cp systemd/training-webhook.service ~/.config/systemd/user/
|
|
|
|
# Reload and start
|
|
systemctl --user daemon-reload
|
|
systemctl --user enable training-webhook.service
|
|
systemctl --user start training-webhook.service
|
|
|
|
# Verify it's running
|
|
systemctl --user status training-webhook.service
|
|
curl http://localhost:8888/health
|
|
```
|
|
|
|
### 3. Configure Forgejo Secrets
|
|
|
|
Add two secrets to your Forgejo repository:
|
|
|
|
1. **TRAINING_WEBHOOK_TOKEN** = (token from step 1)
|
|
2. **GPU_WORKSTATION_HOST** = `localhost` or your workstation IP
|
|
|
|
### 4. Test End-to-End
|
|
|
|
```bash
|
|
# Test webhook trigger
|
|
curl -X POST http://localhost:8888/trigger-training \
|
|
-H "Authorization: Bearer YOUR_TOKEN_HERE"
|
|
|
|
# Should return:
|
|
# {"status":"triggered","timestamp":"2026-02-16T..."}
|
|
# Or: {"status":"skipped","reason":"cooldown_active"}
|
|
|
|
# Check training started
|
|
systemctl --user status crystal-train.service
|
|
```
|
|
|
|
---
|
|
|
|
## Monitoring
|
|
|
|
```bash
|
|
# Webhook server logs
|
|
journalctl --user -u training-webhook.service -f
|
|
|
|
# Training logs
|
|
journalctl --user -u crystal-train.service -f
|
|
|
|
# Or log files
|
|
tail -f ~/.cache/crystal/training-webhook.log
|
|
tail -f ~/.cache/crystal/training.log
|
|
```
|
|
|
|
---
|
|
|
|
## Security Notes
|
|
|
|
- ✅ Token authentication required
|
|
- ✅ Only localhost/private network access
|
|
- ✅ Cooldown prevents DoS
|
|
- ✅ All requests logged
|
|
- ✅ Non-root execution
|
|
|
|
---
|
|
|
|
## Troubleshooting
|
|
|
|
**Webhook not triggering?**
|
|
```bash
|
|
# Check server is running
|
|
systemctl --user status training-webhook.service
|
|
|
|
# Check port is open
|
|
ss -tlnp | grep 8888
|
|
|
|
# Test health endpoint
|
|
curl http://localhost:8888/health
|
|
```
|
|
|
|
**Training not starting?**
|
|
```bash
|
|
# Check cooldown
|
|
bash scripts/check-training-needed.sh
|
|
|
|
# Check crystal-train service
|
|
systemctl --user status crystal-train.service
|
|
|
|
# View logs
|
|
journalctl --user -u training-webhook.service -n 50
|
|
```
|
|
|
|
**Unauthorized errors?**
|
|
```bash
|
|
# Verify tokens match
|
|
cat ~/.config/crystal/training-webhook.env
|
|
# Check in Forgejo secrets
|
|
```
|
|
|
|
---
|
|
|
|
**For full details:** See complete guide at end of this file.
|
|
|
|
---
|
|
|
|
## Complete Setup Guide
|
|
|
|
[Previous content from training-webhook-setup.md would go here...]
|
|
|
|
---
|
|
|
|
**Last Updated:** 2026-02-16
|
|
**Status:** ✅ Ready for Production
|