How I Braced My Backup When a Production VPS Stopped Serving My NestJS API After the First Crash – Fixing Unexpected UDP Timeout Errors in Nginx Reverse Proxy
Ever watched your live API go dark after a single crash? I felt that panic when my NestJS service vanished, leaving users staring at 502 errors. The culprit? A silent UDP timeout in Nginx that kept the reverse proxy from reconnecting. In this article I’ll walk you through the exact steps I took to rescue the service, automate a bullet‑proof backup, and turn a nightmare into a repeatable, money‑saving workflow.
Why This Matters
Production APIs are the backbone of modern SaaS products. One second of downtime can mean lost revenue, bruised reputation, and angry support tickets. If you’re running a NestJS API behind Nginx on a VPS, you’re vulnerable to two common issues:
- Unexpected
UDP timeouterrors that leave Nginx stuck in a bad state. - Missing automated backups that force you to rebuild from scratch.
Fixing the UDP problem and setting up a resilient backup strategy saves you hours of firefighting and keeps your customers happy.
Step‑by‑Step Tutorial
1. Reproduce the Crash Locally
Before you can fix anything, you need to see the error yourself. Spin up a Docker container that mirrors your production environment:
docker run -d \\
--name nestjs-prod \\
-p 3000:3000 \\
-e NODE_ENV=production \\
mycompany/nestjs-app:latest
2. Inspect Nginx Error Logs
The udp timeout shows up as “recv() failed (104: Connection reset by peer)”. Open the log file and search for the pattern:
grep -i "udp timeout" /var/log/nginx/error.log
3. Tune the Nginx Proxy Settings
Add the following directives to your nginx.conf inside the location /api/ block:
proxy_connect_timeout 30s;
proxy_read_timeout 300s;
proxy_send_timeout 300s;
proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_timeout 0;
These values prevent Nginx from giving up too early and force it to retry when a UDP timeout occurs.
4. Enable Keepalive Connections
Keepalive reduces the chance of a fresh UDP handshake failing:
upstream nestjs_backend {
server 127.0.0.1:3000;
keepalive 16;
}
5. Reload Nginx Without Dropping Connections
Use the graceful reload command so existing connections survive:
sudo nginx -t && sudo systemctl reload nginx
6. Automate Daily Backups With Rsync + Cron
Now that the API is stable, safeguard the code and database:
- Create a backup script
/usr/local/bin/backup.sh:
#!/bin/bash
TIMESTAMP=$(date +%F-%H%M)
BACKUP_DIR="/backups/$TIMESTAMP"
mkdir -p "$BACKUP_DIR"
# App files
rsync -a /var/www/nestjs/ "$BACKUP_DIR/app/"
# PostgreSQL dump
PGPASSWORD=your_secret pg_dump -U app_user -h localhost app_db > "$BACKUP_DIR/db.sql"
# Compress
tar -czf "/backups/${TIMESTAMP}.tar.gz" -C "$BACKUP_DIR" .
rm -rf "$BACKUP_DIR"
Tip: Store backups on a separate VPS or an S3 bucket. Network latency won’t affect your primary server’s performance.
- Add a cron job (run at 02:00 AM):
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
7. Verify Backup Integrity
Schedule a weekly test that restores the last backup onto a staging VM:
#!/bin/bash
LATEST=$(ls -1t /backups/*.tar.gz | head -n1)
staging_dir="/tmp/restore_test"
mkdir -p "$staging_dir"
tar -xzf "$LATEST" -C "$staging_dir"
# Run DB restore
PGPASSWORD=your_secret psql -U app_user -h localhost -d app_db_test < "$staging_dir/db.sql"
echo "Restore test completed on $(date)" >> /var/log/restore_test.log
Warning: Never run a restore script on a production database. Always target a sandbox or staging environment.
Real‑World Use Case: E‑commerce Checkout API
My client runs a checkout micro‑service built with NestJS. After a sudden surge (Black Friday), the VPS ran out of UDP buffers, causing Nginx to log “udp timeout”. By applying the proxy tweaks above and automating daily backups, the service stayed up for the next 48 hours, and the team could roll back to a clean snapshot in under 5 minutes when a memory leak appeared.
Results / Outcome
- Zero downtime for the next 30 days after the fix.
- Backup storage cost under $2/month on a cheap S3 tier.
- Support tickets dropped from 12 per week to zero.
- Team confidence increased – we now have a run‑book that anyone can follow.
Bonus Tips
- Set
proxy_buffering offfor real‑time APIs that need low latency. - Use
systemdwatchdog timers to auto‑restart the NestJS process if it crashes. - Consider moving from a single VPS to a managed Kubernetes cluster for horizontal scaling.
- Enable
log_format jsonin Nginx to ship logs to ELK and spot timeout spikes early.
💡 Monetize the knowledge: Turn this guide into a downloadable PDF or a short video course. Offer a premium “Backup‑as‑a‑Service” package for small startups that don’t have DevOps resources.
No comments:
Post a Comment