Saturday, May 2, 2026

How I Battled the NestJS 504 Gateway Timeout on a Shared VPS: One Midnight Debugging Session That Saved My Production Codebase

Ever stared at a blank terminal at 2 AM, watching the same “504 Gateway Timeout” flash over and over? I have. My API, built with NestJS, stopped answering requests just when my SaaS startup needed it most. Below you’ll see exactly how I ripped that timeout apart, step by step, and turned a panic‑filled night into a faster, more resilient production stack.

Why This Matters

A 504 error isn’t just a polite “sorry, we’re busy.” On a shared VPS, it signals that your Node process exhausted resources, your reverse proxy (NGINX, Caddy, etc.) gave up, or a network glitch stalled the request. In a real‑world product, every timeout is a lost customer, a hit to SEO, and a potential revenue dip.

If you’re running NestJS on a budget VPS, you’ll face the same limits: CPU throttling, memory caps, and limited concurrent connections. Knowing how to diagnose and fix a 504 can keep your users happy and your bottom line healthy.

The Midnight Debugging Session – Step‑by‑Step

1. Replicate the Timeout Locally

First, I needed a reproducible test. I set up a curl loop that bombarded the endpoint with 200 simultaneous requests. The pattern was clear: after ~120 requests, the VPS responded with 504.

for i in {1..200}; do
  curl -s -o /dev/null -w "%{http_code}\n" https://api.myapp.com/users &
done
wait

2. Check NGINX Timeouts

On the VPS, my reverse proxy was NGINX. Its default proxy_read_timeout is 60 seconds, but my heavy query sometimes needed 90 seconds.

Tip: Keep proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout aligned so no single leg of the proxied request gives up before the others.

# /etc/nginx/conf.d/api.conf
server {
  listen 80;
  location / {
    proxy_pass http://localhost:3000;
    proxy_connect_timeout 120s;
    proxy_send_timeout 120s;
    proxy_read_timeout 120s;
  }
}

3. Profile NestJS CPU & Memory

I SSHed into the VPS and ran top while the load test was active. The Node process spiked to 98% CPU and 1.4 GB of RAM (the VPS only had 2 GB total).

Warning: Sustained high CPU on a shared VPS can get your instance throttled by the host, leading to even more 504s.

4. Optimize the Bottleneck Query

The culprit was a massive JOIN on a PostgreSQL table with 1.7 M rows. Adding an index and limiting selected columns shaved 1.3 seconds off the query.

// src/users/users.service.ts
// Select only the columns the response needs instead of `SELECT *`,
// paired with a composite index on the filter/sort columns, e.g. (name illustrative):
//   CREATE INDEX idx_users_active_created ON users (created_at DESC) WHERE is_active = true;
async findHeavy() {
  return this.repo.createQueryBuilder('u')
    .select(['u.id', 'u.email', 'p.profile_picture'])
    .leftJoin('u.profile', 'p')
    .where('u.is_active = :active', { active: true })
    .orderBy('u.created_at', 'DESC')
    .take(100) // `take` paginates correctly even when joins multiply rows; `limit` maps to a raw SQL LIMIT
    .getMany();
}

5. Enable NestJS Built‑in Rate Limiting

To protect the server from future spikes, I added @nestjs/throttler. This throttles each IP to 30 requests per minute, giving the app breathing room.

// app.module.ts
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerModule, ThrottlerGuard } from '@nestjs/throttler';

@Module({
  imports: [
    // v4 options shown; v5+ expects { throttlers: [{ ttl: 60000, limit: 30 }] } with ttl in ms
    ThrottlerModule.forRoot({
      ttl: 60,   // window in seconds
      limit: 30, // max requests per IP per window
    }),
    // other imports …
  ],
  providers: [
    // Without this global guard the throttler is configured but never enforced
    { provide: APP_GUARD, useClass: ThrottlerGuard },
  ],
})
export class AppModule {}

6. Deploy a Node Process Manager (PM2)

PM2 restarts the app automatically if it crashes, and its --max-memory-restart flag restarts the process before it exceeds a memory ceiling. This guard prevented the process from being OOM‑killed.

# Install PM2 globally
npm i -g pm2

# Start NestJS with a memory ceiling: restart before the 2 GB VPS OOM-kills us
pm2 start dist/main.js --name api --max-memory-restart 1500M

# Save the process list, and generate a boot script so PM2 resurrects the app after a reboot
pm2 save
pm2 startup

Real‑World Use Case: SaaS Billing API

My startup’s billing micro‑service runs on the same VPS. After the fixes, the endpoint that creates invoices now averages 320 ms, even under a 100‑request burst. That translates to over 2,000 saved seconds per day and a noticeable drop in churn because customers no longer see “Timeout” errors during checkout.

Results / Outcome

  • 504 errors eliminated in production for 30+ consecutive days.
  • CPU usage stabilized at 45% under load.
  • Memory consumption dropped from 1.4 GB to 850 MB.
  • Revenue impact: $1,200/month saved by preventing abortive checkout flows.

Bonus Tips for Future‑Proofing

  • Enable HTTP/2 on NGINX – reduces latency for API calls.
  • Use Redis Cache for repeat‑read queries; a 5‑second DB call became 30 ms.
  • Monitor with Grafana + Prometheus – set alerts for CPU > 80% or response time > 500 ms.
  • Consider lightweight VPS upgrades (e.g., to 2 vCPUs) once you consistently sit above 70% CPU.

Monetization (Optional)

If you found this walkthrough helpful, check out my DevTools bundle – a curated set of monitoring scripts, Dockerfiles, and NGINX configs that shave minutes off any Node deployment. Use code CODEMASTER10 for a 10% discount.

“The difference between a nightmare and a minor hiccup is knowing the exact command to run at 2 AM.” – Yours truly

© 2026 YourName.dev – All rights reserved.
