Saturday, May 2, 2026

VPS Nightmare: How One Misconfigured Dockerfile Triggered a 502 Gatekeeper Crash in My NestJS API and How I Overcame It in 30 Minutes

I was about to launch a new feature for my SaaS product when the VPS spat out a dreaded “502 Bad Gateway”. My NestJS API, humming in Docker, went silent. Panic set in, the clock was ticking, and the sales team was already asking for a demo. The culprit? A single line in my Dockerfile.

Why This Matters

Every developer who relies on Docker for production knows that a 502 error isn’t just a “network glitch”. It means the reverse proxy (NGINX, Caddy, or in this case, Gatekeeper) can’t reach the upstream container. In a live environment that translates to lost revenue, frustrated users, and a bruised reputation.

Fixing it quickly is a superpower. Understanding the root cause saves you from future “nightmare” deployments and lets you keep the pipeline humming.

Step‑by‑Step Tutorial: From Crash to Clean Deploy in 30 Minutes

  1. Reproduce the 502 Locally

    Spin up the same Docker Compose stack on your laptop. If the problem lives in the image, it shows up as soon as you run docker compose up: the API container exits instead of staying up. Reproducing it locally rules out the VPS firewall and network as causes.

  2. Check the Container Logs

    Run docker logs <container_id>. In my case, the logs were empty because the container exited before NestJS even started.
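When the logs come back empty, the state Docker recorded for the container usually says more. A minimal sketch, assuming a container called my-stack-api-1 (a placeholder; the real names in this stack are not shown). container_state is a hypothetical helper that only wraps the standard docker inspect command:

```shell
#!/bin/sh
# container_state NAME_OR_ID
# Hypothetical helper: prints the status and exit code Docker recorded,
# in the form "<status> exit=<code>".
container_state() {
  docker inspect --format '{{.State.Status}} exit={{.State.ExitCode}}' "$1"
}

# Usage (the container name is a placeholder):
#   docker ps -a                     # list containers, including exited ones
#   container_state my-stack-api-1
```

A status of exited right after startup confirms that the main process never stayed in the foreground.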

  3. Inspect the Dockerfile

    The offending line was:

    CMD ["npm", "run", "start:prod", "&"]
    

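    The ampersand was meant to push the process into the background. When a shell interprets it (shell form: CMD npm run start:prod &), the shell running as PID 1 backgrounds the app and exits right away, so Docker considers the container finished. In the exec (JSON array) form shown above, no shell is involved and "&" is handed to npm as a literal argument, which is just as wrong. Either way the container did not stay up, and Gatekeeper tried to proxy to a dead process, resulting in a 502.

    Warning: Shell operators such as “&” or “&&” are not interpreted in exec-form CMD or ENTRYPOINT, and backgrounding the main process in shell form leaves PID 1 with nothing to wait on, so the container exits.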
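The difference between the two forms can be seen without Docker at all. The sketch below is illustrative only: sleep 3 stands in for the NestJS process, and both function names are made up for this example.

```shell
#!/bin/sh
# Shell form: a shell interprets "&", backgrounds the job, and returns at once.
# Inside a container that shell is PID 1, so the container stops with it.
shell_form_elapsed() {
  start=$(date +%s)
  sh -c 'sleep 3 >/dev/null 2>&1 &'   # parent shell does not wait for the job
  end=$(date +%s)
  echo $((end - start))               # seconds the parent shell took to return
}

# Exec form: no shell is involved, so "&" is just one more argument.
# This prints each argument on its own line, as the program would receive it.
exec_form_args() {
  printf '[%s]\n' "$@"
}
```

Calling shell_form_elapsed prints 0 or 1 even though the job sleeps for 3 seconds, because the parent shell never waits for it. Calling exec_form_args npm run start:prod '&' prints [&] as its last line: the ampersand arrives as plain text.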
  4. Fix the Dockerfile

    Replace the faulty line with a proper ENTRYPOINT that runs NestJS in the foreground:

    # Dockerfile (excerpt)
    FROM node:20-alpine
    
    WORKDIR /app
    COPY package*.json ./
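    # Install all dependencies; the build step typically needs devDependencies (e.g. the Nest CLI)
    RUN npm ci
    COPY . .
    
    # Build the NestJS app, then drop devDependencies from the image
    RUN npm run build && npm prune --omit=dev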
    
    # Use a non‑root user for security
    USER node
    
    # Start the app in foreground
    ENTRYPOINT ["npm","run","start:prod"]
    
  5. Re‑build and Push the Image

    Run the following commands (replace myrepo and tag accordingly):

    docker build -t myrepo/nest-api:latest .
    docker push myrepo/nest-api:latest
    
  6. Update the VPS Docker Compose

    Pull the new image and restart the stack:

    docker compose pull
    docker compose up -d --remove-orphans
    
  7. Verify the Fix

    Hit the API endpoint (e.g., https://api.myapp.com/health) in the browser. You should see a JSON {"status":"ok"} instead of a 502.
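Right after a restart the API may need a few seconds to boot, so a single request can still return a 502. A minimal sketch of a retry loop around curl, assuming the /health URL from this step. wait_for_health is a hypothetical helper, and the attempt count and delay are arbitrary defaults:

```shell
#!/bin/sh
# wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers with a successful HTTP status.
wait_for_health() {
  url=$1
  attempts=${2:-10}
  delay=${3:-3}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors (such as 502) as failures; -sS: quiet but keep errors
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still failing after $attempts attempt(s)" >&2
  return 1
}

# Usage:
#   wait_for_health https://api.myapp.com/health 10 3
```

The non-zero return value on failure makes the function usable as the last step of a deploy script.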

Tip: Always keep docker compose config handy. It shows the final merged configuration and can expose hidden environment variable mistakes.

Real‑World Use Case: Scaling a Multi‑Tenant SaaS

My team runs 12 micro‑services on a single VPS using Gatekeeper as the reverse proxy. The NestJS API handles authentication for 5,000+ daily active users. A single misstep in the Dockerfile brings the entire auth flow down, locking everyone out.

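By fixing the Dockerfile and adding a health check in docker-compose.yml, Docker now flags the container as unhealthy when /health stops responding, and restart: unless-stopped brings it back up if the process exits. Keep in mind that the restart policy reacts to the process exiting, not to an unhealthy status, so restarting on a failed health check takes an extra watcher or an orchestrator.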

# docker-compose.yml (excerpt)

services:
  api:
    image: myrepo/nest-api:latest
    restart: unless-stopped
    ports:
      - "3000:3000"
    healthcheck:
      # node:20-alpine does not ship curl; BusyBox wget is available instead
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3
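To see what Docker makes of this health check, the container's health status can be read back with docker inspect. A small sketch; health_status is a hypothetical helper and the container name in the usage line is a placeholder:

```shell
#!/bin/sh
# health_status NAME_OR_ID
# Prints starting, healthy, or unhealthy as reported by Docker, or "none"
# when the container has no health check configured.
health_status() {
  docker inspect \
    --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$1"
}

# Usage (the container name is a placeholder):
#   health_status my-stack-api-1
```

With the interval and retries above, a container whose endpoint stops answering should move from healthy to unhealthy after three consecutive failed checks, roughly a minute and a half.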

Results / Outcome

  • Uptime jumped from 97% to 99.9% within an hour.
  • Support tickets related to “502 Bad Gateway” dropped to zero.
  • Deployment time shrank from 45 minutes (including debugging) to 12 minutes.
  • Team confidence restored: no more “midnight firefights”.

Bonus Tips to Prevent Future Docker Disasters

  • Never use background operators (&) in CMD or ENTRYPOINT. Docker expects the main process to stay in the foreground.
  • Run docker compose up --abort-on-container-exit during local testing. The whole stack stops as soon as any container exits, which makes early exits impossible to miss.
  • Add HEALTHCHECK directives so your proxy can detect dead containers automatically.
  • Use a linter (e.g., hadolint) in CI to catch common Dockerfile mistakes before they hit production.
  • Keep your base images up‑to‑date. The Alpine node image has frequent security patches.
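The first tip can be backed by a quick pre-build check. The sketch below is a hypothetical helper, not a replacement for a real linter such as hadolint. It only greps for a CMD or ENTRYPOINT line that ends in a lone & or carries "&" as an array element:

```shell
#!/bin/sh
# check_no_background_cmd [DOCKERFILE]
# Returns 1 (and prints the offending lines) if CMD or ENTRYPOINT tries to
# background the main process; "&&" chains are not flagged.
check_no_background_cmd() {
  file=${1:-Dockerfile}
  if grep -niE '^[[:space:]]*(CMD|ENTRYPOINT)[[:space:]].*([^&]&[[:space:]]*$|"&")' "$file"; then
    echo "error: $file backgrounds its main process" >&2
    return 1
  fi
  return 0
}

# Usage in a CI step:
#   check_no_background_cmd Dockerfile && docker build -t myrepo/nest-api:latest .
```

It is deliberately narrow: it catches the exact mistake from this post and nothing else.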

Monetization Corner (Optional)

If you’re looking to turn these automation wins into cash, consider offering a “Docker Health‑Check as a Service” to fellow SaaS founders. A $49/mo subscription can cover the cost of monitoring, instant alerts, and one‑click rollbacks.

© 2026 Your Tech Blog – All rights reserved.
