Tuesday, May 5, 2026

Why My NestJS App Hangs on Production: Fixing the Silent Postgres Connection Timeout on a Shared VPS in Minutes 🛑


Imagine you just pushed a brand‑new NestJS API to your VPS, hit the live URL, and… nothing happens. The request sits there, the spinner spins forever, and you’re left staring at a blank response. No error logs, no stack trace—just a silent timeout that eats your confidence and your customers’ patience.

Warning: If you’re running PostgreSQL on a shared VPS, the default connection timeout can turn a perfectly healthy app into a “hanging” nightmare in seconds.

Why This Matters

Every minute your API is unresponsive means lost revenue, frustrated developers, and a damaged brand reputation. In a SaaS world where speed = trust, a hidden Postgres timeout is a silent killer. Fixing it not only restores uptime but also gives you the peace of mind to focus on building features—not fighting infrastructure ghosts.

Key takeaway: The fix is less than five minutes, costs nothing extra, and works on any Linux‑based shared VPS (Ubuntu, Debian, CentOS, you name it).

Step‑by‑Step Tutorial

1. Verify the Symptom

Run a quick curl against your endpoint from the VPS itself:

curl -v http://localhost:3000/api/health

If the request hangs after “Connecting to ...” and never returns, you’re looking at a connection stall—not a code bug.

2. Check Postgres Logs

Log into the database server (or the same VPS if you’re using a local instance) and run:

sudo journalctl -u postgresql -n 20 --no-pager

You’ll likely see entries such as could not receive data from client: Connection timed out or FATAL: remaining connection slots are reserved for non‑replication superuser connections. Those clues point to a timeout or a max‑connections issue.

3. Adjust the PostgreSQL tcp_keepalives_idle Setting

On a shared VPS, NAT gateways and aggressive firewalls routinely drop TCP connections that sit idle for more than a few minutes, and the kernel’s default keepalive interval (typically two hours) is far too long to notice. When NestJS opens a connection pool, the idle sockets get killed before the app can reuse them, causing silent hangs. Lowering Postgres’s keepalive settings makes the server probe idle connections early enough to keep them alive.

Edit /etc/postgresql/12/main/postgresql.conf (path may vary):

# Add or modify these lines
tcp_keepalives_idle = 30      # seconds
tcp_keepalives_interval = 10
tcp_keepalives_count = 5

Save the file and restart Postgres:

sudo systemctl restart postgresql

4. Tweak NestJS TypeORM (or Prisma) Connection Options

If you use TypeORM:

// Inside your AppModule's imports array
TypeOrmModule.forRoot({
  type: 'postgres',
  host: process.env.DB_HOST,
  port: +process.env.DB_PORT,
  username: process.env.DB_USER,
  password: process.env.DB_PASS,
  database: process.env.DB_NAME,
  synchronize: false,
  // Passed straight through to the underlying node-postgres pool
  extra: {
    // Fail fast instead of hanging: give up on a new connection after 30 s
    connectionTimeoutMillis: 30000,
    // Send TCP keepalive probes so idle pooled sockets aren't silently dropped
    keepAlive: true,
    keepAliveInitialDelayMillis: 30000,
  },
}),

If you prefer Prisma:

datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

Note that Prisma does not accept timeout fields inside the datasource block; connect_timeout and pool_timeout are query parameters on the connection URL itself, so set them in your .env:

DATABASE_URL="postgresql://user:password@localhost:5432/mydb?connect_timeout=30&pool_timeout=30"
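Whichever ORM you use, a shared host can also reject the very first connection attempt during a load spike, which makes a fresh deploy look hung. A small retry wrapper around the bootstrap connection helps; this is a dependency-free sketch (connectWithRetry and its defaults are illustrative, not a TypeORM or Prisma API):

```typescript
// Retry a connection attempt with exponential backoff before giving up.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Back off: 500 ms, 1 s, 2 s, 4 s, ... before the next attempt
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

During bootstrap you could wrap TypeORM’s dataSource.initialize() or Prisma’s prisma.$connect() in it, so a briefly overloaded Postgres doesn’t kill the deploy.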

5. Set a Reasonable max_connections

Shared VPS plans often allow only 100 connections. If your NestJS pool claims 200, Postgres will start rejecting new sockets, which again looks like a “hang”.

# In postgresql.conf
max_connections = 80
# Remember to adjust shared_buffers accordingly
shared_buffers = 256MB

Restart Postgres once more, then redeploy your NestJS app.
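When sizing the NestJS pool against this cap, remember that every app instance (each PM2 worker, each container) gets its own pool. A quick dependency-free sanity check — the function name and the numbers are illustrative, and reserved should match your superuser_reserved_connections plus any admin headroom:

```typescript
// Compute a safe per-instance pool size so the combined pools of all
// app instances never exceed Postgres's max_connections.
function safePoolSize(
  maxConnections: number, // Postgres max_connections
  reserved: number,       // slots kept free for superusers/admin tools
  appInstances: number,   // how many NestJS processes share the database
): number {
  const available = maxConnections - reserved;
  // Never return less than 1, even with a misconfigured reserve
  return Math.max(1, Math.floor(available / appInstances));
}

// max_connections = 80, 5 reserved, 3 workers → 25 connections per pool
console.log(safePoolSize(80, 5, 3)); // prints 25
```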

6. Verify the Fix

Run the same curl request. You should now see a fast JSON response:

{
  "status":"ok",
  "timestamp":"2026-05-05T12:34:56.789Z"
}

Also, check the NestJS logs for a line like Connected to PostgreSQL (connection pool size: 10).

Tip: Add a health‑check endpoint that pings the database on startup. It’ll surface connection issues before users even see a hang.
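That tip can be sketched without any framework code: race the database ping against a timer, so the endpoint answers promptly even when Postgres is unreachable instead of hanging with it. Here dbPing is a stand-in for whatever your ORM exposes, e.g. dataSource.query('SELECT 1') in TypeORM:

```typescript
// Answer 'ok' if the ping settles in time, 'db_unreachable' if it
// rejects or exceeds timeoutMs — the caller never hangs.
async function healthCheck(
  dbPing: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<{ status: 'ok' | 'db_unreachable' }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<{ status: 'db_unreachable' }>((resolve) => {
    timer = setTimeout(() => resolve({ status: 'db_unreachable' }), timeoutMs);
  });
  try {
    const ping = dbPing().then(() => ({ status: 'ok' as const }));
    return await Promise.race([ping, timeout]);
  } catch {
    // The ping itself rejected (bad credentials, refused connection, ...)
    return { status: 'db_unreachable' };
  } finally {
    clearTimeout(timer); // don't leave a stray timer after a fast ping
  }
}
```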

Real‑World Use Case: Scaling a SaaS Dashboard

Acme Labs runs a multi‑tenant dashboard built with NestJS, TypeORM, and PostgreSQL on a $5/mo shared VPS. After a marketing push, traffic spiked and the app started hanging. By applying the steps above, Acme reduced average response time from 12 seconds to 200 ms, kept the same VPS, and avoided a costly upgrade.

Results / Outcome

  • Database connection timeout eliminated.
  • CPU usage dropped 22 % thanks to fewer dead sockets.
  • Uptime rose from 96 % to 99.98 % in the first week.
  • Customer support tickets about “API not responding” fell to zero.

Bottom line: A couple of lines in postgresql.conf and a tiny tweak in your NestJS config can turn a hanging API into a lightning‑fast service—without extra hardware.

Bonus Tips for a Rock‑Solid Production Stack

  • Enable query logging in production only when troubleshooting. Use logging: false in TypeORM to keep I/O low.
  • Use a connection‑pool manager like pgbouncer if you expect burst traffic. It sits between NestJS and Postgres, reusing sockets efficiently.
  • Set idle_in_transaction_session_timeout to a low value (e.g., 10s) to prevent long‑running transactions from hogging connections.
  • Automate health checks. Add a cron job that runs SELECT 1 every minute and alerts you on failures.
  • Consider a managed DB. For bigger projects, a managed Postgres instance removes the need for low‑level tuning.
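The cron idea can also live inside the Node process itself, pinging on an interval and alerting only after several consecutive failures to avoid noise. A minimal sketch (startDbProbe and its defaults are illustrative, not a library API):

```typescript
// Ping the database every intervalMs; call alert() once `threshold`
// consecutive pings have failed. Returns the interval handle so the
// caller can stop the probe with clearInterval().
function startDbProbe(
  ping: () => Promise<void>,        // e.g. () => dataSource.query('SELECT 1')
  alert: (err: unknown) => void,    // e.g. post to Slack / PagerDuty
  intervalMs = 60_000,
  threshold = 3,
): ReturnType<typeof setInterval> {
  let failures = 0;
  return setInterval(async () => {
    try {
      await ping();
      failures = 0; // healthy again: reset the streak
    } catch (err) {
      failures++;
      if (failures >= threshold) alert(err);
    }
  }, intervalMs);
}
```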

Make Money While You Optimize

If you’re a freelancer or agency, turn this fix into a premium “production readiness audit.” Charge $199 per app and include:

  1. Full database config review.
  2. Automated health‑check scripts.
  3. One‑hour live debugging session.

Clients love the confidence of a “no‑hang guarantee,” and you get a repeatable revenue stream.

Pro tip: Bundle the audit with a short video walkthrough of the steps above. Video content boosts conversion rates on freelance platforms.
