Why My NestJS App Crashed on a VPS – Fixing the 2‑Minute PostgreSQL Timeout and the Silent Deadlock That Stole a Day
Imagine you just pushed a hot new feature to production, your monitoring dashboard looks clean, and two minutes later the whole server goes silent. No errors in the logs, no alerts—just a dead connection that leaves you staring at a blank screen. That was my reality on a brand‑new VPS, and the culprit was a 2‑minute PostgreSQL timeout that turned into a silent deadlock.
Why This Matters
If you’re building SaaS tools, micro‑services, or any real‑time API with NestJS and PostgreSQL, uptime isn’t optional—it’s the product. A hidden timeout can chew through your CPU, fill your logs with “connection reset by peer,” and waste a whole workday debugging a problem that never shows up locally.
Fixing it not only restores stability, it also teaches you how to make your database layer self‑healing, which means less downtime and more billable hours.
Step‑by‑Step Tutorial
1. Reproduce the Timeout Locally

Run your NestJS app against a local Postgres instance with `statement_timeout = 120000` (2 minutes). Open a route that performs a long‑running query (e.g., `SELECT pg_sleep(180);`) and watch the request hang.

Tip: Use `psql -c "SHOW statement_timeout;"` to verify the setting.
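If you don't already have a convenient slow endpoint, a throwaway debug route does the job. This is a minimal sketch; the `DebugController`, the `/debug/slow` path, and the injected TypeORM `DataSource` are my own illustrative choices, not part of the original app:

```ts
import { Controller, Get } from '@nestjs/common';
import { DataSource } from 'typeorm';

// Hypothetical debug-only controller that forces a query which outlives
// statement_timeout, so you can watch the request stall for 2 minutes.
@Controller('debug')
export class DebugController {
  constructor(private readonly dataSource: DataSource) {}

  @Get('slow')
  async slow() {
    // Sleeps 3 minutes; Postgres cancels it when statement_timeout fires
    await this.dataSource.query('SELECT pg_sleep(180);');
    return { status: 'done' }; // only reached if no timeout is configured
  }
}
```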
2. Add a Global Query Timeout Interceptor

NestJS lets you intercept every request. Create a `TimeoutInterceptor` that aborts a request after a custom threshold, regardless of the DB config.

```ts
import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { Observable, throwError } from 'rxjs';
import { timeout, catchError } from 'rxjs/operators';

@Injectable()
export class TimeoutInterceptor implements NestInterceptor {
  intercept(context: ExecutionContext, next: CallHandler): Observable<any> {
    const maxMs = Number(process.env.DB_QUERY_TIMEOUT) || 10000; // 10 s default

    return next.handle().pipe(
      timeout(maxMs),
      catchError((err) => {
        if (err.name === 'TimeoutError') {
          return throwError(() => new Error('DB query timed out'));
        }
        return throwError(() => err);
      }),
    );
  }
}
```

Register it globally in `app.module.ts`:

```ts
import { Module } from '@nestjs/common';
import { APP_INTERCEPTOR } from '@nestjs/core';
import { TimeoutInterceptor } from './common/timeout.interceptor';

@Module({
  providers: [{ provide: APP_INTERCEPTOR, useClass: TimeoutInterceptor }],
})
export class AppModule {}
```
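To verify the interceptor behaves before shipping, a quick end‑to‑end test against the slow route from step 1 confirms that requests now fail fast instead of hanging. A sketch assuming the standard Nest scaffold's Jest + supertest setup, reusing the hypothetical `/debug/slow` route from above:

```ts
import { Test } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../src/app.module';

describe('TimeoutInterceptor (e2e)', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();
    app = moduleRef.createNestApplication();
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  it('aborts the slow route instead of hanging', async () => {
    // With DB_QUERY_TIMEOUT=10000 the interceptor rejects after ~10 s
    const res = await request(app.getHttpServer()).get('/debug/slow');
    expect(res.status).toBe(500);
  }, 20000); // give the test itself enough headroom
});
```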
3. Configure the PostgreSQL Connection Pool Properly

VPS environments often have low memory limits. Set a modest `max` pool size and enable an idle timeout to free dead connections.

```ts
import { DataSource } from 'typeorm';

export default new DataSource({
  type: 'postgres',
  host: process.env.PG_HOST,
  port: +process.env.PG_PORT,
  username: process.env.PG_USER,
  password: process.env.PG_PASS,
  database: process.env.PG_DB,
  synchronize: false,
  logging: false,
  entities: [__dirname + '/**/*.entity{.ts,.js}'],
  extra: {
    max: 10,                       // keep the pool small on a low-mem VPS
    idleTimeoutMillis: 30000,      // release idle connections after 30 s
    connectionTimeoutMillis: 2000, // fail fast when no connection is free
  },
});
```
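If you also want limits enforced for code paths the HTTP interceptor never sees (cron jobs, queue workers), node-postgres accepts per-connection timeout options in the same `extra` block. These keys are passed straight through to the pg driver; the specific values below are my suggestions, not the original config:

```ts
// Drop-in replacement for the `extra` block above (values in milliseconds).
const extra = {
  max: 10,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
  statement_timeout: 30000, // server cancels any statement after 30 s
  query_timeout: 35000,     // client-side guard, slightly above the server cap
};
```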
Add a Watchdog Health‑Check Endpoint
Expose
/healthzthat runs a cheapSELECT 1. If it fails three times in a row, let a process manager (PM2, systemd) restart the app.@Controller('health') export class HealthController { constructor(private readonly dataSource: DataSource) {} @Get('z') async check() { try { await this.dataSource.query('SELECT 1'); return { status: 'ok' }; } catch { process.exit(1); // trigger restart } } } -
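On the process-manager side, nothing fancy is needed; PM2's default restart behavior picks up the `process.exit(1)`. A minimal sketch assuming PM2 and an app built to `dist/main.js` (the app name and memory cap are placeholders):

```js
// ecosystem.config.js — PM2 restarts the app whenever the process exits
module.exports = {
  apps: [
    {
      name: 'nest-api',
      script: 'dist/main.js',
      instances: 1,
      autorestart: true,          // restart after process.exit(1)
      max_memory_restart: '300M', // also restart on leaks (1 GB VPS)
    },
  ],
};
```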
5. Tune PostgreSQL Server Settings

On the VPS, edit `postgresql.conf`:

```conf
# Abort lock waits quickly instead of queuing behind a stuck transaction
lock_timeout = '5s'

# Shorten the statement timeout for safety
statement_timeout = '30s'

# Run the deadlock detector after 1 s and log long lock waits
deadlock_timeout = '1s'
log_lock_waits = on
```

Reload PostgreSQL and watch `pg_log` for any deadlock messages.
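These settings don't require a full server restart. If you prefer, you can apply them from `psql` with standard commands and reload:

```sql
ALTER SYSTEM SET lock_timeout = '5s';
ALTER SYSTEM SET statement_timeout = '30s';
ALTER SYSTEM SET deadlock_timeout = '1s';
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();  -- apply without restarting the server
```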
Real‑World Use Case: E‑Commerce Checkout
My client’s checkout service ran a single transaction that updated inventory, created an order, and wrote an audit log. During a flash‑sale, dozens of workers tried to lock the same row, causing a deadlock that sat idle for 2 minutes before the DB finally threw an error—an error our NestJS app never caught, so the process hung.
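For context, the checkout transaction looked roughly like this. A simplified sketch using TypeORM's transaction API; the table and column names are hypothetical stand-ins for the client's schema:

```ts
// Simplified checkout flow: three writes in one transaction.
// During a flash sale, many workers hit the same inventory row,
// and the blocked UPDATE is where requests silently piled up.
await this.dataSource.transaction(async (manager) => {
  await manager.query(
    'UPDATE inventory SET stock = stock - $1 WHERE product_id = $2',
    [qty, productId],
  );
  await manager.query(
    'INSERT INTO orders (product_id, qty, user_id) VALUES ($1, $2, $3)',
    [productId, qty, userId],
  );
  await manager.query(
    'INSERT INTO audit_log (action, user_id) VALUES ($1, $2)',
    ['checkout', userId],
  );
});
```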
After applying the steps above:
- Queries now abort after 10 seconds, returning a friendly “try again” message.
- The connection pool never exceeds 10 active sockets, keeping memory usage low on a 1 GB VPS.
- The health‑check restarts the service automatically if PostgreSQL becomes unreachable.
Results / Outcome
Within the first 24 hours after deployment:
- Zero crashes from the 2‑minute timeout.
- CPU usage dropped from 95 % spikes to a steady 30 % during peak traffic.
- Customer‑facing error rate fell from 2.3 % to less than 0.1 %.
- Support tickets related to “checkout hanging” disappeared.
Bottom line: a few lines of defensive code and a couple of PostgreSQL tweaks saved a full day of frantic debugging and kept revenue flowing.
Bonus Tips
- Use PgBouncer as a lightweight connection pooler if your VPS handles many short‑lived requests (see the sketch after this list).
- Enable `log_min_duration_statement = 500` (milliseconds) to catch queries that consistently run near the timeout threshold.
- Consider moving heavy analytics to a read replica, leaving the primary for transactional work.
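If you try PgBouncer, a starting configuration for a single app database might look like the sketch below; the database name, paths, and pool sizes are placeholders to adapt. Be aware that transaction pooling restricts session-level features such as prepared statements, depending on your PgBouncer version.

```ini
; pgbouncer.ini — transaction pooling keeps real server connections scarce
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; reuse server connections between transactions
max_client_conn = 200     ; many cheap client sockets
default_pool_size = 10    ; few actual PostgreSQL connections
```

Point the app's `PG_PORT` at 6432 and PgBouncer absorbs the connection churn.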
Monetization (Optional)
If you’re selling a SaaS product that relies on NestJS and Postgres, turn this reliability story into a selling point. Create a “99.9 % uptime SLA” badge, price premium support tiers, or bundle a managed‑VPS setup that includes all the configurations above. Clients love paying extra for peace of mind.
One final warning: never set `statement_timeout` to `0` (infinite) in production. It invites exactly the silent deadlock you just avoided.