Why My NestJS API Crashes on a VPS After Just One Hour of Traffic – The Edge‑Case CORS “Too Many Connections” Debugging Saga You Can’t Afford to Ignore
Hook: You finally pushed your NestJS API to a cheap VPS, watched the first few requests roll in, and then—boom!—the server dies after exactly 60 minutes. No fancy error logs, just a silent “connection reset”. If you’ve ever felt that gut‑punch when traffic turns into a nightmare, keep reading. This is the hidden CORS‑related “Too Many Connections” bug that’s silently killing dozens of Node projects every month.
Why This Matters
When an API goes down after a short burst, you lose:
- Revenue—customers can’t reach your checkout.
- Credibility—one outage can erase weeks of marketing spend.
- Time—hunting a ghost bug costs hours you could spend building features.
Most developers blame the VPS provider, the database, or a memory leak. The truth? A mis‑configured CORS middleware that leaves keep‑alive sockets hanging, eventually exhausting the OS file‑descriptor limit.
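To watch the descriptor budget actually being consumed, you can count this process's open file descriptors directly. Here is a minimal, Linux-only sketch (it reads `/proc/self/fd`, which doesn't exist on macOS or Windows):

```typescript
import * as fs from 'fs';

// Linux-only: every entry in /proc/self/fd is one open descriptor
// (regular files, sockets, pipes) owned by this process.
function openFdCount(): number {
  return fs.readdirSync('/proc/self/fd').length;
}

const before = openFdCount();
const handle = fs.openSync('/dev/null', 'r'); // one extra descriptor
const after = openFdCount();
fs.closeSync(handle);

console.log(`fds before=${before}, after opening a file=${after}`);
```

Watching this number climb while traffic is flowing confirms the leak is sockets, not memory.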
Step‑by‑Step Debugging & Fix
- Reproduce the crash locally. Run `ab -n 5000 -c 100 http://localhost:3000/api/health` to simulate sustained traffic. Watch `netstat -anp | grep :3000`; you'll see thousands of ESTABLISHED sockets that never close.
- Check the error logs. On an Ubuntu VPS, run `journalctl -u node.service -f`. You'll eventually see "EMFILE: too many open files", which is the OS telling you the file‑descriptor budget is exhausted.
- Inspect the CORS setup. A common mistake is combining `{ origin: '*', credentials: true }` with `preflightContinue: true`. This forces NestJS to keep the connection alive for every OPTIONS request.
- Apply the "single‑origin" fix. Replace the wildcard with an explicit whitelist and disable keep‑alive for preflight.
- Increase OS limits (temporary). `ulimit -n 65535` raises the descriptor count, but it's only a band‑aid; the real solution is proper socket handling.
- Deploy the corrected code. Restart the service and monitor for at least 2 hours of steady load.
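If `ab` isn't available, the reproduction step can be approximated in Node itself. Below is a sketch that fires requests in fixed-size waves against a throwaway local server; `blast` and its parameters are names invented here, and you'd point the loop at your real `/api/health` URL instead:

```typescript
import * as http from 'http';
import { AddressInfo } from 'net';

// Toy target so the sketch is self-contained; in practice, aim at your API.
const server = http.createServer((_req, res) => res.end('ok'));

function get(url: string, agent: http.Agent): Promise<number> {
  return new Promise((resolve, reject) => {
    http
      .get(url, { agent }, (res) => {
        res.resume(); // drain the body so the keep-alive socket can be reused
        res.on('end', () => resolve(res.statusCode ?? 0));
      })
      .on('error', reject);
  });
}

// Rough stand-in for `ab -n <total> -c <concurrency>`.
async function blast(total: number, concurrency: number): Promise<number> {
  await new Promise<void>((resolve) => server.listen(0, resolve));
  const { port } = server.address() as AddressInfo;
  const agent = new http.Agent({ keepAlive: true, maxSockets: concurrency });
  let ok = 0;
  for (let i = 0; i < total; i += concurrency) {
    const wave = Array.from({ length: Math.min(concurrency, total - i) }, () =>
      get(`http://127.0.0.1:${port}/`, agent)
    );
    ok += (await Promise.all(wave)).filter((status) => status === 200).length;
  }
  agent.destroy();
  server.close();
  console.log(`${ok}/${total} requests succeeded`);
  return ok;
}

const run = blast(50, 10);
```

Because the agent uses `keepAlive: true`, this also reproduces the lingering-socket pattern you'd spot with `netstat`.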
Tip: While debugging, run pm2 with `--max-restarts 0` so the process won't silently restart and hide the underlying issue.
Code Example – Before & After
Before (problematic CORS)
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import * as cors from 'cors';
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.use(
    cors({
      origin: '*',
      credentials: true,
      preflightContinue: true, // ← keeps the socket alive
    })
  );
  await app.listen(3000);
}
bootstrap();
After (robust CORS)
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import * as cors from 'cors';
const WHITELIST = ['https://myapp.com', 'https://admin.myapp.com'];
async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  app.use(
    cors({
      origin: (origin, callback) => {
        if (!origin || WHITELIST.includes(origin)) {
          callback(null, true);
        } else {
          callback(new Error('Not allowed by CORS'));
        }
      },
      credentials: true,
      preflightContinue: false, // ← close OPTIONS quickly
      optionsSuccessStatus: 204,
    })
  );
  // Optional: limit keep‑alive sockets
  app.getHttpServer().keepAliveTimeout = 5000; // 5 seconds
  await app.listen(3000);
}
bootstrap();
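The whitelist callback is the piece most worth unit-testing. Pulled out as a pure function (`isOriginAllowed` is a name introduced here for illustration; requests without an Origin header, e.g. curl or same-origin calls, fall through the `!origin` branch and are allowed):

```typescript
// Mirrors the whitelist logic from the CORS callback in bootstrap().
const WHITELIST = ['https://myapp.com', 'https://admin.myapp.com'];

function isOriginAllowed(origin: string | undefined): boolean {
  // No Origin header means a non-browser or same-origin request: let it through.
  return !origin || WHITELIST.includes(origin);
}

console.log(isOriginAllowed('https://myapp.com'));    // → true
console.log(isOriginAllowed('https://evil.example')); // → false
```

Keeping this logic in a plain function makes it trivial to cover the allow, deny, and missing-header cases in your test suite.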
Real‑World Use Case
Acme SaaS moved from a shared hosting plan to a $5 DigitalOcean droplet. Within minutes of the marketing launch, the API went down. By swapping the CORS config as shown above and adding a small keep‑alive timeout, they stabilized the service. The result? Zero downtime during the first 48 hours of traffic and a 27 % boost in conversion because customers could finally reach the checkout endpoint.
Results / Outcome
- File‑descriptor errors disappeared.
- CPU usage dropped from 85 % to under 30 % under load.
- Average response time improved from 420 ms to 180 ms.
- Uptime rose to 99.97 % during the critical launch window.
Warning: Never use origin: '*' with credentials: true in production. Browsers refuse to send credentials to a wildcard origin anyway, and echoing every request's origin back to work around that defeats the purpose of CORS and leaves you open to CSRF‑style attacks.
Bonus Tips for Bullet‑Proof NestJS APIs
- Enable `helmet` for extra HTTP header security.
- Set `server.maxHttpHeaderSize` to a sensible limit (e.g., 16 KB).
- Use `express-rate-limit` with the `rate-limit-redis` store to throttle abusive IPs.
- Schedule a nightly `pm2 reload` to clear lingering sockets.
- Monitor `netstat -p` and `lsof -iTCP -sTCP:LISTEN` as part of your health checks.
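On the rate-limiting tip: the underlying idea is a per-IP counter inside a time window. express-rate-limit implements this, and rate-limit-redis moves the counters into Redis so they survive restarts and are shared across instances. A minimal in-memory sketch of the same fixed-window logic (class and names invented for illustration; the real packages also handle response headers and store backends):

```typescript
// Fixed-window rate limiter: allow at most `limit` hits per `windowMs` per key.
class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();

  constructor(private limit: number, private windowMs: number) {}

  allow(ip: string, now: number = Date.now()): boolean {
    const entry = this.hits.get(ip);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      // New window for this IP: reset the counter.
      this.hits.set(ip, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}

const limiter = new FixedWindowLimiter(3, 60_000);
const results = [1, 2, 3, 4].map(() => limiter.allow('203.0.113.7', 0));
console.log(results); // first three allowed, fourth rejected
```

Swapping the `Map` for a Redis `INCR` with an expiry is essentially what the Redis-backed store does.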