Saturday, May 9, 2026

Laravel Queue Workers Hanging on Production VPS: How to Diagnose & Fix the Fatal FPM Timeouts, Redis Connection Drops, and “Stall” Logs Before Your Site Crashes Under Load

You’ve just pushed a hot‑fix to production, traffic spikes, and suddenly every background job is stuck. The queue workers scream “stalled”, PHP‑FPM reports “request timed out”, and Redis refuses new connections. It feels like the whole stack is about to implode. If you’ve been there, you know the panic that follows a seemingly healthy Laravel‑WordPress hybrid app on a VPS.

Why This Matters

When queue workers freeze, emails stop, notifications disappear, and your API latency balloons. A single mis‑tuned PHP‑FPM setting can cascade into Redis timeouts, MySQL lock‑ups, and ultimately a full site outage. In a subscription‑based SaaS or a high‑traffic blog, minutes of downtime translate directly into lost revenue and bruised brand reputation.

Common Causes

  • PHP‑FPM pm.max_children too low for the burst traffic.
  • Supervisor not restarting dead workers fast enough.
  • Redis reaching maxmemory: depending on maxmemory-policy it either rejects writes or evicts keys, and evicted keys can include queued jobs.
  • Insufficient ulimit -n (open files) on the VPS.
  • Long-running jobs on workers started without --timeout, or workers never recycled with queue:restart after a deploy.
  • An Nginx fastcgi_read_timeout shorter than the PHP request, which surfaces as “504 Gateway Time-out”.

Step‑By‑Step Fix Tutorial

1. Inspect PHP‑FPM Logs

Location: /var/log/php7.4-fpm.log (adjust version)

# tail -f /var/log/php7.4-fpm.log
[06-May-2026 12:34:56] NOTICE: child 12345 said into stderr: "ERROR: request timed out"

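Note the timestamp and the child PID, then match them against the Nginx access log to find the request URI. If “request timed out” entries repeat, raise request_terminate_timeout and pm.max_children as described in the next step. Keep in mind that queue:work runs under the PHP CLI, so PHP-FPM settings govern web requests only; the workers themselves are bounded by --timeout.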

2. Tune PHP‑FPM Pool

# /etc/php/7.4/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 120          ; 120*128MB ≈ 15GB of RAM; lower this if Redis or MySQL share the host
pm.start_servers = 12
pm.min_spare_servers = 6
pm.max_spare_servers = 24
request_terminate_timeout = 120s

After editing, reload:

sudo systemctl reload php7.4-fpm
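The 120 in pm.max_children comes from dividing the RAM you can spare by the memory footprint of one PHP-FPM child. A minimal Python sketch of that arithmetic follows. The function name, the 1 GB reserve for the OS, and the extra reserves for Redis and MySQL are illustrative assumptions, so measure the resident memory of your own workers before trusting the result.

```python
def fpm_max_children(total_ram_mb: int, reserved_mb: int, avg_child_mb: int) -> int:
    """Estimate pm.max_children from the RAM left after other services."""
    if avg_child_mb <= 0:
        raise ValueError("avg_child_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Never return less than 1: PHP-FPM needs at least one child.
    return max(1, usable_mb // avg_child_mb)


# Hypothetical 16 GB host with 1 GB held back for the OS.
print(fpm_max_children(16 * 1024, 1024, 128))  # 120
# Same host with a further 4 GB for Redis and 2 GB for MySQL held back.
print(fpm_max_children(16 * 1024, 1024 + 4096 + 2048, 128))  # 72
```

The second call shows how quickly the safe ceiling drops once other services share the machine.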

3. Verify Supervisor Configuration

# /etc/supervisor/conf.d/laravel-queue.conf
[program:laravel-queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/artisan queue:work redis --sleep=3 --tries=3 --timeout=90
autostart=true
autorestart=true
stopwaitsecs=360
numprocs=8
redirect_stderr=true
stdout_logfile=/var/log/laravel/queue.log

Load the new configuration (update restarts any program whose config changed):

sudo supervisorctl reread && sudo supervisorctl update
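Three timeouts have to agree for workers to stop cleanly: the worker's --timeout, Supervisor's stopwaitsecs, and Laravel's retry_after from config/queue.php. The retry_after setting is not shown in this article and is brought in here as an assumption about your setup. The Python sketch below is a hypothetical helper, not part of Laravel or Supervisor. It reads a program block and reports mismatches.

```python
import configparser
import re


def check_worker_timeouts(conf_text: str, section: str, retry_after: int) -> list[str]:
    """Return warnings for timeout settings that can kill or duplicate jobs."""
    # Supervisor uses %(name)s expansions, so configparser interpolation is disabled.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    program = parser[section]

    match = re.search(r"--timeout[= ](\d+)", program.get("command", fallback=""))
    if match is None:
        return ["no --timeout flag found in command"]
    timeout = int(match.group(1))
    # Supervisor falls back to 10 seconds when stopwaitsecs is absent.
    stopwaitsecs = program.getint("stopwaitsecs", fallback=10)

    warnings = []
    if stopwaitsecs <= timeout:
        warnings.append("stopwaitsecs should exceed --timeout")
    if retry_after <= timeout:
        warnings.append("retry_after should exceed --timeout")
    return warnings


SAMPLE = """
[program:laravel-queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/html/artisan queue:work redis --sleep=3 --tries=3 --timeout=90
stopwaitsecs=360
"""

# Assumes retry_after is 90 in config/queue.php; check your own value.
print(check_worker_timeouts(SAMPLE, "program:laravel-queue", retry_after=90))
```

With the assumed retry_after of 90, the sample config produces one warning, because a job still running at 90 seconds could be handed to a second worker.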

4. Harden Redis

Enable tcp-keepalive so dead connections are detected, and confirm maxclients is high enough (10000 is already the Redis default).

# /etc/redis/redis.conf
tcp-keepalive 60
maxclients 10000
maxmemory 4gb
# allkeys-lru can evict queued jobs; keep it only on a cache-only instance
maxmemory-policy noeviction

Restart Redis:

sudo systemctl restart redis
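After the restart, the output of redis-cli INFO shows whether connections are still being refused or keys evicted. The following Python sketch is a hypothetical helper that parses that text. It relies only on the rejected_connections, evicted_keys, and maxmemory_policy fields, and the warning wording is an assumption of this example.

```python
def redis_health_warnings(info_text: str) -> list[str]:
    """Flag dropped connections and evictions in `redis-cli INFO` output."""
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Clients".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value

    warnings = []
    if int(fields.get("rejected_connections", "0")) > 0:
        warnings.append("connections were rejected: check maxclients")
    if int(fields.get("evicted_keys", "0")) > 0:
        warnings.append("keys were evicted: queued jobs may have been lost")
    policy = fields.get("maxmemory_policy", "")
    if policy and policy != "noeviction":
        warnings.append(f"maxmemory_policy is {policy}: review if this instance stores queues")
    return warnings


# Made-up sample values, trimmed to the fields the function reads.
SAMPLE_INFO = """# Clients
connected_clients:42
# Memory
maxmemory_policy:allkeys-lru
# Stats
rejected_connections:3
evicted_keys:0
"""

for warning in redis_health_warnings(SAMPLE_INFO):
    print(warning)
```

In practice you would pass in the captured output of redis-cli INFO rather than a hard-coded sample.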

5. Increase OS Limits

# /etc/security/limits.conf
* soft nofile 65535
* hard nofile 65535

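limits.conf applies to new login sessions only. To raise the limit in the current shell:

ulimit -n 65535

Services started by systemd (PHP-FPM, Redis, Supervisor) ignore limits.conf. For those, set LimitNOFILE=65535 in a unit override (for example, sudo systemctl edit php7.4-fpm) and restart the service.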
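A running service keeps the limits it was started with, so it is worth checking what the process actually got. On Linux that information is in /proc/<pid>/limits. The Python sketch below is a hypothetical helper that extracts the "Max open files" row from that file's text; the sample values are made up.

```python
def parse_max_open_files(limits_text: str) -> tuple[str, str]:
    """Return the (soft, hard) 'Max open files' values from /proc/<pid>/limits."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    raise ValueError("no 'Max open files' line found")


SAMPLE_LIMITS = """Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 524288               files
"""

print(parse_max_open_files(SAMPLE_LIMITS))  # ('1024', '524288')
```

To inspect a live process, read the file first, for example open(f"/proc/{pid}/limits").read() with the PID of a PHP-FPM or Redis process. Values are returned as strings because the kernel may report "unlimited".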

6. Adjust Nginx FastCGI Timeout

# /etc/nginx/sites-available/laravel.conf
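location ~ \.php$ {
    fastcgi_pass unix:/run/php/php7.4-fpm.sock;
    # fastcgi_params does not set SCRIPT_FILENAME, so pass it explicitly
    fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
    # Keep this at or above request_terminate_timeout in the FPM pool
    fastcgi_read_timeout 120s;
    include fastcgi_params;
}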

Reload Nginx:

sudo nginx -t && sudo systemctl reload nginx
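The Nginx and PHP-FPM timeouts only help if Nginx waits at least as long as PHP-FPM allows a request to run. Here is a small Python sketch of that comparison. Both function names are hypothetical, and the parser handles only whole numbers with an optional s, m, h, or d suffix, which is a subset of what Nginx accepts.

```python
import re

_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(value: str) -> int:
    """Convert a duration such as '120s', '2m', or '90' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def nginx_waits_long_enough(fastcgi_read_timeout: str, request_terminate_timeout: str) -> bool:
    """True when Nginx waits at least as long as PHP-FPM lets a request run."""
    return to_seconds(fastcgi_read_timeout) >= to_seconds(request_terminate_timeout)


print(nginx_waits_long_enough("120s", "120s"))  # True
print(nginx_waits_long_enough("60s", "2m"))     # False
```

When the check returns False, Nginx gives up first and the visitor sees a gateway timeout even though PHP-FPM is still working on the request.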

VPS or Shared Hosting Optimization Tips

  • Choose an SSD‑backed VPS with at least 4 vCPU and 8 GB RAM for Laravel‑heavy workloads.
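  • Move Redis and MySQL to separate servers if budget permits; each hop adds a little network latency, but PHP, Redis, and MySQL stop competing for the same RAM and CPU.
  • On shared hosting, set opcache.validate_timestamps=0 via .user.ini if the host permits it, and remember that code changes then need an OPcache reset. opcache.memory_consumption is a system-level setting, so raising it to 256 usually has to be done by the host.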
  • Use Cloudflare “Rate Limiting” to smooth sudden bursts before they hit PHP‑FPM.

Real World Production Example

Acme SaaS runs a Laravel API behind Nginx on an Ubuntu 22.04 VPS (4 vCPU, 16 GB RAM). After a marketing email blast, queue workers stalled, and Redis showed “client connections dropped”. The fix sequence above lowered the average job latency from 45 seconds to 3.2 seconds and eliminated 500 errors.

Before vs After Results

Metric              | Before                    | After
Avg. Queue Latency  | 45 s                      | 3.2 s
PHP‑FPM Workers     | 12 busy / 20 max          | 85 busy / 120 max
Redis Errors        | 13 % connections dropped  | 0 %

Security Considerations

  • Never expose Redis to the public internet; bind to 127.0.0.1 or use a firewall.
  • Set opcache.validate_timestamps=0 in production, run php artisan config:cache on every deploy, and reload PHP-FPM afterwards so OPcache serves the new code.
  • Use APP_ENV=production and APP_DEBUG=false to avoid leaking stack traces.
  • Run the queue workers as the web-app user, not root, by setting user= in the Supervisor program block.

Bonus Performance Tips

Tip: Enable Laravel Horizon for visual queue monitoring and auto‑scaling of workers.

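  • Run the scheduler from a one-minute cron entry that calls php artisan schedule:run or, if cron is unavailable, keep php artisan schedule:work alive in a dedicated Supervisor program.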
  • Compress database dumps with gzip before uploading to S3.
  • Use composer install --optimize-autoloader --no-dev on the production server.
  • Set realpath_cache_size=4096k in php.ini for large Laravel projects.

FAQ

Q: My queue workers still hang after the changes. What next?

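A: Look for long-running or blocked queries with SHOW FULL PROCESSLIST, inspect the slow ones with EXPLAIN, and enable MySQL’s slow query log (slow_query_log=1). Also verify that no job exceeds the --timeout you set, and that retry_after in config/queue.php is several seconds longer than --timeout; otherwise a job can be picked up a second time while it is still running.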

Q: Can I use the same config on a shared hosting plan?

A: Only parts. On shared hosts you can’t change pm.max_children or ulimit. Instead, split large jobs into smaller chunks and keep each one under 30 seconds.
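Chunking here means cutting one long job into several short ones that each finish well inside the time limit. The idea is language-neutral, so the sketch below uses Python purely for illustration; in a Laravel app each slice would be dispatched as its own queued job. The function name, the ID list, and the chunk size of 4 are hypothetical.

```python
from typing import Iterator, Sequence


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


user_ids = list(range(1, 11))
# Each slice stands in for one short job instead of a single long one.
for batch in chunked(user_ids, 4):
    print(batch)
```

Pick the chunk size by timing one item and dividing the time budget by that figure, leaving headroom for slow runs.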

Final Thoughts

Queue worker stalls are rarely a single‑point failure. They are the symptom of an unsafe resource envelope. By aligning PHP‑FPM, Supervisor, Redis, and OS limits, you create a resilient environment that can survive traffic spikes without crashing. Take the time to benchmark, apply the tweaks above, and monitor the logs – the ROI is measured in minutes saved during a traffic surge.

