Laravel Queue Workers Stuck After Deploy: One Developer’s 3‑Minute Fix for PHP‑FPM, Redis, and Nginx Config Chaos
You’ve just pushed a fresh Laravel release, the CI/CD pipeline sings, but the queue:work processes are dead‑locked. No jobs, no logs, just a silent “stuck” state that eats CPU cycles. It’s the kind of nightmare that makes you curse every php artisan command you ever typed. The good news? The fix lives in three configuration tweaks that take less than three minutes—if you know where to look.
Why This Matters
Stalled queue workers cripple API response times, inflate Redis memory usage, and can bring a high‑traffic SaaS to its knees. In a production‑grade VPS or shared hosting environment, a single mis‑configured PHP‑FPM pool or a stray Nginx timeout can cause a cascading failure that looks like a code bug when the real issue is infrastructure.
Common Causes
- PHP‑FPM `pm.max_children` too low for the number of workers.
- Redis `timeout` and `tcp-keepalive` mismatches.
- Nginx `fastcgi_read_timeout` shorter than the longest job.
- Supervisor not restarting workers after a deploy.
- Wrong `.env` queue connection after a zero‑downtime release.
Step‑By‑Step Fix Tutorial
1. Verify Redis Connectivity
Run a quick connectivity check from the server. If an idle connection drops after a few seconds, increase the Redis `timeout` setting.
# Test Redis from the VPS
redis-cli -h 127.0.0.1 -p 6379 ping
# Expected reply: PONG
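If the check fails or idle worker connections keep dropping, the two Redis settings most often at fault live in redis.conf. A sketch with conservative values (the path is the Debian/Ubuntu default; adjust for your distro):

```conf
# /etc/redis/redis.conf
timeout 0              # 0 = never close idle client connections
tcp-keepalive 300      # probe idle connections every 300 seconds
```

Restart Redis after editing so the new values take effect.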
2. Tune PHP‑FPM Pool
Open the pool file that powers your Laravel site (usually /etc/php/8.2/fpm/pool.d/www.conf) and adjust the following values:
; www.conf
pm = dynamic
pm.max_children = 30 ; increase based on CPU cores
pm.start_servers = 6
pm.min_spare_servers = 3
pm.max_spare_servers = 12
request_terminate_timeout = 300 ; safety net for runaway jobs
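The pool numbers above depend on the machine, so don't copy them blindly. A quick shell sketch of the sizing rule of thumb, assuming a Linux host with `nproc` available:

```shell
# Rule of thumb: pm.max_children ≈ CPU cores × 4 for I/O-heavy Laravel jobs
cores=$(nproc)
max_children=$((cores * 4))
echo "pm.max_children = ${max_children}"
```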
Tip: set `pm.max_children` to roughly CPU cores × 4 for I/O‑heavy Laravel jobs.
3. Fix Nginx FastCGI Timeout
Locate the site block for your Laravel app (often /etc/nginx/sites-available/laravel.conf) and extend the timeout values:
location ~ \.php$ {
include fastcgi_params;
fastcgi_pass unix:/run/php/php8.2-fpm.sock;
fastcgi_read_timeout 300;
fastcgi_send_timeout 300;
fastcgi_buffer_size 64k;
fastcgi_buffers 8 64k;
}
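After editing, validate the syntax before reloading. This command fragment assumes a systemd-based host running nginx and PHP‑FPM 8.2:

```shell
sudo nginx -t                       # syntax check; refuses to proceed on error
sudo systemctl reload nginx         # graceful reload, no dropped connections
sudo systemctl reload php8.2-fpm    # pick up the pool changes from step 2
```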
4. Restart Supervisor with New Config
If you use Supervisor to keep queue:work alive, reload its configuration after the deploy.
# /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/laravel/artisan queue:work redis --sleep=3 --tries=3
autostart=true
autorestart=true
numprocs=8
stopwaitsecs=360
user=www-data
redirect_stderr=true
stdout_logfile=/var/log/laravel/worker.log
# Reload supervisor
supervisorctl reread
supervisorctl update
supervisorctl restart laravel-queue:*
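Supervisor restarts the processes, but it's also worth telling Laravel itself to retire workers gracefully after each release, since long-running workers keep old code in memory. A deploy-script fragment (assumes `php` on PATH and the app path from the Supervisor config):

```shell
cd /var/www/laravel
php artisan queue:restart   # signals each worker to exit after its current job
```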
VPS or Shared Hosting Optimization Tips
- On a VPS, allocate at least 2 GB RAM for Redis when you expect >5,000 concurrent jobs.
- Use
systemdinstead of Supervisor on Ubuntu 22.04 for tighter resource control. - On shared hosting, pin
QUEUE_CONNECTION=databasetemporarily and monitormysql_slow_query_logfor bottlenecks. - Enable OPcache and set
opcache.memory_consumption=256inphp.inito shave 15‑20 ms per request.
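For the systemd route, a minimal templated unit file sketch; the unit name, paths, and `MemoryMax` value are assumptions to adapt, not drop-in production settings:

```ini
# /etc/systemd/system/laravel-queue@.service  (hypothetical unit name)
[Unit]
Description=Laravel queue worker %i
After=network.target redis-server.service

[Service]
User=www-data
ExecStart=/usr/bin/php /var/www/laravel/artisan queue:work redis --sleep=3 --tries=3
Restart=always
RestartSec=3
# Hard cap so a leaking worker is killed and restarted instead of starving the box
MemoryMax=256M

[Install]
WantedBy=multi-user.target
```

Start eight instances with `systemctl enable --now laravel-queue@{1..8}.service` (brace expansion assumes bash).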
Real World Production Example
Company Acme SaaS runs 12 Laravel micro‑services behind a single Nginx reverse proxy on an 8‑core Ubuntu 22.04 VPS. After a midnight deploy, their queue:work processes halted. The culprit? A stale php-fpm.sock file left behind after a forced systemctl restart php8.2-fpm. The three‑minute fix above restored service without a full server reboot.
Before vs After Results
| Metric | Before Fix | After Fix |
|---|---|---|
| Avg Job Latency | 12 seconds | 0.8 seconds |
| CPU (php-fpm) | 95 % | 22 % |
| Redis Memory | 1.4 GB (40 % full) | 650 MB (15 % full) |
Security Considerations
Changing timeouts and pool sizes can unintentionally expose your server to denial‑of‑service attacks if left open to the public internet. Harden your setup:
- Restrict Redis to `127.0.0.1`, or allow only your private network with `ufw allow from 10.0.0.0/8 to any port 6379`.
- Enable TCP keepalive in `/etc/sysctl.conf`:
  net.ipv4.tcp_keepalive_time = 300
  net.ipv4.tcp_keepalive_intvl = 60
  net.ipv4.tcp_keepalive_probes = 5
- Use `setfacl` to limit which users can edit the PHP‑FPM pool files.
Warning: never set `request_terminate_timeout = 0` in production. It disables PHP‑FPM's safety net and can let rogue jobs consume all memory.
Bonus Performance Tips
- Enable Laravel Horizon for visual queue monitoring and auto‑scaling.
- Use `redis-cli info memory` to size `maxmemory`, and set `maxmemory-policy allkeys-lru` to avoid OOM kills (caution: LRU eviction can silently drop queued jobs, so on a Redis dedicated to queues `noeviction` is the safer policy).
- Run `composer dump-autoload -o` after every deploy.
- Cache config and routes: `php artisan config:cache && php artisan route:cache`.
- Turn on Gzip in Nginx: `gzip on; gzip_types text/css application/javascript image/svg+xml;`
FAQ Section
Q1: My workers still die after the fix. What next?
Check storage/logs/laravel.log for Connection timed out errors. If they appear, increase the Redis tcp-keepalive value and verify the maxmemory limit.
Q2: Can I use Apache instead of Nginx?
Yes. Duplicate the timeout values in ProxyPassMatch directives or use mod_fcgid with FcgidIOTimeout 300.
Q3: Does Docker change anything?
Inside a container, ensure the php-fpm pool file is mounted as a volume and that the host’s ulimit -n is high enough for the pm.max_children you set.
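A sketch of that Docker setup; the compose layout, file names, and limits here are assumptions, not official image defaults:

```yaml
# docker-compose.yml: mount the pool file and raise the open-file limit
services:
  app:
    image: php:8.2-fpm
    volumes:
      - ./www.conf:/usr/local/etc/php-fpm.d/www.conf:ro
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
```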
Final Thoughts
Queue workers stuck after a deploy are rarely a code problem; they’re a configuration nightmare. By adjusting PHP‑FPM, Redis, and Nginx in a single, three‑minute pass you restore reliability, cut latency, and keep your SaaS customers happy. Keep the configs under version control, automate the reload steps in your CI/CD pipeline, and you’ll never chase a “stuck worker” ghost again.