Laravel Queue Workers Crashing on Nginx + PHP‑FPM: Why Redis Timeouts and Mis‑Set File Permissions Are Ruining Your Deploys
If you’ve ever watched a php artisan queue:work process die silently while your Redis dashboard lights up with “timeout” errors, you know the feeling – frustration, wasted hours, and a looming deadline. In a production‑grade Laravel app running behind Nginx and PHP‑FPM, one tiny permission slip or a mis‑configured Redis timeout can cascade into a full‑blown deployment nightmare.
Why This Matters
Queue workers are the heart of any modern SaaS or WordPress‑backed Laravel API. When they fail:
- Background emails never send.
- Webhook retries pile up.
- Cache warm‑ups stall, causing API latency spikes.
All of these translate directly to lost revenue, higher churn, and angry users. Fixing the root cause – not just restarting the worker – is the only way to keep your VPS or shared host stable.
Common Causes
1. Redis connection timeout
Laravel’s default timeout of 5 seconds is often too low for high‑load environments, especially when the Redis instance is on a separate droplet behind Cloudflare.
2. Incorrect file and directory permissions
Supervisor spawns workers under the www-data user, but if storage or bootstrap/cache are owned by root, the worker crashes with “Permission denied”.
3. PHP‑FPM pool limits
Too few pm.max_children or aggressive request_terminate_timeout will kill long‑running jobs mid‑process.
4. Supervisor mis‑configuration
Missing stdout_logfile or an incorrect user directive hides error output, making debugging a nightmare.
maxmemory-policy to noeviction on a production queue. It will cause jobs to sit forever when memory fills up.
Step‑By‑Step Fix Tutorial
Step 1 – Verify Redis Connectivity
redis-cli -h 127.0.0.1 -p 6379 ping
# Expected output:
PONG
If you get a timeout, increase the Laravel timeout value:
// config/database.php
'redis' => [
'client' => env('REDIS_CLIENT', 'phpredis'),
'default' => [
'host' => env('REDIS_HOST', '127.0.0.1'),
'port' => env('REDIS_PORT', 6379),
'timeout' => 15, // <- increase from 5 to 15 seconds
'read_timeout' => 15,
],
],
Step 2 – Fix File Permissions
# Set proper ownership
sudo chown -R www-data:www-data /var/www/your‑app
# Set directory permissions
find /var/www/your‑app/storage -type d -exec chmod 2755 {} \;
find /var/www/your‑app/bootstrap/cache -type d -exec chmod 2755 {} \;
# Set file permissions
find /var/www/your‑app -type f -exec chmod 0644 {} \;
storage/framework and can create lock files without dying.
Step 3 – Tune PHP‑FPM Pool
# /etc/php/8.2/fpm/pool.d/www.conf
pm = dynamic
pm.max_children = 30
pm.start_servers = 5
pm.min_spare_servers = 5
pm.max_spare_servers = 10
request_terminate_timeout = 300
After editing, reload PHP‑FPM:
sudo systemctl reload php8.2-fpm
Step 4 – Configure Supervisor Properly
# /etc/supervisor/conf.d/laravel-queue.conf
[program:laravel-queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/your-app/artisan queue:work redis --sleep=3 --tries=3 --timeout=300
autostart=true
autorestart=true
user=www-data
numprocs=4
redirect_stderr=true
stdout_logfile=/var/log/laravel/queue.log
stopwaitsecs=3600
Reload Supervisor and check the logs:
sudo supervisorctl reread
sudo supervisorctl update
sudo supervisorctl status laravel-queue:*
tail -f /var/log/laravel/queue.log
VPS or Shared Hosting Optimization Tips
- Swap Management: Allocate a 1 GB swap file on low‑memory VPS to avoid OOM kills during spikes.
- Ulimit Adjustments: Increase
nofilelimit forwww-data–ulimit -n 4096. - OpCache: Enable
opcache.enable_cli=1for faster artisan commands. - Cloudflare Caching: Bypass /queue/* routes to prevent Cloudflare from throttling long‑poll requests.
- Shared Hosting: Use
php artisan queue:listenwith--daemonand setmax_execution_time=0in.htaccessif you cannot install Supervisor.
Real World Production Example
Company Acme SaaS ran a 12‑worker Laravel queue on a 2 vCPU, 4 GB VPS. After the fixes above, they saw:
Before Fix:
- 30% of jobs failed with “Redis connection timed out”
- Queue workers restarted every 5 minutes (Supervisor logs)
- Average API response: 1.8 s
After Fix:
- 0% Redis timeout errors over 30 days
- Workers stable, no restarts
- Average API response: 0.9 s
- CPU usage dropped from 85% to 45%
Before vs After Results
| Metric | Before | After |
|---|---|---|
| Job Failure Rate | 28% | 0% |
| Avg. Queue Latency | 12 s | 3 s |
| CPU (PHP‑FPM) | 85% | 45% |
Security Considerations
- Never run Supervisor as
root. Use a dedicated low‑privilege user. - Set
redis.passwordin.envand enablerequirepassinredis.conf. - Lock down
storagedirectory withchmod 750for the group. - Enable
disable_functions=exec,shell_execfor PHP‑FPM if you do not need them.
Bonus Performance Tips
- Enable
queue:retrywith exponential back‑off. - Leverage Redis
LPUSH/BRPOPpatterns for ultra‑low latency. - Compress large payloads before pushing to the queue using
gzcompress(). - Turn on
opcache.validate_timestamps=0in production for faster CLI execution.
FAQ
Q: My queue works locally but dies after deployment.
A: Check file ownership on the server. Local environments usually run as your user, while production runs aswww-data. The permission fix in Step 2 solves most cases.
Q: Should I use Redis or database driver for small apps?
A: Redis is still faster and offloads DB load. For < 100 jobs/day a database driver works, but the config overhead is minimal; stick with Redis for consistency across environments.
Q: Can I run queue workers on a shared host without Supervisor?
A: Yes. Usephp artisan queue:work --daemonin a cron entry with@reboot. Remember to setmax_execution_time=0in.htaccess.
Final Thoughts
Queue stability is not a “set‑and‑forget” feature. It is a combination of correct Redis timeouts, airtight file permissions, well‑tuned PHP‑FPM pools, and a reliable process manager like Supervisor. Apply the steps above, monitor the logs, and you’ll turn those crashing workers into a predictable, high‑throughput backbone for your Laravel or WordPress‑powered SaaS.
Once your queues are stable, you can focus on scaling – add more workers, enable Horizon, or spin up a dedicated Redis cluster. The real money is saved when you stop firefighting and start shipping features.
No comments:
Post a Comment