Laravel Queue Workers Zombieing on Docker: 5 Proven Fixes for 504 Gateway Timeouts, CPU Spikes, and “Process Stuck” Crashes You Won’t Believe Are Just Permissions & OPCache Misconfigurations
If you’ve ever stared at a Docker‑ized Laravel app that suddenly throws 504 errors, spikes the CPU, or logs “Process stuck” while your queue workers stare back like the walking dead, you know the frustration. You’ve probably blamed the network, blamed Redis, even blamed the gods of Cloudflare – only to discover the real villain lives in a tiny php.ini line or a missing file permission.
Why This Matters
In a SaaS environment a stuck queue can delay email confirmations, break order processing, and push your API response times past the 200 ms sweet spot. On a shared hosting plan the same issue can get you a “CPU throttling” notice from the provider, forcing you to upgrade or face downtime. Fixing the root cause not only saves money on VPS resources but also protects your brand reputation.
Common Causes of Zombie Queue Workers
- Incorrect file permissions on `/var/www/storage` and `/var/www/bootstrap/cache`
- OPCache `validate_timestamps` set to `0` inside Docker
- Supervisor not reaping child processes after a crash
- Redis connection timeouts caused by `tcp-keepalive` mis‑config
- PHP‑FPM pool limits (`pm.max_children`) that are too low for burst traffic
Step‑by‑Step Fix Tutorial
1. Align Permissions & Ownership
When Docker builds the image as root but runs the container as www-data, Laravel can’t write to the storage folder, causing queue workers to hang.
```dockerfile
# Dockerfile snippet
FROM php:8.2-fpm-alpine

# Create a non-root app user
RUN addgroup -g 1000 app && adduser -u 1000 -G app -s /bin/sh -D app

# Set working directory
WORKDIR /var/www

# Copy source with correct ownership
COPY --chown=app:app . .

# Group-writable directories, read/write files
RUN find /var/www -type d -exec chmod 775 {} \; \
    && find /var/www -type f -exec chmod 664 {} \;
```
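To sanity-check the scheme before baking it into an image, here is a small sketch that replays the same `find`/`chmod` pass on a throwaway directory tree (the paths are illustrative stand-ins for `/var/www`; `stat -c` assumes GNU coreutils):

```shell
# Throwaway tree standing in for /var/www
dir=$(mktemp -d)
mkdir -p "$dir/storage/logs"
touch "$dir/storage/logs/laravel.log"

# Same scheme as the Dockerfile: group-writable dirs, rw files
find "$dir" -type d -exec chmod 775 {} \;
find "$dir" -type f -exec chmod 664 {} \;

dir_mode=$(stat -c '%a' "$dir/storage/logs")
file_mode=$(stat -c '%a' "$dir/storage/logs/laravel.log")
echo "$dir_mode $file_mode"   # expect: 775 664
```

If the second number comes back as anything other than `664`, something between your build context and the image (a `.dockerignore` gap, a bind mount, a umask) is rewriting permissions behind your back.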
Tip: run `php artisan storage:link` after the COPY step to avoid missing symbolic links in production.

2. Configure OPCache Timestamp Validation Correctly
Inside an immutable production image the code never changes after build, so `opcache.validate_timestamps=0` is safe and fast there. During development, however, you hot‑reload code via `docker compose up --build` or a bind mount, and OPCache must keep checking file timestamps or it will serve stale opcodes.
```ini
; php.ini (docker/php/conf.d/opcache.ini) — development container
opcache.enable=1
opcache.memory_consumption=256
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=10000
opcache.validate_timestamps=1
opcache.revalidate_freq=0
```
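For the production image, where code is baked in at build time, a minimal sketch of the inverted settings (the only line that changes is `validate_timestamps`; the filename is an assumption):

```ini
; docker/php/conf.d/opcache.prod.ini — production only
opcache.enable=1
opcache.memory_consumption=256
opcache.interned_strings_buffer=16
opcache.max_accelerated_files=10000
opcache.validate_timestamps=0
```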
Warning: setting `opcache.validate_timestamps=0` in a development container will cause stale code to be executed. Keep it enabled locally.

3. Tune Supervisor for Graceful Restarts
Supervisor needs explicit stop signals and a proper `numprocs` count. Without them, crashed workers linger and the CPU spikes.
```ini
; /etc/supervisor/conf.d/laravel-queue.conf
[program:laravel-queue]
process_name=%(program_name)s_%(process_num)02d
command=php /var/www/artisan queue:work redis --sleep=3 --tries=3 --timeout=90
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
numprocs=4
user=app
stdout_logfile=/var/log/worker_stdout.log
stderr_logfile=/var/log/worker_stderr.log
```
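With `numprocs=4` and that `process_name` template, Supervisor spawns four workers named `laravel-queue_00` through `laravel-queue_03`. A quick sketch of the expansion you should expect to see in `supervisorctl status`:

```shell
# Mirror Supervisor's %(process_num)02d expansion for numprocs=4
names=$(for i in 0 1 2 3; do printf 'laravel-queue_%02d\n' "$i"; done)
echo "$names"
```

One caveat worth knowing: Supervisor's `stopwaitsecs` defaults to 10 seconds, shorter than the `--timeout=90` above, so consider raising it past your longest job's timeout or Supervisor may kill a worker mid‑job before the queue's own timeout fires.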
4. Adjust PHP‑FPM Pool Settings
The default pm.max_children of 5 can choke a bursty API. Raise it based on your VPS RAM.
```ini
; /usr/local/etc/php-fpm.d/www.conf
pm = dynamic
pm.max_children = 20
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 8
; Reduce idle timeout to free workers quicker
pm.process_idle_timeout = 10s
```
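As a rule of thumb, size `pm.max_children` as available memory divided by the average per‑worker footprint. A back‑of‑envelope sketch, where the numbers are assumptions (roughly 60 MB per PHP‑FPM worker, 512 MB reserved for the OS and Redis on a 2 GB VPS):

```shell
# max_children ≈ (total RAM - reserved) / average worker size
total_mb=2048        # 2 GB VPS
reserved_mb=512      # OS + Redis headroom
per_worker_mb=60     # measure yours: ps --no-headers -o rss -C php-fpm
max_children=$(( (total_mb - reserved_mb) / per_worker_mb ))
echo "$max_children"   # 25 on these assumptions
```

Measure your real per‑worker RSS under load before committing to a number; a memory‑hungry app can easily double that 60 MB figure.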
5. Harden Redis Connection & Nginx Proxy Timeouts
A mis‑configured timeout will return a 504 before Laravel even gets a chance to respond. Note that PHP‑FPM is a FastCGI backend, so the relevant Nginx directives are the `fastcgi_*` timeouts, not `proxy_*`:

```nginx
# nginx.conf (site block)
location / {
    try_files $uri $uri/ /index.php?$query_string;
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass php-fpm:9000;
    fastcgi_connect_timeout 60s;
    fastcgi_read_timeout 300s;
}
```
```
# redis.conf (docker/redis/redis.conf)
tcp-keepalive 60
timeout 0
```
VPS or Shared Hosting Optimization Tips
- Swap Management: On a 2 GB VPS enable a 1 GB swap file to avoid OOM kills during sudden queue bursts.
- Linux Scheduler: Use the `deadline` I/O scheduler for SSDs to cut latency.
- Apache vs Nginx: If you're on shared hosting with Apache, enable `mod_proxy_fcgi` (or, with `mod_fcgid`, set `FcgidMaxProcesses` to match your PHP‑FPM pool).
- Composer Autoloader Optimization: Run `composer install --optimize-autoloader --no-dev` on production builds.
- Cloudflare Caching: Bypass cache for `/api/*` endpoints to prevent stale queue responses.
Real World Production Example
Acme SaaS runs a Laravel 10 API on a 4‑CPU Ubuntu 22.04 VPS behind Nginx, Docker, and Redis 6.0. Before the fixes the queue would hang after processing ~2 000 jobs, leading to a 504 on the checkout endpoint.
```yaml
# Before: docker-compose.yml (simplified)
services:
  app:
    image: acme/laravel:latest
    ports:
      - "8000:80"
    volumes:
      - .:/var/www
    depends_on:
      - redis
  redis:
    image: redis:6-alpine
```
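For contrast, a sketch of what the corrected compose file might look like after the fixes (the non‑root `user`, the Redis healthcheck, and dropping the bind mount are assumptions layered onto the example above, not Acme's actual file):

```yaml
# After: docker-compose.yml (sketch)
services:
  app:
    image: acme/laravel:latest
    user: "1000:1000"          # run as the non-root app user
    ports:
      - "8000:80"
    depends_on:
      redis:
        condition: service_healthy
  redis:
    image: redis:6-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
```

Dropping the `.:/var/www` bind mount matters: mounting the host tree over `/var/www` at runtime silently overrides the ownership and permissions set by `COPY --chown` in the Dockerfile.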
After applying the five fixes, the same workload processed 150 000 jobs without a single timeout.
Before vs After Results
| Metric | Before | After |
|---|---|---|
| Average CPU | 94 % | 12 % |
| 504 Errors / day | 27 | 0 |
| Queue Latency | 8.4 s | 0.9 s |
Security Considerations
- Never run the container as `root`. Use a non‑privileged user (UID 1000) and set `USER app` in the Dockerfile.
- Limit `opcache.restrict_api` to your app directory to prevent arbitrary PHP code execution.
- Enable Redis AUTH with a strong password and mount the password file as a secret.
- Configure Nginx `client_max_body_size` to avoid DoS via large payloads.
- Run `php artisan schedule:run` under Supervisor with `--no-interaction` to prevent accidental prompts.
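For the Redis AUTH point, a hedged compose sketch of mounting the password as a secret (the file paths are assumptions; `$$` escapes the `$` so the substitution happens inside the container's shell, and file‑based secrets require a reasonably recent `docker compose`):

```yaml
services:
  redis:
    image: redis:6-alpine
    command: ["sh", "-c", "exec redis-server --requirepass \"$$(cat /run/secrets/redis_pass)\""]
    secrets:
      - redis_pass

secrets:
  redis_pass:
    file: ./secrets/redis_pass.txt
```

Remember to set the matching `REDIS_PASSWORD` in the Laravel container's environment, ideally from the same secret file.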
Bonus Performance Tips
- Switch to Laravel Horizon for Redis‑backed monitoring; it gives you live worker stats and auto‑scales.
- Use `php artisan queue:restart` after each deploy to gracefully kill old workers.
- Enable `realpath_cache_size=4096k` in `php.ini` for faster file resolution.
- Compress static assets with `gzip` or `brotli` in Nginx to free up bandwidth for API calls.
- Pin Composer dependencies to exact versions; a stray major upgrade can break the queue silently.
FAQ
Q: My queue still times out after these changes. What else can I check?
A: Look at `worker_stderr.log`. If you see "Failed to connect to Redis", verify your `REDIS_HOST` environment variable and Docker network alias.
Q: Can I apply these fixes on a shared hosting plan that doesn’t support Docker?
A: Yes. The permission and OPCache settings apply to any PHP‑FPM environment. Use `.user.ini` to override `opcache.validate_timestamps` and ask your host to increase `pm.max_children` via cPanel.
Q: Do I need to restart Supervisor after every code push?
A: Only if you change the `artisan queue:work` command flags. Otherwise a simple `supervisorctl reread && supervisorctl update` will reload the config without dropping jobs.
Final Thoughts
Zombie queue workers are rarely a mystical Docker bug—they’re almost always a combination of permission slip‑ups and PHP opcode caching quirks. By aligning file ownership, tweaking OPCache, giving Supervisor the right signals, and matching PHP‑FPM pool sizes to your VPS memory, you turn a flaky Laravel API into a scalable, production‑ready service.
Implement the fixes today, monitor your top output, and you’ll see the CPU settle, the 504s disappear, and your customers happily receive their confirmation emails on time.