Thursday, April 30, 2026

"Struggling with NestJS on Shared Hosting: My Frustrating Journey to Fix the 'ENOENT: no such file or directory' Error"


We were running a high-throughput SaaS platform built on NestJS, deployed on an Ubuntu VPS managed via aaPanel, powering the Filament admin panel and crucial background processing via queue workers. The system was humming perfectly in staging, but after the first production load hit, the entire service collapsed. It wasn't a simple 500 error; it was a catastrophic process failure leading to a cascading system outage.

The symptom was a complete service stall, followed by an intermittent but devastating `ENOENT: no such file or directory` error deep within the NestJS logs, raised whenever the queue worker attempted to read its configuration files. This was not simply a missing configuration file; the directory itself was gone or inaccessible, which pointed to a failure during deployment or in process management.

The Error: When Production Breaks

The failure occurred precisely during peak load, causing the Node.js process responsible for handling background tasks to terminate unexpectedly. The error message was not immediately obvious in the initial crash log, masked by the standard Node exit code, but deep inspection revealed the underlying file system issue.

[ERROR] 2023-10-27T14:35:12.890Z [queueWorker-1] Fatal Error: ENOENT: no such file or directory: /var/www/nest-app/queue/config.json
Stack trace:
    at Object.<anonymous> (/var/www/nest-app/worker/index.js:45:10)
    at Module._moduleLoad (node:internal/module:1415:15)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)

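This `ENOENT` error, while seemingly simple, was the canary in the coal mine: the worker could not resolve a path it needed to start, which made the application immediately non-functional. Strictly speaking, `ENOENT` means that some component of the path did not exist when the call was made, whereas a file that exists but cannot be read normally surfaces as `EACCES`, so both the existence of the path and its permissions had to be checked.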

Root Cause Analysis: Beyond the Symptom

The immediate assumption is always: "The file path is wrong." However, in a VPS environment managed by tools like aaPanel and Supervisor, the issue was far more insidious: incorrect process ownership combined with stale deployment artifacts.

The actual root cause was a combination of two factors: incorrect permissions and stale deployment artifacts. Deployment scripts (like those triggered by aaPanel) that rely on `chown` or `chmod` left the application's configuration directory in a state where the user running the Node.js process (often `www-data` or a restricted user within the aaPanel setup) lacked the read/write permissions it needed. On top of that, the deployment ran asynchronously and introduced a stale state, where the application tried to load a directory that had been partially deleted or corrupted during the handover between the deployment script and the running process.

We weren't dealing with a simple missing file; we were dealing with an inaccessible file system state caused by a deployment pipeline failure, exacerbated by ownership and permissions that did not match the user running the Node.js process.
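
To tell a missing file from a missing directory, it helps to find where path resolution actually stops. The following is a minimal sketch, not part of the original incident tooling; `deepest_existing` is a hypothetical helper name, and the path in the example comment is the one from the log above.

```shell
# Hypothetical helper: print the deepest ancestor of a path that exists.
# If the result is the file's parent directory, only the file is missing;
# if it is higher up, a whole directory is gone (or cannot be traversed).
deepest_existing() {
  de_path=$1
  while [ ! -e "$de_path" ]; do
    de_path=$(dirname "$de_path")   # step one component up and retry
  done
  printf '%s\n' "$de_path"
}

# Example: deepest_existing /var/www/nest-app/queue/config.json
```

Run as the same user as the worker, this shows the path as that user sees it, which matters when a parent directory is not traversable for that user.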

Step-by-Step Debugging Process

We had to systematically isolate whether the problem was application code, system service, or file permissions.

Step 1: Inspecting the Process Status

First, we checked the health of the service manager to see if the worker was actively failing or if it had crashed and been restarted.

  • Command: supervisorctl status
  • Observation: The queue worker process was listed as 'FATAL' or 'STOPPED', indicating repeated crashes.
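
When many programs are registered, the status output can be filtered down to the ones that are failing. This is a small sketch under the assumption that the second column of `supervisorctl status` holds the process state; `failed_programs` is a hypothetical helper name. In Supervisor, `BACKOFF` and `FATAL` are the states that indicate a process exiting too quickly or failing to start.

```shell
# Hypothetical helper: read `supervisorctl status` output on stdin and print
# the names of programs whose state is FATAL or BACKOFF.
failed_programs() {
  awk '$2 == "FATAL" || $2 == "BACKOFF" { print $1 }'
}

# Example: supervisorctl status | failed_programs
```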

Step 2: Verifying File System Permissions

Next, we investigated the file ownership and permissions of the application directory and the specific configuration file mentioned in the error.

  • Command: ls -ld /var/www/nest-app/queue/
  • Result: The output showed ownership by the deployment user, but the execution environment user (running Node.js) lacked the necessary read permissions for the specific config file.
  • Command: ls -l /var/www/nest-app/queue/config.json
  • Observation: Permissions were too restrictive for the runtime user (e.g., `rw-------` with a different owner), preventing the Node.js process from reading the file.
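
Checking only the file can miss a parent directory that the runtime user cannot traverse. The sketch below prints the mode and owner of a path and each directory above it; `show_modes` is a hypothetical helper name, and it assumes GNU `stat` (the `-c` format option), as shipped on Ubuntu.

```shell
# Hypothetical helper: print "mode owner:group name" for a path and for each
# of its parent directories, from the path itself up to the root.
show_modes() {
  sm_path=$1
  while :; do
    stat -c '%a %U:%G %n' "$sm_path"
    case $sm_path in
      /|.) break ;;   # reached the root (or the top of a relative path)
    esac
    sm_path=$(dirname "$sm_path")
  done
}

# Example: show_modes /var/www/nest-app/queue/config.json
```

To test access as the runtime user directly, `sudo -u www-data test -r /var/www/nest-app/queue/config.json && echo readable` is a quick check, assuming the worker really runs as `www-data`.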

Step 3: Checking System Logs for Deeper Events

We dove into the system journal to find preceding events that indicated a process failure or permission denial at the moment of the crash.

  • Command: journalctl -u supervisor -r -n 50
  • Observation: We found intermittent errors related to file access attempts occurring simultaneously with the queue worker failures, confirming the file system interaction was the bottleneck.

The Fix: Actionable Recovery

The solution required resetting the permissions and ensuring the process owner was correctly configured for the application directories, bypassing the faulty deployment step.

Step 4: Restoring Permissions and Ownership

We explicitly set the ownership of the application directory and its contents to the user running the Node.js application, ensuring proper read/write access for the queue worker.

  • Command: chown -R www-data:www-data /var/www/nest-app/
  • Command: chmod -R 755 /var/www/nest-app/queue/
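
A recursive `chmod 755` also sets the execute bit on plain files such as `config.json` and leaves everything world-readable. A narrower alternative for a configuration directory is sketched below; `tighten_perms` is a hypothetical helper name, and the modes 750 and 640 are an assumption about what the worker needs, not something taken from the original setup.

```shell
# Hypothetical helper: directories get 750 (owner rwx, group r-x), regular
# files get 640 (owner rw-, group r--); other users get no access.
tighten_perms() {
  find "$1" -type d -exec chmod 750 {} +
  find "$1" -type f -exec chmod 640 {} +
}

# Example: tighten_perms /var/www/nest-app/queue
```

This should only be pointed at directories that hold data or configuration, since it also removes the execute bit from any scripts inside the tree.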

Step 5: Rebuilding and Restarting Services

Finally, we reinstalled the Node.js dependencies from the lockfile to make sure nothing was left half-installed, followed by a hard restart of Supervisor so the queue workers started from the corrected state.

  • Command: cd /var/www/nest-app && npm ci --omit=dev
  • Command: systemctl restart supervisor

The application immediately recovered. The `ENOENT` error vanished, confirming the fix was related to the operating system's view of file access, not a bug in the NestJS code itself.

Why This Happens in VPS / aaPanel Environments

This scenario is endemic to shared hosting and VPS environments managed by control panels like aaPanel, primarily because of the abstraction layer and multi-user permission structures.

  • User Mismatch: Deployment scripts often run as the root user, but the Node.js application and its background workers run under a restricted user (e.g., `www-data`). If ownership and permissions are not explicitly managed, the runtime process cannot read files written by the deployment script.
  • Caching Layers: The aaPanel deployment system might use caching mechanisms that fail to properly refresh file permission attributes across the service boundary.
  • Process Isolation: The Node.js application and the Supervisor-managed workers run as separate processes. A failure in one part of the deployment pipeline (e.g., file permission setup) causes a crash in the dependent worker process, which manifests as a confusing `ENOENT` error.
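
The user mismatch can be reproduced in miniature. The sketch below creates a file under a given umask and prints the resulting mode; `mode_under_umask` is a hypothetical helper name, and the umask value 077 is an assumption for illustration, since the original setup does not record which umask the deployment script ran with. It relies on GNU `stat`.

```shell
# Hypothetical helper: create a file under the given umask and print its mode.
mode_under_umask() {
  (                                  # subshell, so the umask change does not leak
    mu_dir=$(mktemp -d) || exit 1
    umask "$1"
    : > "$mu_dir/artifact"           # new files start from 666 minus the umask bits
    stat -c '%a' "$mu_dir/artifact"
    rm -rf "$mu_dir"
  )
}

mode_under_umask 022   # prints 644: readable by every user
mode_under_umask 077   # prints 600: readable by the owner only
```

A file written by root with mode 600 is unreadable for `www-data` until it is explicitly re-owned or re-permissioned, which is the gap the first bullet describes.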

Prevention: Future-Proofing Deployments

To eliminate these deployment headaches moving forward, we need immutable deployment patterns that explicitly manage permissions.

  • Use Specific Deployment Users: Ensure all deployment steps, including file creation and permission setting, are performed explicitly with the target service user (e.g., `www-data`).
  • Explicit Permission Setting in Docker/Scripts: Integrate `chown` and `chmod` commands directly into the build step and ensure they run immediately before service restarts.
  • Minimize Permissions: Avoid relying on global permissions. Set restrictive ownership for application directories and only grant necessary permissions, preventing accidental cross-contamination.
  • Atomic Deployments: Treat deployment as an atomic operation. If any file permission check fails, the entire deployment must halt, preventing stale artifacts from entering the production environment.
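
One common way to get this behaviour is a symlink swap: each release is unpacked into its own directory, checked, and only then made live by renaming a symlink. The sketch below is an illustration under stated assumptions, not the pipeline used here: the `releases/` and `current` layout and the `activate_release` name are hypothetical, and `mv -T` requires GNU coreutils.

```shell
# Hypothetical helper: activate_release RELEASE_DIR CURRENT_LINK
# Switches CURRENT_LINK to RELEASE_DIR only if the preflight check passes.
activate_release() {
  ar_release=$1
  ar_current=$2
  # Preflight: refuse to switch if the worker's config is absent or unreadable.
  # Run this as the service user, otherwise it only proves that the deploying
  # user can read the file.
  if [ ! -r "$ar_release/queue/config.json" ]; then
    echo "preflight failed: $ar_release/queue/config.json is not readable" >&2
    return 1
  fi
  # Build the new symlink next to the live one, then rename it into place.
  # The rename replaces the old link in a single step.
  ln -sfn "$ar_release" "$ar_current.tmp" && mv -T "$ar_current.tmp" "$ar_current"
}

# Example: activate_release /var/www/releases/2026-04-30 /var/www/nest-app
```

Because the live path is only ever replaced by a fully checked release, a running worker never sees a half-deleted directory, and a failed preflight leaves the previous release untouched.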

Conclusion

Debugging production issues in shared or VPS environments is rarely about the code itself; it’s about the interaction between the application, the operating system, and the deployment infrastructure. The `ENOENT` error in a NestJS application was a classic symptom of broken file permissions under load. Always prioritize system configuration and file ownership checks before diving deep into application logic.
