Struggling with 'Cannot find module' Errors on Shared Hosting? My NestJS Fix Worked Like Magic!
The smell of burnt production servers is a familiar one. Last month, we were running a critical SaaS application built on NestJS, deployed on an Ubuntu VPS managed via aaPanel. Everything looked fine locally. We hit the deployment button, and within minutes, the entire system collapsed. Users were hitting 500 errors, and the entire production pipeline ground to a halt.
This wasn't a theoretical bug; it was a live, painful debugging session involving file permissions, Node version mismatches, and stale cache state in the shared hosting environment. The symptom was glaring: every API request resulted in a fatal Cannot find module '@nestjs/common' error, making the application completely unusable.
The Production Failure: A Live Disaster
The system broke right after the deployment script ran, specifically when the queue worker attempted to spin up. The application seemed fine in the aaPanel interface, but the actual service was dead.
The Real NestJS Error Trace
When we dug into the Node.js process logs via the SSH console, the error wasn't a simple application crash; it was a deep module resolution failure:
Error: Cannot find module '@nestjs/common'
    at Module._resolveFilename (node:internal/modules/cjs/loader:1059:17)
    at Module._resolveFilename (node:internal/modules/cjs/loader:1056:20)
    ... (the same frame repeated dozens of times; remainder of the trace truncated)
Root Cause Analysis: The Cache and Permission Mismatch
The immediate, wrong assumption developers jump to is "corrupted dependencies" or "version conflict." While those are often symptoms, the root cause in a managed VPS setup running NestJS via an automated panel (like aaPanel) is almost always a combination of file system permissions and stale cache state left behind by manual deployment steps.
Specifically, when deploying via an automated pipeline, the setup process often runs as a limited user (or the web server user, e.g., `www` or `nginx`) that lacks the necessary read/write permissions on the application directory, or the caching layer (the npm cache, or a process manager still holding the previous build in memory) retains stale state from an earlier deployment.
In our case, the issue was a classic file system permission trap coupled with how the npm dependencies were installed. The application owner (the user running the queue worker) couldn't read the installed modules, even though they existed on disk.
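A quick way to confirm this class of mismatch is to compare the user the Node process runs as against the owner of `node_modules`. A minimal sketch, assuming the app root is /var/www/nestjs-app and a dist/main.js entry point (adjust both for your setup):

ps -o user= -p "$(pgrep -f 'node .*dist/main.js' | head -n1)"   # user the worker actually runs as
stat -c '%U:%G %A %n' /var/www/nestjs-app/node_modules          # owner and mode of the dependencies
# If the two users differ and the directory isn't readable to the first one,
# you are looking at exactly this trap.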
Step-by-Step Production Debugging Process
We had to treat the VPS like a forensic scene. We couldn't trust the aaPanel interface; we had to go straight to the source.
- Check the Process Status: First, I used `htop` to confirm the Node.js process for the queue worker was actually running, then killed it immediately. This provided a clean state to work from.
- Inspect the Logs: I used `journalctl -u docker.service` (where Docker was involved) or simply checked the application's standard output/error logs to confirm the exact failure point. The log showed the module lookup failure repeating on every restart attempt.
- Verify Permissions: I checked the ownership of the entire application directory and the installed dependencies. `ls -ld /var/www/nestjs-app` showed the app root owned by `root:root`, while the service ran as `www-data`; `ls -ld /var/www/nestjs-app/node_modules` showed the installed dependencies were owned by a different user and lacked read/execute permissions for the Node process.
- Investigate the npm Cache: Since we knew the files existed, the problem had to be in the environment's ability to read them. I manually re-ran the dependency installation step, ensuring the process ran with the correct permissions.
- Reinstall with a Permissions Fix: I used `chown` and a clean `npm ci` to force a fresh, permission-correct dependency install. A consolidated sketch of this inspection sequence follows below.
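Condensed into commands, the inspection pass looked like the sketch below. The entry-point pattern and log path are assumptions from our setup; substitute your own:

# 1. Confirm the worker's state, then stop it for a clean slate
pgrep -af node                          # list candidate processes (htop works too)
sudo pkill -f 'node .*dist/main.js'     # entry-point pattern assumed

# 2. Pull the exact failure from the logs
journalctl -u docker.service --since "15 min ago"   # only if Docker is in the mix
tail -n 100 /var/www/nestjs-app/logs/error.log      # log path assumed

# 3. Compare ownership of the app root and its dependencies
ls -ld /var/www/nestjs-app /var/www/nestjs-app/node_modules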
The Actionable Fix: Forcing a Clean Environment
The fix wasn't a simple restart; it was a forced environment reset and permission correction. This ensured that the files were accessible and the system caches were invalidated.
Fix Step 1: Correcting File Permissions
The application runtime user needs ownership of the `node_modules` directory and the application root.
sudo chown -R www-data:www-data /var/www/nestjs-app
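A quick sanity check before moving on: verify the runtime user can actually read a module the app needs (package path assumed):

sudo -u www-data test -r /var/www/nestjs-app/node_modules/@nestjs/common/package.json \
  && echo "readable" || echo "still blocked"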
Fix Step 2: Clearing Corrupted Modules and Reinstalling
To guarantee no corrupted or locked modules survived, we forcibly removed and reinstalled all dependencies, which forced npm to recreate the tree under the correct ownership.
cd /var/www/nestjs-app
rm -rf node_modules
sudo -u www-data npm ci --omit=dev   # install as the runtime user; --omit=dev needs npm >= 8 (use --production on older npm)
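If a reinstall keeps producing a broken tree, the npm cache itself can be the stale layer. Two safe follow-ups, rarely needed but cheap to run:

npm cache verify          # audits the cache and garbage-collects bad entries
npm cache clean --force   # last resort: wipe the cache entirely, then re-run npm ci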
Fix Step 3: Restarting Services
Finally, we ensured the Node.js process and any related workers were restarted cleanly, forcing them to load the newly verified dependencies.
sudo systemctl restart nestjs-app   # your Node service unit; the name will differ per setup
sudo systemctl restart supervisor
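One caveat: aaPanel's Node.js projects are often supervised by PM2 rather than a bespoke systemd unit. If that's your setup, the equivalent restart is roughly the following (the app name is assumed):

pm2 restart nestjs-app --update-env   # reload the process with the corrected environment
pm2 save                              # persist the process list across reboots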
Why This Happens in VPS / aaPanel Environments
Shared hosting or managed VPS environments like those configured by aaPanel introduce complexity that local development often hides. The primary failure points are:
- User Context Mismatch: When deploying via a panel, the deployment script often runs as a generic user, and the subsequent service (e.g., a systemd-managed Node process or a Supervisor-managed worker) runs under a restricted system user. If the file permissions aren't explicitly mapped, the application runtime cannot access its own dependencies, producing 'module not found' errors even though the files physically exist.
- Stale Cache State: A long-running process manager (PM2, Supervisor) holds the old code and resolved module paths in memory until it is fully restarted, and a corrupted npm cache can feed a broken tree back into every reinstall. A clean installation plus a hard restart forces a full re-evaluation of the module resolution paths.
- Deployment Isolation: Automated deployment scripts often fail to account for the service-specific user context required by the running application process.
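The cleanest guard against the user-context mismatch is to pin the service user explicitly. A minimal sketch of a systemd unit, assuming the hypothetical name nestjs-app.service, the www-data runtime user, and a dist/main.js entry point:

sudo tee /etc/systemd/system/nestjs-app.service >/dev/null <<'EOF'
[Unit]
Description=NestJS application (queue worker)
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/nestjs-app
ExecStart=/usr/bin/node dist/main.js
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now nestjs-app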
Prevention: The Production Deployment Pattern
To prevent this class of error in future deployments, we need to enforce a strict, permission-aware deployment pattern that treats the application as a single, deployable unit.
- Dedicated Deployment User: Always use a dedicated, non-root deployment user (e.g., `deployer`) for running the deployment commands.
- Service User Mapping: Ensure the final running service (Node.js, Supervisor) runs under the same user that owns the application directory, eliminating permission traps.
- Atomic Module Installation: Never rely on the environment to handle `node_modules`. Always run dependency installation commands immediately after setting the correct ownership.
- Pre-Deployment Hooks: Integrate a mandatory pre-deployment hook in your CI/CD pipeline that runs the `chown` and `npm ci` sequence before attempting to restart services, as in the sketch after this list.
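Tying the pattern together, this is roughly the pre-deployment hook we now run. The path, user, and service name are placeholders from our setup; substitute your own:

#!/usr/bin/env bash
# Hypothetical pre-deployment hook: permission-aware dependency install.
set -euo pipefail

APP_DIR=/var/www/nestjs-app   # application root (assumed)
RUN_USER=www-data             # runtime service user (assumed)
SERVICE=nestjs-app            # systemd unit name (assumed)

cd "$APP_DIR"
rm -rf node_modules
sudo chown -R "$RUN_USER:$RUN_USER" "$APP_DIR"
sudo -u "$RUN_USER" npm ci --omit=dev
sudo systemctl restart "$SERVICE"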
Conclusion
Production debugging isn't about guessing; it's about forensic command-line inspection. The 'Cannot find module' error in a deployed NestJS application on a VPS is rarely a code error. It's almost always a permissions, caching, or environment mismatch. Master your Linux commands, respect file ownership, and your deployment pipeline will stop producing silent, painful production crashes.