By Moumou (Zhou Feiyu)
The following is how we load a module in Node.js:
const fs = require('fs');
const express = require('express');
const anotherModule = require('./another-module');
Yes, require is the API for loading CJS modules. However, V8 itself has no CJS module system, so how does Node.js find modules through require and load them? In this article, we explore the Node.js source code to understand how CJS modules are loaded. The version of the Node.js source we read is v17.x.
To understand how require works, we first need to understand how the built-in modules are loaded into Node.js (such as 'fs', 'path', and 'child_process', as well as some internal modules that users cannot reference). With the code checked out, we start reading from the entry point. The main function in src/node_main.cc starts a Node.js instance by calling node::Start:
int Start(int argc, char** argv) {
InitializationResult result = InitializeOncePerProcess(argc, argv);
if (result.early_return) {
return result.exit_code;
}
{
Isolate::CreateParams params;
const std::vector<size_t>* indices = nullptr;
const EnvSerializeInfo* env_info = nullptr;
bool use_node_snapshot =
per_process::cli_options->per_isolate->node_snapshot;
if (use_node_snapshot) {
v8::StartupData* blob = NodeMainInstance::GetEmbeddedSnapshotBlob();
if (blob != nullptr) {
params.snapshot_blob = blob;
indices = NodeMainInstance::GetIsolateDataIndices();
env_info = NodeMainInstance::GetEnvSerializeInfo();
}
}
uv_loop_configure(uv_default_loop(), UV_METRICS_IDLE_TIME);
NodeMainInstance main_instance(&params,
uv_default_loop(),
per_process::v8_platform.Platform(),
result.args,
result.exec_args,
indices);
result.exit_code = main_instance.Run(env_info);
}
TearDownOncePerProcess();
return result.exit_code;
}
Here, an event loop and a NodeMainInstance instance, main_instance, are created, and its Run method is called:
int NodeMainInstance::Run(const EnvSerializeInfo* env_info) {
Locker locker(isolate_);
Isolate::Scope isolate_scope(isolate_);
HandleScope handle_scope(isolate_);
int exit_code = 0;
DeleteFnPtr<Environment, FreeEnvironment> env =
CreateMainEnvironment(&exit_code, env_info);
CHECK_NOT_NULL(env);
Context::Scope context_scope(env->context());
Run(&exit_code, env.get());
return exit_code;
}
CreateMainEnvironment is called in the Run method to create and initialize the environment. The code below is CreateEnvironment, which shows the creation and bootstrapping steps:
Environment* CreateEnvironment(
IsolateData* isolate_data,
Local<Context> context,
const std::vector<std::string>& args,
const std::vector<std::string>& exec_args,
EnvironmentFlags::Flags flags,
ThreadId thread_id,
std::unique_ptr<InspectorParentHandle> inspector_parent_handle) {
Isolate* isolate = context->GetIsolate();
HandleScope handle_scope(isolate);
Context::Scope context_scope(context);
// TODO(addaleax): This is a much better place for parsing per-Environment
// options than the global parse call.
Environment* env = new Environment(
isolate_data, context, args, exec_args, nullptr, flags, thread_id);
#if HAVE_INSPECTOR
if (inspector_parent_handle) {
env->InitializeInspector(
std::move(static_cast<InspectorParentHandleImpl*>(
inspector_parent_handle.get())->impl));
} else {
env->InitializeInspector({});
}
#endif
if (env->RunBootstrapping().IsEmpty()) {
FreeEnvironment(env);
return nullptr;
}
return env;
}
This creates an Environment object, env, and calls its RunBootstrapping method:
MaybeLocal<Value> Environment::RunBootstrapping() {
EscapableHandleScope scope(isolate_);
CHECK(!has_run_bootstrapping_code());
if (BootstrapInternalLoaders().IsEmpty()) {
return MaybeLocal<Value>();
}
Local<Value> result;
if (!BootstrapNode().ToLocal(&result)) {
return MaybeLocal<Value>();
}
// Make sure that no request or handle is created during bootstrap -
// if necessary those should be done in pre-execution.
// Usually, doing so would trigger the checks present in the ReqWrap and
// HandleWrap classes, so this is only a consistency check.
CHECK(req_wrap_queue()->IsEmpty());
CHECK(handle_wrap_queue()->IsEmpty());
DoneBootstrapping();
return scope.Escape(result);
}
BootstrapInternalLoaders implements a very important step in the Node.js module loading process. By wrapping and executing internal/bootstrap/loaders.js, it obtains the nativeModuleRequire function, which loads built-in JS modules, and internalBinding, which loads built-in C++ modules. NativeModule is a small module system used only for built-in modules.
function nativeModuleRequire(id) {
if (id === loaderId) {
return loaderExports;
}
const mod = NativeModule.map.get(id);
// Can't load the internal errors module from here, have to use a raw error.
// eslint-disable-next-line no-restricted-syntax
if (!mod) throw new TypeError(`Missing internal module '${id}'`);
return mod.compileForInternalLoader();
}
const loaderExports = {
internalBinding,
NativeModule,
require: nativeModuleRequire
};
return loaderExports;
Note that this require function is used only for loading built-in modules, not user modules. This is also why printing require('module')._cache shows all loaded user modules but no built-in modules (such as fs): the two kinds of modules are loaded and cached in different ways.
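The difference between the two caches can be observed from user code. The following is a minimal sketch, assuming a throwaway module written to a temporary directory (the file name another-module.js and the variable names are only illustrative). It relies on Module._cache, which is an undocumented property, and on the public Module.builtinModules list.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const Module = require('module');

// Write a throwaway user module to a temporary directory.
// realpathSync is used because cache keys are resolved real paths.
const dir = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cache-')));
const userFile = path.join(dir, 'another-module.js');
fs.writeFileSync(userFile, 'module.exports = { answer: 42 };');

// Loading it goes through Module._load, which stores it in Module._cache.
require(userFile);

const cachedKeys = Object.keys(Module._cache);
const userModuleCached = cachedKeys.includes(userFile);
// 'fs' was required above, yet it is not a key of the user module cache.
const builtinCached = cachedKeys.some((key) => key === 'fs' || key === 'node:fs');
// Built-in modules are tracked separately and listed here instead.
const fsIsBuiltin = Module.builtinModules.includes('fs');

console.log({ userModuleCached, builtinCached, fsIsBuiltin });
```

The user module appears under its resolved file name, while fs only shows up in the list of built-in modules.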
Next, let's look back at the NodeMainInstance::Run
function:
int NodeMainInstance::Run(const EnvSerializeInfo* env_info) {
Locker locker(isolate_);
Isolate::Scope isolate_scope(isolate_);
HandleScope handle_scope(isolate_);
int exit_code = 0;
DeleteFnPtr<Environment, FreeEnvironment> env =
CreateMainEnvironment(&exit_code, env_info);
CHECK_NOT_NULL(env);
Context::Scope context_scope(env->context());
Run(&exit_code, env.get());
return exit_code;
}
We have created an env object through the CreateMainEnvironment function. This Environment instance already has a module system, NativeModule, to maintain the built-in modules. Execution then moves on to another overload of the Run function:
void NodeMainInstance::Run(int* exit_code, Environment* env) {
if (*exit_code == 0) {
LoadEnvironment(env, StartExecutionCallback{});
*exit_code = SpinEventLoop(env).FromMaybe(1);
}
ResetStdio();
// TODO(addaleax): Neither NODE_SHARED_MODE nor HAVE_INSPECTOR really
// make sense here.
#if HAVE_INSPECTOR && defined(__POSIX__) && !defined(NODE_SHARED_MODE)
struct sigaction act;
memset(&act, 0, sizeof(act));
for (unsigned nr = 1; nr < kMaxSignal; nr += 1) {
if (nr == SIGKILL || nr == SIGSTOP || nr == SIGPROF)
continue;
act.sa_handler = (nr == SIGPIPE) ? SIG_IGN : SIG_DFL;
CHECK_EQ(0, sigaction(nr, &act, nullptr));
}
#endif
#if defined(LEAK_SANITIZER)
__lsan_do_leak_check();
#endif
}
Here, LoadEnvironment is called:
MaybeLocal<Value> LoadEnvironment(
Environment* env,
StartExecutionCallback cb) {
env->InitializeLibuv();
env->InitializeDiagnostics();
return StartExecution(env, cb);
}
Then, StartExecution is executed:
MaybeLocal<Value> StartExecution(Environment* env, StartExecutionCallback cb) {
// Here we only look at the "node index.js" situation without paying attention to other running situations, which does not affect our understanding of the module system.
if (!first_argv.empty() && first_argv != "-") {
return StartExecution(env, "internal/main/run_main_module");
}
}
In the call StartExecution(env, "internal/main/run_main_module"), the file content is wrapped in a function, the require function exported from loaders is passed to that function, and the code in lib/internal/main/run_main_module.js is run:
'use strict';
const {
prepareMainThreadExecution
} = require('internal/bootstrap/pre_execution');
prepareMainThreadExecution(true);
markBootstrapComplete();
// Note: this loads the module through the ESM loader if the module is
// determined to be an ES module. This hangs from the CJS module loader
// because we currently allow monkey-patching of the module loaders
// in the preloaded scripts through require('module').
// runMain here might be monkey-patched by users in --require.
// XXX: the monkey-patchability here should probably be deprecated.
require('internal/modules/cjs/loader').Module.runMain(process.argv[1]);
The wrapper function receives require as a parameter. In pseudo-code:
(function(require, /* other input parameters */) {
// Here is the file content of internal/main/run_main_module.js.
})();
Therefore, the runMain method on the Module object exported by lib/internal/modules/cjs/loader.js is obtained through the require function of the built-in modules. However, runMain is not defined in loader.js. It is attached to the Module object in lib/internal/bootstrap/pre_execution.js:
function initializeCJSLoader() {
const CJSLoader = require('internal/modules/cjs/loader');
if (!noGlobalSearchPaths) {
CJSLoader.Module._initPaths();
}
// TODO(joyeecheung): deprecate this in favor of a proper hook?
CJSLoader.Module.runMain =
require('internal/modules/run_main').executeUserEntryPoint;
}
The executeUserEntryPoint method is in lib/internal/modules/run_main.js:
function executeUserEntryPoint(main = process.argv[1]) {
const resolvedMain = resolveMainPath(main);
const useESMLoader = shouldUseESMLoader(resolvedMain);
if (useESMLoader) {
runMainESM(resolvedMain || main);
} else {
// Module._load is the monkey-patchable CJS module loader.
Module._load(main, null, true);
}
}
The parameter main is the entry file index.js that we pass in. As a CJS module, index.js is loaded by Module._load. What does _load do? It is the most important function in the CJS module loading process and is worth reading carefully:
// The '_load' function checks the cache for the requested file.
// 1. If the module is already cached, the cached exports object is returned.
// 2. If the module is a built-in module, "NativeModule.prototype.compileForPublicLoader()" is called
// to obtain the exports object of the built-in module. compileForPublicLoader has a whitelist,
// so only the exports of public built-in modules can be obtained.
// 3. If neither case applies, a new Module object is created and saved to the cache. The file is then loaded through it and its exports are returned.
// request: the requested module, such as 'fs', './another-module', '@pipcook/core', etc.
// parent: the parent module. For example, for require('b.js') in 'a.js', the request is 'b.js'
// and the parent is the Module object of 'a.js'.
// isMain: 'true' for the entry file, 'false' for all other modules.
Module._load = function(request, parent, isMain) {
let relResolveCacheIdentifier;
if (parent) {
debug('Module._load REQUEST %s parent: %s', request, parent.id);
// The relativeResolveCache is the module path cache,
// It is used to accelerate the requests for the current modules from all modules in the directory where the parent module is located.
// You can directly query the actual path without searching for files through _resolveFilename.
relResolveCacheIdentifier = `${parent.path}\x00${request}`;
const filename = relativeResolveCache[relResolveCacheIdentifier];
if (filename !== undefined) {
const cachedModule = Module._cache[filename];
if (cachedModule !== undefined) {
updateChildren(parent, cachedModule, true);
if (!cachedModule.loaded)
return getExportsForCircularRequire(cachedModule);
return cachedModule.exports;
}
delete relativeResolveCache[relResolveCacheIdentifier];
}
}
// Try to find the path of the module file. If the module cannot be found, an exception is thrown.
const filename = Module._resolveFilename(request, parent, isMain);
// If it is a built-in module, load it from 'NativeModule'.
if (StringPrototypeStartsWith(filename, 'node:')) {
// Slice 'node:' prefix
const id = StringPrototypeSlice(filename, 5);
const module = loadNativeModule(id, request);
if (!module?.canBeRequiredByUsers) {
throw new ERR_UNKNOWN_BUILTIN_MODULE(filename);
}
return module.exports;
}
// If the cache already exists, push the current module to the children field of the parent module.
const cachedModule = Module._cache[filename];
if (cachedModule !== undefined) {
updateChildren(parent, cachedModule, true);
// Process circular references.
if (!cachedModule.loaded) {
const parseCachedModule = cjsParseCache.get(cachedModule);
if (!parseCachedModule || parseCachedModule.loaded)
return getExportsForCircularRequire(cachedModule);
parseCachedModule.loaded = true;
} else {
return cachedModule.exports;
}
}
// Try to load from the built-in module.
const mod = loadNativeModule(filename, request);
if (mod?.canBeRequiredByUsers) return mod.exports;
// Don't call updateChildren(), Module constructor already does.
const module = cachedModule || new Module(filename, parent);
if (isMain) {
process.mainModule = module;
module.id = '.';
}
// Add the module object to the cache.
Module._cache[filename] = module;
if (parent !== undefined) {
relativeResolveCache[relResolveCacheIdentifier] = filename;
}
// Try to load the module. If the module fails to be loaded, delete the module object in the cache.
// Delete the module object in the children of the parent module.
let threw = true;
try {
module.load(filename);
threw = false;
} finally {
if (threw) {
delete Module._cache[filename];
if (parent !== undefined) {
delete relativeResolveCache[relResolveCacheIdentifier];
const children = parent?.children;
if (ArrayIsArray(children)) {
const index = ArrayPrototypeIndexOf(children, module);
if (index !== -1) {
ArrayPrototypeSplice(children, index, 1);
}
}
}
} else if (module.exports &&
!isProxy(module.exports) &&
ObjectGetPrototypeOf(module.exports) ===
CircularRequirePrototypeWarningProxy) {
ObjectSetPrototypeOf(module.exports, ObjectPrototype);
}
}
// Return the exports object.
return module.exports;
};
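Two behaviors of _load described in the comments above, returning the cached exports and handling circular requires, can be observed from user code. The following is a minimal sketch, assuming two throwaway files a.js and b.js (illustrative names) that require each other from a temporary directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-circular-'));
const aFile = path.join(dir, 'a.js');
const bFile = path.join(dir, 'b.js');

// a.js sets one export, requires b.js, then sets more exports.
fs.writeFileSync(aFile, [
  "exports.early = 'set before requiring b';",
  "const b = require('./b.js');",
  "exports.late = 'set after requiring b';",
  "exports.keysSeenByB = b.keysOfA;",
].join('\n'));

// b.js requires a.js while a.js is still loading (loaded === false),
// so it receives the partially filled exports object of a.js.
fs.writeFileSync(bFile, [
  "const a = require('./a.js');",
  "exports.keysOfA = Object.keys(a);",
].join('\n'));

const first = require(aFile);
const second = require(aFile); // served from Module._cache

console.log(first.keysSeenByB); // [ 'early' ]
console.log(first === second);  // true
```

When b.js runs, a.js is already in the cache but not yet marked as loaded, so b.js only sees the exports assigned before the require('./b.js') call.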
The load function on the module object is used to load a module:
Module.prototype.load = function(filename) {
debug('load %j for module %j', filename, this.id);
assert(!this.loaded);
this.filename = filename;
this.paths = Module._nodeModulePaths(path.dirname(filename));
const extension = findLongestRegisteredExtension(filename);
// allow .mjs to be overridden
if (StringPrototypeEndsWith(filename, '.mjs') && !Module._extensions['.mjs'])
throw new ERR_REQUIRE_ESM(filename, true);
Module._extensions[extension](this, filename);
this.loaded = true;
const esmLoader = asyncESM.esmLoader;
// Create module entry at load time to snapshot exports correctly
const exports = this.exports;
// Preemptively cache
if ((module?.module === undefined ||
module.module.getStatus() < kEvaluated) &&
!esmLoader.cjsCache.has(this))
esmLoader.cjsCache.set(this, exports);
};
The actual loading is performed by Module._extensions[extension](this, filename). The loading strategy depends on the file extension:
.js: uses fs.readFileSync to read the file content and wraps it in the wrapper shown here. Note that the require passed to the wrapper is Module.prototype.require, not the require of the built-in modules.
const wrapper = [
'(function (exports, require, module, __filename, __dirname) { ',
'\n});',
];
.json: uses fs.readFileSync to read the file content and converts it to an object.
.node: uses dlopen to open the Node.js extension.
The Module.prototype.require function in turn calls the static method Module._load to load modules:
Module.prototype.require = function(id) {
validateString(id, 'id');
if (id === '') {
throw new ERR_INVALID_ARG_VALUE('id', id,
'must be a non-empty string');
}
requireDepth++;
try {
return Module._load(id, this, /* isMain */ false);
} finally {
requireDepth--;
}
};
The loading process of the CJS module is clearer now:
1. Node.js is initialized, and NativeModule is set up to load the built-in modules.
2. The built-in module run_main is run.
3. run_main brings in the user module system, module.
4. The _load method of module is used to load the entry file. During this process, module.require and module.exports are passed in so that the entry file can require its dependencies, and the entire dependency tree is loaded recursively.
Now that we know the complete CJS loading process, we can follow the same path to read other code (such as the initialization of global variables and the way ES modules are managed) and gain a deeper understanding of how Node.js is implemented.
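The last step, in which the entry file is loaded with isMain set to true and its require calls pull in the rest of the tree, can be checked from the outside. The following is a minimal sketch, assuming two throwaway files index.js and dep.js (illustrative names) that are run in a child Node.js process so that index.js really is the entry file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-main-'));

// dep.js reports whether it considers itself the main module.
fs.writeFileSync(path.join(dir, 'dep.js'),
  "module.exports = { isMain: require.main === module, id: module.id };");

// index.js is the entry file; it prints a small JSON report.
fs.writeFileSync(path.join(dir, 'index.js'), [
  "const dep = require('./dep.js');",
  "console.log(JSON.stringify({",
  "  entryIsMain: require.main === module,",
  "  entryId: module.id,",
  "  depIsMain: dep.isMain,",
  "  depIdIsFilename: dep.id === require.resolve('./dep.js'),",
  "  children: module.children.map((m) => require('path').basename(m.filename)),",
  "}));",
].join('\n'));

// Run "node index.js" in a child process and parse its report.
const report = JSON.parse(
  execFileSync(process.execPath, [path.join(dir, 'index.js')], { encoding: 'utf8' }));

console.log(report);
```

The entry module gets the id '.', as set in Module._load, while the dependency keeps its resolved file name as id and is recorded in the children of the entry module.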