How does Node.js load its built-in/native modules?
This post stems from a recent Twitter conversation and a bug I was trying to fix. I’ve also seen questions about this in the issue tracker from time to time. A lot of people are already aware that a substantial part of Node.js is implemented in JavaScript, and many naturally assume that Node.js loads its builtins from separate JS files on disk when the process is launched (which is very intuitive, but not true unless you use a special build-time flag). So in this post I’ll try to explain how the whole thing comes together at the moment. Hopefully it can be helpful to someone searching for it on the Internet, or to code archeologists in the future.
Note: The builtins are sometimes called “native modules” internally, but in this post I’ll call them “builtins” so they are not confused with C++ addons, i.e. the some_library.node dynamic libraries that npm modules load via require().
How the Node.js binary locates its JavaScript internals
The majority of the builtins are located under the lib/ directory in the source code (the rest, mostly third-party code, come from deps/, and they are usually only accessible to other internal builtins, not to users). When the node executable is built, these builtins are read by a Python script, tools/js2c.py, and written into a C++ source file (${OUT_DIR}/gen/node_javascript.cc) as static const uint16_t[] and static const uint8_t[] literals, like these:
```cpp
static const uint8_t fs_raw[] = {
  // ... (the bytes of lib/fs.js)
};
```
Each .js file in the source is encoded into one array (e.g. the fs_raw array above comes from lib/fs.js). Files containing only ASCII characters are encoded into uint8_t arrays as ASCII, and the others are encoded into uint16_t arrays as UTF-16 little-endian. The generated node_javascript.cc is then compiled into the binary. At runtime these arrays are put into a std::map<std::string, node::UnionBytes> keyed by module identifier (node::UnionBytes is an encapsulation over uint8_t[] and uint16_t[] that also offers helper methods to convert them into external V8 strings). When the Node.js instance finishes the initial bootstrapping and is ready to run the internal JavaScript to complete the initialization, it loads these JavaScript sources from the map and compiles them into v8::Functions, which are then invoked to perform the necessary initialization.
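To make the encoding rule concrete, here is a minimal C++ sketch of what the build step and the runtime table roughly amount to. It is neither the real tools/js2c.py (which is Python) nor the real node::UnionBytes: SourceBytes, Encode, ToLiteral and BuildTable are hypothetical names made up for illustration, and the JavaScript source is assumed to arrive as UTF-16 code units.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for node::UnionBytes (not Node's real class): one
// builtin's source, stored either as ASCII bytes or as UTF-16 code units.
struct SourceBytes {
  std::variant<std::vector<uint8_t>, std::vector<uint16_t>> data;

  bool is_one_byte() const { return data.index() == 0; }
  std::size_t length() const {
    return std::visit([](const auto& v) { return v.size(); }, data);
  }
};

// Mirrors the rule described above: a pure-ASCII source becomes a uint8_t
// array; anything else keeps its UTF-16 code units as uint16_t.
SourceBytes Encode(const std::u16string& source) {
  for (char16_t c : source) {
    if (c > 0x7F) return {std::vector<uint16_t>(source.begin(), source.end())};
  }
  return {std::vector<uint8_t>(source.begin(), source.end())};
}

// Renders a C++ array literal in the spirit of node_javascript.cc; the
// "<id>_raw" naming follows the fs_raw example, the formatting is made up.
std::string ToLiteral(const std::string& id, const SourceBytes& bytes) {
  std::ostringstream out;
  out << "static const " << (bytes.is_one_byte() ? "uint8_t " : "uint16_t ")
      << id << "_raw[] = {";
  std::visit(
      [&out](const auto& v) {
        for (std::size_t i = 0; i < v.size(); ++i)
          out << (i == 0 ? "" : ",") << static_cast<unsigned>(v[i]);
      },
      bytes.data);
  out << "};";
  return out.str();
}

// The runtime table keyed by module identifier, analogous to
// std::map<std::string, node::UnionBytes>.
std::map<std::string, SourceBytes> BuildTable(
    const std::map<std::string, std::u16string>& sources) {
  std::map<std::string, SourceBytes> table;
  for (const auto& [id, text] : sources) table.emplace(id, Encode(text));
  return table;
}
```

For example, Encode(u"'use strict';") yields a one-byte array that ToLiteral("fs", ...) renders as static const uint8_t fs_raw[] = {39,117,115,101,32,115,116,114,105,99,116,39,59};, while a source containing é falls back to uint16_t. Storing ASCII-only files with one byte per character halves their footprint in the binary.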
So if your Node.js executable was downloaded from somewhere (e.g. using nvm or nvs, or from an installer available on the Node.js website) instead of being built by yourself, the simplified answer to the question “Where are the builtins located (on disk)?” is probably “They are in the read-only data section of your executable”. Technically speaking, they are loaded from disk too, just not as separate files: they are bundled into the executable, so their contents are fixed at build time and cannot change during execution (unless you build the binary yourself and pass --node-builtin-modules-path to the configure script, which we will cover later in this post).
How the JS internals get compiled in Node.js (or don’t)
To speed up the bootstrap, most of the builtins in Node.js are (usually) not even compiled from scratch when you spin up a Node.js instance (e.g. a main instance in a new process, or a worker thread instance). When building the Node.js binary, we pre-compile the builtins to generate V8 code cache for them (which may include things like bytecode and information about what the scopes look like), then write the content of the code cache as static const uint8_t[] literals into ${OUT_DIR}/gen/node_code_cache.cc, which gets compiled into the final binary. At runtime we pass the code cache to V8, which, after some validity checks, deserializes the information it needs from the code cache instead of parsing and recompiling the builtins from scratch.
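The validity checks are what make it safe to embed a code cache in the binary. Below is a toy C++ model of that produce/consume cycle. CodeCache, ProduceCache, Compile and the length/hash/engine-tag fields are all assumptions standing in for V8's internal checks, and the “bytecode” is just a copy of the source. In a real embedder this corresponds roughly to passing a v8::ScriptCompiler::CachedData with the kConsumeCodeCache option and checking its rejected flag afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy model only: the validity fields below are hypothetical stand-ins for
// the checks V8 performs before trusting a code cache.
struct CodeCache {
  std::size_t source_length;
  uint32_t source_hash;
  uint32_t engine_tag;           // e.g. a V8 version/flags fingerprint
  std::vector<uint8_t> payload;  // serialized "bytecode"
};

constexpr uint32_t kEngineTag = 0x6e6f6465;  // hypothetical fingerprint

uint32_t HashSource(const std::string& s) {
  uint32_t h = 2166136261u;  // FNV-1a, picked only for illustration
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Stand-in for the expensive parse + bytecode generation.
std::vector<uint8_t> CompileFromScratch(const std::string& source) {
  return std::vector<uint8_t>(source.begin(), source.end());
}

// Build time: compile each builtin once and keep the serialized result.
CodeCache ProduceCache(const std::string& source) {
  return {source.size(), HashSource(source), kEngineTag,
          CompileFromScratch(source)};
}

struct CompileResult {
  std::vector<uint8_t> bytecode;
  bool used_cache;
};

// Run time: consume the cache only if it passes the checks; otherwise fall
// back to compiling from scratch, which is still correct, just slower.
CompileResult Compile(const std::string& source,
                      const std::optional<CodeCache>& cache) {
  if (cache && cache->engine_tag == kEngineTag &&
      cache->source_length == source.size() &&
      cache->source_hash == HashSource(source)) {
    return {cache->payload, true};
  }
  return {CompileFromScratch(source), false};
}
```

The important property is the fallback: a rejected cache (say, after a V8 upgrade changes the engine fingerprint) only costs startup time, never correctness.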
And that’s not the end of the story: to save more time in the bootstrap, we also pre-execute some of the bootstrap routines (which usually involve setting up globals like process, Buffer, setTimeout and whatnot) and snapshot the resulting V8 execution context into a blob. This data is again written as C++ literals, this time into ${OUT_DIR}/gen/node_snapshot.cc, built into the binary, and loaded at runtime to deserialize the context, so that Node.js does not need to execute this part of the bootstrap at all.
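Where the code cache lets Node.js skip compilation, the snapshot lets it skip execution. The toy C++ model below illustrates only that build-time/run-time split: a Context here is a plain map of global names, and RunBootstrap, SerializeContext, DeserializeContext and CreateMainContext are hypothetical. V8's real serializer works on heap objects, not text.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Toy "context": global names mapped to descriptions. A real V8 context
// holds heap objects; this only models the build-time / run-time split.
using Context = std::map<std::string, std::string>;

struct Startup {
  Context ctx;
  bool ran_bootstrap;  // true if the bootstrap code actually executed
};

// The part of the bootstrap we would like to skip (contents hypothetical).
void RunBootstrap(Context& ctx) {
  ctx["process"] = "object";
  ctx["Buffer"] = "function";
  ctx["setTimeout"] = "function";
}

// Build time (think node_mksnapshot): serialize the bootstrapped context.
// Assumes names and values contain no '=' or newline characters.
std::string SerializeContext(const Context& ctx) {
  std::ostringstream blob;
  for (const auto& [name, value] : ctx) blob << name << '=' << value << '\n';
  return blob.str();
}

// Run time: rebuild the context from the blob instead of re-running code.
Context DeserializeContext(const std::string& blob) {
  Context ctx;
  std::istringstream in(blob);
  std::string line;
  while (std::getline(in, line)) {
    std::size_t eq = line.find('=');
    ctx[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return ctx;
}

// Without a snapshot blob, every instance pays for the full bootstrap.
Startup CreateMainContext(const std::optional<std::string>& snapshot_blob) {
  if (snapshot_blob) return {DeserializeContext(*snapshot_blob), false};
  Startup s{Context{}, true};
  RunBootstrap(s.ctx);
  return s;
}
```

Passing std::nullopt models running without a snapshot, which is roughly what the --no-node-snapshot and --without-node-snapshot options discussed below lead to.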
How do I debug the JS internals that are included in the built-in startup snapshot?
The builtins snapshotted during the bootstrap process (e.g. fs.js mentioned above) are marked as native scripts by V8, so with the normal startup configuration you won’t be able to debug them through the V8 inspector protocol at runtime.
The --no-node-snapshot runtime option instructs Node.js not to load its built-in snapshot but to perform all the initialization from scratch; these builtins are then marked as normal scripts by V8 again, and thus become debuggable. This is useful if you want to step into the internals when debugging with the official build.
If you build your own Node.js binaries, there is also a --without-node-snapshot configure option that does a similar job by not building any built-in snapshot at all. This can be useful when you see weird build failures from the JS side that come out of ${OUT_DIR}/node_mksnapshot. Oftentimes the bug isn’t in the snapshot building per se; it only surfaces in the snapshot builder because that’s where JavaScript is executed for the first time during the build.
How do I modify Node.js by editing its JS source?
At the time of writing, modifying the JS part of Node.js usually means downloading the source code, building Node.js yourself, editing the JS source code and then recompiling to see the effect. It is, however, possible to bypass the recompilation (which usually only takes a few seconds, but still needs a few extra steps) by passing --node-builtin-modules-path to the configure script. So on Linux or macOS, I’d do:
```shell
cd /path/to/node/source
```
--node-builtin-modules-path configures Node.js to look up the builtins from disk at runtime, with the caveat that you cannot add or delete files after the configuration is done. If the set of files changes (rather than just their contents), you have to reconfigure; at least, that’s how it works at the time of writing.
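The caveat about adding or deleting files is presumably because the list of builtin ids is fixed when the binary is configured, while only the contents are read from disk. The C++ sketch below models that behavior under this assumption; BuiltinLoader and the <modules_path>/lib/<id>.js layout are hypothetical and are not the actual loader in Node.js.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical loader modeling a --node-builtin-modules-path build: the set
// of builtin ids is frozen at configure time, but contents are read from
// disk each time a builtin is loaded.
class BuiltinLoader {
 public:
  BuiltinLoader(std::map<std::string, std::string> embedded,
                std::optional<std::string> modules_path)
      : embedded_(std::move(embedded)),
        modules_path_(std::move(modules_path)) {}

  std::string Load(const std::string& id) const {
    auto it = embedded_.find(id);
    // Ids unknown at configure time are rejected even if a file now exists.
    if (it == embedded_.end()) throw std::out_of_range("unknown builtin: " + id);
    if (!modules_path_) return it->second;  // default: source baked into binary
    // Assumed layout: <modules_path>/lib/<id>.js, mirroring the source tree.
    std::filesystem::path file =
        std::filesystem::path(*modules_path_) / "lib" / (id + ".js");
    std::ifstream in(file);
    if (!in) throw std::runtime_error("cannot read " + file.string());
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
  }

 private:
  std::map<std::string, std::string> embedded_;  // id -> embedded source
  std::optional<std::string> modules_path_;      // set when configured
};
```

With modules_path set, an edit to an existing file such as lib/fs.js is picked up on the next load, but a newly added lib/brand_new.js is still rejected until you reconfigure.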