This year I spent some time fixing a few long-standing issues in the Node.js vm APIs that had been blocking users from upgrading away from Node.js v16 (End-of-Life). It was an interesting journey, so I decided to write a few blog posts about it. Hopefully they can be helpful for posterity.

The issue

The story began when I was approached by a user about a long-standing memory leak in vm.compileFunction(). It could be easily reproduced with the following snippet:

const vm = require('vm');
const code = `console.log("${'hello world '.repeat(1e5)}");`;
for (let i = 0; i < 10000; i++) {
  const foo = vm.compileFunction(code, [], {
    importModuleDynamically: () => {},
  });
}

Being a long-time watcher of the Node.js issue tracker, I did remember seeing this bug around. I didn't normally work on the vm APIs, however, so I had never really looked into the details of this bug. I did some basic triaging for it, and a quick search in the issue tracker led me to a duplicate issue. It seemed there was already a proof-of-concept fix that was waiting for a V8 patch to work. So at that point, my suggestion was to wait for the V8 patch to make progress first.

A leaky memory graph

Some time passed after my initial research into the issue. Then I started working on the cppgc (Oilpan) integration. As part of the design process, I walked through the memory management of the existing embedder objects in Node.js, and noticed that the node::CompiledFnEntry internal class used to manage vm.compileFunction() with importModuleDynamically did not look quite right.

For background, in Node.js v16, an importModuleDynamically option was introduced to vm.compileFunction()/vm.SourceTextModule()/vm.Script (and its shortcuts), which allows users to customize how ES modules should be loaded when a script/function/module - or, a referrer - invokes dynamic import(). Internally, when compiling the JavaScript code for these, V8 takes an array from Node.js that contains data (which can only be primitives) associated with the particular referrer being compiled. Whenever dynamic import() is invoked in a V8 instance, V8 would invoke a global v8::HostImportModuleDynamicallyCallback provided by the embedder (in this case Node.js), passing in the data associated with the referrer where this import() call is initiated.

Take this use case for example:

const foo = require('vm').compileFunction(`import('fs')`, [], {
  importModuleDynamically(specifier) { return import(specifier); }
});

It should be easier to describe what Node.js did internally for this with some heavily commented code snippets, so here is what Node.js roughly did in C++ (with exception-handling simplified) for vm.compileFunction():

void ContextifyContext::CompileFunction(
    const FunctionCallbackInfo<Value>& args) {
  // Get the node::Environment which represents the running Node.js instance.
  Environment* env = Environment::GetCurrent(args);

  // Generate a numeric ID for the function.
  uint32_t id = env->get_next_function_id();

  // Create an array, store the type and the ID in it.
  Local<PrimitiveArray> host_defined_options =
      PrimitiveArray::New(isolate, loader::HostDefinedOptions::kLength);
  host_defined_options->Set(
      isolate,
      loader::HostDefinedOptions::kType,
      Number::New(isolate, loader::ScriptType::kFunction));
  host_defined_options->Set(
      isolate, loader::HostDefinedOptions::kID, Number::New(isolate, id));

  // Compile the function with the generated host-defined options.
  ScriptOrigin origin(..., host_defined_options);
  ScriptCompiler::Source source(code, origin, cached_data);
  Local<Function> fn = ScriptCompiler::CompileFunction(
      ...
      &source
      ...).ToLocalChecked();

  // Generate a node::CompiledFnEntry object with a JS CompiledFnEntry wrapper.
  Local<Object> cache_key =
      env->compiled_fn_entry_template()->NewInstance(context).ToLocalChecked();
  // The constructor creates a weak v8::Global referencing the compiled function,
  // whose weak callback deletes node::CompiledFnEntry.
  CompiledFnEntry* entry = new CompiledFnEntry(env, cache_key, id, fn);
  // Save the node::CompiledFnEntry in a map in node::Environment for future lookup.
  env->id_to_function_map.emplace(id, entry);

  // Create an object with { function, cacheKey } to return to JS land.
  Local<Object> result = Object::New(isolate);
  result->Set(parsing_context, env->function_string(), fn).Check();
  result->Set(parsing_context, env->cache_key_string(), cache_key).Check();
  args.GetReturnValue().Set(result);
}

In JS land, Node.js roughly did this:

function compileFunction(code, params, options) {
  // Invoke the C++ binding to compile the function.
  const result = nativeCompileFunction(...);

  if (options.importModuleDynamically !== undefined) {
    // Use a utility to wrap the user-provided callback with error-handling.
    const wrapped = importModuleDynamicallyWrap(options.importModuleDynamically);
    const func = result.function;

    // Create a reference from the cacheKey (JS wrapper of the native
    // node::CompiledFnEntry object) to the wrapped callback with a closure
    // capturing the compiled function.
    callbackMap.set(result.cacheKey, {
      importModuleDynamically: (s, _k, i) => wrapped(s, func, i),
    });
  }
}

(For more explanation on how the C++ and JavaScript talk to each other in Node.js core, check out the documentation or this talk I gave at NodeConf EU).

When the user executed the compiled function and that ended up calling import(), V8 would invoke the global v8::HostImportModuleDynamicallyCallback configured by Node.js, passing in the numeric ID that Node.js associated with that function when it was compiled. Node.js then looked up the native env->id_to_function_map to get the node::CompiledFnEntry native object, followed it to get to the CompiledFnEntry JavaScript wrapper object, used that wrapper to look up the importModuleDynamically closure wrapping the user-provided callback in the aforementioned JavaScript WeakMap, and invoked it.
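
The lookup chain above can be sketched in plain JavaScript. All names here (idToFunctionMap, callbackMap, register, hostImportModuleDynamically) are illustrative stand-ins for the C++/JS internals, not actual Node.js API:

```javascript
// Illustrative sketch of the dispatch chain described above.
const idToFunctionMap = new Map();   // stands in for env->id_to_function_map
const callbackMap = new WeakMap();   // JS-land map keyed by the wrapper

// At compile time: wire up id -> entry -> wrapper -> user callback.
function register(id, wrapper, importModuleDynamically) {
  idToFunctionMap.set(id, { wrapper });                   // "native" side
  callbackMap.set(wrapper, { importModuleDynamically });  // JS side
}

// When import() fires, V8 hands back the id stored in the host-defined
// options, and Node.js walks the chain to reach the user callback.
function hostImportModuleDynamically(id, specifier) {
  const entry = idToFunctionMap.get(id);         // look up the native entry
  const hooks = callbackMap.get(entry.wrapper);  // wrapper -> user callback
  return hooks.importModuleDynamically(specifier);
}
```

Note how the chain crosses the native/JS boundary twice: the id-to-entry hop lives outside the V8 heap, which is exactly where the trouble starts.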

Consider this script:

const fn = require('vm').compileFunction(`import('fs')`, [], {
  importModuleDynamically(specifier) { return import(specifier); }
});

The memory graph keeping the importModuleDynamically callback (in the diagram let’s call it iMD for brevity) working looked like this:

Now, there was a problematic link in all this. It may not have been obvious when this was originally implemented, but if you take a step back and analyze the overall memory management, as I happened to do, it becomes easier to spot: the reference from the compiled JavaScript function to the node::CompiledFnEntry native object was expressed using an out-of-band V8 weak callback (denoted in the diagram as dashed lines with a diamond at the end), while the node::CompiledFnEntry itself was effectively kept alive by a strong v8::Global reference to the CompiledFnEntry JavaScript wrapper:

void CompiledFnEntry::WeakCallback(
    const WeakCallbackInfo<CompiledFnEntry>& data) {
  CompiledFnEntry* entry = data.GetParameter();
  delete entry;
}

CompiledFnEntry::CompiledFnEntry(Environment* env,
                                 Local<Object> object,
                                 uint32_t id,
                                 Local<Function> fn)
    // The `BaseObject` base class constructor creates a strong
    // `v8::Global` referencing `object`.
    : BaseObject(env, object), id_(id), fn_(env->isolate(), fn) {
  fn_.SetWeak(this, WeakCallback, v8::WeakCallbackType::kParameter);
}

Here Node.js tried to inform V8 that node::CompiledFnEntry should only be kept alive by the compiled JavaScript function. Once node::CompiledFnEntry got deleted by the weak callback, the strong v8::Global it used to keep the JavaScript CompiledFnEntry wrapper alive would be gone too. When the compiled function became unreachable in the JavaScript heap outside this cycle, the entire cycle should be considered unreachable and could be collected.

This was problematic because V8 only understands references expressed via pointers that can be discovered in the V8 heap (such as internal fields of a JS object, or key-value pairs in a JS map). In the eyes of V8, the weak callbacks are just native pointers to some code compiled by a C++ compiler, living outside of the V8 heap. V8 does not understand whatever lifetime relationship is expressed by the semantics of that C++ code. Meanwhile, it also saw that there was a strong v8::Global to the JavaScript CompiledFnEntry wrapper, and V8 did not get enough comprehensible hints about when the wrapper could be considered unreachable. As a result, only the node::Environment shutdown (which happens when the Node.js instance is shut down) would destroy the node::CompiledFnEntry. For users compiling many functions in a single process and expecting them to be garbage collected, this was effectively a leak.
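
To contrast, cycles whose every edge is visible in the heap are something V8 handles fine. A minimal sketch, unrelated to the vm internals, is an ephemeron cycle through a WeakMap:

```javascript
// A cycle V8 CAN collect: every edge is visible in the JS heap.
// key -> (WeakMap entry) -> value -> key is fully traceable, so once
// nothing outside holds `key`, the whole cycle becomes collectable.
const wm = new WeakMap();
let key = {};
wm.set(key, { backRef: key });  // ephemeron: entry lives only while key does
const traced = wm.get(key).backRef === key;  // reachable while we hold key
key = null;  // drop the last outside reference; the cycle can now be collected
```

The broken setup above was the opposite: one edge of the cycle was a native weak callback that the garbage collector could not trace.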

Prior work

Knowing that there were some proposed fixes, I became curious about them. My starting point was a summary of the existing memory management models, done to figure out how to migrate them to cppgc, so it seemed useful to know what the future memory management would look like and how that could be migrated to cppgc later. So I went back to the issue with the proposed fixes. At that time, two fixes were proposed:

  1. The first proposed fix argued that the compiled function could live for a very long time, so keeping it alive would keep the closure alive. Therefore, to prevent a memory leak, we should exclude it from the closure - even though that would be a breaking change by removing the second parameter (the referrer) of the importModuleDynamically callback, and would lead to a mismatch with the hook specified by ECMA262.
  2. The second proposed fix pointed out that if V8 allowed arbitrary objects (instead of just primitives) as host-defined options (which was being implemented by the V8 CL it was waiting for), we could just use the CompiledFnEntry wrapper as a host-defined option and get rid of env->id_to_function_map, and this simplification should be enough to fix the leak.

After some investigation my conclusion was:

  1. The compiled function being alive was less of an issue. As long as the cycle was discoverable by V8, when the function became the only thing keeping the rest of the graph alive and when no other user JavaScript held the function alive (e.g. when they invoked it and only made use of the returned result), V8’s garbage collector should be capable of just collecting the entire cycle.
  2. The second fix was close to the root cause but the proof-of-concept PR didn’t actually remove the most problematic link - the v8::Global reference holding the CompiledFnEntry wrapper alive. So when I tested it locally, the leak still persisted.

First attempted fix

At this point, it looked like there could be an easy alternative fix: we just needed to find a way to make this v8::Global reference weak, and use a different reference that V8 could actually understand to make sure that as long as the import() calls in the compiled function could still be invoked, the CompiledFnEntry wrapper would be kept alive. This meant that we needed to find an object that stays alive as long as the compiled function can be executed, and set up a GC-discoverable reference from it to the CompiledFnEntry wrapper.

To me it seemed natural that this object should be the compiled function itself. After all, when the import() call inside the function can still be invoked, it must mean that someone is still holding on to the compiled function to execute that import() call, right? So the solution seemed simple - just store the CompiledFnEntry wrapper as a private symbol property - which could be discovered by V8’s garbage collector - in the compiled function, and we would be safe to make that v8::Global weak.
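
In JS terms, the attempted fix looked roughly like this. This is an illustrative analogy only: the real change set a v8::Private property on the C++ side, not a user-visible symbol property, and the names below are hypothetical:

```javascript
// Illustrative analogy of the attempted fix: attach the wrapper to the
// compiled function via a symbol-keyed property, so the GC gains a
// traceable fn -> wrapper edge. (The real fix used a v8::Private in C++.)
const kCompiledFnEntry = Symbol('compiledFnEntry');  // stand-in for a v8::Private
function attachEntry(fn, wrapper) {
  fn[kCompiledFnEntry] = wrapper;  // discoverable as an own property of fn
  return fn;
}
```

With that edge in place, the strong v8::Global on the wrapper could then safely be made weak.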

In my brain, the updated graph should look like this, and the cycle should be understood by V8 now (I marked the updates with a ★):

Regression

I opened a PR with my fix, and the CI seemed to be happy with it, so we landed it on the main branch. But as soon as it got released, we received reports of use-after-free crashes in user land, so we quickly reverted the fix and revisited it to find out what was wrong.

Recall our previous assumption:

If import() call inside the function can still be invoked, it must mean that someone is still holding on to the compiled function to execute that import() call, right?

It turned out that this assumption was incorrect. This only applied to top-level import() calls. For a nested import() call like this:

function getNested() {
  const fn = require('vm').compileFunction(
    `function nested() { return import('fs'); } return nested;`,
    [],
    {
      importModuleDynamically(specifier) { return import(specifier); }
    });
  return fn();
}

const nested = getNested();
nested(); // Calls import()

The memory management was actually more nuanced. It roughly looked like the diagram below. Specifically, look at the bottom-right corner, which is now corrected: the context of getNested(), which kept fn() alive, could be garbage collected independently from nested().

Here the function nested(), which was produced by an invocation of getNested() and could invoke import() inside fn(), did not keep the top-level fn() alive, unlike what we previously assumed, and simply lived outside the cycle that we tried to fix. If user code discarded references to the top-level fn() and only retained a reference to the inner nested(), V8 could garbage collect fn() too early, and along with it, the entire cycle, including the node::CompiledFnEntry in env->id_to_function_map. When import() was invoked from nested(), Node.js would attempt to look up that node::CompiledFnEntry in env->id_to_function_map, and, seeing that there was no entry corresponding to the given ID, it knew that there was a use-after-free and crashed with an assertion failure.

At this point, we had learned that the compiled function wasn't the right owner of the wrapper. Did we have a better alternative? After some digging, my conclusion was that it should just be whatever V8 internally used to store the host-defined options. V8 had been trying to move the host-defined options to different places in order to deal with other caching implications (which we'll talk about in a later post). At the time of the regression, for a compiled function, the owner of the host-defined options was the v8::internal::Script associated with the compiled function.

This storage site was completely encapsulated by V8 and was subject to change, so it seemed unwise to develop a fix specific to it. Well, why not just wait for the V8 CL that would allow arbitrary objects as host-defined options, again? At the end of the day, as long as import() could be called, V8 should give us back the host-defined options that we associated with the compilation result, per the API contract. Then we would not need to care about where the host-defined options were stored at all. So my conclusion was, once again, to wait for the V8 CL - even though it needed quite a bit of internal refactoring in V8 to work, so it would take a while.
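
With object host-defined options, the whole ID indirection disappears, because the wrapper travels with the compilation result itself. Roughly, in JS terms (all names hypothetical):

```javascript
// Illustrative: once V8 can pass an object back as the host-defined
// options, the wrapper itself is handed straight to the host callback
// and the global ID map becomes unnecessary.
const callbackMap = new WeakMap();

function onCompile(wrapper, importModuleDynamically) {
  callbackMap.set(wrapper, { importModuleDynamically });
  return wrapper;  // handed to V8 as the host-defined options
}

function hostCallback(hostDefinedOptions, specifier) {
  // hostDefinedOptions IS the wrapper we registered at compile time
  const hooks = callbackMap.get(hostDefinedOptions);
  return hooks.importModuleDynamically(specifier);
}
```

Since V8 owns the wrapper's reachability through wherever it keeps the host-defined options, the embedder no longer has to guess at that storage site.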

Another idea

A few months later, as I moved along the path of the cppgc migration and looked into migration strategies for all kinds of Node.js embedder objects, I set my eyes on CompiledFnEntry again. This time, another fix occurred to me, which I'll talk about in the next post.