Fixing Node.js vm APIs, part 4 - hitting the compilation cache again
In the last post, I wrote about how I finally managed to fix the memory problems in the vm APIs, and it turned out that there was another issue blocking users from upgrading away from the End-of-Life Node.js v16: vm.Script compilation could be ~100x slower than it was in v16 in some cases. In this post, let's look at what was causing the performance regression, and the fix/workaround developed for it.
The performance regression
The regression could be reproduced with the snippet below. It should be run in the Node.js source directory, where test/fixtures/snapshot/typescript.js contains a build of the TypeScript library that is about 10MB in size - we need a giant JavaScript file to make the regression really show:
```js
const { readFileSync } = require('fs');
const vm = require('vm');

const code = readFileSync('./test/fixtures/snapshot/typescript.js', 'utf8');

console.time('compilation');
for (let i = 0; i < 100; i++) {
  new vm.Script(code);
}
console.timeEnd('compilation');
```
On Node.js versions <= 16.10.0, compiling this big file 100 times took about 56ms on a beefy laptop, while on Node.js versions >= 16.11.0, it took about 5600ms - so it appeared that there was a ~100x performance regression in repeated vm.Script compilation.
Analysis of the regression
As mentioned in the previous posts, compilation via vm.Script basically comes down to compiling a v8::UnboundScript. This usually involves parsing the given source code and generating bytecode for the top-level JavaScript code (bytecode for inner functions is usually only generated when the inner functions are actually invoked). As fast as V8's parser and bytecode generator can be, compiling a 10MB script 100 times for real would definitely take more than 56ms. Then why was it so fast prior to v16.11.0? In the example above, V8 used to only compile the script for real when vm.Script was invoked for the first time (which took about 54ms). The other 99 compilations were not actually done from scratch, and they only took about 2ms in total. So technically the example above looked ~100x (or, to be precise, 99x) slower because for the 99 subsequent compilations, we were essentially comparing "compiling 99 scripts for real" vs. "99 hash table lookups". If we increased the iteration count to n, it would even appear to be n - 1 times slower.
Now let's look at how the compilation cache works. When V8 successfully compiles a script, it stores the corresponding v8::internal::SharedFunctionInfo inside a compilation cache table. v8::internal::SharedFunctionInfo is the internal equivalent of v8::UnboundScript and references the v8::internal::Script we mentioned in previous posts. The next time V8 is told to compile a script, it first looks up this table to see if there is already a compiled script with the same source code and the same metadata. The metadata includes, among other things, the host-defined options. If V8 can find a script with matching source and matching metadata in the cache, it just returns the already compiled script; otherwise it compiles a new script from scratch and puts it into the cache for later lookups.
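The effect of a cache hit is easy to observe from JavaScript. Here is a minimal sketch, reusing the fixture path from the reproduction above (the timings are only illustrative):

```js
const { readFileSync } = require('fs');
const vm = require('vm');

const code = readFileSync('./test/fixtures/snapshot/typescript.js', 'utf8');

console.time('first compilation');   // parses and generates bytecode for real
new vm.Script(code);
console.timeEnd('first compilation');

console.time('second compilation');  // on a cache hit, roughly a hash table lookup
new vm.Script(code);
console.timeEnd('second compilation');
```

On a version where the cache can be hit, the second compilation finishes in a tiny fraction of the time of the first; on the regressed versions, both take about the same time.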
Prior to v16.11.0, Node.js did not configure any host-defined options for vm.Script, so the compilation cache could be hit, and the 99 subsequent compilations reduced to just 99 hash table lookups, which, as we saw, could be incredibly fast. Later, the importModuleDynamically option was introduced in Node.js 16, and the implementation of it configured the host-defined options for every vm.Script even if users did not use importModuleDynamically at all. As a result, compiling the same source code using vm.Script no longer hit the compilation cache, so it became significantly slower, since V8 had to compile the scripts for real every time. In addition, without cache hits, V8 had to regenerate a script every time and put it in the cache, so the cache would fill up with scripts that had the same source code but slightly different host-defined options, until the scripts proved to be useless after 4 non-forced garbage collection cycles. This could lead to a significant waste of memory.
After the memory issues in vm were fixed, I got a lot of requests asking for help with this issue. Apparently this regression turned out to be critical for part of the Node.js ecosystem. Among others, Jest users were hit quite heavily by it, since Jest used vm.Script a lot and relied on the cache hits to stay performant. If we let this go, many users would be stuck with the now End-of-Life v16 for an indefinite amount of time. So it seemed urgent enough to develop a fix/workaround for it.
Working around the performance regression
Recall that I mentioned:

> Later, the importModuleDynamically option was introduced in Node.js 16, and the implementation of it configured the host-defined options for every vm.Script even if users did not use importModuleDynamically at all.
While it could be tricky to fix the cache miss when importModuleDynamically was actually used, that was not necessary to unblock most users in the upgrade, because importModuleDynamically was only part of an experimental API. The majority of the users complaining were not actually using that option. So at least we could first fix the regression for the case where importModuleDynamically was unused.
Now, after the memory fixes, we had switched to symbols as the host-defined options identifying different importModuleDynamically settings. It occurred to me that we could just use a constant symbol to indicate that importModuleDynamically was not configured by users at all. Then all the vm.Script compilations not using importModuleDynamically would have the same host-defined options, and could hit the compilation cache again.
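The gist of the idea looks roughly like this - a simplified sketch, not the actual Node.js internals, with names made up for illustration:

```js
// Simplified sketch of the idea - not the actual Node.js internals.
// Scripts compiled without importModuleDynamically all share one constant
// symbol as their host-defined option, so they can hit the compilation cache.
const kDefaultHostDefinedOption = Symbol('default-host-defined-option');

function getHostDefinedOption(importModuleDynamically) {
  if (importModuleDynamically === undefined) {
    // Same symbol every time -> same metadata -> cache hits become possible again.
    return kDefaultHostDefinedOption;
  }
  // A fresh symbol per script keeps different callbacks apart,
  // at the cost of a guaranteed cache miss.
  return Symbol('per-script-host-defined-option');
}
```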
So I opened another PR that implemented this idea, and it indeed made the regression go away for the simple reproduction above. But for Jest users, this was not enough, because Jest actually did use this callback: even when users did not use import() in their tests, Jest used it to emit an error saying that --experimental-vm-modules was not configured, to enhance the debugging experience. Since we thought it should be impossible to return anything useful from the importModuleDynamically callback when --experimental-vm-modules was not enabled, I added another constant symbol to implement this error internally, and made it a requirement for users to enable --experimental-vm-modules if they intend to use importModuleDynamically - this probably should've been done when importModuleDynamically was introduced as an experimental API anyway, and it was not too late to do it then.
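If I describe the resulting behavior correctly, it can be observed with a sketch like the following, run without --experimental-vm-modules (the exact error code is my understanding of the patch, so treat it as an assumption):

```js
// Run WITHOUT --experimental-vm-modules. The user-supplied callback is
// no longer invoked; dynamic import() in the script rejects instead.
const vm = require('vm');

const script = new vm.Script('import("fs")', {
  importModuleDynamically: () => {},  // only honored when the flag is set
});

script.runInThisContext().catch((err) => {
  // My understanding of the patched behavior (an assumption):
  console.log(err.code);  // ERR_VM_DYNAMIC_IMPORT_CALLBACK_MISSING_FLAG
});
```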
Releasing and verifying the patches
Now, with the memory issues and the performance regression gone, at least users who do not actually need import() support in the vm APIs can upgrade to newer versions of Node.js. The patches were backported to v21 and v20, and we got verification from Jest users that the memory consumption and test run time improved significantly after updating to the versions with the patches (1 2 3). So the really long Jest issue caused by the regressions could finally be closed.
It's not yet certain whether the fixes will be released in v18 - that is up to the discretion of the release team, because v18 is now in maintenance mode - though I've done the backport anyway.
What’s next?
As mentioned above, the patches for the performance regression were mostly workarounds. When --experimental-vm-modules goes out of experimental status, or when the flag and importModuleDynamically are actually used, the regression would come back again. There are a few things that could be done to fix the regression for more use cases:
- Recall that:

  > …we thought it should be impossible to return anything useful in the importModuleDynamically callback when --experimental-vm-modules was not enabled…

  This assumption turned out to be incorrect. Users could still do:

  ```js
  function importModuleDynamically(specifier) { return import(specifier); }
  ```

  to polyfill using the default loader in the main context, instead of using any of the vm APIs enabled by --experimental-vm-modules. In this case it seems unnecessarily strict to require the use of --experimental-vm-modules. One workaround would be to simply add a global option that implements this behavior, which is probably useful on its own, even without the cache implications. We can use another constant symbol for this global option, thus hitting the compilation cache with this constant. I've opened a work-in-progress PR for this.
- A per-script importModuleDynamically option doesn't really seem to be all that necessary for users. When I looked into the bug reports, it seemed that most users of importModuleDynamically tended to reuse one callback for many different scripts. So we could also look into an alternative API that allows users to customize dynamic import() handling for a group of scripts, thus allowing repeated compilation within the group to hit the compilation cache (see the user-land sketch after this list).
- While the regression in vm.Script could be worked around, vm.compileFunction() is still not hitting the cache. After some investigation, I realized that there is no compilation cache support for the underlying ScriptCompiler::CompileFunction() at all - vm.compileFunction() was a later invention in the V8 API, so this was probably just an unintentional omission. I uploaded a V8 CL to implement cache support for it.
- Ultimately, if V8 did not cache the entire v8::internal::SharedFunctionInfo and only cached the bytecode (which is the most expensive part to generate), it would not have to care about host-defined options anymore when hitting the cache. This also happens to be important for the "allowing arbitrary objects as host-defined options" patch that we were waiting for. This requires some internal refactoring in V8, which is being carried out by the V8 team (tracking issue). Once that is completed, how we manage host-defined options would no longer matter for caching, and the cache misses could be fixed automatically.
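To make the second idea above more concrete, here is a user-land sketch of the shape such a group API could take - ScriptGroup is purely hypothetical and does not exist in Node.js; it only illustrates the design:

```js
// A user-land sketch of the idea - ScriptGroup is NOT an actual Node.js API.
// One importModuleDynamically callback is shared by every script in the group;
// if an API like this were built into Node.js, the group could also share one
// set of host-defined options, letting repeated compilations hit the cache.
const vm = require('vm');

class ScriptGroup {
  constructor(importModuleDynamically) {
    this.importModuleDynamically = importModuleDynamically;
  }
  createScript(code, options = {}) {
    return new vm.Script(code, {
      ...options,
      importModuleDynamically: this.importModuleDynamically,
    });
  }
}

// Identical sources compiled in the same group share one callback, and ideally
// the second compilation would be a cache hit.
const group = new ScriptGroup((specifier) => import(specifier));
const a = group.createScript('globalThis.answer = 42;');
const b = group.createScript('globalThis.answer = 42;');
```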
Final remarks
This has been an interesting ride. I started out working on an internal refactoring and ended up fixing bugs that affected a substantial chunk of the ecosystem. Some lessons I've learned along the way:
- It's useful to learn about random things in the code bases you work on. Even if they don't seem to matter for what you are working on at the moment, they can come back and be critical for fixing other things in the future. This was what happened with my incidental knowledge of the WeakMap semantics, the Symbol-as-WeakMap-keys proposal, some garbage collection details, and how internal fields are stored in V8 embedder objects. I guess this is what people refer to as "experience". :)
- Don't get too obsessed with a perfect fix. If a bug is damaging enough that it should be fixed sooner rather than later, workarounds are fine. I sometimes tend to wait for a more theoretically sound solution instead of settling for a workaround in the meantime, which probably also contributed to the delay of the fixes while others looked into the issues, until v16 reached End-of-Life and the issues became more urgent.
Though I did not set out to fix the vm issues, I would not have had enough time to look into them without support from Bloomberg & Igalia, for which I am very grateful. For other work sponsored by Bloomberg that is related to Node.js, check out this post by Daniel Ehrenberg.