Like many other relatively big pieces of software, Node.js is no stranger to memory leaks, and with them, fixes and regression tests. Testing against memory leak regressions, however, can be particularly tricky in a runtime with a garbage-collected heap, and quite a few of these tests became a source of flakes in the Node.js CI. In the past few months, I’ve been doing some work to improve the reliability of these tests. I’ve also come across a few bug reports about memory leaks in the Node.js issue tracker that turned out to be false alarms because the reproductions made incorrect assumptions. Here are my notes about the testing strategies Node.js uses against memory leak regressions, my observations about them, and why I added a new testing strategy with a new V8 API. Hopefully this can help readers write more reliable memory regression tests and memory leak reproductions.

First, let’s look at one of the earliest strategies used by Node.js to test against memory leaks, which is based on memory usage measurements. This probably is a result of receiving bug reports from users who found out about the leaks via memory usage monitoring in production. Naturally, their reproductions involved memory measurements, and those then went into the test suites.

Measuring heap usage + gc()

This strategy is based on the assumption that gc() (a global function exposed by the --expose-gc V8 flag) should be able to reclaim memory used by objects that are already unreachable. If the tested operation leaks, the memory usage would not go down after gc(), which indicates a leak.

Take this test (which flaked and was revised to use another strategy we’ll talk about later) for example:

const { ok } = require('assert');
const { subscribe, unsubscribe } = require('diagnostics_channel');

function noop() {}

const heapUsedBefore = process.memoryUsage().heapUsed;

for (let i = 0; i < 1000; i++) {
  subscribe(String(i), noop);
  unsubscribe(String(i), noop);
}

global.gc();

const heapUsedAfter = process.memoryUsage().heapUsed;

ok(heapUsedBefore >= heapUsedAfter);

The testing procedure is basically:

  1. Measure the memory usage of the heap before allocation starts. In this case the value of heapUsed comes from v8::HeapStatistics::used_heap_size() and the statistics come from v8::Isolate::GetHeapStatistics().
  2. Do the operation that could be leaking (and to avoid false negatives, allocate multiple times to use a significant amount of memory)
  3. Run gc() and then measure memory usage of the heap again
  4. If the memory usage does not go down, there is a leak; otherwise, there is no leak.

There are several issues that can make the test unreliable, one of which is assuming that gc() would reclaim enough unreachable memory immediately after it returns. But that’s not actually how gc() works. The GC tasks that bring the actual memory usage down can be delayed until the thread is idle, i.e. not executing JavaScript (or, one could say, gc() is conceptually asynchronous).

gc() multiple times asynchronously

To deal with the delayed effect of gc(), Node.js core’s test suite has a utility which runs the gc() function up to 10 times via setImmediate() until a condition becomes true. setImmediate() is chosen because its callback is run in the next iteration of the event loop. By that time, the thread has already finished executing the JavaScript on the stack and has likely processed some GC tasks.

function gcUntil(name, condition) {
  return new Promise((resolve, reject) => {
    let count = 0;

    function gcAndCheck() {
      setImmediate(() => {
        count++;
        global.gc();
        if (condition()) {
          resolve();
        } else if (count < 10) {
          gcAndCheck();
        } else {
          reject(name);
        }
      });
    }

    gcAndCheck();
  });
}

So instead of doing steps 3 and 4 mentioned above:

  1. Run gc() and then measure memory usage of the heap again
  2. If the memory usage does not go down, it leaks, otherwise there is no leak.

We do

  1. Run gc(), then measure memory usage again after the current JavaScript execution completes & pending GC tasks are run.
  2. If the usage does not go down, run again; repeat this for up to 10 times. If the memory usage does not go down (enough) within the 10 attempts, there is a leak; otherwise, there is no leak.

So the test above would’ve been updated to something like this:

const { subscribe, unsubscribe } = require('diagnostics_channel');

function noop() {}

async function main() {
  const heapUsedBefore = process.memoryUsage().heapUsed;

  for (let i = 0; i < 1000; i++) {
    subscribe(String(i), noop);
    unsubscribe(String(i), noop);
  }

  await gcUntil('heap usage should go down', () => {
    const heapUsedAfter = process.memoryUsage().heapUsed;
    return heapUsedBefore >= heapUsedAfter;
  });
}

main();

That was not what this test ended up looking like eventually, however, because this pattern was still not reliable enough for this particular case. The only remaining example in Node.js core’s test suite that uses this pattern looks like the one below. It measures RSS (resident set size) because the leak being tested came from the native side, and it checks whether the memory overhead goes away by comparing the measurement against a multiplier that looked reasonable in local runs - so this is a pretty sketchy test, but it does the job and has not flaked enough to be updated:


const v8 = require('v8');

const before = process.memoryUsage.rss();

for (let i = 0; i < 1000000; i++) {
  v8.serialize('');
}

async function main() {
  await gcUntil('RSS should go down', () => {
    const after = process.memoryUsage.rss();
    return after < before * 10;
  });
}

main();

This test has worked reliably in the CI so far, but do note that it still relies on a shaky assumption - that if the native memory can be reclaimed by the OS, process.memoryUsage.rss() should eventually go down. Resident set size is the amount of physical memory allocated to the process. You might assume that, as long as the allocated memory is released, it would drop immediately - but that’s not actually the case. It is mostly up to the memory allocator in use to decide when to actually return the memory to the system.

Sometimes, there can be a significant amount of fragmentation, and if the system is not under memory pressure anyway, the memory allocator may consider it too expensive to defragment and return the unused memory to the OS, and would rather keep it around in case the process needs it again. That happens quite a lot with recent versions of glibc, for example. When that happens, detecting memory leaks based on whether the resident set size goes down can produce false positives too. The same can be said about tests based on heapUsed. To address this issue, we can put V8 under a bit more memory pressure and encourage it to reclaim more memory.
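The effect can be sketched like this (this is an illustration rather than a reliable test - the exact behavior depends on the platform and the allocator in use):

```javascript
function rssMiB() {
  return process.memoryUsage.rss() / (1024 * 1024);
}

const before = rssMiB();
// Allocate 64 MiB outside the V8 heap and touch every page
// so that they actually become resident.
let big = Buffer.allocUnsafe(64 * 1024 * 1024);
big.fill(1);
const during = rssMiB();
big = null; // Now unreachable, but the allocator may keep the pages resident.

setImmediate(() => {
  // RSS may or may not have gone back down here - it's up to the allocator.
  console.log(`before=${before.toFixed(1)} MiB, during=${during.toFixed(1)} MiB`);
});
```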

Small heap + pressure test for OOM failure

This is probably one of the most used strategies in Node.js for memory leak testing, though it is getting increasingly unreliable with V8 updates that contain major GC changes.

(If you are not familiar with the design of the V8 garbage collector and the generation layout, check out this blog post from V8).

The idea is essentially:

  1. Set the maximum heap size to a relatively small value.
    • With the default configuration, the V8 heap used by a minimal Node.js instance is around 3-4 MB.
    • Typically, tests that use this strategy limit the size of the old space to 16-20 MB (when there are leaks, the leaking objects and the graphs they retain usually end up in the old space).
  2. Repeat the tested operation and make sure that its total memory consumption is significantly higher than the heap size limit.
    • To make the test run relatively fast, the heap size set in step 1 is usually small, so that the test can reach the limit quickly by running fewer operations.
  3. If the test crashes with an Out-Of-Memory failure, it means that the tested operation leaves behind a reachable graph that V8’s GC cannot purge memory from even under pressure, which indicates a leak. Otherwise there is likely no leak.

An example using this strategy roughly looks like this:

// Flags: --experimental-shadow-realm --max-old-space-size=20

for (let i = 0; i < 100; i++) {
  const realm = new ShadowRealm();
  realm.evaluate('new TextEncoder(); 1;');
}

There is another issue caused by the way V8’s GC works here. Oftentimes, step 2 is done in a tight loop, as in the example above. In V8, garbage collection of the old generation is designed to kick in when the JS execution thread is idle, to avoid hurting the performance of JS execution. It has been observed that allocating memory in a tight loop can leave very little room for the GC to kick in, leading to flaky tests.

Pressure test for OOM failure with room for GC

To give V8’s GC a bit of room to kick in and avoid false positives, another utility is introduced.

const wait = require('timers/promises').setTimeout;

// Repeat an operation and give GC some breathing room at every iteration.
async function runAndBreathe(fn, repeat, waitTime = 20) {
  for (let i = 0; i < repeat; i++) {
    await fn();
    await wait(waitTime);
  }
}

The updated test looks like this:

// Flags: --experimental-shadow-realm --max-old-space-size=20
'use strict';

runAndBreathe(() => {
  const realm = new ShadowRealm();
  realm.evaluate('new TextEncoder(); 1;');
}, 100);

Here we use setTimeout() to give the GC sufficient time to kick in. This makes the test run slightly slower, but it is still acceptable, and the updated test has been stable enough in the CI.

There is another caveat I’ve observed with this approach: once V8 native coverage collection - specifically, precise coverage collection, which tracks invocation counts - is enabled (e.g. via NODE_V8_COVERAGE), the feedback vectors in newly compiled code can live longer than usual, since V8 needs them to track the invocation counts. If the repeated operation involves compiling new code, the heap size limit chosen in step 1 must be big enough to account for this overhead, or the test can still go out of memory even if the tested operation produces a graph that’s ultimately collectable.
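One way a test could cope with this, sketched below with a hypothetical guard (this helper is illustrative, not part of Node.js core’s test utilities), is to detect coverage collection and bail out of the strict heap-limit check:

```javascript
// Hypothetical guard: skip a heap-limit-based leak test when precise
// coverage collection is enabled, since the feedback vectors retained
// for invocation counts would skew the memory pressure.
function shouldSkipLeakTest() {
  return Boolean(process.env.NODE_V8_COVERAGE);
}

if (shouldSkipLeakTest()) {
  console.log('Skipping: NODE_V8_COVERAGE inflates heap usage');
}
```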

Next: finalizer-based testing

As it turns out, testing against memory leaks using memory usage measurements can sometimes be quite tricky. In the next post, I will talk about a different strategy used by Node.js for testing against memory leaks.