Reproducible Node.js built-in snapshots, part 2 - V8 code cache and snapshot blobs

In the previous post, we covered how the Node.js built-in snapshot is generated and embedded into the executable, and how I fixed the Node.js bits of the snapshot to make the executable more reproducible. Now we get to the harder part - the binary V8 startup snapshot blob and the code cache were still not reproducible after the aforementioned fix, so it’s time to dig into V8.

Important V8 flags for reproducible snapshot & code cache

To make the V8 startup snapshot and the code cache reproducible, off the top of my head I knew there were at least two V8 flags that needed to be set:

A fixed --random_seed (the value can be arbitrary as long as it’s fixed, for example --random_seed=42). In V8, many JavaScript objects (e.g. strings, maps) are built upon seeded hashes, and by default the seed is randomly chosen at startup, so the hashes vary from run to run. Fixing --random_seed makes the hashes reproducible in snapshots. Note that at snapshot deserialization time, V8 still chooses a new hash seed and rehashes all relevant fields to avoid hash flooding attacks.
--predictable, which guarantees predictable GC schedules and compilation.
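
To illustrate why a fixed seed matters, here is a minimal sketch of a seeded string hash. It is not V8's actual hash function: it uses FNV-1a with the seed folded into the initial state, and seededHash is a hypothetical name. It only shows that the same input hashes identically under the same seed and differently under another seed.

```javascript
// Illustration only: a seeded FNV-1a hash, NOT V8's real string hash.
function seededHash(str, seed) {
  // Fold the seed into the FNV offset basis, kept as an unsigned 32-bit value.
  let hash = (0x811c9dc5 ^ seed) >>> 0;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    // Multiply by the FNV prime with 32-bit wraparound.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Same seed, same input: the hash is stable from call to call.
console.log(seededHash('process', 42) === seededHash('process', 42)); // true
// Different seed, same input: the hash changes, so anything laid out
// by hash (e.g. a hash table) would be serialized differently.
console.log(seededHash('process', 42) === seededHash('process', 43)); // false
```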

The first thing that I noticed was that the snapshot generation code in Node.js only set --random_seed but not --predictable. After adding this flag to the path used by node_mksnapshot, the variability in the V8 startup snapshot and the code cache was greatly reduced. But then I also noticed that there was a startup performance regression with this change.

To figure out why the performance regression happened, I added a few more logs to --profile-deserialization. It turned out that V8 has a flag matching check for the code cache. If the V8 flags used to compile the code cache are different from the ones used to deserialize it, V8 refuses to use the provided code cache, unless the flags are in a set that is known to be safe to ignore. --predictable wasn’t in that ignore set, so when the Node.js internal JavaScript code was compiled with --predictable during the snapshot/code cache generation process, but deserialized without --predictable at runtime, there would be cache misses and performance regressions.
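
The effect of such a check can be sketched as follows. This is a conceptual illustration, not V8's implementation: flagHash, canUseCodeCache, the use of SHA-256 and the flag --some-codegen-flag are all hypothetical, and the ignore set is passed in explicitly so the before/after behavior is visible.

```javascript
// Illustration only: hypothetical names, not V8's actual flag check.
const crypto = require('crypto');

// Hash only the flags that are NOT in the ignore set.
function flagHash(flags, ignored) {
  const relevant = flags
    .filter((flag) => !ignored.has(flag.split('=')[0]))
    .sort();
  return crypto.createHash('sha256').update(relevant.join('\n')).digest('hex');
}

// The cache is accepted only when both sides agree on the relevant flags.
function canUseCodeCache(buildFlags, runtimeFlags, ignored) {
  return flagHash(buildFlags, ignored) === flagHash(runtimeFlags, ignored);
}

const buildFlags = ['--some-codegen-flag', '--predictable'];
const runtimeFlags = ['--some-codegen-flag'];

// --predictable counts as a mismatch: the cache is rejected.
console.log(canUseCodeCache(buildFlags, runtimeFlags, new Set())); // false
// With --predictable in the ignore set, the cache is accepted.
console.log(canUseCodeCache(buildFlags, runtimeFlags, new Set(['--predictable']))); // true
```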

--predictable and the flags it implies don’t really affect the validity of the code cache, so the right solution here should be adding it to the ignore set of the flag check. I upstreamed another patch to V8 to do this, and the performance regression went away.

V8 startup snapshot blob structure

After fixing the V8 flags, the code cache became reproducible and the remaining moving bits were all in the v8_snapshot_blob_data. This was a lot harder to fix since I needed to dissect a binary blob to figure out where the moving bits came from, and there weren’t many tools to facilitate this process.

At the time of writing, the V8 startup snapshot blob roughly has the following layout (see comment in src/snapshot/snapshot.cc):

[  uint32  ] number of contexts N
[  uint32  ] rehashability
[  uint32  ] checksum
[  uint32  ] read-only snapshot checksum
[ 64 bytes ] version string
[  uint32  ] offset to readonly snapshot
[  uint32  ] offset to shared heap snapshot
[  uint32  ] offset to context 0 snapshot
[  uint32  ] offset to context 1 snapshot
...
[  uint32  ] offset to context N-1 snapshot
------ HEADER ----
[   ...    ] startup (isolate) snapshot data
[   ...    ] read-only snapshot data (offset recorded in the header)
[   ...    ] shared heap snapshot data (offset recorded in the header)
[   ...    ] context 0 snapshot data (offset recorded in the header)
[   ...    ] context 1 snapshot data (offset recorded in the header)
...
[   ...    ] context N-1 snapshot data (offset recorded in the header)
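
As a rough sketch, the header listed above could be read from a Buffer like this. It assumes little-endian uint32 fields and exactly the field order shown in the listing; parseSnapshotHeader is a hypothetical helper, and the layout may differ between V8 versions.

```javascript
// Sketch: parse the header as laid out in the listing above.
// Assumes little-endian uint32 fields; not an official V8 or Node.js API.
function parseSnapshotHeader(blob) {
  let pos = 0;
  const readU32 = () => {
    const value = blob.readUInt32LE(pos);
    pos += 4;
    return value;
  };

  const numContexts = readU32();
  const rehashability = readU32();
  const checksum = readU32();
  const readOnlyChecksum = readU32();

  // 64-byte version string, padded with NUL bytes.
  const version = blob.toString('latin1', pos, pos + 64).replace(/\0+$/, '');
  pos += 64;

  const readOnlyOffset = readU32();
  const sharedHeapOffset = readU32();
  const contextOffsets = [];
  for (let i = 0; i < numContexts; i++) contextOffsets.push(readU32());

  // `pos` now points at the startup (isolate) snapshot data.
  return {
    numContexts, rehashability, checksum, readOnlyChecksum, version,
    readOnlyOffset, sharedHeapOffset, contextOffsets, headerSize: pos,
  };
}
```

With the offsets in hand, any byte position in the blob can be attributed to the isolate, read-only, shared heap or one of the context snapshots.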

In the case of Node.js, there are 4 context snapshots for different kinds of contexts that can be created in Node.js:

The default context without anything Node.js specific
The underlying V8 context of vm.Context
Main context of worker threads
Main context of the main (non-Worker) Node.js thread

Making the V8 startup snapshot diff easier to read

Notice that in the summary of node_snapshot.cc above, the v8_snapshot_blob_data was generated as a static octal string literal. Previously similar data had been written as static const char v8_snapshot_blob_data[] (as a very old blog post of mine described). Long array literals like this could be very slow to compile with some compilers such as GCC and Clang, so Keyhan Vakil contributed a patch to encode them as string literals to speed up the compilation. For diffing the snapshots though, array literals would still be a bit easier to diff than string literals, so I added another configure switch to optionally write them as array literals with some comments annotating the offsets.

$ ./configure --write-snapshot-as-array-literals
$ make V=
$ mv out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc
$ make V=
$ diff out/Release/obj/gen/node_snapshot.cc ./node_snapshot.cc

Then I would get diff output that looked like this:

9c9
< static const char v8_snapshot_blob_data[] = {4,0,0,0,1,0,0,0,52,90,-119,-9,49,49,46,51,46,50,52,52,46,56,45,110,111,100,101,46,49,52,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  // 0
---
> static const char v8_snapshot_blob_data[] = {4,0,0,0,1,0,0,0,-125,90,-8,1,49,49,46,51,46,50,52,52,46,56,45,110,111,100,101,46,49,52,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  // 0
13318c13318
< 16,75,98,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,44,3,-39,3,-128,93,68,102,0,0,0,0,0,0,0,0,49,72,-16,-115,-31,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  // 13309
---
> 16,75,98,0,0,0,0,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,44,3,-39,3,-128,93,68,102,0,0,0,0,0,0,0,0,-55,17,-16,-115,-31,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,  // 13309
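
The signed chars in the first hunk can be decoded to see what actually changed. The sketch below copies the leading bytes of both lines from the diff above and reads them as little-endian fields. In this particular dump the version string evidently starts at byte 12, right after three uint32 fields, so the offsets are taken from the data itself rather than from the layout listing; decodeHeader is a hypothetical helper.

```javascript
// Version string bytes shared by both lines of the diff, plus one NUL of padding.
const versionBytes = [49, 49, 46, 51, 46, 50, 52, 52, 46, 56, 45,
                      110, 111, 100, 101, 46, 49, 52, 0];
// Leading bytes of the "<" and ">" lines (signed chars, as written in the file).
const before = [4, 0, 0, 0, 1, 0, 0, 0, 52, 90, -119, -9, ...versionBytes];
const after = [4, 0, 0, 0, 1, 0, 0, 0, -125, 90, -8, 1, ...versionBytes];

function decodeHeader(signedBytes) {
  // Int8Array stores the signed values; the Buffer views the same bytes as unsigned.
  const buf = Buffer.from(Int8Array.from(signedBytes).buffer);
  const versionEnd = buf.indexOf(0, 12); // first NUL after the version starts
  return {
    numContexts: buf.readUInt32LE(0),
    rehashability: buf.readUInt32LE(4),
    checksum: '0x' + buf.readUInt32LE(8).toString(16).padStart(8, '0'),
    version: buf.toString('latin1', 12, versionEnd),
  };
}

console.log(decodeHeader(before));
// { numContexts: 4, rehashability: 1, checksum: '0xf7895a34', version: '11.3.244.8-node.14' }
console.log(decodeHeader(after));
// { numContexts: 4, rehashability: 1, checksum: '0x01f85a83', version: '11.3.244.8-node.14' }
```

Only the checksum differs in this hunk, which is expected as long as any other byte in the snapshot still moves.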

Later I realized that since we had also implemented --build-snapshot as a feature for Node.js to generate a snapshot from user scripts at runtime, we could just add a special script ID (I made it node:generate_default_snapshot) that Node.js recognizes, so that it generates the snapshot using the same set of initialization scripts run by the build process instead of running a user-provided script. Then the snapshot can be diffed like this:

const child_process = require('child_process');
const fs = require('fs');

function generateSnapshot() {
  child_process.spawnSync(
    process.execPath,
    [
      '--random_seed=42',
      '--predictable',
      '--build-snapshot',
      'node:generate_default_snapshot',
    ],
    {
      env: { ...process.env, NODE_DEBUG_NATIVE: 'SNAPSHOT_SERDES' },
    });
  return fs.readFileSync('./snapshot.blob');
}

const snapshot1 = generateSnapshot();
const snapshot2 = generateSnapshot();

// Diff snapshot1 and snapshot2 with code

So I later added this special default snapshot generation switch and a test in the core test suite to detect reproducibility regressions in the CI.
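
The "diff with code" step could look roughly like the following. This is a minimal sketch, not the actual test in the Node.js core suite; diffRanges is a hypothetical helper that reports the contiguous byte ranges where two buffers differ.

```javascript
// Sketch: report contiguous [start, end) byte ranges that differ.
function diffRanges(a, b) {
  const ranges = [];
  const length = Math.min(a.length, b.length);
  let start = -1;
  for (let i = 0; i < length; i++) {
    if (a[i] !== b[i]) {
      if (start === -1) start = i; // a differing run begins
    } else if (start !== -1) {
      ranges.push({ start, end: i }); // the run ended at i (exclusive)
      start = -1;
    }
  }
  if (start !== -1) ranges.push({ start, end: length });
  // Any trailing bytes present in only one buffer count as a difference.
  if (a.length !== b.length) {
    ranges.push({ start: length, end: Math.max(a.length, b.length) });
  }
  return ranges;
}

const left = Buffer.from([1, 2, 3, 4, 5]);
const right = Buffer.from([1, 9, 9, 4, 6]);
for (const { start, end } of diffRanges(left, right)) {
  console.log(`0x${start.toString(16)}...0x${end.toString(16)}`);
}
// 0x1...0x3
// 0x4...0x5
```

An empty result means the two snapshots are byte-for-byte identical, which is what a reproducibility test would assert.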

Identifying moving bits in the V8 startup snapshot

With some better diffing tools at my disposal, the process of fixing the moving bits went like this:

Find the offset of the next moving bits (skipping the checksum in the snapshot header - it would only stop moving when the rest of the snapshot becomes fixed, anyway) in the V8 startup snapshot blob.
Compare the offset with the offsets recorded in the V8 startup snapshot header to figure out which part of the snapshot it was in. In my investigation they all came from the context snapshot 3 i.e. the main Node.js context.
Set some breakpoints in the v8::internal::SnapshotByteSink::Put* methods in a debugger so that they are hit when the snapshot serialization code is writing to the target offset, then go up to the caller frames (usually somewhere in ContextSerializer::SerializeObjectImpl()) and print out the object being serialized in order to figure out what it is.

In a debug build there is also a Serializer::PrintStack() method available to print out the whole reference stack from within the snapshot serializers, although during my investigation navigating the selected frames was already good enough for me.

Knowing what object was changing during snapshot generation, go back to the Node.js/V8 internals to get them fixed.
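
The second step, mapping an offset to the part of the snapshot it belongs to, can be sketched as a simple lookup. findSection and the shape of the sections list are hypothetical, and the offsets in the example are made up for illustration.

```javascript
// Sketch: map an offset within the V8 blob to the section containing it.
// `sections` is a hypothetical list of { name, offset } taken from the header.
function findSection(sections, offset) {
  const sorted = [...sections].sort((a, b) => a.offset - b.offset);
  let found = null;
  for (const section of sorted) {
    if (section.offset > offset) break;
    found = section; // last section starting at or before `offset`
  }
  // null means the offset precedes every recorded section
  // (i.e. it is in the header or the startup snapshot data).
  return found && { name: found.name, relativeOffset: offset - found.offset };
}

// Made-up offsets, for illustration only.
const sections = [
  { name: 'read-only', offset: 100 },
  { name: 'context 0', offset: 200 },
  { name: 'context 1', offset: 300 },
];
console.log(findSection(sections, 250)); // { name: 'context 0', relativeOffset: 50 }
console.log(findSection(sections, 50));  // null
```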

For example, from the logs enabled by SNAPSHOT_SERDES and some additional print statements I put locally in V8, I could tell that:

The V8 startup snapshot started from 0x3a in the whole Node.js snapshot data, with the first 4 bytes being the size of the V8 startup snapshot
From within the V8 snapshot blob, 0x1016e4...0x1818a4 contained context #3
The first non-checksum diff started from 0x1472fe (from the whole snapshot)

Then context #3 started from 0x1016e4 + 0x3a + 4 = 0x101722 in the entire snapshot blob, and the diff started from 0x1472fe - (0x1016e4 + 0x3a + 4) = 0x45bdc in the snapshot byte sink of context #3.

....Node.js data in the snapshot....
[   0x3a   ] V8 snapshot blob size
[   0x3e   ] number of contexts (4)
...
[   0x9a   ] offset to context 3 snapshot (0x1016e4)
....
[ 0x101722 ] context 3 snapshot data
...
[ 0x1472fe ] diff starts (0x45bdc from within the context 3 snapshot)
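
The offset arithmetic above can be double-checked with a few lines of JavaScript. The constant names are mine; the values are the ones quoted from the logs.

```javascript
// Values quoted above; the names are only for illustration.
const V8_SIZE_FIELD_AT = 0x3a;    // where the 4-byte V8 blob size is stored
const SIZE_FIELD_BYTES = 4;
const CONTEXT3_OFFSET = 0x1016e4; // offset of context 3 within the V8 blob
const FIRST_DIFF_AT = 0x1472fe;   // first non-checksum diff, whole snapshot

// The V8 blob itself begins right after the size field.
const v8BlobStart = V8_SIZE_FIELD_AT + SIZE_FIELD_BYTES;
// Context 3 in terms of the whole Node.js snapshot.
const context3Start = v8BlobStart + CONTEXT3_OFFSET;
// Position of the diff relative to the start of context 3.
const diffInContext3 = FIRST_DIFF_AT - context3Start;

const hex = (n) => '0x' + n.toString(16);
console.log(hex(v8BlobStart));    // 0x3e
console.log(hex(context3Start));  // 0x101722
console.log(hex(diffInContext3)); // 0x45bdc
```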

I could just add some code in v8::internal::SnapshotByteSink::Put* methods so that my breakpoint would hit when data_.size() was around 0x45bdc, and debug from there.

Up next: fixing the V8 startup snapshot

In the next post, I’ll cover how I used the steps mentioned above to fix the V8 startup snapshot data in the Node.js built-in snapshot.