Teaching gdb to Unwind V8 JIT Frames on x64
Recently I landed a custom Python-gdb unwinder in V8 that allows gdb to unwind through V8’s JIT-compiled frames on x64. During the process, I ended up digging into gdb’s internals to figure out how to implement the unwinder properly. Here are some notes about it; hopefully they will be useful for others who want to implement something similar for their own JITs.
Background
A team I work with recently ran into trouble debugging V8 crashes in production. The backtraces they got out of the core dumps using gdb were not very useful - while the actual call stack might only be around a dozen frames deep, gdb would produce a huge number of ?? entries because it could not figure out how to walk through the frames generated by V8’s JIT compiler.
This was reproducible in any program embedding V8. Let’s use Node.js as an example:
```shell
$ echo "throw new Error" >> throw.js
```
This issue was specific to gdb - lldb was able to unwind through the JIT frames just fine. But in this case the system had to work with gdb, so I started looking into how to teach gdb to unwind these frames correctly.
How stack unwinding usually works
There are many good write-ups about how stack unwinding typically works, for example this one that I found very helpful.
For ahead-of-time compiled code (like Node.js/V8’s C++ code), the compiler typically emits some useful information in the binary to help debuggers unwind the stack. readelf --debug-dump=frames,frames-interp /path/to/binary will tell you what kind of unwinding information is present in the binary. Unless you strip your binaries or use the most aggressive optimization flags, usually there’s some information left that allows debuggers to unwind the stack of the C++ part.
But for JIT-compiled code, V8 does not, by default, emit any useful information or register the generated code with gdb’s symbol tables (which comes with a runtime overhead). gdb has a JIT compilation interface (gdbjit) that allows a JIT compiler to register its generated code with gdb at runtime. V8 has support for this interface, so that was the first thing I looked at, but it turned out to be non-viable for several reasons.
1. The gdbjit support in V8 was broken at the time.
2. It requires V8 to be built with gdbjit enabled and the process to be run with the --gdbjit flag at runtime, which has some overhead.

Point 2 in particular means it is not always practical for post-mortem diagnostics.
The Python unwinder
So we needed something that could work out of the box, without requiring any special build flags or runtime flags to have been set before the crash happened.
Fortunately, V8’s JIT compiler always preserves the rbp chain on x64 by emitting a prologue like this:
```asm
pushq %rbp
movq  %rsp, %rbp
```
You can think of it as - V8 always compiles code as if it had the -fno-omit-frame-pointer flag enabled, which is a common way to make unwinding easier for debuggers. So a prologue analysis that follows the rbp chain should be able to unwind through the JIT frames, and this is how lldb usually manages to unwind them. However, something seemed off with gdb’s built-in unwinders: they appeared to be taking stack addresses as return addresses when looking at the JIT frames.
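To make the rbp-chain idea concrete, here is a small standalone sketch (plain Python, not the gdb API; the memory contents and addresses are made up for illustration) that walks a chain of saved frame pointers and collects one return address per frame:

```python
# Toy simulation of frame-pointer unwinding. "memory" maps a stack
# address to the 8-byte word stored there; all addresses and values
# below are made up for illustration.
PTR_SIZE = 8

# Layout per frame: [rbp] = saved caller rbp, [rbp + 8] = return address.
memory = {
    0x7000: 0x7020,   # innermost frame: saved rbp of the caller
    0x7008: 0x11111,  # ...and its return address
    0x7020: 0x7040,   # middle frame
    0x7028: 0x22222,
    0x7040: 0x0,      # outermost frame: rbp chain terminates at 0
    0x7048: 0x33333,
}

def walk_rbp_chain(rbp):
    """Follow saved rbp values, yielding one return address per frame."""
    trace = []
    while rbp:
        trace.append(memory[rbp + PTR_SIZE])  # return address
        rbp = memory[rbp]                     # hop to the caller's frame
    return trace

print([hex(pc) for pc in walk_rbp_chain(0x7000)])
```

This is essentially the prologue analysis described above: as long as every frame saves the caller’s rbp at a known offset, the whole stack can be recovered from a single register value.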
Other than the heavier gdbjit interface, gdb also allows registering custom frame unwinders written against the Python unwinder API. There was an attempt a few years ago to add one to V8’s shared tools/gdbinit, but it was reverted because it broke step-debugging in gdb. I ended up revisiting the idea and tried a few different design choices. One seemed to work reasonably well and got merged - or at least, it has not yet been reverted again ;)
How to write a custom unwinder
Once the unwinder is registered, whenever gdb encounters a frame, it will call the registered custom unwinders to give them a chance to handle it. At a high level, a custom unwinder looks something like this:
```python
import gdb
from gdb.unwinder import Unwinder, FrameId

class V8Unwinder(Unwinder):
    def __init__(self):
        super().__init__("v8")

    def __call__(self, pending_frame):
        # Decide whether this looks like a frame we can handle
        # (how to do that is discussed below).
        if not self._is_v8_jit_frame(pending_frame):
            return None  # let gdb's other unwinders try
        # Compute the caller's frame ID and registers (also below).
        unwind_info = pending_frame.create_unwind_info(FrameId(sp, pc))
        unwind_info.add_saved_register("rip", return_address)
        unwind_info.add_saved_register("rbp", caller_rbp)
        return unwind_info

gdb.unwinder.register_unwinder(None, V8Unwinder(), replace=True)
```
Since V8 preserves the rbp chain in its JIT-compiled code on x64, we just need to figure out:
- Based on the value of rbp, how to find the caller’s frame and other register values, and pack them into the UnwindInfo object returned to gdb.
- How to detect whether a given frame is a V8 JIT frame that we can handle (vs a regular C++ frame that gdb can already handle on its own).
Figuring out the FrameId and registers
The trickiest part of writing this unwinder was figuring out what to pass to the FrameId constructor. The gdb documentation says the constructor takes a sp (stack pointer) and an optional pc (program counter), but it was rather ambiguous about what these values should actually be. I ended up having to read the gdb source itself to understand the semantics.
The frame-id.h header in gdb explains that a frame ID consists of a stack address and a code address, both of which must be stable across the lifetime of the frame. gdb uses this pair to uniquely identify frames and to determine the ordering of frames on the stack (i.e. which frame is the caller and which is the callee).
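As an aside, the ordering part is simple on x64: the stack grows toward lower addresses, so a callee’s stack address is numerically smaller than its caller’s. A trivial illustration (plain Python, made-up addresses, not gdb’s actual implementation):

```python
# On x64 the stack grows downward, so an inner (callee) frame has a
# numerically smaller stack address in its frame ID than its caller.
# Addresses are made up for illustration.
def is_inner_than(callee_sp, caller_sp):
    """True if a frame with stack address callee_sp is inner to caller_sp."""
    return callee_sp < caller_sp

callee_cfa = 0x7ffc_0000_1000
caller_cfa = 0x7ffc_0000_1040  # higher up the stack

print(is_inner_than(callee_cfa, caller_cfa))  # True
```

This is why the stack address in the frame ID has to be stable: if it drifted during the frame’s lifetime, gdb could mis-order frames.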
For the stack address, gdb’s own amd64 unwinder uses the CFA (Canonical Frame Address), which is defined as the value of rsp just before the call instruction in the caller. In gdb this was computed as rbp + 16. It took me a while to visualize it in my head. Let’s go back to this prologue again:
```asm
pushq %rbp
movq  %rsp, %rbp
```
So after the prologue, the stack layout looks like this:
```
rbp+16 <-- caller's rsp before `call`
rbp+8  <-- return address pushed by `call`
rbp    <-- saved rbp of the caller (current rbp points here)
```
If we trace back from the value of rbp in the current frame, we need to step through the saved rbp which occupies 8 bytes, then the return address which occupies another 8 bytes, and above that is where the caller’s rsp was pointing.
This looked about right, so I just followed the same convention and used rbp + 16 as the first argument of the frame ID. The previously reverted version used rbp for this, which would not work as well, because rbp can change in the middle of the function prologue (after pushq %rbp but before movq %rsp, %rbp), so it would not be stable for the entire lifetime of the frame.
For the code address, ideally one should use the start address of the current function. But since there are no symbols for JIT-compiled functions, there is no easy way to find it. I ended up using the return address (dereferenced from rbp + 8) as an approximation instead - it is not perfect (it points into the middle of the caller function rather than at the current function’s start), but gdb accepts it and it is sufficient for producing usable backtraces.
So the computation roughly looks like this:
```python
PTR_SIZE = 8

rbp = pending_frame.read_register("rbp")
cfa = rbp + 2 * PTR_SIZE  # caller's rsp just before `call`
# read_memory: a helper that reads a pointer-sized value
# from the inferior at the given address.
return_address = read_memory(rbp + PTR_SIZE)
caller_rbp = read_memory(rbp)
frame_id = FrameId(cfa, return_address)
```
Skipping non-JIT frames
I did some logging with gdb (set debug frame 1) to verify whether the unwinder was working correctly, and it turned out that for innocent C++ frames, the Python unwinder was being called before other unwinders that are more capable of finding the function start address. That can be another source of breakage in step-debugging, since gdb would then not be able to tell whether you are stepping into a C++ frame correctly. The earlier version checked whether the pc has source-level debug info, which turned out to be too eager a claim - even without debug info, gdb can still resolve the function name for C++ frames using e.g. the symbol tables, which also let gdb unwind those frames better than we can. So I used a more conservative check: if gdb can already symbolicate the frame somehow, we just skip it and let gdb’s built-in unwinders handle it:
```python
if pending_frame.name() is not None:
    # gdb can already symbolicate this frame; let its
    # built-in unwinders handle it.
    return None
```
After the fix
With the unwinder loaded, the commands produce a much more useful backtrace:
```shell
gdb -x /path/to/v8/tools/gdbinit --args node --abort-on-uncaught-exception throw.js
```
The V8 JIT frames still show up as ?? in terms of function names (it will take a lot more effort to symbolicate them), but at least now gdb can walk through them correctly and reach the C++ frames on the other side. This makes it clearer what’s causing the crash.
Since this is now merged, to use it, you can source V8’s tools/gdbinit (which now registers the unwinder) when opening a core dump on x64 Linux. For example, to print all the stack traces quickly:
```shell
gdb -q -x /path/to/v8/tools/gdbinit -ex 'thread apply all bt full' -ex 'quit' /path/to/executable /path/to/core
```