Recently I landed a custom Python gdb unwinder in V8 that allows gdb to unwind through V8’s JIT-compiled frames on x64. During the process, I ended up digging into gdb’s internals to figure out how to implement the unwinder properly. Here are some notes from that process; hopefully they will be useful for others who want to implement similar things for their own JITs.

Background

A team I work with recently ran into trouble debugging V8 crashes in production. The backtraces they got out of the core dumps using gdb were not very useful - while the actual call stack might only be around a dozen frames deep, gdb would produce a huge number of ?? entries because it could not figure out how to walk through the frames generated by V8’s JIT compiler.

This was reproducible in any program embedding V8. Let’s use Node.js as an example:

$ echo "throw new Error" >> throw.js
$ gdb --args node --abort-on-uncaught-exception throw.js
(gdb) run
(gdb) bt
#0 operator() (__closure=<optimized out>) at ../../deps/v8/src/base/platform/platform-posix.cc:801
#1 v8::base::OS::Abort () at ../../deps/v8/src/base/platform/platform-posix.cc:801
#2 0x0000555556ed1251 in v8::internal::Isolate::CreateMessageOrAbort (this=this@entry=0x55555eb38000, exception=..., exception@entry=..., location=location@entry=0x7fffffffd080)
at ../../deps/v8/src/execution/isolate.cc:2151
#3 0x0000555556ed17f0 in v8::internal::Isolate::Throw (this=this@entry=0x55555eb38000, raw_exception=..., location=<optimized out>, location@entry=0x0) at ../../deps/v8/src/execution/isolate.cc:2250
#4 0x000055555789f921 in v8::internal::__RT_impl_Runtime_Throw (isolate=0x55555eb38000, args=...) at ../../deps/v8/src/runtime/runtime-internal.cc:63
#5 v8::internal::Runtime_Throw (args_length=<optimized out>, args_object=0x7fffffffd148, isolate=0x55555eb38000) at ../../deps/v8/src/runtime/runtime-internal.cc:60
#6 0x00007fffd7e16a36 in ?? ()
#7 0x00000bc71f7dd791 in ?? ()
#8 0x00007fffffffd120 in ?? ()
#9 0x0000000000000006 in ?? ()
... hundreds of ?? frames ...
--Type <RET> for more, q to quit, c to continue without paging--

This issue was specific to gdb - lldb was able to unwind through the JIT frames just fine. But in this case the system had to work with gdb, so I started looking into how to teach gdb to unwind these frames correctly.

How stack unwinding usually works

There are many good write-ups about how stack unwinding typically works, for example this one that I found very helpful.

For ahead-of-time-compiled code (like the C++ parts of Node.js/V8), the compiler typically emits information into the binary to help debuggers unwind the stack. readelf --debug-dump=frames,frames-interp /path/to/binary will tell you what kind of unwinding information is present in a binary. Unless you strip your binaries or use the most aggressive optimization flags, there is usually enough information left for debuggers to unwind the stack of the C++ part.

But for JIT-compiled code, V8 does not, by default, emit any unwinding information or register the generated code with gdb’s symbol tables (which would come with a runtime overhead). gdb has a JIT compilation interface that allows a JIT compiler to register its generated code with gdb at runtime. V8 has support for this interface (gdbjit), so that was the first thing I looked at, but it turned out to be non-viable for several reasons:

  1. The gdbjit support in V8 was broken at the time.
  2. It requires V8 to be built with gdbjit enabled and the process to be run with the --gdbjit flag at runtime, which has some overhead.

The second point in particular means it is not always practical for post-mortem diagnostics.

The Python unwinder

So we needed something that could work purely without requiring any special build flags or runtime flags to have been set before the crash happened.

Fortunately, V8’s JIT compiler always preserves the rbp chain on x64 by emitting a prologue like this:

pushq %rbp
movq %rsp, %rbp

You can think of it as V8 always compiling code as if the -fno-omit-frame-pointer flag were enabled, which is a common way to make unwinding easier for debuggers. So a prologue analysis that follows the rbp chain should be able to unwind through the JIT frames, and this is how lldb is usually able to unwind them. However, something seems off with gdb’s built-in unwinders - they appear to take stack addresses as return addresses when looking at the JIT frames.
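To make the rbp-chain idea concrete, here is a toy pure-Python simulation (the stack contents and addresses are made up for illustration; a real unwinder would read the debuggee’s memory through gdb instead): it follows the chain of saved frame pointers and collects the return address stored next to each one.

```python
WORD = 8  # pointer size on x64

def walk_rbp_chain(memory, rbp):
    """Collect return addresses by following the saved-rbp chain.

    `memory` maps an address to the 8-byte word stored there; [rbp] holds
    the caller's saved rbp and [rbp + 8] holds the return address.
    """
    return_addresses = []
    while rbp in memory:
        return_addresses.append(memory[rbp + WORD])  # saved rip
        rbp = memory[rbp]                            # caller's rbp
    return return_addresses

# A fake stack with two frames: the innermost frame (rbp = 0x7000) was
# called from 0x1234, and its caller (rbp = 0x7100) from 0x5678.
stack = {
    0x7000: 0x7100, 0x7000 + WORD: 0x1234,
    0x7100: 0x9999, 0x7100 + WORD: 0x5678,  # 0x9999 is unmapped: stop
}
```

Here walk_rbp_chain(stack, 0x7000) yields [0x1234, 0x5678] - the return addresses in caller order, which is exactly the information a frame-pointer-based unwinder recovers.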

Besides the heavier gdbjit interface, gdb also allows registering custom frame unwinders through its Python unwinder API. There was an attempt a few years ago to add one to V8’s shared tools/gdbinit, but it was reverted because it broke step-debugging in gdb. I ended up revisiting the idea and tried a few different design choices. One seemed to work reasonably well and got merged - or at least, it has not yet been reverted again ;)

How to write a custom unwinder

Once the unwinder is registered, whenever gdb encounters a frame, it will call the registered custom unwinders to give them a chance to handle it. At a high level, a custom unwinder looks something like this:

import gdb
from gdb.unwinder import Unwinder, FrameId

class MyUnwinder(Unwinder):
    def __init__(self):
        super().__init__("MyUnwinder")

    def __call__(self, pending_frame):
        if cannot_handle(pending_frame):
            return None

        # Read registers and memory from the pending frame to figure out
        # the caller's register values.
        frame_id = FrameId(callers_sp, callers_pc)
        unwind_info = pending_frame.create_unwind_info(frame_id)
        unwind_info.add_saved_register("rsp", ...)
        unwind_info.add_saved_register("rip", ...)
        unwind_info.add_saved_register("rbp", ...)
        return unwind_info

gdb.unwinder.register_unwinder(None, MyUnwinder(), replace=True)

Since V8 preserves the rbp chain in its JIT-compiled code on x64, we just need to figure out:

  1. Based on the value of rbp, how to find the caller’s frame and other register values, and pack them into the UnwindInfo object returned to gdb.
  2. How to detect whether a given frame is a V8 JIT frame that we can handle (vs. a regular C++ frame that gdb can already handle on its own).

Figuring out the FrameId and registers

The trickiest part of writing this unwinder was figuring out what to pass to the FrameId constructor. The gdb documentation says the constructor takes a sp (stack pointer) and an optional pc (program counter), but it was rather ambiguous about what these values should actually be. I ended up having to read the gdb source itself to understand the semantics.

The frame-id.h header in gdb explains that a frame ID consists of a stack address and a code address, both of which must be stable across the lifetime of the frame. gdb uses this pair to uniquely identify frames and to determine the ordering of frames on the stack (i.e. which frame is the caller and which is the callee).
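As a rough sketch of the ordering idea (a simplified model, not gdb’s actual implementation): on x64 the stack grows downward, so given two frames, the one whose stack address is lower is the inner (more recently called) one.

```python
def is_inner_than(frame_sp_a, frame_sp_b):
    """Return True if the frame with stack address `frame_sp_a` is inner
    (on the callee side) relative to the frame at `frame_sp_b`.

    On x64 the stack grows toward lower addresses, so a callee's frame
    sits below its caller's. This is a simplified model of the comparison
    gdb performs on frame IDs, not gdb's actual code.
    """
    return frame_sp_a < frame_sp_b
```

This is why the stack address in a frame ID must be stable: if it drifted during the frame’s lifetime, gdb’s notion of which frame is inner could flip mid-unwind.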

For the stack address, gdb’s own amd64 unwinder uses the CFA (Canonical Frame Address), which is defined as the value of rsp just before the call instruction in the caller. In gdb this was computed as rbp + 16. It took me a while to visualize it in my head. Let’s go back to this prologue again:

pushq %rbp
movq %rsp, %rbp

So after the prologue, the stack layout looks like this:

rbp+16                                  <-- caller's rsp before `call`
rbp+8   [ return address (saved rip) ]
rbp+0   [ saved rbp ]                   <-- current frame pointer

If we trace back from the value of rbp in the current frame, we need to step through the saved rbp which occupies 8 bytes, then the return address which occupies another 8 bytes, and above that is where the caller’s rsp was pointing.

This looked about right, so I just followed the same convention and used rbp + 16 as the first argument of the frame ID. The previously reverted version used rbp for this, which would not work as well because rbp can change in the middle of the function prologue (after pushq %rbp but before movq %rsp, %rbp), so it would not be stable for the entire lifetime of the frame.

For the code address, ideally one should use the start address of the current function. But since there are no symbols for JIT-compiled functions, there is no easy way to find out about that. I ended up using the return address (dereferenced from rbp + 8) as an approximation instead - it is not perfect (it points into the middle of the caller function rather than the current function’s start), but gdb accepts it and it is sufficient for producing usable backtraces.

So the computation roughly looks like this:

PTR_SIZE = 8
fp = int(pending_frame.read_register("rbp"))
prev_rsp = fp + PTR_SIZE * 2
ret_addr = read_u64(fp + PTR_SIZE)
prev_rbp = read_u64(fp)
frame_id = FrameId(prev_rsp, ret_addr)

# Pack the saved registers for gdb.
unwind_info = pending_frame.create_unwind_info(frame_id)
unwind_info.add_saved_register("rsp", gdb.Value(prev_rsp))
unwind_info.add_saved_register("rip", gdb.Value(ret_addr))
unwind_info.add_saved_register("rbp", gdb.Value(prev_rbp))
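The read_u64 helper in the snippet above is not a gdb built-in - it just reads 8 bytes from the debuggee and decodes them as a little-endian integer. A minimal sketch, with the raw-memory reader passed in as a parameter (in real use that would be something like gdb.selected_inferior().read_memory):

```python
import struct

def read_u64(address, read_memory):
    # `read_memory(address, length)` is a stand-in for a raw-memory reader
    # such as gdb.selected_inferior().read_memory; it returns a bytes-like
    # object (gdb returns a memoryview).
    raw = bytes(read_memory(address, 8))
    return struct.unpack("<Q", raw)[0]  # x64 is little-endian
```

Decoding through struct avoids sign-extension surprises you can run into when converting gdb.Value objects of pointer-sized types to Python integers.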

Skipping non-JIT frames

I did some logging with gdb (set debug frame 1) to verify that the unwinder was working correctly, and it turned out that for ordinary C++ frames, the Python unwinder was being called before the built-in unwinders that are more capable of finding the function start address. That can be another source of breakage in step-debugging, since gdb would then fail to recognize that you are stepping into a C++ frame. The earlier, reverted version checked whether the pc had source-level debug info, which turned out to be too eager a way to claim frames - even without source-level debug info, gdb can often still resolve the function name for C++ frames from e.g. the symbol tables, which it can also use to unwind better than we can. So I used a more conservative check: if gdb can already symbolicate the frame somehow, we skip it and let gdb’s built-in unwinders handle it:

if pending_frame.name() is not None:
    return None  # gdb can already symbolicate this, likely not a JIT frame.

After the fix

With the unwinder loaded, the commands produce a much more useful backtrace:

gdb -x /path/to/v8/tools/gdbinit --args node --abort-on-uncaught-exception throw.js
(gdb) run
(gdb) bt
#0 0x0000000002385c4f in v8::base::OS::Abort() ()
#1 0x0000000000d5b521 in v8::internal::Isolate::CreateMessageOrAbort(v8::internal::DirectHandle<v8::internal::Object>, v8::internal::MessageLocation*) ()
#2 0x0000000000d5a7f6 in v8::internal::Isolate::Throw(v8::internal::Tagged<v8::internal::Object>, v8::internal::MessageLocation*) ()
#3 0x000000000129d35e in v8::internal::Runtime_Throw(int, unsigned long*, v8::internal::Isolate*) ()
#4 0x00007bf7e3e7aa36 in ?? ()
#5 0x00007bf7e3f8a477 in ?? ()
#6 0x00007bf7e3dce0d3 in ?? ()
#7 0x00007bf7e3dce0d3 in ?? ()
#8 0x00007bf7e3dce0d3 in ?? ()
#9 0x00007bf7e3dce0d3 in ?? ()
#10 0x00007bf7e3dce0d3 in ?? ()
#11 0x00007bf7e3dce0d3 in ?? ()
#12 0x00007bf7e3dce0d3 in ?? ()
#13 0x00007bf7e3dce0d3 in ?? ()
#14 0x00007bf7e3dce0d3 in ?? ()
#15 0x00007bf7e3dcb25c in ?? ()
#16 0x00007bf7e3dcafa7 in ?? ()
#17 0x0000000000d47a20 in v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) ()
#18 0x0000000000d472bc in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::DirectHandle<v8::internal::Object>, v8::internal::DirectHandle<v8::internal::Object>, v8::base::Vector<v8::internal::DirectHandle<v8::internal::Object> const>) ()
#19 0x0000000000bea39b in v8::Function::Call(v8::Isolate*, v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) ()
#20 0x00000000008e06e7 in node::builtins::BuiltinLoader::CompileAndCall(v8::Local<v8::Context>, char const*, node::Realm*) ()
#21 0x00000000009c1784 in node::Realm::ExecuteBootstrapper(char const*) ()
#22 0x00000000008b8fb5 in node::StartExecution(node::Environment*, char const*) ()
#23 0x00000000008b8f28 in node::StartExecution(node::Environment*, std::function<v8::MaybeLocal<v8::Value> (node::StartExecutionCallbackInfo const&)>) ()
#24 0x0000000000802cc6 in node::LoadEnvironment(node::Environment*, std::function<v8::MaybeLocal<v8::Value> (node::StartExecutionCallbackInfo const&)>, std::function<void (node::Environment*, v8::Local<v8::Value>, v8::Local<v8::Value>)>) ()
#25 0x000000000096cb21 in node::NodeMainInstance::Run() ()
#26 0x00000000008bcb29 in node::Start(int, char**) ()
#27 0x00007bf80482a1ca in __libc_start_call_main (main=main@entry=0x1d24db0 <main>, argc=argc@entry=3, argv=argv@entry=0x7ffc0f4c6998) at ../sysdeps/nptl/libc_start_call_main.h:58
#28 0x00007bf80482a28b in __libc_start_main_impl (main=0x1d24db0 <main>, argc=3, argv=0x7ffc0f4c6998, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc0f4c6988)
at ../csu/libc-start.c:360
#29 0x00000000007faf2e in _start ()

The V8 JIT frames still show up as ?? in terms of function names (it will take a lot more effort to symbolicate them), but at least now gdb can walk through them correctly and reach the C++ frames on the other side. This makes it clearer what’s causing the crash.

Since this is now merged, to use it, you can source V8’s tools/gdbinit (which now registers the unwinder) when opening a core dump on x64 Linux. For example, to print all the stack traces quickly:

gdb -q -x /path/to/v8/tools/gdbinit -ex 'thread apply all bt full' -ex 'quit' /path/to/executable /path/to/core