Swatinem Blog Resume

Relax and Unwind

Part 2: Tracing the Stack

— 5 min

Last time, we were looking at how calls are actually implemented in native assembly code, and what kind of instructions the CPU is executing and what registers are involved.

We do know that the rsp register (the stack pointer) points to the top of the stack, and we learned that the call instruction pushes the next instruction onto the stack, which then becomes the bottom of the new stack frame, and the ret instruction jumps to whatever instruction address is on top of the stack. In between though, the stack pointer can move as the function will push and pop things onto and from the stack. So the position of the return address relative to the current stack pointer will change. Most of the times statically, but sometimes even dynamically, but I will come back to that in a later part of this series.

Anyhow, it seems we are a bit stuck. We only know where the current stack frame ends (the stack pointer), we don’t really know where it began (where the return address is).


So lets go back a bit and look at the special purpose registers again. One of them is called rbp, which stands for base pointer. It turns out that in older times, this base pointer served exactly this purpose. Lets look at how it is used by passing the -Cforce-frame-pointers=yes option to our Rust compiler. This is the output that the Compiler Explorer gives me:

example::never:
        push    rbp
        mov     rbp, rsp
        call    example::gonna
        pop     rbp
        ret

example::gonna:
        push    rbp
        mov     rbp, rsp
        call    example::give
        pop     rbp
        ret

example::give:
        push    rbp
        mov     rbp, rsp
        pop     rbp
        ret

I will also try to visualize how the stack actually looks like in that case. Also note that I drew the stack top to bottom, as it actually grows high to low. Anyhow, what we see is that rbp points to the position in the stack that has the parents base pointer. And exactly below that (actually above when talking of addresses) is the return address.

      ┌────────────────────────┐
      │ return addr of parent  │
     ↱│ bp of parent           │
     │├────────────────────────┤
     ││ return addr `never`    │
     └│ bp of `never`          │↰
      ├────────────────────────┤│
      │ return addr `gonna`    ││
rbp → │ bp of `gonna`          │┘
      │ ... locals of `give`   │
rsp → │ ...                    │
      └────────────────────────┘

# Walking the Stack

Alright, lets try to actually walk that stack of a live program. First, we need to actually read the rsp, rbp (and rip) registers. For this I will play a bit with unsafe Rust features, namely the nightly-only asm feature.

    let mut sp: usize;
    let mut bp: usize;
    let mut ip: usize;
    unsafe {
        asm!(
            "mov {sp}, rsp",
            "mov {bp}, rbp",
            "lea {ip}, [rip]",
            sp = out(reg) sp,
            bp = out(reg) bp,
            ip = out(reg) ip,
            options(nomem, nostack),
        );
    }

Here we have a snippet of inline assembly which just moves the current values of the registers into local Rust variables. Reading rip is a bit more involved, but that is a completely different story.

Now that we have the base pointer, we can use it to walk up the stack using this simple code:

    for _ in 0..=5 {
        ip = unsafe { (bp as *const usize).offset(1).read() };
        bp = unsafe { (bp as *const usize).offset(0).read() };
        println!("bp: {:#018x}; ip: {:#018x}", bp, ip);
    }

That is basically all there is to walking up the stack, if you have a valid base pointer to work with.

I double-checked my output and it is the same as in produced by std::backtrace. You can run the full example in the compiler explorer as well.

While working on this, I had a super hard time and couldn’t get this to work on Windows. In the end, I think that I hit a miscompilation bug in Rust. I filed a bug about it.

Instead of writing the current stack pointer into the base pointer register on function start, and growing the stack afterward, the code generated on windows first adjusts the stack pointer, and then tries to write the base pointer, adjusting for the offset again, though it seems that it messes up that offset sometimes.

Update

A comment on the issue I filed said that on Windows, the base pointer might as well be offset. And that offset can be looked up in the Unwind Info for that particular DLL/function. We will take a look at how to read that unwind info next time.

# What if there is no base pointer

In my previous post, the examples that I showed did not use any base pointer. And also today I had to use an explicit -Cforce-frame-pointers=yes compiler flag to make it use a frame pointer. This shows clearly that you don’t really need to maintain a base pointer when running the program.

So Rust does not maintain one by default. And I remember the oblivious -fomit-frame-pointer compiler options back when I didn’t know what it actually does. So now we know. It avoids a few instructions per function call, saves a bit of space on the stack, and frees up the rbp register. Which I would argue is the main reason, since general purpose registers are really scarce on x64, it makes sense to have one more available for use.

But from a stack unwinding perspective, it complicates things, since we don’t have all the information we need to unwind right there in the registers or on the stack.

Essentially all we need is the position of the return address relative to the stack pointer (which still is a special purpose register). And we will look at where to get this information the next time.