Swatinem Blog

Relax and Unwind

Part 1: Understanding Native Calls

— 7 min

I have been working on the Sentry Native team for a bit more than a year now. One of the most important things that helps engineers find the cause of an error is a stack trace. This is also a really challenging topic, especially for native code.

In my opinion, the best way to learn and understand a complex topic, and to appreciate the existing solutions, is to try to implement it yourself. Along that way, I will try to implement my own stack unwinder in order to learn more about what a native call stack really looks like and how to extract a stack trace from it. This is just my personal learning experiment and is not related to any specific things I do in my day job.

# A simple Stacktrace

Let's start by looking at a stacktrace from another language first. Take this simple example in JS:

    function never() {
      gonna();
    }
    function gonna() {
      give();
    }
    function give() {
      you();
    }
    function you() {
      up();
    }
    function up() {
      console.log(new Error().stack);
    }
    never();

This will give me the following stacktrace when running in node:

    at up ([...]rickroll.js:14:15)
    at you ([...]rickroll.js:11:3)
    at give ([...]rickroll.js:8:3)
    at gonna ([...]rickroll.js:5:3)
    at never ([...]rickroll.js:2:3)
    at Object.<anonymous> ([...]rickroll.js:16:1)
    at Module._compile (node:internal/modules/cjs/loader:1102:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1131:10)
    at Module.load (node:internal/modules/cjs/loader:967:32)
    at Function.Module._load (node:internal/modules/cjs/loader:807:14)

As you can see, it gives the call stack from top to bottom (the order is a language ecosystem convention), and it also includes a few frames that are outside of my code.

In this case, node is the runtime environment that executes the JavaScript code. Running this so-called managed code means that node will keep track of what happens. This tracking comes with some overhead, but as shown, it does provide us with a few benefits.

Native code is very different. Depending on your definition of runtime, there is nothing that manages or drives your code, and the code itself is usually tuned for maximum performance, so it will avoid doing anything that is not strictly necessary to achieve its goals.

Since we have no runtime that we can just ask to give us a stacktrace, we have to create one ourselves.
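To make the goal concrete: Rust's standard library can already capture such a stack trace for us via `std::backtrace::Backtrace` (stable since Rust 1.65), using exactly the kind of unwinding machinery this series sets out to explore. A minimal sketch:

```rust
fn main() {
    // force_capture walks the native call stack at this point,
    // regardless of the RUST_BACKTRACE environment variable.
    let bt = std::backtrace::Backtrace::force_capture();
    println!("{bt}");
}
```

Printing it gives output in the same spirit as the node example above: one frame per line, innermost first.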

# Native Instructions, Registers and Stack

In order to understand what a native call stack is, we have to first learn how the processor in our computer actually works and executes code.

We will take a look at the actual assembly code and at the x64 Instruction Set Architecture (ISA) to figure out what it does.

Let's start by doing some quick mafs in Rust:

    fn one() -> usize {
        1
    }
    fn two() -> usize {
        2
    }

    fn plus(a: usize, b: usize) -> usize {
        a + b
    }

    fn minus(a: usize, b: usize) -> usize {
        a - b
    }

    pub fn quick_mafs() -> usize {
        let four = plus(two(), two());
        minus(four, one())
    }

You can look at this in detail in the Compiler Explorer.

Let's look at a small snippet of assembly that is created for these functions:

    example::two:
            mov     eax, 2                      // 1
            ret                                 // 2

    example::quick_mafs:
            sub     rsp, 40                     // 3
            call    example::two                // 4
            mov     qword ptr [rsp + 32], rax   // 5
            call    example::two                // 6
            mov     qword ptr [rsp + 24], rax   // 7

Our function call and the return from that function are clearly visible: they correspond to the call and the ret instructions respectively. We see some other things as well, which need a bit more research to understand.

The Windows documentation of the x64 architecture is a really good resource to learn from. I also very much enjoyed a previous year's Advent of Code, which introduced its own instruction set and guided you along implementing a virtual processor around it.

Each architecture has its own calling conventions, and the documentation says that the return value is returned in the rax register, which is what we see above. All that our two function does is write its return value (1) to that register before returning (2). The caller then moves that return value someplace else (5) to deal with it later.

The processor executes the instructions one after the other, but this example shows that it does need to jump around in the code a bit. In particular, this is the sequence in which the instructions are executed: 3, 4, 1, 2, 5, 6, 1, 2, 7. We have two calls to two, and we execute the instructions at the addresses 1 and 2 twice, but on the first go we jump back to 5, while we continue at 7 the second time. How does the processor know to do that?

Let's take a look at the documentation for the call and ret instructions. Again, the Windows documentation for the x86 instructions helps.

Okay, so we have to learn about something called the stack, and jumps.

We have heard the term register already. Registers hold the values that the processor currently works with. They are extremely fast, but there is only a very limited number of them, depending on the architecture. Some of them are special-purpose registers. We already learned about rax, the accumulator register, which is used for return values. We also see another one in the example above: rsp, the stack pointer register. It points to the top of the stack, and changes when you push or pop things to and from the stack. It is kind of like Array#length in JS, which also changes when you call Array#push and Array#pop.
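The analogy can be spelled out in Rust as well: a Vec's length moves on push and pop just like rsp does. (One caveat, which the sketch below glosses over: the hardware stack grows downwards, so rsp actually *decreases* on a push.)

```rust
fn main() {
    let mut stack: Vec<u64> = Vec::new();

    // like a `push` instruction: the value goes on top, the stack pointer moves
    stack.push(0xdead_beef);
    assert_eq!(stack.len(), 1);

    // like a `pop` instruction: the value comes off, the stack pointer moves back
    let top = stack.pop().unwrap();
    assert_eq!(top, 0xdead_beef);
    assert_eq!(stack.len(), 0);
}
```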

Another special register is the Instruction Pointer, rip, which holds the address of the next instruction in line after the one currently being executed.

Alright, so we know about the stack pointer and the instruction pointer, and we know what call and ret do, so let's try to visualize this.
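One way to visualize it is a tiny simulation. The numbers are the instruction labels from the assembly comments above, `stack` plays the role of rsp, and `ip` plays the role of rip; this is a toy model for illustration, not how a real processor is implemented:

```rust
fn main() {
    let mut stack: Vec<u32> = Vec::new();
    let mut executed = Vec::new();
    let mut ip = 3; // quick_mafs starts at instruction 3

    loop {
        executed.push(ip);
        ip = match ip {
            3 => 4,                    // sub rsp, 40: fall through
            4 => { stack.push(5); 1 }  // call two: push return address, jump to callee
            1 => 2,                    // mov eax, 2: fall through
            2 => stack.pop().unwrap(), // ret: pop the return address and jump there
            5 => 6,                    // mov [rsp+32], rax: fall through
            6 => { stack.push(7); 1 }  // call two: push return address, jump to callee
            7 => break,                // end of our snippet
            _ => unreachable!(),
        };
    }

    // exactly the execution order we worked out by hand
    assert_eq!(executed, [3, 4, 1, 2, 5, 6, 1, 2, 7]);
}
```

Note how the two ret instructions jump to different places (5, then 7) purely because of what the matching call pushed onto the stack.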

So we can now kind of follow how a processor executes native instructions, and what happens to the stack and the instruction pointer during that process.

The key takeaway here is that the processor is quite dumb: it just executes instructions one after the other, following wherever the next instruction (rip) points. Other things to note are that all the stack manipulations (push and pop) are balanced, and that the stack contains the next instruction where we have to go to, not really where we came from. However, the call instruction always pushes the current rip, which happens to be the instruction right after the call. That makes things a bit simpler, since the instruction where we came from is in most cases one place before our return address.

So we are done, right? Well, not quite. As we see in instruction 3, the code itself can manipulate the stack pointer in any way it wants, and we don’t really know where on the stack our return address is. Figuring that out will be a story for another day.