Relax and Unwind
Part 1: Understanding Native Calls
I have been working on the Sentry Native team for a bit more than a year now. One of the most important things that helps engineers find the cause of an error is a stack trace. This is also a really challenging topic, especially for native code.
In my opinion, the best way to learn and understand a complex topic, and to appreciate the existing solutions, is to try to implement it yourself. Along the way, I will implement my own stack unwinder in order to learn more about what a native call stack really looks like and how to extract a stack trace from it. This is just my personal learning experiment and is not related to any specific things I do in my day job.
# A simple Stacktrace
Let's start by looking at a stack trace from another language first. Take this simple example in JS:
```js
function never() {
  gonna();
}

function gonna() {
  give();
}

function give() {
  you();
}

function you() {
  up();
}

function up() {
  console.log(new Error().stack);
}

never();
```
This will give me the following stack trace when running in node:
```
Error
    at up ([...]rickroll.js:14:15)
    at you ([...]rickroll.js:11:3)
    at give ([...]rickroll.js:8:3)
    at gonna ([...]rickroll.js:5:3)
    at never ([...]rickroll.js:2:3)
    at Object.<anonymous> ([...]rickroll.js:16:1)
    at Module._compile (node:internal/modules/cjs/loader:1102:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1131:10)
    at Module.load (node:internal/modules/cjs/loader:967:32)
    at Function.Module._load (node:internal/modules/cjs/loader:807:14)
```
As you can see, it gives the call stack from top to bottom (the order is a convention of the language ecosystem), and it also includes a few frames that are outside of my own code.
In this case, node is the runtime environment that executes the JavaScript code. Running this so-called managed code means that node keeps track of what happens. This tracking comes with some overhead, but as shown, it does provide us with a few benefits.
Native code is very different. Depending on your definition of runtime, there is nothing that manages or drives your code, and the code itself is usually tuned for maximum performance, so it avoids doing anything that is not strictly necessary to achieve its goals.
Since we have no runtime that we can just ask to give us a stacktrace, we have to create one ourselves.
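That said, we don't have to start from zero to see what such a stack trace looks like in native code: Rust's standard library ships a ready-made unwinder behind `std::backtrace` (stable since Rust 1.65). A minimal sketch, mirroring the JS example above — note that in optimized builds some of these small functions may be inlined away and not show up as frames:

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

fn up() -> Backtrace {
    // force_capture ignores the RUST_BACKTRACE environment variable
    // and always walks the stack.
    Backtrace::force_capture()
}

fn never() -> Backtrace {
    up()
}

fn main() {
    let trace = never();
    assert!(matches!(trace.status(), BacktraceStatus::Captured));
    // The Display impl prints one line per frame,
    // much like the node output above.
    println!("{trace}");
}
```

Under the hood this does exactly the kind of work we are about to explore by hand.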
# Native Instructions, Registers and Stack
In order to understand what a native call stack is, we have to first learn how the processor in our computer actually works and executes code.
We will take a look at the actual assembly code and at the x64 Instruction Set Architecture (ISA) to figure out what it does.
Let's start by doing some quick mafs in Rust:
```rust
fn one() -> usize {
    1
}

fn two() -> usize {
    2
}

fn plus(a: usize, b: usize) -> usize {
    a.wrapping_add(b)
}

fn minus(a: usize, b: usize) -> usize {
    a.wrapping_sub(b)
}

pub fn quick_mafs() -> usize {
    let four = plus(two(), two());
    minus(four, one())
}
```
You can look at this in detail in the Compiler Explorer. Let's look at a small snippet of the assembly that is created for these functions:
```asm
example::two:
        mov eax, 2                     // 1
        ret                            // 2

example::quick_mafs:
        sub rsp, 40                    // 3
        call example::two              // 4
        mov qword ptr [rsp + 32], rax  // 5
        call example::two              // 6
        mov qword ptr [rsp + 24], rax  // 7
```
Our function call, and the return from that function, are clearly visible; they correspond to the `call` and the `ret` instruction respectively. We see some other things as well which need a bit more research to understand.
The Windows Documentation of the x64 Architecture is a really good resource to learn from. I also very much enjoyed a previous year's Advent of Code, which introduced its own instruction set and guided you along implementing a virtual processor for it.
Each architecture has its own calling conventions, and the document says that the return value is returned in the `rax` register, which is what we see above.
All that our `two` function does is write its return value (1) to that register before returning (2). The caller then moves that return value someplace else (5) to deal with it later.
The processor executes the instructions one after the other, but this example shows that it does need to jump around in the code a bit. In particular, this is the sequence in which the instructions are executed: `3, 4, 1, 2, 5, 6, 1, 2, 7`. We have two calls to `two`, and we execute the instructions at the addresses `1` and `2` twice, but on the first go we jump back to `5`, while we continue at `7` the second time. How does the processor know to do that?
Let's take a look at the documentation for the `call` and `ret` instructions. Again, the Windows Documentation for the x86 Instructions helps.
- The `call` instruction pushes the return address onto the stack, then jumps to the destination.
- The `ret` instruction pops the return address from the stack and jumps to it.
Okay, so we have to learn about something called the stack, and jumps.
We heard the term `register` already. These registers hold the values that the processor currently works with. They are extremely fast, but there is only a very limited number of them, depending on the architecture. Some of them are special purpose registers. We already learned about `rax`, the Accumulator register, which is used for return values. We also see another one in the example above, `rsp`, the Stack Pointer register. It points to the top of the stack, and it changes when you `push` or `pop` things to and from the stack. It is kind of like `Array#length` in JS, which also changes when you call `Array#push` and `Array#pop`.
Another special register is the Instruction Pointer, `rip`, which holds the address of the next instruction in line after the one currently being executed.
Alright, so we know about the stack pointer and the instruction pointer, and we know what `call` and `ret` do, so let's try to visualize this.
- (4) `call`: The next instruction is `5`, which we push onto the stack, and then change `rip` to `1`. (Stack: `[5]`, `rip`: 1)
- (2) `ret`: Pop `5` from the stack and overwrite `rip`. (Stack: `[]`, `rip`: 5)
- (6) `call`: Push, overwrite `rip`. (Stack: `[7]`, `rip`: 1)
- (2) `ret`: Pop, overwrite `rip`. (Stack: `[]`, `rip`: 7)
- (7) ...
So we can now kind of follow how a processor executes native instructions, and what happens to the stack and the instruction pointer during that process.
The key takeaway here is that the processor is quite dumb and just executes instructions one after the other, following whatever the next instruction (`rip`) is. Other things to note are that all the stack manipulations (`push` and `pop`) are balanced. Also, the stack contains the next instruction where we have to go to, not really where we came from.
However, the `call` instruction always pushes the current `rip`, which happens to be the instruction after the `call`. That makes things a bit simpler, since the instruction where we came from is in most cases the one right before our return address.
So we are done, right? Well, not quite. As we see in instruction `3`, the code itself can manipulate the stack pointer in any way it wants, and we don't really know where on the stack our return address is. Figuring that out will be a story for another day.