Swatinem Blog Resume

Rewrite it in Rust


Early this year, I managed to mostly move away from JS development into native code, which in my case means a lot of C/C++, as well as Rust (hopefully more of that in the future).

Most of what I will write about comes from my experience with sentry-native, which will soon release a rewritten version in C. That being said, all of the opinions in this post are my own.

I also want to start this with a quote from Bruce Lee:

If I tell you I'm good, probably you will say I'm boasting. But if I tell you I'm not good, you'll know I'm lying.

# Imposter Syndrome

While it has been quite some time since I have actively dealt with C code, I can get up to speed with anything you can throw at me pretty quickly.

I am making quite good progress with my work on sentry-native; my code compiles, runs and passes tests. But for some reason, I don’t really feel confident in it. I’m not sure if the things that I do are actually correct, or if it is just luck that they work. And I constantly have the feeling that I must be missing something, or that things will probably blow up at some point later.

This is just in my mind though, and a classic example of imposter syndrome. And surprisingly, I don’t have this when writing Rust. Writing Rust code really empowers me, in the literal sense that I feel powerful and confident when writing Rust code. I have the feeling that whatever I do is correct. Quite remarkable actually.

# Distractions and Explicitness

One reason that I don’t feel very productive with C is that there is a lot of boilerplate and ceremony around almost everything.

Dealing with allocations, strings, iterables and generics is very tedious. I sometimes have the feeling that I don’t even see the real application logic, because it is so obfuscated and drowns in all the mallocs, NULL checks, manual copying and pointer-chasing.

One of the big distractions is checking for NULL all the time. There are two issues with this. One is that, obviously, these checks do come with a runtime cost.

The other one is about explicitness. Is returning NULL part of the API contract, like Option in Rust? Does it actually mean something? Or is it just cargo-culted boilerplate that people copy-paste, because it’s what everyone else is doing?

I actually had to deal with a bug where NULL had special meaning.
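In Rust, "there may be no value" is spelled out in the type signature instead of being an undocumented convention around NULL. A minimal sketch, using a hypothetical find_user_id lookup as a stand-in for a C function that might return NULL:

```rust
/// Hypothetical lookup: returns the id for `name`, if there is one.
/// The `Option` return type makes "not found" an explicit part of the
/// API contract, instead of a pointer that may or may not be NULL.
fn find_user_id(users: &[(&str, u32)], name: &str) -> Option<u32> {
    users
        .iter()
        .find(|(user, _)| *user == name)
        .map(|(_, id)| *id)
}
```

Callers cannot use the u32 without first handling the None case, for example with match or if let, so the special meaning can't silently get lost.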

# Infallible allocation

Most of the checks, however, are just unnecessary boilerplate in my opinion. And this boilerplate multiplies, btw. Say you have 3 allocations in a function. When you assume that they can fail, you have to make sure to free the ones before the failing one, right?
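For comparison, Rust frees values automatically when they go out of scope, including on an early return, so there is no hand-written cleanup path per failure point. A minimal sketch with a hypothetical build_report function that allocates several times and can fail halfway through (here on a parse error rather than an allocation failure, since that is what Rust surfaces as a recoverable error):

```rust
use std::num::ParseIntError;

/// Hypothetical example: several heap allocations, and a step that may fail.
/// If parsing fails, `?` returns early, and everything allocated so far
/// (`header`, `values`) is dropped automatically. No `goto cleanup` needed.
fn build_report(raw: &str) -> Result<String, ParseIntError> {
    let header = String::from("report:"); // allocation 1
    let mut values: Vec<i64> = Vec::new(); // allocation 2 (on first push)
    for part in raw.split(',') {
        // May fail: on error, return early and drop the buffers above.
        values.push(part.trim().parse::<i64>()?);
    }
    let body = format!("{header} {values:?}"); // allocation 3
    Ok(body)
}
```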

And what do you want to do anyway? Just return NULL from your function? What if the function has a different return type? Will it fail silently? Can you actually recover from an allocation failure? Your program needs memory to do its job. If it doesn’t get that memory, it can’t do its job, and it might be the best idea to just crash hard.

The other question is whether you will ever get a NULL from malloc anyway. I have read some quite good blog posts about this topic in the past, but don’t have any links handy.

In any case, nowadays most software you run will be 64-bit, which means that virtual address space is practically unlimited. And most systems, even smartphones, have a lot of physical memory, a lot more than a typical program should allocate. If a program does allocate that much, it very likely has some leaks anyway.

And it is not only about your own program; it is also about the behavior of the OS. Some time ago, there was a post about Linux behaving horribly under low-memory conditions, which I have also experienced myself. Your system will stall hard, up to the point of requiring you to power-cycle it, long before your program gets a NULL from malloc.

Enough has really been said about this, but C developers still cargo-cult these NULL checks everywhere. In Rust, allocations are infallible as far as the calling code is concerned. If one does fail, the standard library calls its allocation error handler (handle_alloc_error), which by default prints a message and aborts the process. Anyway, from a developer’s point of view, the code looks a lot cleaner! You can actually start to see the business logic underneath all the boilerplate.
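For the rare cases where you really do want to handle running out of memory, the standard library has opt-in fallible APIs such as Vec::try_reserve_exact, which return an error instead of aborting. A minimal sketch, with a hypothetical try_zeroed_buffer helper:

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: allocates a zeroed buffer of `len` bytes, reporting
/// allocation failure (or an impossible size) as an error instead of aborting.
fn try_zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    // Capacity for `len` bytes was reserved above, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

The important design point is that failure handling is explicit and opt-in: the code that needs it says so, and everything else stays free of checks.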

Oh, and the use of Option makes intentions very clear, which brings me to the next point.

# Documentation

This has been praised a lot, and for good reason. The Rust documentation is excellent! The format is awesome, and most of the docs have examples, which thanks to doctests will also never be out of sync. When looking for C docs, there are a ton of different websites, and most of them are just horrible.
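As an illustration of doctests: the example inside a doc comment is compiled and run by cargo test, so it breaks the build when it drifts out of sync with the code. A minimal sketch with a hypothetical mean function; my_crate is a placeholder crate name:

```rust
/// Returns the arithmetic mean of `values`, or `None` for an empty slice.
///
/// # Examples
///
/// ```
/// // `my_crate` is a placeholder for the crate this function lives in.
/// use my_crate::mean;
///
/// assert_eq!(mean(&[1.0, 2.0, 3.0]), Some(2.0));
/// assert_eq!(mean(&[]), None);
/// ```
pub fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}
```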

Rustdoc itself is awesome, but the whole spectrum of Rust documentation is a delight!

# Ownership and Mutability

Speaking of documentation and memory management:

The ownership model of Rust actually makes so much sense! Working with C code, I often don’t know who is responsible for freeing some memory. And I would guess that there is a lot of unnecessary copying going on because of that, not to mention memory leaks. Sure, you can also leak memory in Rust, but it’s a lot harder!

One way to kind of guess this in C is the const keyword. If a function returns something const, it usually means that ownership is not transferred. But this does not work the other way around: ownership and mutability are two completely different things. Maybe I return something that is mutable, but must not be freed!
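In Rust, the function signature answers both questions separately: whether you may mutate something, and whether you now own (and must eventually free) it. A minimal sketch with three hypothetical functions:

```rust
/// Immutable borrow: the caller keeps ownership, nothing is freed here.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Mutable borrow: we may modify the string, but we do not own it,
/// so we must not (and cannot) free it.
fn shout(text: &mut String) {
    text.make_ascii_uppercase();
    text.push('!');
}

/// Takes ownership: the caller gives the `String` away, and this function
/// is now responsible for it. `name` is dropped (freed) at the end of the scope.
fn into_greeting(name: String) -> String {
    let mut greeting = String::from("hello, ");
    greeting.push_str(&name);
    greeting
}
```

The "mutable, but must not be freed" case from above is exactly &mut String: the compiler rejects any attempt to move out of, or drop, the borrowed value.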

# Strings

Another thing that deserves a lot of praise is the Rust &str type, which really is just a &[u8] slice that is guaranteed to be valid UTF-8, which is a really awesome guarantee to have! For interfacing with the OS, there is OsStr, with appropriate conversion functions. I had to touch a bit of OS-specific string code in C recently, and it was horrible.

But the real power actually lies in the way that strings in Rust are represented as slices, as a pair of (pointer, length), whereas strings in C need to be \0-terminated. This makes a lot of string operations in Rust more efficient.

In Rust, you can trivially get a sub-slice of a string, whereas in C, you have to copy the sub-slice and \0-terminate it. To make a copy of a string, you also need its length, which is an O(N) operation in C, but O(1) in Rust.
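A minimal sketch of both points, using a hypothetical first_word helper: the returned sub-slice points into the original string without copying, and a &str is exactly two machine words (pointer and length), so len() is O(1):

```rust
// A &str is a "fat pointer": (pointer, length), i.e. two machine words.
const _: () = assert!(std::mem::size_of::<&str>() == 2 * std::mem::size_of::<usize>());

/// Hypothetical helper: returns the part of `s` before the first space.
/// This is a sub-slice of `s`, so no bytes are copied and nothing
/// needs to be \0-terminated; the length is stored alongside the pointer.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(idx) => &s[..idx],
        None => s,
    }
}
```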

Apart from this, the &str API of Rust is very rich! I miss .lines() and .ends_with() so much!
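For example, picking out file names with a given extension from a newline-separated listing is a one-liner chain with those two methods. A minimal sketch with a hypothetical rust_files function:

```rust
/// Hypothetical helper: returns the lines of `listing` that name Rust files.
/// `lines()` also strips a trailing "\r", so Windows line endings work too.
/// The results are sub-slices of `listing`; nothing is copied.
fn rust_files(listing: &str) -> Vec<&str> {
    listing
        .lines()
        .map(str::trim)
        .filter(|line| line.ends_with(".rs"))
        .collect()
}
```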

On the other hand, I have also found that Rust strings are not as easy to deal with as, for example, JS strings. But now I think that maybe the way I index into and slice my JS strings is actually broken for Unicode outside the ASCII range.
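The reason Rust feels harder here is that string indices are byte offsets, and it refuses to cut a multi-byte character in half: slicing with &s[..n] panics on a non-boundary, and s.get(..n) returns None. A minimal sketch with a hypothetical first_chars helper that counts characters (Unicode scalar values, which are still not the same as user-perceived characters) instead of bytes:

```rust
/// Hypothetical helper: returns the first `n` chars of `s`.
/// `char_indices` yields the byte offset of each char, so we always
/// slice on a valid char boundary, even for multi-byte UTF-8.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}
```

In "héllo", the é takes two bytes, so the string has 5 chars but 6 bytes, and byte offset 2 lies inside the é.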

# API and ABI

Now that I have touched on both memory allocation and all the copying in C: one way Rust avoids this is by dealing better with value types and reference types. In Rust, you can more easily return structs from functions, and move them into functions via arguments. Those live on the stack and don’t require heap allocation, which can make them more efficient than the pointer-heavy C equivalent. Most of the time, though, you will deal with references, much like pointers in C. And from a coding perspective, there is no difference, since you access fields with . either way, whereas in C you will have to keep track of the difference between -> and ., which makes refactoring more annoying in some places.

One of the reasons C has to allocate and return pointers in a lot of places is that there is no other way to make a struct opaque, hiding its members and also making it extensible. In C, you can either expose your structs, making them public API and requiring breaking changes whenever you touch them, or you use opaque pointers, which require allocation.

Rust decouples API and ABI, and really Rust has no stable ABI at all. Since dependencies are compiled from source together with your code, you can hide the details of a struct and change its size without requiring a major version bump, and still have the advantages of stack allocation.
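A minimal sketch of this, using a hypothetical Config type: the fields are private, so they are not part of the public API and can change later, yet new() still returns the struct by value, with no heap allocation, and methods use . to access fields even through a reference:

```rust
mod config {
    /// Hypothetical type. Fields are private: other modules can neither
    /// construct nor inspect them directly, so they can be added or changed
    /// later without breaking callers' source code.
    pub struct Config {
        verbose: bool,
        retries: u32,
    }

    impl Config {
        /// Returned by value: the caller can keep it on its own stack.
        pub fn new() -> Self {
            Config { verbose: false, retries: 3 }
        }

        /// `self` is a reference here, yet fields are still accessed with `.`.
        pub fn retries(&self) -> u32 {
            self.retries
        }

        pub fn set_verbose(&mut self, verbose: bool) {
            self.verbose = verbose;
        }

        pub fn is_verbose(&self) -> bool {
            self.verbose
        }
    }
}

use config::Config;
```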

Speaking of stack allocation: I have already run into uninitialized-memory issues with structs on the stack a couple of times. Very annoying, and for some reason, the compiler didn’t warn me about them.

# Generics and Traits

Another thing that came to my mind is that Rust’s traits, iterators and generics make it super easy to deal with streaming data, which can further improve performance and avoid a ton of intermediate allocations.
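A minimal sketch of such a streaming pipeline, with a hypothetical sum_even_squares function: the adapters are lazy, so each element flows through filter and map one at a time and no intermediate collection is ever built:

```rust
/// Hypothetical example: sums the squares of all even numbers in `input`
/// in a single pass. `filter` and `map` are lazy adapters, so no
/// intermediate `Vec` is allocated; `sum` drives the whole pipeline.
fn sum_even_squares<I>(input: I) -> u64
where
    I: IntoIterator<Item = u64>,
{
    input
        .into_iter()
        .filter(|&n| n % 2 == 0)
        .map(|n| n * n)
        .sum()
}
```

Because the function is generic over any IntoIterator, the same code works for a range, a Vec, or values read lazily from some other source.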

I am actually considering re-implementing something like Rust’s Write trait in C, which would abstract away serializing data into an in-memory buffer, onto disk, or right onto the network, without having to allocate a lot of intermediate buffers. But I already suspect that the C version can never be as fast as Rust, as it would likely involve dynamic function calls through function pointers, whereas Rust can monomorphize and inline everything.
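A minimal sketch of what that looks like in Rust, with a hypothetical write_records function: because the sink is a generic parameter W: Write rather than a dyn Write trait object, the compiler generates a specialized copy for each concrete sink type and can inline the write calls:

```rust
use std::io::{self, Write};

/// Hypothetical serializer: writes `records` as `key=value` lines into any
/// `Write` sink, e.g. a `Vec<u8>` in memory, a `File`, or a `TcpStream`.
/// No intermediate buffer is allocated here; data goes straight to `out`.
fn write_records<W: Write>(out: &mut W, records: &[(&str, i64)]) -> io::Result<()> {
    for (key, value) in records {
        writeln!(out, "{key}={value}")?;
    }
    out.flush()
}
```

The same function can be called with a Vec<u8> to build the output in memory, or with a file or socket to stream it out directly.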

# Dependency Management

A bit related to ABI is also the question of static vs dynamic linking. Rust does not really do dynamic linking (or does it?)

There are some technical differences between static and dynamic linking. Dynamic linking can better namespace things, and lets programs share both memory and disk space. But seriously, in a world where our phones run Java, our desktops run JavaScript, and the cloud does heavy sandboxing and containerization, we are way past caring about memory usage.

Static linking has some performance advantages, with link-time optimization and dead-code elimination. And Rust has a good story on symbol mangling, avoiding some of the pitfalls of static linking. And since it has no stable ABI, it pretty much can’t do dynamic linking between Rust libraries anyway.

Anyhow, I recently asked colleagues about this, before realizing that I actually wanted their opinion on something completely different. What I was really asking about was vendoring dependencies vs. relying on OS-provided libraries.

One of the only times I had problems compiling an older (unmaintained) Rust app was because of openssl-sys, which compiles and links against the OS-provided OpenSSL. That system library got out of sync, which prevented the already compiled version of my app from starting, and made it impossible for me to recompile it.

This is not a new problem either, and there is a lot of talk about vendoring dependencies. That way, you are independent of the libraries, and the versions thereof, that your OS provides. As always, there are tradeoffs. It can be a good thing that the distribution is able to update system libraries to patch vulnerabilities, in case you don’t update your own vendored version. On the other hand, this limits the versions of a library you can use, and also requires your users to have that particular library installed in the first place.

Having to deal with such things in C again is a real throwback, and I would love to just be able to consume whatever version of a dependency that I want, and have it statically link and just work, no matter where I copy my resulting binary. This is true portability and “run everywhere”.

# Building

Speaking of portability and dependencies: Rust has a really awesome story around cross-compilation. And the way it handles feature flags and platform-specific conditional code is great! This is just so much better than having tons of inconsistent, platform- and compiler-specific define flags.
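A minimal sketch of Rust’s conditional compilation, with two hypothetical functions: the #[cfg(...)] attribute removes whole items that don’t match the target, while the cfg!(...) macro evaluates to a compile-time bool, so every branch still has to type-check on every platform:

```rust
/// Only one of these two definitions is compiled, depending on the target.
#[cfg(target_os = "windows")]
fn path_separator() -> char {
    '\\'
}

#[cfg(not(target_os = "windows"))]
fn path_separator() -> char {
    '/'
}

/// `cfg!` is a compile-time constant bool; all branches are type-checked.
fn platform_name() -> &'static str {
    if cfg!(target_os = "linux") {
        "linux"
    } else if cfg!(target_os = "macos") {
        "macos"
    } else if cfg!(target_os = "windows") {
        "windows"
    } else {
        "other"
    }
}
```

Cargo feature flags plug into the same mechanism via cfg(feature = "..."), so there is one consistent syntax for both.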

Oh, and it has a standard module system! And cargo!!!

Having dealt with CMake for the past week, I really can’t understand how it has ever gained such popularity. The configuration syntax is horrific! Command names are case-insensitive (variable names are not), and functions have space-separated, optional and variadic arguments. Strings don’t need to be quoted unless you want to use certain special characters (which ones?). And there is no clear distinction between plain strings and lists, at least not that I can tell. It has a global namespace of artifacts, with frequent name clashes, and it is absolutely not obvious to me how variables are scoped when you are dealing with multiple files. But at least I have figured out that it is a good idea to set target-specific flags, which is not really obvious in the first place. Oh, and have I mentioned that the documentation is also horrible?

How to best consume and integrate with external (vendored) dependencies is also absolutely not obvious.

Since I had to look at build systems again, I want to quote from the meson docs:

every moment a developer spends writing or debugging build definitions is a second wasted.

I am so happy that Rust has cargo and crates. It is so refreshing to work with! Things just work as they should, and as you would expect them to.

# The paradox of choice

Building C code is very much non-trivial, which explains the plethora of tools that exist out there. Not to mention that almost every project I know of has its own way of building, its own way of dealing with feature flags, etc.

While choice and competition are certainly good things to have, and to allow, too much of them can lead to fragmentation, and is, quite frankly, overwhelming.

Rust on the other hand has one clear and obvious way of doing things. But it still offers the possibility to extend this if necessary.

Rust has one way of building things. It has one way of configuring your builds. It has one way of documenting things. It has one way of doing testing. Of doing benchmarks. Etc, etc.
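To illustrate the "one way of doing testing" point: unit tests live right next to the code and are run with cargo test, with no extra framework to choose. A minimal sketch with a hypothetical clamp_percent function:

```rust
/// Hypothetical function under test.
fn clamp_percent(value: i32) -> i32 {
    value.clamp(0, 100)
}

// `#[cfg(test)]` keeps this module out of normal builds;
// `cargo test` compiles it and runs every `#[test]` function.
#[cfg(test)]
mod clamp_percent_tests {
    use super::*;

    #[test]
    fn clamps_out_of_range_values() {
        assert_eq!(clamp_percent(-5), 0);
        assert_eq!(clamp_percent(150), 100);
    }

    #[test]
    fn keeps_in_range_values() {
        assert_eq!(clamp_percent(42), 42);
    }
}
```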

And these are very good choices as well. IMO, this is not just a case of Rust being too young to have fragmented yet. I have the impression that things simply work.

Less time spent dealing with all that means more time for actually getting stuff done.

# Onboarding and Confidence

Coming full circle to the beginning. One thing that people criticize about Rust is its learning curve. Well yes, Rust takes some time to learn. But I think that investment provides a great return. As I said in my #rust2020 post, I do think learning Rust makes you a better developer. And most of the time, when there is no obvious easy solution to a problem, Rust kind-of leads the way to a better and more correct solution. Hard things are still hard.

But once you have learned Rust, it is so much easier to get onboarded onto a bigger project and feel productive very quickly. This is important!

# Conclusion

In my short time being a C developer again, I have already seen logic errors, threading problems, memory-safety problems, and just plain inefficient code, most of which could have been avoided by using Rust. And some of that code has been written by engineers far better than me. So much for the argument that smart engineers don’t make mistakes.

And yes, I would love to rewrite everything in Rust, just because! I am also very much in favor of a completely libc-free Rust, where we have completely self-contained binaries which do their own syscalls, with their only dependency being a specific kernel version. Tbh, I know too little about what this would look like on platforms other than Linux. This could be a true cross-compile-once, run-everywhere language.

Especially the cross-compiling, and the good things that I have heard about cbindgen, make me wish that I could just ship pre-built static and dynamic libraries for all platforms to users who don’t want to deal with compiling Rust themselves, instead of having to deal with building C on all kinds of systems and compilers.
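cbindgen generates C headers from a Rust library’s public extern "C" API. A minimal sketch of what such an API could look like, with a hypothetical opaque Counter handle, the same opaque-pointer pattern C libraries use, but with the allocation and ownership rules written down in the signatures and Safety docs. It uses the #[unsafe(no_mangle)] attribute syntax, which needs Rust 1.82 or newer (editions before 2024 also accept plain #[no_mangle]):

```rust
/// Hypothetical type. Not `#[repr(C)]`, so C code only ever
/// handles it through an opaque pointer.
pub struct Counter {
    count: u64,
}

/// Allocates a new counter. The caller owns it and must release it with `counter_free`.
#[unsafe(no_mangle)]
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { count: 0 }))
}

/// Increments the counter and returns the new value.
///
/// # Safety
/// `counter` must be a pointer returned by `counter_new` that has not been freed.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn counter_increment(counter: *mut Counter) -> u64 {
    let counter = unsafe { &mut *counter };
    counter.count += 1;
    counter.count
}

/// Frees a counter. Passing NULL is allowed and does nothing, like `free(NULL)`.
///
/// # Safety
/// `counter` must be NULL or a pointer returned by `counter_new` that has not been freed yet.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn counter_free(counter: *mut Counter) {
    if !counter.is_null() {
        // Reconstructing the Box hands ownership back to Rust, which frees it on drop.
        drop(unsafe { Box::from_raw(counter) });
    }
}
```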

There are just so many good things to say about Rust! I didn’t even mention things like enums, pattern matching, and the fact that it has integer types that make sense (what is an unsigned long long int anyway?)!
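To give a small taste of those enums, pattern matching and integer types, here is a minimal sketch with a hypothetical Event type: each variant carries exactly the data it needs, match must cover every variant or the code does not compile, and u8/u32 spell out their width and signedness in the name:

```rust
/// Hypothetical event type: each variant carries only the data it needs.
enum Event {
    Connected { peer_id: u32 },
    Received(Vec<u8>),
    Disconnected,
}

/// `match` must be exhaustive: forgetting a variant is a compile error.
fn describe(event: &Event) -> String {
    match event {
        Event::Connected { peer_id } => format!("peer {peer_id} connected"),
        Event::Received(bytes) => format!("received {} bytes", bytes.len()),
        Event::Disconnected => String::from("disconnected"),
    }
}
```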