A Rustacean's view on gRPC and Cap'n Proto

17 April 2024 — 13 min

One of the things I’m trying to drive recently at Sentry is introducing some form of strictly typed RPC. To this end, I have experimented with both gRPC and Cap'n Proto within a small isolated service I am trying to introduce.

All of these experiments are open source, and you can look at the code in this PR.

# Why?

As things stand right now, Sentry does not have a single well-established solution for RPC, and one effort to at least define payload schemas is using JSON Schema, which comes with its own set of problems.

One limitation of JSON is its lack of support for binary data. So there are a few other solutions floating around, including JSON in MessagePack, or the worst of all, JSON encoded as a byte array in JSON.

I am also not spilling any secrets here. (Almost) all of Sentry is open source code, so with some effort you can find all of those examples I have given.

Only parts of the codebase are written in Rust, and those parts are interfacing with a much larger portion written in Python, using either networked RPC, or in-process FFI, again with a wild mix of technologies for each.

And while in Rust we have the comfort of strict type safety as well as an awesome #[derive] system for serialization, Python has neither. It is possible to progressively type Python code, but in the end that is only a mild suggestion and provides no hard guarantees.

The other problem is evolving and changing the schema in a forward/backward compatible way.

It happens quite often that Rust sends data to Python that it just can’t deal with, and the other way around as well.

# A Rustacean's view

But enough about Sentry, I promised to write about gRPC and Cap'n Proto from a Rust developer's view, so let's go!

The first interesting thing to note here is that both share some history: as far as I know, the author of capnproto was previously the primary author of protobuf version 2, with capnproto supposedly improving on a ton of suboptimal things in protobuf.

# Code generation

The first thing to note here is that both serialization formats need a schema. They are not self-describing like JSON or MessagePack.

That is not really such a big problem, as there are a bunch of binary serialization formats that are not self-describing and have implementations based around the serde ecosystem, leveraging the #[derive] infrastructure.

Not for protobuf or capnproto though. As mentioned, both need a schema definition, in their own bespoke definition language.

This by itself is not really a problem. Rust has amazing support for build.rs scripts to do code generation at build time which is not based on proc-macros. The problem, however, is that both depend on tooling that has to be installed on the developer's machine through some other means, like a distribution package manager. This adds complexity and points of failure. It means that you do not have direct control over which version of the tool is being used, and it means that a cargo invocation is not fully self-contained.

There is a reason why things work like this. The serialization formats of both are supposedly complex, and having one reference implementation of the schema parser and code generator can better guarantee binary compatibility. This is a valid argument, but a really weak one to be honest. There are a ton of binary formats out there that solve this challenge by having a well-defined specification and a conformance test suite.
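A build script can at least fail early with a readable message when the external tool is missing. The following is a minimal sketch using only the standard library. `tool_available` and `require_tool` are hypothetical helpers, and it assumes the code generators look for executables named `protoc` and `capnp` on the PATH.

```rust
use std::process::{Command, Stdio};

/// Returns true if `<tool> --version` can be spawned and exits successfully.
fn tool_available(tool: &str) -> bool {
    Command::new(tool)
        .arg("--version")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false) // spawning failed, most likely "not found"
}

/// Hypothetical helper for a build script: turns a missing tool into a readable error.
fn require_tool(tool: &str, hint: &str) -> Result<(), String> {
    if tool_available(tool) {
        Ok(())
    } else {
        Err(format!("`{tool}` was not found on PATH; {hint}"))
    }
}
```

In a build.rs, one might call `require_tool("protoc", "install the protobuf compiler first")` before invoking the code generator. This does nothing to pin the tool version; it only makes the failure easier to understand.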

One can certainly make a point that each language ecosystem tends to reinvent existing software in their own language just because. This is certainly true. It was true back when I was still a TypeScript developer, and it is true for Rust. However I would say that Rust has a unique opportunity and value proposition here. It is no surprise that even the TypeScript ecosystem is starting to rewrite its developer tooling in Rust (esbuild written in Go is the odd one out here).

Just to make my point of why Rust is a better choice than C(++) in this case: Can someone tell me how many bits a C long has?
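To spell out the point of that question: the answer depends on the platform (64 bits on typical 64-bit Linux and macOS, 32 bits on 64-bit Windows), whereas Rust's integer types carry their width in the name. A small sketch, with `c_long_bits` and `i64_bits` as hypothetical helper names:

```rust
use std::mem::size_of;
use std::os::raw::c_long;

/// Width in bits of the platform's C `long`, seen through Rust's FFI alias.
/// The result differs between platforms.
fn c_long_bits() -> usize {
    size_of::<c_long>() * 8
}

/// Width in bits of `i64`, which is the same on every platform.
fn i64_bits() -> usize {
    size_of::<i64>() * 8
}
```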

Let's get back on track:

Both tools use a custom definition language, and use build scripts to generate code, and both require an external tool for that. Not great, but also not terrible.

# Zero Copy

So far so good. We end up with some auto-generated Rust structs.

protobuf (via the prost crate) generates fairly idiomatic Rust code. However, it generates copy and allocation heavy code. What I mean by that is that strings are turned into, well, a String. Which has its own underlying allocation. So the act of parsing a protobuf buffer means copying a lot of data around.

Ideally, it would represent strings as &'buffer str and binary data as &'buffer [u8]. The effect of that would be that no data would have to be copied, as everything points directly into the message buffer.

I have written previously about zero copy deserialization and creating a binary serialization format. I don’t claim to be the know-it-all expert when it comes to these things, but I have invested quite some effort along those lines.
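To make the difference concrete, here is a minimal sketch of both representations. The wire format (one length byte followed by UTF-8 bytes) and all names are made up for illustration and have nothing to do with the actual protobuf encoding or the code prost generates.

```rust
use std::str::Utf8Error;

/// Owned representation: parsing copies the bytes into a fresh `String`.
struct OwnedMessage {
    name: String,
}

/// Borrowed representation: `name` points straight into the input buffer.
struct BorrowedMessage<'buffer> {
    name: &'buffer str,
}

#[derive(Debug)]
enum ParseError {
    Truncated,
    InvalidUtf8(Utf8Error),
}

/// Splits off the payload of the hypothetical wire format:
/// one length byte followed by that many bytes.
fn payload(buffer: &[u8]) -> Result<&[u8], ParseError> {
    let (&len, rest) = buffer.split_first().ok_or(ParseError::Truncated)?;
    rest.get(..len as usize).ok_or(ParseError::Truncated)
}

/// Validates the bytes in place; nothing is allocated or copied.
fn parse_borrowed(buffer: &[u8]) -> Result<BorrowedMessage<'_>, ParseError> {
    let name = std::str::from_utf8(payload(buffer)?).map_err(ParseError::InvalidUtf8)?;
    Ok(BorrowedMessage { name })
}

/// `to_owned` allocates and copies; this is the step zero copy avoids.
fn parse_owned(buffer: &[u8]) -> Result<OwnedMessage, ParseError> {
    let name = parse_borrowed(buffer)?.name.to_owned();
    Ok(OwnedMessage { name })
}
```

The price of the borrowed variant is the lifetime parameter: a `BorrowedMessage` cannot outlive the buffer it was parsed from.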

So while the protobuf / prost way involves parsing and validating the whole payload, copying bytes into new containers, the capnproto way is the exact opposite.

capnproto does not parse and validate the payload directly, but it generates accessors which do all the validation on access. This is beneficial in certain scenarios when you are only interested in parts of the message payload, or when you want to parse as much as you can from a corrupted stream, instead of treating it as all or nothing. It also means that capnproto is truly zero copy, and it will not allocate Strings and copy bytes around.

The big downside of this is that code using accessors, in particular ones that parse / validate data on-demand can be very unergonomic. Take a look at what it takes to access a &str:

let config_name = pry!(pry!(request.get_config_name()).to_str());

That is a lot of ceremony for something as trivial as accessing a &str. Each access can potentially fail, and as this code lives in an RPC method, errors are propagated using the pry!() macro instead of the usual ? operator.

As mentioned, the “parse only what you need” approach is good if you are only interested in a subset of fields. But on the other hand, it can also be a footgun if you end up accessing the same field multiple times. In that case you are paying the cost of validation multiple times, not to mention the code it takes to do so.
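The footgun can be illustrated with a hand-written reader in the spirit of generated accessor code. This is a sketch only: `ConfigReader`, `get_name` and `name_twice` are hypothetical and much simpler than what capnp generates.

```rust
use std::str::Utf8Error;

/// Hypothetical reader: wraps the raw bytes and checks them
/// only when a field is actually requested.
struct ConfigReader<'buffer> {
    raw_name: &'buffer [u8],
}

impl<'buffer> ConfigReader<'buffer> {
    /// Constructing the reader does no validation at all.
    fn new(raw_name: &'buffer [u8]) -> Self {
        Self { raw_name }
    }

    /// Every call re-runs the UTF-8 check over the field's bytes.
    fn get_name(&self) -> Result<&'buffer str, Utf8Error> {
        std::str::from_utf8(self.raw_name)
    }
}

/// Validates once, then reuses the plain `&str` as often as needed.
fn name_twice(reader: &ConfigReader<'_>) -> Result<String, Utf8Error> {
    let name = reader.get_name()?; // the only validation
    Ok(format!("{name}/{name}"))
}
```

Binding the result of the accessor to a local, as `name_twice` does, keeps both the validation cost and the error handling in one place.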

# RPC

Let's move from the representation and deserialization of payloads to the actual RPC part.

Here, tonic as the go-to gRPC implementation integrates very well into the tower ecosystem. Creating a Server feels very natural and straightforward.

Similarly, implementing the RPC methods is straightforward as well. tonic uses the #[async_trait] proc-macro, and the method definitions look much as you would expect. You have a request value that is already deserialized, and you have to return a Result with your response type.

async fn rpc_method(
    &self,
    request: Request<RequestTy>,
) -> Result<Response<ResponseTy>, Status>;

The capnproto story is a bit different. Let's first take a look, and then dissect this example:

fn rpc_method(
    &mut self,
    params: RequestParams,
    mut results: ResponseTypes,
) -> Promise<(), ::capnp::Error>;

What is immediately visible is this Promise type. It is an opaque type, but like with any other crate, we can directly refer to the code of capnp to figure out what it is:

enum Promise<T, E> {
    Immediate(Result<T, E>),
    Deferred(Pin<Box<dyn Future<Output = Result<T, E>> + 'static>>),
    Empty,
}

This just looks like Future (actually, futures::future::MaybeDone) with some extra steps to me. It also explains the weird pry!() macro we saw in the previous section.
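To illustrate how such a macro fits together with the Promise type, here is a simplified re-implementation. It is a sketch, not the code from the capnp crate: the `Empty` variant is left out, and `pry_sketch!` and `name_length` are hypothetical names.

```rust
use std::future::Future;
use std::pin::Pin;

/// Simplified stand-in for the `Promise` shown above (not the capnp type itself).
enum Promise<T, E> {
    Immediate(Result<T, E>),
    Deferred(Pin<Box<dyn Future<Output = Result<T, E>> + 'static>>),
}

/// Sketch of a `pry!`-style macro: like `?`, but the early return
/// wraps the error in `Promise::Immediate` instead of `Err`.
macro_rules! pry_sketch {
    ($expr:expr) => {
        match $expr {
            Ok(value) => value,
            Err(err) => return Promise::Immediate(Err(err)),
        }
    };
}

/// Hypothetical handler: fails immediately on invalid UTF-8,
/// otherwise defers the rest of the work to a boxed future.
fn name_length(raw: &[u8]) -> Promise<usize, String> {
    let name = pry_sketch!(std::str::from_utf8(raw).map_err(|e| e.to_string())).to_owned();
    Promise::Deferred(Box::pin(async move { Ok::<usize, String>(name.len()) }))
}
```

The `?` operator cannot be used directly in `name_length` because the function does not return a `Result`, which is why a dedicated macro is needed for the early return.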

We learned that tonic was using #[async_trait] which internally uses a boxed Future, and we are seeing the same thing here as well. I’m not sure what the actual code looks like when it is generated, but I’m doubtful that chaining a bunch of pry!() macros would generate more efficient code than the compiler can for a Future.

I have written previously on this blog about the fact that async can be truly zero-cost. A future that does not actually await anything will immediately return with Poll::Ready(T), and the compiler is smart enough to inline and dead-code-eliminate everything else that is unreachable in such cases.
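The first half of that claim is easy to observe by polling a future by hand. The sketch below uses only the standard library; `NoopWake`, `double` and `poll_once` are names made up for this example. It shows the immediate `Poll::Ready`, but says nothing about what the optimizer does with it.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// An `async fn` that never awaits anything.
async fn double(x: u32) -> u32 {
    x * 2
}

/// Polls a future exactly once and reports what it returned.
fn poll_once<F: Future>(future: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // Pin the future on the stack; no heap allocation is needed for this.
    let mut future = pin!(future);
    future.as_mut().poll(&mut cx)
}
```

Calling `poll_once(double(21))` yields `Poll::Ready(42)` on the very first poll, while a future that actually has to wait would return `Poll::Pending`.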

The second thing I see here is the mut results along with the () return type. Out-parameters are a code smell no matter the language, but especially so in Rust. We have tuples and Result as a way of returning multiple, or mutually exclusive return values, unlike other languages which truly have no other choice than to use out-parameters.

The problem with an out-parameter is that it's non-obvious what happens with it in case of an error. It might be half-initialized. Will it be discarded? Is it still being used afterwards? At least from a type system perspective, all bets are off as there are no guarantees at this point.
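A small sketch of the half-initialized case, using a plain String as the out-parameter. Both functions are hypothetical and unrelated to the capnp API.

```rust
use std::str::Utf8Error;

/// Out-parameter style: the caller's value is mutated step by step.
fn fill_greeting(name: &[u8], out: &mut String) -> Result<(), Utf8Error> {
    out.push_str("Hello, ");
    // If this fails, `out` already contains the half-written "Hello, ".
    let name = std::str::from_utf8(name)?;
    out.push_str(name);
    Ok(())
}

/// Return-value style: the caller gets a complete value or an error, never both.
fn make_greeting(name: &[u8]) -> Result<String, Utf8Error> {
    let name = std::str::from_utf8(name)?;
    Ok(format!("Hello, {name}"))
}
```

With `fill_greeting`, nothing in the signature tells the caller whether `out` is still usable after an `Err`. With `make_greeting`, that question cannot even come up.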

capnp is very focussed on optimizing allocations and data layout, and the reason for doing things like this is that one can just write directly into a pre-allocated output buffer, without the need to first return a proper Rust type with an owned String, just to copy those string bytes into a serialized message afterwards. capnp can avoid that copy by just writing directly. But it also results in a very unergonomic out-parameter with setters.

# Python

For this specific service I’m building, I want to expose an RPC Server in Rust, but use it from a Python Client.

Here, I have only done the client-side gRPC implementation in Python. And boy was I disappointed about gRPC in that case.

As Rust developers, we are very spoiled by extremely good tooling. One of these extremely good tools is rustdoc. It is lightyears ahead of any other documentation generation tool I have seen so far in any other language ecosystem.

The gRPC docs for Python are horrible. Everything is on a single webpage, and there are no links between types. The argument types are not documented at all. It took me a while to write the client code for it, but I was successful in the end. The code is here for anyone interested.

On a positive note, the RPC layer makes it possible to define a timeout for each request, which is definitely a plus considering that we are suffering from other networking calls stalling for an insane amount of time without any possibility to time them out and recover. However, the generated code does not contain any type annotations at all. Sure, Python itself is a dynamically typed language, and type annotations are only a "fairly recent" thing in the ecosystem.

I would still have expected more here. The whole point of introducing a strongly typed RPC mechanism was to have these strict type checks for any language that interacts with the system to the extent that the language even allows it.

The generated Python code is also of dubious quality, and I got some expected pushback against committing generated Python code. Rust build scripts have the advantage of not having to commit such code, but also the disadvantage of introducing yet another compile time dependency and contributing to compile times. Python does not seem to have a builtin mechanism for that, or at least I am not aware of one.

Another point, which I don’t really have much experience with myself: I have heard second-hand anecdotes about grpc being really slow and inefficient in Python.

# Schema Evolution

We have almost reached the end. The last point I want to touch on is the possibility of Schema evolution.

Both schema languages force you to explicitly number all your fields. The reason is simple: this fixes the identity and ordering of fields in the binary serialization, and provides a limited safeguard that your schema is append-only.

However I haven’t found any validation in either tool that really ensures that schema changes are valid. There is no before / after comparison, no diff view, etc. To be honest, I haven’t really checked how to do these things properly, but looking at the surface, these tools look like straight code generators that have no built in notion of a previous schema version.

In theory I could just as well remove fields from the end, or remove a field from the middle and renumber all the rest, as well as change a field's type. All these operations would break the schema and break existing deserializer code.
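The kind of before / after comparison I have in mind could look roughly like the following. This is a sketch under my own assumptions: `Field` and `breaking_changes` are hypothetical, the type names are just strings, and the rule of treating everything except appending as breaking is deliberately conservative and may be stricter than what either format actually requires.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal description of one numbered schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    number: u32,
    name: &'static str,
    ty: &'static str,
}

/// Compares two schema versions and lists changes that would break existing
/// readers: removed field numbers and field numbers whose type changed.
fn breaking_changes(old: &[Field], new: &[Field]) -> Vec<String> {
    let new_by_number: BTreeMap<u32, &Field> = new.iter().map(|f| (f.number, f)).collect();
    let mut problems = Vec::new();
    for field in old {
        match new_by_number.get(&field.number) {
            None => problems.push(format!(
                "field {} ({}) was removed",
                field.number, field.name
            )),
            Some(current) if current.ty != field.ty => problems.push(format!(
                "field {} changed type from {} to {}",
                field.number, field.ty, current.ty
            )),
            // Same number and same type: compatible, even if renamed.
            Some(_) => {}
        }
    }
    problems
}
```

Appending a new field yields an empty list, while removing a field from the middle and renumbering the rest shows up as a type change on the reused number plus a removed field at the end.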

Again, I haven’t really done these modifications myself to see how things would behave. But this is fundamental enough that there has to be a solution for it somewhere. Having an efficient serialization format with generated strict types for various languages is only half the story. The other half is about not breaking everything when changes are introduced.

# My Conclusion

In the end, I was quite disappointed by both alternatives. Having to install protoc or capnpc separately from the OS package manager is a pain for both. The schema language of Cap'n Proto seemed to be more flexible at first, and it is being advertised as being more performant than protobuf. That is certainly true also when looking at the generated Rust code and how it handles zero copy deserialization. But it does come at the cost of being extremely unergonomic to use.

gRPC was quite easy to get going from Rust, and there is also the grpcurl tool, which lets you test your server directly from the command line. I really missed such a tool for Cap'n Proto. However, the Python side of things was just horrible. Bad documentation, no types being generated at all, and second-hand reports about the grpc integration in Python being incredibly slow and inefficient.

But there is also some positive news. Both are well established projects which are open source. So there are third party tools out there that might do things a lot better than the reference implementation. And if not, we can also roll our own.

Finally, as luck would have it, just as I was doing these exploratory experiments, the upstream gRPC team at Google announced work on another fully Rust native implementation. So the future looks even better, at least for the Rust side. Then again, things already look quite good in Rust; I was much more dissatisfied with the situation in Python.