A Rustacean's view on gRPC and Cap'n Proto
One of the things I have been trying to drive at Sentry recently is introducing some form of strictly typed RPC. To this end, I have experimented with both gRPC and Cap'n Proto within a small isolated service I am trying to introduce.
All of these experiments are open source, and you can look at the code in this PR.
# Why?
As things stand right now, Sentry does not have a single well-established solution for RPC, and one effort to at least define payload schemas uses JSON Schema, which comes with its own set of problems.
One limitation of JSON is its lack of support for binary data. So there are a few other solutions floating around, including JSON in MessagePack, or, worst of all, JSON encoded as a byte array in JSON.
I am also not spilling any secrets here. (Almost) all of Sentry is open source code, so with some effort you can find all of those examples I have given.
Only parts of the codebase are written in Rust, and those parts are interfacing with a much larger portion written in Python, using either networked RPC, or in-process FFI, again with a wild mix of technologies for each.
And while we have the comfort of strict type safety, as well as an awesome `#[derive]` system for serialization, Python has neither. It is possible to progressively type Python code, but in the end that is only a mild suggestion and provides no hard guarantees.
The other problem is evolving and changing the schema in a forward/backward compatible way.
It happens quite often that Rust sends data to Python that it just can’t deal with, and the other way around as well.
# A Rustacean's view
But enough about Sentry; I promised to write about gRPC and Cap'n Proto from a Rust developer's view, so let's go!
The first interesting thing to note here is that I believe both protobuf and capnproto were created by the same person, with capnproto supposedly improving on a ton of suboptimal things in protobuf.
# Code generation
The first thing to note here is that both serialization formats need a schema. They are not self-describing like JSON or MessagePack.
That is not really such a big problem, as there are a bunch of binary serialization formats that are not self-describing and have implementations based around the `serde` ecosystem, leveraging the `#[derive]` infrastructure.
Not so for `protobuf` or `capnproto` though. As mentioned, both need a schema definition, written in their own bespoke definition language.
This by itself is not really a problem. Rust has amazing support for `build.rs` scripts that do code generation at build time without being based on proc-macros.
The problem however is that both tools depend on tooling that has to be installed on the developer's machine through some other means, like a distribution package manager. This adds complexity and points of failure. It means that you do not have direct control over which version of the package is being used, and it means that a `cargo` invocation is not fully self-contained.
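To make this concrete, here is a minimal sketch of what such a `build.rs` could look like, assuming the `tonic-build` and `capnpc` crates are declared as build dependencies and the schema paths exist; both calls shell out to externally installed tools (`protoc` and `capnp` respectively):

```rust
// build.rs — a minimal sketch, not taken from the PR linked above.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // gRPC / protobuf: generates Rust types and service stubs,
    // but requires an externally installed `protoc`.
    tonic_build::compile_protos("proto/service.proto")?;

    // Cap'n Proto: likewise invokes the external `capnp` tool.
    capnpc::CompilerCommand::new()
        .file("schema/service.capnp")
        .run()?;

    Ok(())
}
```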
There is a reason why things work like this. The serialization formats of both are supposedly complex, and having a single reference implementation of the schema parser and code generator can better guarantee binary compatibility. This is a valid argument, but honestly a really weak one. There are a ton of binary formats out there that solve this challenge by having a well-defined specification and a conformance test suite.
One can certainly make the point that each language ecosystem tends to reinvent existing software in its own language just because. This is certainly true. It was true back when I was still a TypeScript developer, and it is true for Rust. However, I would say that Rust has a unique opportunity and value proposition here. It is no surprise that even the TypeScript ecosystem is starting to rewrite its developer tooling in Rust (`esbuild`, written in Go, is the odd one out here).
Just to make my point of why Rust is a better choice than C(++) in this case: can someone tell me how many bits a C `long` has?
Let's get back on track: both tools use a custom definition language, both use build scripts to generate code, and both require an external tool for that. Not great, but also not terrible.
# Zero Copy
So far so good. We end up with some auto-generated Rust `struct`s.
`protobuf` (via the `prost` crate) generates fairly idiomatic Rust code. However, it generates copy- and allocation-heavy code. What I mean by that is that strings are turned into, well, a `String`, which has its own underlying allocation. So the act of parsing a `protobuf` buffer means copying a lot of data around.
Ideally, it would represent strings and binary data as `&'buffer str`. The effect of that would be that no data has to be copied, as everything points directly into the message buffer.
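To illustrate the difference, here is a rough sketch with a made-up message that has a single string field; the first shape is what an owned, `prost`-style struct looks like, the second is what a zero-copy representation would look like:

```rust
// Owned representation: parsing copies the string out of the wire buffer
// into its own heap allocation.
struct EventOwned {
    name: String,
}

// Zero-copy representation: the field borrows directly from the message
// buffer, so nothing needs to be copied.
struct EventBorrowed<'buffer> {
    name: &'buffer str,
}
```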
I have written previously about zero copy deserialization and creating a binary serialization format. I don't claim to be the know-it-all expert when it comes to these things, but I have invested quite some effort along those lines.
So while the `protobuf` / `prost` way involves parsing and validating the whole payload, copying bytes into new containers, the `capnproto` way is the exact opposite.
`capnproto` does not parse and validate the payload up front, but instead generates accessors which do all the validation on access. This is beneficial in certain scenarios, when you are only interested in parts of the message payload, or when you want to parse as much as you can from a corrupted stream instead of treating it as all or nothing.
It also means that `capnproto` is truly zero copy; it will not allocate `String`s and copy bytes around.
The big downside of this is that code using accessors, in particular ones that parse and validate data on demand, can be very unergonomic. Take a look at what it takes to access a `&str`:

```rust
let config_name = pry!(pry!(request.get_config_name()).to_str());
```

That is a lot of ceremony for something as trivial as accessing a `&str`. Each access can potentially fail, and is `async` as well.
As mentioned, the “parse only what you need” approach is good if you are only interested in a subset of fields. But on the other hand, it can also be a footgun if you end up accessing the same field multiple times. In that case you are paying the cost of validation multiple times, not to mention the code it takes to do so.
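One way around that footgun, sticking with the hypothetical `get_config_name` accessor from above, is to hoist the accessed value into a local once and reuse it:

```rust
// Pay the validation cost a single time…
let config_name: &str = pry!(pry!(request.get_config_name()).to_str());
// …and reuse the borrowed value from here on, instead of going through the
// accessor (and its validation) again for every use.
```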
# RPC
Let's move from the representation and deserialization of payloads to the actual RPC part.
Here, `tonic` as the go-to gRPC implementation integrates very well into the `tower` ecosystem. Creating a `Server` feels very natural and straightforward.
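As a rough illustration (the service name and address are made up, and error handling is kept minimal), spinning up a server looks roughly like this:

```rust
use tonic::transport::Server;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let addr = "127.0.0.1:50051".parse()?;

    Server::builder()
        // `MyServiceServer` would be generated by `tonic-build` from the schema;
        // the name here is hypothetical.
        .add_service(MyServiceServer::new(MyServiceImpl::default()))
        .serve(addr)
        .await?;

    Ok(())
}
```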
Similarly, implementing the RPC methods is straightforward as well. `tonic` uses the `#[async_trait]` proc-macro, and the method definitions look just as you would expect. You have a request value that is already deserialized, and you have to return a `Result` with your response type.
```rust
async fn rpc_method(
    &self,
    request: Request<RequestTy>,
) -> Result<Response<ResponseTy>, Status>;
```
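For illustration, implementing such a method could look roughly like this (the trait, types and fields are made up for the sketch):

```rust
#[tonic::async_trait]
impl MyService for MyServiceImpl {
    async fn rpc_method(
        &self,
        request: Request<RequestTy>,
    ) -> Result<Response<ResponseTy>, Status> {
        // The payload arrives fully deserialized as an owned struct.
        let req = request.into_inner();

        // Build an owned response; tonic takes care of serializing it.
        Ok(Response::new(ResponseTy {
            config: format!("config for {}", req.config_name),
        }))
    }
}
```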
The `capnproto` story is a bit different. Let's first take a look, and then dissect this example:
```rust
fn rpc_method(
    &mut self,
    params: RequestParams,
    mut results: ResponseTypes,
) -> Promise<(), ::capnp::Error>;
```
What is immediately visible is this `Promise` type. It is an opaque type, but like with any other crate, we can directly refer to the code of `capnp` to figure out what it is:
```rust
enum Promise<T, E> {
    Immediate(Result<T, E>),
    Deferred(Pin<Box<dyn Future<Output = Result<T, E>> + 'static>>),
    Empty,
}
```
This just looks like a `Future` (actually, `futures::future::MaybeDone`) with some extra steps to me. It also explains the weird `pry!()` macro we saw in the previous section.
We learned that `tonic` uses `#[async_trait]`, which internally relies on a boxed `Future`, and we are seeing the same thing here as well.
I'm not sure what the generated code actually looks like, but I'm doubtful that chaining a bunch of `pry!()` macros would generate more efficient code than the compiler can for a `Future`.
I have written previously on this blog about the fact that async can be truly zero-cost. A future that does not actually `await` anything will immediately return with `Poll::Ready(T)`, and the compiler is smart enough to inline and dead-code-eliminate everything else that is unreachable in such cases.
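A tiny sketch of that claim (using the `futures` crate for the executor, purely for illustration):

```rust
// No `.await` inside: the generated future is ready on its very first poll,
// so driving it to completion does essentially no extra work.
async fn answer() -> u32 {
    42
}

fn main() {
    let value = futures::executor::block_on(answer());
    assert_eq!(value, 42);
}
```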
The second thing I see here is the `mut results` along with the `()` return type. Out-parameters are a code smell no matter the language, but especially so in Rust. We have tuples and `Result` as a way of returning multiple, or mutually exclusive, return values, unlike other languages which truly have no other choice than to use out-parameters.
The problem with an out-parameter is that it is non-obvious what happens with it in case of an error. It might be half-initialized. Will it be discarded? Is it still being used afterwards? At least from a type system perspective, all bets are off, as there are no guarantees at this point.
`capnp` is very focused on optimizing allocations and data layout, and the reason for doing things like this is that one can write directly into a pre-allocated output buffer, without the need to first return a proper Rust type with an owned `String`, only to copy those string bytes into a serialized message afterwards. `capnp` avoids that copy by writing directly. But it also results in a very unergonomic out-parameter with setters.
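Putting the pieces together, a handler in this style looks roughly like the following sketch (field and type names are hypothetical, mirroring the signature above, and the exact generated API may differ):

```rust
fn rpc_method(
    &mut self,
    params: RequestParams,
    mut results: ResponseTypes,
) -> Promise<(), ::capnp::Error> {
    // Validate and borrow the input on access.
    let config_name = pry!(pry!(pry!(params.get()).get_config_name()).to_str());

    // Write the response directly into the pre-allocated output message
    // through setters, instead of returning an owned value.
    results.get().set_config_name(config_name);

    Promise::ok(())
}
```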
# Python
For this specific service I'm building, I want to expose an RPC server in Rust, but use it from a Python client.
Here, I have only done the client-side gRPC implementation in Python. And boy, was I disappointed by gRPC in that case.
As Rust developers, we are very spoiled by extremely good tooling. One of these extremely good tools is `rustdoc`. It is light years ahead of any other documentation generation tool I have seen so far in any other language ecosystem.
The gRPC docs for Python are horrible. Everything is on a single webpage, and there are no links between types. The argument types are not documented at all. It took me a while to write the client code, but I was successful in the end. The code is here for anyone interested.
On a positive note, the RPC layer lets you define a timeout for each request, which is definitely a plus considering that we are suffering from other networking calls stalling for an insane amount of time without any possibility to time them out and recover.
However, the generated code does not contain any type annotations at all. Sure, Python itself is a dynamically typed language, and type annotations are only a "fairly recent" thing in the ecosystem.
I would still have expected more here. The whole point of introducing a strongly typed RPC mechanism was to get these strict type checks in every language that interacts with the system, to the extent that the language even allows it.
The generated Python code is also of dubious quality, and I got some expected pushback against committing generated Python code. Rust build scripts have the advantage of not having to commit such code, but also the disadvantage of introducing yet another compile-time dependency and contributing to compile times. Python does not seem to have a built-in mechanism for that, or at least I am not aware of one.
Another point that I do not have much first-hand experience with myself, but have heard second-hand anecdotes about, is `grpc` being really slow and inefficient in Python.
# Schema Evolution
We have almost reached the end. The last point I want to touch on is the possibility of schema evolution.
Both schema languages force you to explicitly number all of your fields. The reason is simple: the numbers fix the identity and ordering of fields in the binary serialization, and they provide a limited safeguard that your schema is append-only.
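As an illustration on the Rust side, those field numbers end up baked into the generated code; a `prost`-generated struct looks roughly like this (a sketch, the exact derive output differs):

```rust
#[derive(Clone, PartialEq, ::prost::Message)]
pub struct Event {
    // The `tag` is the field number from the schema; it identifies the
    // field on the wire, independently of its name.
    #[prost(string, tag = "1")]
    pub name: ::prost::alloc::string::String,
    #[prost(bytes = "vec", tag = "2")]
    pub payload: ::prost::alloc::vec::Vec<u8>,
}
```

Renumbering or reusing a tag later silently changes what the bytes on the wire mean, which is exactly the kind of change that needs to be guarded against.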
However, I haven't found any validation in either tool that really ensures that schema changes are valid. There is no before / after comparison, no diff view, etc.
To be honest, I haven't really checked how to do these things properly, but on the surface these tools look like straight code generators that have no built-in notion of a previous schema version.
In theory I could just as well remove fields from the end, remove a field from the middle and renumber all the rest, or change a field's type. All of these operations would break the schema and break existing deserializer code.
Again, I haven’t really done these modifications myself to see how things would behave. But this is fundamental enough that there has to be a solution for it somewhere. Having an efficient serialization format with generated strict types for various languages is only half the story. The other half is about not breaking everything when changes are introduced.
# My Conclusion
In the end, I was quite disappointed by both alternatives. Having to install `protoc` or `capnpc` separately, through the OS package manager, is a pain for both.
The schema language of Cap'n Proto seemed more flexible at first, and it is advertised as being more performant than `protobuf`. That is certainly true when looking at the generated Rust code and how it handles zero-copy deserialization. But it does come at the cost of being extremely unergonomic to use.
gRPC was quite easy to get going with from Rust, and there is also the `grpcurl` command-line tool that lets you test your server directly from the terminal. I really missed such a tool for Cap'n Proto.
However, the Python side of things was just horrible: bad documentation, no types being generated at all, and second-hand reports about the `grpc` integration in Python being incredibly slow and inefficient.
But there is also some positive news. Both are well established projects which are open source. So there are third party tools out there that might do things a lot better than the reference implementation. And if not, we can also roll our own.
Finally, as luck would have it, just as I was doing these exploratory experiments, the upstream gRPC team at Google announced work on another, fully Rust-native implementation. So the future looks even better, at least on the Rust side. Things already look quite good in Rust; it was the Python side that left me much more dissatisfied.