Rust async can truly be zero-cost
Update: I updated the code examples now that GATs have been stabilized.
One of the fundamental selling points of Rust is zero-cost abstractions. This means that you can write high-level generic code, and the compiler will optimize it in such a way that you couldn’t have written better code by hand.
There are tons of examples of Rust doing this. But I had a very specific example in mind, and was curious whether Rust could actually figure all of this out.
I want some code that is generic over an async or a sync implementation. What does this mean? Essentially, I want an async trait with a generic async function using that trait. But I also want a sync version of it, without having to write the two separately.
It is hard to explain in words, so let's demonstrate the idea with a snippet of Rust:
// as of today, we need one nightly-only feature to make this work:
#![feature(type_alias_impl_trait)]

use core::future::Future;

pub struct Stuff(pub u8);

trait StuffProvider {
    type ProvideStuff<'c>: Future<Output = Stuff> + 'c
    where
        Self: 'c;

    /// Provides [`Stuff`] asynchronously.
    fn provide_stuff(&mut self) -> Self::ProvideStuff<'_>;
}
// This function is generic over something providing us stuff.
async fn do_stuff<P: StuffProvider>(mut provider: P) -> Stuff {
    let mut stuff = provider.provide_stuff().await;
    stuff.0 += 1;
    stuff
}
Okay, so far so good. We have our async function, and the async-trait defined.
What is needed now is an implementation for this trait, which returns Stuff
right away, without asynchronously waiting. My idea was that I could use
core::future::Ready
for this, so let's do that:
use core::future::{ready, Ready};

struct SyncStuffProvider;

impl StuffProvider for SyncStuffProvider {
    type ProvideStuff<'c> = Ready<Stuff>;

    fn provide_stuff(&mut self) -> Self::ProvideStuff<'_> {
        ready(Stuff(41))
    }
}
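For contrast, a genuinely asynchronous implementation of the same trait is also possible, even on stable Rust (GATs have been stable since 1.65), by boxing the future. The following is my own minimal sketch, not code from the original post; `AsyncStuffProvider` is a hypothetical name, and the no-op waker uses the same trick as the executor below:

```rust
use core::future::Future;
use core::pin::Pin;
use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

pub struct Stuff(pub u8);

trait StuffProvider {
    type ProvideStuff<'c>: Future<Output = Stuff> + 'c
    where
        Self: 'c;

    fn provide_stuff(&mut self) -> Self::ProvideStuff<'_>;
}

// Hypothetical async implementation: boxing the future works without
// `type_alias_impl_trait`, at the cost of one allocation per call.
struct AsyncStuffProvider;

impl StuffProvider for AsyncStuffProvider {
    type ProvideStuff<'c> = Pin<Box<dyn Future<Output = Stuff> + 'c>>;

    fn provide_stuff(&mut self) -> Self::ProvideStuff<'_> {
        Box::pin(async { Stuff(41) })
    }
}

// Minimal no-op waker so we can poll the future by hand.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}
const VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);
const fn noop_raw_waker() -> RawWaker {
    RawWaker::new(core::ptr::null(), &VTABLE)
}

fn main() {
    let mut provider = AsyncStuffProvider;
    let mut fut = provider.provide_stuff();
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    // The async block has no await points, so a single poll completes it.
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(stuff) => println!("got {}", stuff.0),
        Poll::Pending => println!("pending"),
    }
}
```

Either implementation can be passed to the generic do_stuff; the rest of this post sticks with the sync one.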
So far so good. I expect the resulting do_stuff(SyncStuffProvider)
future
to return Poll::Ready
the first time it is polled. But how do I
poll this future? Usually it is the job of an async executor to do the polling.
Most async executors have a block_on
method that, well, blocks the current
thread until the future is ready, polling it repeatedly if
necessary. However, our future should be ready immediately.
Let us write a simple executor that does exactly this. It involves a bit of
unsafe
code, and it will certainly make you very unhappy if you happen to
call it with a future that does not return its result immediately.
mod ready_or_diverge {
    use core::future::Future;
    use core::pin::Pin;
    use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    // copy-pasted from https://docs.rs/futures/0.3.17/futures/task/fn.noop_waker.html
    unsafe fn noop_clone(_data: *const ()) -> RawWaker {
        noop_raw_waker()
    }

    unsafe fn noop(_data: *const ()) {}

    const NOOP_WAKER_VTABLE: RawWakerVTable =
        RawWakerVTable::new(noop_clone, noop, noop, noop);

    const fn noop_raw_waker() -> RawWaker {
        RawWaker::new(core::ptr::null(), &NOOP_WAKER_VTABLE)
    }

    pub fn block_on<O, F: Future<Output = O>>(mut fut: F) -> O {
        let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
        let mut context = Context::from_waker(&waker);
        let pinned = unsafe { Pin::new_unchecked(&mut fut) };
        match pinned.poll(&mut context) {
            Poll::Ready(res) => res,
            _ => loop {}, // diverge
        }
    }
}
What this code does is a bit of boilerplate and a single poll.
If we get the result immediately, fine. Otherwise we loop forever, which is a
way to tell Rust that the function will never return in that case.
Putting this all together:
pub fn do_stuff_sync() -> Stuff {
    let fut = do_stuff(SyncStuffProvider);
    ready_or_diverge::block_on(fut)
}

#[test]
fn does_stuff_sync() {
    let stuff = do_stuff_sync();
    assert_eq!(stuff.0, 42);
}
And using cargo +nightly test
, we see that things at least work as we expected:
running 1 test
test does_stuff_sync ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Very good! But how about the zero-cost abstractions that I wanted to talk about?
Well, for that we would have to actually look at the assembly code that the compiler generated. I suggest the Compiler Explorer for that. And sure enough, with optimizations turned on, the Rust compiler is actually smart enough to see through all of our executor, async and trait code and compiles it all away to just a simple return.
I am truly amazed! At least for such a simple example, async code is completely zero-cost.
However, I wonder at which level of complexity the compiler might fail to do so.
In my research, I found issue #71093,
which highlights a fairly simple case in which the compiler was not smart
enough to optimize things away. That issue specifically mentions panicking code,
since async fn
will usually generate a panic in case the returned future is
polled again after it successfully returned a value.
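That panic branch is easy to provoke by hand. The following is my own minimal sketch, not code from the issue; forty_two and the polling scaffold are hypothetical names. Polling a completed async fn future a second time hits the generated state machine's "resumed after completion" panic:

```rust
use std::future::Future;
use std::panic::{self, AssertUnwindSafe};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A trivial async fn; its body finishes on the very first poll.
async fn forty_two() -> u8 {
    42
}

// The usual no-op waker boilerplate.
unsafe fn noop_clone(_: *const ()) -> RawWaker {
    noop_raw_waker()
}
unsafe fn noop(_: *const ()) {}
const VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);
const fn noop_raw_waker() -> RawWaker {
    RawWaker::new(std::ptr::null(), &VTABLE)
}

fn main() {
    // Silence the default panic message printed by the second poll.
    panic::set_hook(Box::new(|_| {}));

    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);

    let mut fut = Box::pin(forty_two());
    // First poll: no await points, so it completes immediately.
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(42)));
    // Second poll: the state machine is already finished and panics.
    let second = panic::catch_unwind(AssertUnwindSafe(|| {
        let _ = fut.as_mut().poll(&mut cx);
    }));
    assert!(second.is_err());
    println!("second poll panicked");
}
```

The optimizer has to prove this second poll never happens before it can delete the panic machinery, which is presumably where it gives up in the more complex cases.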
That also made me think that maybe using something like
no-panic might provide at least half a
solution here. I wonder if I could use the same tricks in my ready_or_diverge
to make it "ready or fail to compile" instead.
But that is an exercise for another day.