# Rust async can truly be zero-cost


One of the fundamental selling points of Rust is zero-cost abstractions. This means that you can write high-level generic code, and the compiler will optimize it in such a way that you couldn’t have written better code by hand.

There are tons of examples of Rust doing this. But I came in with one very specific example in mind and was curious whether Rust could actually pull it off.

I want to write code that is generic over an async or a sync implementation. What does that mean? Essentially, I want an async trait and a generic async function using that trait, but I also want a sync version of that function without having to write the two separately.

It is hard to explain in words, so let's demonstrate the idea with a snippet of Rust:

```rust
// as of today, we need two nightly-only features to make this work:
#![feature(generic_associated_types)]
#![feature(type_alias_impl_trait)]

use core::future::Future;

pub struct Stuff(pub u8);

trait StuffProvider {
    type ProvideStuff<'c>: Future<Output = Stuff> + 'c;

    /// Provides [Stuff] asynchronously.
    fn provide_stuff(&mut self) -> Self::ProvideStuff<'_>;
}

// This function is generic over anything that can provide us stuff.
async fn do_stuff<P: StuffProvider>(mut provider: P) -> Stuff {
    let mut stuff = provider.provide_stuff().await;
    stuff.0 += 1;
    stuff
}
```


Okay, so far so good. We have our async function, and the async-trait defined.

What is needed now is an implementation of this trait that returns Stuff right away, without asynchronously waiting. My idea was to use core::future::Ready for this, so let's do that:

```rust
struct SyncStuffProvider;

impl StuffProvider for SyncStuffProvider {
    type ProvideStuff<'c> = core::future::Ready<Stuff>;

    fn provide_stuff(&mut self) -> Self::ProvideStuff<'_> {
        // Return 41 here, so that `do_stuff` bumps it to the expected 42.
        core::future::ready(Stuff(41))
    }
}
```


So far so good. I expect the resulting do_stuff(SyncStuffProvider) future to return Poll::Ready the first time it is polled. But how do I poll this future? Usually, polling is the job of an async executor. Most async executors have a block_on method that, well, blocks the current thread until the future is ready, polling it repeatedly if necessary. However, our future should be ready immediately.

Let us write a simple executor that does exactly this. It involves a bit of unsafe code, and it will certainly make you very unhappy if you actually call it with a future that does not return its result immediately.

```rust
mod ready_or_diverge {
    use core::future::Future;
    use core::pin::Pin;
    use core::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

    unsafe fn noop_clone(_data: *const ()) -> RawWaker {
        noop_raw_waker()
    }

    unsafe fn noop(_data: *const ()) {}

    const NOOP_WAKER_VTABLE: RawWakerVTable = RawWakerVTable::new(noop_clone, noop, noop, noop);

    const fn noop_raw_waker() -> RawWaker {
        RawWaker::new(core::ptr::null(), &NOOP_WAKER_VTABLE)
    }

    pub fn block_on<O, F: Future<Output = O>>(mut fut: F) -> O {
        let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
        let mut context = Context::from_waker(&waker);
        // Safety: `fut` lives on this stack frame and is never moved again.
        let pinned = unsafe { Pin::new_unchecked(&mut fut) };
        match pinned.poll(&mut context) {
            Poll::Ready(output) => output,
            Poll::Pending => loop {}, // diverge
        }
    }
}
```


What this code does is a bit of boilerplate plus a single poll. If we get the result immediately, fine. Otherwise, we loop forever, which is a way of telling Rust that the function will never return in that case.

Putting this all together:

```rust
pub fn do_stuff_sync() -> Stuff {
    let fut = do_stuff(SyncStuffProvider);
    ready_or_diverge::block_on(fut)
}

#[test]
fn does_stuff_sync() {
    let stuff = do_stuff_sync();
    assert_eq!(stuff.0, 42);
}
```


And using `cargo +nightly test`, we see that things at least work as we expected:

```
running 1 test
test does_stuff_sync ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
```


Very good! But how about the zero-cost abstractions that I wanted to talk about?

Well, for that we have to look at the assembly code the compiler generates. I suggest the Compiler Explorer for that. And sure enough, with optimizations turned on, the Rust compiler is smart enough to see through all of our executor, async, and trait code and compiles it all down to a simple return.

I am truly amazed! At least for such a simple example, async code is completely zero-cost.

However, I wonder at which level of complexity the compiler might fail to do so.

In my research, I found issue #71093, which highlights a fairly simple case in which the compiler was not smart enough to optimize things away. That issue specifically mentions panicking code, since an async fn will usually generate a panic for the case in which the returned future is polled again after it has already returned a value.

That also made me think that something like no-panic might provide at least half a solution here. I wonder if I can use the same tricks in my ready_or_diverge to instead make it "ready or fail to compile".

But that is an exercise for another day.