Implementation Details of async Rust

9 November 2022 — 5 min

I have been looking at a lot of Rust async stack traces lately. This was mostly related to profiling some heavily async code locally, as well as profiling some production systems in the cloud while test-driving Sentry's new profiling support for Rust.

We don’t need to go all that big and fancy; we can already observe the problem with a tiny example. Now that Backtrace is finally stable, we can capture one directly in stable Rust today without any external dependencies, though I do need to pull in an async executor.

I could just reuse my ready_or_diverge noop executor I used previously, but I settled on tokio instead.

use std::backtrace::Backtrace;

pub async fn a(arg: u32) -> Backtrace {
    let bt = b().await;
    let _arg = arg;
    bt
}

pub async fn b() -> Backtrace {
    Backtrace::force_capture()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_stack() {
        let backtrace = a(0).await;
        println!("{}", backtrace);
    }
}

So what kind of stack trace does this produce? A humongous one! Most of that is thread setup, #[test] infrastructure, and the tokio runtime scheduler. Closer to the top we will find the async functions we actually want to look at:

   4: async_codegen::b::{{closure}}
             at ./src/lib.rs:10:5
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   6: async_codegen::a::{{closure}}
             at ./src/lib.rs:4:17
   7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   8: async_codegen::tests::test_stack::{{closure}}
             at ./src/lib.rs:19:29
   9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
  10: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/future.rs:124:9

Every second frame is the same, a from_generator::GenFuture<T>, which is not really that helpful.

There is a Rust issue about this. A second related issue suggests using RUSTFLAGS="-Csymbol-mangling-version=v0" to improve that stack trace a little, so let's try that.

   4: async_codegen::b::{closure#0}
             at ./src/lib.rs:10:5
   5: <core::future::from_generator::GenFuture<async_codegen::b::{closure#0}> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   6: async_codegen::a::{closure#0}
             at ./src/lib.rs:4:17
   7: <core::future::from_generator::GenFuture<async_codegen::a::{closure#0}> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   8: async_codegen::tests::test_stack::{closure#0}
             at ./src/lib.rs:19:29
   9: <core::future::from_generator::GenFuture<async_codegen::tests::test_stack::{closure#0}> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
  10: <core::pin::Pin<&mut core::future::from_generator::GenFuture<async_codegen::tests::test_stack::{closure#0}>> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/future.rs:124:9

There is a lot more detail, that's for sure. We can now see the generic argument to GenFuture. However, all that detail is redundant and not meaningful.

And what are all those {closure#0} things?

We will start this journey by looking at what this GenFuture is. We find it here in the core crate.

Its definition is quite simple, as is its impl Future:

struct GenFuture<T: Generator<ResumeTy, Yield = ()>>(T);

impl<T: Generator<ResumeTy, Yield = ()>> Future for GenFuture<T> {
    type Output = T::Return;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // SAFETY: Safe because we're !Unpin + !Drop, and this is just a field projection.
        let gen = unsafe { Pin::map_unchecked_mut(self, |s| &mut s.0) };

        // Resume the generator, turning the `&mut Context` into a `NonNull` raw pointer. The
        // `.await` lowering will safely cast that back to a `&mut Context`.
        match gen.resume(ResumeTy(NonNull::from(cx).cast::<Context<'static>>())) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(x) => Poll::Ready(x),
        }
    }
}

There are also a few helpers there, but we can look at them later.

What this, and the surrounding from_generator fn, tells us is that async functions are based on generators internally.

What is a Generator then?

Generators are an unstable Rust feature that is documented in the unstable book.

Let's look at an abbreviated definition. The complete docs for the trait are here.

pub trait Generator<R = ()> {
    type Yield;
    type Return;

    fn resume(
        self: Pin<&mut Self>,
        arg: R
    ) -> GeneratorState<Self::Yield, Self::Return>;
}

pub enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}

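This is indeed very similar to the Future trait: resume plays the role of poll, GeneratorState::Yielded(()) corresponds to Poll::Pending, and GeneratorState::Complete(x) corresponds to Poll::Ready(x). That similarity is why async functions are built on generators.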
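To make the similarity concrete, here is a minimal sketch that runs on stable Rust. Because the real Generator trait is unstable, it uses a hypothetical stand-in: MiniGenerator, MiniState, MiniGenFuture, CountDown, and block_on are all names invented for this illustration, not items from core. The adapter also requires Unpin to avoid the unsafe field projection that the real GenFuture uses.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical, simplified stand-in for the unstable `Generator` trait
/// (no resume argument, and the yield type is always `()`).
trait MiniGenerator {
    type Return;
    fn resume(self: Pin<&mut Self>) -> MiniState<Self::Return>;
}

enum MiniState<R> {
    Yielded,
    Complete(R),
}

/// Adapter in the spirit of `GenFuture`: every `poll` is one `resume`.
struct MiniGenFuture<G>(G);

impl<G: MiniGenerator + Unpin> Future for MiniGenFuture<G> {
    type Output = G::Return;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `G: Unpin` lets us re-pin the inner value without `unsafe`.
        let inner = Pin::new(&mut self.get_mut().0);
        match inner.resume() {
            MiniState::Yielded => Poll::Pending,
            MiniState::Complete(x) => Poll::Ready(x),
        }
    }
}

/// Example generator: yields `remaining` times, then returns how often
/// it was resumed in total.
struct CountDown {
    remaining: u32,
    resumed: u32,
}

impl MiniGenerator for CountDown {
    type Return = u32;

    fn resume(mut self: Pin<&mut Self>) -> MiniState<u32> {
        self.resumed += 1;
        if self.remaining == 0 {
            MiniState::Complete(self.resumed)
        } else {
            self.remaining -= 1;
            MiniState::Yielded
        }
    }
}

/// Waker that does nothing; good enough for a busy-polling demo executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny demo executor: polls in a loop until the future is ready.
/// It ignores wake-ups, which is only acceptable for illustration.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Driving `MiniGenFuture(CountDown { remaining: 2, resumed: 0 })` with block_on resumes the generator three times: two yields followed by the final return.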

But how are async functions turned into generators? That is done in the compiler code when transforming the AST (abstract syntax tree) of your Rust program into the HIR (high-level intermediate representation).

The make_async_expr function is responsible for turning an async {} block into code similar to std::future::from_generator(<generator>). Immediately below is lower_expr_await. That function turns an await into a loop that will poll the underlying future and yield when it is Poll::Pending.
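The shape of that loop can be approximated in stable Rust, with some liberties. Stable code cannot yield, so in this sketch returning Poll::Pending from a hand-written poll stands in for the yield point, and the executor calling poll again stands in for the next loop iteration. Await, YieldOnce, and block_on are hypothetical names for this illustration and are not what the compiler generates.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical wrapper playing the role of a single `.await` expression.
/// It also counts how often it was polled, purely for demonstration.
struct Await<F> {
    awaitee: Pin<Box<F>>,
    polls: u32,
}

impl<F: Future> Future for Await<F> {
    type Output = (F::Output, u32);

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        self.polls += 1;
        // One iteration of the lowered loop: poll the awaited future.
        match self.awaitee.as_mut().poll(cx) {
            // Corresponds to `break result` in the lowered loop.
            Poll::Ready(result) => Poll::Ready((result, self.polls)),
            // Corresponds to the `yield`: hand control back to the caller.
            Poll::Pending => Poll::Pending,
        }
    }
}

/// Future that is pending on its first poll and ready on the second.
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.yielded {
            Poll::Ready("done")
        } else {
            self.yielded = true;
            // Ask to be polled again, as a well-behaved future should.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; good enough for a busy-polling demo executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny demo executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Each Poll::Pending from the inner future surfaces as one Poll::Pending from the outer one, which is why every await adds a layer of poll calls to the stack.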

I would advise you to take a look at those functions. They are well documented and quite understandable, even if you are not a compiler expert.

So now we know where exactly our GenFuture stack frames are coming from.

There is one missing piece though. Why do we have {closure#0} all over the place?

In a different part of the AST to HIR lowering step we will find the lower_maybe_async_body fn.

Its job is to transform an async fn foo() {} into a fn foo() -> impl Future { async {} }. This function also calls into make_async_expr mentioned above, which then further turns that async {} block into our generator. That generator is just a special kind of closure internally in the compiler.
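Written out by hand, that transformation looks roughly like the following sketch. The function names sugared and desugared, as well as the small block_on helper, are made up for this example. The real lowering happens on the compiler's internal representation, so this only illustrates the equivalence at the source level.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// What we write.
async fn sugared(arg: u32) -> u32 {
    arg + 1
}

/// Roughly what it becomes: a plain fn that returns a lazy future.
/// The arguments are moved into the async block, which holds the body.
fn desugared(arg: u32) -> impl Future<Output = u32> {
    async move { arg + 1 }
}

/// Waker that does nothing; good enough for a busy-polling demo executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny demo executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Calling either function only constructs the future; the body runs once something like block_on polls it.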

# Can we do better?

Well, that is the remaining question now. Is it possible to remove these confusing and distracting stack frames? Is GenFuture really necessary? The compiler turns our async {} block into an impl Generator by some magic, and this Generator trait is extremely similar to the Future trait. Can’t the compiler just, well… create an impl Future by that same magic somehow?

And what about this {closure#0}? Here, even though it is a bit ugly, I do agree that the function that returns the lazy future is distinct from the actual future body. I have blogged before about how this can be confusing and sometimes even dangerous. You can create a non-lazy future yourself that does some real work on call, rather than lazily on poll. The call has a “normal” fn name, and the poll has this weird {closure#0} appended at the end.
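The difference between work done on call and work done on poll can be sketched like this. The names eager, lazy, and block_on are invented for the illustration, and an atomic counter stands in for the "real work".

```rust
use std::future::Future;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Non-lazy: the increment runs when `eager` is *called*, in a frame
/// with a normal fn name. Only the load runs inside the future body.
fn eager(counter: &AtomicU32) -> impl Future<Output = u32> + '_ {
    counter.fetch_add(1, Ordering::SeqCst);
    async move { counter.load(Ordering::SeqCst) }
}

/// Lazy: nothing runs until the returned future is *polled*.
async fn lazy(counter: &AtomicU32) -> u32 {
    counter.fetch_add(1, Ordering::SeqCst);
    counter.load(Ordering::SeqCst)
}

/// Waker that does nothing; good enough for a busy-polling demo executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny demo executor: polls in a loop until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Creating the lazy future leaves the counter untouched until it is driven, while merely calling eager already increments it.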

Things can get even more complex if you add more explicit or implicit closures into the mix. Consider this snippet for example:

async fn do_tasks(tasks: &[u32]) -> Vec<u32> {
    // `async move` takes ownership of the `&u32`, and `*num` copies the value out
    futures::future::join_all(tasks.iter().map(|num| async move { *num })).await
}

That is one implicit closure for the outer async fn, one explicit closure for the map, and a third implicit one for the async block. So this will show up as do_tasks::{closure#0}::{closure#0}::{closure#0} in your stack trace. Not particularly great, but it also reflects the reality when you peel away the abstractions.

So again, can we do any better? I’m actually intrigued to find out, and I will spend some weekend coding time to dig deeper into how the compiler magic creates impl Generator internally.

Similarly, it should be possible somehow to distinguish between real closures and async constructs in the stack trace. do_tasks::{async-fn#0}::{closure#0}::{async-block#0} does look a little nicer.