
Improving async Rust codegen


Last week I was looking at the implementation details of async. Specifically I was looking at two issues that make stack traces in async programs confusing and hard to make sense of.

To recap, let's take this snippet of Rust code:

pub async fn fn_with_nested_block() -> Backtrace {
    None.unwrap_or_else(|| async { Backtrace::force_capture() })
        .await
}

When we run it with an async runtime of our choice and print the stack trace, we get something like this on Linux:

   0: async_codegen::fn_with_nested_block::{{closure}}::{{closure}}::{{closure}}
             at ./src/lib.rs:15:36
   1: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   2: async_codegen::fn_with_nested_block::{{closure}}
             at ./src/lib.rs:16:9
   3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
   4: async_codegen::tests::test_stack::{{closure}}
             at ./src/lib.rs:77:51
   5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19

This does not look very nice for two reasons:

  1. Every async fn in our stack trace adds a meaningless and noisy GenFuture in the middle.
  2. For nested blocks, we end up with a ton of ::{{closure}} that are confusing.
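To reproduce a trace like the one above without pulling in a full runtime, a minimal sketch could look like the following. The `block_on` helper and the `NoopWake` type are hypothetical names introduced here for illustration, not part of the original setup. The helper simply spins on `poll`, which is only reasonable for futures that never actually wait. Frame names and line numbers will differ depending on the compiler version and platform.

```rust
use std::backtrace::Backtrace;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical waker that does nothing; good enough for futures that
// complete without ever parking.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical std-only executor: polls the future in a loop until it is ready.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// The snippet from the post: an async fn that awaits an async block
// created inside a closure.
pub async fn fn_with_nested_block() -> Backtrace {
    None.unwrap_or_else(|| async { Backtrace::force_capture() })
        .await
}

// Captures the trace and renders it as text.
pub fn captured_trace() -> String {
    block_on(fn_with_nested_block()).to_string()
}
```

Printing the result of `captured_trace()` from a test or a `main` function shows the frames for the async fn, the closure, and the async block.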

I ended that post by saying “I will have a look”. Well, I did, and I have both good and not-so-good news.

# Symbol Mangling

It turns out that the issue of function names is a Linux/macOS issue related to the way symbol mangling is done on those platforms.

This is not an issue on Windows which uses a different way of representing function names.

For the snippet above, I get much better output on Windows:

async_codegen::fn_with_nested_block::async_fn$0::closure$0::async_block$0

For Linux, I figured out the code paths that generate the mangled names, and have a Draft PR open that at least gets the necessary information through to that place.

The v0 symbol mangling scheme was first proposed in RFC 2603 in 2018, and implemented in early 2019. The PR to make it the default has been sitting there since the end of 2021, though there seems to be a little progress.

The problem here is that Rust symbol mangling is larger than just the Rust project. It needs to be understood by gdb (and other GNU tools), lldb (and other LLVM tools), as well as a wide variety of profiling and binary instrumentation tools.

There were a couple of PRs over the years to amend the format, which are linked from the PR above, so making changes is possible. Without having actually looked at those, though, I imagine the process to be rather tedious and slow. Not something I’m too excited about, but I will keep looking at it.

# Getting rid of GenFuture

What does get me more excited though is getting rid of GenFuture.

I managed to hack together a proof of concept in about two days and have a Draft PR open.
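For context, here is a rough analogy of why a wrapper type shows up as its own stack frame. The real `GenFuture` wraps a compiler-internal generator, which is not available on stable Rust, so this sketch wraps an ordinary future instead. `WrapperFuture`, `NoopWake`, `block_on` and `wrapped_answer` are hypothetical names used only for illustration.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for a wrapper like GenFuture: it adds no behavior
// of its own and only forwards `poll` to the thing it wraps.
pub struct WrapperFuture<F>(Pin<Box<F>>);

impl<F: Future> Future for WrapperFuture<F> {
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Unless it is inlined, this forwarding call is the extra frame
        // that sits between the caller and the wrapped body.
        self.get_mut().0.as_mut().poll(cx)
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical std-only executor that spins until the future is ready.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Polling goes through WrapperFuture::poll first, then into the async block.
pub fn wrapped_answer() -> u32 {
    block_on(WrapperFuture(Box::pin(async { 42 })))
}
```

Removing such a wrapper means the caller polls the body directly, which is the effect visible in the traces.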

A compiler with my PR does solve my original goal of getting rid of the superfluous GenFuture<T> frames in my stack traces. Here is the relevant snippet for the code above, on Windows:

   3: async_codegen::fn_with_nested_block::async_fn$0::closure$0::async_block$0
             at .\src\lib.rs:14
   4: async_codegen::fn_with_nested_block::async_fn$0
             at .\src\lib.rs:15
   5: async_codegen::tests::test_stack::async_block$0
             at .\src\lib.rs:28

That looks very clean now. I have also verified this on symbolicator, which is a huge, async-heavy codebase. It builds, passes tests, and profiling it with samply yields much better stack traces than before.

The extensive Rust test suite also revealed some unexpected improvements:

Diagnostic spans now point to the whole async block, not only the block after the async keyword:

Before:
LL |     async { *x }
   |           ^^--^^
   |           | |
   |           | `x` is borrowed here
   |           may outlive borrowed value `x`
After:
LL |     async { *x }
   |     ^^^^^^^^--^^
   |     |       |
   |     |       `x` is borrowed here
   |     may outlive borrowed value `x`

OR:
Before:
LL |       let send_fut = async {
   |  __________________________^
After:
LL |       let send_fut = async {
   |  ____________________^

I had to do quite a bit of work chasing down various diagnostics that had changed subtly. Most of those had some special handling for async constructs that was no longer compatible with my changes.

There are still some regressions to track down, though.

For one, async blocks are now trivially const. They are just data after all. While this is really an improvement, it is an unintended one, as they are supposed to be gated behind the const_async_blocks feature.

By the time that check is performed, the async function as such no longer exists, so I will have to find a different way to implement this check.
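The “just data” remark can be illustrated on stable Rust: creating an async block only builds a value, and none of its body runs until it is polled. The sketch below uses a hypothetical std-only `block_on` helper and a hypothetical `async_block_is_lazy` function. It says nothing about how the const check itself is implemented.

```rust
use std::cell::Cell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical std-only executor that spins until the future is ready.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Returns whether the block's body had run (before polling, after polling).
pub fn async_block_is_lazy() -> (bool, bool) {
    let ran = Cell::new(false);

    // Constructing the block only stores a reference to `ran` in a value.
    let fut = async {
        ran.set(true);
    };
    let before = ran.get();

    // Only polling executes the body.
    block_on(fut);
    (before, ran.get())
}
```

Calling `async_block_is_lazy()` returns `(false, true)`: the flag is still unset after the block is created and only flips once the future is driven.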

# What an async fn Captures

This leaves me with another very hard to track down failure, and now that I am trying to write it down, I get more confused by the minute.

Consider this code, which comes directly from Rust's test suite:

pub struct Foo<'a>(&'a ());

type Alias = Foo<'static>;
impl Alias {
    pub async fn using_alias<'a>(self: &Alias, arg: &'a ()) -> &() {
        arg
    }
}

This typechecks just fine with Rust stable (1.65), but fails with my PR:

error[E0700]: hidden type for `impl Future<Output = &'a ()>` captures lifetime that does not appear in bounds
  --> playground\async-codegen\src\lib.rs:22:68
   |
22 |       pub async fn using_alias<'a>(self: &Alias, arg: &'a ()) -> &() {
   |  ____________________________________________________________________^
23 | |         arg
24 | |     }
   | |_____^
   |
   = note: hidden type `impl Future<Output = &'a ()>` captures lifetime '_#17r

However, if I manually “inline” the type alias like so:

impl Foo<'static> {
    pub async fn using_self<'a>(&self, arg: &'a ()) -> &() {
        arg
    }
}

Things are already failing on stable Rust:

error: lifetime may not live long enough
  --> playground\async-codegen\src\lib.rs:29:9
   |
28 |     pub async fn using_self<'a>(&self, arg: &'a ()) -> &() {
   |                             --  - let's call the lifetime of this reference `'1`
   |                             |
   |                             lifetime `'a` defined here
29 |         arg
   |         ^^^ associated function was supposed to return data with lifetime `'1` but it is returning data with lifetime `'a`

Am I completely misunderstanding how type aliases are supposed to work? Are they not interchangeable with the type they are aliasing?

pub fn alias(a: &Alias) -> &Foo<'static> {
    a
}
pub fn call_alias() {
    let a: &Foo<'static> = &Foo(&());
    alias(a);
}

This snippet of code suggests so, right?

Am I so out of touch with reality by now?


Let's take a different approach, and try desugaring the async fn. As a reminder, the recent “async fn in trait” blog post showed this desugaring as well:

impl Alias {
    pub fn desugared_using_alias<'a>(self: &Alias, arg: &'a ()) -> impl Future<Output = &()> {
        let _self = self;
        async move {
            let _self = _self;
            arg
        }
    }
}

You might wonder what I am doing with that weird _self variable.

That is a way to explicitly capture that parameter. This is the main difference between async functions and closures. Closures and async blocks only capture what they need, whereas an async fn captures all of its arguments, and drops them in a very specific order.
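The capture difference can be observed on stable Rust with a type that logs when it is dropped. This is a minimal sketch. `Noisy`, `Log`, `as_async_fn`, `as_async_block` and `drop_order` are hypothetical names made up for the illustration.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Records its label in the shared log when it is dropped.
struct Noisy(&'static str, Log);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.1.borrow_mut().push(self.0);
    }
}

// An async fn moves every argument into the returned future, used or not.
async fn as_async_fn(_unused: Noisy) {}

// A plain fn returning an async block: the block never mentions `_unused`,
// so the argument is dropped when this function returns.
fn as_async_block(_unused: Noisy) -> impl Future<Output = ()> {
    async move {}
}

pub fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::default();

    let fut = as_async_fn(Noisy("async fn argument dropped", log.clone()));
    log.borrow_mut().push("async fn future created");
    drop(fut); // the argument lives as long as the future does

    let fut = as_async_block(Noisy("async block argument dropped", log.clone()));
    log.borrow_mut().push("async block future created");
    drop(fut);

    let events = log.borrow().clone();
    events
}
```

`drop_order()` returns the events in the order `async fn future created`, `async fn argument dropped`, `async block argument dropped`, `async block future created`. The async fn keeps its argument alive inside the future, while the block version has already dropped it before the future is handed back. Rebinding `_self` inside the block, as in the desugared `desugared_using_alias` above, forces the same capture by hand.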

Trying to compile the desugared_using_alias function above gives me my good friend E0700 again:

error[E0700]: hidden type for `impl Future<Output = &'a ()>` captures lifetime that does not appear in bounds
  --> playground\async-codegen\src\lib.rs:31:9
   |
29 |       pub fn desugared_using_alias<'a>(self: &Alias, arg: &'a ()) -> impl Future<Output = &()> {
   |                                              ------ hidden type `impl Future<Output = &'a ()>` captures the anonymous lifetime defined here
30 |           let _self = self;
31 | /         async move {
32 | |             let _self = _self;
33 | |             arg
34 | |         }
   | |_________^
   |
help: to declare that the `impl Trait` captures `'_`, you can add an explicit `'_` lifetime bound
   |
29 |     pub fn desugared_using_alias<'a>(self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
   |                                                                                              ++++

And there is a suggestion. What if we apply it?

impl Alias {
    pub fn desugared_using_alias<'a>(self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
        let _self = self;
        async move {
            let _self = _self;
            arg
        }
    }
}

… and compile again:

error: lifetime may not live long enough
  --> playground\async-codegen\src\lib.rs:34:9
   |
29 |       pub fn desugared_using_alias<'a>(
   |                                    -- lifetime `'a` defined here
30 |           self: &Alias,
   |                 - let's call the lifetime of this reference `'1`
...
34 | /         async move {
35 | |             let _self = _self;
36 | |             arg
37 | |         }
   | |_________^ associated function was supposed to return data with lifetime `'a` but it is returning data with lifetime `'1`

Uff, that is not very helpful either.


Interestingly enough, while I was experimenting, I had a slightly different snippet of code before, taking _self instead of self:

impl Alias {
    pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
        let _self = _self;
        async move {
            let _self = _self;
            arg
        }
    }
}

This surprisingly makes a huge difference in diagnostics:

error[E0106]: missing lifetime specifiers
  --> playground\async-codegen\src\lib.rs:29:90
   |
29 |     pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
   |                                             ------       ------                          ^      ^^ expected named lifetime parameter
   |                                                                                          |
   |                                                                                          expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `_self` or `arg`
help: consider using the `'a` lifetime
   |
29 |     pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &'a ()> +'a {
   |                                                                                           ++      ~~

error[E0621]: explicit lifetime required in the type of `_self`
  --> playground\async-codegen\src\lib.rs:31:9
   |
29 |       pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
   |                                               ------ help: add explicit lifetime `'a` to the type of `_self`: `&'a Foo<'static>`
30 |           let _self = _self;
31 | /         async move {
32 | |             let _self = _self;
33 | |             arg
34 | |         }
   | |_________^ lifetime `'a` required

Now it is giving different errors and different suggestions, namely to just use 'a everywhere. And, to my surprise, the diagnostics even inline Alias as Foo<'static>.


Circling back to our original code with self: applying the suggestion to just use 'a everywhere does solve the problem, and the code finally compiles. But it is not entirely correct, as now both self and arg are tied to the same lifetime.

We can demonstrate this with another snippet:

pub async fn use_lifetimes() {
    let _self: Alias = Foo(&());
    let arg = ();
    let arg_ref = _self.desugared_using_alias(&arg).await;
    drop(_self);
    println!("{arg_ref:?}");
}

error[E0505]: cannot move out of `_self` because it is borrowed
  --> playground\async-codegen\src\lib.rs:42:10
   |
41 |     let arg_ref = _self.desugared_using_alias(&arg).await;
   |                   --------------------------------- borrow of `_self` occurs here
42 |     drop(_self);
   |          ^^^^^ move out of `_self` occurs here
43 |     println!("{arg_ref:?}");
   |                ------- borrow later used here

The diagnostics now say that _self is tied to arg_ref, which is what we asked for by giving them the same lifetime, but not what we really intended. So how can we fix that?

By introducing separate lifetimes, and adding an explicit lifetime bound after the compiler told us to:

impl Alias {
    pub fn desugared_using_alias<'arg: 'slf, 'slf>(
        &'slf self,
        arg: &'arg (),
    ) -> impl Future<Output = &'arg ()> + 'slf {
        let _self = self;
        async move {
            let _self = _self;
            arg
        }
    }
}

pub async fn use_lifetimes() {
    let _self: Alias = Foo(&());
    let arg = ();
    let arg_ref = _self.desugared_using_alias(&arg).await;
    drop(_self);
    println!("{arg_ref:?}");
}
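For comparison, here is a sketch of the same intent written directly as an async fn, with the return lifetime spelled out instead of elided. This is an assumption about how the test case could be annotated by hand, not a statement about what the compiler should infer for the elided version. `block_on` and `NoopWake` are hypothetical std-only helpers, and `use_lifetimes` returns the formatted value instead of printing it so it is easy to check.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

pub struct Foo<'a>(pub &'a ());

type Alias = Foo<'static>;

impl Alias {
    // Naming 'a in the return type ties the output to `arg` only.
    // With an elided `&()`, it would be tied to `&self` instead.
    pub async fn using_alias<'a>(&self, arg: &'a ()) -> &'a () {
        arg
    }
}

pub async fn use_lifetimes() -> String {
    let this: Alias = Foo(&());
    let arg = ();
    let arg_ref = this.using_alias(&arg).await;
    // `this` is no longer borrowed once the future has completed.
    drop(this);
    format!("{arg_ref:?}")
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical std-only executor that spins until the future is ready.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Here the returned reference only depends on `arg`, so dropping the receiver after the `.await` is accepted, just like in the hand-desugared version with separate lifetimes.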

So where am I going with all this? Lifetimes are hard!

Either way, this was a very long post already, and there are still some things to solve.

I hope at least I could raise some excitement about the improvements I’m trying to make. Having cleaner and more readable stack traces is definitely a win.

I also anticipate that there will be smaller wins elsewhere: less code for the compiler to inline and optimize away, and less debuginfo to generate. It could potentially reduce compile times and output binary sizes, and even improve the runtime performance of the generated code. I haven’t measured that effect yet, and the Rust performance test suite has not run on my PR yet either.