Implementation Details of async Rust
I have been looking at a lot of Rust async stack traces lately. This was mostly related to profiling some heavily async code locally, as well as profiling some production systems in the cloud while test-driving Sentry’s new profiling support for Rust.
We don’t need anything big and fancy; we can already observe the problem with a tiny example. Now that Backtrace is finally stable, we can capture one directly in stable Rust today without any external dependencies, though I do need to pull in an async executor.
I could just reuse the ready_or_diverge noop executor I used previously, but I settled on tokio instead.
use std::backtrace::Backtrace;

pub async fn a(arg: u32) -> Backtrace {
    let bt = b().await;
    let _arg = arg;
    bt
}

pub async fn b() -> Backtrace {
    Backtrace::force_capture()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn test_stack() {
        let backtrace = a(0).await;
        println!("{}", backtrace);
    }
}
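For reference, the same backtrace can be captured without tokio by polling the future by hand. The following is a minimal sketch using only the standard library. It is not the ready_or_diverge executor mentioned above; `NoopWake`, `poll_once_or_panic` and `capture_without_runtime` are hypothetical names made up for illustration, and the sketch assumes the future completes on its first poll.

```rust
use std::backtrace::Backtrace;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing when woken (hypothetical helper).
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once and panics if it is not ready yet.
/// Assumption: the future never actually has to wait for anything.
pub fn poll_once_or_panic<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => value,
        Poll::Pending => panic!("future was still pending after one poll"),
    }
}

pub async fn a(arg: u32) -> Backtrace {
    let bt = b().await;
    let _arg = arg;
    bt
}

pub async fn b() -> Backtrace {
    Backtrace::force_capture()
}

/// Captures the backtrace of `a` and `b` without any async runtime.
pub fn capture_without_runtime() -> Backtrace {
    poll_once_or_panic(a(0))
}
```

Printing the result of `capture_without_runtime()` should show the same async frames, minus the tokio scheduler frames below them.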
So what kind of stack trace does this produce? A humongous one!
Most of that is thread setup, #[test] infrastructure, and the tokio runtime scheduler. Closer to the top we will find the async functions we actually want to look at:
4: async_codegen::b::{{closure}}
at ./src/lib.rs:10:5
5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
6: async_codegen::a::{{closure}}
at ./src/lib.rs:4:17
7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
8: async_codegen::tests::test_stack::{{closure}}
at ./src/lib.rs:19:29
9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
10: <core::pin::Pin<P> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/future.rs:124:9
Every second frame is the same from_generator::GenFuture<T>, which is not really that helpful.
There is a Rust issue about this.
A second related issue suggests using RUSTFLAGS="-Csymbol-mangling-version=v0" to improve that stack trace a little, so let’s try that.
4: async_codegen::b::{closure#0}
at ./src/lib.rs:10:5
5: <core::future::from_generator::GenFuture<async_codegen::b::{closure#0}> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
6: async_codegen::a::{closure#0}
at ./src/lib.rs:4:17
7: <core::future::from_generator::GenFuture<async_codegen::a::{closure#0}> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
8: async_codegen::tests::test_stack::{closure#0}
at ./src/lib.rs:19:29
9: <core::future::from_generator::GenFuture<async_codegen::tests::test_stack::{closure#0}> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
10: <core::pin::Pin<&mut core::future::from_generator::GenFuture<async_codegen::tests::test_stack::{closure#0}>> as core::future::future::Future>::poll
at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/future.rs:124:9
There is a lot more detail, that’s for sure. We can now see the generic argument to GenFuture.
However, all that detail is redundant and not meaningful.
And what are all those {closure#0} things?
We will start this journey by looking at what this GenFuture is. We find it here in the core crate.
Its definition is quite simple, as is its impl Future:
struct GenFuture<T: Generator<ResumeTy, Yield = ()>>(T);

impl<T: Generator<ResumeTy, Yield = ()>> Future for GenFuture<T> {
    type Output = T::Return;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // SAFETY: Safe because we're !Unpin + !Drop, and this is just a field projection.
        let gen = unsafe { Pin::map_unchecked_mut(self, |s| &mut s.0) };

        // Resume the generator, turning the `&mut Context` into a `NonNull` raw pointer. The
        // `.await` lowering will safely cast that back to a `&mut Context`.
        match gen.resume(ResumeTy(NonNull::from(cx).cast::<Context<'static>>())) {
            GeneratorState::Yielded(()) => Poll::Pending,
            GeneratorState::Complete(x) => Poll::Ready(x),
        }
    }
}
There are also a few helpers there, but we can look at them later.
What this, and the surrounding from_generator fn, tells us is that async functions are based on generators internally.
What is a Generator then?
Generators are an unstable Rust feature that is documented in the unstable book.
Let’s look at an abbreviated definition. The complete docs for the trait are here.
pub trait Generator<R = ()> {
    type Yield;
    type Return;
    fn resume(
        self: Pin<&mut Self>,
        arg: R
    ) -> GeneratorState<Self::Yield, Self::Return>;
}

pub enum GeneratorState<Y, R> {
    Yielded(Y),
    Complete(R),
}
This is indeed very similar to futures, hence async functions are built on them.
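To make the similarity concrete, here is a rough sketch of the same adapter pattern that compiles on stable Rust. Since the real Generator trait is unstable, it uses a simplified stand-in: `Step`, `StepState`, `StepFuture`, `Countdown` and `poll_until_ready` are all hypothetical names, `resume` takes `&mut self` instead of a pinned reference, and the context is passed in directly rather than through ResumeTy.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Simplified stand-in for `GeneratorState` with `Yield = ()`.
pub enum StepState<R> {
    Yielded,
    Complete(R),
}

/// Simplified stand-in for the unstable `Generator` trait.
pub trait Step {
    type Return;
    fn resume(&mut self, cx: &mut Context<'_>) -> StepState<Self::Return>;
}

/// Plays the role of `GenFuture`: adapts a `Step` into a `Future`.
pub struct StepFuture<T>(pub T);

impl<T: Step + Unpin> Future for StepFuture<T> {
    type Output = T::Return;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Same mapping as in `GenFuture::poll`: yielded means pending.
        match self.get_mut().0.resume(cx) {
            StepState::Yielded => Poll::Pending,
            StepState::Complete(x) => Poll::Ready(x),
        }
    }
}

/// Yields `remaining` times, then completes with a fixed value.
pub struct Countdown {
    pub remaining: u32,
}

impl Step for Countdown {
    type Return = &'static str;

    fn resume(&mut self, _cx: &mut Context<'_>) -> StepState<Self::Return> {
        if self.remaining == 0 {
            StepState::Complete("done")
        } else {
            self.remaining -= 1;
            StepState::Yielded
        }
    }
}

/// A waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls `fut` until it is ready; returns the output and the poll count.
pub fn poll_until_ready<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

A `Countdown` that yields twice needs three polls to complete, which mirrors how each `Yielded` of a generator surfaces as one `Poll::Pending`.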
But how are async functions turned into generators? That is done in the compiler code when transforming the AST (abstract syntax tree) of your Rust program into the HIR (high-level intermediate representation).
The make_async_expr function is responsible for turning an async {} block into code similar to std::future::from_generator(<generator>).
Immediately below is lower_expr_await.
That function turns an await into a loop that will poll the underlying future and yield when it is Poll::Pending.
I would advise you to take a look at those functions. They are well documented and quite understandable, even if you are not a compiler expert.
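The poll-then-yield loop cannot be written directly on stable Rust because `yield` is unstable. What follows is only a hand-written approximation of the kind of state machine that results, not the compiler's actual output. `YieldOnce`, `AddOne` and `drive` are hypothetical names; `AddOne` stands in for something like `async { YieldOnce { yielded: false }.await + 1 }`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A leaf future that is pending on its first poll and ready on the second.
pub struct YieldOnce {
    pub yielded: bool,
}

impl Future for YieldOnce {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.yielded {
            Poll::Ready(7)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Hand-written stand-in for an async block that awaits `YieldOnce`.
pub enum AddOne {
    Start,
    Awaiting(YieldOnce),
    Done,
}

impl Future for AddOne {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let this = self.get_mut(); // fine here because `AddOne` is `Unpin`
        loop {
            match this {
                // Entering the `.await`: create the inner future.
                AddOne::Start => *this = AddOne::Awaiting(YieldOnce { yielded: false }),
                // The `.await` itself: poll the inner future ...
                AddOne::Awaiting(inner) => match Pin::new(inner).poll(cx) {
                    Poll::Ready(v) => {
                        *this = AddOne::Done;
                        return Poll::Ready(v + 1);
                    }
                    // ... and "yield" to the caller while it is pending.
                    Poll::Pending => return Poll::Pending,
                },
                AddOne::Done => panic!("polled after completion"),
            }
        }
    }
}

/// A waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polls `fut` until it is ready; returns the output and the poll count.
pub fn drive<F: Future + Unpin>(mut fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Every `Poll::Pending` from the inner future is passed straight up to the caller, which is why each level of `.await` adds its own poll frame to the stack trace.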
So now we know where exactly our GenFuture
stack frames are coming from.
There is one missing piece though. Why do we have {closure#0} all over the place?
In a different part of the AST to HIR lowering step we will find the lower_maybe_async_body fn.
Its job is to transform an async fn foo() {} into a fn foo() -> impl Future { async {} }.
This function also calls into make_async_expr mentioned above, which then further turns that async {} block into our generator. That generator is just a special kind of closure internally in the compiler.
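The two forms can be written side by side in stable Rust. This is a sketch of the surface-level equivalence only, under the assumption that no borrowed arguments are involved (lifetime capture rules differ in subtle ways). The names `sugared`, `desugared` and `block_on` are made up for illustration.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The form we normally write.
pub async fn sugared(arg: u32) -> u32 {
    arg + 1
}

/// Roughly what the lowering produces: a plain fn returning an async block.
pub fn desugared(arg: u32) -> impl Future<Output = u32> {
    async move { arg + 1 }
}

/// A waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, good enough for futures that never really wait.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}
```

Both functions return immediately with a future; the `arg + 1` body only runs once that future is polled.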
# Can we do better?
Well, that is the remaining question now. Is it possible to remove these confusing and distracting stack frames? Is GenFuture really necessary?
The compiler turns our async {} block into an impl Generator by some magic, and this Generator trait is extremely similar to the Future trait.
Can’t the compiler just, well… create an impl Future by that same magic somehow?
And what about this {closure#0}? Here, even though it is a bit ugly, I do agree that the function that returns the lazy future is distinct from the actual future body. I have blogged before how this can be confusing and even dangerous sometimes.
You can write a non-lazy future yourself, one that does some real work when it is called rather than lazily on poll. The call has a “normal” fn name, and the poll has this weird {closure#0} appended at the end.
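A small sketch of that distinction, using a shared counter to observe when the work happens. All names here (`eager`, `lazy`, `observe_call_vs_poll`, `block_on`) are hypothetical and only serve the illustration.

```rust
use std::cell::Cell;
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Non-lazy: the increment runs when `eager` is called, before any poll.
pub fn eager(counter: Rc<Cell<u32>>) -> impl Future<Output = u32> {
    counter.set(counter.get() + 1);
    async move { counter.get() }
}

/// Lazy: the whole body, including the increment, runs on first poll.
pub async fn lazy(counter: Rc<Cell<u32>>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}

/// A waker that does nothing when woken.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, good enough for futures that never really wait.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Returns the counter after calling `eager`, after calling `lazy`,
/// and after polling both futures to completion.
pub fn observe_call_vs_poll() -> (u32, u32, u32) {
    let counter = Rc::new(Cell::new(0));
    let eager_fut = eager(counter.clone());
    let after_eager_call = counter.get();
    let lazy_fut = lazy(counter.clone());
    let after_lazy_call = counter.get();
    block_on(eager_fut);
    block_on(lazy_fut);
    (after_eager_call, after_lazy_call, counter.get())
}
```

Calling `eager` already bumps the counter, while calling `lazy` does nothing until its future is polled.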
Things can get even more complex if you add more explicit or implicit closures into the mix. Consider this snippet for example:
async fn do_tasks(tasks: &[u32]) -> Vec<u32> {
    futures::future::join_all(tasks.iter().map(|num| async move { *num })).await
}
That is one implicit closure for the outer async fn, one explicit closure for the map, and a third implicit one for the async block. So this will show up as do_tasks::{closure#0}::{closure#0}::{closure#0} in your stack trace.
Not particularly great, but it also reflects the reality when you peel away the abstractions.
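The nesting can also be observed without a stack trace by asking the compiler for type names. This is a sketch using `std::any::type_name`, whose exact output is explicitly not guaranteed and may differ from the mangled symbol names shown in backtraces; `name_of` and `nested_names` are hypothetical helpers.

```rust
use std::any::type_name;

/// Returns the compiler's diagnostic name for the type of `_value`.
fn name_of<T>(_value: &T) -> &'static str {
    type_name::<T>()
}

/// Names of an explicit closure and of the async block it returns,
/// similar to the `map` closure in `do_tasks`.
pub fn nested_names() -> (&'static str, &'static str) {
    let make_future = |num: u32| async move { num };
    let closure_name = name_of(&make_future);
    let future_name = name_of(&make_future(1));
    (closure_name, future_name)
}
```

Printing both names typically shows the async block nested one closure level below the explicit closure, though the exact spelling depends on the compiler version.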
So again, can we do any better? I’m actually intrigued to find out, and I will spend some weekend coding time to dig deeper into how the compiler magic creates impl Generator internally.
Similarly, it should be possible somehow to distinguish between real closures and async constructs in the stack trace.
do_tasks::{async-fn#0}::{closure#0}::{async-block#0} does look a little nicer.