Rustdoc doctests need fixing
Before going on a slight rant about why rustdoc / doctests are broken, I first want to highlight that rustdoc / doctests are amazing!!!
I believe that great documentation and great tooling are major contributors to Rust's success. And one part of that is rustdoc, and doctests.
The fact that you can write documentation and examples, and have those at the same time be part of your testsuite is an extreme productivity booster on the one hand, and equally valuable for potential library users on the other. What makes this even better is the fact that your documentation and examples will never go out of date because they are an integrated part of your testsuite.
# What's wrong?
But if we look behind the curtain, we can see that one of the greatest features of the Rust ecosystem does not look as pretty on the inside. Let us explore some of the more gruesome sides of it. Maybe you will have the impression that things are barely being held together with doc-tape, pun intended.
# The compilation model
So how do rustdoc doctests work internally?
Rustdoc integrates tightly with the Rust compiler, and as a first step it will invoke the Rust compiler in a limited capacity. Just enough to resolve `#[cfg]` attributes and know which items there are and what you are `use`-ing.
Fun fact: Triple-slash comments are just syntactic sugar for `#[doc = "..."]` attributes. Also, did you know that you can combine that with `cfg_attr` too?
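As a small illustration of that fun fact, the three functions below carry essentially the same kind of documentation. The function names and doc strings are made up for this sketch, and `unix` is just one convenient cfg predicate to show the `cfg_attr` combination.

```rust
/// Returns `x` plus one.
pub fn sugared(x: i32) -> i32 {
    x + 1
}

// The same documentation, written as the attribute that `///` desugars to.
#[doc = "Returns `x` plus one."]
pub fn desugared(x: i32) -> i32 {
    x + 1
}

// Only attaches the doc string when the `unix` cfg is active.
#[cfg_attr(unix, doc = "Returns `x` plus one (documented on unix targets only).")]
pub fn conditional(x: i32) -> i32 {
    x + 1
}
```

All three compile to the same function body; only the way the doc string is attached differs.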
Anyway. Now that rustc has resolved all the attributes, and rustdoc has collected all the items it needs to document with their desugared doc attributes, it will then collect individual doctests.
Then, it will do a purely textual transformation to create a small `main` program for each of the doctests.
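To make the textual transformation more concrete, here is a heavily simplified sketch that loosely follows the preprocessing steps listed in the rustdoc book: add `#![allow(unused)]`, inject an `extern crate` line if the snippet mentions the crate, and wrap everything in `fn main`. The function names `wrap_doctest` and `first_body_line` are hypothetical, and the real implementation handles many more cases (an existing `fn main`, crate-level attributes, hidden `#` lines, and so on).

```rust
/// Wraps a doctest body into a standalone program.
/// Simplified sketch, not rustdoc's actual code.
pub fn wrap_doctest(body: &str, crate_name: &str) -> String {
    let mut program = String::from("#![allow(unused)]\n");
    // Assumption: only inject the crate when the snippet mentions it.
    if body.contains(crate_name) {
        program.push_str(&format!("extern crate r#{};\n", crate_name));
    }
    program.push_str("fn main() {\n");
    for line in body.lines() {
        program.push_str(line);
        program.push('\n');
    }
    program.push_str("}\n");
    program
}

/// Returns the 1-based line on which the doctest body starts
/// inside the generated program.
pub fn first_body_line(program: &str) -> Option<usize> {
    program
        .lines()
        .position(|line| line == "fn main() {")
        // `position` is 0-based, and the body starts one line after `fn main() {`.
        .map(|index| index + 2)
}
```

For a body that does not mention the crate name, such as `assert_eq!(1, 2);`, this sketch puts the first body line on line 3 of the generated program.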
Next, each of these snippets will be compiled individually via separate `rustc` invocations. Some secret environment variables are provided to `rustc` to try to re-map line numbers as best as possible, though there are bugs.
Finally, the resulting executable will then be run, obviously, and deleted immediately afterwards. Unless you pass the unstable `--persist-doctests` option.
This is not ideal.
People often criticize Rust for its slow compile times. Clearly those people have never run webpack or the clang static analyzer in cross-translation-unit mode.
But the problem still stands. Rustdoc will compile and link each doctest as an individual executable.
Cargo itself has a similar, but less severe problem as it will compile and link individual executables for each integration test. Hence it is common knowledge that you should delete (all but one) cargo integration tests. I have read previously that some bigger projects even have a "no doctests" policy, though I can’t seem to find a linkable blog post for that. But the reason mentioned there was also the unreasonable blowup in compilation and linking times.
# Workspaces, files and line numbers
To further highlight some of the problems with doctests, I will use the following example workspace with three crates:
// # crate-a/src/lib.rs:
//! Crate A
//!
//! Some random docs
//!
//! ```
//! assert_eq!("a" "b");
//! // ^ crate-a line 6, and yes the typo is intentional ;-)
//! ```
// # crate-b/src/lib.rs:
//! Crate B
//!
//! # Examples
//!
//! ```
//! assert_eq!(1, 2);
//! // ^ crate-b line 6
//! ```
// # crate-c/src/lib.rs:
/// Says hellew
///
/// # Examples
///
/// ```
/// crate_c::hellew();
/// ```
pub fn hellew() {
( // <- intentional typo
}
The examples I chose all have different kinds of errors in them, let's see them in action.
First, `crate-c` has a typo in its Rust source:
> cargo test --doc -p doctest-c
Compiling doctest-c v0.1.0 (/home/swatinem/Coding/swatinem.de/playground/doctest-c)
error: mismatched closing delimiter: `}`
--> playground/doctest-c/src/lib.rs:9:5
|
8 | pub fn hellew() {
| - closing delimiter possibly meant for this
9 | ( // <- intentional typo
| ^ unclosed delimiter
10 | }
| ^ mismatched closing delimiter
error: could not compile `doctest-c` due to previous error
As we have discussed, doctests link to the underlying Rust library. So cargo will first try to compile that and fail. In this case rustdoc is not even being invoked. Moving on.
Next up, let's compile `crate-a`, which has a typo in the doctest:
> cargo test --doc -p doctest-a
Doc-tests doctest-a
running 1 test
test src/lib.rs - (line 5) ... FAILED
failures:
---- src/lib.rs - (line 5) stdout ----
error: no rules expected the token `"b"`
--> src/lib.rs:6:16
|
3 | assert_eq!("a" "b");
| -^^^ no rules expected this token in macro call
| |
| help: missing comma here
error: aborting due to previous error
Couldn't compile the test.
failures:
src/lib.rs - (line 5)
test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
So far so good, we ran some tests which eventually failed.
`test src/lib.rs`, okay. I have a workspace with multiple crates. Which `src/lib.rs` are you talking about exactly?
The source location is also not quite exact. Line `6` is good enough, but column `16` is a bit off. Off by `4`, or `"//! ".len()` to be exact. But okay, I can live with that.
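The off-by-4 can be checked with a few lines of Rust. The helper name `column_of` is made up for this sketch; it simply reports the 1-based column of a token in a line, once for the line as it appears in `src/lib.rs` and once with the `//! ` prefix stripped.

```rust
/// Returns the 1-based column at which `needle` starts in `line`.
/// Counts bytes, which is fine for the ASCII-only example here.
pub fn column_of(line: &str, needle: &str) -> Option<usize> {
    line.find(needle).map(|byte_index| byte_index + 1)
}

/// The doctest line exactly as it appears in `crate-a/src/lib.rs`.
pub const SOURCE_LINE: &str = r#"//! assert_eq!("a" "b");"#;

/// The same line with the doc comment prefix removed.
pub fn stripped_line() -> &'static str {
    SOURCE_LINE.strip_prefix("//! ").unwrap_or(SOURCE_LINE)
}
```

With these, `column_of(stripped_line(), "\"b\"")` gives the column `16` from the error message, while `column_of(SOURCE_LINE, "\"b\"")` gives `20`, the column an editor would show for the file.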
But the provided source snippet says line `3`? Where is that coming from?
Let's look at the third example, `crate-b`, which should compile and fail at runtime.
> cargo test --doc -p doctest-b
Doc-tests doctest-b
running 1 test
test src/lib.rs - (line 5) ... FAILED
failures:
---- src/lib.rs - (line 5) stdout ----
Test executable failed (exit status: 101).
stderr:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `2`', src/lib.rs:3:1
The doctest (beginning on line `5`) panicked in file `src/lib.rs` on line `3`. Okay? This ominous line `3` again.
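One possible explanation for the `3`, offered as an assumption and not verified against rustdoc's source: the rustdoc book describes the generated program as starting with `#![allow(unused)]` and `fn main() {`, which would put the first doctest line on line 3 of that program. Mapping such a line back to the file needs arithmetic along these lines. The function and parameter names are hypothetical.

```rust
// Maps a line number in the generated doctest program back to the
// corresponding line in the original source file.
//
// fence_line:     line of the opening code fence in the source file (1-based)
// prelude_lines:  number of wrapper lines placed before the doctest body
// generated_line: line reported for the generated program (1-based)
pub fn source_line(
    fence_line: usize,
    prelude_lines: usize,
    generated_line: usize,
) -> Option<usize> {
    // A line inside the prelude has no counterpart in the source file.
    let offset_in_body = generated_line.checked_sub(prelude_lines + 1)?;
    // The doctest body starts on the line right after the opening fence.
    Some(fence_line + 1 + offset_in_body)
}
```

Under that assumption, `source_line(5, 2, 3)` yields `Some(6)`, which is the line of the failing `assert_eq!` in `crate-b`.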
# Let's go nightly
Rustdoc and cargo have some unstable nightly-only options that can help a little bit with the encountered problems.
I originally implemented these options to help with better code coverage reports.
The `-C instrument-coverage` option has been stabilized by now. But in order to create code coverage reports you need the unstable `--persist-doctests` rustdoc option.
Running with code coverage manually is quite a complicated procedure, though at least it is well documented, including instructions on how to use it with rustdoc.
Luckily there is `cargo-llvm-cov` which makes this a lot more pleasant. Though it has limited support for doctests for reasons.
To demonstrate the problem with code coverage, I will invoke all the necessary tools manually.
> RUSTFLAGS="-C instrument-coverage" \
RUSTDOCFLAGS="-C instrument-coverage -Z unstable-options --persist-doctests doctestbins" \
LLVM_PROFILE_FILE="doctests.profraw" \
cargo +nightly test --doc -p doctest-b
[…] same output as before
I end up with a `playground/doctest-b/doctestbins/src_lib_rs_5_0/rust_out` executable, and the profiler output in `playground/doctest-b/doctests.profraw`. Note that both of these files ended up in the crate directory, more on that later.
Next up, creating the coverage report:
> llvm-profdata merge -sparse doctest-b/doctests.profraw -o doctest-b/doctests.profdata
> llvm-cov show --object doctest-b/doctestbins/src_lib_rs_5_0/rust_out --instr-profile doctest-b/doctests.profdata
1| |//! Crate B
2| |//!
3| |//! # Examples
4| |//!
5| 1|//! ```
6| 1|//! assert_eq!(1, 2);
7| 1|//! // ^ crate-b line 6
8| 1|//! ```
So far so good. `llvm-cov report --summary-only` will also print full file names, which reveals that I am dealing with a full absolute path.
Now that we have briefly looked at code coverage, let's revisit the earlier examples and use the unstable `-Z doctest-in-workspace` cargo flag, which internally passes `--test-run-directory` to rustdoc.
> cargo +nightly test --doc -p doctest-a -Z doctest-in-workspace
Doc-tests doctest-a
running 1 test
test playground/doctest-a/src/lib.rs - (line 5) ... FAILED
failures:
---- playground/doctest-a/src/lib.rs - (line 5) stdout ----
error: no rules expected the token `"b"`
--> playground/doctest-a/src/lib.rs:6:16
|
3 | assert_eq!("a" "b");
| -^^^ no rules expected this token in macro call
| |
| help: missing comma here
error: aborting due to previous error
Couldn't compile the test.
failures:
playground/doctest-a/src/lib.rs - (line 5)
Nice, now I know which exact file is failing, instead of having to look at the `Doc-tests` header.
The line/column numbers are still slightly off though.
The failing doctest:
> cargo +nightly test --doc -p doctest-b -Z doctest-in-workspace
Doc-tests doctest-b
running 1 test
test playground/doctest-b/src/lib.rs - (line 5) ... FAILED
failures:
---- playground/doctest-b/src/lib.rs - (line 5) stdout ----
Test executable failed (exit status: 101).
stderr:
thread 'main' panicked at 'assertion failed: `(left == right)`
left: `1`,
right: `2`', playground/doctest-b/src/lib.rs:3:1
Same here. I get better workspace-relative filenames, similar to other kinds of tests. But again, the line number is off.
To my own surprise, there is no change when running code coverage tests. In both cases the llvm tools report full absolute paths.
Maybe things have improved here. I remember there were similar issues as with the cargo output, as I developed the `doctest-in-workspace` option specifically with code coverage in mind. Or maybe my example was too simplistic and I would have needed to have multiple doctests from multiple workspace crates merged into a single code coverage report.
# Where do we go from here?
Well, I initially got the urge to write this blog post as I opened a PR today to stabilize `rustdoc --test-run-directory`, which itself is just an implementation detail for `cargo --doctest-in-workspace`, which is what I actually want to stabilize.
I hope I have demonstrated with these examples that `cargo --doctest-in-workspace` is a nice thing to have, and that it would even make sense as the default eventually.
But `rustdoc --test-run-directory`? Not so sure. This feels like more doc-tape piled on the already way too brittle doctest infrastructure.
Rustdoc doctests need an overhaul.
Instead of a testsuite driven by rustdoc that compiles, links and runs each doctest individually, we should rather have rustdoc output a single binary with a testsuite.
Decouple the compilation of doctests from how they run, and have cargo control the whole process. That way it would better match the way rustc and other kinds of tests are being handled.
It should integrate with check/clippy.
With more sophisticated source location tracking, we could have better line/column numbers in error messages like above, in code coverage reports, or even in `#[doc = include_str!(...)]`.
With a well generated test harness, we could also have a usable `--nocapture`.
Last but not least, it could lead to better integration with nextest as well.
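To give a rough idea of what "a single binary with a testsuite" could look like, here is a minimal sketch. It is not a proposal for how rustdoc should generate code. The `Doctest` struct, the `registry` function and the two example bodies are all hypothetical stand-ins for what a code generator might emit.

```rust
use std::panic;

/// One doctest, compiled as a plain function inside a shared binary.
pub struct Doctest {
    pub name: &'static str,
    pub run: fn(),
}

// Stand-ins for the bodies of two doctests.
fn passing_example() {
    assert_eq!(1, 1);
}

fn failing_example() {
    assert_eq!(1, 2);
}

/// Hypothetical registry that a code generator would emit.
pub fn registry() -> Vec<Doctest> {
    vec![
        Doctest { name: "passing example", run: passing_example },
        Doctest { name: "failing example", run: failing_example },
    ]
}

/// Runs every doctest in-process and records whether it passed.
/// A panic in one doctest does not stop the others.
pub fn run_all(tests: &[Doctest]) -> Vec<(&'static str, bool)> {
    tests
        .iter()
        .map(|test| (test.name, panic::catch_unwind(test.run).is_ok()))
        .collect()
}
```

All doctests are compiled and linked once, and the harness decides how to run them. Doctests marked `compile_fail` could not live in such a binary and would still need separate handling.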
In the end, rustdoc is still an amazing tool, and doctests an amazing concept.
But there are some mighty skeletons lurking in the closet. I have looked into the belly of the beast and I can say that, sadly, I don’t have the endurance to see such a transformation through. I’m even exhausted after proposing my stabilization PR and writing this blog post.
I do hope that someone will tackle this eventually. As I mentioned in the beginning, documentation and great tooling are a big driver for Rust's continued success, and I am looking forward to seeing things improve over time.