Rustdoc doctests need fixing

28 October 2022 — 8 min

Before going on a slight rant about why rustdoc / doctests are broken, I first want to highlight that rustdoc / doctests are amazing!!!

I believe that great documentation and great tooling are major contributors to Rust's success, and rustdoc and doctests are one part of that.

Being able to write documentation and examples that are at the same time part of your testsuite is an extreme productivity booster on the one hand, and equally valuable for potential library users on the other. Even better, your examples cannot silently go out of date, because they are an integral part of your testsuite.

# What's wrong?

But if we look behind the curtain, we can see that one of the greatest features of the Rust ecosystem does not look as pretty on the inside. Let us explore some of its more gruesome sides. You might come away with the impression that things are barely being held together with doc-tape, pun intended.

# The compilation model

So how do rustdoc doctests work internally?

Rustdoc integrates tightly with the Rust compiler, and as a first step it invokes the compiler in a limited capacity: just enough to resolve #[cfg] attributes and to know which items exist and what you are use-ing.

Fun fact: Triple-slash comments are just syntactic sugar for #[doc = "..."] attributes. Also, did you know that you can combine that with cfg_attr too?
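A small sketch of that desugaring: the first two functions below carry identical documentation, written once with `///` and once as the underlying attribute, and the third only gets its extra doc line when compiling for Linux. The function names and the `target_os` condition are arbitrary choices for illustration.

```rust
/// Returns the answer.
pub fn sugared() -> u32 {
    42
}

// The same documentation, written as the attribute that `///` desugars to.
// Note the leading space: `/// Returns` becomes `" Returns"`.
#[doc = " Returns the answer."]
pub fn desugared() -> u32 {
    42
}

/// Returns the answer.
// The next doc line only exists when compiling for Linux, because
// `cfg_attr` expands to `#[doc = "..."]` only if its condition holds.
#[cfg_attr(target_os = "linux", doc = " On Linux, this line is part of the docs too.")]
pub fn conditional() -> u32 {
    42
}

fn main() {
    // The attributes only affect documentation, not behavior.
    assert_eq!(sugared(), desugared());
    assert_eq!(desugared(), conditional());
}
```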

Anyway. Once rustc has resolved all the attributes, and rustdoc has collected all the items it needs to document along with their desugared doc attributes, rustdoc collects the individual doctests.

Then, it will do a purely textual transformation to create a small main program for each of the doctests.
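Here is a heavily simplified sketch of that textual transformation; it is not rustdoc's actual code. It assumes the snippet has no `fn main` of its own and does not mention the crate name (in which case rustdoc would also inject an `extern crate` line), and that rustdoc prepends its default `#![allow(unused)]` attribute. Under those assumptions, the first line of the snippet ends up on line 3 of the generated program, which is plausibly where the mysterious "line 3" in the outputs further down comes from.

```rust
/// Hypothetical, heavily simplified version of what rustdoc does to a
/// doctest snippet: prepend crate-level attributes and wrap the body in
/// `fn main`, unless the snippet already defines one.
fn wrap_doctest(snippet: &str) -> String {
    let mut out = String::new();
    // Line 1: the default crate attribute for doctests (assumed).
    out.push_str("#![allow(unused)]\n");
    if snippet.contains("fn main") {
        // The snippet brings its own entry point; paste it verbatim.
        out.push_str(snippet);
    } else {
        // Line 2: the synthesized entry point.
        out.push_str("fn main() {\n");
        // The snippet itself starts on line 3 of the generated program.
        out.push_str(snippet);
        if !snippet.ends_with('\n') {
            out.push('\n');
        }
        out.push_str("}\n");
    }
    out
}

fn main() {
    // The crate-b doctest, with the `//! ` prefixes already stripped.
    let snippet = "assert_eq!(1, 2);\n// ^ crate-b line 6\n";
    let program = wrap_doctest(snippet);
    print!("{program}");
}
```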

Next, each of these snippets is compiled individually via a separate rustc invocation. Some secret environment variables are passed to rustc to re-map line numbers as well as possible, though there are bugs.

Finally, the resulting executable is run, obviously, and deleted immediately afterwards, unless you pass the unstable --persist-doctests option.

This is not ideal.

People often criticize Rust for its slow compile times. Clearly those people have never run webpack or the clang static analyzer in cross-translation-unit mode.

But the problem still stands. Rustdoc will compile and link each doctest as an individual executable.

Cargo itself has a similar but less severe problem, as it compiles and links a separate executable for each integration test. Hence the common advice to delete all but one of your cargo integration tests. I have also read that some bigger projects even have a "no doctests" policy, though I can’t seem to find a linkable blog post for that. The reason given there was, again, the unreasonable blowup in compilation and linking times.
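One way to follow that advice, sketched here with a hypothetical `parse_number` function, is a single integration test crate (for example `tests/it/main.rs`) whose modules would otherwise have been separate files under `tests/`, so cargo links one test executable instead of one per file. The library function is inlined and a `fn main` is included only so the sketch compiles as one standalone file.

```rust
// Stand-in for the library under test (hypothetical). In a real project this
// would come from the crate being tested, e.g. `use my_crate::parse_number;`.
pub fn parse_number(s: &str) -> Option<i64> {
    s.trim().parse().ok()
}

// What used to be `tests/parsing.rs`, previously its own linked executable...
#[cfg(test)]
mod parsing {
    use super::parse_number;

    #[test]
    fn parses_plain_number() {
        assert_eq!(parse_number("42"), Some(42));
    }

    #[test]
    fn rejects_garbage() {
        assert_eq!(parse_number("forty-two"), None);
    }
}

// ...and what used to be `tests/whitespace.rs`, now sharing the same binary.
// In a real `tests/it/main.rs` these would usually be `mod parsing;` etc.
#[cfg(test)]
mod whitespace {
    use super::parse_number;

    #[test]
    fn trims_surrounding_whitespace() {
        assert_eq!(parse_number("  7\n"), Some(7));
    }
}

// Only needed so this single-file sketch also builds as a plain binary;
// a real integration test crate uses the libtest harness instead.
fn main() {
    assert_eq!(parse_number("-3"), Some(-3));
}
```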

# Workspaces, files and line numbers

To further highlight some of the problems with doctests, I will use the following example workspace with three crates:

````rust
// # crate-a/src/lib.rs:

//! Crate A
//!
//! Some random docs
//!
//! ```
//! assert_eq!("a" "b");
//! // ^ crate-a line 6, and yes the typo is intentional ;-)
//! ```

// # crate-b/src/lib.rs:

//! Crate B
//!
//! # Examples
//!
//! ```
//! assert_eq!(1, 2);
//! // ^ crate-b line 6
//! ```

// # crate-c/src/lib.rs:

/// Says hellew
///
/// # Examples
///
/// ```
/// crate_c::hellew();
/// ```
pub fn hellew() {
    ( // <- intentional typo
}
````

The examples I chose each have a different kind of error in them. Let's see them in action.

First, crate-c has a typo in its Rust source:

```
> cargo test --doc -p doctest-c
   Compiling doctest-c v0.1.0 (/home/swatinem/Coding/swatinem.de/playground/doctest-c)
error: mismatched closing delimiter: `}`
  --> playground/doctest-c/src/lib.rs:9:5
   |
8  | pub fn hellew() {
   |                 - closing delimiter possibly meant for this
9  |     ( // <- intentional typo
   |     ^ unclosed delimiter
10 | }
   | ^ mismatched closing delimiter

error: could not compile `doctest-c` due to previous error
```

As we have discussed, doctests link against the underlying Rust library, so cargo first tries to compile that, and fails. In this case rustdoc is not even invoked. Moving on.

Next up, let's compile crate-a, which has a typo in the doctest:

```
> cargo test --doc -p doctest-a
   Doc-tests doctest-a

running 1 test
test src/lib.rs - (line 5) ... FAILED

failures:

---- src/lib.rs - (line 5) stdout ----
error: no rules expected the token `"b"`
 --> src/lib.rs:6:16
  |
3 | assert_eq!("a" "b");
  |               -^^^ no rules expected this token in macro call
  |               |
  |               help: missing comma here

error: aborting due to previous error

Couldn't compile the test.

failures:
    src/lib.rs - (line 5)

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
```

So far so good, we ran some tests which eventually failed.

test src/lib.rs, okay. I have a workspace with multiple crates. Which src/lib.rs are you talking about exactly?

The source location is also not quite exact. Line 6 is correct, but column 16 is a bit off: off by 4, or "//! ".len() to be exact. But okay, I can live with that.
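The arithmetic behind a correct position is simple enough to write down. The sketch below uses a hypothetical `remap` helper (not rustdoc's code) that maps a position in the generated doctest program back to the source file, assuming the snippet follows a fixed number of wrapper lines (two here) and that every doc line carries the same `//! ` prefix. For the crate-a error, the fence is on line 5 and `"b"` sits at column 16 of line 3 in the generated program, which corresponds to line 6, column 20 in the file.

```rust
/// Hypothetical helper: map a (line, column) position in the generated
/// doctest program back to the original source file.
///
/// * `fence_line`: line of the opening fence in the source file.
/// * `wrapper_lines`: lines the generated program adds before the snippet
///   (`line` must be greater than this).
/// * `prefix`: the doc comment prefix stripped from every line, e.g. `"//! "`.
fn remap(fence_line: u32, wrapper_lines: u32, prefix: &str, line: u32, col: u32) -> (u32, u32) {
    // Generated line `wrapper_lines + 1` is the first snippet line, which
    // sits directly below the fence in the source file.
    let source_line = fence_line + (line - wrapper_lines);
    // Columns shift by the length of the stripped prefix.
    let source_col = col + prefix.len() as u32;
    (source_line, source_col)
}

fn main() {
    // crate-a: `"b"` is reported at generated line 3, column 16.
    let (line, col) = remap(5, 2, "//! ", 3, 16);
    println!("src/lib.rs:{line}:{col}");
}
```

The same formula would turn the crate-b panic location `3:1` into `6:5`, the actual position of `assert_eq!` in the file.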

But the provided source snippet says line 3? Where is that coming from?

Let's look at the third example, crate-b, which should compile and then fail at runtime.

```
> cargo test --doc -p doctest-b
   Doc-tests doctest-b

running 1 test
test src/lib.rs - (line 5) ... FAILED

failures:

---- src/lib.rs - (line 5) stdout ----
Test executable failed (exit status: 101).

stderr:
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `2`', src/lib.rs:3:1
```

The doctest (beginning on line 5) panicked in file src/lib.rs on line 3. Okay? This ominous line 3 again.

# Let's go nightly

Rustdoc and cargo have some unstable nightly-only options that can help a little bit with the encountered problems.

I originally implemented these options to get better code coverage reports. The -C instrument-coverage option has been stabilized by now, but to create code coverage reports for doctests you still need the unstable --persist-doctests rustdoc option.

Running with code coverage manually is quite a complicated procedure, though at least it is well documented, including instructions on how to use it with rustdoc.

Luckily there is cargo-llvm-cov, which makes this a lot more pleasant, though it has only limited support for doctests, for reasons.

To demonstrate the problem with code coverage, I will invoke all the necessary tools manually.

```
> RUSTFLAGS="-C instrument-coverage" \
  RUSTDOCFLAGS="-C instrument-coverage -Z unstable-options --persist-doctests doctestbins" \
  LLVM_PROFILE_FILE="doctests.profraw" \
    cargo +nightly test --doc -p doctest-b

[…] same output as before
```

I end up with a playground/doctest-b/doctestbins/src_lib_rs_5_0/rust_out executable, and the profiler output in playground/doctest-b/doctests.profraw. Note that both these files ended up in the crate directory, more on that later.
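The directory name appears to be derived from the doctest's file name, line and index. The hypothetical helpers below reproduce the observed `src_lib_rs_5_0/rust_out` layout; they are reconstructed from the output above rather than taken from rustdoc, other rustdoc versions may name things differently, and the binary name assumes a Unix target (no `.exe`). Knowing the layout helps when scripting coverage runs, since every doctest gets its own binary.

```rust
use std::path::PathBuf;

/// Hypothetical reconstruction of the `--persist-doctests` directory naming:
/// path separators and dots become underscores, then line and index are appended.
fn persist_dir_name(file: &str, line: usize, index: usize) -> String {
    let sanitized: String = file
        .chars()
        .map(|c| if c == '/' || c == '\\' || c == '.' { '_' } else { c })
        .collect();
    format!("{sanitized}_{line}_{index}")
}

/// Where the persisted executable for one doctest is expected to end up.
fn persisted_binary(persist_root: &str, file: &str, line: usize, index: usize) -> PathBuf {
    PathBuf::from(persist_root)
        .join(persist_dir_name(file, line, index))
        .join("rust_out")
}

fn main() {
    let bin = persisted_binary("doctestbins", "src/lib.rs", 5, 0);
    println!("{}", bin.display());
}
```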

Next up, creating the coverage report:

````
> llvm-profdata merge -sparse doctest-b/doctests.profraw -o doctest-b/doctests.profdata
> llvm-cov show --object doctest-b/doctestbins/src_lib_rs_5_0/rust_out  --instr-profile doctest-b/doctests.profdata
    1|       |//! Crate B
    2|       |//!
    3|       |//! # Examples
    4|       |//!
    5|      1|//! ```
    6|      1|//! assert_eq!(1, 2);
    7|      1|//! // ^ crate-b line 6
    8|      1|//! ```
````

So far so good. llvm-cov report --summary-only also prints full file names, which reveals that I am dealing with full absolute paths.

Now that we have briefly looked at code coverage, let's revisit the earlier examples and use the unstable -Z doctest-in-workspace cargo flag, which internally passes --test-run-directory to rustdoc.

```
> cargo +nightly test --doc -p doctest-a -Z doctest-in-workspace
   Doc-tests doctest-a

running 1 test
test playground/doctest-a/src/lib.rs - (line 5) ... FAILED

failures:

---- playground/doctest-a/src/lib.rs - (line 5) stdout ----
error: no rules expected the token `"b"`
 --> playground/doctest-a/src/lib.rs:6:16
  |
3 | assert_eq!("a" "b");
  |               -^^^ no rules expected this token in macro call
  |               |
  |               help: missing comma here

error: aborting due to previous error

Couldn't compile the test.

failures:
    playground/doctest-a/src/lib.rs - (line 5)
```

Nice, now I know which exact file is failing, instead of having to look at the Doc-tests header.
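The difference boils down to which directory the reported path is relative to. The sketch below uses a hypothetical `display_relative` helper built on `Path::strip_prefix` to show both variants: relative to the crate root, every crate reports the ambiguous `src/lib.rs`, while relative to the workspace root the crate directory stays in the name. The absolute paths are assumed from the compiler output earlier in this post, and the sketch only illustrates the effect, not how cargo implements the flag.

```rust
use std::path::Path;

/// Hypothetical helper: display `file` relative to `base`, falling back to
/// the full path if `file` is not inside `base`.
fn display_relative(file: &Path, base: &Path) -> String {
    file.strip_prefix(base)
        .unwrap_or(file)
        .display()
        .to_string()
}

fn main() {
    // Paths assumed from the compiler output earlier in this post.
    let workspace_root = Path::new("/home/swatinem/Coding/swatinem.de");
    let crate_root = workspace_root.join("playground/doctest-a");
    let file = crate_root.join("src/lib.rs");

    // Without the flag: relative to the crate, ambiguous in a workspace.
    println!("{}", display_relative(&file, &crate_root));
    // With -Z doctest-in-workspace: relative to the workspace root.
    println!("{}", display_relative(&file, workspace_root));
}
```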

The line/column numbers are still slightly off though.

The failing doctest:

```
> cargo +nightly test --doc -p doctest-b -Z doctest-in-workspace
   Doc-tests doctest-b

running 1 test
test playground/doctest-b/src/lib.rs - (line 5) ... FAILED

failures:

---- playground/doctest-b/src/lib.rs - (line 5) stdout ----
Test executable failed (exit status: 101).

stderr:
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `1`,
 right: `2`', playground/doctest-b/src/lib.rs:3:1
```

Same here. I get better workspace-relative filenames, similar to other kinds of tests. But again, the line number is off.

To my own surprise, there is no change when running code coverage tests. In both cases the llvm tools report full absolute paths.

Maybe things have improved here. I remember there were similar issues as with the cargo output, as I developed the doctest-in-workspace option specifically with code coverage in mind. Or maybe my example was too simplistic, and I would have needed multiple doctests from multiple workspace crates merged into a single code coverage report.
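For that multi-crate scenario, every persisted doctest binary has to be handed to llvm-cov. A minimal sketch, assuming each crate persisted its doctests into a directory like `doctestbins` as above and that the binaries are named `rust_out` (Unix): walk the directories, collect every binary, and turn the list into `--object` arguments for `llvm-cov show` or `llvm-cov report`. The helper names and directory paths are hypothetical, and the sketch only builds the argument list; it does not run llvm-cov.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every file named `rust_out` below `dir`.
fn collect_doctest_binaries(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_doctest_binaries(&path, out)?;
        } else if path.file_name().map_or(false, |name| name == "rust_out") {
            out.push(path);
        }
    }
    Ok(())
}

/// Turn the binaries into `--object <path>` argument pairs for llvm-cov.
fn llvm_cov_object_args(binaries: &[PathBuf]) -> Vec<String> {
    binaries
        .iter()
        .flat_map(|bin| ["--object".to_string(), bin.display().to_string()])
        .collect()
}

fn main() -> io::Result<()> {
    // Hypothetical persist directories of two workspace crates.
    let mut binaries = Vec::new();
    for dir in ["doctest-a/doctestbins", "doctest-b/doctestbins"] {
        let dir = Path::new(dir);
        if dir.is_dir() {
            collect_doctest_binaries(dir, &mut binaries)?;
        }
    }
    // Sort for a stable argument order.
    binaries.sort();
    println!("{}", llvm_cov_object_args(&binaries).join(" "));
    Ok(())
}
```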

# Where do we go from here?

Well, I initially got the urge to write this blog post because I opened a PR today to stabilize rustdoc's --test-run-directory, which itself is just an implementation detail of cargo's doctest-in-workspace option, which is what I actually want to stabilize.

I hope these examples have demonstrated that cargo's doctest-in-workspace option is a nice thing to have, and that it should even become the default eventually.

But rustdoc --test-run-directory? Not so sure. This feels like more doc-tape piled on the already way too brittle doctest infrastructure.

Rustdoc doctests need an overhaul.

Instead of a testsuite driven by rustdoc that compiles, links and runs each doctest individually, rustdoc should rather output a single binary containing the whole testsuite.

We should decouple the compilation of doctests from running them, and have cargo control the whole process. That would better match how other kinds of tests are built by rustc and run by cargo.
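To make the idea a bit more tangible, here is a rough sketch of what such a merged doctest binary could look like. Everything in it is hypothetical: each doctest becomes an ordinary function instead of its own `main`, and a small runner executes all of them in one process, using `std::panic::catch_unwind` to record failures instead of relying on process exit codes. A real design would more likely reuse the libtest harness.

```rust
use std::panic;

/// One collected doctest: where it came from, and its body as a function.
struct DocTest {
    name: &'static str,
    run: fn(),
}

// Doctest bodies, which rustdoc currently wraps in separate `fn main`s.
fn doctest_b_line_5() {
    assert_eq!(1, 2); // the failing crate-b example
}

fn doctest_passing() {
    assert_eq!(1 + 1, 2); // a hypothetical passing doctest
}

static DOCTESTS: &[DocTest] = &[
    DocTest {
        name: "playground/doctest-b/src/lib.rs - (line 5)",
        run: doctest_b_line_5,
    },
    DocTest {
        name: "hypothetical passing doctest",
        run: doctest_passing,
    },
];

/// Run all doctests in-process and return (passed, failed).
fn run_all(tests: &[DocTest]) -> (usize, usize) {
    // Silence the default panic message; failures are reported below.
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(|_| {}));

    let (mut passed, mut failed) = (0, 0);
    for test in tests {
        if panic::catch_unwind(test.run).is_ok() {
            println!("test {} ... ok", test.name);
            passed += 1;
        } else {
            println!("test {} ... FAILED", test.name);
            failed += 1;
        }
    }

    // Restore the original hook for the rest of the program.
    panic::set_hook(previous_hook);
    (passed, failed)
}

fn main() {
    let (passed, failed) = run_all(DOCTESTS);
    println!("test result: {passed} passed; {failed} failed");
}
```

Since everything is one executable, only a single link step is needed, no matter how many doctests there are.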

It should integrate with cargo check and clippy. With more sophisticated source location tracking, we could get correct line/column numbers in error messages like the ones above, in code coverage reports, and even for docs pulled in via #[doc = include_str!(...)].

With a well generated test harness, we could also have a usable --nocapture.

Last but not least, it could lead to better integration with nextest as well.

In the end, rustdoc is still an amazing tool, and doctests an amazing concept.

But there are some mighty skeletons lurking in the closet. I have looked into the belly of the beast and I can say that, sadly, I don’t have the endurance to see such a transformation through. I’m already exhausted just from proposing my stabilization PR and writing this blog post.

I do hope that someone will tackle this eventually. As I mentioned in the beginning, great documentation and great tooling are a big driver of Rust's continued success, and I am looking forward to seeing things improve over time.