
Feedback on Rust's Code Coverage

— 7 min

In my Rust 2021 wishlist, I expressed my excitement about having high-quality, precise code coverage in Rust. I have been playing around with it a bit these days, and here is my feedback, along with a wishlist.

# Why Code Coverage matters

I think this is a matter of personal preference to some degree. But in general, I think most developers do care about testing their software. And code coverage simply answers the question of which parts of your code you actually test. Every part of the code that is not tested can potentially have bugs. Or maybe that part is just dead code?

The other question is how detailed you want coverage to be. I think that testing every single permutation is a bit overkill, but on the other hand, just having per-function or per-line statistics is far too little in my opinion. For me personally, the sweet spot is at the branch level. I want all the branches in my code to be covered by tests. If one branch is never taken, it might mean that a condition always evaluates the same way, either because my tests are lacking, or because I basically have dead code.

Branch-level coverage makes sure that every outcome of a conditional is hit. This is especially important when chaining conditions using short-circuiting operators. In the expression a() && b(), b() will only be executed if a() returns true, and you can nest quite a few of those conditions.
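To make that concrete, here is a tiny made-up Rust sketch (the function names are invented for illustration): a test that only exercises the failing side of the && never runs b() at all, yet the line containing the expression still counts as covered.

```rust
fn a(x: i32) -> bool {
    x > 0
}

fn b(x: i32) -> bool {
    x % 2 == 0
}

/// True only for positive, even numbers.
pub fn positive_and_even(x: i32) -> bool {
    // Short-circuit: `b(x)` is only evaluated when `a(x)` is true.
    a(x) && b(x)
}

#[test]
fn covers_only_one_branch() {
    // `a(-3)` is false, so `b` never runs. Line coverage still marks
    // the `a(x) && b(x)` line as executed; only branch coverage
    // reveals that the right-hand side was never taken.
    assert!(!positive_and_even(-3));
}
```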

Also, code coverage is a really nice way to gamify writing tests: you can write tests that specifically exercise certain parts of your code base, which increases your understanding of the code at the same time.

# What I expect from Tools

So far, I have been used to the excellent tools available for JavaScript, more specifically istanbul. It does function-, statement- and branch-level coverage. It works by instrumenting the code: each source file gets a preamble with some metadata, and for each function, statement, or branch, it generates code that increments a counter. A post-processing step then turns those counters into a report.

For a few versions now, node has had builtin coverage, which is supposed to be a lot quicker and to support "block level" granularity. That goes as deep as expressions; I am not sure about branches though. There is also c8, which can produce basically the same output as istanbul, just a lot quicker. I tried it some time ago with an older version, but found that the output quality was a bit lacking.

I think in the end both of these tools are reasonably good at producing code coverage for JavaScript. However, JS is not always the code that you write; it just happens to be the code that the engine executes. The problem then becomes mapping between the two. In the JS world, sourcemaps are how that is done, but they are quite a pain sometimes and the results are of varying quality.

Anyway. What I expect is a single command line option, or a wrapper around my command, that just magically provides me with a code coverage report that I can view and act on.

# Rust Process

The process for getting reports is actually well documented, but still super complex.

Let's start from the beginning. The first step is to pass a switch to the Rust compiler instructing it to generate an instrumented library or executable. Running cargo with RUSTFLAGS="-Zinstrument-coverage" does a wonderful job there, but it has one major disadvantage: it passes these rustflags to all of the code it compiles. I would argue that in 99% of the cases you only care about your own code, which translates to: the crates in your workspace. This creates two problems: I have to explicitly ignore code that is not part of my workspace, and instrumentation has a negative effect on both compile time and runtime. I imagine that with enough caching this won't matter as much. Still, I am paying the cost for something that I don't want to use!

The second env var that I have to provide is LLVM_PROFILE_FILE=$PWD/cov/%p.profraw. Note that I have to provide an absolute path here, and use the %p placeholder, because unit tests, integration tests, and also doctests are each their own executable and run independently. Another side effect of that is cargo#2832, which makes the cargo test output rather unreadable.
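Putting the two variables together, an instrumented test run ends up looking roughly like this (a sketch based on the commands above; the file names in the comment are just examples):

```sh
# Instrument all crates and have every test process write its own
# raw profile into ./cov/, named after its process id (%p).
RUSTFLAGS="-Zinstrument-coverage" \
LLVM_PROFILE_FILE="$PWD/cov/%p.profraw" \
cargo +nightly test

# One .profraw per test executable that ran (unit tests,
# integration tests, doctests), for example:
# cov/12345.profraw  cov/12346.profraw  cov/12347.profraw
```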

Then run llvm-profdata merge -sparse cov/*.profraw -o coverage.profdata to merge these individual files. Fair enough, I can fully understand why this step is needed. I basically had to write something like it manually for istanbul, because that functionality was somehow missing in that ecosystem. Oh well.

Now the next step is the really annoying one, as I have to give llvm-cov a list of objects for it to generate a report. I am far from actually knowing how this all works, but my guess is that it uses the debuginfo embedded in (or referenced by) the object files to map back to the source files and the line/offset within those. This is super tedious when dealing with cargo.


So let's start with a really simple toy example.

// src/lib.rs:
/// ```
/// assert_eq!(fucov::generic_fn("doc", "oh hai"), Ok("doctest"));
/// ```
pub fn generic_fn<T>(s: &str, val: T) -> Result<&str, T> {
    match s {
        "unit" => Ok("unit-test"),
        "integration" => Ok("integration-test"),
        "doc" => Ok("doctest"),
        _ => Err(val),
    }
}

#[test]
fn unit_test() {
    assert_eq!(generic_fn("unit", 1), Ok("unit-test"));
}


// tests/test_integration.rs:
#[test]
fn integration_test() {
    assert_eq!(
        fucov::generic_fn("integration", Some(true)),
        Ok("integration-test")
    );
}

Running my tests, I get the following output:

> cargo +nightly test

    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running target/debug/deps/fucov-e207e6174e8f3968

running 1 test
test unit_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/test_integration-d1ff69dad6b5720c

running 1 test
test integration_test ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests fucov

running 1 test
test src/lib.rs - generic_fn (line 1) ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Well, so much for clean output… anyway… I do see which executables cargo is running, which is good. Except, which executable is run for my doctests? Checking the target/debug/deps folder, I do see a libfucov-HASH.rlib; maybe that's the library that was linked into my doctests?

Playing around with this a little, I ended up with variations of this command line: llvm-cov show --format=html --show-instantiations=false --instr-profile coverage/fucov.profdata --object target/debug/deps/fucov-e207e6174e8f3968 --object target/debug/deps/test_integration-d1ff69dad6b5720c. This is really a mouthful. As you can see, I did not list my rlib file there, as it made no difference to the output. In the end, I was not able to get coverage out of my doctest at all. That's a shame.
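One way to at least avoid copying those hashed executable names by hand might be to ask cargo for its compiler messages in JSON form and filter out the test binaries. A rough sketch (assuming jq is installed; I have not battle-tested this, and it will not list the doctest executable either):

```sh
# Build (but don't run) all test executables and print their paths.
cargo +nightly test --no-run --message-format=json \
  | jq -r 'select(.profile.test == true) | .executable' \
  | grep -v '^null$'
```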

Looking at the html output, or the json/lcov output, it was also interesting to see how this generic function was treated. The command line I gave explicitly excludes showing individual instantiations of the generic, as that would surely be information overload with lots of generics, which I think Rust code usually has. When showing instantiations, also providing a demangler would be useful, as each block of code is then captioned with its function name.
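If you do want to look at the individual instantiations, passing a demangler along helps; something like this should work (a sketch, using rustfilt, which can be installed with cargo install rustfilt):

```sh
# Same report, but with generic instantiations shown and symbol
# names demangled via rustfilt.
llvm-cov show --format=html \
  --show-instantiations \
  -Xdemangler=rustfilt \
  --instr-profile coverage/fucov.profdata \
  --object target/debug/deps/fucov-e207e6174e8f3968 \
  --object target/debug/deps/test_integration-d1ff69dad6b5720c
```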

What I can also see from the output is that my unit and integration test functions themselves count as covered, which is kind of expected, but a bit useless.

# Conclusion

I'm really impressed with the quality of the reports, although the process to get there is a bit too convoluted IMO. So the foundation is there; now it's just a matter of optimizing it and making it simple to use. And well, maybe, if I feel like it, I will create my own cargo command called fucov which does all of that, because, well, fuck off and give me my coverage!