Feedback on Rust's Code Coverage
In my Rust 2021 wishlist, I expressed my excitement about having high-quality, precise code coverage in Rust. I have been playing around with it a bit these days, and here is my feedback, along with a wishlist.
# Why Code Coverage matters
I think this is a matter of personal preference to some degree. But in general, I think most developers do care about testing their software. And code coverage simply answers the question of which parts of your code you actually test. Every part of the code that is not tested can potentially have bugs. Or maybe that untested part is actually dead code?
The other question is how detailed you want coverage to be. I think that testing every single permutation is a bit overkill, but on the other hand, just having per-function or per-line statistics is far too little in my opinion. For me personally, the sweet spot is at the branch level. I want all the branches in my code to be covered by tests. If one branch is not covered, it might mean that a condition is always true, either because my tests are lacking, or because I basically have dead code.
Branch-level coverage makes sure that all conditions of conditional code are hit. This is especially important when chaining conditions using short-circuiting operators. For the expression `a() && b()`, `b()` will only be executed if the result of `a()` is true. And you can nest a couple of those conditions.
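To make that concrete, here is a small contrived sketch (hypothetical functions, not part of the toy project later in this post): a line-based report happily marks the whole condition as covered by a single test, while a branch-based report reveals that some of the outcomes were never exercised.

```rust
fn is_valid(input: &str) -> bool {
    // `input.len() < 10` only runs when `!input.is_empty()` is true,
    // so this single line hides several distinct branch outcomes.
    !input.is_empty() && input.len() < 10
}

#[test]
fn happy_path() {
    // Line coverage: 100%. Branch coverage: the "empty" and "too long"
    // outcomes are never hit, so they might as well be dead code.
    assert!(is_valid("hi"));
}
```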
Also, code coverage is a really nice way to gamify writing tests, because you can write tests that specifically exercise certain parts of your code base, which also increases your understanding of the code at the same time.
# What I expect from Tools
So far, I have been used to the excellent tools available for JavaScript, more specifically istanbul. It does function, statement, and branch-level coverage. It does so by instrumenting the code: it gives each source file a preamble with some metadata, and then, for each function, statement, or branch, it generates code to increment a counter. In post-processing, it then generates a report out of that.
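Very roughly, and sketched here in Rust rather than in the JavaScript that istanbul actually emits, that kind of instrumentation boils down to bumping a counter per location and reading the counters back out afterwards:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One counter per instrumented location; the real tool keys these by
// file/function/branch IDs stored in the preamble metadata.
static HIT_THEN: AtomicU64 = AtomicU64::new(0);
static HIT_ELSE: AtomicU64 = AtomicU64::new(0);

fn classify(n: i32) -> &'static str {
    if n > 0 {
        HIT_THEN.fetch_add(1, Ordering::Relaxed); // injected branch counter
        "positive"
    } else {
        HIT_ELSE.fetch_add(1, Ordering::Relaxed); // injected branch counter
        "non-positive"
    }
}

fn main() {
    classify(5);
    // Post-processing turns the raw counts into the familiar report.
    println!(
        "then branch: {}, else branch: {}",
        HIT_THEN.load(Ordering::Relaxed),
        HIT_ELSE.load(Ordering::Relaxed)
    );
}
```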
For a few versions now, node has had builtin coverage, which is supposed to be a lot quicker and supports "block level" granularity. This goes as deep as expressions, though I am not sure about branches. But there is c8, which can create basically the same output as istanbul, only a lot quicker. I tried it some time ago with an older version, but found that the output quality was a bit lacking.
I think in the end both of these tools are reasonably good at producing code coverage for JavaScript. However, JS is not always the code that you write; it just happens to be the code that the engine executes. The problem then becomes mapping between those two. In the JS world, sourcemaps are how that is done, but they are quite a pain sometimes and the results are of varying quality.
Anyway, what I expect is a single command line option, or a wrapper around my command, that will just magically provide me with a code coverage report I can view and act on.
# Rust Process
The process for getting reports is actually well documented, but still super complex.
Let's start from the beginning. The first step is to pass a switch to the Rust compiler instructing it to generate an instrumented library or executable. Running cargo with `RUSTFLAGS="-Zinstrument-coverage"` does a wonderful job there, but it has one major disadvantage: it passes these rustflags to all of the code it compiles. I would argue that in 99% of the cases, you only care about your own code, which translates to: crates in your workspace. This creates two problems: I have to explicitly ignore code that is not part of my workspace, and instrumentation has a negative effect on both compile time and runtime. Although I imagine that with enough caching this won't matter as much, I am still paying the cost for something that I don't want to use!
The second env var that I have to provide is `LLVM_PROFILE_FILE=$PWD/cov/%p.profraw`. Note that I have to provide an absolute path here, and use the `%p` placeholder, because unit tests, integration tests, and also doctests are basically their own executables and are run independently. Another side effect of that is cargo#2832, which makes `cargo test` output rather unreadable.
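Putting those two together, the instrumented test run ends up looking like this:

```
RUSTFLAGS="-Zinstrument-coverage" \
LLVM_PROFILE_FILE="$PWD/cov/%p.profraw" \
    cargo +nightly test
```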
Then run `llvm-profdata merge -sparse cov/*.profraw -o coverage.profdata` to merge these individual files. Well, fair enough, I can fully understand why this step is needed. I basically had to manually write something like that for istanbul, because that functionality was somehow missing in that ecosystem, oh well.
Now the next step is the really annoying one, as I have to give a list of objects to `llvm-cov` for it to generate a report. I am far from actually knowing how this all works, but my guess is that it uses the debuginfo embedded in, or referenced by, the object files to map back to the source files and the line/offset in those. This is super tedious when dealing with cargo, so let's illustrate it with a really simple toy example.
// src/lib.rs:
/// ```
/// assert_eq!(fucov::generic_fn("doc", "oh hai"), Ok("doctest"));
/// ```
pub fn generic_fn<T>(s: &str, val: T) -> Result<&str, T> {
match s {
"unit" => Ok("unit-test"),
"integration" => Ok("integration-test"),
"doc" => Ok("doctest"),
_ => Err(val),
}
}
#[test]
fn unit_test() {
assert_eq!(generic_fn("unit", 1), Ok("unit-test"));
}
// tests/test_integration.rs:
#[test]
fn integration_test() {
assert_eq!(
fucov::generic_fn("integration", Some(true)),
Ok("integration-test")
);
}
Running my tests, I get the following output:
> cargo +nightly test
Finished test [unoptimized + debuginfo] target(s) in 0.00s
Running target/debug/deps/fucov-e207e6174e8f3968
running 1 test
test unit_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/test_integration-d1ff69dad6b5720c
running 1 test
test integration_test ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Doc-tests fucov
running 1 test
test src/lib.rs - generic_fn (line 1) ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Well, so much for clean output… anyway… I do see which executables cargo is running, which is good. Except, which executable is run for my doctests? Checking the `target/debug/deps` folder, I do see a `libfucov-HASH.rlib`; maybe that's my library that was linked for my doctests?
Playing around with this a little, I tried variations of this command line: `llvm-cov show --format=html --show-instantiations=false --instr-profile coverage/fucov.profdata --object target/debug/deps/fucov-e207e6174e8f3968 --object target/debug/deps/test_integration-d1ff69dad6b5720c`.
This is really a mouthful. As you can see, I have not listed my rlib file there, as it did not make any difference in the output. In the end, I was not able to get the coverage from my doctests at all. That's a shame.
Looking at the HTML output, or the json/lcov output, it was also interesting to see how this generic function was treated. The command line I gave explicitly excludes showing individual instantiations of the generic, as that would surely be overwhelming if there were lots of generics, as I think there usually are with Rust. In that case, also providing a demangler would have been useful, as it shows each block of code captioned with the function name.
What I can also see from the output is that my unit and integration test functions themselves are covered, which is kind of expected, but a bit useless.
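To recap, the full dance for this toy project looks roughly like this (the binary hashes are from my run and will differ for yours, and I have normalized the profdata path):

```
# 1. Build and run instrumented tests (nightly only).
RUSTFLAGS="-Zinstrument-coverage" \
LLVM_PROFILE_FILE="$PWD/cov/%p.profraw" \
    cargo +nightly test

# 2. Merge the per-process profiles into a single file.
llvm-profdata merge -sparse cov/*.profraw -o coverage.profdata

# 3. Render a report, listing every test executable as an --object.
llvm-cov show --format=html --show-instantiations=false \
    --instr-profile coverage.profdata \
    --object target/debug/deps/fucov-e207e6174e8f3968 \
    --object target/debug/deps/test_integration-d1ff69dad6b5720c
```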
# Conclusion
I'm really impressed with the quality of the reports, although the process to get there is a bit too convoluted IMO. So the foundation is there; now it's just a matter of optimizing and making it simple to use. And well, maybe I feel like it and will create my own cargo command called `fucov`, which does all of that, because well, fuck off and give me my coverage!