A Rant about Software Bloat

It’s been a while since I wrote a proper rant, but today is the day.

The Rust project is thoroughly tracking the performance of the compiler. For this, there is the Rust compiler performance test suite, which includes a number of widely used crates pinned to fixed versions, so that it’s easier to compare different compiler versions on the same piece of code.

I would actually love to see the opposite. Compiling different versions of a crate with the same compiler. The goal would be to somehow quantify software bloat over time.

Over time, the compiler is getting quicker (though with diminishing returns). A percent here, a percent there. But at the same time, crates add more code. More code to compile equals slower compile times.

It might be new features, some more code to deal with edge cases, or old code that’s kept around for backwards compatibility. No matter the reason, over time software tends to inevitably become more complex and accumulate more lines of code.

I can pretty much paraphrase Wirth’s law:

software is getting bloated more rapidly than compilers are becoming faster.

# The elephant in the room

Now comes my main rant, and some tests to demonstrate. The literal elephant (pun intended, because it’s huge, get it?) in the room I am talking about is the recently released AWS Rust SDK, which is a prime example of bloat.

This doesn’t even come as much of a surprise, as I had already experienced a heavily bloated AWS SDK in TypeScript almost four years ago. There is even a recording of a talk I gave about profiling (and trying to improve) the memory usage and speed of the TypeScript type checker; you can watch it here.

Back to the topic at hand, and let’s actually measure the impact of this.

Let’s create an empty Rust project and measure its build times with hyperfine.

We start off with just an empty project created by cargo init bloaty-sdk --bin.

Measuring clean build times with hyperfine --prepare 'cargo clean' 'cargo build' gives us quite a snappy baseline:

  Time (mean ± σ):     377.1 ms ±  85.4 ms    [User: 179.7 ms, System: 212.8 ms]
  Range (min … max):   331.5 ms … 567.6 ms    10 runs
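For readers who have not used hyperfine, here is a rough sketch of what such a measurement boils down to: run an untimed prepare step, time the command, repeat, and report mean ± σ. The function names `bench` and `mean_and_stddev` are made up for this illustration. The sketch uses the sample standard deviation, which is an assumption, and it skips everything else hyperfine does (warmup runs, outlier detection, user/system CPU time).

```rust
use std::process::Command;
use std::time::Instant;

/// Times `run` for `runs` iterations, calling the untimed `prepare`
/// step before each one. Returns wall-clock samples in seconds.
fn bench(runs: usize, mut prepare: impl FnMut(), mut run: impl FnMut()) -> Vec<f64> {
    (0..runs)
        .map(|_| {
            prepare(); // not included in the measurement
            let start = Instant::now();
            run();
            start.elapsed().as_secs_f64()
        })
        .collect()
}

/// Mean and sample standard deviation (n - 1 in the denominator).
/// Returns (0.0, 0.0) for an empty slice.
fn mean_and_stddev(samples: &[f64]) -> (f64, f64) {
    if samples.is_empty() {
        return (0.0, 0.0);
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if samples.len() < 2 {
        return (mean, 0.0);
    }
    let variance = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, variance.sqrt())
}

fn main() {
    // The command to time is taken from the CLI arguments, e.g. `cargo build`.
    let args: Vec<String> = std::env::args().skip(1).collect();
    let Some((program, rest)) = args.split_first() else {
        eprintln!("usage: <this binary> <program> [args...]");
        return;
    };

    let samples = bench(
        10,
        // Prepare step: wipe the target directory so every run is a clean build.
        || {
            let _ = Command::new("cargo").arg("clean").status();
        },
        || {
            let _ = Command::new(program).args(rest).status();
        },
    );

    let (mean, sigma) = mean_and_stddev(&samples);
    println!("Time (mean ± σ): {mean:.3} s ± {sigma:.3} s ({} runs)", samples.len());
}
```

Compile this outside the project under test (for example with plain rustc) and run it from inside that project's directory with `cargo build` as arguments, so that the cargo clean step does not try to delete the running benchmark binary.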

Next, let’s add tokio to the mix to establish a baseline for an async program with a full runtime. The times after a cargo add tokio --features full look like this:

  Time (mean ± σ):      9.165 s ±  0.275 s    [User: 25.922 s, System: 3.422 s]
  Range (min … max):    8.988 s …  9.937 s    10 runs

I should add that I ran these tests on my Ryzen 2700X desktop, which has 8 cores and 16 threads, running Windows. But the goal here is not to create a fully scientific and reproducible benchmark.

We can see that we had to wait about 9 seconds for a full compile, using up about 29 seconds of CPU time.

Now add in (part of) the AWS SDK using cargo add aws-config aws-sdk-s3 and try again:

  Time (mean ± σ):     53.932 s ±  1.725 s    [User: 329.753 s, System: 31.435 s]
  Range (min … max):   52.766 s … 55.913 s    3 runs

Wow, this added ~45 seconds of wall time and a whopping 5.5 minutes of CPU time on top of tokio.

Note that I was only using the S3 part of the SDK, as my use case is just downloading some files from a bucket. The landing page of the SDK specifically calls out that it is modular:

to minimize your compile times and binary sizes by only compiling code you actually use.

Really? I’m dying inside. I can’t even imagine what compile times would look like if I pulled in more parts of that SDK.

So you want to tell me that I have to wait a minute to compile this on a fairly beefy machine? Just to download files?

To be fair, let’s compare this with reqwest, which is a (probably the most) popular HTTP client crate.

Doing another round of benchmarks, only with tokio and reqwest:

  Time (mean ± σ):     22.074 s ±  0.357 s    [User: 102.893 s, System: 12.058 s]
  Range (min … max):   21.695 s … 22.732 s    10 runs

reqwest is no featherweight either, adding 13 seconds of wall time and close to a minute and a half of CPU time on top of tokio. There are more alternatives to choose from for HTTP clients, some of which should be quicker to compile.
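To make the comparison explicit, here is a small sketch that redoes the arithmetic on the hyperfine numbers from above. CPU time is taken as user plus system time, and each dependency's overhead is its difference to the tokio-only build. The struct and function names are invented for this illustration.

```rust
/// One hyperfine result from this post; all times are means in seconds.
struct Measurement {
    name: &'static str,
    wall: f64,
    user: f64,
    system: f64,
}

impl Measurement {
    /// Total CPU time: user plus system.
    fn cpu(&self) -> f64 {
        self.user + self.system
    }
}

/// Extra (wall, CPU) seconds that `m` costs on top of `baseline`.
fn overhead(m: &Measurement, baseline: &Measurement) -> (f64, f64) {
    (m.wall - baseline.wall, m.cpu() - baseline.cpu())
}

const TOKIO: Measurement = Measurement {
    name: "tokio",
    wall: 9.165,
    user: 25.922,
    system: 3.422,
};

const AWS_S3: Measurement = Measurement {
    name: "tokio + aws-config + aws-sdk-s3",
    wall: 53.932,
    user: 329.753,
    system: 31.435,
};

const REQWEST: Measurement = Measurement {
    name: "tokio + reqwest",
    wall: 22.074,
    user: 102.893,
    system: 12.058,
};

fn main() {
    for m in [&AWS_S3, &REQWEST] {
        let (wall, cpu) = overhead(m, &TOKIO);
        println!(
            "{}: +{:.1} s wall, +{:.1} s CPU over {}",
            m.name, wall, cpu, TOKIO.name
        );
    }
}
```

By this measure the S3 part of the AWS SDK costs roughly 3.5 times the wall time and almost 4 times the CPU time that reqwest adds on top of the same tokio baseline.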

But yeah, my point is that downloading things from the web shouldn’t add such bloat. And the compile time overhead of the AWS SDK is beyond reasonable.

# But incremental?

You might rightfully call out that things aren’t as bad in reality, as you are only doing a full clean build about every six weeks after a rustup update, otherwise you are only doing incremental builds in development.

Sure, this is a valid point. Depending on the size of your project, at some point incremental builds are being dominated by link time instead. Even though mold is a thing now, at least on Linux, the speed of linkers tends to be rather sad.

I copy-pasted the example to download a file from S3 and put in some bogus code that would trigger minimal rebuilds / relinks, which in the end became a hyperfine --prepare 'nu -c "date now | save -f src/foo.txt"' 'cargo build':

  Time (mean ± σ):      2.636 s ±  0.042 s    [User: 4.045 s, System: 1.434 s]
  Range (min … max):    2.588 s …  2.714 s    10 runs

This is not so terrible anymore. But it adds to the overall time. Imagine having a real program with a lot more code that is being linked. It’s a death by a thousand papercuts.

Well, that is pretty much all for today, and I want to leave you with a bit of inspiration.

Perfection is attained not when there is nothing more to add, but when there is nothing more to remove.

This quote is attributed to Antoine de Saint-Exupéry. It is also pretty much a reflection of Elon Musk’s and SpaceX’s philosophy that the best part is no part, and to constantly try to remove things. If you are not adding back X%, you are not removing enough.

We need more of this mindset in Software Engineering.