Swatinem (Zola feed: https://swatinem.de/atom.xml)
Rust `thread_local!`s are surprisingly expensive (2024-03-03)
https://swatinem.de/blog/slow-thread-local/
<p>These last couple of weeks, I have been obsessing over “the cost of observability”, specifically metrics.
This whole topic is quite big and rather something for a conference, literally, as I submitted it as a talk for RustFest this year ;-)</p>
<p>But on this topic, I was experimenting with doing thread-local aggregation of metrics, and doing a ton of profiling
trying to micro-optimize the heck out of it.</p>
<p>To do this thread-local aggregation, I am pulling in the amazing <a href="https://docs.rs/thread_local"><code>thread_local</code> crate</a>,
which I have since also contributed to \o/.</p>
<p>The crate allows you to carry around an arbitrary container of data in your <code>struct</code>s, which internally
manages a concurrent list that is indexed using a <em>truly</em> thread-local index.</p>
<p>The most awesome thing about this crate is that it also allows you to iterate over <em>all</em> the thread-local values if you want to.
This makes it perfect for thread-local aggregation, where the per-thread values are combined across the whole process once every N seconds.</p>
<p>Though this comes with one disadvantage: you have to wrap your data in a thread-safe <code>Mutex</code> if you want to modify it.</p>
<hr />
<p>As I was then profiling the implementation, I was really shocked by what I saw.
Here is a flamegraph focused on the main point I want to make:</p>
<p><img src="https://swatinem.de/blog/slow-thread-local/./thread_local-overhead.png" alt="" /></p>
<p>What we see here is two things:</p>
<p>First, an uncontended <code>Mutex</code> which is only used by a single thread (for 99.9% of the time) is quite fast.
As another note: I created this profile on macOS (yeah, I know) which, as you see, is still using <code>pthread_mutex</code> under the hood.
macOS will soon catch up to other OSs which already have a custom <code>Mutex</code> implementation, so things should get even faster still.</p>
<p>But more shockingly, the “true” Rust <code>thread_local!</code> is surprisingly slow. In this case it is literally a <code>Cell<Option<Thread>></code>,
where <code>Thread</code> is just a bunch of <code>usize</code>s that are used to access the desired data in the concurrent list.
It is just <code>40</code> bytes on x64. So how can copying that be this slow?</p>
<p>As another fun fact to put things into perspective: The very slim line on the left of the flamegraph that you can barely make out is a
<code>OnceLock::get()</code>, which involves an atomic read.</p>
<p>Or maybe this atomic is actually what makes the <code>thread_local!</code> slow? Who knows.</p>
<hr />
<p>I pretty much just wanted to share this surprising outcome. This is also not really news.
@matklad has already <a href="https://matklad.github.io/2020/10/03/fast-thread-locals-in-rust.html">blogged about this</a> years ago.
Also note that a nightly-only <code>#[thread_local]</code> attribute exists as well, which the <code>thread_local</code> crate will use when the <code>nightly</code> feature is enabled.
I haven’t tested how much faster that would be, if at all.
There is also an initiative underway in the Rust compiler to <a href="https://github.com/rust-lang/rust/issues/110897">improve <code>thread_local!</code> implementation details</a>.</p>
<p>This is not the first time that I have seen <code>thread_local!</code> being slow.
With the ongoing initiative to clean it up, plus the nightly-only <code>#[thread_local]</code> looming on the horizon,
I am quite hopeful that things will improve sometime in the future.</p>
A real look at inflation (2023-12-31)
https://swatinem.de/blog/real-inflation/
<p>We have all seen it in the news, and we are feeling it in our own pockets.
Things are getting more expensive. I was also quite shocked recently to see that
a beer now consistently costs more than 5€ in restaurants, no matter where I go.</p>
<p>However, as a proper conspiracy nut, I do not trust the <em>official</em> numbers at all.
So I have my own numbers. I started tracking all my expenses and finances in
quite some detail in 2014.</p>
<p>I use <code>gnucash</code> for that, which allows me to categorize things, and create reports from all the data.
I can highly recommend keeping track of such things yourself.</p>
<p>I have also been living in the same apartment ever since,
so the numbers should be fairly constant. Similarly, my consumption habits have not
changed that wildly over the years, and I have always been a fairly stingy person that
is not throwing money around.</p>
<p>So let's have a look at how things look from my very own perspective.
For comparison, I use the <a href="https://www.statistik.at/statistiken/volkswirtschaft-und-oeffentliche-finanzen/preise-und-preisindizes/verbraucherpreisindex-vpi/hvpi">official Austrian VPI (Verbraucherpreisindex) numbers</a>.</p>
<p>To summarize, the official numbers say this:</p>
<table><thead><tr><th>Year</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>0,9%</td><td>100,9%</td></tr>
<tr><td>2016</td><td>0,9%</td><td>101,8%</td></tr>
<tr><td>2017</td><td>2,1%</td><td>103,9%</td></tr>
<tr><td>2018</td><td>2,0%</td><td>106%</td></tr>
<tr><td>2019</td><td>1,5%</td><td>107,6%</td></tr>
<tr><td>2020</td><td>1,4%</td><td>109,1%</td></tr>
<tr><td>2021</td><td>2,8%</td><td>112,2%</td></tr>
<tr><td>2022</td><td>8,6%</td><td>121,8%</td></tr>
<tr><td>2023</td><td>~5,5%</td><td>~128,5%</td></tr>
</tbody></table>
<p>Some <em>NOTES</em> on this table:</p>
<ul>
<li>It looks like Austria resets the statistics every 5 years; there is a VPI 2010, VPI 2015, VPI 2020, etc.</li>
<li>Since I started my own accounting in 2014, I count the cumulative change from that year.</li>
<li>There are no official numbers for 2023 yet, so I approximated it at <em>5,5%</em>.</li>
<li>This approximation is based on the VPI 2020 index, which was at <code>116,1</code> in December 2022 and at <code>122,1</code> in November 2023.</li>
<li>I simply assume growth to <code>122,5</code>. Doing the math, <code>(122,5 - 116,1) / 116,1</code> gives me <strong>5,5%</strong>.</li>
</ul>
<p>I’m not a statistician and this is just based on my assumptions and some crude math,
so I’m curious when the official numbers for 2023 will be released.</p>
<p>Another thing to note is that my wife moved in with me in 2020, and I am tracking our combined expenses.
And quite obviously, two people spend more than a single person, so expect a jump in expenses in 2020/2021.</p>
<h2 id="fixed-expenses"><a class="anchor-link" href="#fixed-expenses" aria-label="Anchor link for: fixed-expenses">#</a>
Fixed expenses</h2>
<p>Here are my expenses related to housing. I live in an apartment block and I own my apartment outright.
This number is all the expenses related to that, like maintenance, cleaning, insurance, repair fund, etc.</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>824€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>1070€</td><td>29,9%</td><td>129,9%</td></tr>
<tr><td>2016</td><td>1050€</td><td>-1,9%</td><td>127,4%</td></tr>
<tr><td>2017</td><td>1068€</td><td>1,7%</td><td>129,6%</td></tr>
<tr><td>2018</td><td>1056€</td><td>-1,1%</td><td>128,2%</td></tr>
<tr><td>2019</td><td>1140€</td><td>8,0%</td><td>138,3%</td></tr>
<tr><td>2020</td><td>1351€</td><td>18,5%</td><td>164,0%</td></tr>
<tr><td>2021</td><td>1886€</td><td>39,6%</td><td>228,9%</td></tr>
<tr><td>2022</td><td>1991€</td><td>5,6%</td><td>241,6%</td></tr>
<tr><td>2023</td><td>1992€</td><td>0,1%</td><td>241,7%</td></tr>
</tbody></table>
<p>I’m actually surprised this was quite stable, or even went lower in some periods. But the grand total changed <em>a lot</em>.</p>
<hr />
<p>Let's take a look at <strong>electricity</strong>:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>364€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>299€</td><td>-17,9%</td><td>82,1%</td></tr>
<tr><td>2016</td><td>305€</td><td>2,0%</td><td>83,8%</td></tr>
<tr><td>2017</td><td>343€</td><td>12,5%</td><td>94,2%</td></tr>
<tr><td>2018</td><td>360€</td><td>5,0%</td><td>98,9%</td></tr>
<tr><td>2019</td><td>341€</td><td>-5,3%</td><td>93,7%</td></tr>
<tr><td>2020</td><td>387€</td><td>13,5%</td><td>106,3%</td></tr>
<tr><td>2021</td><td>564€</td><td>45,7%</td><td>154,9%</td></tr>
<tr><td>2022</td><td>486€</td><td>-13,8%</td><td>133,5%</td></tr>
<tr><td>2023</td><td>328€</td><td>-32,5%</td><td>90,1%</td></tr>
</tbody></table>
<p>Again, I am quite surprised to see that the numbers have been fairly constant over the years,
except for a hard spike in 2021. The Austrian government implemented a cap on electricity costs at some point, and it
looks like I am profiting from that, as the total yearly expense has been going down over the last two years.</p>
<hr />
<p>On to <strong>heating</strong>:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>381€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>345€</td><td>-9,4%</td><td>90,6%</td></tr>
<tr><td>2016</td><td>323€</td><td>-6,4%</td><td>84,8%</td></tr>
<tr><td>2017</td><td>391€</td><td>21,1%</td><td>102,6%</td></tr>
<tr><td>2018</td><td>367€</td><td>-6,1%</td><td>96,3%</td></tr>
<tr><td>2019</td><td>378€</td><td>3,0%</td><td>99,2%</td></tr>
<tr><td>2020</td><td>372€</td><td>-1,6%</td><td>97,6%</td></tr>
<tr><td>2021</td><td>758€</td><td>103,8%</td><td>199,0%</td></tr>
<tr><td>2022</td><td>727€</td><td>-4,1%</td><td>190,8%</td></tr>
<tr><td>2023</td><td>1.034€</td><td>42,2%</td><td>271,4%</td></tr>
</tbody></table>
<p>Now there was a steep hike in 2021, but that is rather explained by more people in the household taking more warm
showers. Feel free to insert a joke about men not showering often enough here.</p>
<hr />
<p>Now the <strong>internet</strong> and mobile:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>359€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>388€</td><td>8,1%</td><td>108,1%</td></tr>
<tr><td>2016</td><td>432€</td><td>11,3%</td><td>120,3%</td></tr>
<tr><td>2017</td><td>428€</td><td>-0,9%</td><td>119,2%</td></tr>
<tr><td>2018</td><td>423€</td><td>-1,2%</td><td>117,8%</td></tr>
<tr><td>2019</td><td>431€</td><td>1,9%</td><td>120,1%</td></tr>
<tr><td>2020</td><td>505€</td><td>17,2%</td><td>140,7%</td></tr>
<tr><td>2021</td><td>559€</td><td>10,7%</td><td>155,7%</td></tr>
<tr><td>2022</td><td>593€</td><td>6,1%</td><td>165,2%</td></tr>
<tr><td>2023</td><td>613€</td><td>3,4%</td><td>170,8%</td></tr>
</tbody></table>
<p>The years 2017-2019 are interesting because I summed up the internet and mobile expenses.
The mobile expenses declined, while the internet got constantly more expensive.
Again, the jump in 2020 can be explained by now paying for two mobile plans instead of one.</p>
<p>The most surprising thing for me is that the official statistics claim that communication services got <em>cheaper</em> over
the years. That is definitely something I cannot confirm. Maybe the <em>per unit</em> price is lower.
I was forcibly upgraded to a higher plan a couple of times over the years: I got faster internet, but am also paying more.
Another thing of note here is that especially in the early years the reliability of the internet was horrible.
I was often playing multiplayer games back then, and I had to battle with periods of extreme pings and packet loss.</p>
<p>Maybe the official statistics reflect that you are paying less per MB/s than before, who knows.
In total I pay <em>a lot</em> more.</p>
<hr />
<p>Let's sum up these <em>fixed expenses</em>:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>1.928€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>2.102€</td><td>9,0%</td><td>109,0%</td></tr>
<tr><td>2016</td><td>2.110€</td><td>0,4%</td><td>109,4%</td></tr>
<tr><td>2017</td><td>2.230€</td><td>5,7%</td><td>115,7%</td></tr>
<tr><td>2018</td><td>2.206€</td><td>-1,1%</td><td>114,4%</td></tr>
<tr><td>2019</td><td>2.290€</td><td>3,8%</td><td>118,8%</td></tr>
<tr><td>2020</td><td>2.615€</td><td>14,2%</td><td>135,6%</td></tr>
<tr><td>2021</td><td>3.767€</td><td>44,1%</td><td>195,4%</td></tr>
<tr><td>2022</td><td>3.797€</td><td>0,8%</td><td>196,9%</td></tr>
<tr><td>2023</td><td>3.967€</td><td>4,5%</td><td>205,8%</td></tr>
</tbody></table>
<p>There you have it: I pay twice as much for all the fixed expenses as I did 10 years ago.
Sure, there are now two people living in this household, and that is an obvious reason
for the larger changes in 2020/2021.</p>
<h2 id="groceries-and-restaurants"><a class="anchor-link" href="#groceries-and-restaurants" aria-label="Anchor link for: groceries-and-restaurants">#</a>
Groceries and Restaurants</h2>
<p>Let's do the groceries first. This is pretty much everything I spend in <strong>supermarkets</strong>,
so it is not limited to just groceries, but also includes things like hygiene products and other household items.</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>1.374€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>1.857€</td><td>35,2%</td><td>135,2%</td></tr>
<tr><td>2016</td><td>2.148€</td><td>15,7%</td><td>156,3%</td></tr>
<tr><td>2017</td><td>1.492€</td><td>-30,5%</td><td>108,6%</td></tr>
<tr><td>2018</td><td>1.446€</td><td>-3,1%</td><td>105,2%</td></tr>
<tr><td>2019</td><td>1.259€</td><td>-12,9%</td><td>91,6%</td></tr>
<tr><td>2020</td><td>2.070€</td><td>64,4%</td><td>150,7%</td></tr>
<tr><td>2021</td><td>3.031€</td><td>46,4%</td><td>220,6%</td></tr>
<tr><td>2022</td><td>2.804€</td><td>-7,5%</td><td>204,1%</td></tr>
<tr><td>2023</td><td>3.813€</td><td>36,0%</td><td>277,5%</td></tr>
</tbody></table>
<p>Again, I’m quite surprised by the dip in the middle, but I believe that can be explained by a phase where I was
experimenting a lot with powdered foods. Or maybe by going to <strong>restaurants</strong> more, which is covered by the following table:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>1.549€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>1.085€</td><td>-30,0%</td><td>70,0%</td></tr>
<tr><td>2016</td><td>1.745€</td><td>60,8%</td><td>112,7%</td></tr>
<tr><td>2017</td><td>1.046€</td><td>-40,1%</td><td>67,5%</td></tr>
<tr><td>2018</td><td>1.906€</td><td>82,2%</td><td>123,0%</td></tr>
<tr><td>2019</td><td>1.520€</td><td>-20,3%</td><td>98,1%</td></tr>
<tr><td>2020</td><td>2.761€</td><td>81,6%</td><td>178,2%</td></tr>
<tr><td>2021</td><td>2.741€</td><td>-0,7%</td><td>177,0%</td></tr>
<tr><td>2022</td><td>4.060€</td><td>48,1%</td><td>262,1%</td></tr>
<tr><td>2023</td><td>1.939€</td><td>-52,2%</td><td>125,2%</td></tr>
</tbody></table>
<p>Another surprise here: we are spending way less money on going out than we used to in previous years. But okay, it also
reflects the increased spending in supermarkets.</p>
<p>Let's sum this up:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>2.923€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>2.942€</td><td>0,7%</td><td>100,7%</td></tr>
<tr><td>2016</td><td>3.893€</td><td>32,3%</td><td>133,2%</td></tr>
<tr><td>2017</td><td>2.538€</td><td>-34,8%</td><td>86,8%</td></tr>
<tr><td>2018</td><td>3.352€</td><td>32,1%</td><td>114,7%</td></tr>
<tr><td>2019</td><td>2.779€</td><td>-17,1%</td><td>95,1%</td></tr>
<tr><td>2020</td><td>4.831€</td><td>73,8%</td><td>165,3%</td></tr>
<tr><td>2021</td><td>5.772€</td><td>19,5%</td><td>197,5%</td></tr>
<tr><td>2022</td><td>6.864€</td><td>18,9%</td><td>234,8%</td></tr>
<tr><td>2023</td><td>5.752€</td><td>-16,2%</td><td>196,8%</td></tr>
</tbody></table>
<p>Well sure, two people generally spend more on groceries and going out.
In contrast to the fixed expenses, this is also something you can control more easily, like not going out as often.
Evidently we didn't do that, which came as quite a surprise to me.</p>
<h1 id="the-grand-total"><a class="anchor-link" href="#the-grand-total" aria-label="Anchor link for: the-grand-total">#</a>
The grand total</h1>
<p>And here is it, the sum total for the cost of living, including housing and basic necessities:</p>
<table><thead><tr><th>Year</th><th>Absolute Number</th><th>Change YoY</th><th>Cumulative Change</th></tr></thead><tbody>
<tr><td>2014</td><td>4.851€</td><td>-</td><td>100%</td></tr>
<tr><td>2015</td><td>5.044€</td><td>4,0%</td><td>104,0%</td></tr>
<tr><td>2016</td><td>6.003€</td><td>19,0%</td><td>123,7%</td></tr>
<tr><td>2017</td><td>4.768€</td><td>-20,6%</td><td>98,3%</td></tr>
<tr><td>2018</td><td>5.558€</td><td>16,6%</td><td>114,6%</td></tr>
<tr><td>2019</td><td>5.069€</td><td>-8,8%</td><td>104,5%</td></tr>
<tr><td>2020</td><td>7.446€</td><td>46,9%</td><td>153,5%</td></tr>
<tr><td>2021</td><td>9.539€</td><td>28,1%</td><td>196,6%</td></tr>
<tr><td>2022</td><td>10.661€</td><td>11,8%</td><td>219,8%</td></tr>
<tr><td>2023</td><td>9.719€</td><td>-8,8%</td><td>200,4%</td></tr>
</tbody></table>
<p>You can clearly see that spending for two is a lot more than for one.</p>
<p>Also, these numbers only reflect the <strong>absolute basics</strong>: a place to live and food to sustain oneself.
They do not include the <em>wants</em>, like fancy clothing, going to the spa, the cinema, traveling and vacations,
buying new computer hardware, or paying for a car, which is by far the biggest expense among them all.
All those things would probably add just as much to the total as the absolute basics.</p>
<p>And again, I am very lucky that I own my apartment outright, so I do not have to pay any rent.
Rent alone would amount to as much as everything else combined.</p>
<p>So all in all, we live a fairly modest life and really don’t spend that much on necessities.
But this was also only a very small glimpse into our expenses; there is a ton of stuff I left out of this analysis.</p>
<hr />
<p>This concludes this small deep dive. You can see that the expenses we have more control over show very high variability,
too much to draw any hard conclusions from.
The fixed expenses, however, are growing at an alarming rate, much higher than the officially reported
inflation numbers. Those fixed expenses are also the ones that are best amortized with more people living in an apartment.
Or in other words, the growth would probably have been the same if I still lived there alone.
Do with this analysis what you will.</p>
A Rant about Software Bloat (2023-12-02)
https://swatinem.de/blog/bloaty-mcbloat-sdk/
<p>It’s been a while since I wrote a proper rant, but today is the day.</p>
<p>The Rust project is thoroughly tracking the performance of the compiler.
For this, there is the Rust compiler performance test suite, which includes a number of widely used crates at a
fixed version, so that it's easier to compare different compiler versions on the same piece of code.</p>
<p>I would actually love to see the opposite. Compiling different versions of a crate with the same compiler.
The goal would be to somehow quantify software bloat over time.</p>
<p>Over time, the compiler is getting quicker (though with diminishing returns). A percent here, a percent there.
But at the same time, crates add more code. More code to compile equals slower compile times.</p>
<p>It might be new features, more code to deal with edge cases, or old code that's kept around for backwards compatibility.
No matter the reason, over time software tends to inevitably become more complex and accumulate more lines of code.</p>
<p>I can pretty much paraphrase <a href="https://en.wikipedia.org/wiki/Wirth's_law">Wirth’s law</a>:</p>
<blockquote>
<p>software is getting bloated more rapidly than compilers are becoming faster.</p>
</blockquote>
<h1 id="the-elephant-in-the-room"><a class="anchor-link" href="#the-elephant-in-the-room" aria-label="Anchor link for: the-elephant-in-the-room">#</a>
The elephant in the room</h1>
<p>Now comes my main rant, and some tests to demonstrate it. The literal <em>elephant</em> (pun intended, because it's <em>huge</em>, get it?)
in the room I am talking about is the recently released AWS Rust SDK, which is a prime example of bloat.</p>
<p>This doesn’t even come as such a surprise, as I had already experienced a heavily bloated AWS SDK in TypeScript almost
four years ago. There is even a recording of a talk I gave about profiling (and trying to improve) the memory
usage and speed of the TypeScript type checker; you can watch it <a href="https://viennajs.org/en/meetup/2020-01/optimizing-nodejs-memory-usage">here</a>.</p>
<p>Back to the topic at hand; let's actually measure the impact of this.</p>
<p>Let's create an empty Rust project and measure its build times with <a href="https://github.com/sharkdp/hyperfine"><code>hyperfine</code></a>.</p>
<p>We start off with just an empty project created by <code>cargo init bloaty-sdk --bin</code>.</p>
<p>Measuring clean build times with <code>hyperfine --prepare 'cargo clean' 'cargo build'</code> gives us quite a snappy baseline:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Time (mean ± σ): 377.1 ms ± 85.4 ms [User: 179.7 ms, System: 212.8 ms]
</span><span> Range (min … max): 331.5 ms … 567.6 ms 10 runs
</span></code></pre>
<p>Next, let's add <code>tokio</code> to the mix, to establish a baseline for an async program with a full runtime.
The times after a <code>cargo add tokio --features full</code> look like this:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Time (mean ± σ): 9.165 s ± 0.275 s [User: 25.922 s, System: 3.422 s]
</span><span> Range (min … max): 8.988 s … 9.937 s 10 runs
</span></code></pre>
<p>I might add that I ran these tests on my Ryzen 2700X desktop, which has 8 cores and 16 threads, running Windows.
But the goal here is not to create a fully scientific and reproducible benchmark.</p>
<p>We can see that we had to wait about 9 seconds for a full compile, using up about 29 seconds of CPU time.</p>
<p>Now add in (part of) the AWS SDK using <code>cargo add aws-config aws-sdk-s3</code> and try again:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Time (mean ± σ): 53.932 s ± 1.725 s [User: 329.753 s, System: 31.435 s]
</span><span> Range (min … max): 52.766 s … 55.913 s 3 runs
</span></code></pre>
<p>Wow, this added ~44 seconds of wall time and a whopping 5 minutes of CPU time on top of tokio.</p>
<p>Note that I was only using the S3 part of the SDK, as my use-case is just downloading some files from a bucket.
The landing page of the SDK specifically calls out that it is modular:</p>
<blockquote>
<p>to minimize your compile times and binary sizes by only compiling code you actually use.</p>
</blockquote>
<p>Really? I’m dying inside. I can’t even imagine what compile times would look like if I pulled in more parts of that SDK.</p>
<hr />
<p>So you want to tell me that I have to wait a minute to compile this on a fairly beefy machine? Just to download files?</p>
<p>To be fair, let's compare this with <code>reqwest</code>, which is a popular (probably the most popular) HTTP client crate.</p>
<p>Doing another round of benchmarks, only with <code>tokio</code> and <code>reqwest</code>:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Time (mean ± σ): 22.074 s ± 0.357 s [User: 102.893 s, System: 12.058 s]
</span><span> Range (min … max): 21.695 s … 22.732 s 10 runs
</span></code></pre>
<p><code>reqwest</code> is no featherweight either, adding 13 seconds of wall time and a bit over a minute of CPU time on top of <code>tokio</code>.
There are more alternatives to choose from for http clients, some of which should be quicker to compile.</p>
<p>But yeah, my point is that downloading things from the web shouldn’t add such bloat. And the compile time overhead of
the AWS SDK is beyond reasonable.</p>
<h1 id="but-incremental"><a class="anchor-link" href="#but-incremental" aria-label="Anchor link for: but-incremental">#</a>
But incremental?</h1>
<p>You might rightfully call out that things aren’t as bad in reality, as you are only doing a full clean build about every
six weeks after a <code>rustup update</code>, otherwise you are only doing incremental builds in development.</p>
<p>Sure, this is a valid point. Depending on the size of your project, at some point incremental builds are being dominated
by link time instead. Even though <code>mold</code> is a thing now, at least on Linux, the speed of linkers tends to be rather sad.</p>
<p>I copy-pasted the <a href="https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/rustv1/examples/s3/src/bin/get-object.rs#L18">example</a>
to download a file from S3 and put in some bogus code that would trigger minimal rebuilds / relinks, which in the end
became a <code>hyperfine --prepare 'nu -c "date now | save -f src/foo.txt"' 'cargo build'</code>:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Time (mean ± σ): 2.636 s ± 0.042 s [User: 4.045 s, System: 1.434 s]
</span><span> Range (min … max): 2.588 s … 2.714 s 10 runs
</span></code></pre>
<p>This is not so terrible anymore. But it adds to the overall time. Imagine having a real program with a lot more code that
is being linked. It’s a death by a thousand papercuts.</p>
<hr />
<p>Well, that is pretty much all for today, and I want to leave you with a bit of inspiration.</p>
<blockquote>
<p>Perfection is attained not when there is nothing more to add, but when there is nothing more to remove.</p>
</blockquote>
<p>This quote is attributed to <a href="https://en.wikiquote.org/wiki/Antoine_de_Saint_Exup%C3%A9ry">Antoine de Saint Exupéry</a>.
It is also pretty much a reflection of Elon Musk's and SpaceX's philosophy that the best part is no part,
and to constantly try to remove things.
If you are not adding back X%, you are not removing enough.</p>
<p>We need more of this mindset in Software Engineering.</p>
Choosing a more optimal `String` type (2023-09-15)
https://swatinem.de/blog/optimized-strings/
<p>This week, I have been profiling and measuring the overhead of the Sentry Rust SDK, as another team has reported
a large overhead in their testing. So much so that the team shied away from using it more extensively in combination
with <code>#[tracing::instrument]</code>.</p>
<p>After some profiling, I identified a potential culprit: we were using very high-quality randomness in the form of the
<code>getrandom</code> crate, which, depending on the operating system, does syscalls to get true randomness from the
system. This was clearly visible in profiles as contributing to SDK overhead. We definitely don’t need high-quality
randomness to identify tracing spans, so I switched that to a faster randomness source which is still documented to be
cryptographically secure, though I might decide to further downgrade the quality of the randomness in favor of speed.</p>
<p>But I digress, I really wanted to talk about Strings here.</p>
<p>When profiling, one thing that often sticks out and is a good opportunity for optimization is avoiding allocations.
And there were a couple of allocation-related things visible in the profile. Primarily allocating, copying and freeing
Strings. Optimizing or avoiding these copies should give us some wins in terms of performance and SDK overhead.</p>
<p>Let's take a look at what our use-case is first.</p>
<ul>
<li><strong>Our Strings are immutable.</strong> You set them when initializing the SDK, configuring the Scope, or instrumenting a Span.
They never change.</li>
<li><strong>Our Strings are copied often.</strong> Whenever an event or trace is captured, we copy over some Scope data, like the
release identifier configured during SDK init, or all the tags set on the Scope.</li>
<li><strong>Strings are presumably small.</strong> I don’t have concrete evidence for this, but I would suspect most strings to be short.</li>
<li><strong>The Strings are serialized often.</strong> The strings that are being copied into events are then obviously serialized and
sent to Sentry. Except when they are being discarded inside the SDK because of a configured sampling rate, rate limits
or for other reasons. I’m unsure if we have any other frequent accesses like <code>PartialEq</code> or <code>Hash</code> usage however.</li>
<li><strong>Most of the Strings are <code>Option</code>al</strong>. Most of the properties of Events are <code>Option</code>s.</li>
<li><strong>Protocol types are in need of optimization.</strong> Not strictly related to our usage of Strings, but all other protocol
types have way too detailed typing, while not being extensible. In a ton of situations we might be better
served with just having the option to manually add arbitrary JSON properties.</li>
</ul>
<p>To summarize this again in more technical terms:</p>
<ul>
<li>We want <code>Clone</code> to be cheap, without allocating and copying the actual string contents, aka <code>O(1)</code>.</li>
<li>The type should optimize for <code>Option</code> usage, in particular <code>size_of::<T>() == size_of::<Option<T>>()</code>.</li>
<li>The type should be at most as large as <code>String</code>, in particular <code>size_of::<T>() <= size_of::<String>()</code>.</li>
<li>Having Small String Optimization (SSO) is preferable, meaning strings of up to <code>N</code> bytes are stored inline without a heap allocation.</li>
<li>Ideally, creating a string should not do a roundtrip allocation.</li>
</ul>
<p>The last point in particular is a pain point with <code>Arc<str></code>, for example, as creating it from a <code>String</code> will almost
always incur a re-allocation. However, that allocation amortizes itself the first time you do a <code>clone()</code>, so it might
not matter that much in practice.</p>
<p>There is a <em>ton</em> of options to choose from, and in this comparison I am focusing on these contenders:</p>
<ul>
<li><code>std::string::String</code>, obviously</li>
<li><code>std::sync::Arc<str></code></li>
<li><a href="https://crates.io/crates/arcstr"><code>arcstr</code></a></li>
<li><a href="https://crates.io/crates/kstring"><code>kstring</code></a></li>
<li><a href="https://crates.io/crates/smol_str"><code>smol_str</code></a>, used in <code>rust-analyzer</code></li>
<li><a href="https://crates.io/crates/compact_str"><code>compact_str</code></a></li>
<li><a href="https://crates.io/crates/flexstr"><code>flexstr</code></a></li>
<li><a href="https://crates.io/crates/smartstring"><code>smartstring</code></a></li>
</ul>
<p>Here is a quick comparison table looking at the various <code>size_of</code> values, and looking at other properties according to
the docs:</p>
<table><thead><tr><th>name</th><th><code>size_of::<T></code></th><th><code>size_of::<Option<T>></code></th><th>Clone</th><th>SSO</th><th>mutable</th></tr></thead><tbody>
<tr><td><code>String</code></td><td>24</td><td>24</td><td><code>O(n)</code></td><td>-</td><td>yes</td></tr>
<tr><td><code>Arc<str></code></td><td>16</td><td>16</td><td><code>O(1)</code></td><td>-</td><td>no</td></tr>
<tr><td><code>arcstr</code></td><td>8</td><td>8</td><td><code>O(1)</code></td><td>-</td><td>no</td></tr>
<tr><td><code>smol_str</code></td><td>24</td><td>24</td><td><code>O(1)</code></td><td>23</td><td>no</td></tr>
<tr><td><code>kstring</code> (<code>arc</code>)</td><td>24</td><td>32</td><td><code>O(1)</code></td><td>15 / 22</td><td>no</td></tr>
<tr><td><code>flexstr</code></td><td>24</td><td>32</td><td><code>O(1)</code></td><td>22</td><td>no</td></tr>
<tr><td><code>compact_str</code></td><td>24</td><td>24</td><td><code>O(n)</code></td><td>24</td><td>yes</td></tr>
<tr><td><code>smartstring</code></td><td>24</td><td>32</td><td><code>O(n)</code></td><td>23</td><td>yes</td></tr>
</tbody></table>
<p>I have not looked at the runtime performance of these crates, and haven’t checked whether conversion from <code>String</code> really
incurs a re-allocation. I assume it does, however.</p>
<p>As we can see from that quick table, there doesn’t seem to be any free lunch here. Some of the listed crates do have
small string optimization, but are not optimized for usage with <code>Option</code>.</p>
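<p>The <code>Option</code> column above can be verified with a quick <code>size_of</code> check. Pointer-based types like <code>String</code> and <code>Arc&lt;str&gt;</code> have a non-null niche, so wrapping them in <code>Option</code> is free (sizes assume a 64-bit target):</p>

```rust
use std::mem::size_of;
use std::sync::Arc;

fn main() {
    // `String` is (ptr, cap, len); `Arc<str>` is a fat (ptr, len) pointer.
    assert_eq!(size_of::<String>(), 24);
    assert_eq!(size_of::<Arc<str>>(), 16);
    // The non-null pointer niche makes the `Option` wrapper free.
    assert_eq!(size_of::<Option<String>>(), size_of::<String>());
    assert_eq!(size_of::<Option<Arc<str>>>(), size_of::<Arc<str>>());
}
```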
<p>Depending on which characteristics are most important to us, this leaves us with only <code>smol_str</code>, which has SSO, cheap
clones and supports <code>Option</code> without overhead. However, it is still the same size as <code>String</code> and not smaller. That it is
used in <code>rust-analyzer</code> also gives us confidence that it is of high quality and well maintained.</p>
<p>If we want to aim for small size, <code>arcstr</code> is the way to go, which advertises itself as <em>a better <code>Arc<str></code></em>. It does
not have SSO, but to be honest, I doubt SSO would do much at size <code>8</code>, though I’m not sure what the sweet spot for our
particular use-case would be.</p>
<p>And one should definitely not dismiss <code>Arc<str></code>, which is both small, has cheap clones, and most of all is part of <code>std</code>
and thus the obvious choice if the goal is to minimize external dependencies.</p>
<h1 id="building-strings"><a class="anchor-link" href="#building-strings" aria-label="Anchor link for: building-strings">#</a>
Building Strings</h1>
<p>So far, we have looked at various String types that are good for <em>storing</em> and <em>cloning</em>. But what about creating Strings?</p>
<p>We have already established that <code>Arc<str></code> and most of the other contenders need to re-allocate when creating a new String,
either out of a <code>&str</code>, or from a <code>String</code> itself. Not surprisingly, all the contenders that have <code>O(n)</code> clones allow mutation.
So they are a good option for parsing, and when formatting small strings.</p>
<p>On that note, <code>format!</code> itself produces a <code>String</code>, as does <code>to_string</code>. If you want to take advantage of any other string
type that can avoid allocations, you would have to use <code>write!(&mut s, "oh hi: {}", display_type)?</code>, which is a bit unergonomic.
Alternatives might include having an <code>impl From<fmt::Arguments> for MyStringType</code>, which allows using
<code>format_args!("oh hi: {}", display_type).into()</code>. Or having something like <code>impl<D: Display> From<D> for MyStringType</code>,
though I haven’t checked whether that actually compiles, or whether the impl bounds might be too broad.</p>
<p>Ideally, I would love to have a more flexible type that allows mutable String building, maybe something with a const
generic parameter giving the most flexibility on construction. For long-term storage, one can then do a single copy
/ allocation into <code>arcstr</code> for example, or any of the other types that have SSO.</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>It is really hard to make a concrete choice here. I really want to have cheap clones, and I absolutely want the type
to be optimized for usage with <code>Option</code>, and ideally be smaller than <code>Option<String></code> in the first place.
On the other hand though, the Sentry Rust SDK already has way too many external dependencies as it is, so adding even
more might not be the best thing.</p>
<p>In the end, I believe it’s a choice between <code>smol_str</code>, which seems to be the best choice considering SSO, and <code>arcstr</code>,
which seems to be the best choice when optimizing for pure <code>size_of</code>. Or good old <code>Arc&lt;str&gt;</code> if we do not want to take
on any new external dependencies.</p>
<p>Either way, to retain maximum flexibility, I might start by defining an opaque newtype which derefs to <code>&str</code> and can
thus impl all the standard traits, especially <code>Display</code> and <code>Serialize</code>, and is constructible out of a <code>&str</code>, <code>String</code>,
and possibly <code>impl Display</code> if I can make that work. With that in place, we can change the internal implementation at
any time without breaking the API.</p>
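<p>Such a newtype could be sketched roughly like this. The name <code>SdkStr</code> and the <code>Arc&lt;str&gt;</code> backing are placeholders for whatever representation ends up being chosen:</p>

```rust
use std::fmt;
use std::ops::Deref;
use std::sync::Arc;

/// An opaque string newtype; the internal representation can change
/// at any time without breaking the public API.
#[derive(Clone)]
pub struct SdkStr(Arc<str>);

impl Deref for SdkStr {
    type Target = str;
    fn deref(&self) -> &str {
        // Deref coercion: &Arc<str> -> &str
        &self.0
    }
}

impl From<&str> for SdkStr {
    fn from(s: &str) -> Self {
        SdkStr(Arc::from(s))
    }
}

impl From<String> for SdkStr {
    fn from(s: String) -> Self {
        SdkStr(Arc::from(s))
    }
}

impl fmt::Display for SdkStr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self)
    }
}

fn main() {
    let s = SdkStr::from("transaction");
    // Deref gives us all the `&str` methods for free.
    assert_eq!(s.len(), 11);
    assert_eq!(format!("{}", s), "transaction");
}
```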
<p>A big question that still remains is how this can be combined with <code>serde_json::Value</code>, as we use that type
already in a couple of places, and I would like to use it even more, replacing overly detailed type definitions by
making all the types extendable with a generic <code>Map<String, Value></code>. Especially the keys would probably benefit a
lot from small string optimization. This remains to be seen.</p>
Optimizing Rust Enum `Debug`-ing with Perfect Hashing2023-07-29T00:00:00+00:002023-07-29T00:00:00+00:00
Unknown
https://swatinem.de/blog/optimizing-enums/<p>This weekend is the start of my week of <em>chill at home and do nothing</em> vacation.
Which, for a passionate software engineer, is the perfect time to do some open source work and dive into
interesting topics outside of work.</p>
<p>The topic I will be looking at is optimizing the code generation of Rust enums. This deep dive is motivated by a real-world
<a href="https://github.com/rust-minidump/rust-minidump/issues/847">issue</a> in the Rust <code>minidump(_common)</code> crate.</p>
<p>The <code>minidump_common</code> crate defines two <em>gigantic</em> C-style enums that look a little bit like this:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(u32)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Copy</span><span style="color:#61676ccc;">,</span><span> Clone</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq</span><span style="color:#61676ccc;">,</span><span> Debug</span><span style="color:#61676ccc;">,</span><span> FromPrimitive)]
</span><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">WinErrorWindows </span><span>{
</span><span> </span><span style="color:#ff8f40;">ERROR_SUCCESS </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">ERROR_INVALID_FUNCTION </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">ERROR_FILE_NOT_FOUND </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">ERROR_PATH_NOT_FOUND </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ... about ~2_000 more variants
</span><span>}
</span></code></pre>
<p>It is a data-less enum with an explicit <code>u32</code> discriminant, the discriminant values are explicitly assigned,
<em>and</em> the discriminant values are non-contiguous, meaning they have gaps in them.</p>
<p>The problem with this enum is that the <code>Debug</code> and <code>FromPrimitive</code> derives create <em>a ton</em> of bloat.</p>
<p>How much, you ask? Well, according to <code>cargo bloat</code>, these two derives are among the top offenders when compiling
<a href="https://github.com/getsentry/symbolicator"><code>symbolicator</code></a>, and <code>symbolicator</code> is a <em>huge</em> crate:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>File .text Size Crate Name
</span><span> 0.3% 0.5% 108.8KiB minidump_common <minidump_common::errors::windows::WinErrorWindows as core::fmt::Debug>::fmt
</span><span> 0.2% 0.3% 62.0KiB minidump_common <minidump_common::errors::windows::WinErrorWindows as num_traits::cast::FromPrimitive>::from_u64
</span></code></pre>
<p>Another indicator of this bloat is <code>cargo llvm-lines</code>, which has the following to say about our enum:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Lines Copies Function name
</span><span> ----- ------ -------------
</span><span> 9852153 271193 (TOTAL)
</span><span> 73972 (0.8%, 1.5%) 6184 (2.3%, 3.4%) <&T as core::fmt::Debug>::fmt
</span><span> 2832 (0.0%, 41.3%) 1 (0.0%, 25.6%) <minidump_common::errors::windows::WinErrorWindows as core::fmt::Debug>::fmt
</span><span> 2830 (0.0%, 41.4%) 1 (0.0%, 25.6%) <minidump_common::errors::windows::WinErrorWindows as num_traits::cast::FromPrimitive>::from_u64
</span></code></pre>
<p>The <code>cargo llvm-lines</code> output does not flag it as such a huge offender, and at ~2,800 lines, there are
barely any improvements to be had, as that is fairly close to the number of variants this enum has.</p>
<h1 id="debug-ing-enum-codegen"><a class="anchor-link" href="#debug-ing-enum-codegen" aria-label="Anchor link for: debug-ing-enum-codegen">#</a>
<code>Debug</code>-ing enum codegen</h1>
<p>Let’s break this whole problem down and start from the beginning with a dead simple enum.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug)]
</span><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> A</span><span style="color:#61676ccc;">,
</span><span> B</span><span style="color:#61676ccc;">,
</span><span> C</span><span style="color:#61676ccc;">,
</span><span> D</span><span style="color:#61676ccc;">,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">fmt</span><span>(</span><span style="color:#ff8f40;">e</span><span style="color:#61676ccc;">:</span><span> Enum) </span><span style="color:#61676ccc;">-></span><span> String {
</span><span> </span><span style="color:#f07171;">format!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{e:?}</span><span style="color:#86b300;">"</span><span>)
</span><span>}
</span></code></pre>
<p>Nothing out of the ordinary, and we can use this example to illustrate a couple of things.</p>
<p>First, let’s see what kind of Rust code is being generated on behalf of that <code>#[derive(Debug)]</code>.
We can do that directly on the <a href="https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=6b1c66d8940b59d22258c6062c30ed9a">Rust Playground</a>.
Just choose <em>expand macros</em> from the <em>tools</em> dropdown. This is what it looks like:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">Enum </span><span>{ A</span><span style="color:#61676ccc;">,</span><span> B</span><span style="color:#61676ccc;">,</span><span> C</span><span style="color:#61676ccc;">,</span><span> D</span><span style="color:#61676ccc;">, </span><span>}
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">automatically_derived</span><span>]
</span><span style="color:#fa6e32;">impl </span><span style="color:#ed9366;">::</span><span>core</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Debug </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">fmt</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">f</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ed9366;">::</span><span>core</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Formatter) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">::</span><span>core</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Result {
</span><span> </span><span style="color:#ed9366;">::</span><span>core</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Formatter</span><span style="color:#ed9366;">::</span><span>write_str(f</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#fa6e32;">match </span><span style="font-style:italic;color:#55b4d4;">self </span><span>{
</span><span> Enum</span><span style="color:#ed9366;">::</span><span>A </span><span style="color:#ed9366;">=> </span><span style="color:#86b300;">"A"</span><span style="color:#61676ccc;">,
</span><span> Enum</span><span style="color:#ed9366;">::</span><span>B </span><span style="color:#ed9366;">=> </span><span style="color:#86b300;">"B"</span><span style="color:#61676ccc;">,
</span><span> Enum</span><span style="color:#ed9366;">::</span><span>C </span><span style="color:#ed9366;">=> </span><span style="color:#86b300;">"C"</span><span style="color:#61676ccc;">,
</span><span> Enum</span><span style="color:#ed9366;">::</span><span>D </span><span style="color:#ed9366;">=> </span><span style="color:#86b300;">"D"</span><span style="color:#61676ccc;">,
</span><span> })
</span><span> }
</span><span>}
</span></code></pre>
<p>Very straightforward, no surprises here.</p>
<p>When this is compiled into native code by LLVM, checking the output in the <a href="https://godbolt.org/z/b85G4qceh">Compiler Explorer</a>
gave me my first <em>positive shock</em>. This is its output:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span><example::Enum as core::fmt::Debug>::fmt:
</span><span> mov rax, rsi
</span><span> movzx ecx, byte ptr [rdi]
</span><span> lea rdx, [rip + .Lreltable.<example::Enum as core::fmt::Debug>::fmt]
</span><span> movsxd rsi, dword ptr [rdx + 4*rcx]
</span><span> add rsi, rdx
</span><span> mov edx, 1
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span><span>
</span><span>.Lreltable.<example::Enum as core::fmt::Debug>::fmt:
</span><span> .long .L__unnamed_3-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_4-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_5-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_6-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span>
</span><span>.L__unnamed_3:
</span><span> .byte 65
</span><span>
</span><span>.L__unnamed_4:
</span><span> .byte 66
</span><span>
</span><span>.L__unnamed_5:
</span><span> .byte 67
</span><span>
</span><span>.L__unnamed_6:
</span><span> .byte 68
</span></code></pre>
<p>LLVM is smart enough to turn all of this into a bunch of <em>labeled</em> bytes representing our single-character enum variant names,
and a table that references them. As the compiler knows that all our variant names are just a single character,
it will hardcode the value <code>1</code> in there.</p>
<p>To be perfectly frank, the compiler could do even better in this case ;-)
If all our names are just single characters, and we have dense discriminants,
we wouldn’t need a table at all, we could just directly index into a flat string.</p>
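<p>Hand-written, that flat-string lookup would be as simple as this. To be clear, this is a sketch of what the compiler <em>could</em> emit, not what it actually does:</p>

```rust
// All variant names concatenated; with single-character names and
// dense discriminants, the discriminant doubles as the string index.
const NAMES: &str = "ABCD";

fn variant_name(discriminant: usize) -> &'static str {
    &NAMES[discriminant..discriminant + 1]
}

fn main() {
    assert_eq!(variant_name(0), "A");
    assert_eq!(variant_name(3), "D");
}
```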
<hr />
<p>But okay, so far I am already impressed. Let’s build on this example and make the variant names variable-length:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug)]
</span><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> A</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">BB</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">CCC</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">ABCD</span><span style="color:#61676ccc;">,
</span><span>}
</span></code></pre>
<p>And the assembly output is now this:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span><example::Enum as core::fmt::Debug>::fmt:
</span><span> mov rax, rsi
</span><span> movzx ecx, byte ptr [rdi]
</span><span> lea rdx, [rip + .Lswitch.table.<example::Enum as core::fmt::Debug>::fmt]
</span><span> mov rdx, qword ptr [rdx + 8*rcx]
</span><span> lea rdi, [rip + .Lreltable.<example::Enum as core::fmt::Debug>::fmt]
</span><span> movsxd rsi, dword ptr [rdi + 4*rcx]
</span><span> add rsi, rdi
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span><span>
</span><span>.Lswitch.table.<example::Enum as core::fmt::Debug>::fmt:
</span><span> .quad 1
</span><span> .quad 2
</span><span> .quad 3
</span><span> .quad 4
</span><span> […]
</span><span>
</span><span>.Lreltable.<example::Enum as core::fmt::Debug>::fmt:
</span><span> .long .L__unnamed_3-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_4-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_5-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_6-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> […]
</span><span>
</span><span>.L__unnamed_3:
</span><span> .byte 65
</span><span>
</span><span>.L__unnamed_4:
</span><span> .zero 2,66
</span><span>
</span><span>.L__unnamed_5:
</span><span> .zero 3,67
</span><span>
</span><span>.L__unnamed_6:
</span><span> .ascii "ABCD"
</span></code></pre>
<p>As we can see, we now end up with two tables: one for the lengths of the names, and the second as before,
with a pointer to the raw bytes.
It is also interesting that the compiler emits different assembler directives depending on whether
the letters are repeating, and how long the string is.</p>
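<p>In Rust terms, the two-table scheme LLVM generates is roughly equivalent to this sketch, with offset and length tables indexing into one flat byte buffer:</p>

```rust
// Flat buffer with all variant names, plus one table of offsets and one
// of lengths, mirroring the length table and the relative-pointer table
// in the assembly above.
const BYTES: &str = "ABBCCCABCD";
const OFFSETS: [usize; 4] = [0, 1, 3, 6];
const LENS: [usize; 4] = [1, 2, 3, 4];

fn variant_name(idx: usize) -> &'static str {
    &BYTES[OFFSETS[idx]..OFFSETS[idx] + LENS[idx]]
}

fn main() {
    assert_eq!(variant_name(0), "A");
    assert_eq!(variant_name(1), "BB");
    assert_eq!(variant_name(2), "CCC");
    assert_eq!(variant_name(3), "ABCD");
}
```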
<hr />
<p>Our next experiment is making this a sparse enum by explicitly assigning discriminant values.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug)]
</span><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> A </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">BB </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">4</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">CCC</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">ABCD</span><span style="color:#61676ccc;">,
</span><span>}
</span></code></pre>
<p>Again, a positive surprise from LLVM, as it will just insert duplicated entries into the tables,
trading a bit of wasted space for efficient code:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>.Lswitch.table.<example::Enum as core::fmt::Debug>::fmt:
</span><span> .quad 1
</span><span> .quad 1
</span><span> .quad 1
</span><span> .quad 1
</span><span> .quad 2
</span><span> .quad 3
</span><span> .quad 4
</span><span>
</span><span>.Lreltable.<example::Enum as core::fmt::Debug>::fmt:
</span><span> .long .L__unnamed_3-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_3-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_3-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_3-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_4-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_5-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span><span> .long .L__unnamed_6-.Lreltable.<example::Enum as core::fmt::Debug>::fmt
</span></code></pre>
<p>This optimization only triggers up to a certain threshold of course,
and when inserting a gap of ~200 in between the discriminants,
we end up with some vastly worse code being generated,
with a chain of comparisons and conditional jumps:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span><example::Enum as core::fmt::Debug>::fmt:
</span><span> mov rax, rsi
</span><span> movzx ecx, byte ptr [rdi]
</span><span> cmp ecx, 200
</span><span> jg .LBB1_5
</span><span> test ecx, ecx
</span><span> jne .LBB1_3
</span><span> lea rsi, [rip + .L__unnamed_2]
</span><span> mov edx, 1
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span><span>.LBB1_5:
</span><span> cmp ecx, 201
</span><span> jne .LBB1_6
</span><span> lea rsi, [rip + .L__unnamed_3]
</span><span> mov edx, 3
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span><span>.LBB1_3:
</span><span> lea rsi, [rip + .L__unnamed_4]
</span><span> mov edx, 2
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span><span>.LBB1_6:
</span><span> lea rsi, [rip + .L__unnamed_5]
</span><span> mov edx, 4
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span></code></pre>
<p>This is not only bad because it generates <em>a ton</em> of code that leads to binary bloat, as is evident from the example
I started out with. This chain of comparisons also means that the <code>Debug</code> impl effectively scales linearly with the number
of variants in the enum, which is not particularly great.</p>
<h1 id="perfect-hashing-to-the-rescue"><a class="anchor-link" href="#perfect-hashing-to-the-rescue" aria-label="Anchor link for: perfect-hashing-to-the-rescue">#</a>
Perfect Hashing to the rescue?</h1>
<p>Let’s rewind a bit and take another look at what our enum looks like:</p>
<ul>
<li>It is data-less with an explicit <code>#[repr(u32)]</code>.</li>
<li>It has explicitly assigned discriminants which are sparse / non-contiguous.</li>
<li>Plus: it has a finite / exhaustive number of discriminants.</li>
<li>And we will only use valid discriminants in our <code>Debug</code> impl.</li>
</ul>
<p>With these assumptions in place, we are searching for something that can map a finite
number of discriminants to a table of values.
The data-structure that can do this is a <code>HashMap</code> of course.
And since all the discriminants and values are known at compile time,
we can use a perfect hash table (PHT) to encode everything statically at compile time.</p>
<p>I remember I read a very good article about how perfect hashing works in theory,
but I can’t seem to find it right now, so I will have a go at explaining it.</p>
<p>Compared to a normal hash table, which always has some spare capacity and needs
to account for hash collisions, a perfect hash table will always map
<em>valid</em> hash keys to their values directly, using as little spare capacity as possible.</p>
<p>Such a hash function might work like this, simplified:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>(input_value</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">wrapping_add</span><span>(</span><span style="color:#ff8f40;">STARTING_OFFSET</span><span>) </span><span style="color:#ed9366;">^ </span><span style="color:#ff8f40;">XOR_VALUE</span><span>) </span><span style="color:#ed9366;">% </span><span style="color:#ff8f40;">TABLE_SIZE
</span></code></pre>
<p>There are sophisticated algorithms that will find appropriate values for <code>STARTING_OFFSET</code> and <code>XOR_VALUE</code> while
minimizing <code>TABLE_SIZE</code>.</p>
<p>But we just have <code>4</code> discriminants from our example above: <code>[0, 200, 201, 202]</code>.
It should be possible to brute-force some values here that fit us just right.
I chose to allow for some slack space, as brute-forcing exact matches did not
yield any hits in a reasonable time, whereas allowing some wasted space
gave me an answer immediately.</p>
<p>Here is the very naive implementation:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>rand</span><span style="color:#ed9366;">::</span><span>prelude</span><span style="color:#ed9366;">::*</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">validate_no_duplicate_idx</span><span>(</span><span style="color:#ff8f40;">indices</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[</span><span style="color:#fa6e32;">u32</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">bool </span><span>{
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> indices_hit</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32 </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">for</span><span> idx </span><span style="color:#ed9366;">in</span><span> indices {
</span><span> indices_hit </span><span style="color:#ed9366;">|= </span><span style="color:#ff8f40;">1 </span><span style="color:#ed9366;"><<</span><span> idx</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> indices_hit</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">count_ones</span><span>() </span><span style="color:#ed9366;">==</span><span> indices</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">u32
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">PhfResult</span><span><</span><span style="color:#fa6e32;">const</span><span> N</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>> {
</span><span> start_value</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> xor_value</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> table_size</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> table_indices</span><span style="color:#61676ccc;">:</span><span> [</span><span style="color:#fa6e32;">u32</span><span>; N],
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">generate_perfect_hash_values</span><span><</span><span style="color:#fa6e32;">const</span><span> N</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>>(
</span><span> </span><span style="color:#ff8f40;">input_values</span><span style="color:#61676ccc;">:</span><span> [</span><span style="color:#fa6e32;">u32</span><span>; N],
</span><span> </span><span style="color:#ff8f40;">slack_space</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>,
</span><span>) </span><span style="color:#61676ccc;">-> </span><span>PhfResult<N> {
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> rng </span><span style="color:#ed9366;">= </span><span>rand</span><span style="color:#ed9366;">::</span><span>thread_rng()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">loop </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> xor_value</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32 </span><span style="color:#ed9366;">=</span><span> rng</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">gen</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> start_value</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32 </span><span style="color:#ed9366;">=</span><span> rng</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">gen</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> table_size </span><span style="color:#ed9366;">= </span><span>(N </span><span style="color:#ed9366;">+</span><span> slack_space) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> table </span><span style="color:#ed9366;">=</span><span> input_values
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">input_value</span><span>| (input_value</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">wrapping_add</span><span>(start_value) </span><span style="color:#ed9366;">^</span><span> xor_value) </span><span style="color:#ed9366;">%</span><span> table_size)</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">if </span><span style="color:#f07171;">validate_no_duplicate_idx</span><span>(</span><span style="color:#ed9366;">&</span><span>table) {
</span><span> </span><span style="color:#fa6e32;">return</span><span> PhfResult {
</span><span> start_value</span><span style="color:#61676ccc;">,
</span><span> xor_value</span><span style="color:#61676ccc;">,
</span><span> table_size</span><span style="color:#61676ccc;">,
</span><span> table_indices</span><span style="color:#61676ccc;">:</span><span> table</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>Running it on our example above gave me the following output:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>PhfResult {
</span><span> start_value</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">1803446167</span><span style="color:#61676ccc;">,
</span><span> xor_value</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">597238773</span><span style="color:#61676ccc;">,
</span><span> table_size</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">5</span><span style="color:#61676ccc;">,
</span><span>    table_indices</span><span style="color:#61676ccc;">: </span><span>[
</span><span> </span><span style="color:#ff8f40;">4</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">,
</span><span> ]</span><span style="color:#61676ccc;">,
</span><span>}
</span></code></pre>
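<p>Just to double-check the arithmetic, we can verify with plain standard-library code that these constants really do map the four discriminants to distinct slots:</p>

```rust
fn main() {
    // constants from the `PhfResult` above
    let (start, xor, size) = (1803446167u32, 597238773u32, 5u32);

    // the enum discriminants: A = 0, BB = 200, CCC = 201, ABCD = 202
    let indices: Vec<u32> = [0u32, 200, 201, 202]
        .iter()
        .map(|d| (d.wrapping_add(start) ^ xor) % size)
        .collect();

    // matches the printed table: every value lands in its own slot
    assert_eq!(indices, [4, 3, 2, 1]);
    println!("{:?}", indices); // → [4, 3, 2, 1]
}
```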
<p>Using these values, I can then hand-craft a better <code>Debug</code> impl, and validate that it works:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(u32)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Copy</span><span style="color:#61676ccc;">,</span><span> Clone)]
</span><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> A </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">BB </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">200</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">CCC</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">ABCD</span><span style="color:#61676ccc;">,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl </span><span>std</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Debug </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">fmt</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">f</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span>std</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Formatter<'</span><span style="color:#ed9366;">_</span><span>>) </span><span style="color:#61676ccc;">-> </span><span>std</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Result {
</span><span> </span><span style="color:#fa6e32;">const </span><span style="color:#ff8f40;">TABLE</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>] </span><span style="color:#ed9366;">= &</span><span>[</span><span style="color:#86b300;">""</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"ABCD"</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"CCC"</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"BB"</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"A"</span><span>]</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> discriminant </span><span style="color:#ed9366;">= *</span><span style="font-style:italic;color:#55b4d4;">self </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> index </span><span style="color:#ed9366;">= </span><span>(discriminant</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">wrapping_add</span><span>(</span><span style="color:#ff8f40;">1803446167</span><span>) </span><span style="color:#ed9366;">^ </span><span style="color:#ff8f40;">597238773</span><span>) </span><span style="color:#ed9366;">% </span><span style="color:#ff8f40;">5</span><span style="color:#61676ccc;">;
</span><span> f</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">write_str</span><span>(</span><span style="color:#ff8f40;">TABLE</span><span>[index </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>])
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">validate_debug_impl</span><span>() {
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(</span><span style="color:#f07171;">format!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span>Enum</span><span style="color:#ed9366;">::</span><span>A)</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"A"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(</span><span style="color:#f07171;">format!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span>Enum</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">BB</span><span>)</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"BB"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(</span><span style="color:#f07171;">format!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span>Enum</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">CCC</span><span>)</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"CCC"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(</span><span style="color:#f07171;">format!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span>Enum</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">ABCD</span><span>)</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"ABCD"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>But does it really compile to better code? Let’s check again using the <a href="https://godbolt.org/z/v7j4M7oja">Compiler Explorer</a>:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span><example::Enum as core::fmt::Debug>::fmt:
</span><span> mov rax, rsi
</span><span> mov ecx, 1803446167
</span><span> add ecx, dword ptr [rdi]
</span><span> xor ecx, 597238773
</span><span> imul rdx, rcx, 1717986919
</span><span> shr rdx, 33
</span><span> lea edx, [rdx + 4*rdx]
</span><span> sub ecx, edx
</span><span> shl rcx, 4
</span><span> lea rdx, [rip + .L__unnamed_1]
</span><span> mov rsi, qword ptr [rcx + rdx]
</span><span> mov rdx, qword ptr [rcx + rdx + 8]
</span><span> mov rdi, rax
</span><span> jmp qword ptr [rip + _ZN4core3fmt9Formatter9write_str17hdb374abbd294d87eE@GOTPCREL]
</span><span>
</span><span>.L__unnamed_3:
</span><span>
</span><span>.L__unnamed_4:
</span><span> .ascii "ABCD"
</span><span>
</span><span>.L__unnamed_5:
</span><span> .zero 3,67
</span><span>
</span><span>.L__unnamed_6:
</span><span> .zero 2,66
</span><span>
</span><span>.L__unnamed_7:
</span><span> .byte 65
</span><span>
</span><span>.L__unnamed_1:
</span><span> .quad .L__unnamed_3
</span><span> .zero 8
</span><span> .quad .L__unnamed_4
</span><span> .asciz "\004\000\000\000\000\000\000"
</span><span> .quad .L__unnamed_5
</span><span> .asciz "\003\000\000\000\000\000\000"
</span><span> .quad .L__unnamed_6
</span><span> .asciz "\002\000\000\000\000\000\000"
</span><span> .quad .L__unnamed_7
</span><span> .asciz "\001\000\000\000\000\000\000"
</span><span>
</span></code></pre>
<p>Indeed, we again end up with code that has no branches, and indexes right into a static table.
<em>Success!</em></p>
<h1 id="making-it-scale"><a class="anchor-link" href="#making-it-scale" aria-label="Anchor link for: making-it-scale">#</a>
Making it scale</h1>
<p>The above was just a toy example to make a point. But can we use the same principle to solve the original issue?</p>
<p>Obviously, I wouldn’t use my horribly bad code to brute-force these constants,
and I also wouldn’t hand-craft the <code>Debug</code> impl either.
As in many other cases, there is a crate for that!</p>
<p>It is conveniently called <code>phf</code>, and there is also a companion crate called <code>phf_codegen</code>.
The combination of both crates indeed generates a <code>HashMap</code>-like type that allows arbitrary lookups.</p>
<p>In our case however, we know statically that we are only looking things up that
are guaranteed to be in the map. So I decided to instead build on <code>phf_generator</code>
and <code>phf_shared</code>, which are the lower-level building blocks.</p>
<p>Calling <code>phf_generator::generate_hash</code> returns a <code>HashState</code> consisting of a random <code>key</code>, a couple of
<code>displacements</code>, and a <code>map</code> which encodes the order in which we have to output our original values,
just like the <code>table</code> in my example above.</p>
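<p>As a rough sketch, the generator side could look something like this (a hypothetical build-script fragment; it assumes <code>phf_generator</code> and <code>phf_shared</code> as dependencies, and the <code>HashState</code> field names may differ between crate versions):</p>

```rust
// Hypothetical sketch of generating the constants, not the exact code I used.
fn main() {
    // discriminants and names of the enum variants, in declaration order
    let entries: &[(u32, &str)] = &[(0, "A"), (200, "BB"), (201, "CCC"), (202, "ABCD")];

    let discriminants: Vec<u32> = entries.iter().map(|&(d, _)| d).collect();
    let state = phf_generator::generate_hash(&discriminants);

    // `state.map[slot]` is (as far as I understand the crate) the index of the
    // entry that ends up in `slot`, so we emit the names reordered accordingly,
    // just like the `table` above
    let table: Vec<&str> = state.map.iter().map(|&i| entries[i].1).collect();

    println!("const KEY: phf_shared::HashKey = {};", state.key);
    println!("const DISPLACEMENTS: &[(u32, u32)] = &{:?};", state.disps);
    println!("const TABLE: &[&str] = &{:?};", table);
}
```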
<p>We can then generate some code, and use <code>phf_shared</code> to derive the index.
Here is the hand-crafted code feeding values generated by <code>phf_generator</code> into <code>phf_shared</code>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">impl </span><span>std</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Debug </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">Enum </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">fmt</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">f</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span>std</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Formatter<'</span><span style="color:#ed9366;">_</span><span>>) </span><span style="color:#61676ccc;">-> </span><span>std</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>Result {
</span><span> </span><span style="color:#fa6e32;">const </span><span style="color:#ff8f40;">KEY</span><span style="color:#61676ccc;">: </span><span>phf_shared</span><span style="color:#ed9366;">::</span><span>HashKey </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">12913932095322966823</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">const </span><span style="color:#ff8f40;">DISPLACEMENTS</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[(</span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">, </span><span style="color:#fa6e32;">u32</span><span>)] </span><span style="color:#ed9366;">= &</span><span>[(</span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span>)]</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">const </span><span style="color:#ff8f40;">TABLE</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>] </span><span style="color:#ed9366;">= &</span><span>[</span><span style="color:#86b300;">"A"</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"BB"</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"ABCD"</span><span style="color:#61676ccc;">, </span><span style="color:#86b300;">"CCC"</span><span>]</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> discriminant </span><span style="color:#ed9366;">= *</span><span style="font-style:italic;color:#55b4d4;">self </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> hashes </span><span style="color:#ed9366;">= </span><span>phf_shared</span><span style="color:#ed9366;">::</span><span>hash(</span><span style="color:#ed9366;">&</span><span>discriminant</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">KEY</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> index </span><span style="color:#ed9366;">= </span><span>phf_shared</span><span style="color:#ed9366;">::</span><span>get_index(</span><span style="color:#ed9366;">&</span><span>hashes</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">DISPLACEMENTS</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">TABLE</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>())</span><span style="color:#61676ccc;">;
</span><span>
</span><span> f</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">write_str</span><span>(</span><span style="color:#ff8f40;">TABLE</span><span>[index </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>])
</span><span> }
</span><span>}
</span></code></pre>
<p>I will spare you the raw assembly code. The load from <code>TABLE</code> is still the same as before,
but on top of that we now have a ton of inlined code related to using a proper hash function (Sip13/128),
which adds quite a lot of instructions.</p>
<hr />
<p>Now the big remaining task is to implement all this for the <code>minidump_common</code>
use-case and see if it is actually better in terms of bloat and also compile times.
But that is an exercise for another day.</p>
<p><strong>Update</strong>: Results are in, implementing this approach in <code>minidump_common</code> for the <code>Debug</code>
and <code>FromPrimitive</code> implementations yielded a ~100K (about 1%) reduction in the <code>minidump-stackwalk</code>
binary size. Not bad! Although I haven’t had a look at compile times at all.</p>
<p>Nonetheless, I’m not convinced that a huge chunk of auto-generated code is the
best solution to all this, and believe that this should rather be handled in the compiler itself.
And as luck would have it, <a href="https://github.com/rust-lang/rust/issues/114106">a similar issue</a>
was just raised a couple of days before I wrote this post. Though that issue describes
a more general case where enums with a large number of variants exhibit poor codegen.
The case I presented here is a bit more special, as it involves manually defined
sparse discriminants with large gaps in their values.</p>
Finding and fixing runaway Android Battery Usage2023-06-18T00:00:00+00:002023-06-18T00:00:00+00:00
Unknown
https://swatinem.de/blog/fixing-android-battery-usage/<h1 id="tldr"><a class="anchor-link" href="#tldr" aria-label="Anchor link for: tldr">#</a>
TLDR</h1>
<p>My phone was draining unreasonable amounts of battery lately. Shelling into the
phone via <code>adb shell</code> after enabling USB debugging, and doing a simple <code>top</code> revealed that, of all things,
<code>com.sonymobile.launcher</code>, aka the Home Screen / Launcher, was constantly running at 200% CPU (saturating 2 cores), and
using up to 10% of memory.</p>
<p>I switched to a different Launcher, which so far seems to have fixed the battery drain, though I should observe it
for a longer time to be certain.</p>
<h1 id="the-whole-story"><a class="anchor-link" href="#the-whole-story" aria-label="Anchor link for: the-whole-story">#</a>
The whole story</h1>
<p>I got myself a new Android phone fairly recently: a Sony Xperia 10 IV. A phone with a headphone jack, a
fingerprint reader, and most importantly for me by now: no fucking useless notch / face camera cutout. And a screen
with only minimally curved corners, and no edge-to-edge display. Because I want to hold it firmly in my hands without
triggering any touch actions.</p>
<p>Anyway, so far I was very happy with it, a new phone with a brand new battery. The usage was also reasonable: it used to
drain about 20% per day with my average usage pattern. Which means I needed to charge it only about every 3 days, cycling
between 20% and 80%, neither draining it too deep nor topping it completely off.</p>
<p>After an update about a month ago however, I had trouble with battery usage. The phone would get into some kind
of broken state where it would drain up to 4x the usual amount. I had to charge it almost daily.
It was also pretty random. I could go half a day with normal battery drain, just to have it drain half the battery
while completely idle overnight. Sometimes restarting it would put battery usage back to normal. Sometimes
going on airplane mode would. But other times these things just didn’t make a difference.</p>
<p>I was completely clueless. The battery usage page in the Android system settings pointed to some unreasonable battery
draw from the mobile network. I went through a couple of online help pages suggesting to turn off the always-on mobile
network connection while on Wi-Fi, or to disable power-hungry 5G completely. Still nothing.</p>
<hr />
<p>Well then, I’m a software engineer I thought. How about I debug this thing like an Android developer would.
There is a page in the Android developer docs that explains how to set up and use <code>batterystats</code> and <code>Battery Historian</code>.
I tried that, and after the phone drained 16% of battery in about 6 hours of being mostly idle, I collected a snapshot
and looked at the result. <em>Battery Historian</em> reported an idle power draw of 2.3%/hr, and a whopping 8.6%/hr when active.</p>
<p>And then there it was. For a total wall time of about 6 hours, the phone had over 7 hours of CPU time, almost 4 hours of that
in system time. Something was indeed going wrong there. The CPU and the <code>kernel only uptime</code> were constantly active
during that time. <code>Kernel Wakelocks</code> was also accounting for 4.5 hrs of that time. I thought, what the hell is the kernel
doing there? The kernel <code>wakesources</code> showed <code>rgb_ctrl_wq</code> as having a total duration of 4 hrs as well. What does that
even mean?</p>
<p>Going through the apps by CPU usage then showed <code>com.sonymobile.launcher</code>, aka the home screen, as using 6.5 hrs of user
time, and 3 hours of system time. As a reminder, the whole experiment was running for a little less than 6 hours.
So was the home screen really using up all that CPU, and burning through the battery doing so?</p>
<p>Then it hit me: Android is, in theory, a Linux just like any other Linux. And I remembered from some time ago that it
has <code>top</code> installed. So I used <code>adb shell</code> to connect, and <code>top</code> indeed showed <code>com.sonymobile.launcher</code> constantly
using up 200% of CPU, aka two full cores. Not only that, but also 9% of the memory, almost 500M out of the 5.5G usable
on the phone. Well, no wonder the system is constantly killing the backgrounded apps I don’t want killed.</p>
<p>I really should have thought of that earlier. A good old <code>htop</code> (or Windows Task Manager) is the first thing to look at
if things are not performing as expected. The tools built into the Android System Settings are useless.</p>
<hr />
<p>I cannot completely blame the launcher itself though. I did have a bunch of widgets on the home screen. Any of those
might have been misbehaving. For example, I had a home screen widget for VLC, which I use as my media player. And for some
unknown reason, the system was revoking its permissions to read media files. Twice in a row! It’s quite
possible that because of this permissions problem, the VLC widget was causing the home screen to spin. Or maybe some
other widgets, who knows?</p>
<p>Either way, this <em>does</em> cast a very bad light on Sony, and Phone manufacturers in general. It is a widely believed myth
that phone vendors are intentionally slowing down their phones to incentivize people to buy new ones. Or is it reality
after all? I mean, there <em>have</em> been documented cases where this indeed happened.</p>
<p>A vendor-supplied system app started burning through the battery shortly after I did a system update, even though I had never
experienced this behavior in the half year I have owned this device.
It sure is an awfully bad coincidence, and casts a very bad light on the vendor.</p>
<p>Since then, I have switched to an alternative home screen. I definitely need to run the phone with it for longer to verify the
long-term behavior, but so far it looks like the battery drain problem has been fixed.</p>
The magic of scope guards2023-05-21T00:00:00+00:002023-05-21T00:00:00+00:00
Unknown
https://swatinem.de/blog/magic-scope-guards/<p>Scope guards in Rust are awesome!</p>
<p>Apart from explaining why, I also want to explore one specific side of them that I have never read about directly:
their effect on compile times.</p>
<h1 id="a-small-ra-ii-nt"><a class="anchor-link" href="#a-small-ra-ii-nt" aria-label="Anchor link for: a-small-ra-ii-nt">#</a>
A small RA(II)nt</h1>
<p>Let me start today’s exploration with a bit of a rant. What I am talking about today, and what I refer to as “scope guards”,
is often called the RAII pattern. That stands for “resource acquisition is initialization”, and I believe it’s a horrible
acronym. What are we initializing? Are we even acquiring anything? Well, maybe when we are talking about locks, yes, but
otherwise?</p>
<p>Apart from that, I am also very much against <em>computer-science-speak</em>. More specifically, hiding otherwise easy-to-understand
concepts behind complicated-sounding nomenclature. In computer-science-speak, this concept is called
“affine types”. What the hell does “affine” even mean? I am not a native speaker, but according to a dictionary, it is
also translated as “affin” in my native German. Well, thanks for nothing. Another translation is “verwandt”, which
means “related” in English. Okay, that does not help either.</p>
<p>These “affine types” are also <em>related</em> (see what I did there?) to “linear types” that are being discussed in the Rust
community right now. Another word that on its own does not convey any meaning.</p>
<hr />
<p>Putting these concepts into words that anyone should be able to understand:
A scope guard, RAII type, or affine type is a type that has a <em>destructor</em> (a piece of code) which is called automatically
at the end of its scope. Hence I call them scope guards, as I believe that describes their use-case the best.
If I understand the whole linear-types debate correctly, the problem is that the scope can in theory be extended to <em>infinity</em> by
leaking the type (for example via <code>std::mem::forget</code>), which is especially bad for types that require their destructor to run for soundness.</p>
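<p>As a minimal illustration of the mechanism (not any particular crate’s API), such a guard can be built directly on <code>Drop</code>:</p>

```rust
// A minimal scope guard: runs the stored closure when it goes out of scope.
struct Guard<F: FnOnce()>(Option<F>);

impl<F: FnOnce()> Drop for Guard<F> {
    fn drop(&mut self) {
        // take() lets us move the FnOnce out of `&mut self` exactly once
        if let Some(f) = self.0.take() {
            f();
        }
    }
}

fn main() {
    {
        let _guard = Guard(Some(|| println!("cleanup")));
        println!("doing work");
        // an early return or a panic here would still run the cleanup;
        // only leaking the guard (e.g. std::mem::forget) would skip it
    }
    println!("after the scope");
    // prints: doing work, cleanup, after the scope — in that order
}
```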
<h1 id="scopes"><a class="anchor-link" href="#scopes" aria-label="Anchor link for: scopes">#</a>
Scopes</h1>
<p>This brings us to the classic example that exhibited one of these soundness problems: Scoped Threads. I do not want to
go into the details here, as I bet I would get half of that wrong.</p>
<p>What I <em>do</em> want to highlight is the usage of closures, specifically a <code>FnOnce</code>, which gives stronger guarantees that
cleanup runs than scope guards do. The function that executes the closure will only return to its caller when all
the necessary cleanup is done. But being a (generic) function comes with two major downsides.</p>
<p>One is the function-coloring problem that makes it not play well with <code>async</code> code. And the other one is that it is
<em>generic</em> and will thus be monomorphized by the compiler: the compiler will compile the outer function multiple times,
in the worst case once for every distinct closure type it is called with.</p>
<p>An interesting observation on the side is that you can always trivially move from a scope-guard version of code to a
closure version:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">takes_closure</span><span><O, F</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">FnOnce</span><span>() </span><span style="color:#61676ccc;">-></span><span> O>(</span><span style="color:#ff8f40;">f</span><span style="color:#61676ccc;">:</span><span> F) {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _guard </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">create_guard</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">f</span><span>()
</span><span>}
</span></code></pre>
<p>Sometimes you <em>want</em> the compiler to duplicate and inline all the code; sometimes inlining it will give better runtime
performance. But depending on how large the code is, <em>outlining</em> might be the better idea. Outlined code might
introduce more jumps and another stack frame, but depending on how hot the actual code is, it might be better for the
instruction cache.</p>
<h1 id="compile-times"><a class="anchor-link" href="#compile-times" aria-label="Anchor link for: compile-times">#</a>
Compile times</h1>
<p>But today I want to specifically focus on compile times.</p>
<p>For this I first created a chunk of large and slow to compile code:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#f07171;">macro_rules! </span><span style="color:#399ee6;">blow_up </span><span>{
</span><span> (</span><span style="color:#ff8f40;">$a</span><span style="color:#61676ccc;">:</span><span style="color:#fa6e32;">ident</span><span>) </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"hello </span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">!"</span><span style="color:#61676ccc;">, </span><span style="color:#f07171;">stringify!</span><span>($a))</span><span style="color:#61676ccc;">;
</span><span> }</span><span style="color:#61676ccc;">;
</span><span>
</span><span> (</span><span style="color:#ff8f40;">$a</span><span style="color:#61676ccc;">:</span><span style="color:#fa6e32;">ident </span><span style="color:#ed9366;">$</span><span>(</span><span style="color:#ff8f40;">$rest</span><span style="color:#61676ccc;">:</span><span style="color:#fa6e32;">tt</span><span>)</span><span style="color:#ed9366;">+</span><span>) </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#f07171;">blow_up!</span><span>($a)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">blow_up!</span><span>(</span><span style="color:#ed9366;">$</span><span>($rest)</span><span style="color:#ed9366;">+</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">blow_up!</span><span>(</span><span style="color:#ed9366;">$</span><span>($rest)</span><span style="color:#ed9366;">+</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#f07171;">macro_rules! </span><span style="color:#399ee6;">make_slow </span><span>{
</span><span> () </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#f07171;">blow_up!</span><span>(
</span><span> a0 b0 c0 d0 e0 f0 g0 h0 i0 j0
</span><span> )</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span></code></pre>
<p>This code intentionally generates an exponential number of <code>println!</code> statements to make sure it is slow to compile,
and compiles to a ton of code, so we have something to measure.</p>
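<p>To put a number on it: the macro emits one <code>println!</code> for the head identifier and then recurses twice on the tail, so with <code>n</code> identifiers we get <code>2^n - 1</code> statements. A tiny check of that recurrence:</p>

```rust
// count(n) = 1 + 2 * count(n - 1), count(1) = 1  =>  count(n) = 2^n - 1
fn count(n: u32) -> u64 {
    if n == 1 { 1 } else { 1 + 2 * count(n - 1) }
}

fn main() {
    // the make_slow! invocation above passes 10 identifiers
    assert_eq!(count(10), 1023);
    println!("{}", count(10)); // → 1023
}
```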
<p>Going with the closure-based code first, we want to put this code both before and after our actual closure call, like this:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">takes_closure</span><span><O, F</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">FnOnce</span><span>() </span><span style="color:#61676ccc;">-></span><span> O>(</span><span style="color:#ff8f40;">f</span><span style="color:#61676ccc;">:</span><span> F) </span><span style="color:#61676ccc;">-></span><span> O {
</span><span> </span><span style="color:#f07171;">make_slow!</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> o </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">f</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">make_slow!</span><span>()</span><span style="color:#61676ccc;">;
</span><span> o
</span><span>}
</span></code></pre>
<p>And in the end we will invoke the function a couple of times with closures returning different types, to be extra sure
the compiler will compile it multiple times:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">main</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">takes_closure</span><span>(|| </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u8</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{a}</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">takes_closure</span><span>(|| </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u16</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{a}</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">takes_closure</span><span>(|| </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{a}</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">takes_closure</span><span>(|| </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u64</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{a}</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#f07171;">println!</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>On my system, compiling this code in debug mode with <code>-Z time-passes</code> takes a bit over 2 seconds, and highlights a
couple of slow parts of compilation:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>time: 0.368; rss: 55MB -> 91MB ( +36MB) MIR_borrow_checking
</span><span>time: 1.108; rss: 122MB -> 46MB ( -77MB) LLVM_passes
</span><span>time: 1.276; rss: 65MB -> 46MB ( -19MB) link
</span><span>time: 2.383; rss: 10MB -> 39MB ( +29MB) total
</span></code></pre>
<p>Doing a <code>--release</code> build increases the timing a little, obviously:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>time: 3.427; rss: 118MB -> 46MB ( -72MB) LLVM_passes
</span><span>time: 3.631; rss: 54MB -> 46MB ( -8MB) link
</span><span>time: 4.628; rss: 10MB -> 41MB ( +31MB) total
</span></code></pre>
<p>Looking at the <code>cargo llvm-lines</code> output reveals that we have 4 copies of the same function, as expected:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> Lines Copies Function name
</span><span> ----- ------ -------------
</span><span> 98640 25 (TOTAL)
</span><span> 98292 (99.6%, 99.6%) 4 (16.0%, 16.0%) guards_closures::takes_closure
</span></code></pre>
<p>LLVM is rightfully slow, as it has a ton of code to compile. Can we do better on that front, by moving all that code to the
scope guard pattern, at the same time making it compatible with async code?</p>
<p>Let’s see. First up, we need our guard type:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Guard</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Guard </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">new</span><span>() </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="color:#f07171;">make_slow!</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">Self
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl </span><span>Drop </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">Guard </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">drop</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>) {
</span><span> </span><span style="color:#f07171;">make_slow!</span><span>()</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span></code></pre>
<p>There is no generic code here anymore, which is exactly what we wanted to achieve. We can then manually create some
scopes, create the guard type and have its destructor automatically called at the end:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">main</span><span>() {
</span><span> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _guard </span><span style="color:#ed9366;">= </span><span>Guard</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u8</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _guard </span><span style="color:#ed9366;">= </span><span>Guard</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u16</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _guard </span><span style="color:#ed9366;">= </span><span>Guard</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _guard </span><span style="color:#ed9366;">= </span><span>Guard</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">print!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u64</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>
</span><span> </span><span style="color:#f07171;">println!</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>How does it do in terms of compile times and <code>cargo llvm-lines</code> now?</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>time: 0.288; rss: 50MB -> 82MB ( +32MB) MIR_borrow_checking
</span><span>time: 0.072; rss: 95MB -> 43MB ( -52MB) LLVM_passes
</span><span>time: 0.246; rss: 62MB -> 44MB ( -18MB) link
</span><span>time: 1.069; rss: 10MB -> 37MB ( +28MB) total
</span></code></pre>
<p>The MIR borrow checking time might as well just be some noise, but the LLVM time is <em>a lot</em> faster.</p>
<p>Here is the same for a <code>--release</code> build:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>time: 1.066; rss: 93MB -> 45MB ( -48MB) LLVM_passes
</span><span>time: 1.272; rss: 44MB -> 45MB ( +1MB) link
</span><span>time: 2.028; rss: 10MB -> 40MB ( +30MB) total
</span></code></pre>
<p>That is a bit more than twice as fast to compile as the closure-based version. Let’s check the <code>llvm-lines</code> output:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Lines Copies Function name
</span><span> ----- ------ -------------
</span><span> 24917 20 (TOTAL)
</span><span> 12280 (49.3%, 49.3%) 1 (5.0%, 5.0%) <guards_guards::Guard as core::ops::drop::Drop>::drop
</span><span> 12277 (49.3%, 98.6%) 1 (5.0%, 10.0%) guards_guards::Guard::new
</span></code></pre>
<p>As expected, we only have a single copy of the expensive constructor and destructor, and roughly a quarter of the
code to compile compared to before.</p>
<p>How does this affect the final binary size?</p>
<p>Looking at the Windows <code>.exe</code> (without the <code>.pdb</code>) I end up with the following matrix:</p>
<table><thead><tr><th>Pattern</th><th>Debug</th><th>Release</th></tr></thead><tbody>
<tr><td>closures</td><td>630K</td><td>247K</td></tr>
<tr><td>guards</td><td>274K</td><td>200K</td></tr>
</tbody></table>
<p>That is a big difference indeed, both in compile times, and in the size of the compiled executable.</p>
<hr />
<p>I will end today’s exploration on that note. In the real world, I have an example of a way-too-generic crate that I
suspect of massively slowing down compile times, and I would like to explore moving it to a scope-guard code style.</p>
<p>This blog post explores that idea in a “lab setting”. I do not yet know if the same improvements could be had with some
real world code as well. Not to mention that actually migrating the codebase might be a huge effort on its own.</p>
A locking war story2023-05-16T00:00:00+00:002023-05-16T00:00:00+00:00
Unknown
https://swatinem.de/blog/locking-war-story/<p>An alternative clickbait title for this could be: “<code>Read + Seek</code> considered dangerous”.</p>
<p>This is a very interesting story, and one of the nice side effects of working on open source software is that I can
share all of the details of it publicly, along with a link to the <a href="https://github.com/getsentry/symbolic/pull/787">PR</a>
that implemented the fix.</p>
<h1 id="tldr"><a class="anchor-link" href="#tldr" aria-label="Anchor link for: tldr">#</a>
TLDR</h1>
<p>As the alternative clickbait title suggests, the core of the problem is that both <code>Read</code> and <code>Seek</code>, as well as the combination
of the two, need a <code>&mut</code> reference to the reader to do any operations. So <em>read-only</em> access still requires an exclusive
reference. (Aside: This might be a good example of why people have advocated calling <code>&mut</code> <em>exclusive</em> access.)</p>
<p>In my example, I was dealing with a <code>zip::ZipArchive</code>, which wraps a <code>Read + Seek</code>, and needs <code>&mut self</code> access to read
a file from the archive. So sharing this archive across multiple tasks that want to read files from it leads to lock
contention as only a single task can read files from the archive at a time.</p>
<h1 id="background"><a class="anchor-link" href="#background" aria-label="Anchor link for: background">#</a>
Background</h1>
<p>Surprisingly, this story starts with JavaScript. Or more precisely, with processing JavaScript stack traces using SourceMaps.
My team recently migrated all of the SourceMap processing done at Sentry from Python code that is supported by some Rust
binding, to a pure Rust service that is still driven by Python.</p>
<p>JavaScript customers upload those SourceMaps, along with minified JS files and other files as a special <code>zip</code> file that
we call a <code>SourceBundle</code>. This archive also contains a manifest, which has a bit of metadata for each file. Things like
the reference to the corresponding SourceMap for files that do not have an embedded <code>sourceMappingURL</code> reference. And
also most importantly, this metadata includes a <code>url</code> for that file, because SourceMap processing sadly still relies on
very brittle URLs. I touched on those problems in my previous post around <a href="https://swatinem.de/blog/file-identity/">file identity</a>,
so I won’t go into more details.</p>
<h1 id="being-too-smart-for-our-own-good"><a class="anchor-link" href="#being-too-smart-for-our-own-good" aria-label="Anchor link for: being-too-smart-for-our-own-good">#</a>
Being too Smart for our own Good</h1>
<p>The primary driver of moving more parts of the processing to Rust was to be able to better reuse repeated computations.
Our SourceMap processing infers function / scope names by parsing the minified source, and it builds a fast lookup
index that is meant to be reused, though the Python code never actually did that. The stateful Rust service however has a variety
of in-memory and on-disk caches to avoid expensive computations for each event that needs to be processed.</p>
<p>One of the more expensive computations that I wanted to avoid was opening up the zip archive and parsing the manifest
contained within. We then ended up with a parsed manifest / index, and an open <code>zip::ZipArchive</code>, more precisely a
<code>zip::ZipArchive<std::io::Cursor<&'data [u8]>></code>. So we already have a memory-mapped <code>&[u8]</code> that gives us trivial
random access. But we need to wrap it in a <code>Cursor</code> to make it into a <code>Read + Seek</code>. As the <code>ZipArchive</code> needs <code>&mut</code>
access, we also had to wrap it in a <code>Mutex</code>. And this <code>Mutex</code> was exactly the thing that was contended in this case.</p>
<p>Trying to avoid repeatedly opening and parsing the manifest by keeping it in-memory and sharing it across computations
combined with that <code>Mutex</code> meant that all the events that needed access to a specific zip file were all contending on
that mutex. Feeding more events to a single server made things even worse, and caused trouble for the whole pipeline.</p>
<p>The problem with <code>Read + Seek</code> is that it indeed needs to maintain some internal mutable state, namely the
cursor position. If it were not synchronized using a <code>&mut</code> and a <code>Mutex</code>, it would mean that concurrent readers could
potentially read garbage, or worse. So thank you Rust for the strict guarantees that avoided that :-)</p>
<p>The solution in the end was to give each reader its own (still <code>Mutex</code>-locked) copy of the <code>ZipArchive</code>. According to
its docs, it is cheap to clone if its generic reader is, which is the case for <code>Cursor</code>. Rolling out this fix indeed
fixed the contention problem for us, and our production systems are now much happier. Although they are still doing way
too much unzipping, but more on that later.</p>
<h1 id="can-we-do-better"><a class="anchor-link" href="#can-we-do-better" aria-label="Anchor link for: can-we-do-better">#</a>
Can we do better?</h1>
<p>The mutable state fundamentally comes from usage of <code>Read</code> which implicitly updates a cursor position, and <code>Seek</code> which
does so explicitly. And this is a reasonable choice for <code>ZipArchive</code>, as I believe it is most frequently used in
combination with a <code>std::io::BufReader<std::fs::File></code>. However, I believe there are a few crates out there that
abstract over the reader as well. For example, both <code>object::ReadRef</code> / <code>object::ReadCache</code> and <code>scroll::Pread</code> work
with shared references, and require an explicit <code>offset</code> for each of the read methods, instead of maintaining the offset
internally via <code>Seek</code>.</p>
<p>In our case we have a memory-mapped <code>&[u8]</code>, and reading from that is a trivial memory access. I cannot overstate how
much of a productivity and sanity boost <code>mmap</code> is. Sure, one might argue that <code>Read</code> gives more explicit control, and
it is very obvious and explicit when a syscall and context switch to the kernel happens, whereas with <code>mmap</code> that is
done implicitly via page faults. Maybe in some very extreme situations, deep control over this might be beneficial, but
in the general case, again, I cannot overstate how awesome <code>mmap</code> is.</p>
<h1 id="a-rant-on-zip"><a class="anchor-link" href="#a-rant-on-zip" aria-label="Anchor link for: a-rant-on-zip">#</a>
A rant on zip</h1>
<p>While the lock contention issue, and the <em>read-only, but not really</em> nature of <code>ZipArchive</code>, was a pain, albeit an
easily fixable one, there is another issue looming here. Why are we using zip archives in the first place? The fact that
lock contention became a problem highlights that we are using these archives a lot. And while we have various caches all
over the place, one thing that is not cached right now is access to the files within that zip archive.</p>
<p>So we are really using the same files from within the same archives all over again. And we are decompressing them over
and over again. I haven’t measured this in production yet, but running this through a local stress test highlights the
fact that our processing is now mainly dominated by decompression.</p>
<p>Zip archives are great and they serve a specific purpose, but their main purpose is long-term <em>archival</em> as the name
suggests, not frequent random access. There might be a possibility to still use zip archives, but using a compression
algorithm that is faster for decompression, but that is a story for a different time. Along with the discussion to
maybe use something else entirely.</p>
<p>All in all, I am fairly happy with the fact that decompression seems to now dominate the performance, as it means that
the rest of the architecture at least is doing a really great job at being high-performance :-)</p>
Files need Identity2023-03-29T00:00:00+00:002023-03-29T00:00:00+00:00
Unknown
https://swatinem.de/blog/file-identity/<p>Interestingly, the same theme has come up multiple times recently within Sentry.
I myself recently wrote a <a href="https://github.com/getsentry/rfcs/pull/81">Sentry RFC</a> about SourceMap <code>DebugId</code>s.
And at the same time, I was supporting and advising other teams working on Java Source Context, and Flutter Obfuscation.</p>
<p>All these different initiatives have the following in common: You have multiple build artifacts for a single application
build. These artifacts together form a tight unit.</p>
<ul>
<li>A minified JS file and its corresponding SourceMap allow you to resolve the original source location.</li>
<li>A Java App and its corresponding SourceBundle allow you to apply Source Context.</li>
<li>A Flutter App and its corresponding Obfuscation Map allow you to de-obfuscate identifiers.</li>
</ul>
<p>Whereas <code>SourceBundle</code>s are a Sentry invention, the other two use-cases are being implemented by external tools.
And they lack a <em>strong</em> association of the different artifacts / assets that form one final build output.</p>
<p>A <code>SourceMap</code> is just some JSON, and so is the Flutter obfuscation mapping, though the latter is a little different,
which makes it harder to deal with; more on that in a minute.</p>
<p>We need the obfuscation mapping to be able to de-obfuscate, so far so good. But with a few different versions of apps
being installed and used by end users, how do we know <em>which</em> obfuscation mapping we need?</p>
<p>That is where file <em>identity</em> comes in. Each group of tightly coupled build artifacts needs to be uniquely identified
<em>somehow</em>, so we are able to find the matching file we need, no matter if that is a SourceMap, a SourceBundle, or an
obfuscation mapping.</p>
<p>To achieve that, each artifact needs to have a <em>unique identifier</em>. It is also very beneficial if that unique identifier
is embedded in that file, so it becomes <em>self identifying</em>.</p>
<p>This is the problem with the Flutter obfuscation mapping. It is a JSON file, but with an array at its root. There is no
way to extend that file with another field at the root that includes this identifier. Well, too bad I guess :-(</p>
<p>Let’s say we have not only two tightly coupled build artifacts but more. To stick with the Flutter example, it might be
the case that a Flutter-web build outputs both a minified JS file, a corresponding SourceMap, <em>and</em> an obfuscation mapping.</p>
<p>Two of those files are just JSON. Our proposal for SourceMap <code>DebugId</code>s I linked above proposes to add a new field to
the SourceMap with its unique identifier. It is pretty much impossible to extend the obfuscation mapping however.
But let’s ignore that problem for now. In the end we have two JSON files. How do we tell them apart then?</p>
<p>Each file needs to have some form of marker in it that tells us <em>what kind</em> of file it is. For JSON files, the JSON Schema
<code>"$schema"</code> field naturally presents itself. Authoring a full JSON Schema might not be everyone’s cup of tea, and that
is not the point here. The point is that this unique <code>"$schema"</code> field tells us <em>what kind</em> of file we are looking at.
By having such a field, the file becomes <em>self describing</em>.</p>
<p>A SourceMap just happens to be a SourceMap if it has a <code>"version": 3</code> field, and a <code>"mappings"</code> field. It might be
very unlikely, but any random JSON file could potentially have these fields and then be wrongly interpreted as a SourceMap.</p>
<p>To summarize this section, every file should be <em>self identifying</em>, by embedding some kind of unique identifier, and
it should also be <em>self describing</em> by embedding some kind of marker that describes the kind (or format) of the file.</p>
<p>With these two pieces of information, we can upload any file to any dumb storage service and look it up.</p>
<hr />
<p>But how do we know which file to look up? Let us come back to the example from before and assume we have an obfuscated
Flutter app running on some customer device, and it produces an obfuscated stack trace that is uploaded to Sentry or
any other service. How will Sentry know which obfuscation mapping to use?</p>
<p>To be able to do so, the report that has the obfuscated stack trace also has to provide the unique identifier of the
obfuscation mapping. We can then look up the mapping using that unique identifier and correctly deobfuscate the stack trace.</p>
<p>So we need a way to get access to that unique identifier <em>at runtime</em>. Surprisingly, this is the most complex part of
our Flutter example, as well as the most controversial thing about our SourceMap proposal. Ideally, the <em>Platform</em>
(whatever it is) offers a programmatic API that provides this unique identifier.</p>
<p>It is totally possible to have a different unique identifier for each accompanying artifact, for example a different
identifier for an associated SourceMap and obfuscation mapping. Though I strongly advise having one unique identifier
that is shared among these tightly coupled artifacts.</p>
<p>To summarize, we have some <em>self identifying</em> and <em>self describing</em> artifacts that we will just stash away on some dumb
storage service, and we need a way <em>at runtime</em> to query that unique identifier.</p>
<h1 id="native-inspiration"><a class="anchor-link" href="#native-inspiration" aria-label="Anchor link for: native-inspiration">#</a>
Native Inspiration</h1>
<p>The native ecosystem has most of this figured out to various degrees, so let’s take a look.</p>
<p>To start this off, binary file formats are usually <em>self describing</em> by starting off with a magic-byte sequence that
identifies the file format. Our native platforms each have their own executable formats for example.</p>
<p>On <strong>macOS</strong>, we have Mach-O files which pretty consistently have a unique identifier called <code>LC_UUID</code> (for <em>load command</em>).
The executables are also commonly split into a main executable, and an associated debug file called <code>dSYM</code>. Both share
the same unique identifier. However, both have the Mach-O format.</p>
<p>As this first example shows, the file <em>format</em> on its own is not enough to identify the file <em>kind</em> / <em>purpose</em>. However
by looking at the presence of various sections in that file, one can quite confidently say if it is an executable, or
the corresponding debug file.</p>
<p><strong>Linux</strong> has <code>ELF</code> (executable and linker format) files. These files can have a unique identifier called <code>NT_GNU_BUILD_ID</code>
(<code>NT</code> for <em>note</em>), though it is sadly frequently missing. The executables are not split by default as they are produced
by build tools, but developers frequently split them apart manually. Again, the two files have the same file format,
but it is possible to tell their <em>purpose</em> apart by looking at the various sections. When splitting those files apart,
both retain the same unique identifier.</p>
<p>The situation on <strong>Windows</strong> is slightly different. An executable in <code>PE</code> (portable executable) format has its own
identifier, which is the combination of the <code>Timestamp</code> and <code>SizeOfImage</code> header values. This can hardly be called
<em>unique</em> though. This file can then reference a <code>PDB</code> (program database) file via a <code>DebugDirectoryEntry</code> which contains
the unique identifier of the <code>PDB</code> file. One thing here that tools frequently get wrong is that one executable can have
multiple <code>DebugDirectoryEntry</code> entries, referencing more than one debug file. I wrote about that previously in a post
titled <a href="https://swatinem.de/blog/format-ossification/">Format Ossification</a>, because most tools got so used to only ever
seeing zero or one <code>DebugDirectoryEntry</code>s that the fact that there can in fact be more than one got completely lost.</p>
<p>In summary, the native formats are pretty good at <em>self identifying</em>.</p>
<h2 id="symbol-lookup"><a class="anchor-link" href="#symbol-lookup" aria-label="Anchor link for: symbol-lookup">#</a>
Symbol Lookup</h2>
<p>One thing I mentioned before is being able to easily find and download these debug files from any dumb storage service.
The native ecosystem offers mainly two possibilities here.</p>
<p>In the <strong>Linux</strong> ecosystem, we have <code>debuginfod</code> which defines a simple <a href="https://www.mankier.com/8/debuginfod#Webapi">lookup scheme</a>.
One can simply download the <code>/buildid/{BUILDID}/debuginfo</code> file and get the debuginfo for a uniquely identified executable.
There are public <code>debuginfod</code> servers for every major Linux distribution as well.</p>
<p>Then there is the <code>symstore</code> Server and accompanying <code>SSQP</code> (simple symbol query protocol), which is primarily used for
the <strong>Windows</strong> ecosystem, but does support other ecosystems as well.
The <a href="https://github.com/dotnet/symstore/blob/main/docs/specs/SSQP_Key_Conventions.md#key-formats">lookup scheme</a> has
support for a ton of formats, including lookup for <code>ELF</code> and Mach-O files using their corresponding unique identifiers.</p>
<p>One problem with <code>symstore</code> though becomes obvious looking at the scheme for <code>PE</code> files: <code><filename>/<Timestamp><SizeOfImage>/<filename></code>.
As I mentioned, the <code>Timestamp</code> and <code>SizeOfImage</code> combination might not be unique enough. So just combine it with the
filename, problem solved, right? Well, this creates new problems all of its own.
For example Electron hosts its own <a href="https://www.electronjs.org/docs/latest/development/setting-up-symbol-server/">symbol server</a>.
But what happens if you ship an electron app and rename the main <code>electron.exe</code> file? Well too bad, you can’t find
that symbol anymore. This is indeed a real pain for Sentry customers.</p>
<p>For <strong>macOS</strong>, the situation is pure sadness. Apple does not host any public symbol server, and the licensing around
these things is also unclear. Sentry goes through great pain to maintain its own internal symbol server for Apple symbols,
but it is a frequent source of problems, with a brittle pipeline for scraping the symbols, and frequent problems with
symbols missing.</p>
<h2 id="programmatic-api"><a class="anchor-link" href="#programmatic-api" aria-label="Anchor link for: programmatic-api">#</a>
Programmatic API</h2>
<p>This is another source of sadness. None of the native platforms have builtin platform support to get at these unique
identifiers. Also getting at the list of all the loaded libraries is a huge pain on some platforms.</p>
<p>For each platform, getting at the unique identifiers involves manually reading the platform-native file format headers
and chasing references around, which can be unsafe as it involves a lot of pointer arithmetic.
The problem with these file formats is also that they are extremely badly documented. I wonder how it is possible that
they are so well understood, although the <code>DebugDirectoryEntry</code> situation makes me doubtful.</p>
<p>The formats and the structures you have to read are not documented <em>publicly on the internet</em>. They are defined in some
platform specific headers that are primarily only available on that platform. For example on Windows, the <code>PE</code> definitions
are part of the Windows SDK. For macOS, I believe the Mach-O headers are provided by Xcode.
One critical bit to get the unique identifier of a <code>PDB</code> file is also famously missing from the Windows SDK headers.
The <code>CodeView</code> record is not defined <em>anywhere</em>. All the tools just copy-paste the definition into their own code from
<em>somewhere</em>.
The <code>ELF</code> format however is reasonably well specified and documented in various man pages, though those are far from
easily usable.</p>
<p>And even if we have a couple of header definitions that we won’t find on the public web, they are C headers. Can someone
tell me again how many bits a C <code>long</code> has? No? Thought so.</p>
<p>Well, I’m not going on a ranting spree about how un-portable C is. The point I’m trying to make is that reading the
unique identifiers of files at runtime is a huge pain and involves unsafe code. It would be so much nicer if we had
built-in platform APIs that let us easily enumerate all the loaded libraries, and easily get their unique identifiers.</p>
<h1 id="in-summary"><a class="anchor-link" href="#in-summary" aria-label="Anchor link for: in-summary">#</a>
In summary</h1>
<p>And with that I am at the end of today’s post. What I would love to have from a tool developer’s perspective is:</p>
<ul>
<li><em>Self identifying</em> files with an embedded unique identifier.</li>
<li><em>Self describing</em> files that describe their type / purpose.</li>
<li><em>Programmatic API</em> to access a file’s unique identifier at runtime, and also enumerate all the files currently loaded.</li>
</ul>
<p>The <a href="https://github.com/getsentry/rfcs/pull/81">SourceMap RFC</a> I mentioned in the beginning tries to solve two of these
problems, and I would love to get feedback on it.</p>
<p>One thing from the RFC that still needs discussion is <em>how</em> to generate these unique identifiers. Our current draft
implementation generates a new random UUID each time, which I argue is a bad idea.</p>
<p>I would like these identifiers, and the files themselves to be bit-for-bit deterministic / reproducible given the same
inputs.
The <a href="https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md">Portable PDB</a> specification mentions
explicitly how to create the unique identifier:</p>
<blockquote>
<p>the checksum is calculated by hashing the entire content of the PDB file with the PDB ID set to 0 (20 zeroed bytes).</p>
</blockquote>
<p>So there is precedent in other ecosystems for reproducible and deterministic unique identifiers.</p>
The size of Rust Futures2023-01-20T00:00:00+00:002023-01-20T00:00:00+00:00
Unknown
https://swatinem.de/blog/future-size/<p>I have recently discovered that Rust Futures, or rather, <code>async fn</code> calls can
lead to surprising performance problems if they are nested too deeply.</p>
<p>I learned that the hard way by triggering a stack overflow in a
<a href="https://github.com/getsentry/symbolicator/pull/979">PR to <code>symbolicator</code></a>.
I then tracked that down to unreasonably huge (as measured by <code>mem::size_of_val</code>)
futures. The problem was also being exacerbated by the deeply nested async fn
calls in the <code>moka</code> crate that I have <a href="https://github.com/moka-rs/moka/issues/212">reported here</a>.</p>
<p>It is not my intention to bash that specific crate here, as I absolutely love
its intuitive APIs, and would love to use it even more in the future.
However the crate does make some wrong assumptions about how async Rust code
works that I am sure not a lot of people are aware of, and which can cause
problems.</p>
<p>Apart from highlighting the source of the problem in great depth, I also want to
propose some workarounds for this specific issue.</p>
<p>The problem I will highlight is also only present in today's Rust (nightly 1.68).
It is perfectly possible that the compiler will optimize these things in the future.</p>
<hr />
<p>So, let's dive in, and consider this piece of code here:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">test</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let </span><span style="color:#ed9366;">_ = </span><span style="color:#f07171;">a</span><span>(())</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">a</span><span><T>(</span><span style="color:#ff8f40;">t</span><span style="color:#61676ccc;">:</span><span> T) </span><span style="color:#61676ccc;">-></span><span> T {
</span><span> </span><span style="color:#f07171;">b</span><span>(t)</span><span style="color:#ed9366;">.</span><span>await
</span><span>}
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">b</span><span><T>(</span><span style="color:#ff8f40;">t</span><span style="color:#61676ccc;">:</span><span> T) </span><span style="color:#61676ccc;">-></span><span> T {
</span><span> </span><span style="color:#f07171;">c</span><span>(t)</span><span style="color:#ed9366;">.</span><span>await
</span><span>}
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">c</span><span><T>(</span><span style="color:#ff8f40;">t</span><span style="color:#61676ccc;">:</span><span> T) </span><span style="color:#61676ccc;">-></span><span> T {
</span><span> t
</span><span>}
</span></code></pre>
<p>We have an outer future, which threads a generic argument through a nested call
chain.</p>
<p>As the topic of this blog post is <em>size</em>, let's see how large this future is.
Instead of calling <code>mem::size_of_val</code> at runtime, I will use the nightly-only
<code>-Zprint-type-sizes</code> flag and show you an abbreviated output of that:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>type: `[async fn test]`: 4 bytes
</span><span> discriminant: 1 bytes
</span><span> variant `Unresumed`: 0 bytes
</span><span> variant `Suspend0`: 3 bytes
</span><span> field `.__awaitee`: 3 bytes, offset: 0 bytes
</span><span>type: `[async fn a]`: 3 bytes
</span><span> discriminant: 1 bytes
</span><span> variant `Unresumed`: 0 bytes
</span><span> field `.t`: 0 bytes, offset: 0 bytes
</span><span> variant `Suspend0`: 2 bytes
</span><span> field `.t`: 0 bytes, offset: 0 bytes
</span><span> field `.__awaitee`: 2 bytes
</span><span>type: `[async fn b]`: 2 bytes
</span><span> discriminant: 1 bytes
</span><span> variant `Unresumed`: 0 bytes
</span><span> field `.t`: 0 bytes, offset: 0 bytes
</span><span> variant `Suspend0`: 1 bytes
</span><span> field `.t`: 0 bytes, offset: 0 bytes
</span><span> field `.__awaitee`: 1 bytes
</span><span>type: `[async fn c]`: 1 bytes
</span><span> discriminant: 1 bytes
</span><span> variant `Unresumed`: 0 bytes
</span><span> field `.t`: 0 bytes, offset: 0 bytes
</span></code></pre>
<p>In this first example, our <code>T</code> is zero-sized, but our outer future still has a
size of <code>4 bytes</code>. The reason is that each future is internally a state-machine
enum with a discriminant, and in our case each discriminant is <code>1 byte</code>. Apart from
that, all our types have an alignment of <code>1 byte</code>, which is included in the normal
output of <code>-Zprint-type-sizes</code> but which I have removed for brevity.</p>
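<p>You can observe the same effect at runtime, without nightly flags, via <code>mem::size_of_val</code> on the (unpolled) futures. A small sketch; the exact byte counts depend on the compiler version, so it only asserts the growth:</p>

```rust
use std::mem;

// The same nested call chain as above.
pub async fn a<T>(t: T) -> T {
    b(t).await
}
async fn b<T>(t: T) -> T {
    c(t).await
}
async fn c<T>(t: T) -> T {
    t
}

fn main() {
    // Creating the futures does not run them; we only measure their layout.
    // Exact sizes depend on the compiler version, so we only assert on the
    // growth. On current compilers the large variant is roughly 3 KiB:
    // one copy of `t` per nesting level.
    let small = mem::size_of_val(&a(()));
    let large = mem::size_of_val(&a([0u8; 1024]));
    assert!(large >= small + 1024);
}
```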
<p>So far so good, what happens when we put a larger <code>T</code> there? How about <code>[0u8; 1024]</code>?</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>type: `[async fn test]`: 3076 bytes
</span><span> variant `Suspend0`: 3075 bytes
</span><span> field `.__awaitee`: 3075 bytes, offset: 0 bytes
</span><span>type: `[async fn a]`: 3075 bytes
</span><span> variant `Suspend0`: 3074 bytes
</span><span> field `.t`: 1024 bytes, offset: 0 bytes
</span><span> field `.__awaitee`: 2050 bytes
</span><span> variant `Unresumed`: 1024 bytes
</span><span> field `.t`: 1024 bytes, offset: 0 bytes
</span><span>type: `[async fn b]`: 2050 bytes
</span><span> variant `Suspend0`: 2049 bytes
</span><span> field `.t`: 1024 bytes, offset: 0 bytes
</span><span> field `.__awaitee`: 1025 bytes
</span><span> variant `Unresumed`: 1024 bytes
</span><span> field `.t`: 1024 bytes, offset: 0 bytes
</span><span>type: `[async fn c]`: 1025 bytes
</span><span> variant `Unresumed`: 1024 bytes
</span><span> field `.t`: 1024 bytes, offset: 0 bytes
</span></code></pre>
<p>I have removed the discriminants in this output.</p>
<p>What we see however is that each of the nested futures has some space reserved
for its own copy of <code>T</code> for the case when it is <code>Unresumed</code>. In the <code>Suspend0</code>
case, it has to hold onto the nested future it is polling, but it <em>also</em> retained
a copy of its own <code>T</code>. Why exactly is that?</p>
<p>I suspect it is because it has to <em>move</em> that <code>T</code> into the new future somehow.</p>
<p>Remember that Rust futures are lazy by definition, which means calling an <code>async fn</code>
does nothing except create a new value type that you can move around, <code>Box</code>, or
send to other threads.</p>
<p>The data has to be copied around when calling the function <code>a</code>, no?</p>
<p>Let's look at a bit of generated assembly. I will use my trusty
<code>ready_or_diverge</code> function there to actually create and also poll the future.
I even had to annotate one of my futures with <code>#[inline(never)]</code>, because yes,
the Rust compiler is very smart indeed and can even
<a href="https://swatinem.de/blog/zero-cost-async/">make async futures disappear completely</a>
in some cases.</p>
<p>You can look at the full example in the <a href="https://godbolt.org/z/eqexr4jTK">Compiler Explorer</a>,
but here is the relevant snippet:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>example::a:
</span><span> push rbx
</span><span> mov rbx, rdi
</span><span> mov edx, 1024
</span><span> call qword ptr [rip + memcpy@GOTPCREL]
</span><span> mov byte ptr [rbx + 3074], 0
</span><span> mov rax, rbx
</span><span> pop rbx
</span><span> ret
</span><span>
</span><span>example::poll_test:
</span><span> push rbx
</span><span> mov eax, 4128
</span><span> call __rust_probestack
</span><span> sub rsp, rax
</span></code></pre>
<p>We can see that our <code>#[inline(never)]</code> fn <code>a</code> has a call to <code>memcpy</code>, so it is
copying our data around.
Though as I mentioned, the compiler is very good at optimizing things away.
Especially in my simplified testcase where I never actually return <code>Pending</code>,
and the compiler is smart enough to figure that out and optimize everything
away.</p>
<p>However, we also see that my <code>poll_test</code> function reserves <code>4128 bytes</code> on the
stack. That is indeed a local copy of my <code>1024 byte</code> buffer, the size of the
outer <code>test</code> future, <code>3076 bytes</code>, and then some.</p>
<p>This is my stack overflow right there. One thing to note here is also that
<code>__rust_probestack</code> checks for stack overflows explicitly and prints out a
helpful error message before aborting the program. Otherwise the stack overflow
would only manifest itself as a hard crash when trying to access out of bounds
memory. So thank you Rust for that :-)</p>
<p>If I add a <code>Box::pin</code> around the <code>test()</code> future I want to poll to completion,
the compiler still reserves <code>2080 bytes</code> of stack space, which is a little bit
more than twice my <code>1024 bytes</code> buffer. There is no call to <code>__rust_probestack</code>
in that case. Likely because there is some threshold at which the compiler
inserts that call (probably <code>4096 bytes</code>?).</p>
<p>That reservation goes away when I remove the <code>#[inline(never)]</code> annotation.
But remember, this is a very simple playground example in which the compiler
can figure out that my future is polled to completion and is immediately <code>Ready</code>.
I believe in real world examples, like the one I hit with my own real world code,
the compiler will have to fall back to a lot more <code>memcpy</code> and stack allocations.</p>
<p><strong>Aside</strong>: One side-effect of <code>async fn</code> actually being two separate functions,
one creating the future, and one for its <code>poll</code> implementation, is that annotations
like <code>#[inline]</code> currently apply to the outer <em>creates the future</em> function only.
There is an <a href="https://github.com/rust-lang/rust/issues/106765">open issue</a> about
that, and I believe it should be fairly straightforward to make that annotation
apply to both functions in this case.</p>
<hr />
<p>So what have we learned so far?</p>
<p>Rust <code>async fn</code>s capture all of their arguments, and <em>capture</em> in this case
means copying them into a new value type that represents the underlying future.
Depending on compiler optimizations, that involves a lot of <code>memcpy</code>, and
possibly also huge stack allocations.</p>
<p>This especially hurts if you are passing huge arguments through multiple layers
of function calls. The caller has one copy of the argument inside its own
type, and it moves / copies that argument into the callee, effectively doubling
the size of the argument type. With a deeply nested call chain, and large
arguments, this can quickly cause problems.</p>
<p>The <em>large argument</em> in some cases is an <code>impl Future</code> itself. So this can
easily become exponential.</p>
<hr />
<p>And what can we do about this?</p>
<p>Unfortunately, I do not see a simple <em>one-size-fits-all</em> solution. There are
tradeoffs everywhere, either in performance or in code style.</p>
<p>One obvious solution would be to <code>Box::pin</code> large futures. That is the solution
I went for to work around my own real world problem. That incurs a heap
allocation though, which might have some runtime cost. This is a bit sad in my
case as I use it in combination with <code>moka</code>, which as an in-memory cache should
optimize my fast path. And now I have a heap allocation in <em>all</em> the cases.</p>
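<p>To illustrate the layout effect of boxing, here is a small sketch (the <code>big</code> / <code>boxed</code> helpers are made up for illustration, and since the exact sizes are compiler-dependent the assertions are deliberately loose):</p>

```rust
use std::mem;

// `big` keeps a 1024-byte buffer alive across an await point,
// so its future must be at least 1024 bytes.
async fn big() -> u8 {
    let buf = [1u8; 1024];
    std::future::ready(()).await;
    buf[0]
}

// Embeds `big`'s future inline and thus inherits its full size.
async fn unboxed() -> u8 {
    big().await
}

// Holds only the pointer to the heap allocation across the await.
async fn boxed() -> u8 {
    Box::pin(big()).await
}

fn main() {
    assert!(mem::size_of_val(&unboxed()) >= 1024);
    assert!(mem::size_of_val(&boxed()) <= 16);
}
```

The tradeoff is visible in the numbers: the boxed variant is pointer-sized, at the cost of one heap allocation per call.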
<p>Another solution would be to group multiple arguments into a single reference:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">before_caller</span><span><A, B>(</span><span style="color:#ff8f40;">a</span><span style="color:#61676ccc;">:</span><span> A, </span><span style="color:#ff8f40;">b</span><span style="color:#61676ccc;">:</span><span> B) {
</span><span> </span><span style="color:#f07171;">before_callee</span><span>(a</span><span style="color:#61676ccc;">,</span><span> b)</span><span style="color:#ed9366;">.</span><span>await
</span><span>}
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">before_callee</span><span><A, B>(</span><span style="color:#ff8f40;">_a</span><span style="color:#61676ccc;">:</span><span> A, </span><span style="color:#ff8f40;">_b</span><span style="color:#61676ccc;">:</span><span> B) {}
</span><span>
</span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">after_caller</span><span><A, B>(</span><span style="color:#ff8f40;">a</span><span style="color:#61676ccc;">:</span><span> A, </span><span style="color:#ff8f40;">b</span><span style="color:#61676ccc;">:</span><span> B) {
</span><span> </span><span style="color:#fa6e32;">let</span><span> ab </span><span style="color:#ed9366;">= </span><span>(a</span><span style="color:#61676ccc;">,</span><span> b)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">after_callee</span><span>(</span><span style="color:#ed9366;">&</span><span>ab)</span><span style="color:#ed9366;">.</span><span>await
</span><span>}
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">after_callee</span><span><A, B>(</span><span style="color:#ff8f40;">_ab</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>(</span><span style="color:#ff8f40;">A</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">B</span><span>)) {}
</span></code></pre>
<p>This definitely does not look nice, but it reduces the type that needs to be
copied into the future to <code>8 bytes</code>, the size of a normal reference.</p>
<p>But sometimes we want to move and consume things, so how do we handle those cases?
Well, you can use a <code>&mut Option</code> for that and just <code>.take().unwrap()</code>. It is
ugly, but it works. However, it also has a cost: doing an <code>unwrap</code> has some runtime
overhead, and it generates a ton of panic messages.</p>
<p>And it leaves you open to developer error, as accidentally doing that twice
<strong>will</strong> panic. Using <code>if let Some(x) = opt.take()</code> is a panic-free version with
a different tradeoff: It can lead to a latent bug that you will never know about.
So in this case doing a <code>panic</code> is a good thing, as it reminds you of that bug.</p>
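<p>A minimal sketch of the <code>&mut Option</code> pattern (the function names here are made up for illustration):</p>

```rust
use std::mem;

// The callee captures only an 8-byte reference instead of the payload
// itself (a `Vec<u8>` is 24 bytes: pointer + length + capacity).
async fn consume(slot: &mut Option<Vec<u8>>) {
    let payload = slot.take().expect("polled with an empty slot");
    drop(payload); // pretend to consume it
    std::future::ready(()).await;
}

fn main() {
    let mut slot = Some(vec![0u8; 1024]);
    let fut = consume(&mut slot);
    // the future holds a reference, not the Vec
    assert!(mem::size_of_val(&fut) <= 24);
}
```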
<p>If we are dealing with futures, we can also pass a <code>Pin<&mut impl Future></code> when
we pin the relevant future in the outermost callee. Again, not very nice, and
it also makes you vulnerable to polling that future again after it completed,
which will <code>panic</code>.</p>
<p>Last but not least, we can combine all of these cases into a stateful wrapper
struct, and just use a <code>&mut self</code>. Though this still leaves us with the
<code>Option</code> / <code>poll</code> problem, so not a bulletproof solution either.</p>
<hr />
<p>Well, here we are. Another PSA about some hidden pitfalls with Rust <code>async fn</code>.
I hope by explaining all the details here, and even giving some suggestions
(though not perfect ones), you can avoid some of these problems in your own code.</p>
<p>Also, it is perfectly possible that the Rust compiler will get smarter over time
and could eventually completely remove this problem.
Using <code>.await</code> already makes sure that you consume the future. If the compiler
can also prove that the future does not cross the async fn boundary, it should
be free to not only inline the runtime code, but also the data of a callee into
the caller. But not yet. Time will tell.</p>
<hr />
<p><strong>Update</strong></p>
<p>After searching through the Rust issue tracker, there are multiple open issues
related to this.
A <a href="https://github.com/rust-lang/rust/issues/69826">tracking issue about memory usage</a>,
<a href="https://github.com/rust-lang/rust/issues/62958">arguments being duplicated across yield points</a>,
and <a href="https://github.com/rust-lang/rust/issues/99504">inefficient codegen, mostly <code>memcpy</code> related</a>.</p>
<p>Following these issues through other interlinked issues and PRs indeed shows that
this is a well-known issue, and there have been multiple experiments and proposals
so far for how to solve it piece by piece, but not all of those were successful.</p>
<p>All in all, this makes me quite confident that this problem can indeed be solved
over time. And maybe even my own work to remove <code>GenFuture</code>, and the
<code>identity_future</code> I had to leave in its place, can help with this effort as well.</p>
A deep dive into DWARF line programs2023-01-04T00:00:00+00:002023-01-04T00:00:00+00:00
Unknown
https://swatinem.de/blog/dwarf-lines/<p>I started writing a series a blog posts explaining various debug formats, specifically
formats that allow you to recover the original source locations.</p>
<p>I wrote about <a href="https://swatinem.de/blog/sourcemaps/">SourceMaps</a> and Portable PDB
<a href="https://swatinem.de/blog/sequence-points/">Sequence Points</a> already.</p>
<p>Now it is time to look at DWARF line programs.</p>
<h1 id="dwarf-the-specification"><a class="anchor-link" href="#dwarf-the-specification" aria-label="Anchor link for: dwarf-the-specification">#</a>
DWARF, the specification</h1>
<p>The whole DWARF specification is available over at <a href="https://dwarfstd.org/">dwarfstd.org</a>.
It is a gigantic PDF file with >450 pages (including indices, etc). Things are
reasonably well interlinked in there, though it's still hard to navigate and
find specific things you are looking for.</p>
<p>DWARF also evolves quite slowly. Version 5, which is only now starting to be used
as the default version output by compilers, is dated February 2017. That is almost…
checks date… 6 years.</p>
<p>Some compilers are a bit overeager to use newer features though, and some things
from DWARF v6 are already in use, even though the standard version has not been
<em>published</em> yet. In those cases one can only link to PRs from the compiler
implementation.</p>
<p>The DWARF information itself is scattered throughout different formats and tables.
They are included in different sections of an executable. The line program is
defined in the <code>.debug_line</code> section (or <code>__debug_line</code> on macOS),
and it can reference data in other sections as well.</p>
<p>As with other sections, and DWARF info in general, the <code>.debug_line</code> section
is just a concatenation of all the line programs of all the compilation units.</p>
<p>Either way, on to line programs. These are described in <em>Chapter 6.2</em> (of the V5 doc).
As with the previous formats I have described, the DWARF line program is also
encoded as a state machine. This state machine encodes at least the following information
(literally copied from the standard):</p>
<ul>
<li>the source file name</li>
<li>the source line number</li>
<li>the source column number</li>
</ul>
<p>The format is also very extensible, and encodes more information than that.
In the current version, it also encodes information about statements and basic
blocks (sequences of instructions that are branch targets and do not branch away
themselves), as well as a couple of flags to indicate the end of the prologue,
the beginning of the epilogue, and the end of a sequence.</p>
<p>For the purpose of this blog post we are only interested in the ends of sequences.
Sequences are contiguous runs of instructions. The state machine is reset after
each sequence, and the end-of-sequence marker points at the first instruction
<em>after</em> the sequence.
I believe sequences more or less correspond to functions, as the linker is free
to reorder functions, in which case only the starting offset of a function needs
to be updated.</p>
<p>Each line program starts with a header describing the configuration of the state
machine, specifically <code>opcode_base</code> and <code>line_base</code>, which affect the <em>special opcodes</em>
that are encoded in only one byte. How to decode and interpret these is explained
in chapter <code>6.2.5.1</code> of the DWARF v5 spec.
Other opcodes may take advantage of LEB128-encoded integers, and are thus
variable-length.</p>
<h1 id="decoding-a-sequence"><a class="anchor-link" href="#decoding-a-sequence" aria-label="Anchor link for: decoding-a-sequence">#</a>
Decoding a sequence</h1>
<p>As the whole <code>.debug_line</code> section is quite complex, and the header includes a
variable length list of directories and file names, I will simplify this to
only look at the state machine itself.</p>
<p>The header gives us at least the following information, which you can also get
when you dump the <code>.debug_line</code> contents via <code>llvm-dwarfdump --debug-line --verbose</code>:</p>
<ul>
<li><code>line_base: -5</code></li>
<li><code>line_range: 14</code></li>
<li><code>opcode_base: 13</code></li>
<li><code>file_names[1]: "main.c"</code></li>
</ul>
<p>The header also defines <code>min_inst_length: 1</code> and <code>max_ops_per_inst: 1</code>, which
simplifies the calculation of the <em>operation advance</em>, or the address increment.
In that case, the state machine does not need to keep track of an internal <code>op_index</code>.</p>
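<p>The special-opcode arithmetic from chapter <code>6.2.5.1</code> fits in a few lines; here is a sketch using the header values above (since <code>min_inst_length</code> and <code>max_ops_per_inst</code> are both <code>1</code>, the <em>operation advance</em> is the address increment directly):</p>

```rust
// Header values from the `llvm-dwarfdump` output above.
const OPCODE_BASE: u8 = 13;
const LINE_BASE: i64 = -5;
const LINE_RANGE: u64 = 14;

/// Decode a special opcode into (address increment, line increment).
/// Assumes `min_inst_length == 1` and `max_ops_per_inst == 1`, so no
/// `op_index` bookkeeping is needed.
fn decode_special(opcode: u8) -> (u64, i64) {
    let adjusted = (opcode - OPCODE_BASE) as u64;
    let operation_advance = adjusted / LINE_RANGE;
    let line_increment = LINE_BASE + (adjusted % LINE_RANGE) as i64;
    (operation_advance, line_increment)
}

fn main() {
    // The three special opcodes from the sequence decoded below:
    assert_eq!(decode_special(0x16), (0, 4));
    assert_eq!(decode_special(0xe5), (15, 1));
    assert_eq!(decode_special(0x59), (5, 1));
}
```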
<p>That leaves us with the following bytes to decode:</p>
<pre data-lang="text" style="background-color:#fafafa;color:#61676c;" class="language-text "><code class="language-text" data-lang="text"><span>blob: 00 09 02 50 3f 00 00 01 00 00 00 16 05 05 0a e5 59 75 02 06 00 01 01
</span><span>
</span><span>We start out with { addr: 0x0, file: 1, line: 1, column: 0 }
</span><span>
</span><span>0x00: this is an extended opcode
</span><span>0x09: the extended opcode spans 9 bytes
</span><span>0x02: this is the extended opcode `DW_LNE_set_address`
</span><span>50 3f 00 00 01 00 00 00: the remaining 8 bytes are little endian for: `0x100003f50`
</span><span>0x16 (22 in decimal): this is a special opcode:
</span><span> - adjusted opcode: 22 - 13 = 9
</span><span> - operation advance: 9 / 14 = 0 (truncating division)
</span><span> - line increment: -5 + (9 % 14) = 4
</span><span> => We emit the following entry: { addr: 0x100003f50, file: 1, line: 5, column 0 }
</span><span>0x05: this is a standard opcode `DW_LNS_set_column`
</span><span>0x05: set the column number to `5`
</span><span>0x0a (10 in decimal): this is a standard opcode `DW_LNS_set_prologue_end`
</span><span>0xe5 (229 in decimal): this is a special opcode:
</span><span> - adjusted opcode: 229 - 13 = 216
</span><span> - operation advance: 216 / 14 = 15
</span><span> - line increment: -5 + (216 % 14) = 1
</span><span> => We emit the following entry: { addr: 0x100003f5f, file: 1, line: 6, column: 5 }
</span><span> ... also, this is a prologue end, but we do not care about that
</span><span>0x59 (89 in decimal): this is a special opcode:
</span><span> - adjusted opcode: 89 - 13 = 76
</span><span> - operation advance: 76 / 14 = 5
</span><span> - line increment: -5 + (76 % 14) = 1
</span><span> => We emit the following entry: { addr: 0x100003f64, file: 1, line: 7, column: 5 }
</span><span>0x75 (117 in decimal): this is a special opcode:
</span><span> - adjusted opcode: 117 - 13 = 104
</span><span> - operation advance: 104 / 14 = 7
</span><span> - line increment: -5 + (104 % 14) = 1
</span><span> => We emit the following entry: { addr: 0x100003f6b, file: 1, line: 8, column: 5 }
</span><span>0x02: this is a standard opcode `DW_LNS_advance_pc`
</span><span>0x06: operation advance: 6
</span><span>0x00: this is an extended opcode
</span><span>0x01: the extended opcode spans 1 byte
</span><span>0x01: this is the extended opcode `DW_LNE_end_sequence`
</span><span> => Our sequence ends at: { addr: 0x100003f71 }
</span></code></pre>
<h1 id="how-to-use-these-mappings"><a class="anchor-link" href="#how-to-use-these-mappings" aria-label="Anchor link for: how-to-use-these-mappings">#</a>
How to use these mappings</h1>
<p>This was a simplified example, and only uses a single source file and only a limited
number of entries.</p>
<p>Each entry implicitly extends to the next one, and the <code>end_sequence</code> does not really count
as an entry itself, thus we have the following entries:</p>
<pre data-lang="text" style="background-color:#fafafa;color:#61676c;" class="language-text "><code class="language-text" data-lang="text"><span>- 0x100003f50 - 0x100003f5f: file 1 (which is `"main.c"`), line 5, column 0
</span><span> (this is the function prologue)
</span><span>- 0x100003f5f - 0x100003f64: file 1, line 6, column 5
</span><span>- 0x100003f64 - 0x100003f6b: file 1, line 7, column 5
</span><span>- 0x100003f6b - 0x100003f71: file 1, line 8, column 5
</span></code></pre>
<p>As each sequence is internally contiguous and terminated by an <code>end_sequence</code>
marker, instead of storing each entry's end explicitly we can add a sentinel
value, put everything into a sorted list, and binary-search it quickly.</p>
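<p>A minimal sketch of such a lookup, using the addresses from the example above (the entry layout here is hypothetical, not the actual SymCache format):</p>

```rust
/// Line-table entries as (start_address, line); a sentinel entry with
/// `line == u32::MAX` marks the end of a sequence.
fn lookup(entries: &[(u64, u32)], addr: u64) -> Option<u32> {
    // index of the last entry starting at or before `addr`, if any
    let idx = entries.partition_point(|&(start, _)| start <= addr).checked_sub(1)?;
    let (_, line) = entries[idx];
    (line != u32::MAX).then_some(line)
}

fn main() {
    let entries = [
        (0x100003f50, 5),
        (0x100003f5f, 6),
        (0x100003f64, 7),
        (0x100003f6b, 8),
        (0x100003f71, u32::MAX), // end_sequence sentinel
    ];
    assert_eq!(lookup(&entries, 0x100003f60), Some(6));
    assert_eq!(lookup(&entries, 0x100003f00), None); // before the sequence
    assert_eq!(lookup(&entries, 0x100003f80), None); // past the sentinel
}
```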
<p>This is pretty much how the Sentry SymCache format works.</p>
<h1 id="summary"><a class="anchor-link" href="#summary" aria-label="Anchor link for: summary">#</a>
Summary</h1>
<p>We have looked in depth at the DWARF line program binary format and learned a
couple of things about it:</p>
<ul>
<li>The DWARF specification describes a complex but well-documented format, though the specification
itself can be hard to read and understand in places.</li>
<li>The line programs, one per compilation unit, are contained in a <code>.debug_line</code>
section. They can also reference other sections depending on the DWARF version.</li>
<li>Each line program has a header and a list of instructions.</li>
<li>These instructions encode the address, file, line, column and a bunch of flags.</li>
<li>The line program is divided into contiguous sequences.</li>
<li>The format and opcodes are very extensible, supporting all kinds of instruction
set architectures, which also makes it very complex.</li>
<li>The line program itself has no information about functions and their names.
That information is part of the <code>.debug_info</code> section and the debug information
entries contained within.</li>
</ul>
<hr />
<p>This concludes the deep dive into DWARF. This leaves me with only one more
format to go in this series: <em>Windows PDB line programs</em>.</p>
<p>Those are pretty much completely undocumented, so it will take some time to
digest everything into a hopefully understandable blog post.</p>
2022 Retrospective2022-12-30T00:00:00+00:002022-12-30T00:00:00+00:00
Unknown
https://swatinem.de/blog/2022-retrospective/<p>It is the end of the year, and a lot of people are writing end-of-year posts,
and outlooks for the new year. So here is mine.</p>
<h1 id="sentry"><a class="anchor-link" href="#sentry" aria-label="Anchor link for: sentry">#</a>
Sentry</h1>
<p>It was an interesting year for sure. We shipped a bunch of stuff.
Some of it internal that otherwise no one would know about, and some of it external
features adding support for new platforms and ecosystems.</p>
<p>A big internal item was switching our serverside processor from breakpad to
<a href="https://github.com/rust-minidump/rust-minidump">rust-minidump</a>. The crate was
primarily developed by folks at Mozilla, but our team contributed a bunch of
fixes and extended support to other platforms such as MIPS.</p>
<p>We ran both implementations side by side for some time to gather up differences
and were looking at some mismatching cases. We fixed regressions and celebrated
improvements, though some regressions slipped through and we had to fix them
after the switch. As we are dealing with customer data, it was not really
possible to create testcases out of all the interesting cases.</p>
<p>Our support for unwind info was also extended, especially on Windows to be able
to unwind in more situations.</p>
<p>There were also a bunch of fixes to our own SymCache format, a lot of which were
contributed by Mozilla people as well.</p>
<p>We also started dogfooding our own performance monitoring product, as well as the
profiling support in some limited fashion.</p>
<p>On to more visible features, we enabled support for line numbers in Unity projects.
After a lot of initial experimentation and exploration, we ended up with a
simple and elegant solution. The Unity il2cpp compiler puts annotations into
the generated C++ sources that map back to the C# code they were generated from.
We simply use those annotations to map back to the C# code.</p>
<p>There is one followup to the whole Unity story however. As we do not control the
stack walking on the client-side at all, we are at the mercy of what Unity gives us.
And in some cases it gives us offset instruction addresses. In other cases we
offset those again in a wrong way.</p>
<p>Surprisingly, the source of this problem touches lots of parts of the product,
and work is ongoing to find a solution that serves more usecases as well.</p>
<p>Next up, we added server-side support to symbolicate .NET stack traces. And a
contractor even extended this support to also offer source context as well.</p>
<p>While we have had server-side support for this for quite some time, the .NET SDK
is lagging behind a bit, but should soon catch up to ship this feature to customers.</p>
<p>Another big thing we shipped was improved SourceMap support. We are now parsing
the minified JS source to extract function scopes and give them reasonably
meaningful names, and using the SourceMap to recover the original names.</p>
<p>You can read up on our <a href="https://blog.sentry.io/2022/11/30/how-we-made-javascript-stack-traces-awesome/">blog post</a>
that explains all this in more detail. We also designed a new lookup format for
this that is very similar to our existing SymCache format. This opens up the
door to cache all these computations to amortize their cost in the future.</p>
<hr />
<p>Apart from all this feature work, I also blogged about some interesting issues
and bugs that I fixed along the way, like
<a href="https://swatinem.de/blog/format-ossification/">format ossification</a> and a serious
inefficiency in the parsing of
<a href="https://swatinem.de/blog/abbreviations/">DWARF Abbreviations</a>.</p>
<p>Inspired by the above feature work on <a href="https://swatinem.de/blog/sourcemaps/">SourceMaps</a>
and <a href="https://swatinem.de/blog/sequence-points/">.NET Portable PDBs</a>, I started
a blog series exploring those formats in great depth. I am procrastinating hard,
but still plan to eventually write about DWARF and native PDB formats as well.</p>
<h1 id="rust"><a class="anchor-link" href="#rust" aria-label="Anchor link for: rust">#</a>
Rust</h1>
<p>I wrote quite a lot of posts about Rust this year. Take a look at the archive,
they are too many to list.</p>
<p>These range from educational posts about Futures in general, commentary on some
of the broader discussions in the ecosystem as well as describing some common
pitfalls and exploring some zero-copy parsing.</p>
<p>I called out to <a href="https://swatinem.de/blog/fix-rustdoc/">fix rustdoc doctests</a> as they
are rather held together by doc-tape right now. (Yes, I want to make this pun a thing!)
Some posts in the community about what the Rust foundation should focus on
mentioned hiring / funding people to work on especially tedious and non-glamorous
tasks. Doctests seem to be one of those, and are in need of someone giving more
love to them.</p>
<p>One PR of mine, <a href="https://github.com/rust-lang/rust/pull/103682">stabilizing the <code>--test-run-directory</code></a>
flag, which itself is just an implementation detail to make the output of
<code>cargo test</code> more readable has been in FCP limbo for quite some time now, and
there is little movement.</p>
<p>Towards the end of the year I took it on myself to improve the inner workings
of <code>async fn</code>. Primarily motivated by Sentry starting to dogfood our profiling
product, and seeing how bad async stack traces look in profiles. Someone called
me out that this is indeed a big “yak shave” just to get prettier stack traces.</p>
<p>This was not all smooth sailing, though. While the removal of the
intermediate <code>GenFuture</code> is on its way to Rust <code>1.67</code>, the state it is in does
take one significant shortcut.
The Future gets a <code>&mut Context<'_></code> from the outside, but is treating that as
a <code>ResumeTy</code> internally. <code>ResumeTy</code> is an unsafe pointer wrapper around <code>Context</code>,
and its only purpose is to paper over some shortcomings in the type checker.
For now, this works in practice as <code>ResumeTy</code> and <code>&mut Context<'_></code> are really
<em>just pointers</em>, but the <code>cg_clif</code> codegen backend validates these types more
strictly and is complaining, rightfully so.</p>
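<p>For illustration, <code>ResumeTy</code> is essentially a thin, lifetime-erased pointer wrapper. The following is a simplified sketch of that shape, not the exact (perma-unstable) standard-library definition:</p>

```rust
use std::ptr::NonNull;
use std::task::Context;

// Simplified sketch of the internal `ResumeTy`: a lifetime-erased,
// non-null raw pointer to the `Context` passed in from the outside.
#[allow(dead_code)]
struct ResumeTy(NonNull<Context<'static>>);

fn main() {
    // Being a thin pointer wrapper, it is layout-compatible with a plain
    // pointer, which is why treating one as the other "works in practice".
    assert_eq!(
        std::mem::size_of::<ResumeTy>(),
        std::mem::size_of::<*mut Context<'static>>()
    );
}
```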
<p>An earlier attempt of mine to fix that and use <code>&'static mut Context<'_></code>
failed, as it caused a <code>higher-ranked lifetime error</code>. That error already existed,
and still exists for some weird cases, but my change really did break existing
code.</p>
<p>It was eventually reverted, but left <code>nightly</code> and even <code>beta</code> in an unusable
state for some people for some time. A lesson to myself here is to revert
things earlier if there is serious breakage.</p>
<p>Reverting that change, however, is still causing problems for <code>cg_clif</code>. I do have
a <a href="https://github.com/rust-lang/rust/pull/105977">PR up</a> that replaces this
<code>ResumeTy</code> later in the pipeline, so the type checker is still dealing with
<code>ResumeTy</code> for now. I believe there is already work underway in the type checker
to solve the underlying problems that made this workaround necessary.</p>
<h1 id="personal"><a class="anchor-link" href="#personal" aria-label="Anchor link for: personal">#</a>
Personal</h1>
<p>Well, probably everyone is feeling it by now that we have arrived in a real
cost-of-living crisis. I won’t go into details of who is responsible though.</p>
<p>I have wanted to buy real estate and relocate for some time, but that plan is
on ice for now, as four factors make this the worst time to commit to such an
investment:</p>
<ul>
<li>Financing options have been getting worse over the year to a point where
mortgages are not affordable at all.</li>
<li>At the same time, real estate prices are as inflated as ever, and there is not
yet a trend of them going down either.</li>
<li>Cost of living has exploded, so after living expenses there is less capital
left over to allocate to investments.</li>
<li>Salaries haven’t caught up yet with any of this.</li>
</ul>
<p>I fear that things will get worse still before they get better. It’s also a
sentiment shared by a lot of people.</p>
<p>All in all, this whole situation makes me super anxious, and it feels like the
world around me is going to shit.</p>
<h1 id="2023"><a class="anchor-link" href="#2023" aria-label="Anchor link for: 2023">#</a>
2023</h1>
<p>Well, let’s leave this year behind us and start fresh into the new one, shall we?</p>
<p>I’m not someone to have big plans and ambitions, but there is one thing I would
like to do this coming year.</p>
<p>Sentry has recently <a href="https://blog.sentry.io/2022/11/30/bringing-codecov-into-the-sentry-family-where-code-coverage-meets-application-monitoring/">acquired CodeCov</a>.
This is great news, as I am super passionate about code coverage as well.
After all, <a href="https://cov.rs/">cov.rs</a> redirects to my blog here.</p>
<p>I contributed <a href="https://github.com/rust-lang/rust/pull/79762">improvements to code coverage of doctests</a>
already two years ago, and was taking a look at <a href="https://github.com/rust-lang/rust/pull/90047">some preliminary work</a>
to eventually add proper <a href="https://github.com/rust-lang/rust/issues/79649">branch coverage</a>
support to the Rust compiler.</p>
<p>With this recent news, I hope I will be able to dedicate some <em>official</em> time
to this effort, along with taking care of any followup work from my changes
to <code>async</code> functions. I would also like to present the inner workings of
<code>async</code> functions and my work to improve them at a meetup and/or conference
sometime this year.</p>
<p>That is pretty much all I can think of right now. Everything else I will make up
as I go. :-)</p>
Improving async Rust codegen2022-11-18T00:00:00+00:002022-11-18T00:00:00+00:00
Unknown
https://swatinem.de/blog/improving-async-codegen/<p>Last week I was looking at the <a href="https://swatinem.de/blog/async-codegen/">implementation details of async</a>.
Specifically I was looking at two issues that make stack traces in async programs
confusing and hard to make sense of.</p>
<p>To recap, let’s take this snippet of Rust code:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">fn_with_nested_block</span><span>() </span><span style="color:#61676ccc;">-></span><span> Backtrace {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">None</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap_or_else</span><span>(|| async { Backtrace</span><span style="color:#ed9366;">::</span><span>force_capture() })
</span><span> </span><span style="color:#ed9366;">.</span><span>await
</span><span>}
</span></code></pre>
<p>When we run it with our async runtime of choice and print the stack trace,
we will get something like this on Linux:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> 0: async_codegen::fn_with_nested_block::{{closure}}::{{closure}}::{{closure}}
</span><span> at ./src/lib.rs:15:36
</span><span> 1: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 2: async_codegen::fn_with_nested_block::{{closure}}
</span><span> at ./src/lib.rs:16:9
</span><span> 3: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 4: async_codegen::tests::test_stack::{{closure}}
</span><span> at ./src/lib.rs:77:51
</span><span> 5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span></code></pre>
<p>This does not look very nice for two reasons:</p>
<ol>
<li>Every async fn in our stack trace adds a meaningless and noisy <code>GenFuture</code> in the middle.</li>
<li>For nested blocks, we end up with a ton of <code>::{{closure}}</code> that are confusing.</li>
</ol>
<p>I ended the post by saying “I will have a look”. Well, I did, and I have both
good and not-so-good news.</p>
<h1 id="symbol-mangling"><a class="anchor-link" href="#symbol-mangling" aria-label="Anchor link for: symbol-mangling">#</a>
Symbol Mangling</h1>
<p>It turns out that the problem with function names is a Linux/macOS issue related
to the way symbol mangling is done on those platforms.</p>
<p>This is not an issue on Windows, which uses a different way of representing
function names.</p>
<p>For the snippet above, I do get a much better output:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>async_codegen::fn_with_nested_block::async_fn$0::closure$0::async_block$0
</span></code></pre>
<p>For Linux, I figured out the code paths that generate the mangled names, and
have a <a href="https://github.com/rust-lang/rust/pull/104333">Draft PR</a> open that at
least gets the necessary information through to that place.</p>
<p>The <code>v0</code> symbol mangling scheme was first proposed in
<a href="https://github.com/rust-lang/rfcs/pull/2603">RFC 2603</a> in 2018, and
implemented in early 2019. The <a href="https://github.com/rust-lang/rust/pull/89917">PR to make it the default</a>
has been sitting there since end of 2021, though there seems to be a little
progress.</p>
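<p>As an aside, individual projects can already opt into the v0 scheme themselves via the <code>-C symbol-mangling-version</code> rustc flag (stabilized in Rust 1.59, if I remember correctly). A sketch of how that could look in a project’s <code>.cargo/config.toml</code>:</p>

```toml
# Opt this project into the v0 symbol mangling scheme.
[build]
rustflags = ["-C", "symbol-mangling-version=v0"]
```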
<p>The problem here is that Rust symbol mangling is larger than just the Rust project.
It needs to be understood by gdb (and other GNU tools), lldb (and other LLVM tools),
as well as a wide variety of profiling and binary instrumentation tools.</p>
<p>There were a couple of PRs over the years to amend the format, which are linked
from the PR above, so making changes is possible.
Though without actually having looked at those, I imagine the process to be
rather tedious and slow. Not something I’m too excited about. But I will keep
looking at it.</p>
<h1 id="getting-rid-of-genfuture"><a class="anchor-link" href="#getting-rid-of-genfuture" aria-label="Anchor link for: getting-rid-of-genfuture">#</a>
Getting rid of <code>GenFuture</code></h1>
<p>What does get me more excited though is getting rid of <code>GenFuture</code>.</p>
<p>I managed to hack together a proof of concept in about two days and have a
<a href="https://github.com/rust-lang/rust/pull/104321">Draft PR</a> open.</p>
<p>A compiler built with my PR does achieve my original goal of getting rid of the
superfluous <code>GenFuture<T></code> frames in my stack traces. Here is the relevant snippet for the code
above, on Windows:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> 3: async_codegen::fn_with_nested_block::async_fn$0::closure$0::async_block$0
</span><span> at .\src\lib.rs:14
</span><span> 4: async_codegen::fn_with_nested_block::async_fn$0
</span><span> at .\src\lib.rs:15
</span><span> 5: async_codegen::tests::test_stack::async_block$0
</span><span> at .\src\lib.rs:28
</span></code></pre>
<p>That looks very clean now. I have also verified this on
<a href="https://github.com/getsentry/symbolicator">symbolicator</a> which is a huge async
heavy codebase. It builds, passes tests, and profiling it with
<a href="https://github.com/mstange/samply">samply</a> yields much better stack traces than
before.</p>
<p>The extensive Rust test suite also revealed some unexpected improvements:</p>
<p>Diagnostic spans now point to the <em>whole</em> async block, not only the <em>block</em> after
the async keyword:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Before:
</span><span>LL | async { *x }
</span><span> | ^^--^^
</span><span> | | |
</span><span> | | `x` is borrowed here
</span><span> | may outlive borrowed value `x`
</span><span>After:
</span><span>LL | async { *x }
</span><span> | ^^^^^^^^--^^
</span><span> | | |
</span><span> | | `x` is borrowed here
</span><span> | may outlive borrowed value `x`
</span><span>
</span><span>OR:
</span><span>Before:
</span><span>LL | let send_fut = async {
</span><span> | __________________________^
</span><span>After:
</span><span>LL | let send_fut = async {
</span><span> | ____________________^
</span></code></pre>
<p>I had to do quite some work chasing down various diagnostics that had subtle
changes. Most of those had some special handling for async constructs that was
no longer compatible after my changes.</p>
<p>Though there are still some regressions to track down.</p>
<p>For one, async blocks are now trivially <code>const</code>. They are <em>just data</em> after all.
While this is really an improvement, it is an unexpected improvement, as they
are supposed to be behind a <a href="https://github.com/rust-lang/rust/issues/85368"><code>const_async_blocks</code> feature</a>.</p>
<p>At the time that check is performed, the async function as such does not exist
anymore, so I will have to figure out a different way to implement this check.</p>
<h1 id="what-an-async-fn-captures"><a class="anchor-link" href="#what-an-async-fn-captures" aria-label="Anchor link for: what-an-async-fn-captures">#</a>
What an async fn <code>Captures</code></h1>
<p>This leaves me with another failure that is very hard to track down, and now
that I am trying to write it down, I get more confused by the minute.</p>
<p>Consider this code, which comes directly from the
<a href="https://github.com/rust-lang/rust/blob/83356b78c4ff3e7d84e977aa6143793545967301/src/test/ui/self/self_lifetime-async.rs">Rust’s test suite</a>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Foo</span><span><</span><span style="color:#fa6e32;">'a</span><span>>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'a </span><span>())</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">Alias </span><span style="color:#ed9366;">= </span><span>Foo<</span><span style="color:#fa6e32;">'static</span><span>></span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Alias </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">using_alias</span><span><</span><span style="color:#fa6e32;">'a</span><span>>(</span><span style="color:#ff8f40;">self</span><span>: </span><span style="color:#ed9366;">&</span><span>Alias, </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'a </span><span>()) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span>() {
</span><span> arg
</span><span> }
</span><span>}
</span></code></pre>
<p>This typechecks just fine with Rust stable (<code>1.65</code>), but fails with my PR:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0700]: hidden type for `impl Future<Output = &'a ()>` captures lifetime that does not appear in bounds
</span><span> --> playground\async-codegen\src\lib.rs:22:68
</span><span> |
</span><span>22 | pub async fn using_alias<'a>(self: &Alias, arg: &'a ()) -> &() {
</span><span> | ____________________________________________________________________^
</span><span>23 | | arg
</span><span>24 | | }
</span><span> | |_____^
</span><span> |
</span><span> = note: hidden type `impl Future<Output = &'a ()>` captures lifetime '_#17r
</span></code></pre>
<p>However, if I manually “inline” the type alias like so:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Foo</span><span><</span><span style="color:#fa6e32;">'static</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">using_self</span><span><</span><span style="color:#fa6e32;">'a</span><span>>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'a </span><span>()) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span>() {
</span><span> arg
</span><span> }
</span><span>}
</span></code></pre>
<p>Things are already failing on stable Rust:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error: lifetime may not live long enough
</span><span> --> playground\async-codegen\src\lib.rs:29:9
</span><span> |
</span><span>28 | pub async fn using_self<'a>(&self, arg: &'a ()) -> &() {
</span><span> | -- - let's call the lifetime of this reference `'1`
</span><span> | |
</span><span> | lifetime `'a` defined here
</span><span>29 | arg
</span><span> | ^^^ associated function was supposed to return data with lifetime `'1` but it is returning
</span><span>data with lifetime `'a`
</span></code></pre>
<p>Am I completely misunderstanding how type aliases are supposed to work? Are they
not interchangeable with the type they are aliasing?</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">alias</span><span>(</span><span style="color:#ff8f40;">a</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>Alias) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span>Foo<</span><span style="color:#fa6e32;">'static</span><span>> {
</span><span> a
</span><span>}
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">call_alias</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> a</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>Foo<</span><span style="color:#fa6e32;">'static</span><span>> </span><span style="color:#ed9366;">= &</span><span>Foo(</span><span style="color:#ed9366;">&</span><span>())</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">alias</span><span>(a)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>This snippet of code suggests so, right?</p>
<p>Am I so out of touch with reality by now?</p>
<hr />
<p>Let’s take a different approach and try desugaring the async fn. As a reminder,
the recent <a href="https://blog.rust-lang.org/inside-rust/2022/11/17/async-fn-in-trait-nightly.html#recap-how-asyncawait-works-in-rust">async fn in trait</a>
blog post showed this desugaring as well:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Alias </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">desugared_using_alias</span><span><</span><span style="color:#fa6e32;">'a</span><span>>(</span><span style="color:#ff8f40;">self</span><span>: </span><span style="color:#ed9366;">&</span><span>Alias, </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'a </span><span>()) </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = </span><span style="color:#ed9366;">&</span><span>()> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#61676ccc;">;
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#61676ccc;">;
</span><span> arg
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>You might wonder, what am I doing with that weird <code>_self</code> parameter?</p>
<p>That is a way to explicitly capture that parameter. This is the main difference
between functions and closures. Closures only capture what they <em>need</em>, whereas
functions capture all the arguments, and drop them in a very specific order.</p>
<p>Trying to compile that code gives me my good friend <code>E0700</code> again:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0700]: hidden type for `impl Future<Output = &'a ()>` captures lifetime that does not appear in bounds
</span><span> --> playground\async-codegen\src\lib.rs:31:9
</span><span> |
</span><span>29 | pub fn desugared_using_alias<'a>(self: &Alias, arg: &'a ()) -> impl Future<Output = &()> {
</span><span> | ------ hidden type `impl Future<Output = &'a ()>` captures the anonymous lifetime defined here
</span><span>30 | let _self = self;
</span><span>31 | / async move {
</span><span>32 | | let _self = _self;
</span><span>33 | | arg
</span><span>34 | | }
</span><span> | |_________^
</span><span> |
</span><span>help: to declare that the `impl Trait` captures `'_`, you can add an explicit `'_` lifetime bound
</span><span> |
</span><span>29 | pub fn desugared_using_alias<'a>(self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ { | ++++
</span></code></pre>
<p>And there is a suggestion. What if we apply it?</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Alias </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">desugared_using_alias</span><span><</span><span style="color:#fa6e32;">'a</span><span>>(</span><span style="color:#ff8f40;">self</span><span>: </span><span style="color:#ed9366;">&</span><span>Alias, </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'a </span><span>()) </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = </span><span style="color:#ed9366;">&</span><span>()> </span><span style="color:#ed9366;">+ '_ </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#61676ccc;">;
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#61676ccc;">;
</span><span> arg
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>… and compile again:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error: lifetime may not live long enough
</span><span> --> playground\async-codegen\src\lib.rs:34:9
</span><span> |
</span><span>29 | pub fn desugared_using_alias<'a>(
</span><span> | -- lifetime `'a` defined here
</span><span>30 | self: &Alias,
</span><span> | - let's call the lifetime of this reference `'1`
</span><span>...
</span><span>34 | / async move {
</span><span>35 | | let _self = _self;
</span><span>36 | | arg
</span><span>37 | | }
</span><span> | |_________^ associated function was supposed to return data with lifetime `'a` but it is returning
</span><span>data with lifetime `'1`
</span></code></pre>
<p>Uff, that is not very helpful either.</p>
<hr />
<p>Interestingly enough, while I was experimenting, I had a slightly different
snippet of code before, taking <code>_self</code> instead of <code>self</code>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Alias </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">desugared_using_alias</span><span><</span><span style="color:#fa6e32;">'a</span><span>>(</span><span style="color:#ff8f40;">_self</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>Alias, </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'a </span><span>()) </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = </span><span style="color:#ed9366;">&</span><span>()> </span><span style="color:#ed9366;">+ '_ </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#61676ccc;">;
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#61676ccc;">;
</span><span> arg
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>This surprisingly makes a huge difference in diagnostics:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0106]: missing lifetime specifiers
</span><span> --> playground\async-codegen\src\lib.rs:29:90
</span><span> |
</span><span>29 | pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
</span><span> | ------ ------ ^ ^^ expected named lifetime parameter
</span><span> | |
</span><span> | expected named lifetime parameter
</span><span> |
</span><span> = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `_self` or `arg`
</span><span>help: consider using the `'a` lifetime
</span><span> |
</span><span>29 | pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &'a ()> +'a {
</span><span> | ++ ~~
</span><span>
</span><span>error[E0621]: explicit lifetime required in the type of `_self`
</span><span> --> playground\async-codegen\src\lib.rs:31:9
</span><span> |
</span><span>29 | pub fn desugared_using_alias<'a>(_self: &Alias, arg: &'a ()) -> impl Future<Output = &()> + '_ {
</span><span> | ------ help: add explicit lifetime `'a` to the type of `_self`: `&'a Foo<'static>`
</span><span>30 | let _self = _self;
</span><span>31 | / async move {
</span><span>32 | | let _self = _self;
</span><span>33 | | arg
</span><span>34 | | }
</span><span> | |_________^ lifetime `'a` required
</span></code></pre>
<p>Now it is giving different errors and different suggestions, namely to just use
<code>'a</code> everywhere.
And, to my surprise, even the diagnostics will just inline <code>Alias</code> as <code>Foo<'static></code>.</p>
<hr />
<p>Circling back to our original code with <code>self</code>, and applying these suggestions
to just use <code>'a</code> everywhere does solve the problem and the code finally compiles,
but it is not entirely correct, as now both <code>self</code> and <code>arg</code> are tied to the same lifetime.</p>
<p>We can demonstrate this with another snippet:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">use_lifetimes</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self</span><span style="color:#61676ccc;">:</span><span> Alias </span><span style="color:#ed9366;">=</span><span> Foo(</span><span style="color:#ed9366;">&</span><span>())</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> arg </span><span style="color:#ed9366;">= </span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> arg_ref </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">desugared_using_alias</span><span>(</span><span style="color:#ed9366;">&</span><span>arg)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">drop</span><span>(_self)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{arg_ref:?}</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0505]: cannot move out of `_self` because it is borrowed
</span><span> --> playground\async-codegen\src\lib.rs:42:10
</span><span> |
</span><span>41 | let arg_ref = _self.desugared_using_alias(&arg).await;
</span><span> | --------------------------------- borrow of `_self` occurs here
</span><span>42 | drop(_self);
</span><span> | ^^^^^ move out of `_self` occurs here
</span><span>43 | println!("{arg_ref:?}");
</span><span> | ------- borrow later used here
</span></code></pre>
<p>The diagnostics now say that <code>_self</code> is tied to <code>arg_ref</code>, which is exactly what
we declared above by giving both the same lifetime, but did not really intend. So how can we fix that?</p>
<p>By introducing separate lifetimes, and adding an explicit lifetime bound after
the compiler told us to:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">Alias </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">desugared_using_alias</span><span><</span><span style="color:#fa6e32;">'arg</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">'slf</span><span>, </span><span style="color:#fa6e32;">'slf</span><span>>(
</span><span> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'slf </span><span style="color:#ff8f40;">self</span><span>,
</span><span> </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'arg </span><span>(),
</span><span> ) </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'arg </span><span>()> </span><span style="color:#ed9366;">+ </span><span style="color:#fa6e32;">'slf </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#61676ccc;">;
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#61676ccc;">;
</span><span> arg
</span><span> }
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">use_lifetimes</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> _self</span><span style="color:#61676ccc;">:</span><span> Alias </span><span style="color:#ed9366;">=</span><span> Foo(</span><span style="color:#ed9366;">&</span><span>())</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> arg </span><span style="color:#ed9366;">= </span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> arg_ref </span><span style="color:#ed9366;">=</span><span> _self</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">desugared_using_alias</span><span>(</span><span style="color:#ed9366;">&</span><span>arg)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">drop</span><span>(_self)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{arg_ref:?}</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<hr />
<p>So where am I going with all this? Lifetimes are hard!</p>
<p>Either way, this was a very long post already, and there are still some things left to solve.</p>
<p>I hope I could at least raise some excitement about the improvements I’m trying
to make. Having cleaner and more readable stack traces is definitely a win.</p>
<p>I also anticipate that there will be smaller wins elsewhere. Less code for the
compiler to inline and optimize away, less debuginfo to generate. It could
potentially reduce compile times, output binary sizes, and even improve the
runtime performance of the generated code. I haven’t measured that effect yet,
and the Rust performance test suite has not run on my PR either.</p>
Implementation Details of async Rust2022-11-09T00:00:00+00:002022-11-09T00:00:00+00:00
Unknown
https://swatinem.de/blog/async-codegen/<p>I have been looking at a lot of Rust async stack traces lately.
This was mostly related to profiling some heavily async code locally, as well as
profiling some production systems in the cloud, test-driving Sentry’s new profiling support for Rust.</p>
<p>We don’t need to go all that big and fancy; we can observe the problem already
with a tiny example. Now that <code>Backtrace</code> is finally stable, we can capture one
directly in stable Rust today without any external dependencies, though I do
need to pull in an async executor.</p>
<p>I could just reuse my <code>ready_or_diverge</code> noop executor I <a href="https://swatinem.de/blog/zero-cost-async/">used previously</a>,
but I settled on <code>tokio</code> instead.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>std</span><span style="color:#ed9366;">::</span><span>backtrace</span><span style="color:#ed9366;">::</span><span>Backtrace</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">a</span><span>(</span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>) </span><span style="color:#61676ccc;">-></span><span> Backtrace {
</span><span> </span><span style="color:#fa6e32;">let</span><span> bt </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">b</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> _arg </span><span style="color:#ed9366;">=</span><span> arg</span><span style="color:#61676ccc;">;
</span><span> bt
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">b</span><span>() </span><span style="color:#61676ccc;">-></span><span> Backtrace {
</span><span> Backtrace</span><span style="color:#ed9366;">::</span><span>force_capture()
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">cfg</span><span>(test)]
</span><span style="color:#fa6e32;">mod </span><span style="color:#399ee6;">tests </span><span>{
</span><span> </span><span style="color:#fa6e32;">use super</span><span style="color:#ed9366;">::*</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">tokio</span><span>::</span><span style="color:#f29718;">test</span><span>]
</span><span> async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">test_stack</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> backtrace </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">a</span><span>(</span><span style="color:#ff8f40;">0</span><span>)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> backtrace)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span></code></pre>
<p>So what kind of stack trace does this produce? A <strong>humongous</strong> one!
Most of that is thread setup, <code>#[test]</code> infrastructure, and the tokio runtime
scheduler. Closer to the top we will find the async functions we actually want to look at:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> 4: async_codegen::b::{{closure}}
</span><span> at ./src/lib.rs:10:5
</span><span> 5: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 6: async_codegen::a::{{closure}}
</span><span> at ./src/lib.rs:4:17
</span><span> 7: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 8: async_codegen::tests::test_stack::{{closure}}
</span><span> at ./src/lib.rs:19:29
</span><span> 9: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 10: <core::pin::Pin<P> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/future.rs:124:9
</span></code></pre>
<p>Every second frame is the same, a <code>from_generator::GenFuture<T></code>, which is not really that helpful.</p>
<p>There is a <a href="https://github.com/rust-lang/rust/issues/74779">Rust issue</a> about this.
A second <a href="https://github.com/rust-lang/rust/issues/65978#issuecomment-1289334054">related issue</a>
suggests to use <code>RUSTFLAGS="-Csymbol-mangling-version=v0"</code> to improve that stack
trace a little, so let’s try that.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> 4: async_codegen::b::{closure#0}
</span><span> at ./src/lib.rs:10:5
</span><span> 5: <core::future::from_generator::GenFuture<async_codegen::b::{closure#0}> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 6: async_codegen::a::{closure#0}
</span><span> at ./src/lib.rs:4:17
</span><span> 7: <core::future::from_generator::GenFuture<async_codegen::a::{closure#0}> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 8: async_codegen::tests::test_stack::{closure#0}
</span><span> at ./src/lib.rs:19:29
</span><span> 9: <core::future::from_generator::GenFuture<async_codegen::tests::test_stack::{closure#0}> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/mod.rs:91:19
</span><span> 10: <core::pin::Pin<&mut core::future::from_generator::GenFuture<async_codegen::tests::test_stack::{closure#0}>> as core::future::future::Future>::poll
</span><span> at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/future/future.rs:124:9
</span></code></pre>
<p>There is a lot more detail, that’s for sure. We can now see the generic argument to <code>GenFuture</code>.
However, all that detail is redundant and not meaningful.</p>
<p>And what are all those <code>{closure#0}</code> things?</p>
<hr />
<p>We will start this journey by looking at what this <code>GenFuture</code> is. We find it
<a href="https://github.com/rust-lang/rust/blob/4603ac31b0655793a82f110f544dc1c6abc57bb7/library/core/src/future/mod.rs#L64">here in the <code>core</code> crate</a>.</p>
<p>Its definition is quite simple, as is its <code>impl Future</code>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">GenFuture</span><span><T</span><span style="color:#61676ccc;">: </span><span>Generator<ResumeTy, Yield = ()>>(T)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><T</span><span style="color:#61676ccc;">: </span><span>Generator<ResumeTy, Yield = ()>> Future </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">GenFuture</span><span><T> {
</span><span> </span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">Output </span><span style="color:#ed9366;">= </span><span>T</span><span style="color:#ed9366;">::</span><span>Return</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">poll</span><span>(</span><span style="color:#ff8f40;">self</span><span>: Pin<</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut Self</span><span>>, </span><span style="color:#ff8f40;">cx</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span>Context<'</span><span style="color:#ed9366;">_</span><span>>) </span><span style="color:#61676ccc;">-> </span><span>Poll<</span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span>Output> {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// SAFETY: Safe because we're !Unpin + !Drop, and this is just a field projection.
</span><span> </span><span style="color:#fa6e32;">let</span><span> gen </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ Pin</span><span style="color:#ed9366;">::</span><span>map_unchecked_mut(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#61676ccc;">, </span><span>|</span><span style="color:#ff8f40;">s</span><span>| </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> s</span><span style="color:#ed9366;">.</span><span style="color:#ff8f40;">0</span><span>) }</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// Resume the generator, turning the `&mut Context` into a `NonNull` raw pointer. The
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// `.await` lowering will safely cast that back to a `&mut Context`.
</span><span> </span><span style="color:#fa6e32;">match</span><span> gen</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">resume</span><span>(ResumeTy(NonNull</span><span style="color:#ed9366;">::</span><span>from(cx)</span><span style="color:#ed9366;">.</span><span>cast</span><span style="color:#ed9366;">::</span><span><Context<</span><span style="color:#fa6e32;">'static</span><span>>>())) {
</span><span> GeneratorState</span><span style="color:#ed9366;">::</span><span>Yielded(()) </span><span style="color:#ed9366;">=> </span><span>Poll</span><span style="color:#ed9366;">::</span><span>Pending</span><span style="color:#61676ccc;">,
</span><span> GeneratorState</span><span style="color:#ed9366;">::</span><span>Complete(x) </span><span style="color:#ed9366;">=> </span><span>Poll</span><span style="color:#ed9366;">::</span><span>Ready(x)</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>There are also a few helpers there, but we can look at those later.</p>
<p>What this, and the surrounding <code>from_generator</code> fn, tells us is that async
functions are based on generators internally.</p>
<p>What is a <code>Generator</code> then?</p>
<p>Generators are an unstable Rust feature that is documented in the
<a href="https://doc.rust-lang.org/unstable-book/language-features/generators.html">unstable book</a>.</p>
<p>Let’s look at an abbreviated definition. The complete docs for the trait are
<a href="https://doc.rust-lang.org/core/ops/trait.Generator.html">here</a>.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub trait </span><span style="color:#399ee6;">Generator</span><span><R = ()> {
</span><span> </span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">Yield</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">Return</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">resume</span><span>(
</span><span> </span><span style="color:#ff8f40;">self</span><span>: Pin<</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut Self</span><span>>,
</span><span> </span><span style="color:#ff8f40;">arg</span><span style="color:#61676ccc;">:</span><span> R
</span><span> ) </span><span style="color:#61676ccc;">-> </span><span>GeneratorState<</span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span>Yield, </span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span>Return></span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">GeneratorState</span><span><Y, R> {
</span><span> Yielded(Y)</span><span style="color:#61676ccc;">,
</span><span> Complete(R)</span><span style="color:#61676ccc;">,
</span><span>}
</span></code></pre>
<p>This is indeed very similar to futures, hence async functions are built on them.</p>
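<p>To make that correspondence concrete, here is a hand-written sketch of the kind of state machine the compiler generates for a simple <code>async fn</code>. All the type names, states, and the <code>async fn a(arg: u32)</code>/<code>b()</code> shape are made up for illustration; the real lowering goes through the generator machinery shown above.</p>

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A future that is immediately ready, standing in for `async fn b()`.
struct B;

impl Future for B {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(42)
    }
}

// A hand-written state machine resembling what the compiler generates
// for `async fn a(arg: u32) -> u32 { arg + b().await }`.
enum A {
    // Before the first poll: only the argument is live.
    Start(u32),
    // Suspended at the `.await`: the argument and the inner future are live.
    AwaitingB(u32, B),
    Done,
}

impl Future for A {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // Everything in this sketch is `Unpin`, so we can freely access the state.
        let this = self.get_mut();
        loop {
            match this {
                A::Start(arg) => {
                    let arg = *arg;
                    *this = A::AwaitingB(arg, B);
                }
                A::AwaitingB(arg, b) => match Pin::new(b).poll(cx) {
                    Poll::Ready(v) => {
                        let res = *arg + v;
                        *this = A::Done;
                        return Poll::Ready(res);
                    }
                    Poll::Pending => return Poll::Pending,
                },
                A::Done => panic!("future polled after completion"),
            }
        }
    }
}

// A minimal no-op waker so we can poll by hand, without a real executor.
fn noop_waker() -> Waker {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE),
        |_| {},
        |_| {},
        |_| {},
    );
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = A::Start(1);
    assert_eq!(Pin::new(&mut fut).poll(&mut cx), Poll::Ready(43));
}
```

<p>The <code>loop</code>/<code>match</code> over states is exactly what <code>resume</code> does for a generator: run until either a <code>yield</code> (here: returning <code>Poll::Pending</code>) or completion.</p>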
<hr />
<p>But how are async functions turned into generators? That is done in the compiler code
when transforming the AST (abstract syntax tree) of your Rust program into the
HIR (high-level intermediate representation).</p>
<p>The <a href="https://github.com/rust-lang/rust/blob/1286ee23e4e2dec8c1696d3d76c6b26d97bbcf82/compiler/rustc_ast_lowering/src/expr.rs#L566"><code>make_async_expr</code></a> function is responsible for turning an <code>async {}</code> block into code
similar to <code>std::future::from_generator(<generator>)</code>.
Immediately below is
<a href="https://github.com/rust-lang/rust/blob/1286ee23e4e2dec8c1696d3d76c6b26d97bbcf82/compiler/rustc_ast_lowering/src/expr.rs#L665"><code>lower_expr_await</code></a>.
That function turns an <code>await</code> into a loop that will <code>poll</code> the underlying future
and <code>yield</code> when it is <code>Poll::Pending</code>.</p>
<p>I would advise you to take a look at those functions. They are well documented
and quite understandable, even if you are not a compiler expert.</p>
<p>So now we know where exactly our <code>GenFuture</code> stack frames are coming from.</p>
<p>There is one missing piece though. Why do we have <code>{closure#0}</code> all over the place?</p>
<p>In a different part of the AST to HIR lowering step we will find the
<a href="https://github.com/rust-lang/rust/blob/75c239402c8fafc89246a26bd066d6ff647e3794/compiler/rustc_ast_lowering/src/item.rs#L1062"><code>lower_maybe_async_body</code></a> fn.</p>
<p>Its job is to transform an <code>async fn foo() {}</code> into a <code>fn foo() -> impl Future { async {} }</code>.
This function also calls into <code>make_async_expr</code> mentioned above, which then
further turns that <code>async {}</code> block into our generator. That generator is just
a special kind of closure internally in the compiler.</p>
<h1 id="can-we-do-better"><a class="anchor-link" href="#can-we-do-better" aria-label="Anchor link for: can-we-do-better">#</a>
Can we do better?</h1>
<p>Well that is the remaining question now. Is it possible to remove these
confusing and distracting stack frames? Is <code>GenFuture</code> really necessary?
The compiler turns our <code>async {}</code> block into an <code>impl Generator</code> by some magic,
and this <code>Generator</code> trait is <em>extremely</em> similar to the <code>Future</code> trait.
Can’t the compiler just, well… create an <code>impl Future</code> by that same magic somehow?</p>
<p>And what about this <code>{closure#0}</code>? Here, even though it is a bit ugly, I do
agree that the function that returns the lazy future is distinct from the
<em>actual</em> future body. I have blogged before about how this can be confusing and
<a href="https://swatinem.de/blog/non-lazy-futures/">even dangerous</a> sometimes.
You can yourself create a non-lazy future that does some real work on <em>call</em>
rather than lazily on <code>poll</code>. The <em>call</em> has a “normal” fn name, and the <code>poll</code> has this
weird <code>{closure#0}</code> appended at the end.</p>
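<p>A minimal sketch of that call-vs-poll split (all names here are mine, not from any real codebase): the statements before the <code>async</code> block run when the function is <em>called</em>, while the block body only runs once the returned future is polled.</p>

```rust
use std::future::Future;
use std::sync::atomic::{AtomicU32, Ordering};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

static STEPS: AtomicU32 = AtomicU32::new(0);

// The statement before the `async` block runs on *call* (frame name:
// `make_future`); the block body runs on *poll* (frame name:
// `make_future::{closure#0}` in stack traces).
fn make_future() -> impl Future<Output = u32> {
    STEPS.fetch_add(1, Ordering::SeqCst); // eager: runs when called
    async {
        STEPS.fetch_add(1, Ordering::SeqCst); // lazy: runs when polled
        42
    }
}

// A minimal no-op waker so we can poll by hand, without an executor.
fn noop_waker() -> Waker {
    const VTABLE: RawWakerVTable = RawWakerVTable::new(
        |_| RawWaker::new(std::ptr::null(), &VTABLE),
        |_| {},
        |_| {},
        |_| {},
    );
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let fut = make_future();
    // Only the eager part has run so far.
    assert_eq!(STEPS.load(Ordering::SeqCst), 1);

    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(42));
    // Now the lazy body has run as well.
    assert_eq!(STEPS.load(Ordering::SeqCst), 2);
}
```

<p>In a stack trace taken during the first <code>fetch_add</code> you would see <code>make_future</code>; during the second, <code>make_future::{closure#0}</code> instead.</p>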
<p>Things can get even more complex if you add more explicit, or implicit closures
into the mix. Consider this snippet for example:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">do_tasks</span><span>(tasks</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[</span><span style="color:#fa6e32;">u32</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#fa6e32;">u32</span><span>> {
</span><span>    futures</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>join_all(tasks</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iter</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">num</span><span>| async </span><span style="color:#fa6e32;">move </span><span>{ </span><span style="color:#ed9366;">*</span><span>num }))</span><span style="color:#ed9366;">.</span><span>await
</span><span>}
</span></code></pre>
<p>That is one implicit closure for the outer <code>async fn</code>, one explicit closure for
the <code>map</code>, and a third implicit one for the <code>async</code> block. So this will show up as
<code>do_tasks::{closure#0}::{closure#0}::{closure#0}</code> in your stack trace.
Not particularly great, but it also reflects the reality when you peel away
the abstractions.</p>
<hr />
<p>So again, can we do any better? I’m actually intrigued to find out, and I will
spend some weekend coding time to dig deeper into how the compiler magic creates
<code>impl Generator</code> internally.</p>
<p>Similarly, it should be possible somehow to distinguish between <em>real</em> closures
and async constructs in the stack trace.
<code>do_tasks::{async-fn#0}::{closure#0}::{async-block#0}</code> does look a little nicer.</p>
Rustdoc doctests need fixing2022-10-28T00:00:00+00:002022-10-28T00:00:00+00:00
Unknown
https://swatinem.de/blog/fix-rustdoc/<p>Before going on a slight rant about why rustdoc / doctests are broken, I first
want to highlight that <strong>rustdoc / doctests are amazing !!!</strong></p>
<p>I believe that great documentation and great tooling are major contributors to
Rust’s success. And one part of that is rustdoc, and doctests.</p>
<p>The fact that you can write documentation and examples, and have those at the same
time be part of your testsuite is an extreme productivity booster on the one hand,
and equally valuable for potential library users on the other.
What makes this even better is the fact that your documentation and examples will
never go out of date because they are an integrated part of your testsuite.</p>
<h1 id="whats-wrong"><a class="anchor-link" href="#whats-wrong" aria-label="Anchor link for: whats-wrong">#</a>
What’s wrong?</h1>
<p>But if we look behind the curtain, we can see that one of the greatest features
of the Rust ecosystem does not look as pretty on the inside. Let us explore
some of the more gruesome sides of it. Maybe you will have the impression that
things are barely being held together with doc-tape, pun intended.</p>
<h2 id="the-compilation-model"><a class="anchor-link" href="#the-compilation-model" aria-label="Anchor link for: the-compilation-model">#</a>
The compilation model</h2>
<p>So how do rustdoc doctests work internally?</p>
<p>Rustdoc integrates tightly with the rust compiler, and as a first step it will
invoke the rust compiler in a limited capacity. Just enough to resolve <code>#[cfg]</code>
attributes and know which items there are and what you are <code>use</code>-ing.</p>
<p>Fun fact: Triple-slash comments are just syntactic sugar for <code>#[doc = "..."]</code>
attributes. Also, did you know that you can combine that with <code>cfg_attr</code> too?</p>
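<p>For example, all three attachment styles below produce documentation, and the <code>cfg_attr</code> variant only does so when the predicate holds (the item itself is just a placeholder I made up):</p>

```rust
/// A `///` comment is sugar for the attribute form below.
#[doc = "This line is attached via the explicit `#[doc]` attribute."]
#[cfg_attr(debug_assertions, doc = "And this line only exists in debug builds.")]
pub fn placeholder(x: u32) -> u32 {
    x + 1
}

fn main() {
    // The doc attributes only affect documentation, not behavior.
    assert_eq!(placeholder(41), 42);
}
```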
<p>Anyway. Now that rustc has resolved all the attributes, and rustdoc has collected
all the items it needs to document with their desugared doc attributes, it will
then collect individual doctests.</p>
<p>Then, it will do a <em>purely textual</em> transformation to create a small <code>main</code>
program for each of the doctests.</p>
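<p>So a doctest whose body is just <code>assert_eq!(1 + 1, 2);</code> ends up as a standalone program roughly like the following. This is a simplified sketch from memory; the real wrapper also injects an <code>extern crate</code> for the documented crate, various lint <code>allow</code>s, and handles the case where the snippet already defines <code>fn main</code>.</p>

```rust
// Roughly what rustdoc's textual transformation produces for the
// doctest body `assert_eq!(1 + 1, 2);` (simplified sketch).
#[allow(unused)]
fn main() {
    assert_eq!(1 + 1, 2);
}
```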
<p>Next, each of these snippets will be compiled <em>individually</em> via separate <code>rustc</code>
invocations. Some secret environment variables are provided to <code>rustc</code> to try
to re-map line numbers as well as possible, though there are bugs.</p>
<p>Finally, the resulting executable will then be run, obviously, and deleted
immediately afterwards. Unless you pass the unstable <code>--persist-doctests</code> option.</p>
<p>This is not ideal.</p>
<p>People often criticize Rust for its slow compile times. Clearly those people have
never run <code>webpack</code> or the clang static analyzer in cross-translation-unit mode.</p>
<p>But the problem still stands. Rustdoc will compile <strong>and link</strong> each doctest as
an individual executable.</p>
<p>Cargo itself has a similar, but less severe problem as it will compile and link
individual executables for each integration test. Hence it is common knowledge
that you should <a href="https://matklad.github.io/2021/02/27/delete-cargo-integration-tests.html">delete (all but one) cargo integration tests</a>.
I have read previously that some bigger projects even have a “no doctests” policy,
though I can’t seem to find a linkable blog post for that. But the reason mentioned
there was also the unreasonable blowup in compilation and linking times.</p>
<h2 id="workspaces-files-and-line-numbers"><a class="anchor-link" href="#workspaces-files-and-line-numbers" aria-label="Anchor link for: workspaces-files-and-line-numbers">#</a>
Workspaces, files and line numbers</h2>
<p>To further highlight some of the problems with doctests,
I will use the following example workspace with three crates:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">// # crate-a/src/lib.rs:
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">//! Crate A
</span><span style="font-style:italic;color:#abb0b6;">//!
</span><span style="font-style:italic;color:#abb0b6;">//! Some random docs
</span><span style="font-style:italic;color:#abb0b6;">//!
</span><span style="font-style:italic;color:#abb0b6;">//! ```
</span><span style="font-style:italic;color:#abb0b6;">//! assert_eq!("a" "b");
</span><span style="font-style:italic;color:#abb0b6;">//! // ^ crate-a line 6, and yes the typo is intentional ;-)
</span><span style="font-style:italic;color:#abb0b6;">//! ```
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// # crate-b/src/lib.rs:
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">//! Crate B
</span><span style="font-style:italic;color:#abb0b6;">//!
</span><span style="font-style:italic;color:#abb0b6;">//! # Examples
</span><span style="font-style:italic;color:#abb0b6;">//!
</span><span style="font-style:italic;color:#abb0b6;">//! ```
</span><span style="font-style:italic;color:#abb0b6;">//! assert_eq!(1, 2);
</span><span style="font-style:italic;color:#abb0b6;">//! // ^ crate-b line 6
</span><span style="font-style:italic;color:#abb0b6;">//! ```
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// # crate-c/src/lib.rs:
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">/// Says hellew
</span><span style="font-style:italic;color:#abb0b6;">///
</span><span style="font-style:italic;color:#abb0b6;">/// # Examples
</span><span style="font-style:italic;color:#abb0b6;">///
</span><span style="font-style:italic;color:#abb0b6;">/// ```
</span><span style="font-style:italic;color:#abb0b6;">/// crate_c::hellew();
</span><span style="font-style:italic;color:#abb0b6;">/// ```
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">hellew</span><span>() {
</span><span> ( </span><span style="font-style:italic;color:#abb0b6;">// <- intentional typo
</span><span>}
</span></code></pre>
<p>The examples I chose all have different kinds of errors in them; let’s see them
in action.</p>
<p>First, <code>crate-c</code> has a typo in its Rust source:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> cargo test --doc -p doctest-c
</span><span> Compiling doctest-c v0.1.0 (/home/swatinem/Coding/swatinem.de/playground/doctest-c)
</span><span>error: mismatched closing delimiter: `}`
</span><span> --> playground/doctest-c/src/lib.rs:9:5
</span><span> |
</span><span>8 | pub fn hellew() {
</span><span> | - closing delimiter possibly meant for this
</span><span>9 | ( // <- intentional typo
</span><span> | ^ unclosed delimiter
</span><span>10 | }
</span><span> | ^ mismatched closing delimiter
</span><span>
</span><span>error: could not compile `doctest-c` due to previous error
</span></code></pre>
<p>As we have discussed, doctests link to the underlying Rust library. So cargo
will first try to compile that and fail. In this case rustdoc is not even being
invoked. Moving on.</p>
<hr />
<p>Next up, let’s compile <code>crate-a</code>, which has a typo in its doctest:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> cargo test --doc -p doctest-a
</span><span> Doc-tests doctest-a
</span><span>
</span><span>running 1 test
</span><span>test src/lib.rs - (line 5) ... FAILED
</span><span>
</span><span>failures:
</span><span>
</span><span>---- src/lib.rs - (line 5) stdout ----
</span><span>error: no rules expected the token `"b"`
</span><span> --> src/lib.rs:6:16
</span><span> |
</span><span>3 | assert_eq!("a" "b");
</span><span> | -^^^ no rules expected this token in macro call
</span><span> | |
</span><span> | help: missing comma here
</span><span>
</span><span>error: aborting due to previous error
</span><span>
</span><span>Couldn't compile the test.
</span><span>
</span><span>failures:
</span><span> src/lib.rs - (line 5)
</span><span>
</span><span>test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s
</span></code></pre>
<p>So far so good, we ran some tests which eventually failed.</p>
<p><code>test src/lib.rs</code>, okay. I have a workspace with multiple crates.
Which <code>src/lib.rs</code> are you talking about exactly?</p>
<p>The source location is also not quite exact. Line <code>6</code> is good enough, but column
<code>16</code> is a bit off. Off by <code>4</code>, or <code>"//! ".len()</code> to be exact. But okay, I can
live with that.</p>
<p>But the provided source snippet says line <code>3</code>? Where is that coming from?</p>
<hr />
<p>Let’s look at the third example, <code>crate-b</code>, which should compile but fail at runtime.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> cargo test --doc -p doctest-b
</span><span> Doc-tests doctest-b
</span><span>
</span><span>running 1 test
</span><span>test src/lib.rs - (line 5) ... FAILED
</span><span>
</span><span>failures:
</span><span>
</span><span>---- src/lib.rs - (line 5) stdout ----
</span><span>Test executable failed (exit status: 101).
</span><span>
</span><span>stderr:
</span><span>thread 'main' panicked at 'assertion failed: `(left == right)`
</span><span> left: `1`,
</span><span> right: `2`', src/lib.rs:3:1
</span></code></pre>
<p>The doctest (beginning on line <code>5</code>) panicked in file <code>src/lib.rs</code> on line <code>3</code>.
Okay? This ominous line <code>3</code> again.</p>
<h2 id="lets-go-nightly"><a class="anchor-link" href="#lets-go-nightly" aria-label="Anchor link for: lets-go-nightly">#</a>
Let’s go nightly</h2>
<p>Rustdoc and cargo have some unstable nightly-only options that can help a little
bit with the encountered problems.</p>
<p>I originally implemented these options to help with better code coverage reports.
The <code>-C instrument-coverage</code> option has been stabilized by now. But in order to
create code coverage reports you need the unstable <code>--persist-doctests</code> rustdoc
option.</p>
<p>Running with code coverage manually is quite a complicated procedure, though at
least it is <a href="https://doc.rust-lang.org/rustc/instrument-coverage.html">well documented</a>,
including instructions on how to use it with rustdoc.</p>
<p>Luckily there is <code>cargo-llvm-cov</code> which makes this a lot more pleasant.
Though it has <a href="https://github.com/taiki-e/cargo-llvm-cov/issues/2">limited support for doctests</a> for
reasons.</p>
<p>To demonstrate the problem with code coverage, I will invoke all the necessary
tools manually.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> RUSTFLAGS="-C instrument-coverage" \
</span><span> RUSTDOCFLAGS="-C instrument-coverage -Z unstable-options --persist-doctests doctestbins" \
</span><span> LLVM_PROFILE_FILE="doctests.profraw" \
</span><span> cargo +nightly test --doc -p doctest-b
</span><span>
</span><span>[…] same output as before
</span></code></pre>
<p>I end up with a <code>playground/doctest-b/doctestbins/src_lib_rs_5_0/rust_out</code> executable, and
the profiler output in <code>playground/doctest-b/doctests.profraw</code>. Note that both these
files ended up in the crate directory, more on that later.</p>
<p>Next up, creating the coverage report:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> llvm-profdata merge -sparse doctest-b/doctests.profraw -o doctest-b/doctests.profdata
</span><span>> llvm-cov show --object doctest-b/doctestbins/src_lib_rs_5_0/rust_out --instr-profile doctest-b/doctests.profdata
</span><span> 1| |//! Crate B
</span><span> 2| |//!
</span><span> 3| |//! # Examples
</span><span> 4| |//!
</span><span> 5| 1|//! ```
</span><span> 6| 1|//! assert_eq!(1, 2);
</span><span> 7| 1|//! // ^ crate-b line 6
</span><span> 8| 1|//! ```
</span></code></pre>
<p>So far so good. <code>llvm-cov report --summary-only</code> will also print full file names
and reveals to me that I am dealing with a full absolute path.</p>
<hr />
<p>Now that we have briefly looked at code coverage, let’s revisit the earlier
examples and use the unstable <code>-Z doctest-in-workspace</code> cargo flag, which
internally passes <code>--test-run-directory</code> to rustdoc.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> cargo +nightly test --doc -p doctest-a -Z doctest-in-workspace
</span><span> Doc-tests doctest-a
</span><span>
</span><span>running 1 test
</span><span>test playground/doctest-a/src/lib.rs - (line 5) ... FAILED
</span><span>
</span><span>failures:
</span><span>
</span><span>---- playground/doctest-a/src/lib.rs - (line 5) stdout ----
</span><span>error: no rules expected the token `"b"`
</span><span> --> playground/doctest-a/src/lib.rs:6:16
</span><span> |
</span><span>3 | assert_eq!("a" "b");
</span><span> | -^^^ no rules expected this token in macro call
</span><span> | |
</span><span> | help: missing comma here
</span><span>
</span><span>error: aborting due to previous error
</span><span>
</span><span>Couldn't compile the test.
</span><span>
</span><span>failures:
</span><span> playground/doctest-a/src/lib.rs - (line 5)
</span></code></pre>
<p>Nice, now I know which exact file is failing, instead of having to look at the
<code>Doc-tests</code> header.</p>
<p>The line/column numbers are still slightly off though.</p>
<hr />
<p>The failing doctest:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> cargo +nightly test --doc -p doctest-b -Z doctest-in-workspace
</span><span> Doc-tests doctest-b
</span><span>
</span><span>running 1 test
</span><span>test playground/doctest-b/src/lib.rs - (line 5) ... FAILED
</span><span>
</span><span>failures:
</span><span>
</span><span>---- playground/doctest-b/src/lib.rs - (line 5) stdout ----
</span><span>Test executable failed (exit status: 101).
</span><span>
</span><span>stderr:
</span><span>thread 'main' panicked at 'assertion failed: `(left == right)`
</span><span> left: `1`,
</span><span> right: `2`', playground/doctest-b/src/lib.rs:3:1
</span></code></pre>
<p>Same here. I get better workspace-relative filenames, similar to other kinds of
tests. But again, the line number is off.</p>
<hr />
<p>To my own surprise, there is no change when running code coverage tests.
In both cases the llvm tools report full absolute paths.</p>
<p>Maybe things have improved here. I remember there were similar issues as with the
cargo output, as I developed the <code>doctest-in-workspace</code> option specifically with
code coverage in mind. Or maybe my example was too simplistic and I would have
needed to have multiple doctests from multiple workspace crates merged into a
single code coverage report.</p>
<h1 id="where-do-we-go-from-here"><a class="anchor-link" href="#where-do-we-go-from-here" aria-label="Anchor link for: where-do-we-go-from-here">#</a>
Where do we go from here?</h1>
<p>Well, I initially got the urge to write this blog post as I
<a href="https://github.com/rust-lang/rust/pull/103682">opened a PR</a> today to stabilize
<code>rustdoc --test-run-directory</code>, which itself is just an implementation detail
for <code>cargo --doctest-in-workspace</code>, which is what I actually
<a href="https://github.com/rust-lang/cargo/issues/9427">want to stabilize</a>.</p>
<p>I hope I have demonstrated with these examples that <code>cargo --doctest-in-workspace</code>
is a nice thing to have, and that it should eventually even become the default.</p>
<p>But <code>rustdoc --test-run-directory</code>? Not so sure. This feels like more doc-tape
piled on the already way too brittle doctest infrastructure.</p>
<p><strong>Rustdoc doctests need an overhaul.</strong></p>
<p>Instead of a testsuite driven by rustdoc that compiles, links and runs each
doctest individually, we should rather have rustdoc
<a href="https://github.com/rust-lang/rust/issues/75341">output a single binary</a>
with a testsuite.</p>
<p>Decouple the compilation of doctests from how they run, and have cargo control
the whole process. That way it would better match the way rustc and other kinds
of tests are being handled.</p>
<p>It should <a href="https://github.com/rust-lang/rust/issues/56232">integrate with check/clippy</a>.
With more sophisticated source location tracking, we could have better lines/column
numbers in error messages like above,
in <a href="https://github.com/rust-lang/rust/issues/79417">code coverage reports</a>, or
even in <a href="https://github.com/rust-lang/rust/issues/81070"><code>#[doc = include_str!(...)]</code></a>.</p>
<p>With a well generated test harness, we could also have a usable
<a href="https://github.com/rust-lang/rust/issues/98550"><code>--nocapture</code></a>.</p>
<p>Last but not least, it could lead to
<a href="https://github.com/nextest-rs/nextest/issues/16">better integration with nextest</a>
as well.</p>
<hr />
<p>In the end, rustdoc is still an amazing tool, and doctests an amazing concept.</p>
<p>But there are some mighty skeletons lurking in the closet. I have looked into
the belly of the beast and I can say that, sadly, I don’t have the endurance to
see such a transformation through. I’m even exhausted after proposing my
stabilization PR and writing this blog post.</p>
<p>I do hope that someone will tackle this eventually. As I mentioned in the beginning,
documentation and great tooling are a big driver for Rust’s continued success,
and I am looking forward to seeing things improve over time.</p>
Inspiration (2022-10-27T00:00:00+00:00)
https://swatinem.de/blog/inspiration/<p>In my opinion, one of the most important, but also very underappreciated skills
is to think outside the box. Or put differently, to challenge the status quo
and re-think some deeply rooted thought patterns.</p>
<blockquote>
<p>We have always done things this way</p>
</blockquote>
<p>… is the worst of arguments to do things a certain way.</p>
<p>I recently stumbled upon a <a href="https://twitter.com/elonmusk/status/1584817409651007488">quote</a>
that’s attributed to Elon Musk (who himself replied to the tweet, so it might well have been him):</p>
<blockquote>
<p>Innovation comes from questioning the way things have been done before.</p>
</blockquote>
<p>Another great inspiration comes from John Carmack:</p>
<div class="video" >
<iframe
src="https://www.youtube-nocookie.com/embed/YOZnqjHkULc"
webkitallowfullscreen
mozallowfullscreen
allowfullscreen
></iframe>
</div>
<blockquote>
<p>Many times things are the way they are for important and valid historical reasons.</p>
<p>But sometimes things are the way they are because we just didn’t know any better.</p>
<p>Or, because we didn’t have time to actually make something good.</p>
<p>[…]</p>
<p>Many times things that might even have been optimal originally no longer are,
and there are better ways to do things.</p>
<p>So in many areas it’s almost perceived wisdom that you shouldn’t reinvent the wheel.</p>
<p>But I would urge you to occasionally try anyways.</p>
<p>You will be better for the effort, and this is how eventually we get better wheels.
People just going ahead and trying.</p>
</blockquote>
<hr />
<p>Change is hard. Overcoming the resistance to change might be the hardest thing.</p>
<p>So what are some of the things you would change, but which seem impossible to
change, just because it sounds like insanity to even think differently?</p>
<p>For me, one societal thing might be the classical view of weeks with weekends.
I have been <a href="https://swatinem.de/blog/balanced-weeks/">daydreaming about this before</a>.</p>
<p>On the technical side, maybe the C compilation model and fundamentals of how
programs work and interact. But I don’t have any better ideas either.</p>
<p>Not changing things for the sake of backwards compatibility has its value. But
it also has a cost. Can we quantify that somehow?</p>
Non-abbreviated Abbreviations (2022-09-19T00:00:00+00:00)
https://swatinem.de/blog/abbreviations/<p>I have recently investigated a very interesting performance problem in Sentry’s
symbolication infrastructure.</p>
<p>We got reports of an increasing number of out-of-memory situations of our infrastructure.
This started rather randomly, and was not correlated to any deploys. I had the
hunch that it might be related to some new form of data that customers were
throwing at us.</p>
<p>And indeed, after some time, we were able to track this down to a customer project
that was fetching gigantic debug files from a custom symbol source. These files
were on the order of <strong>5.5 G</strong> in size, which made them the largest valid debug
files I have seen thus far.</p>
<p>The next step was reproducing the issue locally. Sure enough, processing this
debug file took an unreasonably long time, and a whopping <strong>18 G</strong> of
peak memory usage. Attaching a profiler to that long-running process revealed that
it was indeed spending a large portion of its running time in <em>dropping</em> <code>gimli::Abbreviations</code>.</p>
<p>Looking through the <code>gimli</code> code, it became clear that none of the involved types
had custom <code>Drop</code> implementations, only some nested <code>Vec</code>s. This could potentially
explain the large memory usage, and with it the runtime spent on allocations
and deallocations.</p>
<p>But where do all the <code>Abbreviations</code> come from?</p>
<h1 id="what-are-abbreviations"><a class="anchor-link" href="#what-are-abbreviations" aria-label="Anchor link for: what-are-abbreviations">#</a>
What are Abbreviations</h1>
<p>Put simply, abbreviations in DWARF describe the schema, or the blueprint of
debug information entries (DIE).
This schema describes the type of a DIE, its attributes, and its children. The DIE
itself then only carries a code referring to its abbreviation / schema, followed by
the raw contents of its attributes and children.</p>
<p>These abbreviations are meant to be reused a lot. There can be more than one list
of abbreviations. A compilation unit (CU) can refer to its abbreviations list via an
offset, and the DIE code is just the index in this list.</p>
<p>Depending on the linker, you can end up with a single global abbreviations list,
or with smaller lists, one for each CU.
Unfortunately, most linkers are quite dumb: they mostly just concatenate
raw bytes and patch up some offsets here and there. Optimizing and deduplicating
DWARF data is complex and slow after all.</p>
<p>Turns out, the notoriously slow MacOS <code>dsymutil</code> actually does some DWARF optimization.
Especially relevant to our case: it merges and deduplicates all the abbreviations.</p>
<h1 id="the-gimli-problem"><a class="anchor-link" href="#the-gimli-problem" aria-label="Anchor link for: the-gimli-problem">#</a>
The <code>gimli</code> Problem</h1>
<p>Getting back to my investigation, I found out that the crate we use for DWARF handling,
<code>gimli</code>, was optimized for the case where each CU has its own abbreviations:
each CU would parse (and allocate) the abbreviations it was referring to, and
abbreviations were not shared across CUs, as is the case with smarter linkers.
So far this was never a problem because we were dealing with relatively "small"
files. But the file I was looking at was gigantic, and had a huge number of CUs.
Doing all this duplicate work and memory allocation for each CU got very expensive
very quickly.</p>
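<p>The underlying idea of the fix is simple: memoize parsed abbreviations by their section offset, so that all CUs referring to the same offset share one parsed copy. Below is a minimal Rust sketch of that caching idea. Note that this is not the actual <code>gimli</code> API; the <code>Abbreviations</code> type and <code>parse_abbreviations</code> function are hypothetical stand-ins:</p>

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for `gimli::Abbreviations`; the real type holds the parsed schemas.
struct Abbreviations(Vec<u8>);

// Stand-in for actually parsing the `.debug_abbrev` section at `offset`.
fn parse_abbreviations(section: &[u8], offset: usize) -> Abbreviations {
    Abbreviations(section[offset..].to_vec())
}

#[derive(Default)]
struct AbbreviationsCache {
    cache: HashMap<usize, Arc<Abbreviations>>,
}

impl AbbreviationsCache {
    /// Parse the abbreviations at `offset` only once; later CUs referring
    /// to the same offset get a cheap `Arc` clone instead of a re-parse.
    fn get(&mut self, section: &[u8], offset: usize) -> Arc<Abbreviations> {
        self.cache
            .entry(offset)
            .or_insert_with(|| Arc::new(parse_abbreviations(section, offset)))
            .clone()
    }
}
```

<p>The real implementation additionally has to deal with borrowing and thread safety, but the core saving is the same: one parse per distinct offset instead of one parse per CU.</p>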
<p>I reported the problem in detail and also suggested a PR to fix it. There is also
<a href="https://github.com/gimli-rs/gimli/pull/628">another PR</a> that offers the same
benefits with a slightly simpler API.</p>
<h1 id="doing-some-more-tests"><a class="anchor-link" href="#doing-some-more-tests" aria-label="Anchor link for: doing-some-more-tests">#</a>
Doing some more tests</h1>
<p>Okay, let’s look at some more examples and run some tests.</p>
<p>A well known and public project with very large debug files is Electron. You can
download these debug files from <a href="https://github.com/electron/electron/releases/tag/v20.1.4">https://github.com/electron/electron/releases/tag/v20.1.4</a>
if you are interested in reproducing my experiments.</p>
<p>I have downloaded the MacOS x64 and Linux x64 debug files. The Linux debug file
in the electron archive additionally has zlib compressed debug sections.
We can unpack those ahead of time using <code>llvm-objcopy --decompress-debug-sections electron.debug</code>.</p>
<p>Using <code>llvm-objdump --section-headers</code> we can look at both the MacOS and Linux
debug files.</p>
<p>The relevant line for the MacOS symbol is: <code>__debug_abbrev 00004e61</code>, and for
Linux it is <code>.debug_abbrev 02df0539</code>. Or written as decimal, the MacOS
abbreviations are feather-light at only about 20K, whereas the Linux file
has a whopping ~48M of abbreviations.</p>
<p>We can also count these with a bit of shell magic: <code>llvm-dwarfdump --debug-abbrev electron.dsym | grep DW_TAG | wc -l</code>.
There are <code>1_112</code> for MacOS, and <code>3_475_269</code> for Linux. That is a lot.</p>
<p>And how expensive is having redundant abbreviations in the raw data, vs redundant
parsing, vs deduplicated parsing?</p>
<p>I wrote a very simple test that just iterates over all of the CUs in a file:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span> </span><span style="color:#fa6e32;">let mut</span><span> units </span><span style="color:#ed9366;">=</span><span> dwarf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">units</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">while let </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(header) </span><span style="color:#ed9366;">=</span><span> units</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">? </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> _unit </span><span style="color:#ed9366;">=</span><span> dwarf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unit</span><span>(header)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> }
</span></code></pre>
<p>Running this code on the published version of gimli, vs the PR linked above
gives me the following change for the Linux file that has one abbreviations list per CU:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Benchmark #1: ./gimli-0.26.2 ./tests/abbrevs/electron.debug
</span><span> Time (mean ± σ): 852.6 ms ± 23.9 ms [User: 800.5 ms, System: 52.6 ms]
</span><span> Range (min … max): 820.0 ms … 879.8 ms 20 runs
</span><span>
</span><span>Benchmark #2: ./gimli-patched-626 ./tests/abbrevs/electron.debug
</span><span> Time (mean ± σ): 860.2 ms ± 28.1 ms [User: 801.0 ms, System: 52.8 ms]
</span><span> Range (min … max): 819.4 ms … 916.0 ms 20 runs
</span></code></pre>
<p>A tiny regression from the added indirection. Let’s try the same for the deduplicated MacOS DWARF:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Benchmark #3: ./gimli-0.26.2 ./tests/abbrevs/electron.dsym
</span><span> Time (mean ± σ): 4.780 s ± 0.052 s [User: 4.705 s, System: 0.066 s]
</span><span> Range (min … max): 4.719 s … 4.874 s 20 runs
</span><span>
</span><span>Benchmark #4: ./gimli-patched-626 ./tests/abbrevs/electron.dsym
</span><span> Time (mean ± σ): 225.1 ms ± 1.8 ms [User: 185.5 ms, System: 36.3 ms]
</span><span> Range (min … max): 219.6 ms … 230.7 ms 20 runs
</span></code></pre>
<p>That is a <em>huge</em> difference right there.</p>
<p>And finally with the original customer debug file I investigated:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Benchmark #5: ./gimli-0.26.2 ./tests/abbrevs/giant
</span><span> Time (mean ± σ): 105.556 s ± 1.733 s [User: 104.921 s, System: 0.514 s]
</span><span> Range (min … max): 102.769 s … 108.016 s 10 runs
</span><span>
</span><span>Benchmark #6: ./gimli-patched-626 ./tests/abbrevs/giant
</span><span> Time (mean ± σ): 760.7 ms ± 28.7 ms [User: 647.5 ms, System: 104.7 ms]
</span><span> Range (min … max): 725.7 ms … 814.6 ms 10 runs
</span></code></pre>
<p>That is indeed a night and day difference.</p>
<hr />
<p>In both cases, this makes a two-orders-of-magnitude difference. However, the example
only exercises an extremely limited part of our DWARF processing.</p>
<p>But it does highlight how important it is to cache redundant computations, as well
as to deduplicate the raw data in the first place.</p>
A deep dive into Portable PDB Sequence Points (2022-09-02T00:00:00+00:00)
https://swatinem.de/blog/sequence-points/<p>Following up on my last post about SourceMaps, this one is about Portable PDB
Sequence Points.</p>
<p>It only took me about a month of procrastination ;-)</p>
<h1 id="sequence-points-abstractly"><a class="anchor-link" href="#sequence-points-abstractly" aria-label="Anchor link for: sequence-points-abstractly">#</a>
Sequence Points, abstractly</h1>
<p>Similar to SourceMaps and other debug formats, the sequence points allow
mapping from IL offsets to source information.</p>
<p>The Portable PDB Format is specified in a
<a href="https://github.com/dotnet/runtime/blob/main/docs/design/specs/PortablePdb-Metadata.md">markdown document here</a>
and is complementary to the main
<a href="https://www.ecma-international.org/publications-and-standards/standards/ecma-335/">ECMA-335 specification</a>
that is available in PDF format.</p>
<p>In particular, Portable PDB defines a new <code>#Pdb</code> stream, a bunch of new tables
contained in the <code>#~</code> stream, as well as new Blob formats that are within the
<code>#Blob</code> heap.</p>
<p>Section <code>II.23.2</code> of the main <code>ECMA-335</code> spec describes a very specific way to
store compressed integers that does not look very familiar, and comes with the
tradeoff of allowing at most <code>29</code> usable bits.</p>
<ul>
<li><code>0b0xxx_xxxx</code>: 7 usable bits encoded as 1 byte.</li>
<li><code>0b10xx_xxxx 0bxxxx_xxxx</code>: 14 usable bits encoded as 2 bytes.</li>
<li><code>0b110x_xxxx 0bxxxx_xxxx 0bxxxx_xxxx 0bxxxx_xxxx</code>: 29 usable bits encoded as 4 bytes.</li>
</ul>
<p>The encoding uses big-endian byte order, and the signed variant uses a
rotation to move the sign bit into the least significant position.</p>
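<p>To make this encoding concrete, here is a small decoder for both variants, directly following the byte layouts described above (a sketch, not production code):</p>

```rust
/// Decode an ECMA-335 compressed unsigned integer, returning the value and
/// the number of bytes consumed. Returns `None` for the reserved
/// `0b111x_xxxx` prefix or truncated input.
fn decode_unsigned(buf: &[u8]) -> Option<(u32, usize)> {
    let first = *buf.first()? as u32;
    if first & 0x80 == 0 {
        // 1 byte, 7 usable bits
        Some((first, 1))
    } else if first & 0xC0 == 0x80 {
        // 2 bytes, 14 usable bits, big-endian
        let b1 = *buf.get(1)? as u32;
        Some((((first & 0x3F) << 8) | b1, 2))
    } else if first & 0xE0 == 0xC0 {
        // 4 bytes, 29 usable bits, big-endian
        let (b1, b2, b3) = (*buf.get(1)? as u32, *buf.get(2)? as u32, *buf.get(3)? as u32);
        Some((((first & 0x1F) << 24) | (b1 << 16) | (b2 << 8) | b3, 4))
    } else {
        None
    }
}

/// Signed values store the sign bit in the least significant position;
/// rotate it back and sign-extend to recover the value.
fn decode_signed(buf: &[u8]) -> Option<(i32, usize)> {
    let (value, len) = decode_unsigned(buf)?;
    let bits: u32 = [7, 14, 0, 29][len - 1]; // usable bits per encoded length
    let rotated = (value >> 1) | ((value & 1) << (bits - 1));
    // Sign-extend from `bits` bits to 32 bits via an arithmetic shift.
    let shift = 32 - bits;
    Some((((rotated << shift) as i32) >> shift, len))
}
```

<p>With this, the spec’s own examples round-trip: <code>0x06</code> decodes to the signed value <code>3</code>, and <code>0x7B</code> to <code>-3</code>.</p>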
<p>One of the additional tables defined in the Portable PDB spec is the
<code>MethodDebugInformation</code>, which references a blob in the <code>#Blob</code> heap containing
sequence points. The <code>MethodDebugInformation</code> and the sequence points blob can
also reference source files in the <code>Document</code> table.</p>
<p>These sequence points have the following information:</p>
<ul>
<li>the start IL offset,</li>
<li>the document,</li>
<li>the start line / column,</li>
<li>the end line / column.</li>
</ul>
<p>There is a bunch of things to note here:</p>
<ul>
<li>Only the start IL offset is explicitly given, so similar to SourceMaps, each
sequence point implicitly extends to the next one.</li>
<li>There are also "hidden" sequence points, probably to denote gaps in the mappings.</li>
<li>One specialty here is that the sequence points do not give a <em>position</em> in the
source code, but rather a <em>span</em>.</li>
</ul>
<h1 id="state-machine"><a class="anchor-link" href="#state-machine" aria-label="Anchor link for: state-machine">#</a>
State Machine</h1>
<p>Similar to the other mapping formats, the sequence points blob also acts as a
state machine.</p>
<p>You have some mutable state, and have instructions and deltas that modify that
state.</p>
<p>In this case, we start out with a document, and the blob can have an instruction
that changes that document. The IL offset, line and column are also given as a delta to
the previous record. And the source span is also delta-encoded.</p>
<p>The encoding is further complicated by the fact that either signed or unsigned
encoding is used based on some condition. For example, the column delta is
unsigned in case the source span does not span multiple lines. It is signed
otherwise. This totally makes sense, as a source span should never go backwards.
But it does add complexity to the decoder / encoder.</p>
<h1 id="decoding-a-mapping"><a class="anchor-link" href="#decoding-a-mapping" aria-label="Anchor link for: decoding-a-mapping">#</a>
Decoding a mapping</h1>
<p>As an exercise, let’s try to decode the following blob, and walk through the
bytes one by one.</p>
<p>Our initial state machine starts out at all <code>0</code> values.</p>
<pre data-lang="text" style="background-color:#fafafa;color:#61676c;" class="language-text "><code class="language-text" data-lang="text"><span>blob: 00 00 18 2e 09 06 00 12 04 08 06 00 01 02 79
</span><span>
</span><span>0x00: add 0 to the IL offset
</span><span>0x00: set source span line delta to 0
</span><span>0x18: set source span column delta to 24
</span><span>0x2e: add 46 to the start line, unsigned for the first entry
</span><span>0x09: add 9 to the start column, unsigned for the first entry
</span><span>- Sequence Point: { il_offset: 0, source_span: [46:9 - 46:33] }
</span><span>0x06: add 6 to the IL offset
</span><span>0x00: set source span line delta to 0
</span><span>0x12: set source span column delta to 18
</span><span>0x04: add 2 to the start line, signed
</span><span>0x08: add 4 to the start column, signed
</span><span>- Sequence Point: { il_offset: 6, source_span: [48:13 - 48:31] }
</span><span>0x06: add 6 to the IL offset
</span><span>0x00: set source span line delta to 0
</span><span>0x01: set source span column delta to 1
</span><span>0x02: add 1 to the start line, signed
</span><span>0x79 (0b0111_1001, 0b1111_1100 rotated): subtract 4 from the start column
</span><span>- Sequence Point: { il_offset: 12, source_span: [49:9 - 49:10] }
</span></code></pre>
<p>Mind you, this was a very simple (but real-life) example. We did not have any
hidden sequence points, document changes or source spans that span multiple lines.
But it did highlight how parsing the sequence points blob works, and also
that we can get along with 5 bytes per sequence point for simple cases. Not bad.</p>
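<p>The state machine from the walkthrough can also be sketched in code. The following is a deliberately simplified illustration: it only handles the single-byte compressed-integer case, ignores hidden sequence points and document-change records, and hard-codes the rule that the first record encodes its start line/column unsigned:</p>

```rust
#[derive(Debug, PartialEq)]
struct SequencePoint {
    il_offset: u32,
    start_line: u32,
    start_col: u32,
    end_line: u32,
    end_col: u32,
}

// Simplification: only the 1-byte (7 usable bits) compressed encodings.
fn read_unsigned(it: &mut impl Iterator<Item = u8>) -> u32 {
    it.next().unwrap() as u32
}
fn read_signed(it: &mut impl Iterator<Item = u8>) -> i32 {
    let v = read_unsigned(it);
    // The sign bit is rotated into the least significant position.
    let rotated = (v >> 1) | ((v & 1) << 6);
    (rotated as i32) - if v & 1 == 1 { 128 } else { 0 }
}

fn decode(blob: &[u8]) -> Vec<SequencePoint> {
    let mut it = blob.iter().copied().peekable();
    let (mut il, mut line, mut col) = (0u32, 0u32, 0i32);
    let mut points = Vec::new();
    let mut first = true;
    while it.peek().is_some() {
        il += read_unsigned(&mut it);
        let delta_lines = read_unsigned(&mut it);
        // The column delta is signed only for spans covering multiple lines.
        let delta_cols = if delta_lines == 0 {
            read_unsigned(&mut it) as i32
        } else {
            read_signed(&mut it)
        };
        if first {
            // The first record encodes start line/column unsigned…
            line = read_unsigned(&mut it);
            col = read_unsigned(&mut it) as i32;
        } else {
            // …all following records encode them as signed deltas.
            line = (line as i32 + read_signed(&mut it)) as u32;
            col += read_signed(&mut it);
        }
        points.push(SequencePoint {
            il_offset: il,
            start_line: line,
            start_col: col as u32,
            end_line: line + delta_lines,
            end_col: (col + delta_cols) as u32,
        });
        first = false;
    }
    points
}
```

<p>Running this over the example blob above yields exactly the three sequence points from the walkthrough.</p>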
<h1 id="how-to-use-these-mappings"><a class="anchor-link" href="#how-to-use-these-mappings" aria-label="Anchor link for: how-to-use-these-mappings">#</a>
How to use these mappings</h1>
<p>So how do we make use of these mappings?</p>
<p>Assuming we have a "normal" .NET runtime, we can get the IL offset trivially via the
<a href="https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.stackframe.getiloffset"><code>StackFrame.GetILOffset</code></a>
method. However, what might not be entirely obvious from our look at the format
so far is that the IL offset is <em>per method</em>.</p>
<p>Getting the method index is not particularly obvious or well documented.
Starting from the
<a href="https://docs.microsoft.com/en-us/dotnet/api/system.diagnostics.stackframe.getmethod"><code>Method</code></a>
of a <code>StackFrame</code>, we can access the
<a href="https://docs.microsoft.com/en-us/dotnet/api/system.reflection.memberinfo.metadatatoken"><code>MetadataToken</code></a>.</p>
<p>Section <code>II.22</code> of the <code>ECMA-335</code> spec says how to interpret this:</p>
<blockquote>
<p>Uncoded metadata tokens are 4-byte unsigned integers, which contain the metadata
table index in the most significant byte and a 1-based record index in the three least-significant bytes.</p>
</blockquote>
<p>The table index for <code>MethodDef</code>s is <code>0x06</code> which we can assert, and the rest
is the method index that also corresponds to the index inside our <code>MethodDebugInformation</code>
table.</p>
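<p>Splitting such a token is just a pair of bit operations. A small sketch (the <code>0x06</code> table index for <code>MethodDef</code>s comes from the spec):</p>

```rust
/// Split an uncoded metadata token into its table index and 1-based row
/// index, per ECMA-335 §II.22.
fn split_token(token: u32) -> (u8, u32) {
    let table = (token >> 24) as u8; // metadata table index, most significant byte
    let row = token & 0x00FF_FFFF; // 1-based record index, low three bytes
    (table, row)
}

/// The table index of the `MethodDef` table.
const METHOD_DEF_TABLE: u8 = 0x06;
```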
<p>And there you have it. With these two pieces of information, we can resolve a
<code>StackFrame</code> to its source location, or even source span.</p>
<h1 id="the-elephant-in-the-room"><a class="anchor-link" href="#the-elephant-in-the-room" aria-label="Anchor link for: the-elephant-in-the-room">#</a>
The elephant in the room</h1>
<p>What is missing now is actually finding the Portable PDB file.</p>
<p>The PDB file has a self-describing UUID inside its <code>#Pdb</code> stream. The
corresponding executable file has a special <code>CodeView</code> record that is slightly
different from normal <code>CodeView</code> records.
The difference is documented in
<a href="https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#codeview-debug-directory-entry-type-2">this specification</a>.
I have previously written about some
<a href="https://swatinem.de/blog/format-ossification/">pitfalls related to CodeView records</a> btw.</p>
<p>Either way, getting the <code>CodeView</code> record and thus the UUID at runtime is not
trivial. It requires reading that record directly from the PE file via the
<a href="https://docs.microsoft.com/en-us/dotnet/api/system.reflection.portableexecutable.pereader"><code>PEReader</code></a>
class. Creating a file stream to access that file from disk might not always
be possible. Neither is getting hold of the memory region where the PE file
might already be mapped.</p>
<p>Unfortunately, this is still an unsolved problem right now. Even ahead-of-time
compiled mobile apps ship the PE files in their app bundles though, most likely for
the embedded runtime metadata. That makes me hopeful that we can access those
at runtime somehow and close the loop.</p>
<h1 id="summary"><a class="anchor-link" href="#summary" aria-label="Anchor link for: summary">#</a>
Summary</h1>
<p>We took a deep dive into the Portable PDB format, and we learned a bunch of
things about it:</p>
<ul>
<li>Portable PDBs extend the <code>ECMA-335</code> format. Both are reasonably well documented.</li>
<li>The PDB has a list of <code>Document</code>s and <code>MethodDebugInformation</code> with sequence points.</li>
<li>The sequence points blob forms a state machine that yields sequence points.</li>
<li>These sequence points have an IL offset, a document and source span.</li>
<li>You can get the IL offset and the method index at runtime fairly easily.</li>
</ul>
<p>The Portable PDB does not include the data needed to pretty-print function
signatures. That is embedded in the <code>ECMA-335</code> metadata of the executable file.</p>
<p>Speaking of which, the executable also has a reference to the Portable PDB via
its UUID. But that is not readily available at runtime.</p>
<hr />
<p>Two down, two to go. I previously explained SourceMaps in detail, and Portable
PDB Sequence Points today. I plan to take a look at more formats
in future posts, so look out for:</p>
<ul>
<li>DWARF line programs</li>
<li>PDB line programs</li>
</ul>
A deep dive into SourceMaps (2022-08-08T00:00:00+00:00)
https://swatinem.de/blog/sourcemaps/<p>In my last post I committed to the idea of doing a deep dive series into a couple
of debug formats, or more specifically, how their <em>line mappings</em> / <em>line programs</em> work.</p>
<p>To start things off, we will be learning how the SourceMap <code>mappings</code> work.</p>
<h1 id="sourcemaps-abstractly"><a class="anchor-link" href="#sourcemaps-abstractly" aria-label="Anchor link for: sourcemaps-abstractly">#</a>
SourceMaps, abstractly</h1>
<p>For people not familiar with the matter, SourceMaps are a building block used in
the JavaScript ecosystem. They are used to map from a location in the "final"
(minified, transpiled) JavaScript code back to the original source, which might
not even be JavaScript.</p>
<p>The SourceMap <em>specification</em> lives in a
<a href="https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit#">Google Doc</a>
which I would argue is a weird format for a specification, but adequate for understanding how to interpret it.</p>
<p>Here is an example of some minified JS, plus its corresponding SourceMap.</p>
<!-- prettier-ignore -->
<pre data-lang="js" style="background-color:#fafafa;color:#61676c;" class="language-js "><code class="language-js" data-lang="js"><span style="color:#fa6e32;">function </span><span style="color:#f29718;">t</span><span>(){}</span><span style="color:#fa6e32;">export default </span><span>t</span><span style="color:#61676ccc;">;
</span></code></pre>
<pre data-lang="json" style="background-color:#fafafa;color:#61676c;" class="language-json "><code class="language-json" data-lang="json"><span>{
</span><span> </span><span style="color:#86b300;">"version"</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"names"</span><span style="color:#61676ccc;">: </span><span>[</span><span style="color:#86b300;">"abcd"</span><span>]</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"sources"</span><span style="color:#61676ccc;">: </span><span>[</span><span style="color:#86b300;">"tests/fixtures/simple/original.js"</span><span>]</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"sourcesContent"</span><span style="color:#61676ccc;">: </span><span>[
</span><span> </span><span style="color:#86b300;">"// ./node_modules/.bin/terser -c -m --module tests/fixtures/simple/original.js --source-map includeSources -o tests/fixtures/simple/minified.js</span><span style="color:#4cbf99;">\n</span><span style="color:#86b300;">function abcd() {}</span><span style="color:#4cbf99;">\n</span><span style="color:#86b300;">export default abcd;</span><span style="color:#4cbf99;">\n</span><span style="color:#86b300;">"
</span><span> ]</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"mappings"</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"AACA,SAASA,oBACMA"
</span><span>}
</span></code></pre>
<p>As you can see, the SourceMap is a human-readable JSON file. It has a list of filenames in <code>sources</code>, and optionally
their contents in <code>sourcesContent</code>. We also have a list of <code>names</code>, which is used
to refer to the original, non-obfuscated identifiers.</p>
<p>And then we have the <code>mappings</code> we want to look at in more detail. As they are
embedded in a JSON file, we have some restrictions on the type of data we can
put here. We can’t use plain binary data directly. SourceMaps thus use an ASCII-friendly
base-64 <a href="https://en.wikipedia.org/wiki/Variable-length_quantity">Variable-length quantity (VLQ)</a> encoding for this purpose.</p>
<h1 id="state-machines"><a class="anchor-link" href="#state-machines" aria-label="Anchor link for: state-machines">#</a>
State Machines</h1>
<p>The <code>mappings</code> do not contain individual entries, but rather operate on a <em>state machine</em>.
This means you have to keep some internal state around, which is being incrementally updated
by <em>instructions</em> or <em>deltas</em> from the <code>mappings</code>.
Every now and then this state is then flushed out and represents a concrete mapping entry.</p>
<p>One such entry, called a <code>Token</code> in SourceMap terminology, can have the following:</p>
<ul>
<li>the "minified" line number,</li>
<li>the "minified" column number, encoded as delta,</li>
<li>(optionally), an index into the <code>sources</code>, encoded as delta,</li>
<li>(optionally), the source line and column, encoded as delta,</li>
<li>(optionally), an index into the <code>names</code>, encoded as delta.</li>
</ul>
<p>There are two special "instructions" for the state machine:</p>
<ul>
<li><code>';'</code> increases the "minified" line number by 1, and resets the "minified" column back to <code>0</code>.</li>
<li><code>','</code> yields the current state as a token and "resets" the optional fields. The "reset" is not back to <code>0</code> but rather to
<code>None</code>, which means the next token yielded will not have a <code>source</code> for example.</li>
</ul>
<p>Otherwise we have a number of <em>Base 64 VLQ</em> entries, either:</p>
<ul>
<li>1, updating the "minified" column number,</li>
<li>4, additionally updating and yielding the source index, line and column,</li>
<li>or 5, which additionally updates the name index and yields it.</li>
</ul>
<p>The resulting tokens are sorted by "minified" line, and "minified" column.</p>
<p>In the end, most of the gains of this format come from the delta encoding. The
<em>Base 64 VLQ</em> on its own is not very efficient: a raw byte has <code>256</code> unique
values; base-64 encoding reduces that to <code>64</code>, and another "continue" bitflag
reduces it further to <code>32</code>, or 5 useful bits per byte.</p>
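<p>A minimal decoder for one such Base 64 VLQ segment could look like this (a sketch with error handling kept to a bare minimum):</p>

```rust
const BASE64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Decode one segment of Base 64 VLQ values, e.g. the "AACA" between commas.
fn decode_vlq_segment(segment: &str) -> Option<Vec<i64>> {
    let mut values = Vec::new();
    let (mut value, mut shift) = (0i64, 0u32);
    for ch in segment.bytes() {
        let digit = BASE64.iter().position(|&b| b == ch)? as i64;
        // Bit 5 is the "continue" flag; the low 5 bits carry data,
        // least significant group first.
        value |= (digit & 0b1_1111) << shift;
        if digit & 0b10_0000 != 0 {
            shift += 5;
        } else {
            // Bit 0 of the finished number is the sign bit.
            let signed = if value & 1 == 1 { -(value >> 1) } else { value >> 1 };
            values.push(signed);
            value = 0;
            shift = 0;
        }
    }
    Some(values)
}
```

<p>Decoding <code>"AACA"</code> with this yields <code>[0, 0, 1, 0]</code>, matching the walkthrough in the next section.</p>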
<h1 id="decoding-the-mappings"><a class="anchor-link" href="#decoding-the-mappings" aria-label="Anchor link for: decoding-the-mappings">#</a>
Decoding the mappings</h1>
<p>Let’s look at the concrete <code>mappings</code> above in more detail and decode it.
As a reminder, our <code>mappings</code> are <code>AACA,SAASA,oBACMA</code>.</p>
<pre data-lang="text" style="background-color:#fafafa;color:#61676c;" class="language-text "><code class="language-text" data-lang="text"><span>'A' (b64: 0b0000_0000): add 0 to the minified column number
</span><span>'A' (b64: 0b0000_0000): add 0 to the sources index
</span><span>'C' (b64: 0b0000_0010): add 1 to the line number
</span><span>'A' (b64: 0b0000_0000): add 0 to the column number
</span><span>',': yield the token: {0, 0, 0, 1, 0, None}
</span><span>'S' (b64: 0b0001_0010): add 9 to the minified column number
</span><span>'A' (b64: 0b0000_0000): add 0 to the sources index
</span><span>'A' (b64: 0b0000_0000): add 0 to the line number
</span><span>'S' (b64: 0b0001_0010): add 9 to the column number
</span><span>'A' (b64: 0b0000_0000): add 0 to the name index
</span><span>',': yield the token: {0, 9, 0, 1, 9, 0}
</span><span>'o' (b64: 0b0010_1000): continue with next byte, lowest 5 bits are `0b0_1000`
</span><span>'B' (b64: 0b0000_0001): next 5 bits `0b0_0001` are prepended to the number, resulting in `0b0010_1000`:
</span><span> add 20 to the minified column number
</span><span>'A' (b64: 0b0000_0000): add 0 to the sources index
</span><span>'C' (b64: 0b0000_0010): add 1 to the line number
</span><span>'M' (b64: 0b0000_1100): add 6 to the column number
</span><span>'A' (b64: 0b0000_0000): add 0 to the name index
</span><span>end: yield the token: {0, 29, 0, 2, 15, 0}
</span></code></pre>
<p>Decoding these <code>mappings</code> thus yields the following tokens:</p>
<pre data-lang="text" style="background-color:#fafafa;color:#61676c;" class="language-text "><code class="language-text" data-lang="text"><span>{ minified_line: 0, minified_column: 0, source_index: 0, source_line: 1, source_column: 0, name_index: None }
</span><span>{ minified_line: 0, minified_column: 9, source_index: 0, source_line: 1, source_column: 9, name_index: 0 }
</span><span>{ minified_line: 0, minified_column: 29, source_index: 0, source_line: 2, source_column: 15, name_index: 0 }
</span></code></pre>
<h1 id="how-to-use-these-mappings"><a class="anchor-link" href="#how-to-use-these-mappings" aria-label="Anchor link for: how-to-use-these-mappings">#</a>
How to use these <code>mappings</code></h1>
<p>We have a pretty simple example with only a single source file in <code>sources</code>.
Simple enough so we can look at the minified and the original source side by side:</p>
<!-- prettier-ignore -->
<pre data-lang="js" style="background-color:#fafafa;color:#61676c;" class="language-js "><code class="language-js" data-lang="js"><span style="font-style:italic;color:#abb0b6;">// --- minified ---
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">t</span><span>(){}</span><span style="color:#fa6e32;">export default </span><span>t</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// - line 0, column 0 corresponds to line 0, column 0 in `original.js`
</span><span style="font-style:italic;color:#abb0b6;">// ^- line 0, column 9 corresponds to line 1, column 9 in `original.js` and has name `abcd`
</span><span style="font-style:italic;color:#abb0b6;">// ^- line 0, column 29 corresponds to line 2, column 15 in `original.js` and has name `abcd`
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// --- original ---
</span><span style="font-style:italic;color:#abb0b6;">// ./node_modules/.bin/terser -c -m --module tests/fixtures/simple/original.js --source-map includeSources -o tests/fixtures/simple/minified.js
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">abcd</span><span>() {}
</span><span style="font-style:italic;color:#abb0b6;">// ^- the second token points here on line 1
</span><span style="color:#fa6e32;">export default </span><span>abcd</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// ^- the third token points here on line 2
</span></code></pre>
<p>One thing to note here is that the SourceMap tokens only represent a single
point in the minified file, not a <em>range</em>.
To do a lookup, you can exploit the fact that these tokens are properly sorted
by line and column to do a binary search.
Since we don’t have an <em>explicit</em> range, most implementations assume that a
token has an <em>implicit</em> range up to the next token, or to infinity for the last
token.</p>
<p>For example, if we perform the following lookup:</p>
<!-- prettier-ignore -->
<pre data-lang="js" style="background-color:#fafafa;color:#61676c;" class="language-js "><code class="language-js" data-lang="js"><span style="color:#fa6e32;">function </span><span style="color:#f29718;">t</span><span>(){}</span><span style="color:#fa6e32;">export default </span><span>t</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// ^- line 0, column 19
</span></code></pre>
<p>That lookup would resolve to:</p>
<pre data-lang="text" style="background-color:#fafafa;color:#61676c;" class="language-text "><code class="language-text" data-lang="text"><span>{ minified_line: 0, minified_column: 9, source_index: 0, source_line: 1, source_column: 9, name_index: 0 }
</span></code></pre>
<p>Which is not entirely accurate, as the token points to the wrong original source line.
The resolution here depends on the tool producing the source map. In most cases
though, we are close enough, and the tools are good enough to insert tokens in
all "interesting" places.</p>
<h1 id="summary"><a class="anchor-link" href="#summary" aria-label="Anchor link for: summary">#</a>
Summary</h1>
<p>To summarize the SourceMap format, let’s look at a few properties that it has,
what kind of data it encodes, and which lookups we can use it for.</p>
<ul>
<li>SourceMaps are JSON, and have an ASCII-encoded <code>mappings</code> string.</li>
<li>They have a list of <code>sources</code> with optional <code>sourcesContent</code> and <code>names</code>.</li>
<li>The <code>mappings</code> encodes deltas that operate on a state machine which yields Tokens.</li>
<li>These tokens can map from <code>line</code>/<code>column</code> pairs to:
<ul>
<li>… the original source location given by an index into <code>sources</code>, a <code>line</code> and <code>column</code>,</li>
<li>plus optionally an index into <code>names</code>.</li>
</ul>
</li>
</ul>
<p>SourceMaps thus allow us to look up a minified location, mapping it to an
approximate position in the original source.</p>
<p>Most importantly, SourceMaps do not directly encode information about function
scopes and names. There are extensions that can do that, but they are not
widely used.</p>
<hr />
<p>There you have it. A deep dive into the SourceMap format, with a focus on its
<em>Base 64 VLQ</em> <code>mappings</code>. This was just one example of debug file formats and
the way they encode information compactly. I plan to take a look at more formats
in future posts, so look out for:</p>
<ul>
<li>Portable PDB sequence points</li>
<li>DWARF line programs</li>
<li>PDB line programs</li>
</ul>
<hr />
<h1>The Magic of zerocopy</h1>
<p>Published 2022-08-06 · <a href="https://swatinem.de/blog/magic-zerocopy/">https://swatinem.de/blog/magic-zerocopy/</a></p>
<p>If you want to parse binary formats in Rust, you have a few crates to choose from
apart from rolling your own.</p>
<p>Some popular contenders are <a href="https://docs.rs/zerocopy/latest/zerocopy/index.html"><code>zerocopy</code></a>
and <a href="https://docs.rs/scroll/latest/scroll/index.html"><code>scroll</code></a>.</p>
<p>I would like to take this chance to explain the difference between the two,
which one you likely want to use in which situation, and why <code>zerocopy</code> truly
is magical.</p>
<p>However, neither is perfect; there are some papercuts and ideas for improvement
that I will explain at the end as well.</p>
<h1 id="what-does-zero-copy-mean"><a class="anchor-link" href="#what-does-zero-copy-mean" aria-label="Anchor link for: what-does-zero-copy-mean">#</a>
What does zero-copy mean?</h1>
<p>To start off, we assume that we are dealing with a byte slice, <code>&[u8]</code>. We can
either read a complete file from disk, or just <code>mmap</code> it into our address
space. The topic of incremental / streaming parsing that works with network
streams is something else entirely that I do not want to touch now.</p>
<p>So our complete binary file content is available as a <code>&'data [u8]</code>, and we want
to parse it into its logical format. As much as possible, we want to refer to
data inside that buffer directly rather than <em>copying</em> things out.</p>
<p><code>scroll</code> has partial support, as it allows you to parse a <code>&'data str</code> which points
directly into the original buffer without allocating and copying a new <code>String</code>.</p>
<p>For other data types, however, <code>scroll</code> tends to copy the contents
out of the buffer when parsing, whereas <code>zerocopy</code> will give you a <code>&'data T</code>
by default.</p>
<p>Both approaches have advantages and disadvantages that we will look at.</p>
<h1 id="examples"><a class="anchor-link" href="#examples" aria-label="Anchor link for: examples">#</a>
Examples</h1>
<p>Let’s look at a small example of how to use both crates. In both cases, we want
to write, and then read, a simple nested struct to/from a buffer:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>scroll</span><span style="color:#ed9366;">::</span><span>{IOwrite</span><span style="color:#61676ccc;">,</span><span> Pread</span><span style="color:#61676ccc;">,</span><span> Pwrite</span><span style="color:#61676ccc;">,</span><span> SizeWith}</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">use </span><span>zerocopy</span><span style="color:#ed9366;">::</span><span>{AsBytes</span><span style="color:#61676ccc;">,</span><span> FromBytes</span><span style="color:#61676ccc;">,</span><span> LayoutVerified}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Copy</span><span style="color:#61676ccc;">,</span><span> Clone</span><span style="color:#61676ccc;">,</span><span> Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> AsBytes</span><span style="color:#61676ccc;">,</span><span> FromBytes</span><span style="color:#61676ccc;">,</span><span> Pread</span><span style="color:#61676ccc;">,</span><span> Pwrite</span><span style="color:#61676ccc;">,</span><span> IOwrite</span><span style="color:#61676ccc;">,</span><span> SizeWith)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">MyNestedPodStruct </span><span>{
</span><span> a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u16</span><span>,
</span><span> _pad</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u16</span><span>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Copy</span><span style="color:#61676ccc;">,</span><span> Clone</span><span style="color:#61676ccc;">,</span><span> Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> AsBytes</span><span style="color:#61676ccc;">,</span><span> FromBytes</span><span style="color:#61676ccc;">,</span><span> Pread</span><span style="color:#61676ccc;">,</span><span> Pwrite</span><span style="color:#61676ccc;">,</span><span> IOwrite</span><span style="color:#61676ccc;">,</span><span> SizeWith)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">MyPodStruct </span><span>{
</span><span> nested</span><span style="color:#61676ccc;">:</span><span> MyNestedPodStruct,
</span><span> c</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u64</span><span>,
</span><span>}
</span></code></pre>
<p>This is already a mouthful. My structs are <code>#[repr(C)]</code> so that I have full
control over their memory layout. <code>zerocopy</code> also has the additional requirement
that one has to be explicit about padding, in between members or at the end.
A <code>#[repr(packed)]</code> annotation would avoid the need for that, at a cost that we
will discuss soon.</p>
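<p>To illustrate that tradeoff, here is a small comparison (with hypothetical structs, not the ones from this post) of an explicitly padded layout versus <code>#[repr(C, packed)]</code>:</p>

```rust
use std::mem;

#[repr(C)]
struct Padded {
    a: u32,
    b: u16,
    _pad: u16, // explicit padding, no hidden gaps in the layout
}

#[repr(C, packed)]
struct Packed {
    a: u32,
    b: u16, // immediately follows `a`, no trailing padding either
}

fn main() {
    assert_eq!(mem::size_of::<Padded>(), 8);
    assert_eq!(mem::align_of::<Padded>(), 4);
    // packed removes all padding, but also drops the alignment to 1,
    // so fields may be unaligned and taking references to them is not allowed
    assert_eq!(mem::size_of::<Packed>(), 6);
    assert_eq!(mem::align_of::<Packed>(), 1);
}
```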
<p><code>zerocopy</code> requires us to derive the <code>AsBytes</code> and <code>FromBytes</code> traits, which,
as their names make clear, allow us to read a struct from raw bytes, or interpret
it as raw bytes that we can write.</p>
<p><code>scroll</code> on the other hand has a bunch of traits that we can derive: <code>Pread</code> and
<code>Pwrite</code> to read and write respectively, <code>IOwrite</code> to be able to write a
struct to a <code>std::io::Write</code> stream, and <code>SizeWith</code> for structs that have a
fixed size that does not depend on any <code>Context</code>; more on that later.</p>
<p>Let’s define a few instances of our struct that we want to write and then read back:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> structs</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[MyPodStruct] </span><span style="color:#ed9366;">= &</span><span>[
</span><span> MyPodStruct {
</span><span> nested</span><span style="color:#61676ccc;">:</span><span> MyNestedPodStruct {
</span><span> a</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">,
</span><span> b</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">,
</span><span> _pad</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">,
</span><span> c</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">,
</span><span> MyPodStruct {
</span><span> nested</span><span style="color:#61676ccc;">:</span><span> MyNestedPodStruct {
</span><span> a</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">,
</span><span> b</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">,
</span><span> _pad</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">,
</span><span> c</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">,
</span><span>]</span><span style="color:#61676ccc;">;
</span></code></pre>
<hr />
<p><code>zerocopy</code> lets us turn this whole slice into a <code>&[u8]</code> which we can then
copy around or write as we see fit:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let mut</span><span> buf </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">write_all</span><span>(structs</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_bytes</span><span>())</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>For reading, there is a wide range of options. You can read/cast an exactly-sized
buffer, a prefix, or a suffix. You can specifically choose to read <code>unaligned</code>.
All these different methods are implemented as constructors of the
<code>LayoutVerified</code> struct, which can then be turned into a reference or a slice.</p>
<p>Here is an example of how to get a single struct or the whole slice from our
buffer:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> lv </span><span style="color:#ed9366;">= </span><span>LayoutVerified</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#ed9366;">_</span><span>, [MyPodStruct]></span><span style="color:#ed9366;">::</span><span>new_slice(buf)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> parsed_slice </span><span style="color:#ed9366;">=</span><span> parsed_slice</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_slice</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">assert_eq!</span><span>(structs</span><span style="color:#61676ccc;">,</span><span> parsed_slice)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">let </span><span>(lv</span><span style="color:#61676ccc;">,</span><span> _rest) </span><span style="color:#ed9366;">= </span><span>LayoutVerified</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#ed9366;">_</span><span>, MyPodStruct></span><span style="color:#ed9366;">::</span><span>new_from_prefix(buf)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> parsed_one </span><span style="color:#ed9366;">=</span><span> lv</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_ref</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">assert_eq!</span><span>(</span><span style="color:#ed9366;">&</span><span>structs[</span><span style="color:#ff8f40;">0</span><span>]</span><span style="color:#61676ccc;">,</span><span> parsed_one)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>One thing to note here is that we have to provide explicit type annotations, as
for some reason the compiler is not able to infer it automatically.</p>
<hr />
<p>As far as I know, <code>scroll</code> does not allow you to directly write
either a slice or a reference. That is the reason why I derived <code>Copy</code>, and by
extension <code>Clone</code>, for our structs above.
Please reach out to me and prove me wrong here.</p>
<p>We thus write owned copies one by one here:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let mut</span><span> buf </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">for</span><span> s </span><span style="color:#ed9366;">in</span><span> structs {
</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iowrite</span><span>(</span><span style="color:#ed9366;">*</span><span>s)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>Parsing also does not work for a whole slice as far as I know (please prove me wrong),
and as <code>scroll</code> is not zero-copy, we have to collect parsed structs into a
<code>Vec</code> manually:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> offset </span><span style="color:#ed9366;">= &</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let mut</span><span> parsed </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">while </span><span style="color:#ed9366;">*</span><span>offset </span><span style="color:#ed9366;"><</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>() {
</span><span> parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">push</span><span>(buf</span><span style="color:#ed9366;">.</span><span>gread</span><span style="color:#ed9366;">::</span><span><MyPodStruct>(offset)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>())</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#f07171;">assert_eq!</span><span>(structs</span><span style="color:#61676ccc;">,</span><span> parsed)</span><span style="color:#61676ccc;">;
</span></code></pre>
<h1 id="why-this-matters"><a class="anchor-link" href="#why-this-matters" aria-label="Anchor link for: why-this-matters">#</a>
Why this matters</h1>
<p>When parsing binary files, we want that parsing to be as fast as possible; we
also want to allocate and copy as little memory as possible.</p>
<p>The <code>zerocopy</code> crate truly is zero-copy. It does a fixed amount of pointer
arithmetic (essentially an alignment check and a bounds check) to verify the
layout of our buffer. I guess that’s why its main type is called <code>LayoutVerified</code>.</p>
<p><code>scroll</code> on the other hand parses each struct (in fact, each member) one by
one and copies them into a <code>Vec</code> that needs to be allocated. It is thus a lot
more expensive. You don’t necessarily need to parse and collect <em>everything</em>,
though: you can parse structs on demand, and if your structs have a fixed size
(we derived <code>SizeWith</code>), you can skip ahead in the source buffer to do some
random access.</p>
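<p>That kind of random access can be sketched by computing the record offset manually. <code>Record</code> and <code>read_nth</code> are hypothetical names, and <code>size_of</code> stands in for <code>scroll</code>’s <code>SizeWith</code> here:</p>

```rust
use std::mem;

#[derive(Debug, PartialEq)]
#[repr(C)]
struct Record {
    a: u32,
    b: u32,
}

/// Reads the n-th little-endian `Record` without parsing its predecessors.
fn read_nth(buf: &[u8], n: usize) -> Option<Record> {
    let size = mem::size_of::<Record>(); // fixed size: 8 bytes
    let start = n.checked_mul(size)?;
    let bytes = buf.get(start..start + size)?;
    Some(Record {
        a: u32::from_le_bytes(bytes[0..4].try_into().ok()?),
        b: u32::from_le_bytes(bytes[4..8].try_into().ok()?),
    })
}

fn main() {
    // serialize four records as raw little-endian bytes
    let mut buf = Vec::new();
    for i in 0u32..4 {
        buf.extend_from_slice(&i.to_le_bytes());
        buf.extend_from_slice(&(i * 10).to_le_bytes());
    }
    // jump straight to record 2 without touching records 0 and 1
    assert_eq!(read_nth(&buf, 2), Some(Record { a: 2, b: 20 }));
    assert_eq!(read_nth(&buf, 4), None); // out of bounds
}
```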
<h1 id="endianness-and-other-context"><a class="anchor-link" href="#endianness-and-other-context" aria-label="Anchor link for: endianness-and-other-context">#</a>
Endianness and other Context</h1>
<p>Which approach is better is, as always, a matter of tradeoffs.
<code>zerocopy</code> is the better choice if all your raw data structures have a fixed
size and endianness. <code>scroll</code> is the better choice if your data structures have
a variable size, or you want to parse files with dynamic endianness.
<code>scroll</code> calls this <code>Context</code>, and there is a
<a href="https://docs.rs/scroll/latest/scroll/ctx/index.html#example">complete example</a>
of how to create a custom parser that is aware of both endianness and the size of
certain fields.</p>
<p>While <code>scroll</code> supports endianness-aware parsing based on a runtime context,
<code>zerocopy</code> is very different here. It supports types that are byteorder-aware,
but their byteorder is fixed at compile time. A <code>zerocopy</code> <code>U64<LE></code> is
statically typed, and its <code>get</code> method is optimized at compile time to only read LE
data.</p>
<h1 id="making-zero-copy-context-aware"><a class="anchor-link" href="#making-zero-copy-context-aware" aria-label="Anchor link for: making-zero-copy-context-aware">#</a>
Making zero-copy context-aware</h1>
<p>With that <code>zerocopy</code> limitation, I thought it was a fun exercise to make it
<em>somehow</em> handle formats of all kinds of endianness and field sizes at runtime.</p>
<p>As an example, I will choose the ELF header, which has differently sized fields
for the 32-bit and 64-bit variants, as well as different endianness. The header is
also self-describing, as it has two flags for bit-width and endianness, which can
be read without knowing either, since they are just single bytes. It looks like this:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(FromBytes)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">ElfIdent </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/// ELF Magic, must be `b"\x7fELF"`
</span><span> e_mag</span><span style="color:#61676ccc;">:</span><span> [</span><span style="color:#fa6e32;">u8</span><span>; 4],
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/// Field size flag: 1 = 32-bit variant, 2 = 64-bit.
</span><span> e_class</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u8</span><span>,
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/// Endianness flag: 1 = LE, 2 = BE.
</span><span> e_data</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u8</span><span>,
</span><span>
</span><span> e_version</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u8</span><span>,
</span><span> e_abi</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u8</span><span>,
</span><span> e_abiversion</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u8</span><span>,
</span><span> e_pad</span><span style="color:#61676ccc;">:</span><span> [</span><span style="color:#fa6e32;">u8</span><span>; 7],
</span><span>}
</span></code></pre>
<p>Next, we can define different structures for each of the variants. As you might
have guessed, this leads to a combinatorial explosion, as we have to define four
different variants. Here are two of them to keep things simple:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>zerocopy</span><span style="color:#ed9366;">::</span><span>byteorder</span><span style="color:#ed9366;">::*</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C</span><span style="color:#61676ccc;">, </span><span style="color:#f29718;">align</span><span>(8))]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(FromBytes)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">ElfHeader_L64 </span><span>{
</span><span> e_ident</span><span style="color:#61676ccc;">:</span><span> ElfIdent,
</span><span> e_type</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_machine</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_version</span><span style="color:#61676ccc;">: </span><span>U32<LE>,
</span><span> e_entry</span><span style="color:#61676ccc;">: </span><span>U64<LE>,
</span><span> e_phoff</span><span style="color:#61676ccc;">: </span><span>U64<LE>,
</span><span> e_shoff</span><span style="color:#61676ccc;">: </span><span>U64<LE>,
</span><span> e_flags</span><span style="color:#61676ccc;">: </span><span>U32<LE>,
</span><span> e_ehsize</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_phentsize</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_phnum</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_shentsize</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_shnum</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span> e_shstrndx</span><span style="color:#61676ccc;">: </span><span>U16<LE>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C</span><span style="color:#61676ccc;">, </span><span style="color:#f29718;">align</span><span>(4))]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(FromBytes)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">ElfHeader_B32 </span><span>{
</span><span> e_ident</span><span style="color:#61676ccc;">:</span><span> ElfIdent,
</span><span> e_type</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_machine</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_version</span><span style="color:#61676ccc;">: </span><span>U32<BE>,
</span><span> e_entry</span><span style="color:#61676ccc;">: </span><span>U32<BE>,
</span><span> e_phoff</span><span style="color:#61676ccc;">: </span><span>U32<BE>,
</span><span> e_shoff</span><span style="color:#61676ccc;">: </span><span>U32<BE>,
</span><span> e_flags</span><span style="color:#61676ccc;">: </span><span>U32<BE>,
</span><span> e_ehsize</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_phentsize</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_phnum</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_shentsize</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_shnum</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span> e_shstrndx</span><span style="color:#61676ccc;">: </span><span>U16<BE>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">test_struct_layout</span><span>() {
</span><span> </span><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>mem</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(mem</span><span style="color:#ed9366;">::</span><span>align_of</span><span style="color:#ed9366;">::</span><span><ElfHeader_L64>()</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">8</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><ElfHeader_L64>()</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">64</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(mem</span><span style="color:#ed9366;">::</span><span>align_of</span><span style="color:#ed9366;">::</span><span><ElfHeader_B32>()</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><ElfHeader_B32>()</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">52</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>Implementing this example, I was a bit surprised that I had to specify the
alignment of my structures manually. It turns out the <code>zerocopy::U64</code> and similar
types are unaligned, which means reading from them requires instructions
that do unaligned loads. Those might be a bit slower, but I guess
this is a wash in the grand scheme of things.</p>
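<p>The trick behind such unaligned types can be sketched without pulling in <code>zerocopy</code> at all (the wrapper name here is made up): storing a plain byte array instead of the native integer drops the alignment requirement to 1.</p>

```rust
use core::mem;

// Sketch of a zerocopy-style endian-aware integer: a transparent
// wrapper around raw bytes, so it has alignment 1 and can live at
// any offset in a byte buffer.
#[derive(Clone, Copy)]
#[repr(transparent)]
struct U64Le([u8; 8]);

impl U64Le {
    fn get(&self) -> u64 {
        // This compiles down to an unaligned load plus byte-swapping
        // where needed, instead of a plain aligned load.
        u64::from_le_bytes(self.0)
    }
}

fn main() {
    // A native u64 typically requires 8-byte alignment;
    // the byte-array wrapper requires none at all.
    assert_eq!(mem::align_of::<U64Le>(), 1);
    assert_eq!(U64Le(1234u64.to_le_bytes()).get(), 1234);
}
```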
<p>A recommendation here would be to write tests that explicitly check the size
and alignment of your structs. Very helpful. I wouldn’t have caught this issue
otherwise.</p>
<p>With these two variants defined, we can then create a context-aware wrapper
around that, which chooses the variant at runtime depending on its input:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub enum </span><span style="color:#399ee6;">ElfHeader</span><span><'data> {
</span><span> </span><span style="color:#ff8f40;">L64</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> ElfHeader_L64)</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ff8f40;">B32</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> ElfHeader_B32)</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// TODO: L32, B64
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">ElfHeader</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">parse</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><(</span><span style="color:#fa6e32;">Self</span><span>, </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>])> {
</span><span> </span><span style="color:#fa6e32;">let </span><span>(e_ident</span><span style="color:#61676ccc;">,</span><span> _rest) </span><span style="color:#ed9366;">= </span><span>LayoutVerified</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#ed9366;">_</span><span>, ElfIdent></span><span style="color:#ed9366;">::</span><span>new_from_prefix(buf)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">if</span><span> e_ident</span><span style="color:#ed9366;">.</span><span>e_mag </span><span style="color:#ed9366;">!= *</span><span style="color:#fa6e32;">b</span><span style="color:#86b300;">"</span><span style="color:#4cbf99;">\x7f</span><span style="color:#86b300;">ELF" </span><span>{
</span><span> </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">None</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">match</span><span> e_ident</span><span style="color:#ed9366;">.</span><span>e_class {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// 32-bit
</span><span> </span><span style="color:#ff8f40;">1 </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#fa6e32;">match</span><span> e_ident</span><span style="color:#ed9366;">.</span><span>e_data {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// LE
</span><span> </span><span style="color:#ff8f40;">1 </span><span style="color:#ed9366;">=> </span><span style="color:#f07171;">todo!</span><span>()</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// BE
</span><span> </span><span style="color:#ff8f40;">2 </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#fa6e32;">let </span><span>(e_header</span><span style="color:#61676ccc;">,</span><span> rest) </span><span style="color:#ed9366;">=
</span><span> LayoutVerified</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#ed9366;">_</span><span>, ElfHeader_B32></span><span style="color:#ed9366;">::</span><span>new_from_prefix(buf)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>((</span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">B32</span><span>(e_header</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_ref</span><span>())</span><span style="color:#61676ccc;">,</span><span> rest))
</span><span> }
</span><span> </span><span style="color:#ed9366;">_ => </span><span style="font-style:italic;color:#55b4d4;">None</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// 64-bit
</span><span> </span><span style="color:#ff8f40;">2 </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#fa6e32;">match</span><span> e_ident</span><span style="color:#ed9366;">.</span><span>e_data {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// LE
</span><span> </span><span style="color:#ff8f40;">1 </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#fa6e32;">let </span><span>(e_header</span><span style="color:#61676ccc;">,</span><span> rest) </span><span style="color:#ed9366;">=
</span><span> LayoutVerified</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#ed9366;">_</span><span>, ElfHeader_L64></span><span style="color:#ed9366;">::</span><span>new_from_prefix(buf)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>((</span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">L64</span><span>(e_header</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_ref</span><span>())</span><span style="color:#61676ccc;">,</span><span> rest))
</span><span> }
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// BE
</span><span> </span><span style="color:#ff8f40;">2 </span><span style="color:#ed9366;">=> </span><span style="color:#f07171;">todo!</span><span>()</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ed9366;">_ => </span><span style="font-style:italic;color:#55b4d4;">None</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span> </span><span style="color:#ed9366;">_ => </span><span style="font-style:italic;color:#55b4d4;">None</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">e_shoff</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">u64 </span><span>{
</span><span> </span><span style="color:#fa6e32;">match </span><span style="font-style:italic;color:#55b4d4;">self </span><span>{
</span><span> ElfHeader</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">L64</span><span>(header) </span><span style="color:#ed9366;">=></span><span> header</span><span style="color:#ed9366;">.</span><span>e_shoff</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>()</span><span style="color:#61676ccc;">,
</span><span> ElfHeader</span><span style="color:#ed9366;">::</span><span style="color:#ff8f40;">B32</span><span>(header) </span><span style="color:#ed9366;">=></span><span> header</span><span style="color:#ed9366;">.</span><span>e_shoff</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">u64</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>This wrapper can have accessors that check at runtime which variant the
underlying data has, and do the appropriate access in a type-safe manner.
That wrapper is also lightweight and zero-copy. It only has the enum discriminant,
plus a pointer to the <em>same</em> underlying data in all cases. So it is essentially
a tagged pointer.</p>
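<p>That size claim is easy to check with an assertion. A minimal sketch with stand-in structs (the field layouts are illustrative, not the real ELF headers):</p>

```rust
use core::mem;

// Stand-ins for the two concrete header layouts.
#[repr(C)]
struct HeaderL64 { e_shoff: u64 }
#[repr(C)]
struct HeaderB32 { e_shoff: u32 }

// The context-aware wrapper: one tag plus one reference.
enum Header<'data> {
    L64(&'data HeaderL64),
    B32(&'data HeaderB32),
}

fn main() {
    // Two non-null reference payloads cannot share a niche, so the
    // enum is laid out as discriminant + pointer: two machine words.
    assert_eq!(mem::size_of::<Header<'static>>(), 2 * mem::size_of::<usize>());
}
```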
<h1 id="api-papercuts"><a class="anchor-link" href="#api-papercuts" aria-label="Anchor link for: api-papercuts">#</a>
API Papercuts</h1>
<p>Well, there you have it. A detailed explanation of <code>zerocopy</code> and <code>scroll</code>, the
difference between the two, and how <code>zerocopy</code> can be extremely lightweight as
it only validates the correct size and alignment of things without doing any
parsing at all. It only "parses" things when you start to access that data.</p>
<p>Both of these have their pros and cons, and different use cases and
strengths. <code>zerocopy</code> is better if you have fixed-size structs and don’t need to
care about endianness, although it <em>is</em> possible to make that work with some
effort as shown above.</p>
<p><code>scroll</code> makes these use cases trivial, at the cost of parsing everything ahead
of time and copying things out into agnostic structs.</p>
<hr />
<p>Unfortunately though, both these libraries are a bit hard to work with, and
their APIs could use some streamlining.</p>
<p>The API surface of <code>scroll</code> is huge, as it supports a ton of features. But
the main APIs that you interact with are very unintuitive and confusing. There
are <code>pread</code> and <code>gread</code>. What’s the difference between the two? I honestly can’t
tell you without looking at the docs, which I have to do constantly as I simply
can’t remember that myself.</p>
<p>There are quite some papercuts with <code>zerocopy</code> as well. First off, why is
<code>LayoutVerified</code> a concrete type to begin with? I can’t think of a good use-case
in which you would actually want to keep that type around. You’d rather
immediately turn it <code>into_ref</code> or <code>into_slice</code>. Free functions would serve that
use-case a lot better.</p>
<p>The API is also extremely repetitive, as we have seen in this example:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> mystructs </span><span style="color:#ed9366;">= </span><span>LayoutVerified</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#ed9366;">_</span><span>, [MyPodStruct]></span><span style="color:#ed9366;">::</span><span>new_slice(buf)</span><span style="color:#ed9366;">?.</span><span style="color:#f07171;">into_slice</span><span>()</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>I repeat the <code>slice</code> three times in this line. <code>new_slice</code> and <code>into_slice</code>,
plus the fact that these two functions only exist if the type parameter itself
is a slice. Can I rather have a single free function instead of this?</p>
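<p>What I have in mind is something like the following free function. This is a hypothetical sketch, not actual zerocopy API: the real thing would enforce a <code>FromBytes</code>-style bound instead of the unchecked plain-old-data assumption made here.</p>

```rust
use core::{mem, slice};

/// Hypothetical helper: interpret a byte buffer as a slice of `T`.
///
/// Only a sketch: it checks size and alignment, but a real API (like
/// zerocopy's) would additionally require a `FromBytes` bound so that
/// any bit pattern is a valid `T`.
fn pod_slice_from_bytes<T>(buf: &[u8]) -> Option<&[T]> {
    let size = mem::size_of::<T>();
    let misaligned = buf.as_ptr() as usize % mem::align_of::<T>() != 0;
    if size == 0 || buf.len() % size != 0 || misaligned {
        return None;
    }
    // SAFETY: size and alignment were checked above; `T` must be
    // plain old data for this to be sound.
    Some(unsafe { slice::from_raw_parts(buf.as_ptr().cast::<T>(), buf.len() / size) })
}

fn main() {
    // One call, no `slice` repeated three times.
    let bytes = [1u8, 2, 3, 4];
    let parsed: &[u8] = pod_slice_from_bytes(&bytes).unwrap();
    assert_eq!(parsed, &[1, 2, 3, 4]);
}
```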
<p>Usage of the <code>Unaligned</code> APIs is a bit confusing, and I was surprised to see
that the endian-aware types such as <code>U64</code> are unaligned as well.</p>
<p>The usage of custom derive for <code>FromBytes</code> is interesting as it validates safe
usage. But it also means that it is impossible to derive if you have some
foreign types such as <code>Uuid</code> which are <code>#[repr(C)]</code> but do not implement
<code>FromBytes</code> themselves. The <code>uuid</code> crate, by the way, has unstable support for <code>zerocopy</code>, but it
<a href="https://docs.rs/uuid/latest/uuid/#unstable-features">requires passing in custom RUSTFLAGS</a>
which is inconvenient.</p>
<h1 id="variable-size-and-compressed-data"><a class="anchor-link" href="#variable-size-and-compressed-data" aria-label="Anchor link for: variable-size-and-compressed-data">#</a>
Variable-size and Compressed data</h1>
<p>A use-case that I have not explored in this post, which is rather trivial
to handle in <code>scroll</code> but close to impossible in <code>zerocopy</code>, is truly
variable-sized data, such as structures that embed length-prefixed or nul-terminated
strings inline. Those are the devil.</p>
<p>When picking tradeoffs, you can either have something that is simple and fast,
or something that is compact and small. A compact format is almost certainly
variable sized, which means you can’t use zerocopy patterns, and you lose the
ability of random access. A very clear example of this is delta compression,
where you <em>have to</em> parse things in order.</p>
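<p>As a toy illustration of that ordering constraint (nothing format-specific here): with delta compression, decoding the N-th value requires summing all deltas before it, so there is no way to jump straight to an arbitrary entry.</p>

```rust
// Delta compression stores each value as the difference from its
// predecessor. Decoding is a running sum, which forces sequential
// access: value N depends on values 0..N.
fn delta_decode(deltas: &[i64]) -> Vec<i64> {
    let mut acc = 0i64;
    deltas
        .iter()
        .map(|d| {
            acc += d;
            acc
        })
        .collect()
}

fn main() {
    // The values 10, 12, 11, 20 stored as deltas: 10, +2, -1, +9.
    assert_eq!(delta_decode(&[10, 2, -1, 9]), vec![10, 12, 11, 20]);
}
```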
<h1 id="watch-this-space"><a class="anchor-link" href="#watch-this-space" aria-label="Anchor link for: watch-this-space">#</a>
Watch this space</h1>
<p>As a matter of fact, most debug formats use some clever tricks to represent
source information very compactly. I want to explore some of these formats in
a lot more detail in the future. More specifically, watch out for future blog
posts about:</p>
<ul>
<li>DWARF line programs</li>
<li>PDB line programs</li>
<li>Portable PDB sequence points</li>
<li>SourceMap VLQ mappings</li>
</ul>
Format Ossification2022-07-29T00:00:00+00:002022-07-29T00:00:00+00:00
Unknown
https://swatinem.de/blog/format-ossification/<p>Before going into the details of my recent discovery, let’s define the term
<em>Ossification</em>, as likely a lot of people have never heard that word before.</p>
<p>Imagine you have an extensible format or protocol. As example, we can take a
list of elements of different type. The list is extensible. It can have an
arbitrary number of elements, and over time the different types can also be
extended.</p>
<p>This is great. But what happens if you never <em>use</em> this extensibility?
Let’s say that, for years and years, your list has always had exactly one element,
and that element has always been of a very specific type.</p>
<p>Well, users of that format or protocol will start relying on that very fact,
and will assert this assumption in code, or worse, in hardware.</p>
<p>So your format is extensible in theory, but you can never extend it in practice
because tools have come to rely on a very specific size and order.</p>
<p>That is called <em>Ossification</em> and is sadly a reality, especially in network
protocols.</p>
<p>And as I found out recently, it is also a thing for the COFF/PE file format,
the format of Windows <code>.exe</code>/<code>.dll</code> files.</p>
<hr />
<p>My journey starts with a Sentry Customer Issue. We got a report about a
processing error that complained about an invalid "image type", whatever that
means. (An “image” here is a loaded library or executable.)</p>
<p>The image in question indeed was missing its <code>type</code> field, but it did have other
fields that are normal for images in the sentry protocol. The event also made
it clear that it was coming from Windows.</p>
<p>With that information I was looking at the code in the <code>sentry-native</code> SDK that
collected these images, and indeed found some early-returns that would leave
an image entry without a <code>type</code>. I fixed the issue by
<a href="https://github.com/getsentry/sentry-native/pull/732">reordering the code</a> so
we still get a <code>type</code> even though we can’t find a CodeView record for the image.</p>
<p>A while later while investigating how to link from a C# stack trace to the
corresponding portable PDB, I stumbled across the
<a href="https://docs.microsoft.com/en-us/dotnet/api/system.reflection.portableexecutable.pereader.readdebugdirectory?view=net-6.0">PEReader.ReadDebugDirectory</a>
method.</p>
<p>This method returned an Array of <code>DebugDirectoryEntry</code>, whereas the code from
<code>sentry-native</code> I was looking at just two weeks earlier was reading a single
entry. Interesting.</p>
<p>Fast forward to today, where I am again investigating a customer issue related
to a <code>.dll</code> that does not seem to have a valid <code>debug_id</code> (which comes from the
CodeView record mentioned above).</p>
<p>It took some time until the things I had seen clicked in my brain. What if
our tools make wrong assumptions about the shape of a PE file and its
Debug Directory Entries? What if for years all the PE files always had a single
Debug Directory Entry that happened to be the CodeView record?
What if suddenly some new compiler version is generating PE files that have
more than one Debug Directory Entry, and the CodeView record is not the first
one anymore?</p>
<p>Well, classic case of Ossification. Things are extensible in theory, but since
that extensibility was never practiced for years, all the tools developed around
this format came to expect things that are not true anymore.</p>
<h2 id="how-did-this-happen"><a class="anchor-link" href="#how-did-this-happen" aria-label="Anchor link for: how-did-this-happen">#</a>
How did this happen?</h2>
<p>Well, the simple answer is that the available documentation around all this is
quite lacking, to put it mildly.</p>
<p>The main documentation for <a href="https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_data_directory"><code>IMAGE_DATA_DIRECTORY</code></a> mentions a <code>Size</code> that is described as:</p>
<blockquote>
<p>The size of the table, in bytes.</p>
</blockquote>
<p>Okay, yeah, great. There is no documentation or example of what to do with this.
It is not at all obvious this is supposed to be the number of bytes of an array,
and that the resulting array has <code>total_size / sizeof(IMAGE_DEBUG_DIRECTORY)</code>
elements.</p>
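<p>In code, that computation looks roughly like this (the struct is a straight transcription of <code>IMAGE_DEBUG_DIRECTORY</code> from <code>winnt.h</code> into Rust):</p>

```rust
use core::mem;

// IMAGE_DEBUG_DIRECTORY from winnt.h, transcribed to Rust.
#[repr(C)]
struct ImageDebugDirectory {
    characteristics: u32,
    time_date_stamp: u32,
    major_version: u16,
    minor_version: u16,
    entry_type: u32, // `Type` in winnt.h; renamed since `type` is a keyword
    size_of_data: u32,
    address_of_raw_data: u32,
    pointer_to_raw_data: u32,
}

fn main() {
    assert_eq!(mem::size_of::<ImageDebugDirectory>(), 28);

    // `Size` from IMAGE_DATA_DIRECTORY is the byte length of the
    // whole array, so the number of entries is:
    let data_directory_size: u32 = 84; // e.g. three entries
    let count = data_directory_size as usize / mem::size_of::<ImageDebugDirectory>();
    assert_eq!(count, 3);
}
```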
<p>The documentation for <a href="https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-image_debug_directory"><code>IMAGE_DEBUG_DIRECTORY</code></a>
is also quite outdated. The docs online describe the <code>Type</code> field up to number
<code>9</code>. The <code>winnt.h</code> header has defines up to number <code>20</code>, without any description
either.</p>
<p>If you happen to stumble upon the specification of the
<a href="https://github.com/dotnet/runtime/blob/main/docs/design/specs/PE-COFF.md#debug-directory">.NET/C# extension to PE/COFF</a>,
that document does indeed say this is an array:</p>
<blockquote>
<p>This directory consists of an array of debug directory entries whose location and size are indicated in the image optional header.</p>
</blockquote>
<p>Hooray, big success! The doc also describes some of the <code>Type</code>s missing from
the <code>winnt.h</code> header and the other documentation.</p>
<p>It also has a description for the CodeView record itself, which is lacking from
the other Windows docs and from the <code>winnt.h</code> header.</p>
<p>In particular, this <code>RSDS</code> (PDB 7.0) CodeView format is being read by a huge
number of tools, but I can’t find any <em>official</em> documentation anywhere.
This .NET extension linked above is the closest I could find.</p>
<p>The <a href="https://docs.microsoft.com/en-us/windows/win32/api/minidumpapiset/ns-minidumpapiset-minidump_module"><code>MINIDUMP_MODULE</code></a> documentation also mentions a CodeView record,
but it is also missing a description of how to interpret it.</p>
<p>So to summarize, the PE format has very incomplete or outright missing
documentation. And the tools dealing with it are probably cargo-culting wrong
assumptions from one implementation to the next.</p>
<h2 id="what-now"><a class="anchor-link" href="#what-now" aria-label="Anchor link for: what-now">#</a>
What now?</h2>
<p>Well, we figured out that a PE file can have multiple Debug Directory entries,
and any one of them can be the CodeView record we are looking for.</p>
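<p>The de-ossified lookup boils down to scanning <em>all</em> entries for the CodeView one instead of assuming it comes first. A simplified sketch (the entry representation here is made up for illustration; <code>IMAGE_DEBUG_TYPE_CODEVIEW</code> is <code>2</code> in <code>winnt.h</code>):</p>

```rust
// Scan all debug directory entries for the CodeView record,
// making no assumption about its position in the array.
const IMAGE_DEBUG_TYPE_CODEVIEW: u32 = 2; // from winnt.h

struct DebugEntry {
    entry_type: u32,
    data: Vec<u8>, // the bytes pointed to by PointerToRawData
}

fn find_codeview(entries: &[DebugEntry]) -> Option<&[u8]> {
    entries
        .iter()
        .find(|e| e.entry_type == IMAGE_DEBUG_TYPE_CODEVIEW)
        // An RSDS (PDB 7.0) record starts with the b"RSDS" signature,
        // followed by a 16-byte GUID and a u32 age.
        .filter(|e| e.data.starts_with(b"RSDS"))
        .map(|e| e.data.as_slice())
}

fn main() {
    // The CodeView record is *not* the first entry here, and is still found.
    let entries = vec![
        DebugEntry { entry_type: 16, data: vec![] }, // e.g. a Repro entry
        DebugEntry { entry_type: 2, data: b"RSDS\0\0\0\0".to_vec() },
    ];
    assert!(find_codeview(&entries).is_some());
}
```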
<p>Time to see which tool got this right, and fix the ones that got it wrong.</p>
<p>Here are PRs for <a href="https://github.com/getsentry/sentry-native/pull/740"><code>sentry-native</code></a>,
<a href="https://github.com/m4b/goblin/pull/319"><code>goblin</code></a> and
<a href="https://github.com/gimli-rs/object/pull/451"><code>object</code></a>.</p>
<p>To my surprise, <a href="https://chromium.googlesource.com/crashpad/crashpad/+/refs/heads/main/snapshot/win/pe_image_reader.cc#162"><code>crashpad</code></a>
actually got this right.
To my surprise because I was also looking at a customer minidump created by
crashpad that was missing CodeView records for some of the minidump modules.
(Yes, the loaded executable code is called <code>image</code> in PE and Sentry terminology,
whereas minidumps call them <code>module</code>s. Confused yet?)</p>
<p>Looking at the customer <code>.dll</code> again, it became clear that it did have a
Debug Directory entry, but it wasn’t a CodeView one. Maybe if it had one, it
would indeed be the first? Even so, the point here is to not make any assumptions
about that.</p>
<p>So in the end I was chasing a ghost all along. But at least I learned a ton in
the process, and de-ossified a bunch of tools along the way.</p>
<p>The specific customer issue boils down to "fix your build system", and that is
the end of the story.</p>
Please delete your Snapshot Tests2022-07-23T00:00:00+00:002022-07-23T00:00:00+00:00
Unknown
https://swatinem.de/blog/rm-snapshots/<p>Snapshot testing is quite popular. And unfortunately most of the time it is the
wrong tool for the job.</p>
<p>In software projects, testing is extremely important. A piece of software can
only be as good as its testsuite, at least when it is changing over time.</p>
<h1 id="tests-abstractly"><a class="anchor-link" href="#tests-abstractly" aria-label="Anchor link for: tests-abstractly">#</a>
Tests, abstractly</h1>
<p>So what is the purpose of tests in the first place? Well, you test that for
certain inputs, your program returns some outputs or performs some side effects.</p>
<p>For each test case you should be able to very specifically say what it is
asserting. For snapshot tests, the answer to the question
“what is your test asserting” is most often “hm… I guess… everything?”</p>
<p>Maybe when you created the snapshot test, you had an idea of what exactly you
wanted to assert, but over time this is being lost.</p>
<p>Another reason snapshot tests are bad is because they change way too often.
Completely unrelated changes in the codebase can change your snapshot output.
Because over time you lose track of what you actually wanted to assert with the
test, a snapshot that frequently changes will just be accepted by both the
developer and the reviewer. The test assertions thus lose their purpose.</p>
<h1 id="good-assertions-don-t-change"><a class="anchor-link" href="#good-assertions-don-t-change" aria-label="Anchor link for: good-assertions-don-t-change">#</a>
Good assertions don’t change</h1>
<p>Ideally, each test should assert something very specific. A ground truth that
you know is true and is set in stone for infinity. Good test assertions should
never change. Sure, your test code will change as your API evolves. But the
assertions should not.</p>
<p>If a test assertion changes, it means either the assertion was bad from the
start, or you have a regression somewhere that needs to be investigated.</p>
<p>Snapshot tests change way too frequently, and they are way too broad which means
people get into the habit of “yeah, whatever”.</p>
<p>As mentioned before, you should assert very specific outcomes and side effects.
Snapshot tests frequently assert intermediate artifacts which are not
interesting and change a lot.</p>
<h1 id="good-snapshot-tests"><a class="anchor-link" href="#good-snapshot-tests" aria-label="Anchor link for: good-snapshot-tests">#</a>
Good snapshot tests</h1>
<p>I have only seen a few snapshot tests that were done right, and it very much
depends on the software under test whether snapshot tests make sense or not.</p>
<p>As with all other kinds of tests, the testcases should be as minimal as possible.
They should just test one specific case, and not the whole world.</p>
<p>The cases where I think snapshot testing makes sense are when you have
transformations on text.</p>
<p>I do snapshot testing in <a href="https://github.com/Swatinem/rollup-plugin-dts"><code>rollup-plugin-dts</code></a>.
I test very specific use-cases with each testcase. I test end to end, asserting
the final output as a snapshot. These tests are stable and should never change
when I make changes to the core logic. They do change however in the very rare
case that my main dependency <code>rollup</code> changes some of its logic.</p>
<p>Another example of where snapshot testing makes sense would be <code>rust-analyzer</code>.
You give it a snippet of code and a cursor position. Then you apply a suggestion
and you assert the final output.</p>
<p>Snapshotting intermediate artifacts like abstract syntax trees in this example
would be bad as those can change frequently without influencing the final
output.</p>
<h1 id="general-testing-advice"><a class="anchor-link" href="#general-testing-advice" aria-label="Anchor link for: general-testing-advice">#</a>
General testing advice</h1>
<p>I would like to end this post with some general recommendations towards testing:</p>
<ul>
<li>It should be very clear what your test is asserting</li>
<li>Assert facts that you know are, and will stay, true</li>
<li>Keep your tests as small as possible</li>
<li>Assert outputs, not intermediate artifacts</li>
<li>Make sure you have code coverage enabled, which can help you discover missed
edge cases that are not tested yet</li>
<li>Your tests should serve as real world use-cases of your APIs</li>
</ul>
Pitfalls of fallible Iterators2022-07-08T00:00:00+00:002022-07-08T00:00:00+00:00
Unknown
https://swatinem.de/blog/fallible-iterators/<p>I wanted to write about this topic quite some time ago, but it seems my
tendency to procrastinate won in the end. Until today, so let’s get started.</p>
<p>The topic at hand are fallible Iterators, and this post is motivated by a real
world problem that I fixed both at the producer end in
<a href="https://github.com/bytecodealliance/wasm-tools/pull/472">wasmparser</a> and the
consumer in <a href="https://github.com/getsentry/symbolic/pull/500">symbolic</a>.</p>
<p>When parsing some binary data files, for example <code>wasm</code> as in the example above,
errors can happen all the time. Files might be truncated, they might be
corrupted either by some random bitflips or bad compression, by a faulty writer
or they might be malicious and created by an attacker specifically to exploit
bugs in the parser.</p>
<p>Especially for the last reason, parsers in general need to be very robust. As
we are dealing with Rust, we are lucky: in safe Rust, at least, we can’t
corrupt our internal program state, and thus we should be safe from executing
untrusted code.
But as we will see, bad things can still happen.</p>
<p>When parsing such files, we at Sentry want to be as forgiving as possible though.
We want to extract as much usable information from a file as possible. Even if
our parsers are incomplete and can’t handle some cases, or the original producers
are buggy and produce invalid files (yes, this happens more often than I would
like to admit), we still want to make use of the stuff that was correct and that
we can parse.</p>
<p>We don’t want to reject the complete file because a single string was not
utf-8 encoded for example. We also don’t want to early-return on the first
invalid string, but rather skip ahead to the next valid one.</p>
<p>So what are fallible Iterators? There are two general patterns that I have seen:</p>
<ul>
<li><code>next(&mut self) -> Option<Result<T, E>></code></li>
<li><code>next(&mut self) -> Result<Option<T>, E></code></li>
</ul>
<p>They are both very similar, and you can almost convert between the two using
<code>transpose</code>, but I will argue that they express slightly different intentions
that I will also highlight. So let’s look at both in some detail.</p>
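<p>A quick demonstration of that <code>transpose</code> relationship, and of the one spot where the two shapes differ in spelling:</p>

```rust
// `Option<Result<T, E>>` and `Result<Option<T>, E>` convert into each
// other via `transpose`, which is what makes the two patterns almost
// interchangeable.
fn main() {
    let a: Option<Result<i32, String>> = Some(Ok(1));
    assert_eq!(a.transpose(), Ok(Some(1)));

    let b: Result<Option<i32>, String> = Err("oops".to_string());
    assert_eq!(b.transpose(), Some(Err("oops".to_string())));

    // End of iteration is spelled `None` in one shape
    // and `Ok(None)` in the other.
    let end: Option<Result<i32, String>> = None;
    assert_eq!(end.transpose(), Ok(None));
}
```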
<h2 id="next-mut-self-option-result-t-e"><a class="anchor-link" href="#next-mut-self-option-result-t-e" aria-label="Anchor link for: next-mut-self-option-result-t-e">#</a>
<code>next(&mut self) -> Option<Result<T, E>></code></h2>
<p>This pattern is nice because we are dealing with a <em>real</em> <code>impl Iterator</code> that
we can use in <code>for</code> loops.</p>
<p>Let’s see what else we can do with these:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">// we can propagate errors early:
</span><span style="color:#fa6e32;">for</span><span> item </span><span style="color:#ed9366;">in</span><span> iter {
</span><span> </span><span style="color:#fa6e32;">let</span><span> item </span><span style="color:#ed9366;">=</span><span> item</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// break on first error:
</span><span style="color:#fa6e32;">for</span><span> item </span><span style="color:#ed9366;">in</span><span> iter {
</span><span> </span><span style="color:#fa6e32;">let</span><span> item </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">match</span><span> item {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(item) </span><span style="color:#ed9366;">=></span><span> item</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(</span><span style="color:#ed9366;">_</span><span>) </span><span style="color:#ed9366;">=> </span><span style="color:#fa6e32;">break</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// or we can skip over errors:
</span><span style="color:#fa6e32;">for</span><span> item </span><span style="color:#ed9366;">in</span><span> iter {
</span><span> </span><span style="color:#fa6e32;">let</span><span> item </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">match</span><span> item {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(item) </span><span style="color:#ed9366;">=></span><span> item</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(</span><span style="color:#ed9366;">_</span><span>) </span><span style="color:#ed9366;">=> </span><span style="color:#fa6e32;">continue</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// or even simpler, since `Result` implements `IntoIterator`:
</span><span style="color:#fa6e32;">for</span><span> item </span><span style="color:#ed9366;">in</span><span> iter</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">flatten</span><span>() {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span></code></pre>
<p>We can also directly <code>collect</code> this into a <code>Result<Vec<T>, E></code> which might
or might not be useful.</p>
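<p>For completeness, that <code>collect</code> short-circuits on the first error:</p>

```rust
// Collecting an iterator of `Result`s into `Result<Vec<T>, E>`
// stops at, and returns, the first `Err` it encounters.
fn main() {
    let ok: Result<Vec<i32>, String> = vec![Ok(1), Ok(2)].into_iter().collect();
    assert_eq!(ok, Ok(vec![1, 2]));

    let err: Result<Vec<i32>, String> =
        vec![Ok(1), Err("bad".to_string()), Ok(3)].into_iter().collect();
    assert_eq!(err, Err("bad".to_string()));
}
```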
<h2 id="next-mut-self-result-option-t-e"><a class="anchor-link" href="#next-mut-self-result-option-t-e" aria-label="Anchor link for: next-mut-self-result-option-t-e">#</a>
<code>next(&mut self) -> Result<Option<T>, E></code></h2>
<p>This pattern is slightly more cumbersome to deal with, as it is not a <em>real</em>
<code>impl Iterator</code>. But there is the <a href="https://docs.rs/fallible-iterator"><code>fallible-iterator</code> crate</a>
to make this more convenient.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">// we can propagate errors early:
</span><span style="color:#fa6e32;">while let </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(item) </span><span style="color:#ed9366;">=</span><span> iter</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">? </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// stop on first error:
</span><span style="color:#fa6e32;">while let </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(item)) </span><span style="color:#ed9366;">=</span><span> iter</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>() {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// or we can skip over errors in a weird way:
</span><span style="color:#fa6e32;">loop </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> item </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">match</span><span> iter</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>() {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(item)) </span><span style="color:#ed9366;">=></span><span> item</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="font-style:italic;color:#55b4d4;">None</span><span>) </span><span style="color:#ed9366;">=> </span><span style="color:#fa6e32;">break</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ed9366;">_ => </span><span style="color:#fa6e32;">continue</span><span style="color:#61676ccc;">,
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span>}
</span></code></pre>
<h2 id="whats-the-difference"><a class="anchor-link" href="#whats-the-difference" aria-label="Anchor link for: whats-the-difference">#</a>
What's the difference?</h2>
<p>Well, not a whole lot, to be honest. I would say it comes down to a matter of
taste which variant people choose.</p>
<p>However, in my opinion they do express different intentions; let me explain.</p>
<p>To me, the <code>Option&lt;Result&lt;T, E&gt;&gt;</code> pattern signals that the producer knows there
is something to parse, or, phrased the other way around: the producer knows when
the end is reached without actively parsing anything.</p>
<p>The <code>Result<Option<T>, E></code> pattern however says that the producer has no idea if
it is at the end, unless it tries to parse more, which can obviously fail.</p>
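<p>To illustrate the distinction, here is a sketch with two hypothetical producers, <code>KnowsEnd</code> and <code>MustParse</code> (both made up for this example):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">use std::num::ParseIntError;

// `Option&lt;Result&lt;T, E&gt;&gt;`: the producer can tell "no more items" without parsing.
struct KnowsEnd {
    chunks: Vec&lt;&amp;'static str&gt;,
}

impl Iterator for KnowsEnd {
    type Item = Result&lt;u32, ParseIntError&gt;;
    fn next(&amp;mut self) -&gt; Option&lt;Self::Item&gt; {
        // Reaching the end is just running out of chunks; no parsing needed.
        let chunk = self.chunks.pop()?;
        Some(chunk.parse())
    }
}

// `Result&lt;Option&lt;T&gt;, E&gt;`: the producer only learns about the end by parsing.
struct MustParse {
    rest: &amp;'static str,
}

impl MustParse {
    fn next(&amp;mut self) -&gt; Result&lt;Option&lt;u32&gt;, ParseIntError&gt; {
        if self.rest.is_empty() {
            return Ok(None);
        }
        let (head, tail) = match self.rest.split_once(',') {
            Some((h, t)) =&gt; (h, t),
            None =&gt; (self.rest, ""),
        };
        self.rest = tail;
        head.parse().map(Some)
    }
}

fn main() {
    let mut a = KnowsEnd { chunks: vec!["3", "2", "1"] };
    assert_eq!(a.next(), Some(Ok(1))); // `pop` takes from the back

    let mut b = MustParse { rest: "1,2" };
    assert_eq!(b.next(), Ok(Some(1)));
    assert_eq!(b.next(), Ok(Some(2)));
    assert_eq!(b.next(), Ok(None));
}
</code></pre>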
<h2 id="can-i-ignore-errors"><a class="anchor-link" href="#can-i-ignore-errors" aria-label="Anchor link for: can-i-ignore-errors">#</a>
Can I ignore errors?</h2>
<p>Well, it's complicated.</p>
<p>If you want to be safe, the answer is no. You have to propagate the first error
you see. As the two issues I linked in the beginning show, you can never
assume that the producer is well behaved. It might just return the same parse
error over and over again until infinity, which is a super bad place to be in.
Or it might skip ahead to the next parsable item. Or it might behave like a
"fused" iterator and return <code>None</code> afterwards. You really can’t tell. And that
is the problem I wanted to highlight.</p>
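<p>A sketch of such an ill-behaved producer (the <code>Stuck</code> type is made up for this example): it never advances past a bad token, so naively skipping errors would loop forever.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">use std::num::ParseIntError;

// A hypothetical ill-behaved producer: on a parse error it does not advance,
// so it yields the same error over and over again.
struct Stuck {
    tokens: Vec&lt;&amp;'static str&gt;,
    pos: usize,
}

impl Iterator for Stuck {
    type Item = Result&lt;u32, ParseIntError&gt;;
    fn next(&amp;mut self) -&gt; Option&lt;Self::Item&gt; {
        let tok = *self.tokens.get(self.pos)?;
        match tok.parse() {
            Ok(n) =&gt; {
                self.pos += 1;
                Some(Ok(n))
            }
            // Bug: `pos` is not advanced on error.
            Err(e) =&gt; Some(Err(e)),
        }
    }
}

fn main() {
    let iter = Stuck { tokens: vec!["1", "oops", "3"], pos: 0 };
    // Cap the *pulls* with `take` before skipping errors; `.flatten().take(n)`
    // alone would still poll the stuck iterator forever.
    let taken: Vec&lt;u32&gt; = iter.take(10).flatten().collect();
    assert_eq!(taken, vec![1]); // we never get past "oops" to reach 3
}
</code></pre>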
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>Parsing is hard, and errors will inevitably happen. How to deal with them depends
on the use case. In general, I do have some tips from experience:</p>
<ul>
<li>As a <em>consumer</em>, assume the worst and always propagate errors by default.</li>
<li>If you want to be lenient, audit the producer code carefully to make sure it
recovers or terminates after errors correctly.</li>
<li>Be mindful that any update can break these assumptions!</li>
<li>If in doubt, error out.</li>
<li>As a <em>producer</em>, make sure that your iterators always terminate no matter what.</li>
<li>It might be good to have tests that just do <code>for _ in iter {}</code> or the equivalent
for <code>Result<Option<T>, E></code> to check that iterators do not loop infinitely.</li>
<li>The above test is perfect for fuzzing btw ;-)</li>
<li><em>Document</em> the behavior of your iterators!</li>
</ul>
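<p>The drain-test from the list above could be sketched like this (with a made-up <code>parse_all</code> producer, and a fuel budget so that a looping iterator fails the test instead of hanging it):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">// Stand-in for whatever fallible iterator you produce.
fn parse_all(input: &amp;str) -&gt; impl Iterator&lt;Item = Result&lt;u32, std::num::ParseIntError&gt;&gt; + '_ {
    input.split(',').map(|tok| tok.parse::&lt;u32&gt;())
}

// `take` caps how many items we pull; a well-behaved iterator ends earlier.
fn terminates&lt;I: Iterator&gt;(iter: I, fuel: usize) -&gt; bool {
    iter.take(fuel).count() &lt; fuel
}

fn main() {
    // Drains the iterator even across parse errors, with a generous budget.
    assert!(terminates(parse_all("1,2,oops,4"), 1_000));
}
</code></pre>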
<p>Well, this is it. Which of these two patterns do you prefer? Do you agree with my
assessment about which intentions these patterns express?</p>
Self-referential structs and alternatives2022-05-01T00:00:00+00:002022-05-01T00:00:00+00:00
Unknown
https://swatinem.de/blog/self-reference-alternatives/<p>Today I want to talk about the need for better tools to work with self-referential
structs, and also (safe) alternatives that alleviate that need for certain use-cases.</p>
<p>Let's start by giving a very concrete example of what I want to achieve. I want
to get the <code>nth</code> line of a string <em>quickly</em>. The idiomatic way to do it would
be <code>string.lines().nth(nth)</code>, and it can hardly get any simpler than that. The
problem is that it is not really <em>quick</em>. It is an <code>O(n)</code> operation, as
it has to walk the string from the beginning, up to the very end in the worst
case when there is no newline at all. We don’t want to be doing that in a tight loop.</p>
<p>We can trade some memory usage for speed by doing the line splitting once and
caching the result. With that, getting the <code>nth</code> line becomes a <code>O(1)</code> operation.</p>
<p>This could be as simple as <code>string.lines().collect::&lt;Vec&lt;_&gt;&gt;()</code>, or, as a complete
example, literally a piece of code <a href="https://swatinem.de/blog/self-reference-alternatives/bcsymbolmap">we use in production</a>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">BorrowedCachedLines</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> lines</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data str</span><span>>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">BorrowedCachedLines</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">new</span><span>(</span><span style="color:#ff8f40;">string</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data str</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> lines</span><span style="color:#61676ccc;">:</span><span> string</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">lines</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">collect</span><span>()</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">nth</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">line</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>lines</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(line)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">copied</span><span>()
</span><span> }
</span><span>}
</span></code></pre>
<p>The problem with that piece of code is that it has a lifetime and is thus not
self-contained, so we can’t capture it in an <code>async move</code> future that we
want to spawn.</p>
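<p>The <code>'static</code> requirement is easiest to demonstrate without any async runtime, since <code>std::thread::spawn</code> imposes the same bound on its closure (the commented-out line is the one that fails to compile):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">pub struct BorrowedCachedLines&lt;'data&gt; {
    lines: Vec&lt;&amp;'data str&gt;,
}

impl&lt;'data&gt; BorrowedCachedLines&lt;'data&gt; {
    pub fn new(string: &amp;'data str) -&gt; Self {
        Self { lines: string.lines().collect() }
    }

    pub fn nth(&amp;self, line: usize) -&gt; Option&lt;&amp;str&gt; {
        self.lines.get(line).copied()
    }
}

fn main() {
    let string = String::from("first\nsecond");
    let cached = BorrowedCachedLines::new(&amp;string);

    // Does not compile: `spawn` (like spawning a future) requires `'static`,
    // but `cached` borrows from the local `string`:
    // std::thread::spawn(move || cached.nth(1).map(str::len));

    assert_eq!(cached.nth(1), Some("second"));
}
</code></pre>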
<h1 id="unsafe-self-references"><a class="anchor-link" href="#unsafe-self-references" aria-label="Anchor link for: unsafe-self-references">#</a>
Unsafe self-references</h1>
<p>As I have shown last time, <a href="https://swatinem.de/blog/magic-asref/"><code>AsRef</code> is magic</a>
and we can use it to create self-contained structs. However, our <code>Vec<&'data str></code>
has to have <em>some</em> lifetime. The closest one to “no lifetime” would be the
<code>'static</code> lifetime, which is special as we can put a <code>'static</code> reference into
a self-contained struct.</p>
<p>Let's try using <code>AsRef&lt;str&gt;</code> and combine it with a <code>Vec&lt;&amp;'static str&gt;</code>. This
clearly won’t work, but let's try it anyway to see if the compiler helps us out.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">AsRefUnsafeCachedLines</span><span><Buf> {
</span><span> buf</span><span style="color:#61676ccc;">:</span><span> Buf,
</span><span> lines</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'static str</span><span>>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><Buf</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">AsRef</span><span><</span><span style="color:#fa6e32;">str</span><span>>> </span><span style="color:#399ee6;">AsRefUnsafeCachedLines</span><span><Buf> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">new</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">:</span><span> Buf) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> lines </span><span style="color:#ed9366;">=</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">lines</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">collect</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">Self </span><span>{ buf</span><span style="color:#61676ccc;">,</span><span> lines }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">nth</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">line</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>lines</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(line)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">copied</span><span>()
</span><span> }
</span><span>}
</span></code></pre>
<p>This will give us the following compiler error, which I must admit is confusing
and especially the “help” annotation does not help at all.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0310]: the parameter type `Buf` may not live long enough
</span><span> --> playground\self-referential\src\lib.rs:24:21
</span><span> |
</span><span>22 | impl<Buf: AsRef<str>> AsRefUnsafeCachedLines<Buf> {
</span><span> | ---- help: consider adding an explicit lifetime bound...: `Buf: 'static +`
</span><span>23 | pub fn new(buf: Buf) -> Self {
</span><span>24 | let lines = buf.as_ref().lines().collect();
</span><span> | ^^^ ...so that the type `Buf` is not borrowed for too long
</span></code></pre>
<p>Let's try the “help” anyway and see how that changes things:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0597]: `buf` does not live long enough
</span><span> --> playground\self-referential\src\lib.rs:24:21
</span><span> |
</span><span>24 | let lines = buf.as_ref().lines().collect();
</span><span> | ^^^^^^^^^^^^
</span><span> | |
</span><span> | borrowed value does not live long enough
</span><span> | argument requires that `buf` is borrowed for `'static`
</span><span>25 | Self { buf, lines }
</span><span>26 | }
</span><span> | - `buf` dropped here while still borrowed
</span><span>
</span><span>error[E0505]: cannot move out of `buf` because it is borrowed
</span><span> --> playground\self-referential\src\lib.rs:25:16
</span><span> |
</span><span>24 | let lines = buf.as_ref().lines().collect();
</span><span> | ------------
</span><span> | |
</span><span> | borrow of `buf` occurs here
</span><span> | argument requires that `buf` is borrowed for `'static`
</span><span>25 | Self { buf, lines }
</span><span> | ^^^ move out of `buf` occurs here
</span></code></pre>
<p>While having two error messages here does not make too much sense either, at
least they are better than before and hint at the problem: our <code>lines</code>
has a lifetime that is tied to <code>buf</code>, but it is required to have a <code>'static</code>
lifetime.</p>
<p>What we essentially want to do is to turn our <code>&amp;'data str</code> into a <code>&amp;'static str</code>,
which for obvious reasons is not allowed, so we need to resort to unsafe code.</p>
<p>We can essentially create a <code>&'static str</code> “out of thin air” by using <code>from_raw_parts</code>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">AsRefUnsafeCachedLines</span><span><Buf> {
</span><span> buf</span><span style="color:#61676ccc;">:</span><span> Buf,
</span><span> lines</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'static str</span><span>>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><Buf</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">AsRef</span><span><</span><span style="color:#fa6e32;">str</span><span>>> </span><span style="color:#399ee6;">AsRefUnsafeCachedLines</span><span><Buf> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">new</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">:</span><span> Buf) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> lines </span><span style="color:#ed9366;">=</span><span> buf
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">lines</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">s</span><span>| </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#f07171;">make_static</span><span>(s) })
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">collect</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">Self </span><span>{ buf</span><span style="color:#61676ccc;">,</span><span> lines }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">nth</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">line</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>lines</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(line)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">copied</span><span>()
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">unsafe fn </span><span style="color:#f29718;">make_static</span><span>(</span><span style="color:#ff8f40;">s</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'static str </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> ptr </span><span style="color:#ed9366;">=</span><span> s</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> len </span><span style="color:#ed9366;">=</span><span> s</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> static_slice</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'static </span><span>[</span><span style="color:#fa6e32;">u8</span><span>] </span><span style="color:#ed9366;">= </span><span>std</span><span style="color:#ed9366;">::</span><span>slice</span><span style="color:#ed9366;">::</span><span>from_raw_parts(ptr</span><span style="color:#61676ccc;">,</span><span> len)</span><span style="color:#61676ccc;">;
</span><span> std</span><span style="color:#ed9366;">::</span><span>str</span><span style="color:#ed9366;">::</span><span>from_utf8_unchecked(static_slice)
</span><span>}
</span></code></pre>
<p>This example works, but the compiler says something very interesting:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>warning: field is never read: `buf`
</span><span> --> playground\self-referential\src\lib.rs:18:5
</span><span> |
</span><span>18 | buf: Buf,
</span><span> | ^^^^^^^^
</span><span> |
</span><span> = note: `#[warn(dead_code)]` on by default
</span></code></pre>
<p>It is completely right. We never use <code>buf</code> directly. It only exists to, well,
<em>exist</em>.</p>
<p>As a reminder, this code is extremely unsafe! While we are dealing with <code>&str</code>
in our example, remember that a type such as <code>[u8; N]</code> implements <code>AsRef<[u8]></code>.
I leave it as an exercise for the reader to reproduce the case that moving a
stack allocated <code>[u8; N]</code> will make you read from a dangling pointer.</p>
<p>Having something along the lines of <code>StableAsRef<T></code> would help here. By which
I mean any type that guarantees that the reference returned by <code>as_ref()</code> does
not change whenever the type itself is moved. <code>Box<[T]></code> and <code>Vec<T></code> would
implement this, but not <code>[T; N]</code>. A trait with such a guarantee would make it
possible to write a safe abstraction provided that code behind the abstraction
upholds the safety invariants (as in: not moving/modifying the data in the buffer).</p>
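<p>A sketch of what such a hypothetical trait could look like; nothing like <code>StableAsRef</code> exists in the standard library today, and all names here are made up. The <code>unsafe</code> marker puts the burden of the stability guarantee on the implementor:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">/// Hypothetical marker trait: implementors promise that the reference
/// returned by `as_ref()` stays valid when `Self` itself is moved.
unsafe trait StableAsRef&lt;T: ?Sized&gt;: AsRef&lt;T&gt; {}

// Heap-backed containers qualify: moving the owner moves only the
// (pointer, length, capacity) triple, not the bytes it points to.
unsafe impl StableAsRef&lt;str&gt; for String {}
unsafe impl&lt;T&gt; StableAsRef&lt;[T]&gt; for Vec&lt;T&gt; {}
unsafe impl&lt;T&gt; StableAsRef&lt;[T]&gt; for Box&lt;[T]&gt; {}
// Deliberately NOT implemented for `[T; N]`: its bytes live inline and
// move together with the value.

fn stable_ptr&lt;B: StableAsRef&lt;str&gt;&gt;(buf: &amp;B) -&gt; *const u8 {
    buf.as_ref().as_ptr()
}

fn main() {
    let s = String::from("hello");
    let before = stable_ptr(&amp;s);
    let moved = s; // moving the owner...
    assert_eq!(before, stable_ptr(&amp;moved)); // ...leaves the heap bytes in place
}
</code></pre>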
<h1 id="almost-safe-alternatives"><a class="anchor-link" href="#almost-safe-alternatives" aria-label="Anchor link for: almost-safe-alternatives">#</a>
(Almost) Safe alternatives</h1>
<p>The lifetime issues and the need for unsafe code to work around them come from
the fact that we are caching Rust references (aka pointers). What if we cache
sub-slice offsets into our source buffer instead? Let's try that.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">SafeCachedLines</span><span><Buf> {
</span><span> buf</span><span style="color:#61676ccc;">:</span><span> Buf,
</span><span> line_offsets</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><std</span><span style="color:#ed9366;">::</span><span>ops</span><span style="color:#ed9366;">::</span><span>Range<</span><span style="color:#fa6e32;">usize</span><span>>>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><Buf</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">AsRef</span><span><</span><span style="color:#fa6e32;">str</span><span>>> </span><span style="color:#399ee6;">SafeCachedLines</span><span><Buf> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">new</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">:</span><span> Buf) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> as_ref </span><span style="color:#ed9366;">=</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> start_ptr </span><span style="color:#ed9366;">=</span><span> as_ref</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> line_offsets </span><span style="color:#ed9366;">=</span><span> as_ref
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">lines</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">s</span><span>| {
</span><span> </span><span style="color:#fa6e32;">let</span><span> start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ s</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">offset_from</span><span>(start_ptr) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize </span><span>}</span><span style="color:#61676ccc;">;
</span><span> start</span><span style="color:#ed9366;">..</span><span>start </span><span style="color:#ed9366;">+</span><span> s</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>()
</span><span> })
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">collect</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">Self </span><span>{ buf</span><span style="color:#61676ccc;">,</span><span> line_offsets }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">nth</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">line</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> range </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>line_offsets</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(line)</span><span style="color:#ed9366;">?.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(range)
</span><span> }
</span><span>}
</span></code></pre>
<p>NOTE: While working on this example, I was a bit surprised there is no safe
function in the standard library for “give me the <code>Range</code> of a sub-string if it
is contained within <code>self</code>”. I’m also surprised that <code>offset_from</code> is <code>unsafe</code>,
but reading the documentation makes it clear why.</p>
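<p>Such a helper can be sketched in safe code by comparing addresses as plain integers (<code>substr_range</code> is a made-up name, not a standard library function):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">/// The `Range` of `sub` within `outer`, or `None` if `sub` does not point
/// into `outer`'s buffer. Pure integer comparisons on addresses; no unsafe.
fn substr_range(outer: &amp;str, sub: &amp;str) -&gt; Option&lt;std::ops::Range&lt;usize&gt;&gt; {
    let start = (sub.as_ptr() as usize).checked_sub(outer.as_ptr() as usize)?;
    let end = start.checked_add(sub.len())?;
    (end &lt;= outer.len()).then(|| start..end)
}

fn main() {
    let text = String::from("first\nsecond");
    let second = text.lines().nth(1).unwrap();
    assert_eq!(substr_range(&amp;text, second), Some(6..12));

    let unrelated = String::from("elsewhere");
    assert_eq!(substr_range(&amp;text, &amp;unrelated), None);
}
</code></pre>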
<p>Anyhow, the added safety means that we have to do an additional slice operation
on access, with the added overhead of a bounds check.</p>
<h1 id="optimizing-offsets"><a class="anchor-link" href="#optimizing-offsets" aria-label="Anchor link for: optimizing-offsets">#</a>
Optimizing offsets</h1>
<p>Depending on the tradeoffs we want to make, we can even go one step further and
try to optimize our lookup index a bit. A full <code>Range&lt;usize&gt;</code> is <em>16</em> bytes on
64-bit platforms. We can get that down to <em>4</em> bytes if we assume our input
buffer is smaller than 4 GiB, which, if we are dealing in plain text, is a lot.</p>
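<p>The size claim is easy to check (the numbers assume a 64-bit target):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust">use std::mem::size_of;
use std::ops::Range;

fn main() {
    // Two `usize` offsets vs a single `u32` start offset per line.
    assert_eq!(size_of::&lt;Range&lt;usize&gt;&gt;(), 16);
    assert_eq!(size_of::&lt;u32&gt;(), 4);
}
</code></pre>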
<p>The tradeoff here however is some added error handling when constructing the
cache, and a few more bounds checks and stripping trailing line terminators
when accessing. Here is the full example:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">CompressedCachedLines</span><span><Buf> {
</span><span> buf</span><span style="color:#61676ccc;">:</span><span> Buf,
</span><span> line_offsets</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#fa6e32;">u32</span><span>>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><Buf</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">AsRef</span><span><</span><span style="color:#fa6e32;">str</span><span>>> </span><span style="color:#399ee6;">CompressedCachedLines</span><span><Buf> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">new</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">:</span><span> Buf) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><</span><span style="color:#fa6e32;">Self</span><span>, std</span><span style="color:#ed9366;">::</span><span>num</span><span style="color:#ed9366;">::</span><span>TryFromIntError> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> as_ref </span><span style="color:#ed9366;">=</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> start_ptr </span><span style="color:#ed9366;">=</span><span> as_ref</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> line_offsets </span><span style="color:#ed9366;">=</span><span> as_ref
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">lines</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">s</span><span>| </span><span style="color:#fa6e32;">unsafe </span><span>{ s</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">offset_from</span><span>(start_ptr)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">try_into</span><span>() })
</span><span> </span><span style="color:#ed9366;">.</span><span>collect</span><span style="color:#ed9366;">::</span><span><</span><span style="font-style:italic;color:#55b4d4;">Result</span><span><</span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#ed9366;">_</span><span>>, </span><span style="color:#ed9366;">_</span><span>>>()</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#fa6e32;">Self </span><span>{ buf</span><span style="color:#61676ccc;">,</span><span> line_offsets })
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">nth</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">line</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> start </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>line_offsets</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(line)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">copied</span><span>()</span><span style="color:#ed9366;">? as </span><span style="color:#fa6e32;">usize</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> end </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self
</span><span> </span><span style="color:#ed9366;">.</span><span>line_offsets
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(line</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">checked_add</span><span>(</span><span style="color:#ff8f40;">1</span><span>)</span><span style="color:#ed9366;">?</span><span>)
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">offset</span><span>| </span><span style="color:#ed9366;">*</span><span>offset </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap_or</span><span>(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>())</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> line_buf </span><span style="color:#ed9366;">=</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(start</span><span style="color:#ed9366;">..</span><span>end)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> line_buf </span><span style="color:#ed9366;">=</span><span> line_buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">strip_suffix</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#4cbf99;">\r\n</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap_or</span><span>(line_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> line_buf </span><span style="color:#ed9366;">=</span><span> line_buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">strip_suffix</span><span>(</span><span style="color:#86b300;">'</span><span style="color:#4cbf99;">\n</span><span style="color:#86b300;">'</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap_or</span><span>(line_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(line_buf)
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">compressed</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> string </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">String</span><span style="color:#ed9366;">::</span><span>from(</span><span style="color:#86b300;">"some</span><span style="color:#4cbf99;">\r\n</span><span style="color:#86b300;">trailing</span><span style="color:#4cbf99;">\r\r\n</span><span style="color:#86b300;">lines"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> cached_lines </span><span style="color:#ed9366;">= </span><span>CompressedCachedLines</span><span style="color:#ed9366;">::</span><span>new(string)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(cached_lines</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">nth</span><span>(</span><span style="color:#ff8f40;">0</span><span>)</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(</span><span style="color:#86b300;">"some"</span><span>))</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(cached_lines</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">nth</span><span>(</span><span style="color:#ff8f40;">1</span><span>)</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(</span><span style="color:#86b300;">"trailing</span><span style="color:#4cbf99;">\r</span><span style="color:#86b300;">"</span><span>))</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(cached_lines</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">nth</span><span>(</span><span style="color:#ff8f40;">2</span><span>)</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(</span><span style="color:#86b300;">"lines"</span><span>))</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>Having such a list of line offsets allows us not only to get the <code>nth</code> line in
<code>O(1)</code> time (just a constant number of bounds checks, and some
constant-time <code>strip_suffix</code> calls), but also to do a reverse lookup from byte
offset to line number via an <code>O(log N)</code> binary search bounded by the number
of lines we have, which is an added benefit.</p>
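<p>As a sketch of that reverse lookup (assuming the same sorted <code>line_offsets</code>
list as above; <code>offset_to_line</code> is a name I am introducing here, not part of
the original type):</p>

```rust
/// Maps a byte offset back to its 0-based line number via a binary
/// search over the sorted line-start offsets.
fn offset_to_line(line_offsets: &[u32], offset: u32) -> usize {
    match line_offsets.binary_search(&offset) {
        // The offset is exactly the start of line `idx`.
        Ok(idx) => idx,
        // Otherwise it falls within the line that starts just before it.
        Err(idx) => idx.saturating_sub(1),
    }
}

fn main() {
    // Line starts for "some\r\ntrailing\r\r\nlines": 0, 6, 17.
    let line_offsets = [0u32, 6, 17];
    assert_eq!(offset_to_line(&line_offsets, 0), 0);
    assert_eq!(offset_to_line(&line_offsets, 7), 1); // inside "trailing\r"
    assert_eq!(offset_to_line(&line_offsets, 20), 2); // inside "lines"
}
```

<p>Because <code>binary_search</code> only touches the offsets list, the lookup never
needs to scan the underlying text buffer at all.</p>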
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>The last example is very close to my real-world use case. I essentially want to
<code>mmap</code> such indexed files directly from disk. The mentioned bounds checks are
essential there, since I would be dealing with untrusted, possibly
malicious data.</p>
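<p>For flavor, here is a stdlib-only sketch of what such a defensive read over
untrusted bytes looks like (<code>read_u32_at</code> is a hypothetical helper, not code
from the project):</p>

```rust
/// Reads a little-endian `u32` at `offset` from an untrusted buffer,
/// returning `None` instead of panicking (or worse) on truncated or
/// malicious input.
fn read_u32_at(buf: &[u8], offset: usize) -> Option<u32> {
    // `checked_add` guards against offset overflow; `get` bounds-checks.
    let bytes = buf.get(offset..offset.checked_add(4)?)?;
    Some(u32::from_le_bytes(bytes.try_into().ok()?))
}

fn main() {
    let buf = [1u8, 0, 0, 0, 42, 0, 0, 0];
    assert_eq!(read_u32_at(&buf, 4), Some(42));
    // Out-of-bounds and overflowing offsets are rejected, not undefined behavior:
    assert_eq!(read_u32_at(&buf, 6), None);
    assert_eq!(read_u32_at(&buf, usize::MAX), None);
}
```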
The magic of AsRef2022-04-20T00:00:00+00:002022-04-20T00:00:00+00:00
Unknown
https://swatinem.de/blog/magic-asref/<p>Both at work, and also personally, I do think about efficient parsers and data
formats a lot. Some time ago, I also wrote an article about
<a href="https://swatinem.de/blog/binary-formats/">writing a custom binary format</a> and
associated parser. That exercise started something like this:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Header </span><span>{
</span><span> version</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>],
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> Header,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">parse</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// TODO:
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * actually verify the version
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * ensure the buffer is actually valid
</span><span> Format {
</span><span> buf</span><span style="color:#61676ccc;">,
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> Header) }</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>While this works perfectly fine, and the <code>Format</code> is truly zero-copy, it does
have one major drawback. It has the lifetime parameter <code>'data</code>, and is thus not
<code>'static</code>. I can’t capture it in an <code>async move</code> closure and <code>tokio::spawn</code> it.
Also, for reasons that I must admit I didn’t fully understand at first, trait objects
always carry an implicit <code>'static</code> bound by default. Though now, thinking
about this again, it becomes a bit more obvious to me: if I want to package
up a callback function into a struct of mine that does not carry a lifetime
itself, I have to use a <code>Box<dyn Fn() + 'static></code> or an equivalent container.</p>
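<p>The trait-object point above can be sketched in a few lines (the struct and
field names here are mine, purely for illustration):</p>

```rust
struct Callbacks {
    // With no lifetime parameter on `Callbacks`, this trait object
    // defaults to `Box<dyn Fn() -> String + 'static>`, so the closure
    // may not capture borrows of non-'static data.
    on_event: Box<dyn Fn() -> String>,
}

fn main() {
    let owned = String::from("hello");
    let cb = Callbacks {
        // `move` transfers ownership of `owned` into the closure,
        // which makes the closure itself `'static`.
        on_event: Box::new(move || owned.clone()),
    };
    assert_eq!((cb.on_event)(), "hello");
    // Capturing `&owned` by reference instead would fail to compile,
    // precisely because of the implicit `'static` bound.
}
```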
<p>Either way, for various reasons, we want to have fully “self-owned” types that
are <code>'static</code>, and our example <code>Format</code> above is not self-contained.</p>
<p>There are a couple of different approaches to this, but the go-to solution I have
found, which seems to offer the most flexibility to API users, is to
use <code>AsRef<T></code>, and in our specific case <code>AsRef<[u8]></code>, so let’s try to use that.</p>
<p>Without further ado, here is the finished demo code, along with tests that
ensure things work as intended, and that our final <code>Format</code> is indeed <code>'static</code>.
We can use any kind of underlying buffer type, no matter if it’s an array, a <code>Vec</code>,
a <code>Cow</code> or a memory-mapped file, as long as it implements <code>AsRef<[u8]></code>.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>{mem</span><span style="color:#61676ccc;">,</span><span> ptr}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Clone</span><span style="color:#61676ccc;">,</span><span> Copy)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Header </span><span>{
</span><span> version</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Format</span><span><Buf> {
</span><span> buf</span><span style="color:#61676ccc;">:</span><span> Buf,
</span><span> header</span><span style="color:#61676ccc;">:</span><span> Header,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">A</span><span>(</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">B</span><span>(</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><Buf</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">AsRef</span><span><[</span><span style="color:#fa6e32;">u8</span><span>]>> </span><span style="color:#399ee6;">Format</span><span><Buf> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">parse</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">:</span><span> Buf) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// TODO:
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * actually verify the version
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * ensure the buffer is actually valid
</span><span> </span><span style="color:#fa6e32;">let</span><span> header </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">*</span><span>(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> Header) }</span><span style="color:#61676ccc;">;
</span><span> Format { buf</span><span style="color:#61676ccc;">,</span><span> header }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">into_inner</span><span>(</span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-></span><span> Buf {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">get_as</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span>[A] {
</span><span> </span><span style="color:#fa6e32;">let</span><span> a_start </span><span style="color:#ed9366;">=
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><Header>()) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> A }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a_slice </span><span style="color:#ed9366;">= </span><span>ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(a_start</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>a_slice }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">get_bs</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span>[B] {
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ref</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><Header>())
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><A>() </span><span style="color:#ed9366;">* </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> B
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_slice </span><span style="color:#ed9366;">= </span><span>ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(b_start</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>b_slice }
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">format_works</span><span>() {
</span><span> </span><span style="color:#fa6e32;">use </span><span>std</span><span style="color:#ed9366;">::</span><span>borrow</span><span style="color:#ed9366;">::</span><span>Cow</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">is_static</span><span><T</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">'static</span><span>>(</span><span style="color:#ed9366;">_</span><span>: </span><span style="color:#ed9366;">&</span><span>T) {}
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> array_buf</span><span style="color:#61676ccc;">: </span><span>[</span><span style="color:#fa6e32;">u8</span><span style="color:#61676ccc;">; </span><span style="color:#ff8f40;">24</span><span>] </span><span style="color:#ed9366;">= </span><span>[
</span><span>        </span><span style="font-style:italic;color:#abb0b6;">// these are all little-endian:
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// version
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_a
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_b
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0]
</span><span> </span><span style="color:#ff8f40;">4</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// b[0]
</span><span> </span><span style="color:#ff8f40;">5</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// b[1]
</span><span> ]</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed</span><span style="color:#61676ccc;">: </span><span>Format<[</span><span style="color:#fa6e32;">u8</span><span style="color:#61676ccc;">; </span><span style="color:#ff8f40;">24</span><span>]> </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(array_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">is_static</span><span>(</span><span style="color:#ed9366;">&</span><span>parsed)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[A(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">5</span><span>)])</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> vec_buf </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span style="color:#ed9366;">::</span><span>from(array_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed</span><span style="color:#61676ccc;">: </span><span>Format<</span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#ed9366;">_</span><span>>> </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(vec_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">is_static</span><span>(</span><span style="color:#ed9366;">&</span><span>parsed)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[A(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">5</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> vec_buf </span><span style="color:#ed9366;">=</span><span> parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_inner</span><span>()</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> cow_buf</span><span style="color:#61676ccc;">: </span><span>Cow<</span><span style="color:#fa6e32;">'static</span><span>, [</span><span style="color:#fa6e32;">u8</span><span>]> </span><span style="color:#ed9366;">= </span><span>Cow</span><span style="color:#ed9366;">::</span><span>Owned(vec_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed</span><span style="color:#61676ccc;">: </span><span>Format<Cow<</span><span style="color:#ed9366;">_</span><span>>> </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(cow_buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">is_static</span><span>(</span><span style="color:#ed9366;">&</span><span>parsed)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[A(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">5</span><span>)])</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> slice_buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span>[</span><span style="color:#fa6e32;">u8</span><span>] </span><span style="color:#ed9366;">= &</span><span>array_buf</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed</span><span style="color:#61676ccc;">: </span><span>Format<</span><span style="color:#ed9366;">&</span><span>[</span><span style="color:#fa6e32;">u8</span><span>]> </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(slice_buf)</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// is_static(&parsed);
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ^ this would fail with:
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// error[E0597]: `array_buf` does not live long enough
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// --> playground/asref/src/lib.rs:89:28
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// |
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// 89 | let slice_buf: &[u8] = &array_buf;
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// | ^^^^^^^^^^
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// | |
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// | borrowed value does not live long enough
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// | cast requires that `array_buf` is borrowed for `'static`
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// ...
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// 94 | }
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// | - `array_buf` dropped here while still borrowed
</span><span>
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[A(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">5</span><span>)])</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>The one shortcoming this format has, though, is that it is not fully zero-copy
anymore: the <code>parse()</code> method does copy the header bytes. To avoid that,
we would need better (and safe) ways to declare self-referential structs.
But that is a topic for another post ;-)</p>
Dreaming of a balanced week2022-02-20T00:00:00+00:002022-02-20T00:00:00+00:00
Unknown
https://swatinem.de/blog/balanced-weeks/<p>As an engineer at heart, I sometimes think about social engineering challenges.
Or rather, about how to approach some social problems with out-of-the-box thinking.</p>
<p>This time, I want to reflect on some ideas for solving the problem that an
unbalanced work week creates.</p>
<h1 id="problem-statement"><a class="anchor-link" href="#problem-statement" aria-label="Anchor link for: problem-statement">#</a>
Problem Statement</h1>
<p>To understand what I mean by that, let’s first describe the current
unsatisfactory state.</p>
<p>I think we have a resource allocation problem, the resource being time.
Considering someone who works a traditional nine-to-five (or 10 to 6) job, there
are only very limited time slots available for chores, errands, and leisure.
This leads to the very real problem that grocery stores are extremely
congested on Saturdays, as are leisure spots such as thermal spas and
hiking routes.</p>
<p>These same places are mostly empty on weekdays, when the majority of the
population is working. The only people able to enjoy these activities are
retirees, school children during vacation, and tourists.</p>
<p>The same happens for traffic, both public and individual. Streets and public
transport are completely congested on certain weekdays and at certain times of day,
while they are no problem at all at others. The same goes for the availability of parking.</p>
<p>As a member of the "working class", I can only do certain things at the same
time as <em>everyone else</em> does them. I have always felt uncomfortable in
large crowds. And if the pandemic has taught us anything, it should be that
large crowds of people are not a good thing in general, and should be avoided.</p>
<h1 id="lets-start-with-the-obvious"><a class="anchor-link" href="#lets-start-with-the-obvious" aria-label="Anchor link for: lets-start-with-the-obvious">#</a>
Let’s start with the obvious</h1>
<p>So how do we work towards a proposal to solve this congestion and allocation
problem?</p>
<p>Let’s start with the obvious and reconsider the standard full-time nine-to-five
workday. I very much appreciate that multiple European countries are
experimenting with lowering weekly work hours and introducing 4-day work
weeks.</p>
<p>As for myself, my current job is the first one where I work full-time for an
extended period of time. Previously I was only doing part-time work, in the
range of 20-30 hour weeks.</p>
<p>As I write this article, I want to say that I’m surprised myself at how well
I cope with this, all things considered. But then I remember that I do suffer a
bit from the “I don’t have time for anything” anxiety. Not to mention that the
time I spent on recreation over the past 2-3 years was essentially zero, and my
health and fitness suffered a lot.</p>
<p>Suffice it to say, the standard 40-hour work week just does not offer a good
work-life balance; we need to re-think it. Also, with fewer hours per person,
we can hire two people to do the job of one, which creates jobs, and is a good
thing. It also helps increase the bus factor.</p>
<h1 id="everything-anytime"><a class="anchor-link" href="#everything-anytime" aria-label="Anchor link for: everything-anytime">#</a>
Everything, Anytime</h1>
<p>Once we move from a 5 day work week to 4 or even 3 days, the next step is to
more evenly distribute that time. The current congestion problem comes from the
fact that everyone has weekends on the same days.</p>
<p>Along with fewer working hours, we need total freedom to choose <em>when</em> to work.
The goal here is to have everything available any time of the week. You can go
hiking, do groceries, or visit a spa or museum on any day of the week.</p>
<p>It also means that work is being done every day of the week, which is also good
for companies.</p>
<h1 id="schools"><a class="anchor-link" href="#schools" aria-label="Anchor link for: schools">#</a>
Schools</h1>
<p>The freedom to choose might not work as well for the education sector, which needs
a more rigid framework. For this to work, I propose to at the very least keep
each grade on the same schedule. I also propose to stagger classes, so a
first-grader’s schedule is shifted by one day compared to a second-grader’s.
Depending on whether we go with a 4-day or 3-day week, you can have friends 2 grades
above or below and share 1 or 2 days in common.</p>
<h1 id="can-we-make-it"><a class="anchor-link" href="#can-we-make-it" aria-label="Anchor link for: can-we-make-it">#</a>
Can we make it?</h1>
<p>To be honest, I think this proposal is super simple, and might just work. The
question is: how do we get there?</p>
<p>As I started this exploration, I mentioned the <em>thinking outside the box</em> mindset.
I think the reason we have this weekday/weekend split is simply
<em>because it was always done this way</em>. In Christian societies, this might come
from the fact that the Bible says that God rested on the seventh day, and
that’s why we shouldn’t work on Sundays. Well, can we maybe abandon this thinking,
as it clearly does not serve us in modern times?</p>
<hr />
<p>To be honest, I think the only policy change that needs to happen is to lift
restrictions; everything else should self-organize.
Currently, grocery stores are prohibited by law from opening
on certain days. And labor laws also restrict people from freely choosing when
to work.</p>
<p>I for one would gladly work on a Sunday, if that means I can have any weekday
off to do chores and enjoy some un-crowded leisure time.</p>
<p>Lifting restrictions would be step one. Policy makers could also go one step
further and mandate that everything operates every day. Things should
self-organize in that situation as well: employers will create financial
incentives to encourage employees to work on “unorthodox” days, and that’s that.</p>
<h1 id="but"><a class="anchor-link" href="#but" aria-label="Anchor link for: but">#</a>
But…</h1>
<p>There are tons of open questions, sure. <em>When should we do meetings?</em> might be one
such question. <em>This meeting should have been an email</em> is the obvious answer to
that. But okay, in all seriousness, there should be plenty of overlap
opportunities.</p>
<p>I think the biggest obstacle is changing the mindset of people. The problem with
the <em>it was always done this way</em> attitude is that people can’t even imagine how
things could be different. Let’s be open to new ideas, and think outside the
box.</p>
Non-Lazy Futures Considered Harmful2022-01-26T00:00:00+00:002022-01-26T00:00:00+00:00
Unknown
https://swatinem.de/blog/non-lazy-futures/<p>Now that I got your attention with the clickbait title, let me explain what I
mean by it. Current Rust code could be broken in very subtle ways because of
some assumptions we have about async Rust code that might not always be true.</p>
<p>This story starts with
<a href="https://github.com/getsentry/sentry-rust/pull/417">a bug I recently fixed in <code>sentry-rust</code></a>
(which manifested itself as a memory leak), and itself highlights both sides of
the problem.</p>
<p>The root cause of the problem is not specific to the Sentry SDK however, and it
can very easily happen in other cases as well. I will use examples from the
<code>tracing</code> ecosystem down below, as that might have a wider user base than Sentry.</p>
<p>The problem boils down to the very subtle difference between <code>async fn X()</code> and
<code>fn X() -> impl Future</code>, and the fact that the second is not guaranteed to be
fully lazy.</p>
<p>An <code>async fn</code> is by definition lazy. Calling it via <code>async_fn()</code> only captures
the arguments into an anonymous <code>Future</code> type, and is guaranteed not to
execute the body of the function <em>yet</em>; that only happens later, on <code>poll</code>.
This is not the case for a function that returns an <code>impl Future</code>, and by
extension for trait functions that return (generic or named) futures.
These functions can have a varying amount of code that runs at <strong>call time</strong>,
vs <strong>poll time</strong>.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">fully_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"print happens on *poll*"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">not_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"print happens on *call*"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> std</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>ready(())
</span><span>}
</span></code></pre>
<p>So if possible, move <em>all</em> of the code into the async block. But that might not
always be possible.</p>
<h1 id="aside-named-types"><a class="anchor-link" href="#aside-named-types" aria-label="Anchor link for: aside-named-types">#</a>
Aside: Named Types</h1>
<p>We do have the problem right now that <code>async fn</code> returns an anonymous type.
Returning <code>impl Future</code> has the same problem. You cannot <em>name</em> the type in
stable Rust today. And named types are required when you want to put a future
on a <code>struct</code> or into a collection. It is also considered good practice in
general to name every public API type of a crate, for that reason.</p>
<p>The same principle is also the reason why every <code>Iterator</code> combinator returns
its own named type. And it is the reason why there is
<a href="https://docs.rs/tracing/latest/tracing/instrument/struct.Instrumented.html"><code>tracing::instrument::Instrumented</code></a>
and <a href="https://docs.rs/sentry/latest/sentry/struct.SentryFuture.html"><code>sentry::SentryFuture</code></a>.</p>
<p>Nightly Rust offers the
<a href="https://github.com/rust-lang/rust/issues/63063"><code>type_alias_impl_trait</code> feature</a>,
which allows giving names to otherwise anonymous futures. Unfortunately, being a nightly feature, it is not available in stable Rust yet.</p>
<p>The next best thing we can use on stable Rust is <code>Pin<Box<dyn Future>></code>, like so:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">dyn_future</span><span>() </span><span style="color:#61676ccc;">-> </span><span>Pin<</span><span style="font-style:italic;color:#55b4d4;">Box</span><span><dyn Future<Output = ()>>> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Box</span><span style="color:#ed9366;">::</span><span>pin(async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"print happens on *poll*"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> })
</span><span>}
</span></code></pre>
<p>That is also the desugaring that the <code>async-trait</code> crate does (well
<a href="https://docs.rs/async-trait/latest/async_trait/#explanation">almost</a>). The
resulting type has a name, and we can put it into <code>struct</code>s and collections.
However it comes with the disadvantage of a heap allocation and dynamic dispatch.
And it is not available for <code>#![no_std]</code> builds either.</p>
<p>So the broader community seems to have settled on creating their own <code>Future</code>
types, which often wrap an inner future via generics.
That on its own comes with a lot of inconvenience, as manually implementing
<code>Future::poll</code> is a nightmare, and often requires manual <code>unsafe</code> code, or
pulling in an external dependency in the form of <a href="https://docs.rs/pin-project/latest/pin_project/">pin-project</a>.</p>
<p>For that reason, I have resorted to the following pattern, and I bet a lot of
other people do so as well:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">returns_named_future</span><span>() </span><span style="color:#61676ccc;">-></span><span> NamedFuture {
</span><span>    </span><span style="font-style:italic;color:#abb0b6;">// do as much logic as possible without having to use `await`
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// return the resulting future, which does the rest inside `poll`
</span><span> NamedFuture
</span><span>}
</span></code></pre>
<h1 id="problematic-tracing-example"><a class="anchor-link" href="#problematic-tracing-example" aria-label="Anchor link for: problematic-tracing-example">#</a>
Problematic <code>tracing</code> example</h1>
<p>To further illustrate the problem, let me demonstrate both problems with a full
practical <code>tracing</code> example, which follows the guidelines as presented in
the <a href="https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code"><code>tracing::Span</code> docs</a>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>std</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>Future</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">use </span><span>tracing</span><span style="color:#ed9366;">::</span><span>Instrument</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">fully_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> tracing</span><span style="color:#ed9366;">::</span><span>info</span><span style="color:#ed9366;">!</span><span>(</span><span style="color:#86b300;">"log happens on *poll*"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">not_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> tracing</span><span style="color:#ed9366;">::</span><span>info</span><span style="color:#ed9366;">!</span><span>(</span><span style="color:#86b300;">"log happens on *call*"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> std</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>ready(())
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">broken_parent_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> span </span><span style="color:#ed9366;">= </span><span>tracing</span><span style="color:#ed9366;">::</span><span>info_span</span><span style="color:#ed9366;">!</span><span>(</span><span style="color:#86b300;">"broken_parent"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">fully_lazy</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">instrument</span><span>(span)
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">broken_parent_not_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> span </span><span style="color:#ed9366;">= </span><span>tracing</span><span style="color:#ed9366;">::</span><span>info_span</span><span style="color:#ed9366;">!</span><span>(</span><span style="color:#86b300;">"broken_parent"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">not_lazy</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">instrument</span><span>(span)
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">correct_parent_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> span </span><span style="color:#ed9366;">= </span><span>tracing</span><span style="color:#ed9366;">::</span><span>info_span</span><span style="color:#ed9366;">!</span><span>(</span><span style="color:#86b300;">"correct_parent"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> span</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">in_scope</span><span>(|| </span><span style="color:#f07171;">fully_lazy</span><span>())</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">instrument</span><span>(span)
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">correct_parent_not_lazy</span><span>() </span><span style="color:#61676ccc;">-></span><span> impl Future<Output = ()> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> span </span><span style="color:#ed9366;">= </span><span>tracing</span><span style="color:#ed9366;">::</span><span>info_span</span><span style="color:#ed9366;">!</span><span>(</span><span style="color:#86b300;">"correct_parent"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> span</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">in_scope</span><span>(|| </span><span style="color:#f07171;">not_lazy</span><span>())</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">instrument</span><span>(span)
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">tokio</span><span>::</span><span style="color:#f29718;">main</span><span>]
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">main</span><span>() {
</span><span> tracing_subscriber</span><span style="color:#ed9366;">::</span><span>fmt</span><span style="color:#ed9366;">::</span><span>init()</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#f07171;">broken_parent_lazy</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">broken_parent_not_lazy</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#f07171;">correct_parent_lazy</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">correct_parent_not_lazy</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>Running this example outputs the following:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>2022-01-26T14:48:11.115115Z INFO broken_parent: lazy_futures: log happens on *poll*
</span><span>2022-01-26T14:48:11.115511Z INFO lazy_futures: log happens on *call*
</span><span>2022-01-26T14:48:11.115732Z INFO correct_parent: lazy_futures: log happens on *poll*
</span><span>2022-01-26T14:48:11.116020Z INFO correct_parent: lazy_futures: log happens on *call*
</span></code></pre>
<p>As we can see, the second log line (coming from <code>broken_parent_not_lazy().await</code>)
is not running in the correct span as intended. The other examples behave correctly,
as either the caller handles the unlikely case of code being executed at <strong>call</strong> time,
or the callee is more correct by not executing any code at <strong>call</strong> time in the first place.</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>I have two recommendations here, covering both sides of the coin:</p>
<ul>
<li>
<p><strong>Do not make assumptions about third-party code being lazy</strong>. Foreign code
might be <em>misbehaving</em> in a way that it executes actual code at <strong>call</strong> time instead of <strong>poll</strong> time.</p>
</li>
<li>
<p><strong>Make all your code fully lazy</strong>. Do not do any computations at <strong>call</strong> time instead of <strong>poll</strong> time.
Foreign code may rely on futures being fully lazy, and you might otherwise
violate its expectations.</p>
</li>
</ul>
Rust Contexts2022-01-21T00:00:00+00:002022-01-21T00:00:00+00:00
Unknown
https://swatinem.de/blog/log-contexts/<p>I realize I might be a bit late to the party, considering
<a href="https://tmandry.gitlab.io/blog/posts/2021-12-21-context-capabilities/">tmandry’s blog post</a>
is already a month old, and there is now even an
<a href="https://github.com/nikomatsakis/context-capabilities-initiative">initiative</a>
focused on contexts.</p>
<p>Alas, I want to explore the way in which implicit contexts could solve some very
real problems that we have with today’s methods of modeling context. As practical
examples, I want to highlight the issues using the <code>sentry-rust</code> and <code>tracing</code> APIs.</p>
<h1 id="footguns"><a class="anchor-link" href="#footguns" aria-label="Anchor link for: footguns">#</a>
Footguns</h1>
<p>IMO the currently available APIs are a bit too easy to mis-use. An example of
this is mentioned directly in the <a href="https://docs.rs/tracing/latest/tracing/#spans"><code>tracing</code> docs</a>:</p>
<blockquote>
<p>In asynchronous code that uses async/await syntax, <code>Span::enter</code> may produce incorrect traces if the returned drop guard is held across an await point.</p>
</blockquote>
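<p>To make that footgun concrete, here is a small sketch of my own (not from the post), assuming the <code>tracing</code> crate and a hypothetical <code>some_async_work()</code> function:</p>

```rust
use tracing::Instrument;

async fn some_async_work() {}

// BUG: the `enter` guard is held across the `.await`. While this task is
// suspended, whatever else the executor runs on this thread is wrongly
// attributed to `my_span`.
async fn wrong() {
    let span = tracing::info_span!("my_span");
    let _guard = span.enter();
    some_async_work().await;
}

// Correct: attach the span to the future itself, so it is entered and
// exited around every single `poll`.
async fn right() {
    let span = tracing::info_span!("my_span");
    some_async_work().instrument(span).await;
}
```

<p>Attaching the span via <code>instrument()</code> is safe across await points precisely because it scopes the span to each <code>poll</code> rather than to a region of source code.</p>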
<p>From a Sentry point of view, the <code>configure_scope</code> and <code>with_scope</code>/<code>push_scope</code> APIs
have a similar potential for misuse. This can cause slight problems, like
tags not being applied correctly, up to bigger problems such as a <code>panic</code> if
scope manipulation is unbalanced.</p>
<h1 id="tradeoffs"><a class="anchor-link" href="#tradeoffs" aria-label="Anchor link for: tradeoffs">#</a>
Tradeoffs</h1>
<p>In essence, I think these problems stem from the tradeoff of favoring
convenience over correctness.</p>
<p>The problem here is that both <a href="https://github.com/tokio-rs/tracing/blob/6f23c128fced6409008838a3223d76d7332d79e9/tracing-core/src/dispatch.rs#L180-L202"><code>tracing</code></a> and <a href="https://github.com/getsentry/sentry-rust/blob/c75d62ac9608930193fff843e413755aa3084191/sentry-core/src/hub.rs#L18-L30"><code>sentry-rust</code></a>
use a mixture of <code>static mut</code> (which <a href="https://github.com/rust-lang/rust/issues/53639">is almost impossible to use correctly</a>), <a href="https://docs.rs/lazy_static/latest/lazy_static/"><code>lazy_static</code></a> and <a href="https://doc.rust-lang.org/std/macro.thread_local.html"><code>thread_local</code></a>.</p>
<p>In both cases, there is one <strong>global</strong> <code>Hub</code> that is automatically inherited by
all newly spawned threads. And each of these threads keeps a mutable <strong>current state</strong>
around. The global hub also needs to be mutable, since you have to
initialize it at some point.</p>
<p>This mutability in turn requires the use of way too many <code>Arc</code>s and <code>Mutex</code>es,
a fact of <code>sentry-rust</code> internals I recently discussed on Discord as well.</p>
<p>The convenience we get out of this is that users can just call <code>sentry::capture_event</code> <strong>anywhere</strong> in the code. Similarly, any library can annotate a function with
<code>#[tracing::instrument]</code> and things just work. Well, unless they don’t.</p>
<p>The problems happen when you write something to the <strong>current state</strong>, but that
mutable state is being shared among multiple concurrent async tasks. How can we
avoid these footguns?</p>
<p>In the case of <code>tracing</code>, you have to manually
<a href="https://docs.rs/tracing/latest/tracing/trait.Instrument.html"><code>instrument()</code> a future</a>.
Similarly, in <code>sentry-rust</code>, you have to bind a <code>Hub</code> to the future via <a href="https://docs.rs/sentry/latest/sentry/trait.SentryFutureExt.html"><code>bind_hub()</code></a>. But that unfortunately
is also prone to misuse when dealing with <code>join_all</code> concurrency.
The right thing to use here is <code>.bind_hub(Hub::new_from_top(Hub::current()))</code>.
Well, that is a mouthful, and extremely easy to get wrong.</p>
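<p>Spelled out in context, that incantation looks like this; a sketch of my own (not from the post), assuming the <code>sentry</code> and <code>futures</code> crates and a hypothetical <code>handle_item</code> task:</p>

```rust
use sentry::{Hub, SentryFutureExt};

async fn handle_item(i: u32) {
    sentry::configure_scope(|scope| scope.set_tag("item", i));
    // ... do the actual work ...
}

async fn run_all() {
    let futures = (0..4).map(|i| {
        // give every concurrent future its own hub, forked from the
        // current one, so scope mutations don't leak between tasks
        handle_item(i).bind_hub(Hub::new_from_top(Hub::current()))
    });
    futures::future::join_all(futures).await;
}
```

<p>Without the <code>new_from_top</code> fork, all the futures would share one hub, and concurrent scope mutations would bleed into each other’s events.</p>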
<p>Essentially these issues boil down to shared mutable state. Something that the
Rust compiler and borrow-checker promise to solve.</p>
<h1 id="mutability"><a class="anchor-link" href="#mutability" aria-label="Anchor link for: mutability">#</a>
Mutability</h1>
<p>Which brings me to the next topic. I believe that just following Rust’s normal
ownership and borrowing semantics would solve most or all of the outlined
problems. We have mutable data, so make sure to declare it as <code>&mut</code>, and the
compiler will tell us where we are tripping up.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Context</span><span style="color:#61676ccc;">;
</span><span>
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">uses_mutable_ctx</span><span>(</span><span style="color:#ff8f40;">_ctx</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> Context) {}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">tokio</span><span>::</span><span style="color:#f29718;">main</span><span>]
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">main</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> ctx </span><span style="color:#ed9366;">=</span><span> Context</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// normal calls work just fine
</span><span> </span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> ctx)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> ctx)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// futures concurrency: nope
</span><span> </span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span style="color:#ff8f40;">2</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">_i</span><span>| </span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> ctx))</span><span style="color:#61676ccc;">;
</span><span> futures</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>join_all(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// concurrent tasks: nope
</span><span> </span><span style="color:#fa6e32;">let </span><span style="color:#ed9366;">_ = </span><span>tokio</span><span style="color:#ed9366;">::</span><span>task</span><span style="color:#ed9366;">::</span><span>spawn(</span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> ctx))</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// threads: nope
</span><span> </span><span style="color:#fa6e32;">let </span><span style="color:#ed9366;">_ = </span><span>std</span><span style="color:#ed9366;">::</span><span>thread</span><span style="color:#ed9366;">::</span><span>spawn(|| {
</span><span> tokio</span><span style="color:#ed9366;">::</span><span>runtime</span><span style="color:#ed9366;">::</span><span>Runtime</span><span style="color:#ed9366;">::</span><span>new()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">block_on</span><span>(</span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> ctx))
</span><span> })
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">join</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>The above example fails to compile for all cases that involve concurrency:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error: captured variable cannot escape `FnMut` closure body
</span><span> --> playground\contexts\src\main.rs:15:35
</span><span> |
</span><span>8 | let mut ctx = Context;
</span><span> | ------- variable defined here
</span><span>...
</span><span>15 | let futures = (0..2).map(|_i| uses_mutable_ctx(&mut ctx));
</span><span> | - ^^^^^^^^^^^^^^^^^^^^^^---^
</span><span> | | | |
</span><span> | | | variable captured here
</span><span> | | returns a reference to a captured variable which escapes the closure body
</span><span> | inferred to be a `FnMut` closure
</span><span> |
</span><span> = note: `FnMut` closures only have access to their captured variables while they are executing...
</span><span> = note: ...therefore, they cannot allow references to captured variables to escape
</span><span>
</span><span>error[E0597]: `ctx` does not live long enough
</span><span> --> playground\contexts\src\main.rs:23:49
</span><span> |
</span><span>23 | let _ = tokio::task::spawn(uses_mutable_ctx(&mut ctx)).await;
</span><span> | -----------------^^^^^^^^-
</span><span> | | |
</span><span> | | borrowed value does not live long enough
</span><span> | argument requires that `ctx` is borrowed for `'static`
</span><span>...
</span><span>41 | }
</span><span> | - `ctx` dropped here while still borrowed
</span><span>
</span><span>error[E0499]: cannot borrow `ctx` as mutable more than once at a time
</span><span> --> playground\contexts\src\main.rs:28:32
</span><span> |
</span><span>23 | let _ = tokio::task::spawn(uses_mutable_ctx(&mut ctx)).await;
</span><span> | --------------------------
</span><span> | | |
</span><span> | | first mutable borrow occurs here
</span><span> | argument requires that `ctx` is borrowed for `'static`
</span><span>...
</span><span>28 | let _ = std::thread::spawn(|| {
</span><span> | ^^ second mutable borrow occurs here
</span><span>...
</span><span>31 | .block_on(uses_mutable_ctx(&mut ctx))
</span><span> | --- second borrow occurs due to use of `ctx` in closure
</span></code></pre>
<p>In all of these cases, the compiler forces us to create an explicit clone.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">// futures concurrency
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span style="color:#ff8f40;">2</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">_i</span><span>| {
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> ctx </span><span style="color:#ed9366;">=</span><span> ctx</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">;
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{ </span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> ctx)</span><span style="color:#ed9366;">.</span><span>await }
</span><span>})</span><span style="color:#61676ccc;">;
</span><span>futures</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>join_all(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// concurrent tasks
</span><span style="color:#fa6e32;">let mut</span><span> spawn_ctx </span><span style="color:#ed9366;">=</span><span> ctx</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let </span><span style="color:#ed9366;">_ = </span><span>tokio</span><span style="color:#ed9366;">::</span><span>task</span><span style="color:#ed9366;">::</span><span>spawn(async </span><span style="color:#fa6e32;">move </span><span>{ </span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> spawn_ctx)</span><span style="color:#ed9366;">.</span><span>await })</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// threads
</span><span style="color:#fa6e32;">let mut</span><span> spawn_ctx </span><span style="color:#ed9366;">=</span><span> ctx</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let </span><span style="color:#ed9366;">_ = </span><span>std</span><span style="color:#ed9366;">::</span><span>thread</span><span style="color:#ed9366;">::</span><span>spawn(</span><span style="color:#fa6e32;">move </span><span style="color:#ed9366;">|| </span><span>{
</span><span> tokio</span><span style="color:#ed9366;">::</span><span>runtime</span><span style="color:#ed9366;">::</span><span>Runtime</span><span style="color:#ed9366;">::</span><span>new()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">block_on</span><span>(</span><span style="color:#f07171;">uses_mutable_ctx</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> spawn_ctx))
</span><span>})
</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">join</span><span>()</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>Unfortunately for us, thinking of contexts as implicit function arguments would
only solve half of the problem. We would no longer have to thread them down our
whole call chain, but it does not solve the problem of what happens when there are
forks in the road.</p>
<p>From the compiler’s perspective, the rules of shared mutability are quite clear.
However, the compiler does not automatically clone for you. And from a user’s
perspective, what should "cloning" even mean in these cases?</p>
<p>Take a tracing span hierarchy as an example: are you <code>await</code>-ing or
<code>join</code>-ing all the spawned tasks/threads, or are they more <em>“fire and forget”</em>?</p>
<h1 id="back-to-tradeoffs"><a class="anchor-link" href="#back-to-tradeoffs" aria-label="Anchor link for: back-to-tradeoffs">#</a>
Back to Tradeoffs</h1>
<p>And here we are back where we started. We have to choose the right tradeoff
between ease of use and possibility of misuse. Contexts in the sense of implicit
function arguments do get us quite far in terms of ease of use, though.</p>
Rust Futures and Tasks — 2022-01-11T00:00:00+00:00
https://swatinem.de/blog/futures-n-tasks/<p>With the recent talk about <em>“Contexts”</em> in the Rust community, and some other
thoughts I had recently, I want to explore in a bit more detail what the
difference between Futures and Tasks is in Rust.</p>
<p>The difference between Futures and Tasks is like the difference between concurrency
and parallelism.</p>
<p>The difference is quite subtle, even considering just the words. I don’t even
know whether my native German has separate words for those two concepts.</p>
<p>Let’s look at a small snippet of the <a href="https://en.wikipedia.org/wiki/Concurrency_(computer_science)">Wikipedia entry on Concurrency</a>:</p>
<blockquote>
<p>Concurrency is not parallelism: concurrency is about dealing with lots of things at once but parallelism is about doing lots of things at once. Concurrency is about structure, parallelism is about execution.</p>
</blockquote>
<p>Well, that is not all too helpful. Let’s approach this question from a different perspective then.</p>
<p>If we think of everyday life, we sometimes do actual work, and sometimes we are
just waiting for something to happen. Waiting would be boring though. So what do
we humans do in such a case? We start working on something else in the meantime.
We call this <em>“multitasking”</em>.</p>
<p><em>NOTE:</em> While explaining this to my wife, we figured out that humans use the word
<em>multitasking</em> for both concepts. Oh well.</p>
<p>The basic functionality that Futures provide us with their conveniently named
<code>await</code> keyword is the ability to say:</p>
<blockquote>
<p>Hey! I’m not doing any useful work right now, I’m just waiting.</p>
</blockquote>
<p>This waiting can be anything. Waiting for time to pass, for things to arrive on
the network, or for a computation to finish.</p>
<p>We can create a simple future that simulates this:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">future</span><span>(</span><span style="color:#ff8f40;">prefix</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>, </span><span style="color:#ff8f40;">num</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) {
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// simulates some IO wait
</span><span> tokio</span><span style="color:#ed9366;">::</span><span>time</span><span style="color:#ed9366;">::</span><span>sleep(Duration</span><span style="color:#ed9366;">::</span><span>from_millis(</span><span style="color:#ff8f40;">400</span><span>))</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// simulates some CPU workload
</span><span> thread</span><span style="color:#ed9366;">::</span><span>sleep(Duration</span><span style="color:#ed9366;">::</span><span>from_millis(</span><span style="color:#ff8f40;">100</span><span>))</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">-</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;"> finished"</span><span style="color:#61676ccc;">,</span><span> prefix</span><span style="color:#61676ccc;">,</span><span> num)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>If we want to execute a couple of these, we can first try to do so one after the
other:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> non_static_str </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">String</span><span style="color:#ed9366;">::</span><span>from(</span><span style="color:#86b300;">"serial"</span><span>)</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> start </span><span style="color:#ed9366;">= </span><span>Instant</span><span style="color:#ed9366;">::</span><span>now()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| </span><span style="color:#f07171;">future</span><span>(</span><span style="color:#ed9366;">&</span><span>non_static_str</span><span style="color:#61676ccc;">,</span><span> i))</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">for</span><span> future </span><span style="color:#ed9366;">in</span><span> futures {
</span><span> future</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> non_static_str</span><span style="color:#61676ccc;">,</span><span> start</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">elapsed</span><span>())</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>With this output:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>serial-0 finished
</span><span>serial-1 finished
</span><span>serial-2 finished
</span><span>serial-3 finished
</span><span>serial-4 finished
</span><span>serial-5 finished
</span><span>serial-6 finished
</span><span>serial-7 finished
</span><span>serial: 4.1303916s
</span><span>
</span><span>Or showing this visually:
</span><span>
</span><span>┌─────────────┬─────────┬─────────────┬─────────┬───┐
</span><span>│ … waiting … │ working │ … waiting … │ working │ … │
</span><span>└─────────────┴─────────┴─────────────┴─────────┴───┘
</span></code></pre>
<p>Not too efficient. We haven’t gained anything. But we can improve this if we
know that the work we want to do is <em>sufficiently independent</em> of each other.</p>
<h1 id="concurrency"><a class="anchor-link" href="#concurrency" aria-label="Anchor link for: concurrency">#</a>
Concurrency</h1>
<p>I think this is what the Wikipedia article meant by <em>“Concurrency is about structure”</em>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> non_static_str </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">String</span><span style="color:#ed9366;">::</span><span>from(</span><span style="color:#86b300;">"futures"</span><span>)</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> start </span><span style="color:#ed9366;">= </span><span>Instant</span><span style="color:#ed9366;">::</span><span>now()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| </span><span style="color:#f07171;">future</span><span>(</span><span style="color:#ed9366;">&</span><span>non_static_str</span><span style="color:#61676ccc;">,</span><span> i))</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">join_all</span><span>(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> non_static_str</span><span style="color:#61676ccc;">,</span><span> start</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">elapsed</span><span>())</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>And let’s run this:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>futures-0 finished
</span><span>futures-1 finished
</span><span>futures-2 finished
</span><span>futures-3 finished
</span><span>futures-4 finished
</span><span>futures-5 finished
</span><span>futures-6 finished
</span><span>futures-7 finished
</span><span>futures: 1.2765159s
</span><span>
</span><span>Or showing this visually:
</span><span>
</span><span>┌─────────────┬─────────┬─────────┬───┐
</span><span>│ … waiting … │ working │ working │ … │
</span><span>└─────────────┴─────────┴─────────┴───┘
</span></code></pre>
<p>Adding concurrency here means that we were able to compress overlapping wait times.
This is good, but not quite optimal as the future itself only uses a single
thread of execution. So if we only ever use <code>futures::future::join_all</code>, Rust is
no different from other programming languages that have a single-threaded event
loop.</p>
<h1 id="parallelism"><a class="anchor-link" href="#parallelism" aria-label="Anchor link for: parallelism">#</a>
Parallelism</h1>
<p>Let’s try to add the <em>“Parallelism is about execution”</em> part
by using <code>tokio::spawn</code> to turn the future into an independent task.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> non_static_str </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">String</span><span style="color:#ed9366;">::</span><span>from(</span><span style="color:#86b300;">"tasks"</span><span>)</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> start </span><span style="color:#ed9366;">= </span><span>Instant</span><span style="color:#ed9366;">::</span><span>now()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| tokio</span><span style="color:#ed9366;">::</span><span>spawn(</span><span style="color:#f07171;">future</span><span>(</span><span style="color:#ed9366;">&</span><span>non_static_str</span><span style="color:#61676ccc;">,</span><span> i)))</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">join_all</span><span>(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> non_static_str</span><span style="color:#61676ccc;">,</span><span> start</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">elapsed</span><span>())</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>Well, it turns out that this is not possible quite that easily:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0597]: `non_static_str` does not live long enough
</span><span> --> playground\futuresntasks\src\main.rs:XX:55
</span><span> |
</span><span>XX | let futures = (0..n).map(|i| tokio::spawn(future(&non_static_str, i)));
</span><span> | --- --------^^^^^^^^^^^^^^----
</span><span> | | | |
</span><span> | | | borrowed value does not live long enough
</span><span> | | argument requires that `non_static_str` is borrowed for `'static`
</span><span> | value captured here
</span><span>...
</span><span>XX | }
</span><span> | - `non_static_str` dropped here while still borrowed
</span></code></pre>
<p>It turns out our futures are not <em>sufficiently independent</em>, which in the case of
tasks means they need to be fully independent. If we look at the signature of
the <code>tokio::spawn</code> function, the <code>'static</code> lifetime signifies this requirement.
And the <code>Send</code> bound is the part that makes this truly parallel (or rather,
which makes this parallelism safe).</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">spawn</span><span><T>(</span><span style="color:#ff8f40;">future</span><span style="color:#61676ccc;">:</span><span> T) </span><span style="color:#61676ccc;">-> </span><span>JoinHandle<</span><span style="color:#fa6e32;">T</span><span style="color:#ed9366;">::</span><span>Output> </span><span style="color:#fa6e32;">where
</span><span> T</span><span style="color:#61676ccc;">:</span><span> Future + Send + </span><span style="color:#fa6e32;">'static</span><span>,
</span><span> </span><span style="color:#fa6e32;">T</span><span style="color:#ed9366;">::</span><span>Output</span><span style="color:#61676ccc;">:</span><span> Send + </span><span style="color:#fa6e32;">'static</span><span>,
</span><span>{}
</span></code></pre>
<p>So in order to make the future fully independent, we can give it its own copy
of all the data it needs to access:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> non_static_str </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">String</span><span style="color:#ed9366;">::</span><span>from(</span><span style="color:#86b300;">"tasks"</span><span>)</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> start </span><span style="color:#ed9366;">= </span><span>Instant</span><span style="color:#ed9366;">::</span><span>now()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| {
</span><span> </span><span style="color:#fa6e32;">let</span><span> cloned_str </span><span style="color:#ed9366;">=</span><span> non_static_str</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">; </span><span style="font-style:italic;color:#abb0b6;">// <-
</span><span> tokio</span><span style="color:#ed9366;">::</span><span>spawn(async </span><span style="color:#fa6e32;">move </span><span>{ </span><span style="color:#f07171;">future</span><span>(</span><span style="color:#ed9366;">&</span><span>cloned_str</span><span style="color:#61676ccc;">,</span><span> i)</span><span style="color:#ed9366;">.</span><span>await })
</span><span>})</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">join_all</span><span>(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"</span><span style="color:#ff8f40;">{}</span><span style="color:#86b300;">: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> non_static_str</span><span style="color:#61676ccc;">,</span><span> start</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">elapsed</span><span>())</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>With each task being completely independent now, we can run our example again:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>tasks-5 finished
</span><span>tasks-4 finished
</span><span>tasks-7 finished
</span><span>tasks-0 finished
</span><span>tasks-2 finished
</span><span>tasks-1 finished
</span><span>tasks-6 finished
</span><span>tasks-3 finished
</span><span>tasks: 513.6456ms
</span><span>
</span><span>Or showing this visually:
</span><span>
</span><span> ┌─────────────┬─────────┐
</span><span>Thread 1: │ … waiting … │ working │
</span><span> ├─────────────┼─────────┤
</span><span>Thread 2: │ … waiting … │ working │
</span><span> └─────────────┴─────────┘
</span><span>Thread n: …
</span></code></pre>
<p>This is nice! We have parallelized our tasks across multiple threads. However,
this comes at a price: in this case, cloning our data.</p>
<h1 id="mutability"><a class="anchor-link" href="#mutability" aria-label="Anchor link for: mutability">#</a>
Mutability</h1>
<p>We have learned that futures-level concurrency can use shared data without the
need for cloning. So how does this work with mutability?</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">mut_future</span><span>(</span><span style="color:#ff8f40;">num</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>, </span><span style="color:#ff8f40;">finished</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#fa6e32;">usize</span><span>>) {
</span><span> finished</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">push</span><span>(num)</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">let mut</span><span> finished </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">vec!</span><span>[]</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">mut_future</span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> finished)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| </span><span style="color:#f07171;">mut_future</span><span>(i</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> finished))</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">join_all</span><span>(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"finished tasks: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> finished)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>Well, we are out of luck here, as Rust enforces the same borrowing rules as for
any other mutable borrows:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error: captured variable cannot escape `FnMut` closure body
</span><span> --> playground\futuresntasks\src\main.rs:XX:34
</span><span> |
</span><span>XX | let mut finished = vec![];
</span><span> | ------------ variable defined here
</span><span>XX | let futures = (0..n).map(|i| mut_future(i, &mut finished));
</span><span> | - ^^^^^^^^^^^^^^^^^^^--------^
</span><span> | | | |
</span><span> | | | variable captured here
</span><span>   |                               |  returns a reference to a captured variable which escapes the closure body
</span><span>   |                               inferred to be a `FnMut` closure
</span><span> |
</span><span> = note: `FnMut` closures only have access to their captured variables while they are executing...
</span><span> = note: ...therefore, they cannot allow references to captured variables to escape
</span></code></pre>
<p>Rust thus forces us to use one of the shared mutability primitives. Let’s give
that a try:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> finished </span><span style="color:#ed9366;">= </span><span>Rc</span><span style="color:#ed9366;">::</span><span>new(std</span><span style="color:#ed9366;">::</span><span>cell</span><span style="color:#ed9366;">::</span><span>RefCell</span><span style="color:#ed9366;">::</span><span>new(</span><span style="color:#f07171;">vec!</span><span>[]))</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| {
</span><span> </span><span style="color:#fa6e32;">let</span><span> finished </span><span style="color:#ed9366;">=</span><span> finished</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">;
</span><span> async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> finished </span><span style="color:#ed9366;">=</span><span> finished</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">borrow_mut</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">mut_future</span><span>(i</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> finished)</span><span style="color:#ed9366;">.</span><span>await
</span><span> }
</span><span>})</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">join_all</span><span>(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> finished </span><span style="color:#ed9366;">= </span><span>Rc</span><span style="color:#ed9366;">::</span><span>try_unwrap(finished)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_inner</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"finished tasks: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> finished)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p><em>NOTE:</em> The example also works without <code>Rc</code>, I just wanted to highlight the usage
of <em>lightweight</em> shared mutability/ownership primitives. This also proves our
observation from above that we are <em>single threaded</em>.</p>
<p>We are also using <code>RefCell</code> here, which <em>“Panics if the value is currently borrowed”</em>.
That can very well happen in the case of concurrency. We are just lucky that our
<code>mut_future</code> does not actually <code>await</code> internally. This type is thus a prime
candidate for the proposed <a href="https://github.com/rust-lang/rust/issues/83310"><code>must_not_suspend</code></a>
lint.</p>
<p>The safety guarantees of Rust are about memory safety and avoiding data races.
You can still do dumb stuff like the above in safe Rust, along with leaking
memory, introducing deadlocks, and having logic bugs.</p>
<p>If we go further and try to introduce a <code>tokio::spawn</code> like before, Rust will
dutifully remind us that we need to use the thread safe companions of
<code>Rc</code>/<code>RefCell</code> in that case:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error: future cannot be sent between threads safely
</span><span>--> playground\futuresntasks\src\main.rs:XXX:9
</span><span> |
</span><span>XXX | tokio::spawn(async move {
</span><span> | ^^^^^^^^^^^^ future created by async block is not `Send`
</span><span> |
</span><span> = help: within `impl futures::Future`, the trait `std::marker::Send` is not implemented for `Rc<RefCell<Vec<usize>>>`
</span><span>note: captured value is not `Send`
</span><span>--> playground\futuresntasks\src\main.rs:XXX:32
</span><span> |
</span><span>XXX | let mut finished = finished.borrow_mut();
</span><span> | ^^^^^^^^ has type `Rc<RefCell<Vec<usize>>>` which is not `Send`
</span><span>note: required by a bound in `tokio::spawn`
</span><span>--> tokio-1.15.0\src\task\spawn.rs:127:21
</span><span> |
</span><span>127 | T: Future + Send + 'static,
</span><span> | ^^^^ required by this bound in `tokio::spawn`
</span><span>error: future cannot be sent between threads safely
</span><span>--> playground\futuresntasks\src\main.rs:XXX:9
</span><span> |
</span><span>XXX | tokio::spawn(async move {
</span><span> | ^^^^^^^^^^^^ future created by async block is not `Send`
</span><span> |
</span><span> = help: the trait `Sync` is not implemented for `Cell<isize>`
</span><span>note: future is not `Send` as this value is used across an await
</span><span>--> playground\futuresntasks\src\main.rs:XXX:13
</span><span> |
</span><span>XXX | let mut finished = finished.borrow_mut();
</span><span> | ------------ has type `RefMut<'_, Vec<usize>>` which is not `Send`
</span><span>XXX | mut_future(i, &mut finished).await
</span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ await occurs here, with `mut finished` maybe used later
</span><span>XXX | })
</span><span> | - `mut finished` is later dropped here
</span><span>note: required by a bound in `tokio::spawn`
</span><span>--> tokio-1.15.0\src\task\spawn.rs:127:21
</span><span> |
</span><span>127 | T: Future + Send + 'static,
</span><span> | ^^^^ required by this bound in `tokio::spawn`
</span></code></pre>
<p>Going one step further, we can start using <code>tokio::sync::Mutex</code>, which is
a locking primitive better optimized for async contexts, as it allows us to
literally <em>“wait for the lock”</em>, and to hold it across <code>await</code> points.
If we change our type to <code>Arc<tokio::sync::Mutex<Vec<_>>></code>, along with necessary
code changes, we come up with a working solution:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">let</span><span> finished </span><span style="color:#ed9366;">= </span><span>Arc</span><span style="color:#ed9366;">::</span><span>new(tokio</span><span style="color:#ed9366;">::</span><span>sync</span><span style="color:#ed9366;">::</span><span>Mutex</span><span style="color:#ed9366;">::</span><span>new(</span><span style="color:#f07171;">vec!</span><span>[]))</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> futures </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span>n)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(|</span><span style="color:#ff8f40;">i</span><span>| {
</span><span> </span><span style="color:#fa6e32;">let</span><span> finished </span><span style="color:#ed9366;">=</span><span> finished</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">clone</span><span>()</span><span style="color:#61676ccc;">;
</span><span> tokio</span><span style="color:#ed9366;">::</span><span>spawn(async </span><span style="color:#fa6e32;">move </span><span>{
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> finished </span><span style="color:#ed9366;">=</span><span> finished</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">lock</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">mut_future</span><span>(i</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> finished)</span><span style="color:#ed9366;">.</span><span>await
</span><span> })
</span><span>})</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">join_all</span><span>(futures)</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">let</span><span> finished </span><span style="color:#ed9366;">= </span><span>Arc</span><span style="color:#ed9366;">::</span><span>try_unwrap(finished)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">into_inner</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"finished tasks: </span><span style="color:#ff8f40;">{:?}</span><span style="color:#86b300;">"</span><span style="color:#61676ccc;">,</span><span> finished)</span><span style="color:#61676ccc;">;
</span></code></pre>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>Well, there we have it. There are pros and cons to both futures (well, <code>futures::join</code> in that sense) and tasks.</p>
<ul>
<li>Futures can work with non-<code>'static</code> references.</li>
<li>Tasks on the other hand need to own all their data, leading to more <code>Clone</code>-ing or <code>Arc</code>-ing.</li>
<li>Futures are essentially single-threaded.</li>
<li>While Tasks can spread CPU-intensive work across more cores.</li>
<li><strong>UPDATE</strong>: almost forgot the most important difference:</li>
<li>Futures need to be actively polled.</li>
<li>While Tasks are <em>fire and forget</em>. Cancellation works using the <code>JoinHandle::abort</code> API.</li>
</ul>
<p>So which one should you choose? Well, that is totally up to you! What I haven’t
done here is actually <strong>measure</strong> things. Such as:
What is the cost of the additional <code>Clone</code>-ing? How much more work is put on the
Runtime/executor? How does this change the throughput of the complete system?
The average latency and the latency distribution? In the end this very much
depends on the system.</p>
<p>But what I can tell you from experience is that especially refactoring from
Futures to Tasks can be painful sometimes. We have seen an example of
a <em>“future is not <code>Send</code>”</em> error. We were lucky in this case because the compiler
told us exactly <em>why</em>. I had to struggle with other cases in which the compiler
presented me with a list of completely undecipherable types that was not at all
helpful.</p>
<h1 id="threads-and-cores"><a class="anchor-link" href="#threads-and-cores" aria-label="Anchor link for: threads-and-cores">#</a>
Threads and Cores</h1>
<p>While writing this, it also occurred to me that we have this <em>multiplexing</em> on
different layers.</p>
<p>Futures concurrency is scheduling/multiplexing multiple computations on a
single thread. Tasks and executors schedule M tasks on N operating system
threads. Similarly the operating system also schedules M threads to N processor
cores.</p>
<p>The idea here is that the higher the level, the less overhead we have. Scheduling
tasks on an executor or threads on the operating system is not free.</p>
<p>And in the end, we have a completely different beast altogether. Modern CPUs
have this thing called simultaneous multithreading, or hardware multithreading.
This is when the CPU offers more logical cores than it has physical cores. So it
can do actual work while other threads are waiting for data to be copied from
main memory into CPU caches. Or other neat tricks to speed up the total
throughput of the system.</p>
Rust async can truly be zero-cost2021-10-12T00:00:00+00:002021-10-12T00:00:00+00:00
Unknown
https://swatinem.de/blog/zero-cost-async/<p><strong>Update</strong>:</p>
<p>I updated the code examples now that GATs have been stabilized.</p>
<hr />
<p>One of the fundamental selling points of Rust is zero-cost abstractions.
This means that you can write high-level generic code, and the compiler will
optimize it in such a way that you couldn’t have written better code by hand.</p>
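<p>A classic illustration of this (a sketch of my own, not from the original post): an iterator chain compiles down to the same machine code as a hand-written loop, despite going through generic <code>Iterator</code> adapters.</p>

```rust
// A high-level, generic, allocation-free iterator chain. With
// optimizations, rustc compiles this to the same code as a manual
// `for` loop (or even a closed-form expression).
fn sum_of_squares(n: u32) -> u32 {
    (1..=n).map(|x| x * x).sum()
}

fn main() {
    // 1 + 4 + 9 + 16 = 30
    assert_eq!(sum_of_squares(4), 30);
}
```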
<p>There are tons of examples of Rust doing this. But I had a very specific
example in mind, and was curious if Rust could actually figure all this out.</p>
<p>I want to have some code which should be generic over an async, or a sync
implementation. What does this mean?
Essentially, I want to have an async-trait, with a generic async function
using that trait. But I also want a sync version of it, without having to write
those separately.</p>
<p>It is hard to explain in words, so let’s demonstrate the idea with a snippet of
Rust:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">// as of today, we need one nightly-only feature to make this work:
</span><span style="color:#61676ccc;">#!</span><span>[</span><span style="color:#f29718;">feature</span><span>(type_alias_impl_trait)]
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Stuff</span><span>(pub </span><span style="color:#fa6e32;">u8</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">trait </span><span style="color:#399ee6;">StuffProvider </span><span>{
</span><span> </span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">ProvideStuff</span><span style="color:#ed9366;"><</span><span style="color:#fa6e32;">'c</span><span style="color:#ed9366;">></span><span style="color:#61676ccc;">: </span><span>Future<Output = Stuff> </span><span style="color:#ed9366;">+ </span><span style="color:#fa6e32;">'c where Self</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">'c</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/// Provides [`Stuff`] asynchronously.
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">provide_stuff</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span>ProvideStuff<'</span><span style="color:#ed9366;">_</span><span>></span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// This function is generic over something providing us stuff.
</span><span>async </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">do_stuff</span><span><P</span><span style="color:#61676ccc;">:</span><span> StuffProvider>(</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">provider</span><span style="color:#61676ccc;">:</span><span> P) </span><span style="color:#61676ccc;">-></span><span> Stuff {
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> stuff </span><span style="color:#ed9366;">=</span><span> provider</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">provide_stuff</span><span>()</span><span style="color:#ed9366;">.</span><span>await</span><span style="color:#61676ccc;">;
</span><span> stuff</span><span style="color:#ed9366;">.</span><span style="color:#ff8f40;">0 </span><span style="color:#ed9366;">+= </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">;
</span><span> stuff
</span><span>}
</span></code></pre>
<p>Okay, so far so good. We have our async function, and the async-trait defined.</p>
<p>What is needed now is an implementation for this trait, which returns <code>Stuff</code>
right away, without asynchronously waiting. My idea was that I can use
<code>core::future::Ready</code> for this, so lets do that:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">SyncStuffProvider</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">impl </span><span>StuffProvider </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">SyncStuffProvider </span><span>{
</span><span> </span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">ProvideStuff</span><span style="color:#ed9366;"><</span><span style="color:#fa6e32;">'c</span><span style="color:#ed9366;">> = </span><span>Ready<Stuff></span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">provide_stuff</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self</span><span style="color:#ed9366;">::</span><span>ProvideStuff<'</span><span style="color:#ed9366;">_</span><span>> {
</span><span> </span><span style="color:#f07171;">ready</span><span>(Stuff(</span><span style="color:#ff8f40;">41</span><span>))
</span><span> }
</span><span>}
</span></code></pre>
<p>So far so good. I expect the resulting <code>do_stuff(SyncStuffProvider)</code> future
to return <code>Poll::Ready</code> immediately the first time it is polled. But how do I
poll this future? Usually it is the job of an async executor to do the polling.
Most async executors have a <code>block_on</code> method that, well, blocks the current
thread until the future is ready, polling it repeatedly if
necessary. However, our future should be ready immediately.</p>
<p>Let us write a simple executor that does exactly this. It does involve a bit of
<code>unsafe</code> code, and it will certainly make you very unhappy if you happen to
actually call it with a future that does not immediately return its results.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">mod </span><span style="color:#399ee6;">ready_or_diverge </span><span>{
</span><span> </span><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>future</span><span style="color:#ed9366;">::</span><span>Future</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>pin</span><span style="color:#ed9366;">::</span><span>Pin</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>task</span><span style="color:#ed9366;">::</span><span>{Context</span><span style="color:#61676ccc;">,</span><span> Poll</span><span style="color:#61676ccc;">,</span><span> RawWaker</span><span style="color:#61676ccc;">,</span><span> RawWakerVTable</span><span style="color:#61676ccc;">,</span><span> Waker}</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// copy-pasted from https://docs.rs/futures/0.3.17/futures/task/fn.noop_waker.html
</span><span> </span><span style="color:#fa6e32;">unsafe fn </span><span style="color:#f29718;">noop_clone</span><span>(</span><span style="color:#ff8f40;">_data</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">*const </span><span>()) </span><span style="color:#61676ccc;">-></span><span> RawWaker {
</span><span> </span><span style="color:#f07171;">noop_raw_waker</span><span>()
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">unsafe fn </span><span style="color:#f29718;">noop</span><span>(</span><span style="color:#ff8f40;">_data</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">*const </span><span>()) {}
</span><span>
</span><span> </span><span style="color:#fa6e32;">const </span><span style="color:#ff8f40;">NOOP_WAKER_VTABLE</span><span style="color:#61676ccc;">:</span><span> RawWakerVTable </span><span style="color:#ed9366;">= </span><span>RawWakerVTable</span><span style="color:#ed9366;">::</span><span>new(noop_clone</span><span style="color:#61676ccc;">,</span><span> noop</span><span style="color:#61676ccc;">,</span><span> noop</span><span style="color:#61676ccc;">,</span><span> noop)</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">const fn </span><span style="color:#f29718;">noop_raw_waker</span><span>() </span><span style="color:#61676ccc;">-></span><span> RawWaker {
</span><span> RawWaker</span><span style="color:#ed9366;">::</span><span>new(core</span><span style="color:#ed9366;">::</span><span>ptr</span><span style="color:#ed9366;">::</span><span>null()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span style="color:#ff8f40;">NOOP_WAKER_VTABLE</span><span>)
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">block_on</span><span><O, F</span><span style="color:#61676ccc;">: </span><span>Future<Output = O>>(</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">fut</span><span style="color:#61676ccc;">:</span><span> F) </span><span style="color:#61676ccc;">-></span><span> O {
</span><span> </span><span style="color:#fa6e32;">let</span><span> waker </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ Waker</span><span style="color:#ed9366;">::</span><span>from_raw(</span><span style="color:#f07171;">noop_raw_waker</span><span>()) }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> context </span><span style="color:#ed9366;">= </span><span>Context</span><span style="color:#ed9366;">::</span><span>from_waker(</span><span style="color:#ed9366;">&</span><span>waker)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> pinned </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ Pin</span><span style="color:#ed9366;">::</span><span>new_unchecked(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> fut) }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">match</span><span> pinned</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">poll</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> context) {
</span><span> Poll</span><span style="color:#ed9366;">::</span><span>Ready(res) </span><span style="color:#ed9366;">=> </span><span style="color:#fa6e32;">return</span><span> res</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ed9366;">_ => </span><span style="color:#fa6e32;">loop </span><span>{}</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// diverge
</span><span> }
</span><span> }
</span><span>}
</span></code></pre>
<p>What this code does is a bit of boilerplate, and a single <code>poll</code>. If we get the
result immediately, fine. Otherwise, loop forever, which is a way to tell Rust
that the function will never return in such a case.</p>
<p>Putting this all together:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">do_stuff_sync</span><span>() </span><span style="color:#61676ccc;">-></span><span> Stuff {
</span><span> </span><span style="color:#fa6e32;">let</span><span> fut </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">do_stuff</span><span>(SyncStuffProvider)</span><span style="color:#61676ccc;">;
</span><span> ready_or_diverge</span><span style="color:#ed9366;">::</span><span>block_on(fut)
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">does_stuff_sync</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> stuff </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">do_stuff_sync</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(stuff</span><span style="color:#ed9366;">.</span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">42</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>And using <code>cargo +nightly test</code>, we see that things at least work as we expected:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>running 1 test
</span><span>test does_stuff_sync ... ok
</span><span>
</span><span>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
</span></code></pre>
<p>Very good! But how about the zero-cost abstractions that I wanted to talk about?</p>
<p>Well, for that we would have to actually look at the assembly code that the
compiler generated. I suggest the <a href="https://godbolt.org/z/865fWrE8P">Compiler Explorer</a>
for that. And sure enough, with optimizations turned on, the Rust compiler is
actually smart enough to see through all of our executor, async and trait code
and compiles it all away to just a simple return.</p>
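<p>To make “compiles it all away to just a simple return” concrete, the optimized build is equivalent to this hand-written function (my own sketch of what the generated code corresponds to, not the literal assembly):</p>

```rust
pub struct Stuff(pub u8);

// What the optimizer effectively reduces `do_stuff_sync` to: the
// executor, the future state machine, and the trait dispatch all
// disappear, leaving a plain constant return.
pub fn do_stuff_sync() -> Stuff {
    Stuff(42)
}

fn main() {
    assert_eq!(do_stuff_sync().0, 42);
}
```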
<p>I am truly amazed! At least for such a simple example, async code is completely
zero-cost.</p>
<p>However, I wonder at which level of complexity the compiler might fail to do so.</p>
<p>In my research, I have found an <a href="https://github.com/rust-lang/rust/issues/71093">issue#71093</a>
which highlights a fairly simple case as well in which the compiler was not smart
enough to optimize things away. That issue specifically mentions panicking code,
since <code>async fn</code> will usually generate a panic in case the returned future is
polled again after it successfully returned a value.</p>
<p>That also made me think that maybe using something like
<a href="https://github.com/dtolnay/no-panic">no-panic</a> might provide at least half a
solution here. I wonder if I can use the same tricks in my <code>ready_or_diverge</code>
to instead make it “ready or fail to compile”.</p>
<p>But that is an exercise for another day.</p>
Creating my own bespoke binary format2021-09-01T00:00:00+00:002021-09-01T00:00:00+00:00
Unknown
https://swatinem.de/blog/binary-formats/<h1 id="don-t-do-it"><a class="anchor-link" href="#don-t-do-it" aria-label="Anchor link for: don-t-do-it">#</a>
Don’t do it</h1>
<p>Well first off the bat, I want to repeat something that I read some time ago:</p>
<blockquote>
<p>Don’t create your own bespoke binary format! Use JSON!</p>
</blockquote>
<p>I agree with this. Having dealt with binary data quite recently, and seeing how
some of the formats are poorly documented, and how some of the writers actually
create invalid data, I can truly say this is really hard to get right!</p>
<p>And if you want to have a one-off data format for something, the best bet is to
just use JSON for it. I would argue that most applications are not as resource
constrained as to not tolerate the overhead that JSON parsing would incur.</p>
<p>Alas, my use-case has some hard constraints. Also most of my learnings come from
designing my own binary format for an interesting problem I wanted to solve a few
months ago. I haven’t published the result of that work yet, but maybe I will
create another post about that in the future.</p>
<p>My general design constraints are like this:</p>
<ul>
<li>I want the format to be compact</li>
<li>I want it to be zero-copy, as in: to be able to access all the contents from
an underlying buffer, without allocating</li>
<li>I want to allow random access</li>
</ul>
<p>For my current project, I want to avoid using JSON for another reason: I have
to use C, and since there is no <strong>universal</strong> package manager for that language,
I simply can’t <code>use serde_json</code> and be done with it. Actually importing a JSON
serializer seems like such a hassle that I might as well create my own binary
format.</p>
<p>^ This makes me think of how many horrible and crappy formats we have for the
only reason that it is way too difficult to actually pull in a JSON
parser/serializer into a C-based project. Yes, we did that as well in
<code>sentry-native</code>, and we did have unsafe and crashing code in it that we found
via fuzzing. Exactly the point I wanted to make, lol.</p>
<h1 id="interlude-aos-vs-soa"><a class="anchor-link" href="#interlude-aos-vs-soa" aria-label="Anchor link for: interlude-aos-vs-soa">#</a>
Interlude: AoS vs SoA</h1>
<p>As a small interlude, just tangentially related to the topic at hand, let me
quickly explain the difference between <em>Array of Structs</em> vs <em>Struct of Arrays</em>.</p>
<p>The <em>SoA</em> approach is popular in game development and when using
Entity-Component-Systems. It is often used to optimize memory access patterns,
thus improving performance, and might also improve memory usage by avoiding
wasted padding bytes (clownshoes).</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">AosRoot </span><span>{
</span><span> elements</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><Aos>,
</span><span>}
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Aos </span><span>{
</span><span> a</span><span style="color:#61676ccc;">:</span><span> A,
</span><span> b</span><span style="color:#61676ccc;">:</span><span> B,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">SoaRoot </span><span>{
</span><span> a</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><A>,
</span><span> b</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><B>,
</span><span>}
</span></code></pre>
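<p>The padding savings can be made concrete with <code>size_of</code> (the field types here are hypothetical, chosen just to make the padding visible):</p>

```rust
use core::mem::size_of;

// Hypothetical element: an 8-byte value plus a 1-byte flag.
#[allow(dead_code)]
struct Aos {
    a: u64,
    b: u8,
    // 7 bytes of padding, so `a` stays 8-byte aligned in an array
}

fn main() {
    // Every AoS element pays for the padding:
    assert_eq!(size_of::<Aos>(), 16);
    // 1000 elements: AoS needs 16_000 bytes, while SoA
    // (one u64 array + one u8 array) needs 8_000 + 1_000 = 9_000.
    assert_eq!(1000 * size_of::<Aos>(), 16_000);
    assert_eq!(1000 * size_of::<u64>() + 1000 * size_of::<u8>(), 9_000);
}
```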
<p>I think I will come back to this concept later on when talking about tree-shaped
data, so let’s move on for now.</p>
<h1 id="format-compatibility"><a class="anchor-link" href="#format-compatibility" aria-label="Anchor link for: format-compatibility">#</a>
Format Compatibility</h1>
<p>Apart from not having any kind of format compatibility at all, there are in
general two forms of compatibility.</p>
<p><strong>Backwards compatibility</strong> means that an up-to-date reader can read data
produced by an outdated writer. It means that I can continue to read old legacy
versions of the format. I will mostly focus on this, as I control the reading
part and can keep it up-to-date, and this form of compatibility is easy to
implement.</p>
<p><strong>Forwards compatibility</strong> means that data produced by an up-to-date writer can
be read by an outdated reader. This is a lot more challenging, and also means
that changes to the format need to be anticipated in advance; otherwise it is
impossible to read a newer file with an outdated reader.</p>
<h2 id="backwards-compatibility-via-version-tagging"><a class="anchor-link" href="#backwards-compatibility-via-version-tagging" aria-label="Anchor link for: backwards-compatibility-via-version-tagging">#</a>
Backwards compatibility via version tagging</h2>
<p>The simpler of the two is keeping backwards compatibility to older files with
a newer reader. For this, I can simply tag the root of my data format with a
version number, and have separate code for reading/interpreting each of the
known versions. It is not forwards compatible: when the reader encounters an
unknown version, all it can reasonably do is throw a hard error and refuse to
parse the format.</p>
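<p>That version check can be done safely before interpreting any of the rest of the buffer. A minimal sketch of the dispatch (the known version numbers and the little-endian encoding are my assumptions, not part of the format described here):</p>

```rust
// Hypothetical version check: read the leading u32 tag and
// dispatch on the versions this reader knows about.
fn check_version(buf: &[u8]) -> Result<u32, &'static str> {
    let bytes: [u8; 4] = buf
        .get(..4)
        .ok_or("buffer too short for header")?
        .try_into()
        .unwrap(); // the slice is exactly 4 bytes at this point
    // assuming the header is stored little-endian
    match u32::from_le_bytes(bytes) {
        v @ 1..=2 => Ok(v),
        _ => Err("unknown format version"),
    }
}

fn main() {
    assert_eq!(check_version(&[1, 0, 0, 0, 99]), Ok(1));
    assert_eq!(check_version(&[7, 0, 0, 0]), Err("unknown format version"));
    assert_eq!(check_version(&[1]), Err("buffer too short for header"));
}
```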
<p>The implementation is quite simple, but you will see <em>a lot</em> of unsafe code:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>{mem</span><span style="color:#61676ccc;">,</span><span> ptr}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Header </span><span>{
</span><span> version</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>],
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> Header,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">A</span><span>(</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">B</span><span>(</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">parse</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// TODO:
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * actually verify the version
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * ensure the buffer is actually valid
</span><span> Format {
</span><span> buf</span><span style="color:#61676ccc;">,
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> Header) }</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">get_as</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>[A] {
</span><span> </span><span style="color:#fa6e32;">let</span><span> a_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><Header>()) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> A }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a_slice </span><span style="color:#ed9366;">= </span><span>ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(a_start</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>a_slice }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">get_bs</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>[B] {
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><Header>())
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><A>() </span><span style="color:#ed9366;">* </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> B
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_slice </span><span style="color:#ed9366;">= </span><span>ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(b_start</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>b_slice }
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">format_works</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span>[
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// version
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_a
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_b
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0]
</span><span> </span><span style="color:#ff8f40;">4</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// b[0]
</span><span> </span><span style="color:#ff8f40;">5</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// b[1]
</span><span> ]</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="color:#ed9366;">&*</span><span>(ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const u8</span><span style="color:#61676ccc;">,</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>() </span><span style="color:#ed9366;">* </span><span>mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#fa6e32;">u32</span><span>>()))
</span><span> }</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[A(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">5</span><span>)])</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>Like I said, a ton of <code>unsafe</code>. There is a lot of pointer arithmetic involved,
but the upside is that we work entirely with borrowed data, and can even use the
code in a <code>#![no_std]</code> environment.</p>
<h2 id="tree-structured-data"><a class="anchor-link" href="#tree-structured-data" aria-label="Anchor link for: tree-structured-data">#</a>
Tree-structured Data</h2>
<p>I think it is also quite easy to support tree- or maybe even graph-shaped data.
The previous example showed how to get two types as complete slices. If the
first data type includes a <code>start</code>+<code>len</code> tuple for its child data, we can simply
sub-slice the other data based on that. Let’s change our example from above.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>{mem</span><span style="color:#61676ccc;">,</span><span> ptr}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Header </span><span>{
</span><span> version</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">A </span><span>{
</span><span> own_data</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> start_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">B</span><span>(</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>],
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> Header,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">parse</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// TODO:
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * actually verify the version
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * ensure the buffer is actually valid
</span><span> Format {
</span><span> buf</span><span style="color:#61676ccc;">,
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> Header) }</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">get_as</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>[A] {
</span><span> </span><span style="color:#fa6e32;">let</span><span> a_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><Header>()) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> A }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a_slice </span><span style="color:#ed9366;">= </span><span>ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(a_start</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>a_slice }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">get_bs_for</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">a</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> A) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>[B] {
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><Header>())
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><A>() </span><span style="color:#ed9366;">* </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> B
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_slice </span><span style="color:#ed9366;">= </span><span>ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(b_start</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> range </span><span style="color:#ed9366;">=</span><span> a</span><span style="color:#ed9366;">.</span><span>start_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span style="color:#ed9366;">..</span><span>a</span><span style="color:#ed9366;">.</span><span>start_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize </span><span style="color:#ed9366;">+</span><span> a</span><span style="color:#ed9366;">.</span><span>num_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ (</span><span style="color:#ed9366;">&*</span><span>b_slice)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_unchecked</span><span>(range) }
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">format_works</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span>[
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// version
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_a
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_b
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].own_data
</span><span> </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].start_b
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].num_b
</span><span> </span><span style="color:#ff8f40;">4</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].own_data
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].start_b
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].num_b
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].bs[0]
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].bs[0]
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].bs[1]
</span><span> ]</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="color:#ed9366;">&*</span><span>(ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const u8</span><span style="color:#61676ccc;">,</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>() </span><span style="color:#ed9366;">* </span><span>mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#fa6e32;">u32</span><span>>()))
</span><span> }</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(buf)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> r</span><span style="color:#ed9366;">#as =</span><span> parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(r</span><span style="color:#ed9366;">#as.</span><span style="color:#f07171;">len</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">2</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(r</span><span style="color:#ed9366;">#as</span><span>[</span><span style="color:#ff8f40;">0</span><span>]</span><span style="color:#ed9366;">.</span><span>own_data</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">3</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs_for</span><span>(</span><span style="color:#ed9366;">&</span><span>r</span><span style="color:#ed9366;">#as</span><span>[</span><span style="color:#ff8f40;">0</span><span>])</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">1</span><span>)])</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(r</span><span style="color:#ed9366;">#as</span><span>[</span><span style="color:#ff8f40;">1</span><span>]</span><span style="color:#ed9366;">.</span><span>own_data</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">4</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs_for</span><span>(</span><span style="color:#ed9366;">&</span><span>r</span><span style="color:#ed9366;">#as</span><span>[</span><span style="color:#ff8f40;">1</span><span>])</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">2</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>This did get a little bit more difficult, but not terribly so. There is still a
lot of pointer arithmetic and <code>unsafe</code> code in there.
One inconvenience is that you need to connect the <code>Format</code> holding the
raw buffer to the returned parent data.</p>
<h3 id="making-things-safe"><a class="anchor-link" href="#making-things-safe" aria-label="Anchor link for: making-things-safe">#</a>
Making things safe</h3>
<p>So we have quite a nice format: parsing essentially comes down to pointer
arithmetic and a few casts, all without allocating.
But how can we make it safe? By that I don’t mean removing all the <code>unsafe</code>
blocks, but rather ensuring that whatever happens inside those unsafe blocks is
actually sound.</p>
<p>Well, we can make things memory safe by making sure that the pointers are
properly aligned, and that we have appropriate padding between our raw
arrays. The next obvious thing to do is to make sure that we don’t read out of
bounds. Luckily, Rust makes checked sub-slicing super easy. We can bounds-check
our <code>Header</code> to ensure that the number of embedded records does not exceed the
bounds of the buffer, and all the cross-struct references need to be
bounds-checked in the same way.</p>
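<p>As a minimal sketch of what such up-front validation could look like (this is illustrative, not the post’s actual code: the <code>validate</code> helper is made up, and the alignment check is elided):</p>

```rust
use core::mem;

#[repr(C)]
struct Header {
    version: u32,
    num_a: u32,
    num_b: u32,
}

#[repr(C)]
struct A {
    own_data: u32,
    start_b: u32,
    num_b: u32,
}

#[repr(C)]
struct B(u32);

/// Hypothetical checked entry point: verify that the buffer is big
/// enough for the header plus all the records it claims to contain.
fn validate(buf: &[u8]) -> Option<()> {
    // Bounds-check the header itself before reading any counts.
    if buf.len() < mem::size_of::<Header>() {
        return None;
    }
    // (A real implementation would also verify pointer alignment here.)
    let header = unsafe { &*(buf.as_ptr() as *const Header) };
    if header.version != 1 {
        return None;
    }
    // Use checked arithmetic so that malicious `num_a`/`num_b` values
    // cannot overflow the size computation into a small number.
    let a_bytes = (header.num_a as usize).checked_mul(mem::size_of::<A>())?;
    let b_bytes = (header.num_b as usize).checked_mul(mem::size_of::<B>())?;
    let expected = mem::size_of::<Header>()
        .checked_add(a_bytes)?
        .checked_add(b_bytes)?;
    (buf.len() >= expected).then_some(())
}
```

<p><code>parse</code> could run this once up front and then hand out slices without any further per-access checks.</p>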
<p>This, I think, should take care of all the unsafety while accessing the data. The
problem remains that we are simply casting raw bytes. The validity of the actual
data we point to depends on the correctness of the writer.
CRC-ing the whole buffer (minus the version number and the CRC-sum itself) might
be another idea to at least guard against bitflips, but it does not guard against
writer correctness. And we have to remember that we are dealing with
user-provided data here, so we must expect that some malicious data will fly
our way.</p>
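<p>For illustration, such a checksum could be the classic bitwise CRC-32. This is a generic sketch, not part of the format above, and a real project would more likely pull in a crate such as <code>crc32fast</code>:</p>

```rust
/// Bitwise CRC-32 (IEEE polynomial, reflected), as used by zlib and PNG.
/// Slow but dependency-free; good enough to detect bitflips.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // Divide by the reversed polynomial 0xEDB88320, one bit at a time.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}
```

<p>The reader would recompute this over everything except the stored checksum field itself and compare, rejecting the buffer on mismatch.</p>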
<h3 id="disadvantages"><a class="anchor-link" href="#disadvantages" aria-label="Anchor link for: disadvantages">#</a>
Disadvantages</h3>
<p>We have seen one inconvenience above already: we need access to the raw buffer
whenever we want to access referenced data. There is a simple solution to this,
which is to separate the raw data structures from the public API. The public
API would use an iterator that yields stack-allocated copies of our public
structs, still <code>#![no_std]</code> compatible, unless the user decides to <code>collect</code>
them into a heap-allocated <code>Vec</code>.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">PublicA</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> format</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>Format<</span><span style="color:#fa6e32;">'data</span><span>>,
</span><span> raw_a</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> A,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">PublicA</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">get_bs</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>[B] {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>format</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs_for</span><span>(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>raw_a)
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">iter_a</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-></span><span> impl </span><span style="font-style:italic;color:#55b4d4;">Iterator</span><span><Item = PublicA<</span><span style="color:#fa6e32;">'data</span><span>>> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_as</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iter</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(</span><span style="color:#fa6e32;">move </span><span style="color:#ed9366;">|</span><span>raw_a</span><span style="color:#ed9366;">|</span><span> PublicA {
</span><span> format</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#61676ccc;">,
</span><span> raw_a</span><span style="color:#61676ccc;">,
</span><span> })
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">let mut</span><span> iter_a </span><span style="color:#ed9366;">=</span><span> parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iter_a</span><span>()</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">assert_eq!</span><span>(iter_a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">1</span><span>)])</span><span style="color:#61676ccc;">;
</span><span style="color:#f07171;">assert_eq!</span><span>(iter_a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_bs</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>[B(</span><span style="color:#ff8f40;">2</span><span>)</span><span style="color:#61676ccc;">,</span><span> B(</span><span style="color:#ff8f40;">3</span><span>)])</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>We can have a very ergonomic and still very efficient non-allocating API this
way.</p>
<p>One problem though remains. The writer side of this might not look as nice.
We cannot directly write the format in one go and stream it into a file.
Instead, we have to allocate intermediate <code>A</code> and <code>B</code> vectors, get relative
indices for the child <code>B</code>s, and then write these different buffers one after
the other into a file.</p>
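<p>A hypothetical writer for the tree-shaped layout above could look like this. The <code>write_format</code> helper and its input shape are made up for illustration, and the output is little-endian <code>u32</code> words:</p>

```rust
// Hypothetical writer sketch: buffer the `A` and `B` records first,
// then emit header, `A`s and `B`s back to back.
// Each input tuple is one `A`'s own data plus its child `B` values.
fn write_format(groups: &[(u32, Vec<u32>)]) -> Vec<u8> {
    let mut raw_as = Vec::new();
    let mut raw_bs: Vec<u32> = Vec::new();
    for (own_data, child_bs) in groups {
        // Record the relative start index of this record's children…
        raw_as.push((*own_data, raw_bs.len() as u32, child_bs.len() as u32));
        // …then append the children to the shared `B` buffer.
        raw_bs.extend_from_slice(child_bs);
    }

    let mut out = Vec::new();
    let mut push = |word: u32| out.extend_from_slice(&word.to_le_bytes());
    // Header: version, num_a, num_b
    push(1);
    push(groups.len() as u32);
    push(raw_bs.len() as u32);
    for (own_data, start_b, num_b) in raw_as {
        push(own_data);
        push(start_b);
        push(num_b);
    }
    for b in raw_bs {
        push(b);
    }
    out
}
```

<p>The intermediate <code>raw_as</code>/<code>raw_bs</code> vectors are exactly the allocation the paragraph above laments; only once all children are laid out do we know each <code>start_b</code>.</p>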
<h2 id="forwards-compatibility-via-sizeof-tagging"><a class="anchor-link" href="#forwards-compatibility-via-sizeof-tagging" aria-label="Anchor link for: forwards-compatibility-via-sizeof-tagging">#</a>
Forwards compatibility via sizeof tagging</h2>
<p>To re-iterate, being forwards compatible means that an outdated reader can still
read a newer file format. One trick I found while looking at the minidump
format and some of the Windows APIs is that structures are tagged with their
own <code>size_of</code> values. So even though an outdated reader cannot interpret the
newly added fields, it at least knows how many bytes to skip. Random access
still works, though directly returning typed slices would not, I assume.
Good thing Rust has iterators ;-)</p>
<p>One more interesting thing is that some Windows APIs work this way. You call
them with the <code>size_of</code> of a stack-allocated out-parameter. The API can then return
(well, write into that out-parameter) different versions of a data type, so the
same function can be used from applications targeting an older version of the
Windows API, while newer applications get extended data.</p>
<p>It works, but it is a pain in the rear to use. At least from C. I think with
Rust it might be possible to write ergonomic wrappers for such an API, though
I haven’t looked or tried myself.</p>
<p>Let’s go on and try to implement this kind of format:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>core</span><span style="color:#ed9366;">::</span><span>{mem</span><span style="color:#61676ccc;">,</span><span> ptr}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">Header </span><span>{
</span><span> version</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> sizeof_header</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> sizeof_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> sizeof_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">RawA </span><span>{
</span><span> own_data</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> start_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> num_b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">repr</span><span>(C)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Debug</span><span style="color:#61676ccc;">,</span><span> PartialEq</span><span style="color:#61676ccc;">,</span><span> Eq)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">B</span><span>(</span><span style="color:#fa6e32;">u32</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>],
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> Header,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">parse</span><span>(</span><span style="color:#ff8f40;">buf</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> [</span><span style="color:#fa6e32;">u8</span><span>]) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">Self </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// TODO:
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * actually verify the version
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// * ensure the buffer is actually valid
</span><span> Format {
</span><span> buf</span><span style="color:#61676ccc;">,
</span><span> header</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> Header) }</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">get_a</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">idx</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> RawA {
</span><span>        </span><span style="color:#fa6e32;">let</span><span> a_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>sizeof_header </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>) }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ a_start</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>sizeof_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize </span><span style="color:#ed9366;">*</span><span> idx) }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>(a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> RawA) }
</span><span> }
</span><span>
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">get_b</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">idx</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> B {
</span><span> </span><span style="color:#fa6e32;">let</span><span> b_start </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>buf
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>()
</span><span>                </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>sizeof_header </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>sizeof_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize </span><span style="color:#ed9366;">* </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> b </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ b_start</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">add</span><span>(</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>sizeof_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize </span><span style="color:#ed9366;">*</span><span> idx) }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{ </span><span style="color:#ed9366;">&*</span><span>(b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const</span><span> B) }
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">PublicA</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> format</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span>Format<</span><span style="color:#fa6e32;">'data</span><span>>,
</span><span> raw_a</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> RawA,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">PublicA</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">iter_b</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-></span><span> impl </span><span style="font-style:italic;color:#55b4d4;">Iterator</span><span><Item = </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data</span><span> B> {
</span><span> </span><span style="color:#fa6e32;">let</span><span> raw_a </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>raw_a</span><span style="color:#61676ccc;">;
</span><span> (raw_a</span><span style="color:#ed9366;">.</span><span>start_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span style="color:#ed9366;">..</span><span>raw_a</span><span style="color:#ed9366;">.</span><span>start_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize </span><span style="color:#ed9366;">+</span><span> raw_a</span><span style="color:#ed9366;">.</span><span>num_b </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(</span><span style="color:#fa6e32;">move </span><span style="color:#ed9366;">|</span><span>idx</span><span style="color:#ed9366;">| </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>format</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_b</span><span>(idx))
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl</span><span><</span><span style="color:#fa6e32;">'data</span><span>> </span><span style="color:#399ee6;">Format</span><span><</span><span style="color:#fa6e32;">'data</span><span>> {
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">iter_a</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'data </span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-></span><span> impl </span><span style="font-style:italic;color:#55b4d4;">Iterator</span><span><Item = PublicA<</span><span style="color:#fa6e32;">'data</span><span>>> {
</span><span> (</span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..</span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>header</span><span style="color:#ed9366;">.</span><span>num_a </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">usize</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">map</span><span>(</span><span style="color:#fa6e32;">move </span><span style="color:#ed9366;">|</span><span>idx</span><span style="color:#ed9366;">|</span><span> PublicA {
</span><span> format</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#61676ccc;">,
</span><span> raw_a</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_a</span><span>(idx)</span><span style="color:#61676ccc;">,
</span><span> })
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">format_works</span><span>() {
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span>[
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#fa6e32;">u32</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// version
</span><span> </span><span style="color:#ff8f40;">28</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// sizeof_header
</span><span> </span><span style="color:#ff8f40;">16</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// sizeof_a
</span><span> </span><span style="color:#ff8f40;">8</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// sizeof_b
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_a
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// num_b
</span><span> </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// 🤷🏻♂️
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].own_data
</span><span> </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].start_b
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].num_b
</span><span> </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// 🤷🏻♂️
</span><span> </span><span style="color:#ff8f40;">4</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].own_data
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].start_b
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].num_b
</span><span> </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// 🤷🏻♂️
</span><span> </span><span style="color:#ff8f40;">1</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[0].bs[0]
</span><span> </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// 🤷🏻♂️
</span><span> </span><span style="color:#ff8f40;">2</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].bs[0]
</span><span> </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// 🤷🏻♂️
</span><span> </span><span style="color:#ff8f40;">3</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// a[1].bs[1]
</span><span> </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// 🤷🏻♂️
</span><span> ]</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> buf </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="color:#ed9366;">&*</span><span>(ptr</span><span style="color:#ed9366;">::</span><span>slice_from_raw_parts(buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">as_ptr</span><span>() </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const u8</span><span style="color:#61676ccc;">,</span><span> buf</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">len</span><span>() </span><span style="color:#ed9366;">* </span><span>mem</span><span style="color:#ed9366;">::</span><span>size_of</span><span style="color:#ed9366;">::</span><span><</span><span style="color:#fa6e32;">u32</span><span>>()))
</span><span> }</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let</span><span> parsed </span><span style="color:#ed9366;">= </span><span>Format</span><span style="color:#ed9366;">::</span><span>parse(buf)</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> iter_a </span><span style="color:#ed9366;">=</span><span> parsed</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iter_a</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">=</span><span> iter_a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> iter_b </span><span style="color:#ed9366;">=</span><span> a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iter_b</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(iter_b</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>B(</span><span style="color:#ff8f40;">1</span><span>))</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let</span><span> a </span><span style="color:#ed9366;">=</span><span> iter_a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> iter_b </span><span style="color:#ed9366;">=</span><span> a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">iter_b</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(iter_b</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>B(</span><span style="color:#ff8f40;">2</span><span>))</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(iter_b</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">next</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">unwrap</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#ed9366;">&</span><span>B(</span><span style="color:#ff8f40;">3</span><span>))</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>Well, this was easy enough! Thanks to Rust’s <code>impl Iterator</code>, this also happens
to be very ergonomic. One disadvantage is that we can’t use slices directly;
we have to <em>always</em> go through the iterators, and allocate and
<code>collect</code> anytime we want a <code>Vec</code> or a slice.</p>
<h1 id="dynamically-sized-data"><a class="anchor-link" href="#dynamically-sized-data" aria-label="Anchor link for: dynamically-sized-data">#</a>
Dynamically sized Data</h1>
<p>Let me emphasize this very clearly: <strong>Don’t do this!</strong> I’m serious!</p>
<p>Some data formats don’t use an array of equally-sized objects, but rather
dynamically sized objects. Maybe they have string-data embedded, maybe they have
a type tag which says how large the structure is. There are two major problems
with this.</p>
<p>First, you can’t randomly access the n-th item; you have to parse the items
one by one.</p>
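<p>A toy example makes this concrete. Assuming a made-up format where every record is prefixed with a one-byte length, reaching the n-th record means walking over all the records before it:</p>

```rust
// Minimal sketch of why dynamically sized records forbid random access:
// each record starts with its own length, so finding the n-th one is O(n).
fn nth_record(buf: &[u8], n: usize) -> Option<&[u8]> {
    let mut offset = 0;
    // skip over the first n records, one at a time
    for _ in 0..n {
        let len = *buf.get(offset)? as usize;
        offset += 1 + len;
    }
    let len = *buf.get(offset)? as usize;
    buf.get(offset + 1..offset + 1 + len)
}

fn main() {
    // three records: [1, 2], [3], [4, 5, 6]
    let buf = [2, 1, 2, 1, 3, 3, 4, 5, 6];
    assert_eq!(nth_record(&buf, 2), Some(&[4, 5, 6][..]));
}
```

<p>Compare that with the fixed-size layout above, where the n-th item is a single pointer offset away.</p>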
<p>The second problem is that instead of using indices to cross-reference data,
you use byte offsets. To be honest, this is a bit simpler than relying on
pointer arithmetic, but it has one critical shortcoming: it relies on the
correctness of the writer.</p>
<p>I recently debugged a problem with the PDB format that uses dynamically-sized
data, and relative virtual addressing (RVA) to cross-reference entries. Well,
one of these references was wrongly aligned, and reading/casting that reference
yielded garbage. While looking at this problem, I found out that all these wrong
entries had one thing in common, but I still haven’t found out why they were
corrupt, or how to fix them. If anyone is curious, the case I’m referring to is
documented <a href="https://github.com/getsentry/symbolic/blob/4e3dc0a9f211588b140bdf3fdf1658fcab8cefcf/symbolic-debuginfo/src/pdb.rs#L1125-L1135">here</a>.</p>
<p>Please don’t do this, avoid dynamically-sized data as much as possible.</p>
<h2 id="embedded-tree-structured-data"><a class="anchor-link" href="#embedded-tree-structured-data" aria-label="Anchor link for: embedded-tree-structured-data">#</a>
Embedded Tree-structured Data</h2>
<p>Even worse than having dynamically-sized data at all is to have it for nested
tree-structured data.</p>
<p>Remember how dynamically-sized data means we can’t have random access. With
tree-structured dynamic data things get even worse, as we would have
to walk and parse a complete sub-tree just to advance to the next sibling.
That is a pain. We could partly solve this problem by having separate <code>sizeof_self</code> and
<code>sizeof_self_including_subtree</code> sizes, so we can use the latter to skip over the
subtree without walking it.</p>
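<p>Here is a minimal sketch of that skipping trick, using a flattened pre-order array of fixed-size nodes where each node stores the length of its whole subtree (the names are invented for illustration):</p>

```rust
// Nodes stored in pre-order; `subtree_len` counts the node itself plus all
// of its descendants, so skipping to the next sibling is O(1).
struct Node {
    payload: u32,
    subtree_len: usize,
}

// Collect the indices of the direct children of the subtree rooted at `root`,
// without ever visiting the grandchildren.
fn children(nodes: &[Node], root: usize) -> Vec<usize> {
    let mut result = Vec::new();
    let end = root + nodes[root].subtree_len;
    let mut idx = root + 1;
    while idx < end {
        result.push(idx);
        idx += nodes[idx].subtree_len; // skip over the child's whole subtree
    }
    result
}

fn main() {
    // tree: 0 -> (1 -> 2), 3
    let nodes = [
        Node { payload: 0, subtree_len: 4 },
        Node { payload: 1, subtree_len: 2 },
        Node { payload: 2, subtree_len: 1 },
        Node { payload: 3, subtree_len: 1 },
    ];
    let payloads: Vec<u32> = children(&nodes, 0).iter().map(|&i| nodes[i].payload).collect();
    assert_eq!(payloads, vec![1, 3]);
}
```

<p>With byte sizes instead of node counts, the same idea works for dynamically-sized nodes, at the cost of storing that extra size per entity.</p>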
<p>It’s doable, but we are trading storage space for parsing performance. Not to
mention that we are already wasting a lot of storage space to either version-tag
or sizeof-tag every single entity.</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>I guess we have come to the end, and I am actually super happy about having
written down all my thoughts on this topic, and even implemented a few ideas in
runnable Rust code (though missing all the bounds-checks).</p>
<p>So in summary, if you can’t use JSON for your data, and absolutely need to use a
custom binary format, here are some tips and observations:</p>
<ul>
<li>Use the <em>Struct of Arrays</em> pattern, and give each of your entities its own
Array.</li>
<li>Cross-reference entities via indices into those arrays, or sub-slice via
index+len.</li>
<li>If you care about forwards-compatibility, use sizeof-tagging, so that outdated
readers know how to skip data they don’t know how to interpret.</li>
<li>If going with Iterators, use separate <em>raw</em> vs <em>public</em> structs for a nice API.</li>
<li><strong>Avoid dynamically-sized data at all costs</strong>, especially tree-structured.</li>
</ul>
<p>And just as a finishing rant: keep your damn software up-to-date!
That said, the forwards-compatible version didn’t turn out to be half as bad
as I had imagined.
And even though I have full control over the reader for my use-case, I can imagine
a scenario where we would have to roll back a reader upgrade, and might end up
in a situation where the reader is outdated compared to the writer. So I might
as well end up using the sizeof-tagging approach.</p>
Howto Design an infallible algorithm that records errors2021-07-07T00:00:00+00:002021-07-07T00:00:00+00:00
Unknown
https://swatinem.de/blog/infallible-errors/<p>A quick note first: The "howto" does not mean I present a solution here, but
rather that I am searching for a good solution.</p>
<p>With that out of the way, let me explain what I want to do. I would like to
have an algorithm that always succeeds and produces <em>some</em> kind of result.
And I want to simultaneously output any kind of error that happens while running
that algorithm.</p>
<p>I can quickly think about two solutions here (please ignore any syntax or type
errors here):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">solution_1</span><span>(</span><span style="color:#ff8f40;">input</span><span style="color:#61676ccc;">:</span><span> Input) </span><span style="color:#61676ccc;">-> </span><span>(Output, </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="font-style:italic;color:#55b4d4;">Box</span><span><dyn Error>>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">trait </span><span style="color:#399ee6;">ErrorCollector </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">record_error</span><span><E</span><span style="color:#61676ccc;">:</span><span> Error>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">error</span><span style="color:#61676ccc;">:</span><span> E) {}
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">solution_2</span><span>(</span><span style="color:#ff8f40;">input</span><span style="color:#61676ccc;">:</span><span> Input, </span><span style="color:#ff8f40;">error_collector</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> dyn ErrorCollector) </span><span style="color:#61676ccc;">-></span><span> Output</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>The first solution <em>returns</em> a list of errors along with the normal output. One
disadvantage of this is that it allocates and boxes the errors, so it won’t work
in a <code>no_std</code> environment. That is not necessary for my particular
use-case, though it might be nice to have in general.
The other disadvantage is that it will allocate and return the list
unconditionally, even if the caller just doesn’t care.</p>
<p>The second solution takes a <code>collector</code> that we feed errors into. The caller is
responsible for either collecting the errors or just ignoring them. I kind of favor
this solution.</p>
<p>One more question would be: do I <em>always</em> want to continue processing when an
error happens? With the <code>ErrorCollector</code> solution, it might be possible to give
feedback on whether the processor should continue or exit early. The processor
might also continue to return a <code>Result</code>, making a distinction between
recoverable and non-recoverable errors.</p>
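<p>One possible sketch of such a feedback mechanism uses <code>ControlFlow</code> from the standard library. Note that to keep the collector usable as <code>&amp;mut dyn ErrorCollector</code>, the method can no longer be generic over the error type; this sketch simply uses <code>String</code> as the error:</p>

```rust
use std::ops::ControlFlow;

// A dyn-compatible collector: returning `Break` tells the processor to stop early.
trait ErrorCollector {
    fn record_error(&mut self, error: String) -> ControlFlow<()>;
}

// Collects every error and always continues.
struct CollectAll(Vec<String>);

impl ErrorCollector for CollectAll {
    fn record_error(&mut self, error: String) -> ControlFlow<()> {
        self.0.push(error);
        ControlFlow::Continue(())
    }
}

// An infallible toy "algorithm" that skips bad inputs and reports them.
fn process(inputs: &[i32], collector: &mut dyn ErrorCollector) -> Vec<i32> {
    let mut output = Vec::new();
    for &i in inputs {
        if i < 0 {
            if let ControlFlow::Break(()) = collector.record_error(format!("bad input: {i}")) {
                break;
            }
        } else {
            output.push(i);
        }
    }
    output
}

fn main() {
    let mut errors = CollectAll(Vec::new());
    assert_eq!(process(&[1, -2, 3], &mut errors), vec![1, 3]);
    assert_eq!(errors.0, vec!["bad input: -2"]);
}
```

<p>A fail-fast caller would simply implement <code>record_error</code> to return <code>Break</code> on the first error.</p>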
<h1 id="some-prior-art"><a class="anchor-link" href="#some-prior-art" aria-label="Anchor link for: some-prior-art">#</a>
Some prior Art</h1>
<p>I do have a thing for programming language tooling, and there is quite some
prior art for resilient parsers there. It is a question of how convenient it is
to work with the results however.</p>
<p>There is a difference between parsers outputting an <em>abstract syntax tree</em>, and
ones that rather output a <em>concrete syntax tree</em>. The ones that work with CSTs
seem to output token trees, with special <code>Error</code> tokens which can appear
<em>anywhere</em> in the CST. On the other hand, previous AST-based parsers would
rather output a list of errors along with the normal AST. The thing that makes
these inconvenient is that all the AST properties are optional, which makes them
a bit hard to use.</p>
<p>I have worked with the output of the TypeScript parser a lot. It is a resilient
parser that parses invalid code as best it can. However, for my
use-case I only work with valid code. And for that use-case, having to always
match on, or unwrap, the TypeScript equivalent of <code>Option</code> is a bit of a pain.</p>
<p>I have heard some good things about the incremental and resilient parser of
Rust-Analyzer, which produces a token-based CST, as far as I know. I have never
directly worked with it, so I can’t speak to how convenient it is to use and
match on code structures though.</p>
<h1 id="my-use-case"><a class="anchor-link" href="#my-use-case" aria-label="Anchor link for: my-use-case">#</a>
My use-case</h1>
<p>At Sentry, we deal a lot with native executable and debugging formats. The two
major formats there are PDB, the debugging format on Windows, and DWARF, the one
on Linux and Mac. The low-level parsers for these formats can spew all kinds of
errors from almost every single function/accessor/iterator they provide.</p>
<p>However, we want to extract as much high-level information from these formats
as possible. Parsing these formats can fail for all kinds of different reasons.
For one, the compilers producing these formats might have bugs and they generate
some invalid records in these files. Or our parsers might be buggy, or they
might fail if they encounter some records that are not supported.</p>
<p>We want to be as resilient as possible. We shouldn’t fail processing the whole
debug information file, if only one record deep inside of it is faulty, for
whatever reason. We also want to get detailed errors so we know how to improve
our parsers to add support for new formats and fix bugs in it.</p>
<p>This is a hard problem to solve. Also, how resilient do we want to be? How do
we know whether we parsed 99% of the file correctly and just failed on one single
entry, versus only parsing a single entry and failing to parse the remaining 99%
of the file?</p>
<p>How do we want to continue when we encounter an error? Do we skip the whole
record, or do we rather emit some kind of sentinel record? To give a concrete
example: if our job is to match an instruction address to a
<code>(Function, File, Line)</code> tuple, should we say "we have no record", or
should we rather return a record such as
<code>(function "foo", in "unknown file", on line "unknown")</code>?</p>
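<p>A tiny sketch of the sentinel variant (all names invented): the parts that fail to parse simply collapse to placeholder values, so downstream consumers always get a complete record:</p>

```rust
// Invented record type for illustration.
#[derive(Debug, PartialEq)]
struct LineRecord {
    function: String,
    file: String,
    line: Option<u32>,
}

// Build a record even if parts of it could not be parsed, substituting
// placeholders instead of dropping the whole record.
fn resolve(function: Option<&str>, file: Option<&str>, line: Option<u32>) -> LineRecord {
    LineRecord {
        function: function.unwrap_or("unknown function").to_string(),
        file: file.unwrap_or("unknown file").to_string(),
        line,
    }
}

fn main() {
    let rec = resolve(Some("foo"), None, None);
    assert_eq!(rec.function, "foo");
    assert_eq!(rec.file, "unknown file");
    assert_eq!(rec.line, None);
}
```

<p>The skip-the-record variant would instead return an <code>Option&lt;LineRecord&gt;</code> and yield <code>None</code> here.</p>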
<p>I think we can extend the <code>ErrorCollector</code> pattern to also give it some
context about what we are trying to process, or even the possibility to return a
sentinel value.</p>
<p>I’m struggling a bit with figuring out how this would actually work. Also, I’m
not sure we want to give the collector all that much choice, or if we should
just make these decisions ourselves: if we encounter an error parsing the
filename of a function record, just return "unknown"; if we encounter an error
parsing the whole function record, just skip it and continue with the next one?</p>
<p>The question is also, how specific should this be to our own use-case? We have
a higher-level library wrapping a low-level one. But we are not the only user
of this higher-level library. There are other users as well. How fine-grained
should the control be that we expose? How can we evolve this without forcing
breaking changes on other users?</p>
<h1 id="some-brainstorming"><a class="anchor-link" href="#some-brainstorming" aria-label="Anchor link for: some-brainstorming">#</a>
Some Brainstorming</h1>
<p>Let’s sketch out a possible API for this:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">use </span><span>std</span><span style="color:#ed9366;">::</span><span>error</span><span style="color:#ed9366;">::</span><span>Error</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">trait </span><span style="color:#399ee6;">ErrorCollector </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">record_error</span><span><E</span><span style="color:#61676ccc;">:</span><span> Error>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">error</span><span style="color:#61676ccc;">:</span><span> E) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><(), E></span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">AlwaysError</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">impl </span><span>ErrorCollector </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">AlwaysError </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">record_error</span><span><E</span><span style="color:#61676ccc;">:</span><span> Error>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">error</span><span style="color:#61676ccc;">:</span><span> E) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><(), E> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(error)
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">AlwaysIgnore</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">impl </span><span>ErrorCollector </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">AlwaysIgnore </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">record_error</span><span><E</span><span style="color:#61676ccc;">:</span><span> Error>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">error</span><span style="color:#61676ccc;">:</span><span> E) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><(), E> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(())
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Default)]
</span><span style="color:#fa6e32;">struct </span><span style="color:#399ee6;">CollectErrors </span><span>{
</span><span>    errors</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="font-style:italic;color:#55b4d4;">Box</span><span><dyn Error>>,
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl </span><span style="color:#399ee6;">CollectErrors </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">into_inner</span><span>(</span><span style="color:#ff8f40;">self</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="font-style:italic;color:#55b4d4;">Box</span><span><dyn Error>> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>errors
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">impl </span><span>ErrorCollector </span><span style="color:#fa6e32;">for </span><span style="color:#399ee6;">CollectErrors </span><span>{
</span><span> </span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">record_error</span><span><E</span><span style="color:#61676ccc;">:</span><span> Error>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">self</span><span>, </span><span style="color:#ff8f40;">error</span><span style="color:#61676ccc;">:</span><span> E) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><(), E> {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>errors</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">push</span><span>(</span><span style="font-style:italic;color:#55b4d4;">Box</span><span style="color:#ed9366;">::</span><span>new(error))</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(())
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">err_or_with</span><span><Collector, F, T, E>(</span><span style="color:#ff8f40;">collector</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> Collector, </span><span style="color:#ff8f40;">result</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><T, E>, </span><span style="color:#ff8f40;">ok</span><span style="color:#61676ccc;">:</span><span> F) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><T, E>
</span><span style="color:#fa6e32;">where
</span><span>    Collector</span><span style="color:#61676ccc;">:</span><span> ErrorCollector,
</span><span>    F</span><span style="color:#61676ccc;">:</span><span> FnOnce() -> T,
</span><span>    E</span><span style="color:#61676ccc;">:</span><span> Error,
</span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// hm, do we have such a combinator on `Result` already?
</span><span> </span><span style="color:#fa6e32;">match</span><span> result {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(err) </span><span style="color:#ed9366;">=> </span><span>{
</span><span> </span><span style="color:#fa6e32;">match</span><span> collector</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">record_error</span><span>(err) {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#ed9366;">_</span><span>) </span><span style="color:#ed9366;">=> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#f07171;">ok</span><span>())</span><span style="color:#61676ccc;">,
</span><span>                    </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(err) </span><span style="color:#ed9366;">=> </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(err)</span><span style="color:#61676ccc;">,
</span><span> }
</span><span> }
</span><span> ok </span><span style="color:#ed9366;">=></span><span> ok</span><span style="color:#61676ccc;">,
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#f07171;">macro_rules! </span><span style="color:#399ee6;">err_or_continue </span><span>{
</span><span>    (</span><span style="color:#ff8f40;">$collector</span><span style="color:#61676ccc;">:</span><span style="color:#fa6e32;">expr</span><span>, </span><span style="color:#ff8f40;">$expr</span><span style="color:#61676ccc;">:</span><span style="color:#fa6e32;">expr </span><span style="color:#ed9366;">$</span><span>(,)?) </span><span style="color:#ed9366;">=> </span><span>{
</span><span>        </span><span style="color:#fa6e32;">match </span><span style="color:#ff8f40;">$expr </span><span>{
</span><span>            std</span><span style="color:#ed9366;">::</span><span>result</span><span style="color:#ed9366;">::</span><span>Result</span><span style="color:#ed9366;">::</span><span>Err(err) </span><span style="color:#ed9366;">=> </span><span>{
</span><span>                </span><span style="color:#fa6e32;">match </span><span style="color:#ff8f40;">$collector</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">record_error</span><span>(err) {
</span><span>                    </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#ed9366;">_</span><span>) </span><span style="color:#ed9366;">=> </span><span style="color:#fa6e32;">continue</span><span style="color:#61676ccc;">,
</span><span>                    </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(err) </span><span style="color:#ed9366;">=> </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(err)</span><span style="color:#61676ccc;">,
</span><span>                }
</span><span>            }
</span><span>            std</span><span style="color:#ed9366;">::</span><span>result</span><span style="color:#ed9366;">::</span><span>Result</span><span style="color:#ed9366;">::</span><span>Ok(ok) </span><span style="color:#ed9366;">=></span><span> ok</span><span style="color:#61676ccc;">,
</span><span>        }
</span><span>    }</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">process</span><span><C</span><span style="color:#61676ccc;">:</span><span> ErrorCollector>(</span><span style="color:#fa6e32;">mut </span><span style="color:#ff8f40;">collector</span><span style="color:#61676ccc;">:</span><span> C) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><</span><span style="font-style:italic;color:#55b4d4;">Vec</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>>, Error> {
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> names </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">vec!</span><span>[]</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">for</span><span> res </span><span style="color:#ed9366;">in</span><span> some_iter {
</span><span>        </span><span style="color:#fa6e32;">let</span><span> res </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">err_or_continue!</span><span>(collector</span><span style="color:#61676ccc;">,</span><span> res)</span><span style="color:#61676ccc;">;
</span><span>
</span><span>        </span><span style="color:#fa6e32;">let</span><span> name </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">err_or_with</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> collector</span><span style="color:#61676ccc;">,</span><span> res</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get_name</span><span>()</span><span style="color:#61676ccc;">, </span><span>|| </span><span style="color:#86b300;">"unknown"</span><span>)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span>        names</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">push</span><span>(name)</span><span style="color:#61676ccc;">;
</span><span>    }
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(names)
</span><span>}
</span></code></pre>
<p>I would really appreciate some feedback on this.</p>
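<p>For what it’s worth, here is one way the sketch above could be turned into compiling code. This is only my own rough take, not a finished design: it fixes the error type to <code>Box&lt;dyn Error&gt;</code> instead of a generic method parameter (keeping the trait object-safe), and <code>ok_or_record</code> is a hypothetical name for the <code>err_or_with</code> combinator from the sketch:</p>

```rust
use std::error::Error;

// Object-safe variant of the `ErrorCollector` trait from the sketch:
// the error type is fixed to `Box<dyn Error>`.
trait ErrorCollector {
    fn record_error(&mut self, error: Box<dyn Error>) -> Result<(), Box<dyn Error>>;
}

/// Fails processing on the first recorded error.
struct AlwaysError;
impl ErrorCollector for AlwaysError {
    fn record_error(&mut self, error: Box<dyn Error>) -> Result<(), Box<dyn Error>> {
        Err(error)
    }
}

/// Collects all errors and lets processing continue.
#[derive(Default)]
struct CollectErrors {
    errors: Vec<Box<dyn Error>>,
}
impl ErrorCollector for CollectErrors {
    fn record_error(&mut self, error: Box<dyn Error>) -> Result<(), Box<dyn Error>> {
        self.errors.push(error);
        Ok(())
    }
}

// Hypothetical name for the `err_or_with` combinator from the sketch:
// forward an `Ok` value, or record the error and fall back to a default.
fn ok_or_record<C: ErrorCollector, T>(
    collector: &mut C,
    result: Result<T, Box<dyn Error>>,
    fallback: impl FnOnce() -> T,
) -> Result<T, Box<dyn Error>> {
    match result {
        Ok(value) => Ok(value),
        Err(err) => collector.record_error(err).map(|()| fallback()),
    }
}

fn main() {
    let mut collector = CollectErrors::default();
    let failed: Result<&str, Box<dyn Error>> = Err("bad record".into());
    let name = ok_or_record(&mut collector, failed, || "unknown").unwrap();
    println!("{name}, {} error(s) recorded", collector.errors.len());
}
```

The <code>err_or_continue</code> variant would stay a macro, since <code>continue</code> only makes sense when expanded inside the caller’s loop.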
The REAL mathematics of fat-loss2021-04-24T00:00:00+00:002021-04-24T00:00:00+00:00
Unknown
https://swatinem.de/blog/real-mafs-of-fatloss/<p>There is an interesting TED-talk called
<a href="https://www.youtube.com/watch?v=vuIlsN32WaE">The mathematics of weight loss</a>
which explains with chemical formulas and maths how bodyfat will be turned
into \( CO_2 \) and Water.</p>
<p>The presenter starts with the following chemical formula:</p>
<p>\[ C_{55}H_{104}O_6 + 78O_2 \rightarrow 55CO_2 + 52H_2O \]</p>
<p>And then goes on to calculate that 84% <em>of the fat</em> is being exhaled
as \( CO_2 \). But that is not really what I am interested in, so let’s do
the calculation again.</p>
<p>Using the <a href="https://en.wikipedia.org/wiki/Standard_atomic_weight#In_the_periodic_table">list of atomic weights</a> as presented in the video, we
get the following totals for the fat and our <em>total</em> \( CO_2 \) exhaled:</p>
<p>\[
55 * 12.011 + 104 * 1.008 + 6 * 15.999 = 861.431 \newline
55 * (12.011 + 2 * 15.999) = 2420.495 \newline
2420.495 / 861.431 \approx 2.8
\]</p>
<p>Now that is more interesting! So for every kilogram of fat that my body
metabolizes, I breathe out 2.8 kg of \( CO_2 \).</p>
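<p>As a sanity check, we can redo this arithmetic in a few lines of Rust, using the same atomic weights as above:</p>

```rust
// Recomputing the mass ratio above, with the same standard atomic weights.
fn main() {
    let (c, h, o) = (12.011_f64, 1.008_f64, 15.999_f64);

    // one "average" fat molecule: C55 H104 O6
    let fat = 55.0 * c + 104.0 * h + 6.0 * o;
    // the 55 CO2 molecules it is metabolized into
    let co2 = 55.0 * (c + 2.0 * o);

    println!("fat: {fat:.3} u, CO2: {co2:.3} u, ratio: {:.2}", co2 / fat);
}
```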
<p>Even though not every fat is the same, dietary fat counts as 9 kcal per gram
on nutrition labels, so I will just use that. So one kg of fat
equals roughly 9000 kcal.
Let’s assume that my daily energy expenditure is around 2200 kcal, since I am
super sedentary right now; when active I would say that my energy expenditure
is more around the 3000 kcal range.</p>
<p>Anyhow. So, I would need between 3 and 4 days of fasting to metabolize one kg of
bodyfat.
Or fasting for 90 days straight to get rid of the roughly 25 kg of bodyfat that I
carry around with me.</p>
<p>When it comes to \( CO_2 \), that means that every day I exhale maybe 700-900 g.
Considering that a car that is halfway fun to drive emits around 180 g per km,
those numbers are put into a quite interesting relation.</p>
<p>We can round that to maybe 300 kg of \( CO_2 \) a year. Multiplying that up to
the approximately 8 million inhabitants of Austria, that is roughly 2.4 megatons.</p>
<p>Just by people breathing. Certainly less than
<a href="https://www.youtube.com/watch?v=6yv1qrsbUYo">1-2 giga byte</a>, lol.</p>
Force Unwind Tables2021-04-22T00:00:00+00:002021-04-22T00:00:00+00:00
Unknown
https://swatinem.de/blog/unwind-tables/<p>I was recently investigating a customer issue where I got stuck at the point
where I simply couldn’t seem to find any kind of unwind information for a
specific piece of code, while the rest of the code had such information.
My conclusion was that it could potentially be a problem in the customer’s build
chain. I found a similar problem looking at a truncated stack trace on Android.</p>
<p>To better understand how this can happen, I tried to reproduce this myself. So
let’s build a program that has debug and unwind information for parts of the
program, but not others.</p>
<pre data-lang="c" style="background-color:#fafafa;color:#61676c;" class="language-c "><code class="language-c" data-lang="c"><span style="font-style:italic;color:#abb0b6;">// opt.c
</span><span style="color:#fa6e32;">void </span><span style="color:#f29718;">fn_without_debuginfo</span><span>()
</span><span>{
</span><span> </span><span style="color:#f29718;">print_backtrace</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// main.c
</span><span style="color:#fa6e32;">void </span><span style="color:#f29718;">indirect_call</span><span>()
</span><span>{
</span><span> </span><span style="color:#f07171;">printf</span><span>(</span><span style="color:#86b300;">"=== indirect call ===</span><span style="color:#4cbf99;">\n</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f29718;">fn_without_debuginfo</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">void </span><span style="color:#f29718;">direct_call</span><span>()
</span><span>{
</span><span> </span><span style="color:#f07171;">printf</span><span>(</span><span style="color:#86b300;">"=== direct call ===</span><span style="color:#4cbf99;">\n</span><span style="color:#86b300;">"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f29718;">print_backtrace</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">void </span><span style="color:#f29718;">main</span><span>()
</span><span>{
</span><span> </span><span style="color:#f29718;">direct_call</span><span>()</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f29718;">indirect_call</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>I split my program into two <em>compile units</em>, which I compile with different
flags before I link them into the final executable:</p>
<pre data-lang="make" style="background-color:#fafafa;color:#61676c;" class="language-make "><code class="language-make" data-lang="make"><span> gcc -c opt.c -Os -fno-asynchronous-unwind-tables -fno-optimize-sibling-calls
</span><span> gcc -c main.c -g
</span><span> gcc -o foo main.o opt.o -ldl -rdynamic
</span></code></pre>
<p>Here, <code>-fno-optimize-sibling-calls</code> avoids gcc being smart and actually
tail-call-optimizing my whole <code>fn_without_debuginfo</code> away. I also needed <code>-rdynamic</code>
when linking the executable in order to be able to symbolicate the stack trace.</p>
<p>The important piece here is <code>-fno-asynchronous-unwind-tables</code>, which instructs
gcc to skip creating the <code>.eh_frame</code> section that contains the unwind tables.
Also, the compile unit is built without debug information.</p>
<p>Running my executable yields the expected (broken) results:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>=== direct call ===
</span><span>0 - 0x0x55f80c10e1b7 (+0x0x11b7) - print_backtrace
</span><span>1 - 0x0x55f80c10e2e1 (+0x0x12e1) - direct_call
</span><span>2 - 0x0x55f80c10e2f2 (+0x0x12f2) - main
</span><span>3 - 0x0x7f94c555db25 (+0x0x27b25) - __libc_start_main
</span><span>4 - 0x0x55f80c10e0be (+0x0x10be) - _start
</span><span>
</span><span>=== indirect call ===
</span><span>0 - 0x0x55f80c10e1b7 (+0x0x11b7) - print_backtrace
</span><span>1 - 0x0x55f80c10e307 (+0x0x1307) - fn_without_debuginfo
</span></code></pre>
<p>The stack trace is truncated after my <code>fn_without_debuginfo</code>, exactly as
intended. It is, however, not what you would expect when using a tool such as sentry.</p>
<h1 id="lose-some-weight"><a class="anchor-link" href="#lose-some-weight" aria-label="Anchor link for: lose-some-weight">#</a>
Lose some Weight</h1>
<p>It turns out, all this information can lead to a bit of binary bloat. I have
seen reports that unwind information can take up as much as 10% of the resulting
binary size. So in cases where binary size matters, which it especially does for
mobile and embedded, there are a few tutorials online that advocate just
removing the whole <code>.eh_frame</code> section completely.</p>
<h1 id="small-and-big-unwind-information"><a class="anchor-link" href="#small-and-big-unwind-information" aria-label="Anchor link for: small-and-big-unwind-information">#</a>
Small and Big Unwind Information</h1>
<p>Coming back to my example, I compiled part of my program with the <code>-g</code> switch,
which turned on detailed debug information.</p>
<p>This detailed debug info contains all the details to allow debuggers to show
which local variables are defined, and where on the stack or in which registers
they can be found, among other information.</p>
<p>I will call this the <em>big</em> unwind information. And yes, it is <em>huge</em>,
sometimes 2x or even 10x the binary size.</p>
<p>This information is, however, not needed at runtime, and it is best practice to split it
apart from the actual binary. It also contains sensitive details about the codebase ;-)</p>
<p>The <em>small</em> <code>.eh_frame</code> remains in the binary, as you might need it, such as to
create a stack trace as in our example. But you can also safely remove it in
some cases, more on that later.</p>
<hr />
<p>Another interesting case is statically linked libraries. How do you know with
what flags they were compiled? Do they have both kinds of unwind information or
not? How would I know when looking at <code>libc.a</code> that comes with the
<a href="https://archlinux.org/packages/community/x86_64/musl/">arch musl package</a>?</p>
<hr />
<p>To summarize, there are two different sets of unwind information, <em>big</em> and
<em>small</em>; one is shipped together with the executable, and the other is not.
And you can end up in situations where parts of your program have either one,
both or neither.</p>
<p>This is also the reason why, when working with sentry, it is important to upload
both the executable, and the accompanying debug file.</p>
<h1 id="rust"><a class="anchor-link" href="#rust" aria-label="Anchor link for: rust">#</a>
Rust</h1>
<p>I actually hit a similar problem recently while playing around with
<code>-C panic=abort</code> in Rust. I was surprised that the <code>panic!</code> backtrace was
truncated and <a href="https://github.com/rust-lang/rust/issues/81902">filed an issue</a> about it. Turns out that
Rust <em>by default</em> will avoid creating unwind info when compiled with
<code>panic=abort</code>. This option is frequently recommended to avoid binary bloat.
And sure enough, if you don’t ever want to catch a <code>panic!</code>, you don’t need it;
however, you will also lose the ability to get a meaningful stack trace.</p>
<p>You can get that ability back by using <a href="https://doc.rust-lang.org/rustc/codegen-options/index.html#force-unwind-tables"><code>-C force-unwind-tables</code></a>.</p>
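<p>For a Cargo project, one way to combine the two (a sketch; the flag is documented among rustc’s codegen options) is via <code>.cargo/config.toml</code>:</p>

```toml
# .cargo/config.toml (sketch)
# Keep unwind tables even when building with `panic = "abort"`,
# so that stack traces keep working.
[build]
rustflags = ["-C", "force-unwind-tables=yes"]
```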
<p>In this case I must say that I disagree with Rust’s behavior, as catching panics
and creating stack traces are two different concerns. And <code>panic=abort</code> is <em>very</em>
frequently used. So remember to use <code>-C force-unwind-tables</code> if you care about
stack traces but not about catching panics.</p>
Overcoming Bad Standards2021-04-18T00:00:00+00:002021-04-18T00:00:00+00:00
Unknown
https://swatinem.de/blog/standards/<p>Interestingly, in recent times I have read and watched a few articles/videos
about first principles thinking.
It might be a bit related to confirmation bias, since recently I have been
thinking about this topic myself, in the context of bad standards.
Or rather, standards that we <em>now</em> adhere to merely for the reason that it <em>is</em>
a standard, and we have <em>always done it that way</em>.</p>
<p>Anyhow, my train of thought started a few weeks ago when I saw a snippet of
<a href="https://www.youtube.com/watch?v=Nb2tebYAaOA">Jim Keller</a> talking about all
kinds of topics ranging from microprocessor design to, well, talking about
first principle thinking, also mentioning Elon Musk who is famous for his way
of first principle thinking. BTW, I am a bit of an Elon fanboy, and I do have a
framed picture of him smoking weed on my desk, lol.</p>
<hr />
<p>I have this talent to think outside of the box, to see flaws and improvements
where others don’t even think about them.
It’s a blessing because it makes me really good at what I do. It’s also a giant
curse because, well, it is quite depressing to see flaws everywhere. Especially
when you realize that there already exist perfectly good solutions, or at least
improvements, which will never gain mass adoption because of society’s resistance
to change.</p>
<p>I mean, there are valid reasons for resisting change. Lack of a <em>compelling</em>
incentive is a good one. Or maybe the possible improvement is not worth the
effort re-learning everything. I resist change myself in certain situations.
I would rather drive a 20 year old car with fewer digital components than
modern cars. I don’t want to exchange my physical light switches for home
automation. Because why should I? Things work fine the way they do now. And
the “more modern” solutions are complex, have more moving parts that can
potentially break, and are harder or even impossible for me to repair.</p>
<hr />
<p>What people way too seldom do is think about the constraints that influenced
existing solutions. When doing so, you may realize that often times, some
fundamental design decisions are just not valid anymore, since the constraints
have changed.</p>
<p>Take as an example the most widely used keyboard layout QWERTZ/QWERTY. About
ten years ago, I invested some time learning the
<a href="https://www.neo-layout.org/">Neo2 Layout</a>, which is quite a bit more logical
and intuitive, and therefore easier to learn. It is also said to allow faster
touch typing with fewer typos.
So what is wrong with QWERTZ? Well, it was developed a <em>long</em> time ago, and it
is widely believed that it was specifically engineered to work around the
limitations of mechanical typewriters of those times. Essentially slowing you
down to avoid mechanical parts from getting stuck to each other.
That is a technical limitation that simply does not exist anymore. We should be
free to use better solutions, so why don’t we?</p>
<p>There is a simple reason. Every device that we can buy has QWERTZ. Every
operating system has it, and might not have easy ways to use alternative
layouts. Touch-based operating systems have it as the onscreen keyboard. I must
admit that I still use QWERTZ on Android, even though Neo2 support was added
a while back. Because of muscle memory. And yes, a lot of people learn that
layout in school or dedicated training courses. Even though Neo2 would be a lot
easier to learn as it has a much better learning curve.</p>
<p>So while the original technical limitations do not exist any more, we are in a
state of <em>we have always done it this way</em>. And I kind of agree. Having a bad
standard is better than having no standard at all.</p>
<hr />
<p>In terms of software, we have the same resistance to change. We still have
companies or projects that actively aim to support Windows XP, which is 20 years
old by now. Also from my own experience, I have nightmares about how bad C as
a language is, and also how bad all the tools and formats around it are. How
bad POSIX is. And it’s not just me. I read an increasing number of articles
talking about this.</p>
<p>I must admit, I haven’t spent much time researching the reasons for why things
are the way they are. But I would guess that a lot of that has to do with
technical limitations that existed half a century ago, but are not valid anymore.
See, most of the technology we use today stems from the 70s. And either keeping
backward compatibility, or adopting existing tools with minimal effort, are
among the guiding principles of new projects being done today.</p>
<p>I regularly daydream about what our world would look like today without these
limitations.</p>
<hr />
<p>Enough ranting for now.</p>
Finding loaded libraries on Linux2021-04-02T00:00:00+00:002021-04-02T00:00:00+00:00
Unknown
https://swatinem.de/blog/proc-maps/<p>Well, I am still very much procrastinating on writing the next blog post in my
<em>Relax and Unwind</em> series about writing a stack unwinder from scratch.</p>
<p>However, todays topic is a prerequisite for that. We will take a look at how
we can get a list of loaded libraries on Linux.</p>
<p>Usually, the platform will provide the necessary APIs to get a list of loaded
libraries directly from the dynamic loader that is responsible for loading them.
On Windows, you have the <a href="https://docs.microsoft.com/en-us/windows/win32/api/tlhelp32/nf-tlhelp32-createtoolhelp32snapshot">Tool Help Library</a>, and on Apple
platforms you have some <a href="https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/dyld.3.html">dyld</a> functions available.</p>
<p>For better or for worse, on Linux there are no standardized userspace tools.
GNU/Linux has the <a href="https://man7.org/linux/man-pages/man3/dl_iterate_phdr.3.html"><code>dl_iterate_phdr</code></a> function for this purpose, but
that is notably not available on ancient Android systems (The
<a href="https://android.googlesource.com/platform/bionic/+/master/docs/status.md">Bionic Status</a> lists the API as available starting with API 21,
aka Android 5, released end of 2014).
So if you have to support ancient Android versions, which unfortunately we have
to, you will need to get the list of loaded libraries from somewhere else.</p>
<p>It seems the state of the art is to parse the memory map info from <code>/proc/XXX/maps</code>
and try to find the mapped ELF files that way. It is what <a href="https://chromium.googlesource.com/breakpad/breakpad/+/master/src/client/linux/minidump_writer/linux_dumper.cc#541">Breakpad</a> does
(<a href="https://chromium.googlesource.com/breakpad/breakpad/+/master/src/processor/proc_maps_linux.cc#29">in two places</a>),
and what <a href="https://chromium.googlesource.com/crashpad/crashpad/+/refs/heads/master/util/linux/memory_map.cc#57">Crashpad</a>, <a href="https://cs.android.com/android/platform/superproject/+/master:system/libprocinfo/include/procinfo/process_map.h;drc=master;l=92">Android’s libunwindstack</a> and
<a href="https://github.com/llvm/llvm-project/blob/62ec4ac90738a5f2d209ed28c822223e58aaaeb7/lldb/source/Plugins/Process/Utility/LinuxProcMaps.cpp#L26">LLDB</a> do. I think another reason these tools do it this way is
that some of them are <em>outside observers</em> that just can’t query the dynamic
loader from inside the process.</p>
<p>This is also the approach I implemented for <a href="https://github.com/getsentry/sentry-native/blob/aee5dc1a55dee01477f20016c197084a501db0de/src/modulefinder/sentry_modulefinder_linux.c#L28">sentry-native</a>.
However, that implementation was rather conservative and did not catch all the
cases, most notably <code>.so</code> files loaded directly from inside Android <code>.apk</code> packages.</p>
<p>So I rethought the approach to support more cases, and want to document it
here, along with a few interesting cases that I found.</p>
<h1 id="the-proc-x-maps-format"><a class="anchor-link" href="#the-proc-x-maps-format" aria-label="Anchor link for: the-proc-x-maps-format">#</a>
The <code>/proc/X/maps</code> format</h1>
<p>The format of these <code>/proc/X/maps</code> files is documented in a <a href="https://man7.org/linux/man-pages/man5/proc.5.html">manpage here</a>.
It includes the start/end of the virtual address space covered by the mapping,
as well as permission information, and information about the inode (file) it is
coming from, and the offset inside that file.</p>
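<p>As a rough sketch, parsing one such line could look like this (the struct and function names are made up for illustration; a real parser would also handle details like <code>(deleted)</code> suffixes):</p>

```rust
// Sketch: parse a single `/proc/X/maps` line into its documented fields
// (start-end, permissions, file offset, device, inode, optional pathname).
#[derive(Debug, PartialEq)]
struct MapEntry {
    start: u64,
    end: u64,
    perms: String,
    offset: u64,
    path: Option<String>,
}

fn parse_maps_line(line: &str) -> Option<MapEntry> {
    let mut parts = line.split_whitespace();
    let (start, end) = parts.next()?.split_once('-')?;
    let perms = parts.next()?.to_string();
    let offset = parts.next()?;
    let _dev = parts.next()?;
    let _inode = parts.next()?;
    // The pathname is optional; re-join whatever is left (this collapses
    // runs of spaces inside a path, which is good enough for a sketch).
    let path = parts.collect::<Vec<_>>().join(" ");

    Some(MapEntry {
        start: u64::from_str_radix(start, 16).ok()?,
        end: u64::from_str_radix(end, 16).ok()?,
        perms,
        offset: u64::from_str_radix(offset, 16).ok()?,
        path: (!path.is_empty()).then(|| path),
    })
}

fn main() {
    let line = "7f8cd3467000-7f8cd3475000 r--p 00000000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0";
    let entry = parse_maps_line(line).unwrap();
    println!("{entry:x?}");
}
```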
<p>On Linux, all the executables and libraries have the ELF format. There was recently
a really great post on the <a href="https://blog.cloudflare.com/how-to-execute-an-object-file-part-1/">Cloudflare Blog</a> that explained the ELF
format, and how a loader parses and processes it in great detail.</p>
<p>There are cases when a library uses just one mapping, but most of the time, it
is split into two or more mappings. Usually a read-only mapping that includes
the ELF headers and some metadata, and an executable mapping that holds the
actual program code.</p>
<p>On my Linux system, I saw up to 6 mappings for a single file:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>7f8cd3467000-7f8cd3475000 r--p 00000000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0
</span><span>7f8cd3475000-7f8cd34da000 r-xp 0000e000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0
</span><span>7f8cd34da000-7f8cd34f6000 r--p 00073000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0
</span><span>7f8cd34f6000-7f8cd34f7000 ---p 0008f000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0
</span><span>7f8cd34f7000-7f8cd34fa000 r--p 0008f000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0
</span><span>7f8cd34fa000-7f8cd34fc000 rw-p 00092000 00:1c 7597971 /usr/lib/libcurl.so.4.7.0
</span></code></pre>
<p>The interesting case here is that the 4th mapping is not readable, and basically
creates a gap in the address space.</p>
<hr />
<p>Another interesting case I found on Android:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>737b5570d000-737b5570e000 r--p 00000000 07:70 34 /apex/com.android.runtime/lib64/bionic/libdl.so
</span><span>737b5570e000-737b5570f000 r-xp 00000000 07:70 34 /apex/com.android.runtime/lib64/bionic/libdl.so
</span><span>737b5570f000-737b55710000 r--p 00000000 07:70 34 /apex/com.android.runtime/lib64/bionic/libdl.so
</span></code></pre>
<p>Here, the same file at the same offset is mapped onto different address ranges.</p>
<hr />
<p>The way that the Android loader loads libraries directly from apks is also interesting.
Compare the following two mappings, which load the exact same libraries, once extracted to disk,
once directly from the apk:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>77a85dbda000-77a85dbdd000 r-xp 00000000 fd:05 40992 /data/app/x/y/lib/x86_64/libsentry-android.so
</span><span>77a85dbdd000-77a85dbde000 ---p 00000000 00:00 0
</span><span>77a85dbde000-77a85dbdf000 r--p 00003000 fd:05 40992 /data/app/x/y/lib/x86_64/libsentry-android.so
</span><span>77a85dc15000-77a85dd6c000 r-xp 00000000 fd:05 40991 /data/app/x/y/lib/x86_64/libsentry.so
</span><span>77a85dd6c000-77a85dd6d000 ---p 00000000 00:00 0
</span><span>77a85dd6d000-77a85dd79000 r--p 00157000 fd:05 40991 /data/app/x/y/lib/x86_64/libsentry.so
</span><span>77a85dd79000-77a85dd7a000 rw-p 00163000 fd:05 40991 /data/app/x/y/lib/x86_64/libsentry.so
</span></code></pre>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>77a85dbf0000-77a85dbf3000 r-xp 00001000 fd:05 40977 /data/app/x/y/base.apk
</span><span>77a85dbf3000-77a85dbf4000 ---p 00000000 00:00 0
</span><span>77a85dbf4000-77a85dbf5000 r--p 00004000 fd:05 40977 /data/app/x/y/base.apk
</span><span>77a85dc15000-77a85dd6c000 r-xp 00006000 fd:05 40977 /data/app/x/y/base.apk
</span><span>77a85dd6c000-77a85dd6d000 ---p 00000000 00:00 0
</span><span>77a85dd6d000-77a85dd79000 r--p 0015d000 fd:05 40977 /data/app/x/y/base.apk
</span><span>77a85dd79000-77a85dd7a000 rw-p 00169000 fd:05 40977 /data/app/x/y/base.apk
</span></code></pre>
<p>The mappings are basically the same, just that in the case of the <code>base.apk</code>,
the file offsets are different. Also, the Android loader inserts a non-readable
gap in between.</p>
<h1 id="so-how-do-we-get-the-library-list-from-there"><a class="anchor-link" href="#so-how-do-we-get-the-library-list-from-there" aria-label="Anchor link for: so-how-do-we-get-the-library-list-from-there">#</a>
So how do we get the library list from there?</h1>
<p>So far, the sentry-native modulefinder was a bit too conservative. Because of
concerns about reading arbitrary memory, we mmap-ed the file into memory and
tried to extract the ELF headers from there, but that approach did not work with
libraries loaded directly from apk files. Plus, there were some issues related
to the non-contiguous and duplicated mappings we have seen above.</p>
<p>My new approach is to collect the <em>readable</em> mappings I have seen so far,
along with their file offsets and the gaps in between them. For each readable
mapping, I look for the magic ELF signature. If I find one, I process the
previously saved mappings, also taking care of possible duplicates.</p>
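<p>The core of that scan can be sketched like this (made-up types for illustration, not the actual sentry-native code): walk the readable mappings and start a new module whenever one begins with the four ELF magic bytes:</p>

```rust
/// The four magic bytes at the start of every ELF image.
const ELF_MAGIC: [u8; 4] = [0x7f, b'E', b'L', b'F'];

/// A stand-in for a readable mapping: where it came from, plus the first
/// bytes that a read of its start address would yield.
struct Mapping<'a> {
    path: &'a str,
    first_bytes: [u8; 4],
}

/// Returns the paths of the mappings that start a new ELF module.
fn module_starts<'a>(mappings: &[Mapping<'a>]) -> Vec<&'a str> {
    mappings
        .iter()
        .filter(|m| m.first_bytes == ELF_MAGIC)
        .map(|m| m.path)
        .collect()
}
```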
<p>A possible issue is still that we are reading arbitrary memory. I think I’m
pretty safe, as I only consider readable mappings, but one improvement would be
to use <a href="https://man7.org/linux/man-pages/man2/process_vm_readv.2.html"><code>process_vm_readv</code></a> for this, though I have also seen
problems using that on Android.</p>
<p>Another unanswered question is how to correctly deal with mappings that have gaps
in them, or that appear multiple times. Information embedded in the ELF file
might instruct the loader to load the executable code at a specific offset from
the ELF header in RAM, which might be different from the offset on disk. This very
much depends on how we use this information to post-process crash reports.</p>
<hr />
<p>All in all, this is a decidedly non-trivial problem, and I am far from the only
one struggling with it. It seems that the <code>libunwindstack</code> that I mentioned above,
and which we vendor as our unwinder on Android, has issues of its own, as it is unable
to correctly create a stacktrace that involves libraries loaded from an <code>.apk</code>.
We have also seen some breakpad tools get this wrong and create minidumps
with duplicated/invalid mappings that fail post-processing. It might be quite
some work to investigate those failures and patch the relevant external dependencies.</p>
<p>And through all this, I wish I could implement all of this in a sane language
like Rust, and share that code across different steps of the pipeline. Oh well…</p>
Relax and Unwind2021-02-20T00:00:00+00:002021-02-20T00:00:00+00:00
Unknown
https://swatinem.de/blog/unwind-2/<p>Last time, we were looking at how calls are actually implemented in native
assembly code, and what kind of instructions the CPU is executing and what
registers are involved.</p>
<p>We do know that the <code>rsp</code> register (the stack pointer) points to the <em>top</em> of
the stack, and we learned that the <code>call</code> instruction pushes the <em>next</em>
instruction onto the stack, which then becomes the <em>bottom</em> of the new stack
frame, and the <code>ret</code> instruction jumps to whatever instruction address is on
top of the stack. In between though, the stack pointer can move as the function
pushes and pops things onto and from the stack. So the position of the
return address relative to the current stack pointer will change. Most of the
time this change is known statically, but sometimes it is even dynamic; I will
come back to that in a later part of this series.</p>
<p>Anyhow, it seems we are a bit stuck. We only know where the current stack frame
ends (the stack pointer); we don’t really know where it began (where the return
address is).</p>
<hr />
<p>So let’s go back a bit and look at the special purpose registers again.
One of them is called <code>rbp</code>, which stands for <em>base pointer</em>. It turns out that
in older times, this base pointer served exactly this purpose. Let’s look at how
it is used by passing the <code>-Cforce-frame-pointers=yes</code> option to our Rust
compiler. This is the output that the <a href="https://godbolt.org/z/WYq7nn">Compiler Explorer</a>
gives me:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>example::never:
</span><span> push rbp
</span><span> mov rbp, rsp
</span><span> call example::gonna
</span><span> pop rbp
</span><span> ret
</span><span>
</span><span>example::gonna:
</span><span> push rbp
</span><span> mov rbp, rsp
</span><span> call example::give
</span><span> pop rbp
</span><span> ret
</span><span>
</span><span>example::give:
</span><span> push rbp
</span><span> mov rbp, rsp
</span><span> pop rbp
</span><span> ret
</span></code></pre>
<p>I will also try to visualize what the stack actually looks like in that case.
Note that I drew the stack top to bottom, as it actually grows from high to low.
Anyhow, what we see is that <code>rbp</code> points to the position in the stack that holds
the parent’s base pointer. And exactly <em>below</em> that (actually <em>above</em> when
talking of addresses) is the return address.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> ┌────────────────────────┐
</span><span> │ return addr of parent │
</span><span> ↱│ bp of parent │
</span><span> │├────────────────────────┤
</span><span> ││ return addr `never` │
</span><span> └│ bp of `never` │↰
</span><span> ├────────────────────────┤│
</span><span> │ return addr `gonna` ││
</span><span>rbp → │ bp of `gonna` │┘
</span><span> │ ... locals of `give` │
</span><span>rsp → │ ... │
</span><span> └────────────────────────┘
</span></code></pre>
<h2 id="walking-the-stack"><a class="anchor-link" href="#walking-the-stack" aria-label="Anchor link for: walking-the-stack">#</a>
Walking the Stack</h2>
<p>Alright, let’s try to actually walk that stack of a live program.
First, we need to actually read the <code>rsp</code>, <code>rbp</code> (and <code>rip</code>) registers. For
this I will play a bit with unsafe Rust features, namely the nightly-only <code>asm</code>
feature.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span> </span><span style="color:#fa6e32;">let mut</span><span> sp</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> bp</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">let mut</span><span> ip</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">unsafe </span><span>{
</span><span> </span><span style="color:#f07171;">asm!</span><span>(
</span><span> </span><span style="color:#86b300;">"mov {sp}, rsp"</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"mov {bp}, rbp"</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"lea {ip}, [rip]"</span><span style="color:#61676ccc;">,
</span><span> sp </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">out</span><span>(reg) sp</span><span style="color:#61676ccc;">,
</span><span> bp </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">out</span><span>(reg) bp</span><span style="color:#61676ccc;">,
</span><span> ip </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">out</span><span>(reg) ip</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f07171;">options</span><span>(nomem</span><span style="color:#61676ccc;">,</span><span> nostack)</span><span style="color:#61676ccc;">,
</span><span> )</span><span style="color:#61676ccc;">;
</span><span> }
</span></code></pre>
<p>Here we have a snippet of inline assembly which just moves the current values
of the registers into local Rust variables. Reading <code>rip</code> is a bit more involved,
but that is a completely different story.</p>
<p>Now that we have the base pointer, we can use it to walk up the stack using this
simple code:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span> </span><span style="color:#fa6e32;">for </span><span style="color:#ed9366;">_ in </span><span style="color:#ff8f40;">0</span><span style="color:#ed9366;">..=</span><span style="color:#ff8f40;">5 </span><span>{
</span><span> ip </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ (bp </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const usize</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">offset</span><span>(</span><span style="color:#ff8f40;">1</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">read</span><span>() }</span><span style="color:#61676ccc;">;
</span><span> bp </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">unsafe </span><span>{ (bp </span><span style="color:#ed9366;">as </span><span style="color:#fa6e32;">*const usize</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">offset</span><span>(</span><span style="color:#ff8f40;">0</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">read</span><span>() }</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">println!</span><span>(</span><span style="color:#86b300;">"bp: {:#018x}; ip: {:#018x}"</span><span style="color:#61676ccc;">,</span><span> bp</span><span style="color:#61676ccc;">,</span><span> ip)</span><span style="color:#61676ccc;">;
</span><span> }
</span></code></pre>
<p>That is basically all there is to walking up the stack, <strong>if</strong> you have a valid
base pointer to work with.</p>
<p>I double-checked my output and it is the same as the one produced by <code>std::backtrace</code>.
You can run the full example in the <a href="https://godbolt.org/z/ocos1j">compiler explorer</a>
as well.</p>
<p>While working on this, I had a super hard time and couldn’t get this to work on
Windows. In the end, I think that I hit a miscompilation bug in Rust.
I filed <a href="https://github.com/rust-lang/rust/issues/82333">a bug</a> about it.</p>
<p>Instead of writing the <em>current</em> stack pointer into the base pointer register
on function start, and growing the stack <em>afterward</em>, the code generated on
Windows first adjusts the stack pointer, and then tries to write the base
pointer, adjusting for the offset again, though it seems that it messes up that
offset sometimes.</p>
<p><strong>Update</strong></p>
<p>A comment on the issue I filed said that on Windows, the base pointer might as
well be offset. And that offset can be looked up in the Unwind Info for that
particular DLL/function. We will take a look at how to read that unwind info next
time.</p>
<h2 id="what-if-there-is-no-base-pointer"><a class="anchor-link" href="#what-if-there-is-no-base-pointer" aria-label="Anchor link for: what-if-there-is-no-base-pointer">#</a>
What if there is no base pointer</h2>
<p>In my previous post, the examples that I showed did not use any base pointer.
And also today I had to use an explicit <code>-Cforce-frame-pointers=yes</code> compiler
flag to make it use a frame pointer. This shows clearly that you don’t really
<em>need</em> to maintain a base pointer when running the program.</p>
<p>So Rust does not maintain one by default. And I remember the mysterious
<code>-fomit-frame-pointer</code> compiler option from back when I didn’t know what it
actually does. So now we know: it avoids a few instructions per function call,
saves a bit of space on the stack, and frees up the <code>rbp</code> register. The last
point is, I would argue, the main reason: since general purpose registers are
really scarce on x64, it makes sense to have one more available for use.</p>
<p>But from a stack unwinding perspective, it complicates things, since we don’t
have all the information we need to unwind right there in the registers or on
the stack.</p>
<p>Essentially all we need is the position of the return address relative to the
stack pointer (which still is a special purpose register). And we will look at
where to get this information the next time.</p>
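<p>To make that idea a bit more concrete, here is a toy lookup of the kind of answer such information would give, with entirely made-up address ranges and offsets:</p>

```rust
use std::ops::Range;

/// Toy unwind table lookup: for a range of instruction addresses, report how
/// many bytes above the current stack pointer the return address lives.
/// Real unwind info is far more involved than this.
fn return_addr_slot(ip: usize, table: &[(Range<usize>, usize)]) -> Option<usize> {
    table
        .iter()
        .find(|(code_range, _)| code_range.contains(&ip))
        .map(|(_, offset_from_rsp)| *offset_from_rsp)
}
```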
Relax and Unwind2021-02-15T00:00:00+00:002021-02-15T00:00:00+00:00
Unknown
https://swatinem.de/blog/unwind-1/<p>I have been working on the <a href="https://sentry.io">Sentry</a> Native Team for a bit
more than a year now. One of the most important things that helps Engineers to
find the cause of an Error is a stack trace. This is also a really challenging
topic, especially for native code.</p>
<p>In my opinion, the best way to learn and understand a complex topic, and to
appreciate the existing solutions, is to try to implement it yourself.
Along that way, I will try to implement my own stack unwinder in order to learn
more about what a native call stack really looks like and how to extract a stack
trace. This is just my personal learning experiment and is not related to any
specific things I do in my day job.</p>
<h2 id="a-simple-stacktrace"><a class="anchor-link" href="#a-simple-stacktrace" aria-label="Anchor link for: a-simple-stacktrace">#</a>
A simple Stacktrace</h2>
<p>Let’s start by looking at a stacktrace from another language first.
Take this simple example in JS:</p>
<pre data-lang="js" style="background-color:#fafafa;color:#61676c;" class="language-js "><code class="language-js" data-lang="js"><span style="color:#fa6e32;">function </span><span style="color:#f29718;">never</span><span>() {
</span><span> </span><span style="color:#f29718;">gonna</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">gonna</span><span>() {
</span><span> </span><span style="color:#f29718;">give</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">give</span><span>() {
</span><span> </span><span style="color:#f29718;">you</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">you</span><span>() {
</span><span> </span><span style="color:#f29718;">up</span><span>()</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">up</span><span>() {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">console</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">log</span><span>(</span><span style="color:#ed9366;">new </span><span style="color:#399ee6;">Error</span><span>()</span><span style="color:#ed9366;">.</span><span>stack)</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#f29718;">never</span><span>()</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>This will give me the following stacktrace when running in <code>node</code>:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Error
</span><span> at up ([...]rickroll.js:14:15)
</span><span> at you ([...]rickroll.js:11:3)
</span><span> at give ([...]rickroll.js:8:3)
</span><span> at gonna ([...]rickroll.js:5:3)
</span><span> at never ([...]rickroll.js:2:3)
</span><span> at Object.&lt;anonymous&gt; ([...]rickroll.js:16:1)
</span><span> at Module._compile (node:internal/modules/cjs/loader:1102:14)
</span><span> at Object.Module._extensions..js (node:internal/modules/cjs/loader:1131:10)
</span><span> at Module.load (node:internal/modules/cjs/loader:967:32)
</span><span> at Function.Module._load (node:internal/modules/cjs/loader:807:14)
</span></code></pre>
<p>As you see, it gives the call stack top to bottom (the order is a language
ecosystem convention), and it also includes a few frames that are outside of my
code.</p>
<p>In this case, node is the <em>runtime environment</em> that executes the javascript code.
Running this so called <em>managed code</em> means that node will keep track of what
happens. This tracking comes with some overhead, but as shown it does provide
us with a few benefits.</p>
<p>Native code is very different. Depending on your definition of <em>Runtime</em>, there
is nothing that manages or drives your code, and the code itself is usually
tuned for maximum performance, so it will avoid doing anything that is not
strictly necessary to achieve its goals.</p>
<p>Since we have no runtime that we can just ask to give us a stacktrace, we have
to create one ourselves.</p>
<h2 id="native-instructions-registers-and-stack"><a class="anchor-link" href="#native-instructions-registers-and-stack" aria-label="Anchor link for: native-instructions-registers-and-stack">#</a>
Native Instructions, Registers and Stack</h2>
<p>In order to understand what a native call stack is, we have to first learn how
the processor in our computer actually works and executes code.</p>
<p>We will take a look at the actual assembly code and at the x64
Instruction Set Architecture (ISA) to figure out what it does.</p>
<p>Let’s start by doing some quick mafs in Rust:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">one</span><span>() </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">usize </span><span>{
</span><span> </span><span style="color:#ff8f40;">1
</span><span>}
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">two</span><span>() </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">usize </span><span>{
</span><span> </span><span style="color:#ff8f40;">2
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">plus</span><span>(</span><span style="color:#ff8f40;">a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>, </span><span style="color:#ff8f40;">b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">usize </span><span>{
</span><span> a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">wrapping_add</span><span>(b)
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">minus</span><span>(</span><span style="color:#ff8f40;">a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>, </span><span style="color:#ff8f40;">b</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">usize</span><span>) </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">usize </span><span>{
</span><span> a</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">wrapping_sub</span><span>(b)
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">quick_mafs</span><span>() </span><span style="color:#61676ccc;">-> </span><span style="color:#fa6e32;">usize </span><span>{
</span><span> </span><span style="color:#fa6e32;">let</span><span> four </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">plus</span><span>(</span><span style="color:#f07171;">two</span><span>()</span><span style="color:#61676ccc;">, </span><span style="color:#f07171;">two</span><span>())</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#f07171;">minus</span><span>(four</span><span style="color:#61676ccc;">, </span><span style="color:#f07171;">one</span><span>())
</span><span>}
</span></code></pre>
<p>You can look at this in detail in the <a href="https://godbolt.org/z/rjGr8W">Compiler Explorer</a></p>
<p>Let’s look at a small snippet of the assembly that is created for these functions:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>
</span><span>example::two:
</span><span> mov eax, 2 // 1
</span><span> ret // 2
</span><span>
</span><span>example::quick_mafs:
</span><span> sub rsp, 40 // 3
</span><span> call example::two // 4
</span><span> mov qword ptr [rsp + 32], rax // 5
</span><span> call example::two // 6
</span><span> mov qword ptr [rsp + 24], rax // 7
</span></code></pre>
<p>Our function call, and the return from that function, are clearly visible; they
correspond to the <code>call</code> and the <code>ret</code> instruction respectively.
We see some other things as well which need a bit more research to understand.</p>
<p>The Windows Documentation of the <a href="https://swatinem.de/blog/unwind-1/x64-arch">x64 Architecture</a> is a really good
resource to learn from. I also very much enjoyed a previous year’s <a href="https://swatinem.de/blog/unwind-1/aoc">Advent of Code</a>
which introduced its own instruction set and guided you along implementing a
virtual processor around that.</p>
<p>Each architecture has its own calling conventions, and the document says that
the <code>return value is returned in the rax register</code>, which is what we see above.
All that our <code>two</code> function does is write its return value (1) to that register
before returning (2). The caller then moves that return value someplace else (5)
to deal with it later.</p>
<p>The processor executes the instructions one after the other, but this example
shows that it does need to jump around in the code a bit. In particular, this
is the sequence in which the instructions are executed:
<code>3, 4, 1, 2, 5, 6, 1, 2, 7</code>. We have two calls to <code>two</code>, and we execute the
instructions at the addresses <code>1</code> and <code>2</code> twice, but on the first go we jump
back to <code>5</code>, while we continue at <code>7</code> the second time. How does the processor
know to do that?</p>
<p>Let’s take a look at the documentation for the <code>call</code> and <code>ret</code> instructions.
Again, the Windows Documentation for the <a href="https://swatinem.de/blog/unwind-1/x86-instr">x86 Instructions</a> helps.</p>
<ul>
<li>The <code>call</code> instruction pushes the return address onto the stack then jumps to the destination.</li>
<li>The <code>ret</code> instruction pops and jumps to the return address on the stack.</li>
</ul>
<p>Okay, so we have to learn about something called the stack, and jumps.</p>
<p>We heard the term <code>register</code> already. These registers hold the values that the
processor currently works with. They are extremely fast, but there is only a
very limited number of them, depending on the architecture. Some of them are
<em>special purpose</em> registers. We already learned about <code>rax</code>, the <em>Accumulator</em>
register which is used for return values. We also see another one in the example
above, <code>rsp</code>, the <em>Stack Pointer</em> register. It points to the top of the stack,
and is changing when you <code>push</code> or <code>pop</code> things to and from the stack.
It is kind of like <code>Array#length</code> in JS, which also changes when you call
<code>Array#push</code> and <code>Array#pop</code>.</p>
<p>Another special register is the <em>Instruction Pointer</em>, <code>rip</code>, which holds the
address of the <em>next</em> instruction in line after the one currently being executed.</p>
<p>Alright, so we know about the stack pointer and the instruction pointer, and we
know what <code>call</code> and <code>ret</code> do, so let’s try to visualize this.</p>
<ul>
<li>(4) <code>call</code>: The <em>next</em> instruction is <code>5</code>, which we push onto the stack and
then <em>change</em> to <code>1</code>. (Stack: <code>[5]</code>, <code>rip</code>: 1)</li>
<li>(2) <code>ret</code>: Pop <code>5</code> from the stack and overwrite rip. (Stack: <code>[]</code>, <code>rip</code>: 5)</li>
<li>(6) <code>call</code>: Push, overwrite rip. (Stack: <code>[7]</code>, <code>rip</code>: 1)</li>
<li>(2) <code>ret</code>: Pop, overwrite rip. (Stack: <code>[]</code>, <code>rip</code>: 7)</li>
<li>(7) ...</li>
</ul>
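<p>The trace above can also be written down as a toy model in Rust, with a <code>Vec</code> playing the role of the stack of return addresses (just an illustration, not how a real CPU is modeled):</p>

```rust
/// A toy CPU that only models `call`/`ret` and the return addresses on the stack.
struct Cpu {
    rip: usize,
    stack: Vec<usize>,
}

impl Cpu {
    /// `call`: push the address of the *next* instruction, then jump.
    fn call(&mut self, target: usize) {
        // by the time `call` executes, `rip` already points past it
        self.stack.push(self.rip);
        self.rip = target;
    }

    /// `ret`: pop the return address from the stack and jump to it.
    fn ret(&mut self) {
        self.rip = self.stack.pop().expect("ret without a matching call");
    }
}
```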
<hr />
<p>So we can now kind of follow how a processor executes native instructions, and
what happens to the stack and the instruction pointer during that process.</p>
<p>The key takeaway here is that the processor is quite dumb and just executes
instructions one after the other, and it follows whatever the next instruction
(<code>rip</code>) is. Other things to note are that all the stack manipulations
(<code>push</code> and <code>pop</code>) are balanced. Also, the stack contains the <em>next</em> instruction
where we have to go to, not really where we came from.
However, the <code>call</code> instruction always pushes the current <code>rip</code>, which happens
to be the instruction after the <code>call</code>. This makes things a bit simpler, since
the instruction where we came from is in most cases the one right before our return
address.</p>
<p>So we are done, right? Well not quite. As we see in instruction <code>3</code>, the code
itself can manipulate the stack pointer in any way it wants, and we don’t
really know where on the stack our return address is. Figuring that out will be
a story for another day.</p>
Investing2021-01-30T00:00:00+00:002021-01-30T00:00:00+00:00
Unknown
https://swatinem.de/blog/investing/<p>It is only January, but the 2021 bullshit bingo is in full flow already.</p>
<p>So apparently an Internet flashmob is buying tons of stock and thereby
bankrupting big funds that have allegedly been manipulating markets for a long
time.</p>
<p>Bear in mind, I’m just an average person who happens to own some stock. I have
no idea how the financial markets really work. And this is also a key point that
I want to make here.</p>
<p>So the market is basically open to anyone, with some hurdles, more on which later.
But not every player is created equal. There are a lot of people like me, who
have a regular job, and some leftover money that they want to <em>invest</em> for one
reason or another. Like me, I would argue, these investors do not want to spend
too much time on this, and rarely make any transaction. Unlike me, I might sadly
say, most people are way too gullible and will follow unsound advice, some
of the time from financial advisors or their bank, who might not have their
best interest at heart.</p>
<p>And then there are people who live and breathe financial instruments, who not
only day-trade, but for whom every second counts, and who have a lot more
instruments to choose from.</p>
<p>The whole problem in my opinion is that the financial system is built in a way
that benefits those who know how to play the game, at the expense of the average
person, like me.</p>
<h1 id="ideal-market"><a class="anchor-link" href="#ideal-market" aria-label="Anchor link for: ideal-market">#</a>
Ideal Market</h1>
<p>The ideal stock market is actually super simple, and everyone can understand it.
Imagine a room full of people, divided in two groups. One group wants to <em>sell</em>
stock and <em>asks</em> a certain price for it. The other group wants to <em>buy</em> the
stock and <em>bids</em> a price for it. Usually, there is a gap in between.</p>
<ul>
<li>A wants to sell for 20€</li>
<li>B wants to buy for 10€</li>
</ul>
<p>Now we are at an impasse. One of the two has to move in order to actually form
a transaction. Or maybe another one comes along that says <em>I buy at any price</em>,
and gets the share at 20€. Or the other way around, such as <em>I sell at any price</em>.</p>
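<p>In Rust terms (purely to illustrate the rule above, nothing more), a trade is possible exactly when the highest bid meets or exceeds the lowest ask:</p>

```rust
/// Returns the (bid, ask) pair that could trade, if any (prices in euros).
fn best_match(bids: &[u32], asks: &[u32]) -> Option<(u32, u32)> {
    let best_bid = bids.iter().copied().max()?;
    let best_ask = asks.iter().copied().min()?;
    (best_bid >= best_ask).then_some((best_bid, best_ask))
}
```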
<blockquote>
<p>Tip of the day: <strong>Always</strong> set a limit (explicit price)! The default for lots
of trading platforms is set to buy/sell at <em>any</em> price, which is not what you
want!</p>
</blockquote>
<h1 id="shorting-and-other-instruments"><a class="anchor-link" href="#shorting-and-other-instruments" aria-label="Anchor link for: shorting-and-other-instruments">#</a>
Shorting and other instruments</h1>
<p>The above is super simple. I would argue that anyone can understand it. This is
also what the <em>average person</em> wants to do when they think about stock market
trading.</p>
<p>There are a lot more financial products and possibilities out there. Some are
being sold to the uninformed average person, which in my opinion is by itself
fraud, but more on that later.</p>
<p>One of those instruments is called “short”, no idea why really, or “Leerverkauf”
in German.</p>
<p>There are two variants, and I’m not even sure if the second is something
completely different?</p>
<p>In one variant, I get someone to lend me a share, which I have to give back
tomorrow. I sell that share immediately for 10€, in the hopes that I can buy it
back at 5€ tomorrow before I return it. And I pocket the difference as my profit.</p>
<p>The problem now is, what if the price at which people are willing to sell
actually increases? Well, I have to return the stock that I borrowed, so I might
be forced to buy <em>at any price</em>, so I lose money.</p>
<p>In the other variant, I sign a contract with you to sell you a share <em>tomorrow</em>
for 10€. I don’t own that share yet. No problem, I can just buy it myself
tomorrow for 5€ and pocket the difference. Except when the price tomorrow is
higher than we agreed on, in which case I lose.
(BTW, is this what is called a <em>future</em> in financial circles? I don’t know, really)</p>
<hr />
<p>So, with this play, one can actually make a profit by betting on a falling price.
I would argue that this is something the <em>average person</em> would not want to
do, for one. And also, as an average person who owns shares, I wouldn’t even
know <em>how</em> to do that. Thus, I just assume this is an instrument reserved for
the expert players.</p>
<h1 id="valuation"><a class="anchor-link" href="#valuation" aria-label="Anchor link for: valuation">#</a>
Valuation</h1>
<p>There is another interesting thing to note here. There is no fixed <em>price</em> for
a share. It is not something you buy in a supermarket that has the same price tag
today as it had yesterday. The real price, or <em>value</em>, only exists for a split
second at the moment of a completed transaction. Because at that moment, it was
<em>worth</em> that amount of money to the buyer.</p>
<p>Let’s take real estate as an example. Prices right now are <em>crazy</em>, ranging from
400 to 700k for an average-sized apartment. But are they actually <em>worth</em> that
much? That is really up to the buyer. Either someone comes along who is willing
to pay that much. Or the seller adjusts the price. Or simply waits long enough.</p>
<p>And here comes a bit of a problem. Real estate companies can build an apartment
for, let’s say, 50k€, and put a price tag of 500k€ on it. And then they just sit
there, waiting for a potential buyer to come by. They are under no pressure at
all to sell this apartment now, for cheaper. They are willing to wait 10 years,
with the apartment just standing there, empty, until finally a buyer comes by.</p>
<p>Anyway, I digress. The point is, usually the value of something is determined by
a transaction. But maybe you don’t even want to sell the real estate you own,
because you use it yourself. So what is its current market value? How would you
know, if actually selling it is the only way to know?</p>
<p>Well, you can get a hopefully independent and neutral third party to look at the apartment and
make an assessment, which is basically a good guess at the price someone would
be willing to pay for it, considering the quality and demand and so on.</p>
<p>So, how neutral are these people that make assessments? There is a joke about that:</p>
<blockquote>
<p>Billionaire wants to avoid paying X€ in taxes.</p>
<p>Billionaire has an artist friend, asks them to paint a picture.</p>
<p>Billionaire has a friend who assesses artworks. The picture has a value in the
ballpark of X€.</p>
<p>Billionaire donates that piece of art to charity in a tax-deductible way.</p>
<p>Problem solved.</p>
</blockquote>
<p>Anyhow, this was a small digression just to highlight that maybe some things are
not really worth whatever people say they are. Just something to keep in mind.</p>
<h1 id="virtual-goods"><a class="anchor-link" href="#virtual-goods" aria-label="Anchor link for: virtual-goods">#</a>
Virtual goods</h1>
<p>Coming back to the concept of selling shares that you don’t own. One of the
scams that has blown up now is that apparently more shares have been lent out
than exist in the first place. I have no idea how that would be even possible,
or legal for that matter.</p>
<p>When I lend you my car, I am unable to use it myself until you return it. Also,
I will notice if you return a different car to me. This concept of the physical
world apparently does not apply to shares, which only exist as digital things
on a computer.</p>
<h1 id="money-creation"><a class="anchor-link" href="#money-creation" aria-label="Anchor link for: money-creation">#</a>
Money creation</h1>
<p>This is also a really big misconception that average people have about money.
It is still a widely believed myth that debt is one person lending
another person some money for a time, expecting some interest payments in
return. In reality, there is no other person who loses physical access to their
money when you borrow it in the form of a loan. It is a bank that just creates
this money, as a number on a computer, out of thin air.</p>
<h1 id="banksters"><a class="anchor-link" href="#banksters" aria-label="Anchor link for: banksters">#</a>
Banksters</h1>
<p>Speaking of lending, usually you would expect some form of compensation, like
an interest payment or a fee. Well, according to things I recently read,
lending out shares comes with some hefty fees.
I do own shares, but I haven’t knowingly
lent them to anyone, and I certainly haven’t collected any fees. And I also
didn’t agree to this.</p>
<p>So how come there are more shares being lent out than even exist? Maybe the bank
that is managing my shares is just doing this behind my back, without telling me
and without sharing any of those fees with me.</p>
<p>Maybe this is stated in the terms and conditions of my depot that I certainly
did not read, like I expect no one ever does.</p>
<hr />
<p>Speaking of fees. I was really surprised that a certain trading app, which has
gained a lot of notoriety by simply restricting its customers from buying select
stock, allows you to trade with extremely low, or even without any fees at all.</p>
<p>I hate fees as much as the next person, and I pay a minimum of ~11€ per
transaction. That, by the way, is the lowest fee I have found after researching
this intensively. Last time I checked, I had by far the best conditions of any
stock market depot in the German-speaking area.</p>
<p>And yes, those fees buy me a certain <em>service</em>. I am listed somewhere in a
shareholder directory. I receive regular invites to shareholder meetings that I
don’t care about. And notifications that a foreign government has looked at my
data in the shareholder directory, which is super annoying.</p>
<p>The point is, I am fairly certain that I have some kind of guarantees that I
really do own those shares.</p>
<p>Being told that there are trading platforms out there that let you buy
shares without any kind of fees makes me wonder how they can pull that
off. Considering the news that they are bleeding money, I speculate
that maybe they are not actually trading real shares for each order, but rather
batching those up and buying in bulk, hoping to make a profit for themselves
while doing so.</p>
<p>Say, person A buys one share at 12€, person B buys one share at 13€, but the
broker actually merges these two together, buys two shares at 11€ at
<em>some later point in time</em> and pockets the difference.</p>
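<p>With those hypothetical numbers, the batching spread works out like this. A sketch of the speculation above, not a claim about how any real broker operates:</p>

```rust
// Hypothetical broker batching: customers pay their quoted prices, while the
// broker later fills all orders in one cheaper bulk purchase and keeps the rest.
fn broker_spread(customer_prices: &[f64], bulk_price: f64) -> f64 {
    let collected: f64 = customer_prices.iter().sum();
    let cost = bulk_price * customer_prices.len() as f64;
    collected - cost
}

fn main() {
    // A pays 12€, B pays 13€; the broker buys two shares at 11€ each:
    println!("{}", broker_spread(&[12.0, 13.0], 11.0)); // 3€ pocketed
}
```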
<p>Is it possible that the broker itself does short selling in the background
without the customers even knowing?</p>
<p>How is any of this even legal?</p>
<h1 id="hodl"><a class="anchor-link" href="#hodl" aria-label="Anchor link for: hodl">#</a>
Hodl</h1>
<p>Coming full circle to the beginning. I believe that the average person who knows
little about complex financial instruments just wants to buy shares and hold
them for a long time, usually years, possibly even decades.</p>
<p>Speaking from my own experience, I do own a lot of different shares. But I only
buy and hold. In the 11 years I have owned shares, I have only ever sold twice.
Both times, I sold about 10% of some very lucky shares that grew by 10x, to at
least get my initial investment back. In hindsight, I shouldn’t have sold,
because those shares went on to grow to 30x and 20x respectively.</p>
<p>Oh well. It really is all about gambling; like a casino.</p>
<blockquote>
<p>Tip of the day:</p>
<p>You can’t take a loss if you never sell. Similarly, you can’t make a profit if
you never sell either.</p>
</blockquote>
<p>Coming back to something I wrote about earlier: the value of your shares
only exists for the instant that you trade them for money; otherwise it’s just
pretty numbers on a screen that can change in either direction all the time.</p>
<h1 id="why-now"><a class="anchor-link" href="#why-now" aria-label="Anchor link for: why-now">#</a>
Why now?</h1>
<p>So, why is this exploding like this right now?</p>
<p>The answer is simple. For the past half year, international travel has been so
heavily restricted as to be almost impossible. There is no way to spend any money
on restaurants or entertainment because everything is closed down. So those of
us who are lucky enough to still have a job get a salary that they have no way
to spend except on the bare necessities.</p>
<p>Also the financial policies of recent years have made sure that
<em>savings accounts</em> have become a joke that is effectively losing you money due
to inflation.</p>
<p>So the average person is sitting on a pile of money they have no idea what to
do with except to invest in <em>something</em>. And the average person is likely to
buy and hold.</p>
<hr />
<p>A small digression: I do think that our current situation is actually a
conspiracy. For the simple reason that I just can’t believe that governments
are acting with the goal of protecting the health and lives of citizens.</p>
<p>That would be a first. It’s a lot more plausible that media propaganda and
governments are protecting and furthering some other interests. I just haven’t
figured out yet who is profiting and pulling the strings.</p>
<h1 id="the-game-is-rigged"><a class="anchor-link" href="#the-game-is-rigged" aria-label="Anchor link for: the-game-is-rigged">#</a>
The game is rigged</h1>
<p>Again, this turned into a super long opinion piece already.</p>
<p>Long story short, I think the whole financial market is rigged to disadvantage
small average everyday people.</p>
<p>Maybe another time I will write more about why I think that having a publicly
traded company is the worst idea one could have. Relatedly, I think it’s no
coincidence that a certain Chinese company which is not publicly traded
is being specifically targeted by the US government with the clear goal of
destroying them. Yes, I do like conspiracy theories. :-D</p>
The Problem with walled gardens2021-01-13T00:00:00+00:002021-01-13T00:00:00+00:00
Unknown
https://swatinem.de/blog/walled-gardens/<p>Well, we live in exciting times!
And to say the least, some of the developments are a bit disturbing.
Some of these things have been developing for years, and people didn’t care much
about them.</p>
<h1 id="exclusive-society"><a class="anchor-link" href="#exclusive-society" aria-label="Anchor link for: exclusive-society">#</a>
Exclusive Society</h1>
<p>The way I see it, most of these things follow a grand scheme: society is
getting more exclusive, by which I mean that exclusion, and excluding people,
is getting more commonplace.</p>
<p>There is the blatant example of individuals or groups of people being blocked
or censored, companies refusing to do business with others, or the fact that we
are being coerced into accepting terms and conditions which are troublesome, out
of fear of being excluded from a service. Or trade restrictions that can even
exclude whole countries from participating in the normal world as we know it.</p>
<p>If you think even further, we see that we have less and less control over the
things we own. Increasingly, the world is relying on the <em>cloud</em> and the services
offered by it. When your mobile phone, or even your car, is tied to some kind of
online account / identity, it is not too far-fetched that the hardware you
bought for hard-earned cash can be turned into an expensive paperweight remotely.</p>
<p>Not to mention a certain biological nemesis that can also ostracize
everyone who is not willing to comply with whatever is mandated.</p>
<h1 id="laws-and-justice"><a class="anchor-link" href="#laws-and-justice" aria-label="Anchor link for: laws-and-justice">#</a>
Laws and Justice</h1>
<p>The thing is, we as people, and also companies and governments, are well within
our rights to do what we do. <em>Or are we?</em>
No one can force us to be friends with someone who, for example, puts pineapple
on pizza. A fancy-ass restaurant is not obliged by law to serve you if you come
there without a shirt. A company is not forced to hire someone who is not fit
for a job. And a company does not need to conduct business with everyone.</p>
<p>There are surely limitations, laws, and so-called terms of service or codes of
conduct. You know, those boring-ass legal documents that seriously no one ever
reads. But you kind of have to "accept" them in order not to be excluded from a
service. Those documents kind of state under which circumstances a company does
business, or ceases to do business.</p>
<p>It is really interesting, though, that in most cases the company that, for
example, de-monetizes or outright censors or blocks a content creator rarely
points to a specific reason for its actions. They just say it’s violating the
terms of service and that’s it.</p>
<p>Because there are things like anti-discrimination laws. Remember when I said
that a company can reject an applicant, maybe because they found someone who is
better qualified? The company better not give any reason for refusing to hire
that candidate. Because if the company says they refused to hire because the
candidate likes pizza with pineapple, that candidate can sue the company.</p>
<p>It works the other way around as well. A company can fire someone for various
reasons. However, if they clearly say that the reason is that the employee eats
pineapple pizza, things might end up in court.</p>
<p>Usually though, when one party decides to cancel a contract for whatever reason,
there are laws that say under which circumstances that happens. In most cases,
you have to have due notice. A landlord can’t just bring the eviction squad
one minute after the rent is overdue.</p>
<p>The problem is that most of the service providers that we rely on do not really
give due notice, they don’t have a formal process to contest the decision, and
in general the decisions can seem extremely arbitrary.</p>
<p>Hey, maybe someone found out that the person in question likes pineapple pizza
and they decided to just refuse to process payments that were made to that
person or company.</p>
<hr />
<p>So there are end-user, or business contracts, and laws that govern those.
Maybe anti-discrimination laws make sure that a certain group of people is not
being excluded because of some discerning characteristic. Maybe other laws say
that certain people <em>have to be excluded</em>.</p>
<p>I am not sure about the details, but I think there are even essential services
which are forced to serve you. For example, what if the postal service just
refuses to deliver your mail, maybe because the mailman saw you eating pineapple
pizza?</p>
<h1 id="sovereignty-of-nations"><a class="anchor-link" href="#sovereignty-of-nations" aria-label="Anchor link for: sovereignty-of-nations">#</a>
Sovereignty of Nations</h1>
<p>Also, we live in a globalized and connected world. And we also have a lot of
business or other kind of relations that cross borders. Businesses want to
exchange goods, etc. And there are international agreements and laws that say
under which rules and conditions that has to happen.</p>
<p>Maybe a country mandates by law that there has to be pineapple on every pizza
sold? A foreign company better put that pineapple on the pizza or otherwise they
miss out on all the profit they might make.</p>
<p>Sure, there is a certain country in this world that likes to play world police,
and tell other <em>sovereign nations</em> if, and how much pineapple there has to go
on the pizza!</p>
<p>And sometimes there is disagreement and compromise. Take disputed borders as
an example. Country A claims the border includes a pineapple plantation, while
neighboring country B wants that pineapple plantation for itself.
If a multinational corporation wants to do business in both countries, it has
to abide by the laws of each respective country. Which means that if you use,
for example, an online mapping service of that company, it will show different
borders depending on which country you use that service from.</p>
<p>In reality, a company may be forced to censor certain content, to fork over
private data to government institutions, or to implement backdoors. It’s just the
way things work. If company A wants to do business and make a profit in country B,
they better make sure to hand over all the personal details of people who do not
like to have pineapple on their pizzas.</p>
<p>Things get crazier when you realize that some of these conditions can prevent
fair taxation of foreign companies, or prevent any kind of technology licensing,
essentially keeping a whole country in the stone-age by refusing to share all
the nice things with them.</p>
<p>Maybe a country just doesn’t want to respect international patent law, and with
a bit of industrial espionage they can reverse engineer everything. Or maybe we
finally see the value of having open standards that are not subject to copyright,
patents or other kinds of restrictions.</p>
<p>The problem is: countries have laws, and when the justice system and the courts
decide that, by law, it is not allowed to advocate putting pineapple on pizza,
then a court order may force a company to cease business relations with
other companies or people. But when a company decides this on its own, it’s
more like vigilantism.</p>
<p>We have armed people running around on the streets executing their own form of
justice on others that eat pizza, with or without pineapple. These big oligarchic
companies just became judge, jury and executioner. All based on their <strong>opinion</strong>
on what belongs on a pizza and what does not.</p>
<p>Maybe the reason the courts are not working properly is because the justice
system is overwhelmed with fights over patents and copyright claims?</p>
<h1 id="propaganda-and-censorship"><a class="anchor-link" href="#propaganda-and-censorship" aria-label="Anchor link for: propaganda-and-censorship">#</a>
Propaganda and Censorship</h1>
<p>The thing is, we put ourselves into this mess. We followed the smell of delicious
pineapple pizza and didn’t even notice that we sold all our freedoms while
munching on it. We ended up with this giant propaganda machine that can just
block and silence everyone who dares to speak against putting pineapple on pizza.</p>
<p>We don’t even know by which rules it operates. The one thing we know is to better
follow whatever they say, or otherwise fear losing your job, fear losing all
the family photos that you have uploaded into the cloud, or fear not being able
to start your car.</p>
<p>If the machine says you gotta eat pizza with pineapple, you gotta do it and
better fake that you like it, even if deep down you almost choke on it.</p>
<p>Remember the question: By which rules do those companies operate? At times it
seems like they just follow the loudest mob. Just reverberate the opinion that
will most likely make their share prices skyrocket, or will guarantee them the
most profit.</p>
<hr />
<p>The problem is that people are way too gullible about all this. Use the propaganda
machine the right way and the people will actually demand that every pizza has
to have pineapple. Just like we have to shut everything down and destroy the
livelihood of a lot of people, and maybe even a lot of interpersonal relationships.
Because propaganda made us believe that this is the only way.</p>
<p>And there will be zealots. I used to know a fanatic that literally said:</p>
<blockquote>
<p>When I meet someone, I ask them what kind of pizza they eat. And then I decide
if I want to be friends with them or not.</p>
</blockquote>
<h1 id="divide-and-conquer"><a class="anchor-link" href="#divide-and-conquer" aria-label="Anchor link for: divide-and-conquer">#</a>
Divide and Conquer</h1>
<p>By now we have an extremely divided society, which likes to compartmentalize
everyone. Interestingly, there are two categories here: things that you were just
born with, which are dictated by genetics, and things that are opinions, tastes,
and something that you <em>chose</em> to believe.</p>
<p>We know it’s not just to treat people differently depending on what combination
of chromosomes they were born with. So why is it such a problem what kind of pizza
people prefer?</p>
<p>It seems to me that some people are turned into radical fundamentalists. It is
interesting that the terms "religion" and "evangelize" come to my mind. It is a
bit similar I would say. You want to convince others of your choice of pizza.
Because yours is the best, and all the rest are just garbage. If you have to,
you will ostracize them, put them into jail or worse.</p>
<p>You are absolutely convinced that you are fighting for the good cause. That
pizza without pineapple is the right thing to do, and everyone who eats it with
pineapple is committing a crime!</p>
<p>It offends you to even exist in the same room as someone who prefers
pineapple pizza.</p>
<hr />
<p>So yes. We are living in an extremely divided society. The thing that I still
haven’t quite figured out is: who benefits from this? Sure, in the current
situation the one who <em>can afford</em> to get tested or vaccinated sure enjoys more
freedoms than one who can’t. But I’m not sure if it is that simple. And even if
it is, people are too busy arguing among themselves if they should wear masks or
not.</p>
<p>It’s also a bit comical how even the respect for laws and law enforcement
personnel is eroded when you read about the police harassing little children and
confiscating sleds. LOL.</p>
<h1 id="privacy-and-politics"><a class="anchor-link" href="#privacy-and-politics" aria-label="Anchor link for: privacy-and-politics">#</a>
Privacy and Politics</h1>
<p>The only thing that is really private to us, thus far at least, is our thoughts
and minds. And we do need to share those. Because there is so much beauty, love,
innovation, humor, and ideas in there. Some ideas may contradict your own.
But that’s okay. I laugh about writing this. But hey: it is okay to have different
opinions! Except for what kind of pizza to eat! If it is the wrong kind, I will
hate you and make your life miserable. Nah, just joking.</p>
<p>We need to have the freedom to express ourselves. To have a <em>safe space</em> to
speak our mind. To challenge the status quo. To think outside of the box.</p>
<p>Do we live in a society where you must not defy the eternal leader, or otherwise
fear death? Or do we actively encourage being challenged intellectually? Maybe
someone else has better ideas than I do?</p>
<p>Most of these things are <em>opinions</em>. And those <em>opinions</em> end up being policy
and law. Because some time ago it was decided that some voted representatives
will make a majority decision on this. This is what we call democracy. In some
parts of the world, it works differently than in others. I don’t want to go into
too much detail here.</p>
<p>Anyhow. One problem I see in politics is a lack of privacy. There has to be some
kind of transparency, yes. But on the other hand, being able to see
online how each member of parliament voted makes it way too easy to coerce,
blackmail, or bribe someone to vote a certain way. Party discipline is a thing, and
maybe someone doesn’t want to get kicked out of the club, so they better vote
the party line even if their conscience tells them otherwise.</p>
<p>We need to have secrecy of the vote in parliament as well! Otherwise it’s just a
joke.</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>Well this has been long. And a bit emotional. Like I said, somehow we are more
attached to the things we <em>chose</em> to believe in.</p>
<p>In general, I feel both quite chill, and afraid at the same time. So you say
<code>1+1 = 3</code>, or wait, is it <code>1+1 = 5</code>. Doesn’t matter, just do your thing.
(2 + 2 is 4, minus 1 thats 3, quick mafs!)</p>
<p>It’s just that I’m a bit disappointed. We came quite far from darker times where
you were discriminated against based on skin color, genitalia, or a patch
sewn on your jacket. While we are still far away from true justice, haven’t we
learned anything? Why do we start fights over wearing, or not wearing,
masks? Why are we even debating discriminating against people based on whether
they are vaccinated or not? Vegan or meat eater?</p>
<p>But you might say: I know that pizza with pineapple is better. It is the right
thing to do! And it is okay to shove it down the throats of people who hate it.
Because the end justifies the means.</p>
<p>I disagree. And I live in a time right now in which I actually have to be afraid
to speak my opinion. Shoving pizza down peoples throats is wrong! It does not
matter which kind it is!</p>
<p>I do believe in freedom. Freedom to express an opinion. To not be censored and
blocked for it. I might not agree with the opinions that are being expressed,
but I do fight for the right to express them. Because I believe that is the
right thing to do. And I do believe that it should be laws, decided upon by a
democratic process and enforced by courts, that determine what is okay and what is not.</p>
<blockquote>
<p>Ordnung muss sein. ;-)</p>
</blockquote>
<p>And we need to take back control over our services and our data. The big
oligarchs literally have us by our family photos and means of communication.
We need free and open networks, protocols, instruction sets, operating systems,
devices, etc, etc. Federated, decentralized, peer-to-peer. With the right to
repair, and to modify. To copy and to encrypt. It might be more work than just
forking over the key to your lives to some private company, but it might just
be worth it in the end.</p>
<p>Others may disagree.</p>
<hr />
<p>Anyway, this really got a lot longer than anticipated.</p>
<p>So the real conclusion here is: I do like pineapple pizza. And I like to eat my
Nutella bread with butter.</p>
<p><strong>Why?</strong> you might ask. <strong>Why does it even matter why?</strong> I will ask in return.</p>
<p>I have friends who prefer it either way, and that’s okay.</p>
Feedback on Rusts Code Coverage2020-11-23T00:00:00+00:002020-11-23T00:00:00+00:00
Unknown
https://swatinem.de/blog/rust-cov/<p>In my <a href="https://swatinem.de/blog/rust-2021/">Rust 2021</a> wishlist, I was expressing
my excitement about having high-quality, precise code coverage in Rust. I have
been playing around with it a bit these days, and here comes my feedback and a
wishlist.</p>
<h1 id="why-code-coverage-matters"><a class="anchor-link" href="#why-code-coverage-matters" aria-label="Anchor link for: why-code-coverage-matters">#</a>
Why Code Coverage matters</h1>
<p>I think this is a matter of personal preference to some degree. But in general
I think most developers do care about testing their software. And code coverage
simply answers the question of which parts of your code you actually test. Every
part of the code that is not tested can potentially have bugs. Or maybe that
part is just dead code?</p>
<p>The other question is how detailed you want coverage to be. I think that testing
every single permutation is a bit overkill, but on the other hand, just having
per-function or per-line statistics is far too little in my opinion. For me
personally, the sweet spot is at the branch level. I want all the branches in
my code to be covered by tests. If one branch is not covered, it might mean that
a condition is always true. Either because my tests are lacking, or because
I basically have dead code.</p>
<p>Branch-level coverage makes sure that all conditions of conditional code are
hit. This is especially important when chaining conditions using
short-circuiting operators. So for the expression <code>a() && b()</code>, <code>b()</code> will only
be executed if the result of <code>a()</code> is true. And you can nest a couple of those
conditions.</p>
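<p>To make the short-circuiting point concrete, here is a small sketch with made-up functions. Line-level coverage would mark the combined expression as covered even when <code>b()</code> never runs:</p>

```rust
// With `&&`, the right-hand side only runs when the left-hand side is true,
// so line-level coverage of `a() && b()` can hide an untested branch in b().
fn a(x: i32) -> bool {
    x > 0
}

fn b(x: i32) -> bool {
    x % 2 == 0
}

fn check(x: i32) -> bool {
    // A test suite that only ever passes non-positive values here would show
    // this line as covered, even though b() is never executed at all.
    a(x) && b(x)
}

fn main() {
    println!("{}", check(-1)); // a() is false, b() is never called
    println!("{}", check(4));  // both a() and b() execute
}
```

Only branch-level coverage would reveal that the <code>b(x)</code> branch was never taken in the first case.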
<p>Also, code coverage is a really nice way to gamify writing tests. Because you
can write tests specifically that exercise certain parts of your code base,
which also increases your understanding of the code at the same time.</p>
<h1 id="what-i-expect-from-tools"><a class="anchor-link" href="#what-i-expect-from-tools" aria-label="Anchor link for: what-i-expect-from-tools">#</a>
What I expect from Tools</h1>
<p>So far, I have been used to the excellent tools that are available for
JavaScript, more specifically <a href="https://istanbul.js.org/">istanbul</a>. It does
function, statement and branch-level coverage. It does so by instrumenting the
code, giving each source-file a preamble with some metadata, and then for each
function, statement or branch, it generates code to increment a counter. In
post-processing, it will then generate a report out of that.</p>
<p>For a few versions now, node has had built-in coverage, which is supposed to be a
lot quicker and supports "block level" granularity. This does go as deep as
expressions; I am not sure about branches though. But there is
<a href="https://github.com/bcoe/c8">c8</a>, which can basically create the same output as
istanbul, but a lot quicker. I tried this some time ago with an older
version, but found that the output quality was a bit lacking.</p>
<p>I think in the end both of these tools are reasonably good at producing code
coverage for <em>JavaScript</em>, however, JS is not always the code that you <em>write</em>,
it just happens to be the code that the engine <em>executes</em>. And the problem now
becomes to map between those two. In the JS world, sourcemaps are the way that
is done, but they are quite a pain sometimes and the results have varying
quality.</p>
<p>Anyway. What I expect is to have a single command line option, or a wrapper
around my command that will just magically provide me with a code coverage
report that I can view and act on.</p>
<h1 id="rust-process"><a class="anchor-link" href="#rust-process" aria-label="Anchor link for: rust-process">#</a>
Rust Process</h1>
<p>The process of how to get reports is actually <a href="https://swatinem.de/blog/rust-cov/rust-cov">well documented</a>, but
still super complex.</p>
<p>Let’s start from the beginning. The first step is to provide a switch to the rust
compiler instructing it to generate an instrumented library or executable.
Running cargo with <code>RUSTFLAGS="-Zinstrument-coverage"</code> does a wonderful job
there, but it has one major disadvantage: it passes these rustflags to <em>all</em> of
the code it compiles. But I would argue that in 99% of the cases, you only care
about your <em>own</em> code, which translates to: <em>crates in your workspace</em>. This
creates two problems: I have to explicitly <em>ignore</em> code that is not part of my
workspace, and instrumentation does have a negative effect on both compile time
and runtime. Although I imagine with enough caching this won’t matter
as much. Still, I am paying the cost for something that I don’t want to use!</p>
<p>The second env var that I have to provide is
<code>LLVM_PROFILE_FILE=$PWD/cov/%p.profraw</code>. Note that I have to provide an
absolute path here, and use the <code>%p</code> placeholder because unit tests, integration
tests, and also doctests are basically their own executable and are run
independently. Another side-effect of that is <a href="https://github.com/rust-lang/cargo/issues/2832">cargo#2832</a> which makes
<code>cargo test</code> output rather unreadable.</p>
<p>Then run <code>llvm-profdata merge -sparse cov/*.profraw -o coverage.profdata</code> to
merge these individual files. Well fair enough, I can fully understand why.
I had to basically manually write something like that for <a href="https://istanbul.js.org/">istanbul</a> because
that functionality was somehow missing in that ecosystem, oh well.</p>
<p>Now the next step is the really annoying one, as I have to give a list of
<em>objects</em> to <code>llvm-cov</code> for it to generate a report. I am far from actually
<em>knowing</em> how this all works, but my <em>guess</em> is that it is using the debuginfo
embedded in/referenced by the object files to actually map to the source files and
the line/offset in those. This is super tedious when dealing with cargo. Let’s
illustrate this with an example.</p>
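<p>Collecting the steps above, the whole workflow looks roughly like this. This is a sketch; the exact object paths under <code>target/debug/deps</code> depend on your crate and have to be filled in by hand.</p>

```shell
# 1. Build and run tests with coverage instrumentation
#    (-Zinstrument-coverage was nightly-only at the time of writing)
export RUSTFLAGS="-Zinstrument-coverage"
export LLVM_PROFILE_FILE="$PWD/cov/%p.profraw"
cargo test

# 2. Merge the per-process .profraw files into a single profile
llvm-profdata merge -sparse cov/*.profraw -o coverage.profdata

# 3. Generate a report; every test executable has to be passed as an object.
#    TEST_BIN is a placeholder for the actual hashed test binary name.
llvm-cov report --instr-profile=coverage.profdata \
    --object "target/debug/deps/$TEST_BIN"
```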
<hr />
<p>So let’s start with a really simple toy example.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">// src/lib.rs:
</span><span style="font-style:italic;color:#abb0b6;">/// ```
</span><span style="font-style:italic;color:#abb0b6;">/// assert_eq!(fucov::generic_fn("doc", "oh hai"), Ok("doctest"));
</span><span style="font-style:italic;color:#abb0b6;">/// ```
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">generic_fn</span><span><T>(</span><span style="color:#ff8f40;">s</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>, </span><span style="color:#ff8f40;">val</span><span style="color:#61676ccc;">:</span><span> T) </span><span style="color:#61676ccc;">-> </span><span style="font-style:italic;color:#55b4d4;">Result</span><span><</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">str</span><span>, T> {
</span><span> </span><span style="color:#fa6e32;">match</span><span> s {
</span><span> </span><span style="color:#86b300;">"unit" </span><span style="color:#ed9366;">=> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#86b300;">"unit-test"</span><span>)</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"integration" </span><span style="color:#ed9366;">=> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#86b300;">"integration-test"</span><span>)</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"doc" </span><span style="color:#ed9366;">=> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#86b300;">"doctest"</span><span>)</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ed9366;">_ => </span><span style="font-style:italic;color:#55b4d4;">Err</span><span>(val)</span><span style="color:#61676ccc;">,
</span><span> }
</span><span>}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">unit_test</span><span>() {
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(</span><span style="color:#f07171;">generic_fn</span><span>(</span><span style="color:#86b300;">"unit"</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">1</span><span>)</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#86b300;">"unit-test"</span><span>))</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// tests/test_integration.rs:
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">integration_test</span><span>() {
</span><span> </span><span style="color:#f07171;">assert_eq!</span><span>(
</span><span> fucov</span><span style="color:#ed9366;">::</span><span>generic_fn(</span><span style="color:#86b300;">"integration"</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#55b4d4;">Some</span><span>(</span><span style="color:#ff8f40;">true</span><span>))</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(</span><span style="color:#86b300;">"integration-test"</span><span>)
</span><span> )</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span></code></pre>
<p>Running my tests, I get the following output:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> cargo +nightly test
</span><span>
</span><span> Finished test [unoptimized + debuginfo] target(s) in 0.00s
</span><span> Running target/debug/deps/fucov-e207e6174e8f3968
</span><span>
</span><span>running 1 test
</span><span>test unit_test ... ok
</span><span>
</span><span>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
</span><span>
</span><span> Running target/debug/deps/test_integration-d1ff69dad6b5720c
</span><span>
</span><span>running 1 test
</span><span>test integration_test ... ok
</span><span>
</span><span>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
</span><span>
</span><span> Doc-tests fucov
</span><span>
</span><span>running 1 test
</span><span>test src/lib.rs - generic_fn (line 1) ... ok
</span><span>
</span><span>test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
</span></code></pre>
<p>Well, so much for clean output… anyway… I do see which executables cargo is
running, which is good. Except, which executable is run for my doctests?
Checking the <code>target/debug/deps</code> folder, I do see a <code>libfucov-HASH.rlib</code>; maybe
that's my library that was linked for the doctests?</p>
<p>Playing around with this a little, I tried variations of this
command line: <code>llvm-cov show --format=html --show-instantiations=false --instr-profile coverage/fucov.profdata --object target/debug/deps/fucov-e207e6174e8f3968 --object target/debug/deps/test_integration-d1ff69dad6b5720c</code>.
This is really a mouthful. As you can see, I haven't listed my rlib file
there, as it did not make any difference in the output. In the end, I was not able
to get the coverage from my doctest at all. That's a shame.</p>
<p>Looking at the html output or the json/lcov output, it was also interesting how this
generic function was treated. The command line I gave explicitly excluded
showing individual instantiations of the generic, as that would surely be
information overload if there were lots of generics, as I think there usually are in Rust.
In that case, also providing a demangler would have been useful, as it will show
each block of code captioned with the function name.</p>
<p>What I can see from the output is that my unit and integration test functions
themselves are also covered, as is kind of expected but a bit useless.</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>I’m really impressed with the quality of the reports, although the process to
get there is a bit too convoluted IMO. So the foundation is there, now it's just
a matter of optimizing and making it simple to use. And well, maybe I feel like
it and will create my own cargo command called <code>fucov</code>, which does all that,
because well, fuck off and give me my coverage!</p>
Understanding the limitations of functional record update2020-11-19T00:00:00+00:002020-11-19T00:00:00+00:00
Unknown
https://swatinem.de/blog/frufru-1/<p>This story starts with the sentry protocol and event payloads, and how to make
them both typesafe, easy-to-use, and extensible at the same time, which they
are currently not, and probably won’t be with current Rust.</p>
<p>So these definitions are just plain-old-data (POD), they have no logic or other
functionality, they are just used to transfer data to the server. And they have
a whole bunch of properties, most of which are optional. You mostly just create
them, and very rarely look inside them. So we want to optimize the ergonomics
for creation.</p>
<p>I would argue the most ergonomic solution here is to use a plain struct with
public fields (notice that we derive <code>Default</code>):</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(Default)]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">MyStructure </span><span>{
</span><span> </span><span style="color:#fa6e32;">pub </span><span>field_a</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">u32</span><span>,
</span><span> </span><span style="color:#fa6e32;">pub </span><span>field_b</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">Option</span><span><</span><span style="color:#fa6e32;">f64</span><span>>,
</span><span>}
</span></code></pre>
<p>And you can very easily create it via a struct literal:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>MyStructure {
</span><span> field_a</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">,
</span><span> field_b</span><span style="color:#61676ccc;">: </span><span style="font-style:italic;color:#55b4d4;">None</span><span style="color:#61676ccc;">,
</span><span>}
</span></code></pre>
<p>We don’t really care about <code>field_b</code> that much, and when it's not 2 fields but
rather 20, it's very tedious to list each and every one of them, especially if
you don’t really care about most of them.</p>
<p>One solution to this is to use the functional record update (FRU) syntax.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span>MyStructure {
</span><span> field_a</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">123</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#ed9366;">..</span><span style="font-style:italic;color:#55b4d4;">Default</span><span style="color:#ed9366;">::</span><span>default()
</span><span>}
</span></code></pre>
<p>FRU has the additional benefit that adding new fields will still work without
changes. However, when a user really wants to exhaustively list all of the
fields, and you add another one later, they will receive a compile error:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0063]: missing field `field_c` in initializer of `MyStructure`
</span><span> --> src\main.rs:11:5
</span><span> |
</span><span>11 | MyStructure {
</span><span> | ^^^^^^^^^^^ missing `field_c`
</span></code></pre>
<p>Adding fields to public structures is thus a breaking change, because it breaks
the code of users who don’t use FRU.</p>
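<p>To make this concrete, here is a runnable sketch, reusing <code>MyStructure</code> from above with a hypothetical <code>field_c</code> added later:</p>

```rust
// The struct from above, after the library author added `field_c`.
#[derive(Default)]
pub struct MyStructure {
    pub field_a: u32,
    pub field_b: Option<f64>,
    pub field_c: bool, // newly added field
}

fn main() {
    // FRU keeps compiling: `field_c` is silently filled in from `Default`.
    let s = MyStructure {
        field_a: 123,
        ..Default::default()
    };
    assert_eq!(s.field_a, 123);
    assert_eq!(s.field_b, None);
    assert!(!s.field_c);
    // An exhaustive literal written before `field_c` existed would now
    // fail with E0063: missing `field_c`.
}
```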
<p>The go-to solution to be able to extend structs without breaking peoples code
is to use the <code>#[non_exhaustive]</code> attribute. But this is a really big hammer,
and it basically prevents users from using struct literals <em>at all</em>.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0639]: cannot create non-exhaustive struct using struct expression
</span><span> --> src\main.rs:11:5
</span><span> |
</span><span>11 | / MyStructure {
</span><span>12 | | field_a: 123,
</span><span>13 | | ..Default::default()
</span><span>14 | | };
</span><span> | |_____^
</span></code></pre>
<p>One is forced to create an object first using a constructor (or default), and
then assign properties one-by-one, or use setters for that purpose, maybe
through the builder pattern. However, for this use-case, this is really heavy
and kind of destroys the ergonomics. So right now it's essentially impossible to
have ergonomic <em>and</em> extensible structs.</p>
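<p>To sketch what that looks like, reusing <code>MyStructure</code> from above (note that <code>#[non_exhaustive]</code> only bites across crate boundaries, so within a single file this is illustration only):</p>

```rust
// As a library might define it; downstream crates can no longer use
// struct literals (or FRU) on this type at all.
#[derive(Default)]
#[non_exhaustive]
pub struct MyStructure {
    pub field_a: u32,
    pub field_b: Option<f64>,
}

fn main() {
    // Outside the defining crate, `MyStructure { field_a: 123, .. }` is
    // rejected with E0639, so we start from `Default` and assign fields:
    let mut s = MyStructure::default();
    s.field_a = 123;
    assert_eq!(s.field_a, 123);
    assert_eq!(s.field_b, None);
}
```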
<p>But why shouldn’t that be possible? If you have a complete object via <code>Default</code>,
why can’t you just override those properties that you care about? The <code>Default</code>
will make sure it's always the right struct, even if it changes.</p>
<h2 id="digging-through-rust-code"><a class="anchor-link" href="#digging-through-rust-code" aria-label="Anchor link for: digging-through-rust-code">#</a>
Digging through Rust code</h2>
<p>Thinking that it should be easily possible, I dusted off my 5 year old Rust
clone and after some initial difficulties I managed to get it to build on
Windows.</p>
<p>Since Rust errors have a specific code, it's easy to search for that, and sure
enough, I quickly find the actual
<a href="https://github.com/rust-lang/rust/blob/57edf88b400ff6c6ae1de255fbd7e3448aca4fb2/compiler/rustc_typeck/src/errors.rs#L166-L172">type implementing <code>E0639</code></a>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">derive</span><span>(SessionDiagnostic)]
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">error </span><span style="color:#ed9366;">= </span><span style="color:#86b300;">"E0639"</span><span>]
</span><span style="color:#fa6e32;">pub struct </span><span style="color:#399ee6;">StructExprNonExhaustive </span><span>{
</span><span> </span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">message </span><span style="color:#ed9366;">= </span><span style="color:#86b300;">"cannot create non-exhaustive {what} using struct expression"</span><span>]
</span><span> </span><span style="color:#fa6e32;">pub </span><span>span</span><span style="color:#61676ccc;">:</span><span> Span,
</span><span> </span><span style="color:#fa6e32;">pub </span><span>what</span><span style="color:#61676ccc;">: </span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">'static str</span><span>,
</span><span>}
</span></code></pre>
<p>Searching for that type gives me a single use, which is exactly
<a href="https://github.com/rust-lang/rust/blob/e0ef0fc392963438af5f0343bf7caa46fb9c3ec3/compiler/rustc_typeck/src/check/expr.rs#L1110-L1116">the code I was looking for</a>:</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span> </span><span style="font-style:italic;color:#abb0b6;">// Prohibit struct expressions when non-exhaustive flag is set.
</span><span> </span><span style="color:#fa6e32;">let</span><span> adt </span><span style="color:#ed9366;">=</span><span> adt_ty</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">ty_adt_def</span><span>()</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">expect</span><span>(</span><span style="color:#86b300;">"`check_struct_path` returned non-ADT type"</span><span>)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">if </span><span style="color:#ed9366;">!</span><span>adt</span><span style="color:#ed9366;">.</span><span>did</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">is_local</span><span>() </span><span style="color:#ed9366;">&&</span><span> variant</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">is_field_list_non_exhaustive</span><span>() {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">self</span><span style="color:#ed9366;">.</span><span>tcx
</span><span> </span><span style="color:#ed9366;">.</span><span>sess
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">emit_err</span><span>(StructExprNonExhaustive { span</span><span style="color:#61676ccc;">:</span><span> expr</span><span style="color:#ed9366;">.</span><span>span</span><span style="color:#61676ccc;">,</span><span> what</span><span style="color:#61676ccc;">:</span><span> adt</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">variant_descr</span><span>() })</span><span style="color:#61676ccc;">;
</span><span> }
</span></code></pre>
<p>Reading the code a bit, it's obvious that the optional <code>base_expr</code> holds the
record update if there is one. So I added a <code>&& base_expr.is_none()</code> and
recompiled.</p>
<p>And lo and behold, my code suddenly compiles.</p>
<h2 id="this-was-way-too-simple"><a class="anchor-link" href="#this-was-way-too-simple" aria-label="Anchor link for: this-was-way-too-simple">#</a>
This was way too simple</h2>
<p>I couldn’t believe that a one-liner was enough to fix this. This was way too
simple. I must be missing something. So I went back to research, and found
<a href="https://github.com/rust-lang/rfcs/blob/master/text/2008-non-exhaustive.md#functional-record-updates">RFC 2008</a>
which specified the <code>#[non_exhaustive]</code> feature, and had a paragraph dedicated
to FRU. I'm paraphrasing here:</p>
<blockquote>
<p>It will not work, because you could, in the future, add private fields that the user didn't account for.</p>
</blockquote>
<p>So you can add public fields without a problem, but private fields will run
into a different error:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>error[E0451]: field `field_d` of struct `MyStructure` is private
</span><span> --> src\main.rs:13:11
</span><span> |
</span><span>13 | ..Default::default()
</span><span> | ^^^^^^^^^^^^^^^^^^ field `field_d` is private
</span></code></pre>
<p>As a side note, this error seems to happen at a different compiler stage, as it
only showed up in my tests once I had solved all the other compiler errors.</p>
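<p>Here is a minimal sketch of that situation, with a hypothetical private <code>field_d</code>; the privacy check is scope-based, so a module stands in for the crate boundary here:</p>

```rust
mod lib {
    #[derive(Default)]
    pub struct MyStructure {
        pub field_a: u32,
        field_d: bool, // private: invisible outside this module
    }
    impl MyStructure {
        pub fn field_d(&self) -> bool {
            self.field_d
        }
    }
}

fn main() {
    use lib::MyStructure;
    // This FRU would fail with E0451, because `field_d` is private here:
    // let s = MyStructure { field_a: 123, ..Default::default() };
    // Only code inside `mod lib` itself may use the update syntax.
    let s = MyStructure::default();
    assert_eq!(s.field_a, 0);
    assert!(!s.field_d());
}
```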
<p>There are <a href="https://github.com/rust-lang/rust/issues/63538">two issues</a>
<a href="https://github.com/rust-lang/rust/issues/70564">for that</a> already, and probably
lots of duplicates since this is obviously a much desired feature.</p>
<p>It looks like there were some experiments 3 years ago that weren’t successful,
so I guess things get a bit more complicated after all.</p>
<h2 id="dealing-with-private-fields"><a class="anchor-link" href="#dealing-with-private-fields" aria-label="Anchor link for: dealing-with-private-fields">#</a>
Dealing with private Fields</h2>
<p>Searching for this new error <code>E0451</code> also gives me a single usage in the Rust
compiler in a function called
<a href="https://github.com/rust-lang/rust/blob/7d747db0d5dd8f08f2efb073e2e77a34553465a7/compiler/rustc_privacy/src/lib.rs#L952-L985">check_field</a> in the <code>rustc_privacy</code> crate. Surprisingly,
this method takes an <code>in_update_syntax</code> parameter, which is used to customize the
error message.</p>
<p>This in turn comes from checking the privacy of structs, which has the following
to say related to FRU:</p>
<blockquote>
<p>If the expression uses FRU we need to make sure all the unmentioned fields
are checked for privacy (RFC 736). Rather than computing the set of
unmentioned fields, just check them all.</p>
</blockquote>
<p>Ok, time to skim through
<a href="https://github.com/rust-lang/rfcs/blob/master/text/0736-privacy-respecting-fru.md">RFC 736</a>.
The RFC seems to be very thorough, and explicitly mentions our desired
extensibility. The reasoning for the change was to plug an abstraction-violating
hole. The example given in the RFC is pretty clear on why that is a bad idea.
And indeed disallowing FRU for structs with private fields solves it neatly.</p>
<h2 id="moving-on"><a class="anchor-link" href="#moving-on" aria-label="Anchor link for: moving-on">#</a>
Moving on</h2>
<p>Now we know why things are the way they are, and they make perfect sense.
I’m positively surprised at how reasonably well things are documented. Although it's
sad that our problem is a lot harder to solve than it seemed at first. I mean
otherwise surely someone would have done it in the last 5 or so years.</p>
<p>So the challenge becomes: Can we still find a way to improve things?
That is for another time to find out.</p>
Forms of blocking and non-blocking I/O2020-11-15T00:00:00+00:002020-11-15T00:00:00+00:00
Unknown
https://swatinem.de/blog/forms-of-io/<p>My fiancée often challenges me to explain some programming topic in simple
terms for non-programmers like her to understand. We call these stories
<em>Programmer's Fairytales</em>. I was surprisingly successful explaining to her the
differences between blocking, and different forms of non-blocking, I/O
(input/output) in a way that she understood. These thoughts have
come back recently, while thinking about a problem at work that came up and needs
brainstorming.</p>
<h1 id="programmers-fairytale"><a class="anchor-link" href="#programmers-fairytale" aria-label="Anchor link for: programmers-fairytale">#</a>
Programmer's Fairytale</h1>
<p>So our programmer's fairytale starts with us having to endure the overly
bureaucratic and annoying process of getting married. We have to frequently go
to the <em>Standesamt</em> (apparently that's <em>civil registration office</em> in English).
There are different ways we can talk to them, which is how I explained I/O to
her.</p>
<h2 id="blocking-i-o"><a class="anchor-link" href="#blocking-i-o" aria-label="Anchor link for: blocking-i-o">#</a>
Blocking I/O</h2>
<blockquote>
<p>Hello there. We would like you to get this work done, we will wait here until
you are done.</p>
</blockquote>
<p>Blocking I/O is like knocking on someone's door and giving them some work to
do, while waiting patiently at the door until that work is done. Depending on
the kind of work, this is obviously a bad idea, because you just keep waiting
/ <em>blocking</em> until the work is finished, while you can’t do anything meaningful
in the meantime yourself.</p>
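<p>In code, blocking I/O is just a plain synchronous call; a minimal Rust sketch (the file name is made up for the demo):</p>

```rust
use std::io::Write;

// Blocking I/O: each call parks the calling thread at the "door"
// until the kernel has finished the work.
fn write_and_read_back(path: &std::path::Path) -> std::io::Result<String> {
    std::fs::File::create(path)?.write_all(b"hello")?; // blocks until written
    std::fs::read_to_string(path) // blocks until read
}

fn main() {
    let path = std::env::temp_dir().join("blocking-io-demo.txt");
    let contents = write_and_read_back(&path).unwrap();
    assert_eq!(contents, "hello");
}
```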
<h2 id="polling-based-i-o"><a class="anchor-link" href="#polling-based-i-o" aria-label="Anchor link for: polling-based-i-o">#</a>
Polling-based I/O</h2>
<blockquote>
<p>Hello there. We have some work for you. We will check back tomorrow.</p>
<p>Tomorrow: <a href="https://www.youtube.com/watch?v=QfaSLHOm4rw">Yo, you about done? Tick tock, motherfucker!</a></p>
</blockquote>
<p>Non-blocking I/O based on polling means periodically checking
back to see if the work you wanted to get done is finished already or not.
It’s a lot better already, because we can spend the meantime doing other things
while we wait. It’s however not ideal, because we still have to walk over to
the office and ring the door, or alternatively make a phone call and wait for it
to get answered. The other person might also be interrupted in whatever they
were doing at that time.</p>
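<p>A minimal Rust sketch of the polling model, using <code>try_recv</code> on a channel as the stand-in for ringing the door:</p>

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Poll the "office" (a worker thread) until the result is ready,
// doing other things in between instead of blocking on the call.
fn poll_for_result(rx: &mpsc::Receiver<&'static str>) -> &'static str {
    loop {
        match rx.try_recv() {
            Ok(result) => return result, // the work is done
            Err(mpsc::TryRecvError::Empty) => {
                // Not ready yet; spend the meantime elsewhere.
                thread::sleep(Duration::from_millis(10));
            }
            Err(mpsc::TryRecvError::Disconnected) => panic!("office burned down"),
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50)); // the work takes a while
        tx.send("done").unwrap();
    });
    assert_eq!(poll_for_result(&rx), "done");
}
```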
<h2 id="completion-based-i-o"><a class="anchor-link" href="#completion-based-i-o" aria-label="Anchor link for: completion-based-i-o">#</a>
Completion-based I/O</h2>
<blockquote>
<p>Us: Hello there. We have some work for you.</p>
<p>Office: Cool. Leave your number, we will call you when it's done.</p>
</blockquote>
<p>With this form of I/O, we register a <em>callback</em>, to be notified of the
completion of the work later on. In our case, we just leave our phone number
and email with the office. Less interruptions from checking back every day,
and eventually the work is done. This works fine for software, although not so
well when dealing with civil offices.</p>
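<p>A small Rust sketch of the completion-based model, where a closure plays the role of the phone number we leave behind:</p>

```rust
use std::sync::mpsc;
use std::thread;

// The "office" does the work on its own thread and then "calls our
// number": it invokes the callback we registered when submitting.
fn do_work_with_callback<F: FnOnce(&str) + Send + 'static>(on_done: F) {
    thread::spawn(move || {
        let result = "marriage paperwork filed";
        on_done(result);
    });
}

fn main() {
    let (tx, rx) = mpsc::channel();
    do_work_with_callback(move |result| {
        // This closure is our phone number: it runs on completion.
        tx.send(result.to_owned()).unwrap();
    });
    // Block only at the very end, just to observe the result in this demo.
    assert_eq!(rx.recv().unwrap(), "marriage paperwork filed");
}
```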
<h2 id="queueing-io-uring"><a class="anchor-link" href="#queueing-io-uring" aria-label="Anchor link for: queueing-io-uring">#</a>
Queueing (io_uring)</h2>
<blockquote>
<p>Us: We have some work for you, we will leave it in your letter box. When you
are done, put the reply in our letter box please.</p>
<p>Office: I have some spare time right now, I guess I can check my letter box.</p>
<p>Us: Might as well check my letter box on the way out. Oh, there is the reply.</p>
</blockquote>
<p>Now we have introduced something in between, a letter box, to hold the messages
and the work instructions. We don’t interrupt anyone by ringing the doorbell or
calling the phone. Whenever they have some spare time, they will just check
their letter box. In terms of software, I would call this an ideal solution.
Every party can just focus on doing its own thing, no distractions.</p>
<p>There is one complication though. You have this new concept of a letter box to
think about. In particular to think about what you want to do in case it fills
up with messages that you don’t have time to reply to.</p>
<p>Interestingly enough, this form of I/O was recently introduced in the Linux
kernel under the name <code>io_uring</code>. It is based around a <em>submission queue</em>
(office letter box) and a <em>completion queue</em> (our letter box). Submitting
requests and polling the result works without any syscall / context-switches
(ringing the doorbell).</p>
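<p>The two-queue idea can be sketched with a pair of channels standing in for the submission and completion queues (a toy model, not actual <code>io_uring</code> usage):</p>

```rust
use std::sync::mpsc;
use std::thread;

// A toy model of the two queues behind `io_uring`: a submission queue
// ("their letter box") and a completion queue ("our letter box").
fn run_queues(requests: Vec<u32>) -> Vec<u32> {
    let (submit_tx, submit_rx) = mpsc::channel::<u32>();
    let (complete_tx, complete_rx) = mpsc::channel::<u32>();

    // The "office" drains its letter box whenever it has spare time and
    // drops replies into ours; nobody rings any doorbells in between.
    let office = thread::spawn(move || {
        for request in submit_rx {
            complete_tx.send(request * 2).unwrap();
        }
    });

    // Drop several requests into the submission queue at once...
    for r in requests {
        submit_tx.send(r).unwrap();
    }
    drop(submit_tx); // no more work; the office can go home

    // ...and later collect all completions from our own letter box.
    let results = complete_rx.iter().collect();
    office.join().unwrap();
    results
}

fn main() {
    assert_eq!(run_queues(vec![1, 2, 3]), vec![2, 4, 6]);
}
```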
<h3 id="spectre-for-non-programmers"><a class="anchor-link" href="#spectre-for-non-programmers" aria-label="Anchor link for: spectre-for-non-programmers">#</a>
Spectre for non-programmers</h3>
<p>A small digression here. I was even able to kind-of explain Spectre to my fiancée.
Remember when we ring the doorbell of the office. It would be super bad if they
open the door and we would see all of the paperwork of other people laying
around. So naturally, they will first get that stuff out of the way before
opening the door for us. Kind of like how CPU caches have to be wiped before
switching contexts, so other processes are not able to do cache-timing attacks.</p>
<hr />
<p>Anyway, <code>io_uring</code> does not involve syscalls, so it does not need to do any of
that. Which means it's super fast. And I’m certainly very excited about its
potential. Since it is quite young, it’s not yet possible to use it for
<em>everything</em>, and also programs and runtimes need to be updated to actually make
use of it.</p>
<h1 id="languages-and-runtimes"><a class="anchor-link" href="#languages-and-runtimes" aria-label="Anchor link for: languages-and-runtimes">#</a>
Languages and Runtimes</h1>
<p>In another programmer's fairytale, I tried to explain <code>async/await</code> to her,
although I think I was not as successful as before. Interestingly, you can mix
different forms of async.</p>
<p>Lets start with some work that you want to get done asynchronously. We call
this <code>Future</code> in Rust, and <code>Promise</code> in JS. It's a <em>promise</em> that the work you
requested may eventually be done in the <em>future</em> (or not).</p>
<p>The old way of using these futures was with callbacks, the <em>completion-based</em>
model from above. You <em>leave your number</em> by providing a function / closure.
It's just a block of code that's separate from your other code.</p>
<p>The <em>new</em> way of doing things is by using the <code>await</code> keyword. In Rust, it is
put <em>behind</em> an expression, while in JS it’s in front. What it does is make the code
<em>look like</em> blocking, like the <em>we will wait here</em> from above, hence the
<code>await</code>.</p>
<p>However, under the hood its implemented completely differently in different
languages. Rust futures are based on polling, while JS promises use a completion
callback. This has interesting implications, both good and bad.</p>
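<p>The polling contract is visible in the <code>Future</code> trait itself; here is a minimal hand-rolled example that drives a future by calling <code>poll</code> directly (the <code>Nag</code> future and the no-op waker are made up for the demo):</p>

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A future that must be polled twice before it is "done", mirroring
// how Rust futures make no progress unless someone keeps polling.
struct Nag {
    polls_left: u32,
}

impl Future for Nag {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.polls_left == 0 {
            Poll::Ready("finally done")
        } else {
            self.polls_left -= 1;
            // A real future would arrange for `cx.waker()` to be woken
            // later; here we just ask to be polled again.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A do-nothing waker, just enough to drive the future by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = Nag { polls_left: 1 };
    let mut pinned = Pin::new(&mut fut);
    // First poll: not ready yet; second poll: the work is done.
    assert_eq!(pinned.as_mut().poll(&mut cx), Poll::Pending);
    assert_eq!(pinned.as_mut().poll(&mut cx), Poll::Ready("finally done"));
}
```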
<p>In Rust, it's kind of bad that you have to poll all the time, but on the other
hand, if you don’t care about the result anymore, you just stop polling and
the work stops as well. In JS, you can easily say <em>I don’t really care, just do it</em>.
On the other hand, it's a lot harder to actually <em>stop</em> doing something that you
already started. Sometimes the office just burns down and you have no idea why.</p>
<p>Coming back to our fairytale, I think the office is more like Rust futures. You
constantly have to bug them and say <em>yes, we still want that stuff to get done</em>,
or they just won’t move a muscle. It’s sad actually :-(</p>
<p>There are also differences on the operating system level. While <code>io_uring</code> is
the new hotness on Linux, previously things were based on polling. Although in a
more optimized way, where you have a whole list of things and ask
<em>is any of these done yet?</em> As far as I know, Windows is based on callbacks.
This is interesting, because a language/runtime has to work in a uniform way
across operating systems, although under the hood it does completely different
things.</p>
<h1 id="tradeoffs"><a class="anchor-link" href="#tradeoffs" aria-label="Anchor link for: tradeoffs">#</a>
Tradeoffs</h1>
<p>It’s interesting to think a bit more about tradeoffs though. Depending on the
work you want to get done, it might be best to just block. When it's a
matter of <em>latency</em>, making the trip to the
office once and waiting a little might be the better choice. Having to come back the next day might just delay
things unnecessarily. Also, providing phone numbers (callbacks) or installing
letter boxes (queues) may increase the complexity; even polling has complexity,
as the other person may just ask <em>who were you again?</em></p>
<h1 id="networked-services"><a class="anchor-link" href="#networked-services" aria-label="Anchor link for: networked-services">#</a>
Networked Services</h1>
<p>All of these concepts map to networked services as well. The difference might be,
instead of ringing our next door neighbors, we actually have to walk 10 minutes
down the street to the office. The <em>round trip time</em> is a lot longer. But you
can still decide if you want to wait right there, or come back tomorrow. Or
even both. Wait there for a certain amount of time, but then you get bored and
decide to rather check back later.</p>
<hr />
<p>This is exactly how <a href="https://github.com/getsentry/symbolicator">symbolicator</a>
works right now. Or rather, symbolicator is like the office.</p>
<blockquote>
<p>Here, take a number and have a seat. I might have your answer right away,
otherwise check back tomorrow.</p>
</blockquote>
<p>This system is starting to be a problem, and we are looking into ways to
improve it. I tend to favor the queueing solution. Coming back to our story,
when it comes to networked services, the <em>letter box</em> itself is a separate
service, like a post office. You submit your work request to the post office
nearest to you, and they will deliver the request to the post office box nearest
to the office that will serve that request. In that scenario, <code>symbolicator</code>
will just walk to its post office box whenever it has nothing to do, get the job
done and bring the result back to the post office. Kind of like the
<em>submission queue</em> and <em>completion queue</em> from <code>io_uring</code>. No distractions,
although in this case the distractions don’t matter that much. It is more the
concept of <em>take a number and check back</em> that bothers me personally. Overall,
I think the question is rather, do we want to build and manage a new post office,
and have contingency plans in place on what we should be doing when the letter
boxes start piling up and eventually overflow? Also, should we introduce another
service, someone who is responsible for deciding and delivering the messages to
the right letter box?</p>
<hr />
<p>Well, that's it for today. I think the main takeaway here is this:</p>
<blockquote>
<p>Public offices are like Rust futures. You have to poll them constantly or
they won’t be doing any work. <em>sadface</em></p>
</blockquote>
PSA: Clearing global debugger properties2020-11-11T00:00:00+00:002020-11-11T00:00:00+00:00
Unknown
https://swatinem.de/blog/windows-debuggers/<h2 id="tldr"><a class="anchor-link" href="#tldr" aria-label="Anchor link for: tldr">#</a>
TLDR</h2>
<p>So your <code>UnhandledExceptionFilter</code> is just not being called? Maybe your program
is being run under a debugger without you even knowing.
Check the <code>HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\</code>
registry key and make sure it does not specify any special treatment for your
<code>.exe</code>.</p>
<h2 id="the-whole-story"><a class="anchor-link" href="#the-whole-story" aria-label="Anchor link for: the-whole-story">#</a>
The whole story</h2>
<p>Working at sentry, one of my responsibilities is to take care of the native SDK.
And yesterday I was in a really strange situation. For unknown reasons, all my
integration tests using the crashpad backend were failing. Admittedly, those
were the only tests I was running since I didn’t touch any other code.
But it was strange and it didn’t make any sense. I tried the <code>master</code> branch,
and it was the same, even though everything was working as expected on CI.</p>
<p>A quick internet search didn’t give me any good ideas at first. But I noticed
that log output was missing from our <code>FirstChanceHandler</code>, which I
<a href="https://github.com/getsentry/crashpad/pull/20">added on Windows</a> in addition
to Linux. It’s a piece of code that can still run in-process at the time of a
crash to flush any internal state. It is based around the
<a href="https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-setunhandledexceptionfilter"><code>SetUnhandledExceptionFilter</code></a>
mechanism, which it seems is the Windows way of handling native crashes in the
process.</p>
<p>This refined my search a bit, and led me to a Stack Overflow post highlighting a paragraph
from the official documentation.</p>
<blockquote>
<p>[…] if an exception occurs in a process <strong>that is not being debugged</strong> […]
the exception makes it to the unhandled exception filter […]</p>
</blockquote>
<p>So having an unhandled exception filter and running under a debugger are
mutually exclusive. But I was not using a debugger. At this point I was also
testing all the other sentry backends, none of which were working.</p>
<p>And then I remembered, I was running the
<a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/application-verifier">Application Verifier</a>,
which does not really do anything by itself, but which apparently, through some
kind of global magic, enables additional checks when running programs in the
Visual Studio Debugger. But I was not doing that in this case, so how come my
exceptions still didn’t reach the handler?
<p>Another round of searching got me to a page that mentioned the registry key
<code>HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\</code>
which can be used to attach a debugger automatically right when spawning a
process, and its
<a href="https://docs.microsoft.com/en-us/previous-versions/windows/desktop/xperf/image-file-execution-options">documentation</a>
mentions something along those lines as well.</p>
<p>And sure enough, there was an entry for <code>sentry_example.exe</code>. Just deleting that
registry key solved the whole problem. All my integration tests returned to
running correctly. So it seems all of this is a bit of magic on Windows, and
sadly not really well documented. But now I’m eager to learn more about how all
this works.</p>
Rust 20212020-09-24T00:00:00+00:002020-09-24T00:00:00+00:00
Unknown
https://swatinem.de/blog/rust-2021/<h1 id="tldr"><a class="anchor-link" href="#tldr" aria-label="Anchor link for: tldr">#</a>
TLDR</h1>
<p>’Tis the planning season for Rust 2021 already, and as suggested, I will start
with a very short list of bullet points:</p>
<ul>
<li>As a long-time observer of the Rust project, I would like to see some
long-running projects come to an end, to reap the promised benefits ;-)</li>
<li>As an ex-TypeScript/JS developer, I am super excited to see high quality,
source-based code coverage (with branch coverage even) being worked on, and
would like to see this project get the attention it deserves.</li>
<li>Related, as a developer who loves testing and gamification, I want to have
an easier way to get an overview of how my project is doing, and where I can
improve it.</li>
<li>As a maintainer of a mid-sized workspace of Rust crates, I would like to have
better tooling to deal with versioning and publishing of my crates, as the
process feels very hacky right now.</li>
<li>As a maintainer of some crates that have a lot of feature flags, I would like
feature flags to be less of a pain and easier to work with, to give me the
confidence that my crate works no matter what feature flags the consumers use.</li>
<li>As a developer who has recently switched to GitHub Actions, I would have
liked that process to be more straightforward, so that I can avoid a lot of
copy&paste, and be sure that my CI builds are as fast as they can be.</li>
</ul>
<h1 id="a-big-thank-you"><a class="anchor-link" href="#a-big-thank-you" aria-label="Anchor link for: a-big-thank-you">#</a>
A Big Thank You!</h1>
<p>I have actually already written about
<a href="https://swatinem.de/blog/doc-driven-development/">my wishlist for Rust tooling</a> a
few months ago, and like always, things have changed, and I have learned a lot.</p>
<p>Let me start off by saying that I am super happy with where Rust and the ecosystem are moving.
<a href="https://rust-analyzer.github.io/"><code>rust-analyzer</code></a> has been a major success and is a great productivity booster!
It is also test-driving <a href="https://github.com/rust-lang/chalk"><code>chalk</code></a>, which is supposed to become the new trait
solver in Rust hopefully soon. It is already working amazingly well in my IDE,
as I can get type hints and hover information in cases where the main Rust
compiler still wants me to manually type-annotate things.</p>
<p>Also, another project that I am super excited about and that gets a lot less
publicity than it deserves is precise source-based code coverage support built
into the compiler. I only know about this because I regularly scan through the
list of PRs to the compiler. This definitely needs more marketing around it!
And of course a way to easily use it. From all PRs I have seen, I would have no
idea how to actually use it and maybe even combine it with codecov, etc.</p>
<p>Another great thing that was recently stabilized is intra-doc links. I am
super excited about this, and am already starting to use them, as they have been
available on <a href="https://docs.rs">docs.rs</a> for quite some time.</p>
<h1 id="where-rust-needs-to-improve"><a class="anchor-link" href="#where-rust-needs-to-improve" aria-label="Anchor link for: where-rust-needs-to-improve">#</a>
Where Rust needs to improve</h1>
<p>The pain points I have right now are mostly related to tooling, and also related
to having bigger more complex projects, possibly workspaces consisting of
multiple crates.</p>
<p>I think the experience of creating and maintaining a small, simple Rust project
is amazing right now. But things start to get annoying as soon as the project
starts to grow.</p>
<p>I read that a lot of people are still complaining about long compile times. It
is actually a very complex problem, and when looking at it in more detail, it is
not really the speed of the Rust compiler that is at fault here.
For me, local development, especially with rust-analyzer has been a pleasure.
Things only start to fall apart when a small change means that all the
interdependent crates in my workspace have to be updated, or the problem is the
final linking step that is taking forever. Or the problem is not even my own
machine, but rather CI that is super slow because of tons of possibly duplicated
dependencies, and bad caching.</p>
<h1 id="improving-ci"><a class="anchor-link" href="#improving-ci" aria-label="Anchor link for: improving-ci">#</a>
Improving CI</h1>
<p>Speaking of CI, my team and I have been migrating a few workspaces to GitHub
Actions recently, and while we do like it in general, there are a few rough
edges that should be smoother. Not all are strictly Rust problems, some are
also GHA.</p>
<p>The primary goal of CI in my opinion is to both be as fast as possible, and as
thorough as possible, while also being efficient and easy to use.
This is the first tradeoff that a developer needs to make.
Do I want to fail fast, or rather have <em>all</em> the failures visible?
When we fan out wide, we get a ton of different status reports, which IMO is a
UI concern to begin with; but that also means more overhead.
While the wall-time might be smaller, the machine-time is increased, because the
common steps of checkout, toolchain, etc are run by every job.</p>
<p>Right now, setting up a CI pipeline is too cumbersome. There are too many
decisions a developer has to make, and way too much copy-paste going on in the
CI definition itself. I wish a lot of these things would just be done out of the
box.</p>
<p>One of the big problems for CI right now, especially for big projects is caching.
Caching is not free! If cargo creates a big <code>target</code> folder, persisting and
restoring that cache can already take a considerable amount of CI time.
This problem gets a lot worse because cargo does not auto-clean that cache.
There are some <a href="https://github.com/rust-lang/cargo/issues/5885">open issues</a>
about that, but nothing much has happened in years. A lot of projects have
custom solutions to deal with this, and it does not really scale. Also, should
one disable incremental compilation on CI? I don’t know. I would like a simple
out-of-the-box solution that covers 90% of the use cases.</p>
<h1 id="workspaces-and-features"><a class="anchor-link" href="#workspaces-and-features" aria-label="Anchor link for: workspaces-and-features">#</a>
Workspaces and Features</h1>
<p>A bit related to CI are feature flags. Let’s start off by saying that I dislike having
feature flags in the first place. They are just a pain to work with, and a pain
to test, both locally and on CI. What I would like to see is official support
for something like <code>--feature-powerset</code> of <a href="https://github.com/taiki-e/cargo-hack"><code>cargo-hack</code></a>. Just make sure that
every permutation of features works.</p>
<p>I saw some time ago that cargo is working on something called
<a href="https://github.com/rust-lang/cargo/issues/8088">Features 2.0</a>, which I hope
will solve some of the pain points that I have. But so far I haven’t had the
time or motivation to actually read up on the proposed changes.</p>
<hr />
<p>Another thing that I mentioned in my earlier blog post was some quality of
life improvements related to workspaces. I think as a minimum, I would like to
see an atomic
<a href="https://github.com/rust-lang/cargo/issues/1169"><code>cargo publish --all</code></a> and
maybe a tool to deal with versioning.
As an ex-JS developer, I know there were a ton of tools for this, but unfortunately,
I was not really satisfied with any of them. I have the feeling that the Rust
community as a whole creates a lot more high-quality tools, and I am hopeful
that will also be the case here.</p>
<h1 id="in-closing"><a class="anchor-link" href="#in-closing" aria-label="Anchor link for: in-closing">#</a>
In Closing</h1>
<p>I do feel Rust is headed in the right direction, but there is a lot of polishing
still to do. Not necessarily with the language itself, but with the tools and
ecosystem around it.
And it’s not really specific to Rust either. In a lot of ecosystems, I feel like
the tasks that are <em>not</em> coding related are often a lot more complex than they
need to be.
Rust, both as a language and as an ecosystem, has proved that it can solve
complex problems in a nice and high quality way. I’m looking forward to seeing
these problems solved as well.</p>
Documentation Driven Development2020-05-19T00:00:00+00:002020-05-19T00:00:00+00:00
Unknown
https://swatinem.de/blog/doc-driven-development/<p>I have been working on documenting a growing rust project, and ran into some papercuts along the way. Also, writing good
documentation is actually very hard and sometimes tedious work, but I think it is very much worth it.</p>
<h1 id="why-i-love-doctests"><a class="anchor-link" href="#why-i-love-doctests" aria-label="Anchor link for: why-i-love-doctests">#</a>
Why I love Doctests</h1>
<p>I can’t really stress enough how important both testing and documentation are! Testing makes sure that your project meets its
goals and certain quality criteria. And documentation makes sure that your potential users know how to effectively use
your project. Without documentation (and marketing), a project will struggle hard to grow its user base.</p>
<p>However, maintaining good documentation is really hard. And depending on the language ecosystem, it is too easy for your
code and documentation to drift apart and become broken / outdated. That is one reason why I really love the concept of
doctests. On a high level, they are just example code snippets that you want to present to your users. But they also run
as part of your testsuite, just like any other tests. So they can’t drift apart. It is a really good way to give usage
examples of your API, and document corner cases all at the same time.</p>
<h1 id="rusts-unique-position"><a class="anchor-link" href="#rusts-unique-position" aria-label="Anchor link for: rusts-unique-position">#</a>
Rust’s Unique Position</h1>
<p>Rust is actually in a very good position here. I originally come from the TypeScript ecosystem, and it is a
huge pain that it suffers a lot from the <em>“new day, new framework”</em> syndrome. There is way too much choice among
linters, build systems / bundlers, testing frameworks, benchmark frameworks, and probably documentation tools, although
I haven’t really used any of those. It is quite tiring to wade through all the options. At least that ecosystem now has
a code formatter that everybody agrees on.</p>
<p>It is really refreshing that Rust is super opinionated in this regard. It ships with a default build system (which also
has workspace support), testing tool, a linter, a benchmarking tool (with support for custom frameworks) and a first
class documentation tool. You get all of this out of the box (or rather, via <code>rustup</code>) and don’t have to worry about any
of this. What a way to boost productivity!</p>
<p>Also, Rust’s tools are <em>really good</em>, even though they have some shortcomings as we will see.</p>
<h1 id="my-wishlist-for-better-tools"><a class="anchor-link" href="#my-wishlist-for-better-tools" aria-label="Anchor link for: my-wishlist-for-better-tools">#</a>
My Wishlist for better Tools</h1>
<p>Unfortunately, testing and writing doctests is a bit painful right now. But as I see it, those obstacles should be
fairly straightforward to overcome. Here is just a short wishlist of things I would love to see in the Rust ecosystem
that would make my life a bit easier ;-)</p>
<ul>
<li>
<p><a href="https://github.com/rust-lang/cargo/issues/2832">cargo#2832</a> and <a href="https://github.com/rust-lang/cargo/issues/4324">cargo#4324</a>: In general, the output of <code>cargo test</code> is horrible. Especially if you have workspaces,
or a number of examples and integration tests. It is super hard to actually find failures, and the signal to noise
ratio is horrible. I would love if cargo had a better way to put tests into suites, and to better visualize those.</p>
<p>Related, it would be nice to better filter tests, and to re-run only previously failed tests.</p>
</li>
<li>
<p><a href="https://github.com/rust-lang/cargo/issues/6424">cargo#6424</a>: Apparently, <code>cargo check</code> does not check doctests? Oh well. I haven’t run into this so far, but it’s a
minor oversight.</p>
</li>
<li>
<p><a href="https://github.com/rust-analyzer/rust-analyzer/issues/4170">rust-analyzer#4170</a> and <a href="https://github.com/rust-analyzer/rust-analyzer/issues/571">rust-analyzer#571</a>: My number one productivity win would probably come from being able to
write doctests just like any other code, with syntax highlighting and auto-completion and realtime linting.</p>
</li>
<li>
<p><a href="https://github.com/rust-lang/rustfmt/issues/3348">rustfmt#3348</a>: Similarly, code in doctests should be nicely formatted just like any other code.</p>
</li>
<li>
<p><a href="https://github.com/rust-lang/rust/issues/44732">rust#44732</a>: Especially for crate-level or module-level documentation, I would very much prefer to write that in an
external markdown file. This would also solve all the problems with duplicated, nearly empty, or outdated README
files.</p>
</li>
<li>
<p><a href="https://github.com/rust-lang/rustfmt/issues/2036">rustfmt#2036</a>: A bit related to the point above, I would love to also have auto-formatting and re-wrapping for
markdown. I spend way too much time manually formatting my doc comments.</p>
</li>
<li>
<p><a href="https://github.com/rust-lang/rust/issues/43466">rust#43466</a>: Another small quality of life issue which makes linking to other items easier and avoids the problems of
links going stale when moving or renaming items.</p>
</li>
<li>
<p><a href="https://github.com/rust-lang/rust/issues/45599">rust#45599</a> and <a href="https://github.com/rust-lang/rust/issues/67295">rust#67295</a>: Its a bit broken right now how to use feature-flagged things in doctests. But I found a
workaround I explain in the next section.</p>
</li>
<li>
<p>One pipedream would be to also have a kind of linter for doc comments. Things like length limits for captions,
enforcing formatting rules such as “end captions with a period”. A spell checker would also be nice ;-)</p>
</li>
</ul>
<p>Some of the mentioned issues are already implemented on nightly versions, and are just pending stabilization. Others are
quite big and complex, or seem to be stuck in bikeshedding discussions. But one thing that actually sounds quite doable is
to have <code>rustfmt</code> format and re-wrap markdown. I might even look into this if I have too much free time on my hands, or
as something of a hackweek project.</p>
<h1 id="forcing-features-in-tests"><a class="anchor-link" href="#forcing-features-in-tests" aria-label="Anchor link for: forcing-features-in-tests">#</a>
Forcing Features in Tests</h1>
<p>So the crate I am working on right now has quite some feature flags, and I would like to refer to those feature-flagged
items in my doctests. But that breaks the tests when run with <code>--no-default-features</code>, or in general if the features I
want to use are not default.</p>
<p>Also, you can’t <code>#[cfg(test)]</code> those items, since doctests won’t pick those up due to <a href="https://github.com/rust-lang/rust/issues/45599">rust#45599</a> and <a href="https://github.com/rust-lang/rust/issues/67295">rust#67295</a>. But
I have found a rather nice workaround for that. While Rust disallows normal dependency cycles, defining circular
<code>dev-dependencies</code> actually works fine, which is quite magical. So you can just dev-depend on yourself, with certain
features. The same also works across circular workspace crates.</p>
<pre data-lang="toml" style="background-color:#fafafa;color:#61676c;" class="language-toml "><code class="language-toml" data-lang="toml"><span>[</span><span style="color:#399ee6;">package</span><span>]
</span><span style="color:#399ee6;">name </span><span>= </span><span style="color:#86b300;">"doctests"
</span><span style="color:#399ee6;">version </span><span>= </span><span style="color:#86b300;">"0.1.0"
</span><span style="color:#399ee6;">edition </span><span>= </span><span style="color:#86b300;">"2018"
</span><span>
</span><span>[</span><span style="color:#399ee6;">features</span><span>]
</span><span style="color:#399ee6;">featured </span><span>= []
</span><span>
</span><span>[</span><span style="color:#399ee6;">dev-dependencies</span><span>]
</span><span style="color:#399ee6;">doctests </span><span>= { </span><span style="color:#399ee6;">path </span><span>= </span><span style="color:#86b300;">"."</span><span style="color:#61676ccc;">, </span><span style="color:#399ee6;">features </span><span>= [</span><span style="color:#86b300;">"featured"</span><span>] }
</span></code></pre>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span style="font-style:italic;color:#abb0b6;">/// ```
</span><span style="font-style:italic;color:#abb0b6;">/// doctests::featured();
</span><span style="font-style:italic;color:#abb0b6;">/// ```
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">doctest</span><span>() {}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">cfg</span><span>(feature </span><span style="color:#ed9366;">= </span><span style="color:#86b300;">"featured"</span><span>)]
</span><span style="color:#fa6e32;">pub fn </span><span style="color:#f29718;">featured</span><span>() {}
</span><span>
</span><span style="color:#61676ccc;">#</span><span>[</span><span style="color:#f29718;">test</span><span>]
</span><span style="color:#fa6e32;">fn </span><span style="color:#f29718;">unittest</span><span>() {
</span><span> </span><span style="color:#fa6e32;">crate</span><span style="color:#ed9366;">::</span><span>featured()</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
PSA: Deactivate Windows Security for your Source Repository2020-04-17T00:00:00+00:002020-04-17T00:00:00+00:00
Unknown
https://swatinem.de/blog/windows-security/<p>I’m very happy that I have been moving to a somewhat large Rust project
recently ;-) And because of the COVID mandated home-office, I have started
doing a lot more development on my Desktop, which runs Windows for gaming
reasons.</p>
<p>Anyhow. So far I had no reason to complain about compile times, since my
(still, but not for long) oil cooled Ryzen has a lot of cores and power. But I
was noticing that quite a lot of annoying Windows Security notifications have
been popping up when doing compiles, or in general when <code>rust-analyzer</code> was
running in the background. I also noticed that for some reason, the CPU was
also not being utilized fully when doing builds.</p>
<p>So I began researching, and profiling. Running nightly cargo with <code>-Z timings</code>
revealed that especially custom build scripts were taking a very long time.
My research yielded
<a href="https://github.com/rust-lang/cargo/issues/5028">this cargo issue</a> which was
talking about windows defender. There is also an official windows
<a href="https://support.microsoft.com/en-us/help/4028485/windows-10-add-an-exclusion-to-windows-security">guide</a>
on how to disable windows security for a whole directory tree.</p>
<p>I just went ahead and tried doing that.
Disabling Windows Security for my whole coding directory improved my clean
<code>cargo test</code> times of <code>sentry-rust</code> from <strong>3m17s</strong> to <strong>1m22s</strong>. What a jaw-dropping
difference. My compile times are cut by more than half <strong>!!!</strong>,
and the annoying notifications are gone as well.</p>
<p>So for anyone on Windows, go ahead and add a security exception, and save
yourself some headaches!</p>
<p>A word of caution though, as I advocate for <em>disabling</em> a security feature.
I think as developers, we are all a bit more cautious than regular people when
it comes to internet security. On the other hand, we download and run tons of
third party software from package archives all the time. And software supply
chain attacks are a real thing, should I remind everyone of the
<a href="https://medium.com/intrinsic/compromised-npm-package-event-stream-d47d08605502"><code>event-stream</code> fiasco</a>?
Nonetheless, I don’t think Windows Security would actually help in such a case,
so I doubt we are worse off disabling it.</p>
Fear, the class keyword, you must not!2020-02-28T00:00:00+00:002020-02-28T00:00:00+00:00
Unknown
https://swatinem.de/blog/moar-ts-optimization/<p>So, my TypeScript patch that I wrote about previously was recently merged.
This gave me quite some motivation to start up the profiler again, and look
for some more wins.</p>
<p>Being able to memory profile JS can be considered a dark art sometimes, as even
the TS team recognized in their
<a href="https://github.com/microsoft/TypeScript/issues/36948">current half-year roadmap</a>:</p>
<blockquote>
<p>much of what we discovered was also that profiling memory allocations in
JavaScript applications is really hard. Infrastructure seems lacking in this
area (e.g. figuring out what is being allocated too much), and much of our
work may reside in learning about more tools</p>
</blockquote>
<p>So I want to take this chance to highlight how I discovered another easy
<a href="https://github.com/microsoft/TypeScript/pull/36845">~2% win</a> just by looking
at the memory profile and squinting really hard.</p>
<h1 id="assumptions"><a class="anchor-link" href="#assumptions" aria-label="Anchor link for: assumptions">#</a>
Assumptions</h1>
<p>Most memory optimizations I did were the result of an informed guess, based on
some assumptions. Followed by experiments to see if my guess was right or not.</p>
<p>That being said, these things are not an exact science, and they might work
very differently on all the different js engines.
But I will be focusing on node/v8 here, since I would guess that is the >95%
case of how TypeScript is used.</p>
<p>The main thing that I focus on here is a few things that I learned by watching
recorded talks about v8 performance:</p>
<ul>
<li>v8 can <em>inline</em> properties of objects</li>
<li>inlined properties are <em>good</em> for performance, and also memory usage</li>
<li>v8 learns what to inline by observing your code</li>
<li>adding the same properties, with the same types in the same order, as early
as possible is <em>good</em></li>
<li>slapping on random properties in random order is <em>bad</em></li>
<li>v8 will try to group objects together based on their <em>constructor function</em></li>
</ul>
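<p>To make those rules concrete, here is a small, hypothetical sketch (the function and property names are mine, not from any real codebase) contrasting the two allocation patterns. v8 assigns objects created with the same properties, in the same order, a shared hidden class, while objects that get properties slapped on conditionally end up with different shapes:</p>

```typescript
// Good: every object is created with the same properties in the same
// order, so v8 can give them all one shared hidden class and inline
// the property slots.
function makeGood(flags: number) {
  return { flags, payload: null as string | null };
}

// Bad: properties are added after creation, in an order that depends on
// runtime data, so the resulting objects have differing shapes.
function makeBad(flags: number, withPayload: boolean) {
  const obj: { flags?: number; payload?: string } = {};
  if (withPayload) obj.payload = "data";
  obj.flags = flags;
  return obj;
}

const good = [makeGood(0), makeGood(1)]; // one shared shape
const bad = [makeBad(0, false), makeBad(1, true)]; // two different shapes
```

<p>The observable values are identical either way; the difference only shows up in the engine’s internal object layout, which is exactly why a memory profiler is needed to spot it.</p>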
<h1 id="analysing-a-memory-profile"><a class="anchor-link" href="#analysing-a-memory-profile" aria-label="Anchor link for: analysing-a-memory-profile">#</a>
Analysing a Memory Profile</h1>
<p>So I just opened up a memory profile and started looking for anything that
popped into my eyes. Here is what I saw:</p>
<p><img src="https://swatinem.de/blog/moar-ts-optimization/before.png" alt="Memory Profile Before" /></p>
<ol>
<li>There is a really large Array, which has a <em>retained size</em> of <em>24M</em>, which
means garbage collecting it would transitively free up 8% of the memory usage (?)</li>
<li>That Array is assigned to a variable called <code>nodeLinks</code> somewhere</li>
<li>Its elements are all anonymous <code>Object</code>s, so they don’t have a dedicated
constructor.</li>
<li>Each one of those is <em>32 bytes</em>, which is <code>4 * 8</code>, so apart from the v8
internal special properties <code>map</code>, <code>properties</code> and <code>elements</code>, it has just
one inlined property.</li>
<li>Expanding <code>map.descriptors</code> shows us the property descriptors. I didn’t know
that one before. We can also see that different objects have different
properties on them.</li>
</ol>
<p>Now comes a wild speculation on my part, which turned out to be correct in
the end: What if, instead of having anonymous objects, we actually create a
dedicated constructor function? Will that help v8 to make better inlining
decisions?</p>
<h1 id="optimizing-the-code"><a class="anchor-link" href="#optimizing-the-code" aria-label="Anchor link for: optimizing-the-code">#</a>
Optimizing the Code</h1>
<p>So I started looking at the TS codebase, searching for this <code>nodeLinks</code> variable.</p>
<p>The variable comes from <code>checker.ts</code>, which is too large for GitHub to
display so I can’t permalink it, but here it is:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="font-style:italic;color:#abb0b6;">// definition:
</span><span style="color:#fa6e32;">const </span><span>nodeLinks</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">NodeLinks</span><span>[] </span><span style="color:#ed9366;">= </span><span>[]</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// usage:
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">getNodeLinks</span><span>(</span><span style="color:#ff8f40;">node</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Node</span><span>)</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">NodeLinks </span><span>{
</span><span> </span><span style="color:#fa6e32;">const </span><span>nodeId </span><span style="color:#ed9366;">= </span><span style="color:#f29718;">getNodeId</span><span>(node)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">return </span><span>nodeLinks[nodeId] </span><span style="color:#ed9366;">|| </span><span>(nodeLinks[nodeId] </span><span style="color:#ed9366;">= </span><span>{ flags</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">0 </span><span>} </span><span style="color:#fa6e32;">as </span><span style="color:#399ee6;">NodeLinks</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>You might also want to look at the definition of
<a href="https://github.com/microsoft/TypeScript/blob/f883bf3acbb207dfa1dc134738abcf565f14a835/src/compiler/types.ts#L4250-L4277"><code>NodeLinks</code></a>.</p>
<p>The <code>NodeLinks</code> interface defines <strong>26</strong> properties, of which only <code>flags</code> and
<code>jsxFlags</code> are mandatory, and btw <code>jsxFlags</code> is not defined in the
<code>getNodeLinks</code> function, so there is null-unsafety right there!</p>
<p>Anyway, let’s create a quick constructor function for all the NodeLinks, call
it with <code>new</code> and see if it helped v8 better optimize the object allocations.</p>
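<p>As a sketch of that idea (heavily trimmed: the real <code>NodeLinks</code> has 26 properties, I show only a few illustrative ones, and I pass the node id directly instead of calling <code>getNodeId</code>), the change could look roughly like this:</p>

```typescript
// Hypothetical, trimmed-down stand-in for NodeLinks: only flags and
// jsxFlags come from the real interface, the rest is illustrative.
class NodeLinks {
  flags = 0;
  jsxFlags = 0; // initialized up front, fixing the null-unsafety
  resolvedType: unknown = undefined;
}

const nodeLinks: NodeLinks[] = [];

// Same lookup as before, but allocating via `new NodeLinks()` gives
// every entry the same hidden class, so v8 can group the objects and
// inline their property slots.
function getNodeLinks(nodeId: number): NodeLinks {
  return nodeLinks[nodeId] || (nodeLinks[nodeId] = new NodeLinks());
}
```

<p>Note that initializing all fields in the constructor is what fixes both the shape problem and the missing <code>jsxFlags</code> initialization mentioned above.</p>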
<h1 id="verifying-the-optimization"><a class="anchor-link" href="#verifying-the-optimization" aria-label="Anchor link for: verifying-the-optimization">#</a>
Verifying the Optimization</h1>
<p>Since optimizing these cases is actually a lot of guesswork and shots in the
dark, it is very important to also check what the results of the code changes
are. So let’s take a look:</p>
<p><img src="https://swatinem.de/blog/moar-ts-optimization/after.png" alt="Memory Profile After" /></p>
<ol>
<li>We now have a new group for all the <code>NodeLinks</code>, and we have <em>~277k</em>
of those, quite a lot actually.</li>
<li>Each one of them is <em>48 bytes</em>, which is <code>6 * 8</code>, so we can have <em>3</em> inlined
properties. On a side note, the whole <code>nodeLinks</code> Array now has a retained
size of <em>17M</em>, vs the <em>24M</em> from before.</li>
<li>Some smaller variants use only inline properties. BTW, the <code>flags</code> prop is
not listed here, I <em>think</em> because it is an integer type, and not a pointer
type.</li>
<li>Some larger variants still have external <code>properties</code> which are bad for both
performance and memory usage.</li>
</ol>
<p>So from this quick look we can see that we were successful. The memory usage
has dropped by a bit, and we now have all the <code>NodeLinks</code> grouped together,
and at least half of them have all of their properties inlined, without the
need for an external <code>properties</code> hashtable.</p>
<h1 id="the-typescript-codebase"><a class="anchor-link" href="#the-typescript-codebase" aria-label="Anchor link for: the-typescript-codebase">#</a>
The TypeScript Codebase</h1>
<p>However, just by looking at the definition of <code>NodeLinks</code>, and at the memory
profile, we can clearly see that there is probably no object that has <em>all</em> the
26 properties it could potentially have, but that they could most likely be
grouped into separate types.
I do see this general code pattern a lot in the TS codebase, the <code>Node</code>
interface that I was previously looking at was another case.
The TS codebase defines a <em>very broad</em> interface type, with tons of optional
properties. Then it uses explicit and apparently very unsafe casts all around
the codebase, and adds properties some time later, in a probably arbitrary
order. Exactly the things you should not do, both from a v8 optimization
standpoint, and a general type-safety standpoint.</p>
<p>Well there is probably a reason for it. The TypeScript codebase is ancient.
I think it has been self-hosting since the very beginning, but it only had
proper union types and <code>strictNullChecks</code> since <em>2.0</em>, and
<code>strictPropertyInitialization</code> since <em>2.7</em>. Also, engines did not have native
<code>class</code> support back then.</p>
<p>So from my perspective, there is quite some technical debt to be paid! I would
very much suggest embracing union types of native <code>class</code>es with initialized
properties in the codebase.
Or as Yoda would say:</p>
<blockquote>
<p>Fear, the class keyword, you must not!</p>
</blockquote>
<p>In my opinion, it would help engines to better optimize the code, it would
help type safety, and it would also help onboarding new contributors.</p>
<p>Well, so long; I hope this has helped some people better understand the
memory optimization tools, and how to approach such problems.</p>
Rewrite it in Rust2020-02-20T00:00:00+00:002020-02-20T00:00:00+00:00
Unknown
https://swatinem.de/blog/rewrite-in-rust/<p>Early this year, I managed to mostly move away from JS development into native
code, which in my case means a lot of C/C++, as well as Rust; hopefully more
of the latter in the future.</p>
<p>Most of what I will write about comes from my experience with <a href="https://github.com/getsentry/sentry-native">sentry-native</a>,
which will soon release a rewritten version in C. That being said, all of the
opinions in this post are my own.</p>
<p>I also want to start this with a quote from <em>Bruce Lee</em>:</p>
<blockquote>
<p>If I tell you I'm good, probably you will say I'm boasting. But if I tell you I'm not good, you'll know I'm lying.</p>
</blockquote>
<h1 id="imposter-syndrome"><a class="anchor-link" href="#imposter-syndrome" aria-label="Anchor link for: imposter-syndrome">#</a>
Imposter Syndrome</h1>
<p>While it has been quite some time since I have actively dealt with C code, I
can get up to speed with anything you can throw at me pretty quickly.</p>
<p>I do make quite good progress with my work on <code>sentry-native</code>; my code compiles,
runs and passes tests. But for some reason, I don’t really feel confident in it.
I’m not really sure if the things that I do are really correct, or if it is just
luck that it works. And I constantly have the feeling that I must be missing
something, or that things will probably blow up at some point later.</p>
<p>This is just in my mind though, and a classic example of <em>imposter syndrome</em>.
And surprisingly, I don’t have this when writing Rust. Writing Rust code
really <em>empowers</em> me, in the literal sense that I feel <em>powerful and confident</em>
when writing Rust code. I have the feeling that whatever I do is <em>correct</em>.
Quite remarkable actually.</p>
<h1 id="distractions-and-explicitness"><a class="anchor-link" href="#distractions-and-explicitness" aria-label="Anchor link for: distractions-and-explicitness">#</a>
Distractions and Explicitness</h1>
<p>One reason that I don’t feel very productive with C is that there is a lot of
boilerplate and ceremony around almost everything.</p>
<p>Dealing with allocations, strings, iterables and generics is very tedious.
I sometimes have the feeling that I don’t even see the <em>real</em> application logic
because it is so obfuscated and drowns among all the <code>malloc</code>s, <code>NULL</code>-checks,
manual copying and pointer-chasing.</p>
<p>One of the big distractions is checking for <code>NULL</code> all the time. There are two
issues with this. One is that, obviously, these checks come with a runtime
cost.</p>
<p>The other one is about explicitness. Is returning <code>NULL</code> part of the API
contract, like <code>Option</code> in Rust? Does it actually <em>mean something</em>?
Or is it just cargo-culted boilerplate that people copy-paste, because it's
what everyone else is doing?</p>
<p>I actually had to deal with a bug where <code>NULL</code> had special meaning.</p>
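<p>To illustrate the contrast (a minimal sketch; <code>find_user</code> is a hypothetical example, not from any real codebase), Rust makes the "might be absent" case part of the signature itself:</p>

```rust
// The Option in the signature *is* the API contract: absence is
// explicit and meaningful, not cargo-culted boilerplate.
fn find_user(id: u32) -> Option<&'static str> {
    match id {
        1 => Some("alice"),
        _ => None, // clearly "not found", never a surprise NULL
    }
}
```

<p>The compiler then forces every caller to handle the <code>None</code> case, instead of hoping they remember a <code>NULL</code>-check.</p>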
<h1 id="infallible-allocation"><a class="anchor-link" href="#infallible-allocation" aria-label="Anchor link for: infallible-allocation">#</a>
Infallible allocation</h1>
<p>Most of the checks, however, are just unnecessary boilerplate in my opinion. And
this boilerplate multiplies, by the way.
Say you have <em>3</em> allocations in a function. If you assume that they can fail,
you have to make sure to <code>free</code> the ones <em>before</em> the failing one, right?</p>
<p>And what do you want to do anyway? Just return <code>NULL</code> from your function?
What if the function has a different return type? Will it silently fail?
Can you actually recover from a possible allocation failure? Your program needs
memory to do its job. If it doesn't get any, it can't do its job, and it might be
best to just crash hard.</p>
<p>The other question is, will you ever get a <code>NULL</code> from <code>malloc</code> anyway? I have
read some quite good blog posts about this topic in the past, but don’t have
any links handy.</p>
<p>In any case: nowadays, most software you run will be 64-bit, which means that
virtual address space is practically unlimited. And most systems, even
smartphones, have a lot of physical memory. A lot more than a typical program
should allocate. If it allocates more, it very likely has some leaks anyway.</p>
<p>And it is not only about your own program; it is also about the behavior of the OS.
Some time ago, there was a <a href="https://lkml.org/lkml/2019/8/4/15">post</a> about Linux
behaving horribly under low memory conditions, which I have also experienced
myself.
Your system will <strong>stall hard</strong>, up to the point of requiring you to
power-cycle, long before your program will get a <code>NULL</code> from <code>malloc</code>.</p>
<p>Enough has really been said about this already, but C developers still cargo-cult
these <code>NULL</code>-checks everywhere.
Rust allocations are infallible: if an allocation <em>does</em> fail, the default
behavior is to abort the process. Either way, from a developer point of
view, the code looks a lot cleaner! You can actually start to see the business
logic underneath all the boilerplate.</p>
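<p>A small sketch of what this buys you (the <code>build</code> function is made up, just for illustration): three allocations, and not a single check or cleanup path in sight:</p>

```rust
// Three allocations, zero NULL-checks and zero manual cleanup:
// if an allocation fails, the process aborts, and on every normal
// path, ownership takes care of freeing things.
fn build() -> (Vec<u8>, String, Box<[u32; 4]>) {
    let buffer = vec![0u8; 1024];
    let name = String::from("hello");
    let ids = Box::new([0u32; 4]);
    (buffer, name, ids) // ownership moves out to the caller
}
```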
<p>Oh, and the use of <code>Option</code> makes intentions very clear, which brings me to the
next point.</p>
<h1 id="documentation"><a class="anchor-link" href="#documentation" aria-label="Anchor link for: documentation">#</a>
Documentation</h1>
<p>This has been praised a lot, and for good reason. The Rust documentation is
excellent! The format is awesome, and most of the docs have examples, which
thanks to doctests will also never be out of sync.
When looking for C docs, you instead find a ton of different websites, and most
of them are just horrible.</p>
<p>Rustdoc itself is awesome, but the whole
<a href="https://www.rust-lang.org/learn">spectrum of rust documentation</a> is a delight!</p>
<h1 id="ownership-and-mutability"><a class="anchor-link" href="#ownership-and-mutability" aria-label="Anchor link for: ownership-and-mutability">#</a>
Ownership and Mutability</h1>
<p>Speaking of Documentation and Memory-management.</p>
<p>The ownership model of Rust actually makes so much sense! Working with C code,
I often don't know who is responsible for freeing some memory. And I would
guess that there is a lot of unnecessary copying going on because of that. Not
to mention memory leaks. Sure, you can also leak memory in Rust, but it's
a lot harder!</p>
<p>One kind-of way to guess this in C is the <code>const</code> keyword. If a function returns
something <code>const</code>, it usually means that ownership is not transferred. But
ownership and mutability are two completely different things:
maybe I return something that is <em>mutable</em>, but must not be freed!</p>
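<p>In Rust, both properties are spelled out in the function signature itself (a quick sketch with made-up helper names):</p>

```rust
// Borrowed: the caller keeps ownership, nothing to free here.
fn peek(s: &String) -> usize {
    s.len()
}

// Mutably borrowed: may be modified, but still not owned.
fn grow(s: &mut String) {
    s.push('!');
}

// Owned: the value moves in, and is freed when it goes out of scope.
fn consume(s: String) -> usize {
    s.len()
}
```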
<h1 id="strings"><a class="anchor-link" href="#strings" aria-label="Anchor link for: strings">#</a>
Strings</h1>
<p>Another thing that deserves a lot of praise is the Rust <code>&str</code> type, which
really is just a <code>&[u8]</code> slice that is guaranteed to be valid UTF-8: a
really awesome guarantee to have! For interfacing with the OS, there is
<code>OsStr</code>, with appropriate conversion functions. I had to touch a bit of
OS-specific string code in C recently, and it was horrible.</p>
<p>But the real power actually lies in the way that strings in Rust are
represented as slices, as a pair of <code>(pointer, length)</code>, whereas strings in
C need to be <code>\0</code> terminated. This makes Rust strings <em>a lot</em> more efficient.</p>
<p>In Rust, you can trivially get a sub-slice of a string, whereas in C you
have to copy the sub-slice and <code>\0</code>-terminate it.
To actually make that copy, you also need the length of the string, which is
an <code>O(N)</code> operation in C, but <code>O(1)</code> in Rust.</p>
<p>Apart from this, the <code>&str</code> API of Rust is <em>very</em> rich! I miss <code>.lines()</code> and
<code>.ends_with()</code> so much!</p>
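<p>A tiny sketch of what that <code>(pointer, length)</code> representation buys you (the <code>prefix</code> helper is made up for illustration):</p>

```rust
// Sub-slicing a &str just narrows the (pointer, length) pair:
// no copy, no allocation, no `\0`-termination needed.
fn prefix(s: &str, n: usize) -> &str {
    &s[..n] // O(1); panics if n is not on a char boundary
}
```

<p>And since the length is stored next to the pointer, <code>s.len()</code> is <code>O(1)</code>, unlike C's <code>strlen</code>.</p>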
<p>On the other hand, I have also found that Rust strings are not as easy
to deal with as in, for example, JS. But now I think that the way I
index into and slice my JS strings is actually unsafe, considering unicode
outside the ASCII range.</p>
<h1 id="api-and-abi"><a class="anchor-link" href="#api-and-abi" aria-label="Anchor link for: api-and-abi">#</a>
API and ABI</h1>
<p>Now that I have touched a bit on both memory allocation and the copying C forces
on you, one way Rust avoids all this is by dealing better with
<em>value types</em> and <em>reference types</em>. In Rust, you can more easily return
structs from functions and move them into functions via arguments. Those
live on the stack and don't require allocation, which makes this more efficient
than in C. Most of the time, though, you will deal with references, as in C. And
from a coding perspective there is no difference, whereas in C you have
to learn the difference between <code>-></code> and <code>.</code>, which makes refactoring more
annoying in some places.
One of the reasons C has to allocate and return pointers in a lot of places is
that there is no other way to make a struct <em>opaque</em>, hiding its members, and
also making it extensible.
In C, you can either expose your structs, making them public API and requiring
breaking changes when touching them, or you use opaque pointers, which require
allocation.</p>
<p>Rust decouples API and ABI, and really Rust has no stable ABI at all. This means
that you can hide details of a struct, change its size without requiring major
version bumps, and still have the advantages of stack allocation.</p>
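<p>A small sketch of that decoupling (the <code>Config</code> type is hypothetical):</p>

```rust
// The fields are private, so they can be renamed, reordered or
// extended without breaking callers, yet values still live on the
// caller's stack. No opaque pointer, no heap allocation.
pub struct Config {
    verbose: bool,
    retries: u8,
}

impl Config {
    pub fn new() -> Self {
        Config { verbose: false, retries: 3 }
    }

    pub fn retries(&self) -> u8 {
        self.retries
    }

    pub fn verbose(&self) -> bool {
        self.verbose
    }
}
```

<p>Adding a field later changes the struct's size, but since callers only go through the public API, nothing breaks.</p>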
<p>Speaking of stack allocation: I have actually run into uninitialized-memory issues
with structs on the stack a couple of times already. Very annoying, and for
some reason the compiler didn't warn me about them.</p>
<h1 id="generics-and-traits"><a class="anchor-link" href="#generics-and-traits" aria-label="Anchor link for: generics-and-traits">#</a>
Generics and Traits</h1>
<p>Another thing that came to my mind is that Rust's traits, iterators and generics
make it super easy to deal with streaming data, which can further improve
performance and avoid a ton of intermediate allocations.</p>
<p>I am actually considering re-implementing something like <code>Write</code> in C, which
would abstract away serializing data either into an in-memory buffer, onto disk,
or right onto the network, without having to allocate a lot of intermediate
buffers. But I already know that the C version can never be as fast as Rust, as
it would likely involve dynamic function calls, whereas Rust can just
specialize and inline everything.</p>
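<p>This is roughly what I mean (a sketch; <code>serialize</code> is a made-up example): one generic function serializes to an in-memory buffer, a file, or a network socket, and monomorphization specializes it for each:</p>

```rust
use std::io::Write;

// Works for Vec<u8>, File, TcpStream, or anything else that
// implements Write; each instantiation is specialized and inlined.
fn serialize<W: Write>(mut out: W, items: &[u32]) -> std::io::Result<()> {
    for item in items {
        writeln!(out, "{}", item)?;
    }
    Ok(())
}
```

<p>In C, the equivalent would be a struct of function pointers, paying a dynamic call per write.</p>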
<h1 id="dependency-management"><a class="anchor-link" href="#dependency-management" aria-label="Anchor link for: dependency-management">#</a>
Dependency Management</h1>
<p>A bit related to ABI is also the question of <em>static</em> vs. <em>dynamic</em> linking.
Rust does not really do dynamic linking (or does it?).</p>
<p>There are some technical differences between static and dynamic linking. Dynamic
linking can better namespace things, and also share both memory, and disk space
among programs. But seriously, in a world where our phones run Java, our
Desktops run JavaScript, and the Cloud does heavy sandboxing and
containerization, we are way past caring about memory usage.</p>
<p>Static linking has some performance advantages, with link-time optimization and
dead-code elimination. And Rust has a good story on symbol mangling, avoiding
some of the pitfalls of static linking. And since it has no stable ABI,
it pretty much can't do dynamic linking anyway.</p>
<p>Anyhow, I recently asked colleagues about this, before I realized that I wanted
their opinion on something completely different. I was actually referring to
vendoring dependencies vs. relying on OS-provided libraries.</p>
<p>One of the only times I had problems compiling an older (unmaintained) Rust
app was because of <code>openssl-sys</code>, which was trying to compile and link against
my OS-provided version. That version got out of sync, prevented the already compiled
version of my app from starting, and made it impossible for me to
re-compile.</p>
<p>This is not a new problem either. There is a lot of talk about vendoring
dependencies. That way you are independent of the libraries, and the versions
thereof, that your OS provides.
As always, there are tradeoffs. It might be a good idea that the distribution
can update system libraries to patch vulnerabilities, in case you don't update
your own vendored version. On the other hand, this limits the version of a
library you can use, and also requires your users to have that certain
library installed in the first place.</p>
<p>Having to deal with such things in C again is a real throwback, and I would
love to just be able to consume whatever version of a dependency that I want,
and have it statically link and just work, no matter where I copy my resulting
binary. This is true portability and <em>“run everywhere”</em>.</p>
<h1 id="building"><a class="anchor-link" href="#building" aria-label="Anchor link for: building">#</a>
Building</h1>
<p>Speaking of portability and dependencies. Rust has a really awesome story around
cross compilation. And the way it does <code>feature</code>-flags and platform specific
conditional code is awesome! This is just so much better than having tons of
inconsistent, platform and compiler specific define flags.</p>
<p>Oh, and it has a standard module system! And <code>cargo</code>!!!</p>
<p>Having dealt with <code>CMake</code> for the past week, I really can't understand how it
has ever gained such popularity. The configuration syntax is horrific!
It is case-insensitive; functions have space-separated, optional and variadic
arguments. Strings don't need to be quoted unless you want to use certain
special chars (which ones?). And there is no clear distinction between plain
strings and lists, at least not that I can tell. It has a global namespace of
artifacts, with frequent name clashes, and it is absolutely not obvious to me
how variables are scoped when you are dealing with multiple files. At least
I have figured out that it is a good idea to set target-specific flags, which
is not really obvious in the first place. Oh, and have I mentioned that the
documentation is also horrible?</p>
<p>How to best consume and integrate with external (vendored) dependencies is also
absolutely not obvious.</p>
<p>Since I had to look at build systems again, I want to quote from the
<a href="https://mesonbuild.com/">meson docs</a>:</p>
<blockquote>
<p>every moment a developer spends writing or debugging build definitions is a
second wasted.</p>
</blockquote>
<p>I am so happy that Rust has <code>cargo</code> and <code>crates</code>. It is so refreshing to work
with! Things just work as they should, and as you would expect them to.</p>
<h1 id="the-paradox-of-choice"><a class="anchor-link" href="#the-paradox-of-choice" aria-label="Anchor link for: the-paradox-of-choice">#</a>
The paradox of choice</h1>
<p>Building C code is very much non-trivial, which explains the plethora of tools
that exist out there. Not to mention that almost every project I know of has
its own way of building, its own way of dealing with feature flags, etc.</p>
<p>While choice and competition are certainly good things to have and to allow,
too much of them can lead to fragmentation, and is quite frankly overwhelming.</p>
<p>Rust on the other hand has one <em>clear and obvious</em> way of doing things. But it
still offers the possibility to extend this if necessary.</p>
<p>Rust has one way of building things. It has one way of configuring your builds.
It has one way of documenting things. It has one way of doing testing. Of
doing benchmarks. Etc, etc.</p>
<p>And these are very <em>good</em> choices as well. In my opinion, it is not just that Rust is
too young to have fragmented; I have the impression that things simply work.</p>
<p>Less time spent dealing with all that, more time to actually getting stuff done.</p>
<h1 id="onboarding-and-confidence"><a class="anchor-link" href="#onboarding-and-confidence" aria-label="Anchor link for: onboarding-and-confidence">#</a>
Onboarding and Confidence</h1>
<p>Coming full circle to the beginning. One thing that people criticize about Rust
is its learning curve. Well yes, Rust takes some time to learn. But I think
that investment provides a great return. As I said in my #rust2020 post, I do
think learning Rust makes you a better developer. And most of the time, when
there is no obvious easy solution to a problem, Rust kind-of leads the way to
a better and more correct solution. Hard things are still hard.</p>
<p>But once you have learned Rust, it is so much easier to get started and
onboarded onto a bigger project, and to feel productive very quickly. This is
important!</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>In my short time being a C developer again, I have already seen logic
errors, threading problems, memory-unsafety problems, and just plain
inefficient code, all of which could have been avoided by using Rust. And some of
that code has been written by engineers far better than me. So much for the
argument that smart engineers don't make mistakes.</p>
<p>And yes, I would love to rewrite everything in Rust, <em>just because</em>!
I am also very much in favor of a completely
<a href="https://github.com/rust-lang/rfcs/issues/2610"><code>libc</code>-free Rust</a>!
We would have completely self-contained binaries which do their own syscalls,
with their only dependency being a specific kernel version. I have too little
knowledge about what this would look like on platforms other than Linux, to be honest.
This could be a true <em>cross-compile once, run everywhere</em> language.</p>
<p>Especially this cross-compiling, and the good things that I have heard about
<code>cbindgen</code> make me wish that I could just ship pre-built static and dynamic
libraries for all the platforms for users who don’t want to deal with compiling
rust themselves, instead of having to deal with building C on all kinds of
systems and compilers.</p>
<p>There are just so many good things to say about Rust! I didn't even mention
things like enums, pattern-matching, and the fact that it has integer types that
make sense (what is an <code>unsigned long long int</code> anyway?)!</p>
Improving your JS Tooling2020-02-06T00:00:00+00:002020-02-06T00:00:00+00:00
Unknown
https://swatinem.de/blog/js-tooling/<p>So, I have recently given a talk at the local ViennaJS meetup, which was
apparently a huge success. I was basically talking about and live-demoing my
improvements to the <code>tsc</code> compiler, which I wrote about
<a href="../optimizing-tsc">on my blog already</a>.
(Side note: the <a href="https://github.com/microsoft/TypeScript/pull/33431">PR</a> is still
not merged <em>sigh</em>)</p>
<p>Anyway. Following my talk, <a href="https://www.michaelbromley.co.uk/">Michael Bromley</a>
of <a href="https://www.vendure.io/">vendure.io</a> actually went ahead and tried some of
my suggestions on his codebase, achieving
<a href="https://twitter.com/michlbrmly/status/1222920727172669446">massive wins</a>!
BTW, this is the best feedback anyone can give me! To see that I can have this
positive effect on people and their projects makes me super happy :-)</p>
<p>I have then gone ahead and looked in more detail into their testing setup and
<a href="https://github.com/vendure-ecommerce/vendure/commit/3ebf6de1498b7ce887bd65def9a8dd18df44fc55#r37050063">suggested further improvements</a>,
which have also propagated to
<a href="https://github.com/getsentry/sentry/pull/16837">other projects</a>.
(As a side note, both Priscila and I have recently started at Sentry ;-)
<hr />
<p>So in writing this post, I want to give more details and explanations on why
such simple things can have such a profound impact, and also give further
suggestions on how to speed things up.
Because, to be honest, I am quite surprised that people just put up with their
workflows being so slow.</p>
<h1 id="explaining-skiplibcheck"><a class="anchor-link" href="#explaining-skiplibcheck" aria-label="Anchor link for: explaining-skiplibcheck">#</a>
Explaining <code>skipLibCheck</code></h1>
<p>The massive improvements mentioned above more or less boil down to using
<code>skipLibCheck</code>. To understand why, we need to understand how typescript treats
the files it loads.
There are basically 4 groups of files,
here is an excerpt from <code>tsc --listFiles</code> when run on my own <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a>:</p>
<ol>
<li><code>./node_modules/typescript/lib/lib.es2020.d.ts</code>:
This and all other files in this directory are the type definitions of the
built-in JS types themselves. You can select the javascript edition you want
to <em>write your project in</em> via the <code>tsconfig/lib</code> setting.</li>
<li><code>./node_modules/rollup/dist/rollup.d.ts</code>:
This is an example of the type definitions of an external library. <code>rollup</code>
ships with its own type definitions, but very often, you will consume
definitions via <code>@types/XXX</code>, for example <code>@types/node</code>, which you will very
likely use.</li>
<li><code>./.build/index.d.ts</code>:
You might also author <code>.d.ts</code> definition files manually, for example if you
want to type external libraries that do not have any type definitions. Or if
you want to augment global types, and for various reasons.</li>
<li><code>./src/index.ts</code>:
Well these are your source files. Nuff Said.</li>
</ol>
<p>Now, typescript comes with two confusingly named settings, <code>skipDefaultLibCheck</code>
and <code>skipLibCheck</code>.</p>
<p>By default, typescript will parse, and typecheck <strong>all</strong> of the files. When
using <code>skipDefaultLibCheck</code>, it will only parse, but not typecheck files from
category 1, the type definitions that are bundled with typescript itself. In
99.99% of the cases, you would want to use this setting in your project, since
you will never ever touch those files yourself, or ever mess with definitions of
js builtins. The typescript team is even
<a href="https://github.com/microsoft/TypeScript/issues/25658">thinking about</a> making
it the default.</p>
<p>The other setting <code>skipLibCheck</code> will skip all <code>.d.ts</code> files, so categories 1, 2
and 3 in the list above. This is fine to use if you don’t write <code>.d.ts</code> files
yourself, which I would say is very likely. Most of the time, you can trust
the authors of your external dependencies to have done a proper job.
Or can you? :-D
Anyway, it is very unlikely that you will either mess with, or conflict with
any of the definitions that come from <code>node_modules</code>. And setting this flag can
considerably speed up your type checking times.</p>
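<p>In practice this is a one-line change to your <code>tsconfig.json</code> (a minimal sketch, showing only the relevant option; tsconfig files allow comments):</p>

```json
{
  "compilerOptions": {
    // skip typechecking of all .d.ts files (categories 1, 2 and 3)
    "skipLibCheck": true
  }
}
```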
<h1 id="optimizing-for-iteration-speed"><a class="anchor-link" href="#optimizing-for-iteration-speed" aria-label="Anchor link for: optimizing-for-iteration-speed">#</a>
Optimizing for Iteration Speed</h1>
<p>To shine some more light on the massive saving that Michael was able to achieve,
we have to understand the tools that are very common in JS-land, and how they
use typescript in the background.</p>
<p>And what everyone has to decide for themselves is what you want to optimize for.
I personally want to optimize for development and iteration speed on the one
hand, and for correctness and efficiency on the other. As always, it is a matter
of tradeoffs.</p>
<p>But let's look at common tools and workflows first. I would argue there are more
or less these tasks that you want to do:</p>
<ul>
<li>run tests, for example via <a href="https://jestjs.io/">jest</a> and <a href="https://kulshekhar.github.io/ts-jest/">ts-jest</a>.</li>
<li>lint your code, for example via <a href="https://eslint.org/">eslint</a> and <a href="https://github.com/typescript-eslint/typescript-eslint">typescript-eslint</a>.</li>
<li>build / bundle / package your app, for example via <a href="https://rollupjs.org/">rollup</a> and <a href="https://github.com/rollup/plugins/tree/master/packages/typescript">rollup-plugin-typescript</a> or <a href="https://webpack.js.org/">webpack</a> and <a href="https://github.com/TypeStrong/ts-loader">ts-loader</a>.</li>
<li>and make sure everything is correct :-)</li>
</ul>
<p>Wow, this is a long list of tools, and unfortunately, the default settings they
ship with are not optimal, as I think all of these tools run typescript in full
typechecking mode by default.</p>
<p>Let's show this visually because it is easier to understand:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span> typescript eslint jest webpack
</span><span>┌────────────┐ ┌──────┬───────┐ ┌──────┬───────┐ ┌──────┬────────┐
</span><span>│typechecking│ │typeck│linting│ │typeck│testing│ │typeck│bundling│
</span><span>└────────────┘ └──────┴───────┘ └──────┴───────┘ └──────┴────────┘
</span></code></pre>
<p>In my beautiful unicode-art, you see that it is very inefficient if each one of
these tools does its own typechecking. Especially if the typechecking is super
slow. I even had to truncate the label to not overflow the diagram :-D</p>
<p>Now, for <code>eslint</code>, doing a typecheck is actually mandatory, since a lot of
rules use the typechecking information to function. But for jest and
webpack, this is both wasted time and a distraction.</p>
<p>In my opinion, iteration speed is the major selling point of js. I want to
quickly try and validate some idea, and then polish it up later.
<code>jest</code> and <code>webpack</code> are tools that help me iterate, while <code>eslint</code> will help
me polish. I therefore have quite some beef with the default behavior of
<code>ts-jest</code> and also with things like <a href="https://github.com/TypeStrong/fork-ts-checker-webpack-plugin">fork-ts-checker</a>. They are just slowing me
down.</p>
<p>And yes, I do use the default <code>ts-jest</code> behavior on two projects that I
maintain, <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a> and <a href="https://github.com/eversport/intl-codegen">intl-codegen</a>, purely out of convenience
and because the projects are <em>small</em>. But even then, <code>ts-jest</code> frequently
annoys me. I want to iterate on my tests quickly, dammit, not perfect my
typings first! And for webpack, all the errors scrolling through the
console are just distracting noise.</p>
<p>With eslint, it will most likely run together with my editor anyway, so I will
get auto-fix on save and live highlights. But I know I can ignore those for now
if all I want to do is iterate quickly.</p>
<hr />
<p>So to summarize this point. My suggestion to other projects, especially big
ones, is to optimize your workflow for quick iteration on the one hand, for
example by disabling the slow, redundant and distracting typechecks in your
test runner and bundler. And rather enforce strict typechecks when linting.
Essentially configure <code>ts-jest</code> and whatever bundler you use to <em>transpile only</em>.</p>
<h1 id="thinking-in-compute-time"><a class="anchor-link" href="#thinking-in-compute-time" aria-label="Anchor link for: thinking-in-compute-time">#</a>
Thinking in Compute Time</h1>
<p>The above diagram showed very clearly the redundant and often very slow
typechecking pass that certain tools do by default.
Another thing that developers should be more aware of is
<a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl's law</a>, which says
that there are limits to speeding things up through parallelization.</p>
<p>Let's think of an example. Say your whole testsuite runs for <em>4 minutes</em>, of which
<em>1 minute</em> is spent typechecking, which is not parallelizable. No matter how
many cores or machines you throw at it, it will never be faster than <em>1 minute</em>.
For example, we can split up our testsuite equally into <em>3 parts</em>, but each of
those runs its own typecheck. We end up with a total runtime of <em>2 minutes</em>, but in
reality we actually used up <em><code>3 * 2 = 6</code> minutes</em> of compute time. Meh!</p>
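<p>The arithmetic above is just Amdahl's law in miniature; as a sketch (the function name is made up):</p>

```rust
// Wall-clock time when the serial part (typechecking) cannot be
// parallelized and the rest is split across n workers.
fn wall_time(serial_min: f64, parallel_min: f64, workers: f64) -> f64 {
    serial_min + parallel_min / workers
}
```

<p>With the numbers from above, <code>wall_time(1.0, 3.0, 3.0)</code> gives <em>2 minutes</em> of wall time, but <code>3 * 2 = 6</code> minutes of compute; even with 1000 workers, you never get below the serial minute.</p>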
<p>Also, parallelism has some overhead itself, especially in JS land!
For the two <em>small</em> projects that I maintain, I use <code>jest</code>'s <code>--runInBand</code> flag
to prevent its default behavior of spawning multiple workers, since all that
parallelism overhead actually <em>slows things down</em>!</p>
<p>Of course, this very much depends on the size of your project and testsuite, but
give it a try!</p>
<h1 id="bundling"><a class="anchor-link" href="#bundling" aria-label="Anchor link for: bundling">#</a>
Bundling</h1>
<p>Now to another topic which I am very opinionated about. Mostly because I am a
big fan, advocate and contributor to <a href="https://rollupjs.org/">rollup</a>. This small digression is prompted
by a project at my new job that I worked on this past week. Please note that
this here is my personal opinion only, yadda yadda yadda.</p>
<p>So, my work assignment made me aware of
<a href="https://facebook.github.io/metro/">metro bundler</a>, which is yet another js
bundler, as if we didn’t have enough of those already.</p>
<p>I was creating a very small testcase, similar to this:</p>
<pre data-lang="js" style="background-color:#fafafa;color:#61676c;" class="language-js "><code class="language-js" data-lang="js"><span style="font-style:italic;color:#abb0b6;">// module.js
</span><span style="color:#fa6e32;">export function </span><span style="color:#f29718;">foo</span><span>() {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">console</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">log</span><span>(</span><span style="color:#86b300;">"foo"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="font-style:italic;color:#abb0b6;">// index.js
</span><span style="color:#fa6e32;">import </span><span>{ foo } </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"./module"</span><span style="color:#61676ccc;">;
</span><span style="color:#f29718;">foo</span><span>()</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>And I was expecting to get fairly simple code as a result. Like rollup gives
you. You can <a href="https://rollupjs.org/repl/?version=1.31.0&shareable=JTdCJTIybW9kdWxlcyUyMiUzQSU1QiU3QiUyMm5hbWUlMjIlM0ElMjJtYWluLmpzJTIyJTJDJTIyY29kZSUyMiUzQSUyMmltcG9ydCUyMCU3QiUyMGZvbyUyMCU3RCUyMGZyb20lMjAlNUMlMjIuJTJGbW9kdWxlLmpzJTVDJTIyJTVDbmZvbygpJTNCJTIyJTJDJTIyaXNFbnRyeSUyMiUzQXRydWUlN0QlMkMlN0IlMjJuYW1lJTIyJTNBJTIybW9kdWxlLmpzJTIyJTJDJTIyY29kZSUyMiUzQSUyMmV4cG9ydCUyMGZ1bmN0aW9uJTIwZm9vKCklMjAlN0IlNUNuJTVDdGNvbnNvbGUubG9nKCU1QyUyMmZvbyU1QyUyMiklM0IlNUNuJTdEJTIyJTJDJTIyaXNFbnRyeSUyMiUzQWZhbHNlJTdEJTVEJTJDJTIyb3B0aW9ucyUyMiUzQSU3QiUyMmZvcm1hdCUyMiUzQSUyMmVzbSUyMiUyQyUyMm5hbWUlMjIlM0ElMjJteUJ1bmRsZSUyMiUyQyUyMmFtZCUyMiUzQSU3QiUyMmlkJTIyJTNBJTIyJTIyJTdEJTJDJTIyZ2xvYmFscyUyMiUzQSU3QiU3RCU3RCUyQyUyMmV4YW1wbGUlMjIlM0FudWxsJTdE">check the repl</a>
for the output.</p>
<p>See, the native module syntax is <em>static</em>, which means it only has instructions
for either the native JS engine, or a bundler like rollup, to know how certain
modules interact with each other. All these statements do not actually
<em>execute any code at runtime</em>. That is why rollup can actually remove all of them.
Even without any kind of dead-code elimination, bundling should <strong>always</strong> make
your bundle smaller than the sum of your source files, because it should be
able to remove all of the boilerplate module code.</p>
<p>Well, at least that is what <em>should</em> happen. Or what I would expect to happen,
living in the year 2020, 5 years after the module syntax was specified.</p>
<p>But, alas, running this example through metro gave me a <strong>35k</strong> bundle!
It was wrapping every module in a function scope, replacing static module syntax
with <code>require</code> calls executed at runtime, and throwing in some polyfills for
good measure.</p>
<p><strong>WHYYYYYYY?¿?¿?¿?</strong></p>
<p>Please tell me I have done something wrong! This can’t be true! It’s 2020 ffs!</p>
<p>Well, there is one code pattern, however, that rollup can’t really deal with all that well:</p>
<pre data-lang="js" style="background-color:#fafafa;color:#61676c;" class="language-js "><code class="language-js" data-lang="js"><span style="color:#fa6e32;">if </span><span>(process</span><span style="color:#ed9366;">.</span><span>env</span><span style="color:#ed9366;">.</span><span>NODE_ENV </span><span style="color:#ed9366;">=== </span><span style="color:#86b300;">"production"</span><span>) {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">module</span><span style="color:#ed9366;">.</span><span style="font-style:italic;color:#55b4d4;">exports </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">require</span><span>(</span><span style="color:#86b300;">"./production.min.js"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>} </span><span style="color:#fa6e32;">else </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">module</span><span style="color:#ed9366;">.</span><span style="font-style:italic;color:#55b4d4;">exports </span><span style="color:#ed9366;">= </span><span style="color:#f07171;">require</span><span>(</span><span style="color:#86b300;">"./development.js"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>There are some tricks to make things like this work, but there are some things
that you can’t handle with static <em>synchronous</em> module syntax. Sadface!</p>
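<p>One of those tricks is to decide the condition at build time instead of at runtime, for example with <code>@rollup/plugin-replace</code>. A sketch (the config file layout and option values here are assumptions about your setup, not taken from the example above):</p>

```js
// rollup.config.js: substitute process.env.NODE_ENV before bundling, so the
// `if` collapses to a single branch that dead code elimination can remove.
import replace from "@rollup/plugin-replace";

export default {
  input: "index.js",
  output: { file: "bundle.js", format: "esm" },
  plugins: [
    replace({
      preventAssignment: true,
      "process.env.NODE_ENV": JSON.stringify("production"),
    }),
  ],
};
```

<p>This only helps when the conditional <code>require</code>s are statically analyzable; truly dynamic module loading still needs the runtime wrappers that bundlers like metro emit.</p>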
<hr />
<p>Long story short: Give rollup a try! It is amazing! Especially when publishing
a library to npm, bundling via rollup is a must! And when you also ship type
definitions, give <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a> a try as well ;-)</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>Well, this got a bit longer than expected.</p>
<p>What I wanted to do is raise awareness about how your favorite tools work, and
how you can configure them in a way that might better suit your workflows and
save you time and money. And to also advocate to bundle your libraries and
types ;-)</p>
Optimizing TypeScript Memory Usage (2020-01-08)
https://swatinem.de/blog/optimizing-tsc/<h2 id="update-2020-02-28"><a class="anchor-link" href="#update-2020-02-28" aria-label="Anchor link for: update-2020-02-28">#</a>
Update (2020-02-28)</h2>
<p>The recording of my talk is online:</p>
<div class="video" >
<iframe
src="https://www.youtube-nocookie.com/embed/455MyKMwXGY"
webkitallowfullscreen
mozallowfullscreen
allowfullscreen
></iframe>
</div>
<p>Also, my PR was merged a few weeks ago!
Motivated by that, I created
<a href="https://github.com/microsoft/TypeScript/pull/36844">two</a>
<a href="https://github.com/microsoft/TypeScript/pull/36845">followups</a>,
and I plan to write another post on one of those, so watch closely.</p>
<hr />
<p>For quite some time now, I have been completely sold on TypeScript, though the
typechecker itself can sometimes be very slow.</p>
<p>At my previous job, we had a <em>huge</em> TS project, with more than <strong>4,000</strong> files,
which took roughly <em>30 seconds</em> to check. But the worst problem was that it
was running very close to the node memory limit, and would frequently just go
out of memory.</p>
<p>That problem was even worse when running <code>tsserver</code> in combination with <code>tslint</code>,
which would crash due to OOM every few minutes, as I already wrote about
<a href="../dx-challenges/">in a previous post</a>.
Well, since one of the more recent VSCode updates, it is possible to
<a href="https://code.visualstudio.com/updates/v1_40#_typescripttsservermaxtsservermemory">increase the memory limit</a>
of <code>tsserver</code>, which would have saved my life back then.</p>
<p>At some point, all this became unbearable, and I started profiling and looking
deeper into how things worked. In the end, I was able to save up to <strong>6~8%</strong> of
memory usage with a very trivial change.</p>
<p>Let me take you on a journey of what I did to achieve these improvements.</p>
<h2 id="creating-a-reduced-testcase"><a class="anchor-link" href="#creating-a-reduced-testcase" aria-label="Anchor link for: creating-a-reduced-testcase">#</a>
Creating a reduced testcase</h2>
<p>Demonstrating this with a <em>4,000</em>-file project is not really feasible, but
luckily, we can reduce this to a <strong>very</strong> simple testcase.</p>
<pre data-lang="sh" style="background-color:#fafafa;color:#61676c;" class="language-sh "><code class="language-sh" data-lang="sh"><span style="color:#ed9366;">></span><span> npm </span><span style="color:#f29718;">i</span><span> typescript @types/node
</span></code></pre>
<p>Throughout this post, I will be using versions
<strong>typescript@3.7.4</strong> and <strong>@types/node@13.1.4</strong>, the most recent versions at
the time. My <code>tsconfig.json</code> looks like this:</p>
<pre data-lang="json" style="background-color:#fafafa;color:#61676c;" class="language-json "><code class="language-json" data-lang="json"><span>{
</span><span> </span><span style="color:#86b300;">"compilerOptions"</span><span style="color:#61676ccc;">: </span><span>{
</span><span> </span><span style="color:#86b300;">"diagnostics"</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">true</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"noEmit"</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">true</span><span style="color:#61676ccc;">,
</span><span>
</span><span> </span><span style="color:#86b300;">"strict"</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">true</span><span style="color:#61676ccc;">,
</span><span>
</span><span> </span><span style="color:#86b300;">"target"</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"ES2020"</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"lib"</span><span style="color:#61676ccc;">: </span><span>[</span><span style="color:#86b300;">"ESNext"</span><span>]</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"moduleResolution"</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"Node"</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"module"</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"ESNext"
</span><span> }
</span><span>}
</span></code></pre>
<p>Very basic stuff. Using the latest lib version and target, with node modules, and
without generating any emit output.</p>
<p>The <code>diagnostics</code> option is the same as if you used it on the command line
as <code>tsc --diagnostics</code>, just a convenient shortcut, because I always find the
info useful.</p>
<p>And then just create an empty file:</p>
<pre data-lang="sh" style="background-color:#fafafa;color:#61676c;" class="language-sh "><code class="language-sh" data-lang="sh"><span style="color:#ed9366;">></span><span> touch </span><span style="color:#f29718;">index.ts
</span></code></pre>
<p>Running <code>tsc</code> now gives us some (abbreviated) output:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> tsc
</span><span>Files: 82
</span><span>Lines: 22223
</span><span>Memory used: 61029K
</span><span>Total time: 1.28s
</span></code></pre>
<p>You can use the command line option <code>tsc --listFiles</code> to find out what those
<code>82</code> files are. Hint: it is just all the ts internal <code>lib</code> files,
plus all of <code>@types/node</code>.</p>
<p>OK, so far this is not really interesting, so let’s extend our testcase a little bit:</p>
<pre data-lang="sh" style="background-color:#fafafa;color:#61676c;" class="language-sh "><code class="language-sh" data-lang="sh"><span style="color:#ed9366;">></span><span> npm </span><span style="color:#f29718;">i</span><span> aws-sdk
</span><span style="color:#ed9366;">></span><span> echo </span><span style="color:#86b300;">'export * from "aws-sdk";' </span><span style="color:#ed9366;">></span><span> index.ts
</span></code></pre>
<p>(Note: This just installed <strong>aws-sdk@2.598.0</strong> which btw is <strong>48M</strong> on disk)</p>
<p>Let’s run <code>tsc</code> again:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> tsc
</span><span>Files: 345
</span><span>Lines: 396419
</span><span>Nodes: 1178724
</span><span>Identifiers: 432925
</span><span>Memory used: 465145K
</span><span>Parse time: 2.38s
</span><span>Bind time: 0.78s
</span><span>Check time: 2.22s
</span><span>Total time: 5.38s
</span></code></pre>
<p><strong>Say whaaaaaat?¿?¿</strong> Adding a single dependency adds a whopping <strong>400M</strong> of
memory usage and roughly <strong>4 seconds</strong> of runtime.</p>
<p>I will let you in on a little secret: <code>tsc</code> is actually typechecking <em>all</em> of
the <code>aws-sdk</code>, which can be slow. We can avoid that by using <code>--skipLibCheck</code>,
which is recommended all over the internet to speed up <code>tsc</code>:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>> tsc --skipLibCheck
</span><span>Memory used: 375234K
</span><span>Parse time: 2.28s
</span><span>Bind time: 0.77s
</span><span>Check time: 0.00s
</span><span>Total time: 3.05s
</span></code></pre>
<p>Not <em>that</em> much of an improvement, but we got rid of the <code>check time</code>, and about
<em>~100M</em> of memory usage.</p>
<h2 id="lets-start-profiling"><a class="anchor-link" href="#lets-start-profiling" aria-label="Anchor link for: lets-start-profiling">#</a>
Let’s start profiling</h2>
<p>In order to find out where all of this memory usage is coming from, we need to
start profiling. Luckily, the
<a href="https://nodejs.org/en/docs/guides/debugging-getting-started/">node docs</a> are
quite good. Take a minute to read that page.</p>
<p>So, from now on, we will start <code>tsc</code> like this:
<code>node --inspect-brk node_modules/.bin/tsc --skipLibCheck</code>.
I will be using Chromium: navigate to <code>chrome://inspect</code> and wait for the
node process to appear.</p>
<p>Once the debugger is attached, we can resume execution
(the <code>--inspect-brk</code> switch actually suspends execution). We watch our console
in the background, and once we get the <code>--diagnostics</code> output, <code>tsc</code> is basically
done, but it still holds on to its memory.</p>
<p>Now we can switch to the <code>Memory</code> tab, and take a heap snapshot. This will take
a while. In my opinion, the
<a href="https://developers.google.com/web/tools/chrome-devtools/memory-problems/heap-snapshots">documentation</a>
for this tool could be a lot better, but it gives you the very basics.</p>
<p><img src="https://swatinem.de/blog/optimizing-tsc/heap-overview.png" alt="Heap Profiler" /></p>
<p>For someone who has never before seen this, this might be a bit overwhelming
and confusing. And well, yes, it is. Memory profiling is actually a lot about
intuition, and digging deeper into things.</p>
<p>I have expanded the <code>(string)</code> category. We see <strong>9M</strong> for <code>tsc</code> itself, and
then a number of files which look very much like the sources of the <code>aws-sdk</code>,
for a total of <strong>67M</strong>. <code>tsc</code> essentially reads all the source files of <code>aws-sdk</code>
and keeps them in-memory. According to our <code>--diagnostics</code> output, that is roughly
<em>~250</em> files, and the complete <code>aws-sdk</code> is roughly <em>48M</em> on disk, so the
numbers start to add up.</p>
<p>Moving on, let’s expand the <code>Node</code> category:</p>
<p><img src="https://swatinem.de/blog/optimizing-tsc/nodes.png" alt="Nodes" /></p>
<p>Here we see that each of the nodes is <strong>160 bytes</strong>, and both according to the
memory profiler, and the <code>tsc --diagnostics</code> output, we have a bit more than
1 million <code>Node</code>s, which adds up to almost <strong>180M</strong> of memory.</p>
<p>Expanding some <code>Node</code>s, we also see that the <code>Node</code>s have very different
properties on them. One very relevant detail is that not every property is
shown; more on that later.</p>
<h2 id="diving-into-some-theory"><a class="anchor-link" href="#diving-into-some-theory" aria-label="Anchor link for: diving-into-some-theory">#</a>
Diving into some theory</h2>
<p>To progress further, we need to know a little bit about how v8 manages its
memory. Luckily, the v8 team talks quite a bit about this and other performance
relevant topics. Go and read <a href="https://v8.dev/blog/react-cliff">one of</a> the very
good posts on the <a href="https://v8.dev/blog">v8 blog</a>, or watch one of the recordings
from various conferences.</p>
<p><em>Also note that this is specific to v8, and other JS engines are different,
though surprisingly still quite similar. Also, I might get some of the details
wrong, or they might get outdated, so take this with a grain of salt.</em></p>
<hr />
<p>Alright! To move on, we have to understand how v8 saves JS objects in memory.
Very simplified, an object looks like this:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>┌────────────┐
</span><span>│ Map │
</span><span>├────────────┤
</span><span>│ Properties │
</span><span>├────────────┤
</span><span>│ Elements │
</span><span>├────────────┤
</span><span>│ … │
</span><span>└────────────┘
</span></code></pre>
<p>Each one of these entries (slots) is <em>"pointer sized"</em>, which on a 64-bit system
means <strong>8 bytes</strong>.</p>
<ul>
<li>The <strong>Map</strong>, also called <em>Hidden Class</em> or <em>Shape</em>, is an internal
data-structure which describes the object. V8 and other JS engines have a lot of
internal optimizations that depend on this <em>Shape</em>. For example, optimized code
is specialized for one or more <em>Shape</em>s. When you pass in an object of a
different <em>Shape</em>, the engine will bail out to slower code.</li>
<li><strong>Properties</strong> is a pointer to an optional hashmap, which can hold additional
properties that get added to an object later. You will sometimes hear or read
about <em>"dictionary mode" objects</em>. This is it.</li>
<li><strong>Elements</strong> is a pointer to some optional indexed properties, like for an
array.</li>
<li><strong>…</strong>: And then each object can have a number of <em>inlined</em> properties. This is
what makes property access fast. The <em>Map</em> describes which properties are
inlined at which index, and optimized code will just fetch the property from
index <code>X</code> instead of looking it up through <em>Properties</em>.</li>
</ul>
<p>Each object has at least the three special properties, so each object is at
least <strong>24 bytes</strong>. In our example, each <code>Node</code> is <strong>160 bytes</strong>, so it has
<strong>20 slots</strong>; minus the special ones, that leaves us with up to <strong>17 slots</strong> for
arbitrary properties. That is quite a lot.</p>
<hr />
<p>So, what is such a <code>Node</code> anyway? When typescript, or essentially any other
parser, parses the source code, it creates an internal data structure
called the <em>Abstract Syntax Tree</em> (AST). And as the name says, it is a tree,
consisting of <code>Node</code>s. Each syntax construct is represented by a different
type of node.</p>
<ul>
<li>An <em>Identifier</em> (<code>ident</code>) for example only has to know its <em>name</em>.</li>
<li>A <em>MemberExpression</em> (<code>object.property</code>) has references to the <em>object</em> and the <em>property</em>.</li>
<li>An <em>IfStatement</em> (<code>if (condition) { consequent } else { alternate }</code>) also has references to its child blocks.</li>
<li>… and so on …</li>
</ul>
<p>While all of these nodes share some common properties, like their location
in the source file, each syntax node otherwise has very different properties,
which makes it hard for JS engines to optimize this particular data structure,
and the functions that work with it.</p>
<h2 id="trying-to-improve-things"><a class="anchor-link" href="#trying-to-improve-things" aria-label="Anchor link for: trying-to-improve-things">#</a>
Trying to improve things</h2>
<p>There is one more very important detail I left out.</p>
<p>V8 has a lot of heuristics, and one of them is that it groups all these objects
based on <strong>the constructor function</strong>. And typescript unfortunately uses a single
constructor function for all of these very different node types. It is quite
unlikely that every AST node will need <em>17</em> properties.</p>
<p>With this in mind, we can try to improve things.</p>
<p>For a live demo, we can just live-patch the <code>node_modules/typescript/lib/tsc.js</code>
file, and search for <code>function Node(</code>. In the typescript source tree, we find
the code <a href="https://github.com/microsoft/TypeScript/blob/8ed92dcecda8b9d5cc5b9e22c5ebe2aae91a9670/src/compiler/utilities.ts#L4987-L5016">here</a>.</p>
<p>Surprisingly, right next to it is this thing called the <code>objectAllocator</code>:
(I added a <code>prettier-ignore</code> comment, otherwise my editor would auto-format this)</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">function </span><span style="color:#f29718;">Node</span><span>(</span><span style="color:#ff8f40;">kind</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">pos</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">end</span><span>) {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>pos </span><span style="color:#ed9366;">= </span><span>pos</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>end </span><span style="color:#ed9366;">= </span><span>end</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>kind </span><span style="color:#ed9366;">= </span><span>kind</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>id </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>flags </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>modifierFlagsCache </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>transformFlags </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">0</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>parent </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">undefined</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>original </span><span style="color:#ed9366;">= </span><span style="color:#ff8f40;">undefined</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="font-style:italic;color:#abb0b6;">// [… snip …]
</span><span style="font-style:italic;color:#abb0b6;">// prettier-ignore
</span><span>ts</span><span style="color:#ed9366;">.</span><span>objectAllocator </span><span style="color:#ed9366;">= </span><span>{
</span><span> </span><span style="color:#f29718;">getNodeConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">Node</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getTokenConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">Node</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getIdentifierConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">Node</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getSourceFileConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">Node</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getSymbolConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">Symbol</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getTypeConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span>Type</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getSignatureConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span>Signature</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#f29718;">getSourceMapSourceConstructor</span><span style="color:#61676ccc;">: </span><span style="color:#fa6e32;">function </span><span>() { </span><span style="color:#fa6e32;">return </span><span>SourceMapSource</span><span style="color:#61676ccc;">; </span><span>}</span><span style="color:#61676ccc;">,
</span><span>}</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>So apparently, TypeScript already has all the necessary infrastructure in place
to at least split the Nodes into four categories. Also note that it uses the
same constructor function for <code>SourceFile</code>s, which are <em>very</em> different from
AST Nodes.</p>
<p>So just for fun, let’s copy-paste this <code>Node</code> function, rename it, and use it for
all of these different types…</p>
<p>With this trivial change done, let’s try running <code>tsc</code> again:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Memory used: 353732K
</span></code></pre>
<p>Scrolling back up, and running these commands a few more times, the numbers are
very reproducible. Our memory usage went from <strong>375M</strong> to <strong>353M</strong>. We just
saved ourselves <strong>22M</strong> of memory usage, which amounts to roughly <strong>~6%</strong>.</p>
<p>Let’s double-check using the memory profiler.</p>
<p><img src="https://swatinem.de/blog/optimizing-tsc/nodes-after.png" alt="Nodes after Optimization" /></p>
<p>In the end, we end up with these sizes:</p>
<table><thead><tr><th>Type</th><th>Size (bytes)</th><th>Frequency</th></tr></thead><tbody>
<tr><td>SourceFile</td><td>160</td><td>~0%</td></tr>
<tr><td>Identifier</td><td>104</td><td>~37%</td></tr>
<tr><td>Token</td><td>104</td><td>~13%</td></tr>
<tr><td>Node</td><td>144</td><td>~50%</td></tr>
</tbody></table>
<p>What we see from this is that mixing <code>SourceFile</code> with all the rest of the
<code>Node</code>s is not a really good idea. Also, <strong>104 bytes</strong> equals <strong>10</strong> non-special
properties, which is a lot for things like <code>Token</code>s (usually
punctuation, though TS also uses them for literals) or <code>Identifier</code>s
(which just represent one word in the source text).
Careful analysis could shrink the memory usage further, by removing unused
properties, or by further splitting up and organizing the different token types.</p>
<h2 id="bad-news"><a class="anchor-link" href="#bad-news" aria-label="Anchor link for: bad-news">#</a>
Bad news</h2>
<p>While I only write about this in early January, I did all the analysis and
patching in mid September last year. You can check the
<a href="https://github.com/microsoft/TypeScript/pull/33431">pull request</a> on the
typescript repo; it is still open as I write this blog. :-(
When running TypeScript’s own performance test suite,
my patch demonstrated a <strong>6~8%</strong> decrease in memory usage, so even more
significant than the saving demonstrated with the testcase here. But there is
apparently no interest from the maintainers to merge it. I asked again early
December, <em>one month ago</em>, to get some feedback, but got no reply whatsoever.
Compared to <a href="https://github.com/microsoft/TypeScript/pull/33390">my first PR</a>,
which was merged in less than 24 hours, this is super disappointing and
frustrating for an external contributor. So if anyone has any connections to
the maintainers, please kick some ass to get some progress here. :-)</p>
<p>The other thing is <code>aws-sdk</code>, which I used as the testcase here.
One thing people could do is to better organize their library, for example by
bundling both the library code and their types. And it just so happens that I
maintain <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a>,
which you should definitely check out :-)
But introducing bundling after the fact might be a breaking change for library
users, so I understand it’s not always feasible.</p>
<p><em>BUT</em>, after some digging around, I found out that the <code>aws-sdk</code> actually has
more focused imports, so instead of <code>import { S3 } from "aws-sdk"</code>, one can do
<code>import S3 from "aws-sdk/clients/s3"</code> (one reason why bundling would break things).
You might want to use such focused imports to save both startup time and memory
usage <em>at runtime</em>. I haven’t checked what the runtime code actually does, but
the type definitions end up including <em>the whole world</em>, even though you would
like to use focused imports.
I <a href="https://github.com/aws/aws-sdk-js/issues/2846">created an issue</a>, also in
September, which got a single comment along the lines of
<em>"we don’t really care, wait for the next major version"</em>, which is also quite
disappointing. I don’t have such a deep insight, but I would guess that a fix
for this would be quite simple; especially since <code>aws-sdk</code> has a ton of
duplicated type aliases.</p>
<h2 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h2>
<p>Memory optimization is hard, especially in JS. Also, parsers and compilers are
even harder to optimize in JS. It is amazing that something like an <code>Identifier</code>,
which in minified code is only <strong>1 character = 1 byte</strong>, is blown up to
<strong>160 bytes</strong> by parsing it into a data structure that a compiler can work with.</p>
<p>Profiling JS is a complex thing to do. Engines have a ton of optimizations and
heuristics. They try to be very smart. They mostly succeed, but there are some
code patterns that are very hard to optimize. Figuring out what is really
happening requires a lot of experience, knowledge, guessing, and sometimes
just luck. I hope I have opened the eyes of some by showing how I approach these
kinds of problems.</p>
<p>One recommendation for other developers, which you can also read and hear about
<em>a lot</em>, is to use constructor functions which initialize all the properties
that an object can have, with correct types. Just putting random properties on
objects at random times, like typescript apparently does, is really bad for
performance.</p>
<p>But in the end, the number one rule is to:
<strong>measure, measure, measure! and then measure some more!</strong></p>
My immersion-cooled Oil PC (2019-12-06)
https://swatinem.de/blog/oil-immersion-cooling/<p><strong>TLDR</strong>: scroll down for pictures and a video :-)</p>
<p>It was actually quite some time ago that I stumbled upon the concepts of
immersion cooling, by which you immerse your whole hardware in a dielectric
cooling fluid, and of phase-change cooling, which means that the coolant
actually evaporates on the hot parts to carry away the heat, and then condenses
again to form a cycle. Go ahead and search the internet/youtube for the
keywords
<a href="https://www.youtube.com/results?search_query=novec+cooling">novec cooling</a>,
you <em>will</em> be amazed! So was I! There is just this
sense of peace and calm when you see bubbles rising from your PC components.</p>
<p>I wanted to have something like that myself. So I put in a bit of research.
There are a few specifically engineered fluids which are dielectric (insulators)
and have a quite low boiling point. The <code>3M Novec</code> line of fluids is an example.
One of them has a boiling point of 30°C, which in theory means that your
hardware components will rarely exceed those temps.</p>
<p>But there are a few problems:</p>
<ul>
<li>With global warming, and without an AC, I get incredibly hot summers,
sometimes up to 35°C in my room, which is absolutely no pleasure, believe me!
But this means that I simply can’t realistically cool the fluid down below its
boiling point in summer with conventional means.</li>
<li>You simply can’t buy <code>3M Novec</code> as a private person. It’s just not possible.</li>
<li>Even if it was, I read that it is <em>prohibitively expensive</em>, well above the
<em>100€/l</em> point apparently.</li>
<li>Novec is very volatile (it evaporates quickly),
which means you would need a very air-tight and possibly pressurized
container for it.</li>
</ul>
<p>So considering all this, phase-change cooling was out of the picture. But
immersion cooling was still an option. And there is quite some prior art on just
using oil for that, which is a di-electric, and has quite good thermal properties
as well.</p>
<p>But information on how to properly do it was quite scarce. Do I actually need a
heatsink? (With Novec, you don’t really) How about fans? How large? How to cool
the oil? When using standard water-cooling equipment, how large should that be?
BTW, I am still missing a comprehensive overview of the <em>actual performance</em>
of water-cooling hardware. Like how many Watts of heat can a <code>2 * 140mm x 45mm</code>
radiator dissipate when combined with fans that have an airflow of <code>X</code>? I
haven’t really found any info on that one! Also, what kind of pump do you need
for a combination of radiators, etc… Info such as this is really scarce to
non-existent on the web. I really wish such info was more widespread. Also,
oil is a lot more viscous than water, so that has to be taken into account as
well.</p>
<h1 id="build-timeline"><a class="anchor-link" href="#build-timeline" aria-label="Anchor link for: build-timeline">#</a>
Build Timeline</h1>
<p>… Anyway, I just made a best guess on the components and started planning.</p>
<p>My first idea was to have the Motherboard and the GPU back-to-back,
such as in the <a href="https://www.dan-cases.com/dana4.php">DAN A4</a>.</p>
<p>I actually bought the hardware itself about a year ago, together with a PCI-E
riser card to try different configurations. Turns out that a faulty riser card
can actually randomly crash your games! It was either faulty, or I just bent it
too much, but hey! Those cards are supposed to be bent, so they better handle
the stress! Anyway, I got a refund for the faulty riser and went with a
different one.</p>
<p>Along the way, I also had the epiphany to do some kind of triangle or star shape.
And thus, also the projects code-name was born: <strong>Trinity-Force</strong>.</p>
<p>I actually sketched it up on paper first:</p>
<p><img src="https://swatinem.de/blog/oil-immersion-cooling/sketch.jpg" alt="Paper Sketch" /></p>
<p>The ideas got a lot more concrete when I started modeling everything in
<a href="https://www.freecadweb.org/">FreeCAD</a>, which I learned just to make this project.</p>
<p>Researching all the data-sheets for the different Mainboard Form-Factors, and
being able to precisely apply them in CAD was really appealing to my inner
engineer.</p>
<p><img src="https://swatinem.de/blog/oil-immersion-cooling/mb-mount.png" alt="MB Mount" /></p>
<p>After more learning, and long nights of CAD Design, I was done:</p>
<p><img src="https://swatinem.de/blog/oil-immersion-cooling/cad-3d.png" alt="3D CAD Model" /></p>
<p>And then came the actual fabrication. I had all the parts cut out of an acrylic
sheet by a laser cutter:</p>
<p><img src="https://swatinem.de/blog/oil-immersion-cooling/cutting.jpg" alt="Laser-cut Acrylic" /></p>
<p>I then continued to glue all the pieces together. An important note here: do not
glue laser-cut edges directly, they <em>will</em> get ugly! Make sure to smooth or file
them down before gluing. Well, lesson learned.
The next step was to assemble all the hardware into place, which ran on
air cooling for another month or so:</p>
<p><img src="https://swatinem.de/blog/oil-immersion-cooling/assembled.jpg" alt="Assembled Innards" /></p>
<p>And then with the outer basin completed came the final assembly and pouring in
the oil:</p>
<p><img src="https://swatinem.de/blog/oil-immersion-cooling/mating.jpg" alt="Mating the two parts" />
<img src="https://swatinem.de/blog/oil-immersion-cooling/ready.jpg" alt="Ready to fill" />
<img src="https://swatinem.de/blog/oil-immersion-cooling/half.jpg" alt="Half Full" />
<img src="https://swatinem.de/blog/oil-immersion-cooling/full.jpg" alt="Full" /></p>
<p>And some shaky-cam footage of the final beast running:</p>
<div class="video" >
<iframe
src="https://www.youtube-nocookie.com/embed/Q0BQbAdOaBQ"
webkitallowfullscreen
mozallowfullscreen
allowfullscreen
></iframe>
</div>
<p>BTW, from planning to actually pouring in the oil was almost half a year.
I started ordering hardware around December last year, and I poured in the oil
around April…</p>
<h1 id="the-good"><a class="anchor-link" href="#the-good" aria-label="Anchor link for: the-good">#</a>
The Good</h1>
<p>Well I wanted to do a proper case-mod for quite some time. And I can truly say
that I have an absolutely unique PC! It was an amazing project to plan, design
and build. I’m really proud to have pulled this off!
And oh boy does it look cool!</p>
<p>Also, the performance is great, and the cooling works. I rarely hit ~70°C on
either CPU or GPU. And this Setup is running an <strong>RTX 2070</strong> and a
<strong>Ryzen 2700X</strong>, with a tiny low-profile cooler! The ambient oil temperature
(as measured by chipset / SSD) does hit >40°C in summer, but things are still
stable and performing well. I haven’t noticed any thermal throttling so far.</p>
<h1 id="the-bad"><a class="anchor-link" href="#the-bad" aria-label="Anchor link for: the-bad">#</a>
The Bad</h1>
<p>Although this is an amazing piece of work, it did fail some of the requirements
for a PC that I have.</p>
<p>It is actually quite big! The radiator itself is already quite bulky, but the
unique trinity-shaped design takes up quite some space. And it is still
sitting on my desk, taking up precious room.
The other bad thing is that it is not actually <em>that</em> quiet. You see, the pump
is quite loud, and it even emits some low-frequency vibrations. I’m not quite
sure if this is just the way water-cooling pumps are, or if the pump itself is
faulty. Or maybe I made a mistake when starting it up. Apparently there is a
specific procedure to prime the pump, but I had it running dry while slowly
pouring in the oil. Not sure if that is the reason it is this loud now.</p>
<p>Also, having two separate outlets from the Radiator was probably not a good idea,
since the flow is uneven. Which also reminds me that, while definitely nice to
look at, the trinity-shaped design is not really that good for fluid dynamics.
Especially the Mainboard compartment gets hotter than other parts of the
assembly. You can even feel it when you touch the outsides.</p>
<p>Another thing to mention: you might not see it so well in the pictures, but I
have disassembled the power supply into its own compartment as well, which
meant soldering cables and so on. Very tedious work; I don’t recommend doing it.</p>
<h1 id="the-ugly"><a class="anchor-link" href="#the-ugly" aria-label="Anchor link for: the-ugly">#</a>
The Ugly</h1>
<p>Well… What is the worst that can happen when you are dealing with oil? It
leaks. And it does, very slowly, most likely through tiny cracks in the basin.
Remember: never directly glue laser-cut edges! The unique shape makes it very
difficult to glue well, and I didn’t really do a great job at that :-(
Anyway, I have the PC on a piece of cardboard to soak up the leaks, which I
have had to change once so far.</p>
<p>Also, this is something that I have read already somewhere, but it is still
very surprising because it kind-of defies the laws of gravity. You see, oil can
actually flow <strong>upward</strong> <em>inside</em> of cables! Incredible. But the oil actually
travels upwards in one of the USB cables and drips under my table. Haha!</p>
<p>The problem is, the RTX 2070 Mini that I have has a fan which sits quite close
to the PCI bracket, so when mounted upside-down, I have to fill the oil quite
near the top so the fan does not splash it. That means that all the USB and
other connectors on the Mainboard are also soaked in oil, which gets on all the
cables you plug in. Not nice :-(</p>
<h1 id="future-ideas"><a class="anchor-link" href="#future-ideas" aria-label="Anchor link for: future-ideas">#</a>
Future ideas</h1>
<p>Well this was basically it. It was an amazing project to build, and I learned
a tremendous amount doing it.</p>
<p>Am I done though? I’m not quite sure. I do have the slight urge to re-do the
project at some point. To correct all the mistakes that I made and learned from.</p>
<ul>
<li>Go with a more usual rectangular design, without the need for a riser card.</li>
<li>At least for the outer basin, have someone else assemble/glue it who knows
their job. Using a rectangular shape should make that a lot easier anyway!</li>
<li>Do not disassemble and mess with the power supply, just get an SFX PSU with
cable management and some nicely sleeved cables. Even doing the cable-sleeving
yourself can’t be as tedious as soldering cables :-D</li>
<li>Get a GPU that has more space between bracket and fan, so I don’t have to
fill the oil quite to the top.</li>
<li>Better optimize for fluid dynamics. Hot fluid rises to the top, which means
that somewhere on the top should be an outlet that leads to the radiator, and
from there an inlet on the bottom somewhere.</li>
<li>Not sure why this is currently so bad, but make sure to get a <em>quiet</em> pump.</li>
<li>Make sure to design the inner part to be more easily accessible / removable.</li>
</ul>
<p>I do have some ideas revolving around a dual-basin design. One basin with the
hardware, which overflows like a waterfall into a second basin at the bottom.
This should in itself already cool the oil down a bit. From there, through the
radiator and into the bottom of the inner tank. This would however necessitate
a directional flow valve. And since I have learned that oil creeps through the
tiniest of gaps, I’m not sure this is possible? Well, it has to be, somehow!</p>
Lets learn Dependency Injection2019-11-25T00:00:00+00:002019-11-25T00:00:00+00:00
Unknown
https://swatinem.de/blog/learn-di/<p>I believe a lot in the saying “learning by doing”, and often the best thing I
can do to better understand a specific problem, topic, library or paradigm, is
to actually implement it myself.</p>
<p>Often it’s enough to only think about how I would implement it, but sometimes
it’s good to also write it down, so that’s what I will do here.</p>
<p>This specific journey started a few months ago when, at my previous job, we
started to embrace DI (dependency injection), more specifically in the form of
<a href="https://nestjs.com/">nestjs</a>. The specific question I wanted answered
was: why does nestjs come with its own module system while JS already has
modules, especially since, depending on your project size, a nestjs module may
only ever have a single Service/Provider in it? The concept didn’t immediately
click for me; I only got it after I thought about how I would implement a DI
solution myself.</p>
<hr />
<p>So let’s go!</p>
<p>Actually, the concept behind DI is very simple. In my own words, what it does is
decouple the <strong>what</strong> from the <strong>how</strong>.</p>
<p>That is also how we can think about the moving parts. The central part in DI is
called the <code>Container</code>. What you do is ask the container to give you the thing
you want, the <strong>what</strong>, which essentially boils down to just a <em>type</em>. This is
most commonly the type of your service, but you can also use it with primitives
such as <code>string</code> or <code>number</code> if you want to manage configuration via DI.</p>
<p><a href="https://github.com/typestack/typedi">typedi</a> calls this a <code>Token</code>:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="font-style:italic;color:#abb0b6;">// The class is empty, its only purpose is to hold the type `T`, the **what**.
</span><span style="color:#fa6e32;">class </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">T</span><span>> {}
</span><span>
</span><span style="color:#fa6e32;">interface </span><span style="color:#399ee6;">Service </span><span>{
</span><span> foo</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">string</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// Define some specific things you want to expose.
</span><span style="color:#fa6e32;">const </span><span>MyConfig </span><span style="color:#ed9366;">= new </span><span style="color:#399ee6;">Token</span><span><</span><span style="font-style:italic;color:#55b4d4;">string</span><span>>()</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">const </span><span>MyService </span><span style="color:#ed9366;">= new </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">Service</span><span>>()</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>But how do we actually construct the things that we want, the <strong>how</strong>?
The DI container itself does not need to know how things are created. It just
delegates this to any function that does so, which is called the <code>Provider</code>.</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">Provider</span><span><</span><span style="color:#399ee6;">T</span><span>> </span><span style="color:#ed9366;">= </span><span>(</span><span style="color:#ff8f40;">container</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Container</span><span>) </span><span style="color:#fa6e32;">=> </span><span style="color:#399ee6;">T</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>In my example, I want to keep things as simple as possible from the container
point of view, which means it is the responsibility of the <code>Provider</code> to:</p>
<ol>
<li>initialize any dependency and</li>
<li>cache/memoize things. Thus, we arrive at this very simple <code>Container</code>:</li>
</ol>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">class </span><span style="color:#399ee6;">Container </span><span>{
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/** The registry holds the `Provider`s keyed by `Token`s. */
</span><span> </span><span style="color:#fa6e32;">private </span><span>registry </span><span style="color:#ed9366;">= new </span><span style="color:#399ee6;">Map</span><span><</span><span style="color:#399ee6;">Token</span><span><</span><span style="font-style:italic;color:#55b4d4;">any</span><span>></span><span style="color:#61676ccc;">, </span><span style="color:#399ee6;">Provider</span><span><</span><span style="font-style:italic;color:#55b4d4;">any</span><span>>>()</span><span style="color:#61676ccc;">;
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/** Register a new Provider for a Token, the **how**. */
</span><span> </span><span style="color:#fa6e32;">public </span><span style="color:#f29718;">register</span><span><</span><span style="color:#399ee6;">T</span><span>>(</span><span style="color:#ff8f40;">token</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">T</span><span>></span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">provider</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Provider</span><span><</span><span style="color:#399ee6;">T</span><span>>)</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">this </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>registry</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">set</span><span>(token</span><span style="color:#61676ccc;">, </span><span>provider)</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>
</span><span> </span><span style="font-style:italic;color:#abb0b6;">/** This will give you **what** you want, you don’t need to care **how**. */
</span><span> </span><span style="color:#fa6e32;">public </span><span style="color:#f29718;">get</span><span><</span><span style="color:#399ee6;">T</span><span>>(</span><span style="color:#ff8f40;">token</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">T</span><span>>)</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">T </span><span>{
</span><span> </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>registry</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(token)</span><span style="color:#ed9366;">!</span><span>(</span><span style="font-style:italic;color:#55b4d4;">this</span><span>)</span><span style="color:#61676ccc;">; </span><span style="font-style:italic;color:#abb0b6;">// NOTE: this may throw!
</span><span> }
</span><span>}
</span></code></pre>
<p>This is essentially a very simple but working DI Container in <strong>12</strong> lines of
code! Let’s use it!</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">class </span><span style="color:#399ee6;">ConcreteService </span><span>{
</span><span> </span><span style="color:#fa6e32;">constructor</span><span>(</span><span style="color:#fa6e32;">public </span><span style="color:#ff8f40;">foo</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">string</span><span>) {}
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">const </span><span>container </span><span style="color:#ed9366;">= new </span><span style="color:#399ee6;">Container</span><span>()</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// We can use a static value
</span><span>container</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">register</span><span>(MyConfig</span><span style="color:#61676ccc;">, </span><span>() </span><span style="color:#fa6e32;">=> </span><span style="color:#86b300;">"my config value"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// Here, our `Provider` constructs a new value matching the `Service` interface,
</span><span style="font-style:italic;color:#abb0b6;">// and uses the DI container to get any dependency value.
</span><span style="font-style:italic;color:#abb0b6;">// Again, neither the DI container itself nor the user needs to care.
</span><span>container</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">register</span><span>(
</span><span> MyService</span><span style="color:#61676ccc;">,
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// If we want to have a singleton, we can just wrap this function with some
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// kind of `memoize`, which is left as an exercise for the reader.
</span><span> </span><span style="color:#ff8f40;">container </span><span style="color:#fa6e32;">=> </span><span style="color:#ed9366;">new </span><span style="color:#399ee6;">ConcreteService</span><span>(container</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(MyConfig))
</span><span>)</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// This will lazily create a new instance:
</span><span style="color:#fa6e32;">const </span><span>myService </span><span style="color:#ed9366;">= </span><span>container</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(MyService)</span><span style="color:#61676ccc;">;
</span></code></pre>
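<p>The singleton behavior mentioned in the comment above can be added with a small
wrapper around a <code>Provider</code>. The following is just one possible sketch; the
<code>memoize</code> helper is my own addition, not part of any library’s API:</p>

```typescript
// Minimal Token/Container matching the example above, repeated here so the
// snippet is self-contained.
class Token<T> {}
type Provider<T> = (container: Container) => T;

class Container {
  private registry = new Map<Token<any>, Provider<any>>();

  public register<T>(token: Token<T>, provider: Provider<T>): this {
    this.registry.set(token, provider);
    return this;
  }

  public get<T>(token: Token<T>): T {
    return this.registry.get(token)!(this); // NOTE: this may throw!
  }
}

// `memoize` runs the wrapped Provider at most once and caches the result,
// turning it into a lazily-created singleton.
function memoize<T>(provider: Provider<T>): Provider<T> {
  let cached: { value: T } | undefined;
  return container => {
    if (cached === undefined) {
      cached = { value: provider(container) };
    }
    return cached.value;
  };
}

const MyConfig = new Token<string>();
const container = new Container();

let constructions = 0;
container.register(
  MyConfig,
  memoize(() => {
    constructions += 1;
    return "my config value";
  })
);

container.get(MyConfig);
container.get(MyConfig);
// `constructions` is still 1: the second `get` hit the cache.
```

<p>Note that the cache lives in the closure, so registering the same wrapped
Provider with two containers would share one instance; wrap per registration if
that is not what you want.</p>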
<p>You can also play with the complete example in the
<a href="https://www.typescriptlang.org/play/#code/MYGwhgzhAEAqD2BrApgOwDywHzQN4F8AoAFwE8AHZaABQCd4A3ASwBNlbMcBeaACmHipiYJqnYAuaAGFBw0ewCU0LjlgBuQqEgwZQkWNp5C0aAHoAVObgALKrWQBzJhGK1S0a-BAsYxW9AADOkZWdgCYFFJkFmgAI3cAhBRUcIA6aHNTY2hyWiYGMGI7R2dXdx4xAHdoAFkwckwkNHQwVFIsABoaemY2Dlb2rF4FDWyLKwAlEpd2aDBoKu6QvugAM3hDeaS0Lr8qS09Ky3TM7PIAV1iQJmBoeycZjmxeYibUSW2MbC7c5Ykl3rsTgKSR+ZxGEwmMEQVL3UpuVIQZDEF5vH49UK0EbZEz2YjnWioaDQjQmIhjSw2cGVJggEDQJwMKikeDnDLmSrWQqUllsyqtYhdXnQFiCQCYBMQFshosT4NBgGB7OzDscMlkTBcrjcGcjOKjkh83sCPhDIXiCUTobDpmVUg5kfq0AoAIQvazOEZmUzQAByAHlYABRUHumAAWzA7j89EqzuyRHJoiKtFWYGAVAAyuxmOnTet4JIXHlUA4NOTTN6ACLIVbyaAQeBhqgQSjAJi125gkswYX8oSy6DIAAe5HgSNSmkELlqpF0tYcyil1U+6CLogcQ2xAlQ05qpCztBzVAqyGXRoPR83oy0UGkgmAeOQF5uVFw2W3RfOwFetF4muutz5oWrjrkoBCEOSH6StucgGIuiy6LB7DDKMFbQAA6lQCpEucSJzPWwjENqBQgOcyCTno8i0NaDzJrwe5zkwDhdMMyg4AARGG7jbvO0AkWR7HYoQaEABLsMgXSsoYQQYn0ATylOrhfsQMDzIs-FUBGxDAO6JbEv4ATPum8lJuwqbph0wneq0MS4cgvj+JWACSCmUXBrw6pKAwisglCoGwqDAO4GkTmhACCDj6F0YhMHshh7NAzmuUhhixUiICrAsGz6VQdmGKK9lZZKYgyh5Cr2BOMH6OwNGlMh2R7kZEljN6TmZZUVB9pKHlcky+EQOuIDIoIXQdfKrTQAAVrhkqVLQ9T6eCqznIFRGCNANJ+PWjbkSYaGIKIMTwJlARNmG8BMAAXsgASje6OnQOCQ2rF5qlEsO6a0G2eHrPF-j2GAfQTiYVVUWxS53oFj5NfwsjVdR9oogxgjzgoCiEEJaGwKGG20vS4CXbS3EA0U+GLKILitOm4gUdOXFNYuoMGHaDqNdmL5blOXjIKkIDwA4vD0+z6bYkAA">TypeScript Playground</a>.</p>
<h1 id="making-it-more-useful"><a class="anchor-link" href="#making-it-more-useful" aria-label="Anchor link for: making-it-more-useful">#</a>
Making it more useful</h1>
<p>Obviously, the example is optimized for simplicity and has some obvious
problems:</p>
<ul>
<li>As noted in the comment, using <code>get</code> without previously <code>register</code>-ing a
<code>Provider</code> will throw.</li>
<li>It will run into infinite recursion when you have circular dependencies. I
would argue to avoid circular dependencies in general; they work only in very
specific circumstances and will blow up and burn your house down when you don’t
take very good care.</li>
<li>We can easily make this <code>async</code> as well.</li>
<li>It might be a good idea to actually bake more sophisticated knowledge about
dependencies into the <code>Container</code> itself, to optimize your dependency graph.</li>
<li>The example is also completely missing things like scopes and inheritance.</li>
<li>You might want to be able to use a <code>Service</code> both as <code>Token</code> and as <code>Provider</code>.</li>
</ul>
<p>But even implemented like this, it very clearly highlights the main selling
point of DI:
Neither you as a programmer, nor any of your <code>Provider</code>s need to know <strong>how</strong>
other values are constructed. It just works.
This makes it very easy to override one of your <code>Provider</code>s to construct a
<em>mock</em> object for your unit tests. Or to delegate to two different specific
implementations of a service, depending on your configuration, etc.</p>
<h1 id="circling-back-to-modules"><a class="anchor-link" href="#circling-back-to-modules" aria-label="Anchor link for: circling-back-to-modules">#</a>
Circling back to modules</h1>
<p>Coming back to my original question about modules: we don’t see them in this
very simple example. So let’s think a bit about what happens when we start to
scale this, when we have a lot more Providers to worry about: tens, or even
hundreds of them.</p>
<p>We would need to call the <code>register</code> function of our <code>Container</code> for every
single one, which gets tedious very quickly. Also, we want to both have some
kind of encapsulation, and to not have to care about what specific <code>Provider</code>s
there are.</p>
<p>So how can we simplify that? Let’s add an extremely simple Module definition,
and extend our <code>register</code> method to deal with it:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="font-style:italic;color:#abb0b6;">// Tokens and Providers go hand-in-hand, let’s call it a `Definition`.
</span><span style="color:#fa6e32;">interface </span><span style="color:#399ee6;">Definition</span><span><</span><span style="color:#399ee6;">T</span><span>> {
</span><span> token</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">T</span><span>></span><span style="color:#61676ccc;">;
</span><span> provider</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Provider</span><span><</span><span style="color:#399ee6;">T</span><span>></span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// A `Module` is essentially just a list of definitions.
</span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">Module </span><span style="color:#ed9366;">= </span><span style="color:#399ee6;">Array</span><span><</span><span style="color:#399ee6;">Definition</span><span><</span><span style="font-style:italic;color:#55b4d4;">any</span><span>>></span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">class </span><span style="color:#399ee6;">Container </span><span>{
</span><span> </span><span style="color:#fa6e32;">public </span><span style="color:#f29718;">register</span><span><</span><span style="color:#399ee6;">T</span><span>>(</span><span style="color:#ff8f40;">token</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">T</span><span>></span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">provider</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Provider</span><span><</span><span style="color:#399ee6;">T</span><span>>)</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">public </span><span style="color:#f29718;">register</span><span>(</span><span style="color:#ff8f40;">module</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Module</span><span>)</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#61676ccc;">;
</span><span> </span><span style="color:#fa6e32;">public </span><span style="color:#f29718;">register</span><span><</span><span style="color:#399ee6;">T</span><span>>(</span><span style="color:#ff8f40;">modOrToken</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">Module </span><span style="color:#ed9366;">| </span><span style="color:#399ee6;">Token</span><span><</span><span style="color:#399ee6;">T</span><span>></span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">provider</span><span style="color:#ed9366;">?: </span><span style="color:#399ee6;">Provider</span><span><</span><span style="color:#399ee6;">T</span><span>>) {
</span><span> </span><span style="color:#fa6e32;">if </span><span>(</span><span style="font-style:italic;color:#55b4d4;">Array</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">isArray</span><span>(modOrToken)) {
</span><span> </span><span style="color:#fa6e32;">for </span><span>(</span><span style="color:#fa6e32;">const </span><span>def </span><span style="color:#ed9366;">of </span><span>modOrToken) {
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>registry</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">set</span><span>(def</span><span style="color:#ed9366;">.</span><span>token</span><span style="color:#61676ccc;">, </span><span>def</span><span style="color:#ed9366;">.</span><span>provider)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> } </span><span style="color:#fa6e32;">else </span><span>{
</span><span> </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>registry</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">set</span><span>(token</span><span style="color:#61676ccc;">, </span><span>provider)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span><span>
</span><span style="font-style:italic;color:#abb0b6;">// And then we can create and use a `Module`:
</span><span style="color:#fa6e32;">const </span><span>ConfigModule </span><span style="color:#ed9366;">= </span><span>[
</span><span> { token</span><span style="color:#61676ccc;">: </span><span>MyConfig</span><span style="color:#61676ccc;">, </span><span style="color:#f29718;">provider</span><span style="color:#61676ccc;">: </span><span>() </span><span style="color:#fa6e32;">=> </span><span style="color:#86b300;">"module config value" </span><span>}
</span><span>]</span><span style="color:#61676ccc;">;
</span><span>container</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">register</span><span>(ConfigModule)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>Here is another <a href="https://www.typescriptlang.org/play/#code/MYGwhgzhAEAqD2BrApgOwDywHzQN4F8AoAFwE8AHZaABQCd4A3ASwBNlbMcBeaACmHipiYJqnYAuaAGFBw0ewCU0LjlgBuQoQD0WuEjQwwqFjXrM2tGAHN40ABZGWAWlFOHxgDTQQyYoEwCGGAwEBBoJmJoMGgAAwARZAAzUXCmQWiAOkJRYnYEsGAqeKTUFMFOPEJoaGJ9VEkEFAxsDSryM1YJU0YOjmbCIh1oAEEYgFl4FgBXH2iwmGQoNGImYJBSaAArSYgIqJAmHeh4BOg2YtLUCEyySmhxqZ9lYdpaMFJ0IuTlsqNSLCwNIRQJAYDIhCIxLQKlUtAAqWFwOxUWjIKwHYi0dZ2eAgFgwYhImJ0boWaIwFCkZAmABG62iDTQZPS0FhWkq0DaTAYYBy0BRaJ2mKeYgA7ncwORMLV0L8sF5ieZ2DLUH8sLwFIDWpNqftgHzUeildheDVGvVpdgvG0SZ0FT1OApJASDi0OdrdfqBTlaLwALYTabIST3QOO6p2F3s8juph6-mG3pq-0sADytAZdTuAceAB89I1OFb2hYAPySO0WB3QqphE68IYvN7pA4N16kP0TNMZhRKXDsmvQBLwKH8QSHM5HE7Jru1Xv9gfhg7peOC0jpCC+XhndKmtBebfWxW0DXzqpEAf4aDIEAb6sD51XFcYtcb4gd1Pp2pFm20ACEJ4vecUWISZaFQRcIFdIh2ThBFYAjGARSYEJoDRBgqFIeBJhZWERQcYh4WgTDsJFIxiC8YjTkEPwIjEKlqlsIIURw7ERXhZlWSjGM9SsXxOBNWpzQLbAw1gO99RAsCIOXA1V3SXi313VAFF-E0EI1aBBgAORTWAAFEnQQ6BfTecN6BFX92SIaDslyfIqAAZXYZgCmrId4EkQVRCsDRoMGT4xGgCB4F9KgIEoYAmCSPVnVQKwYEo0ihAYq8AA9yHgDdMgES4IlGUgwSSKxhWQMUM3QLy4rVE8csOfKnNoFyqB4UV8zQdAGqa6rAWBKBpEEYBgOQTrYyoPsqlqjFJmAGofWjHVY0HeAPKCjFvN7azNEm6AcrkSESrFME9vYdVAUGAB1KggnA7YqCiHYeUW7kQEmZAgVkCF2Bkr0TvywqmCsLx1WUHAACJfXWHKiugZ7XtBk9tF0AAJdhkC8LCoWiCt2FmSbaGm4hDGgVrYaoEziGACM4vDKhohGgpZls2g8gKDxEciYxoFu-FCViABJHaPvkKEalQ3wOfWNhKGMNBgHWUnMkGIYrAhLwxHCJERd5gXds+qFwg3EATlQYcaa5jcoRYeAFmJ+BaOQejRaY5BsqFyFvoTXh2Xq5zRrZmFdD5k4RSoJKIlFhx0MiILvJ8GpUC8EOdqMTZtgiEVXnICDB0mVAZtScCkIJIKQregPoEQUQTGOGJQv9JgAC9kGiROI0puZvESXZDHA5BUoKWhItvIcteRZAwAsTIJrd9gQeJ0r+rzob6eQUdwWF+TNz+wQip7QgEaVzmCTQaAk+unaUR5O7Odu6PohDGZxHe3LF6Kh-mugABtdlcAYs07gKjvAG34jySGBioaA4NsxXSAcVUmoNoBEAALoaF1hvJ8J1-pWHfgfXQ8EDin2QqEcADdkKQ0vryKIrVRAPTzkGZ+hwIYryeGg92CleA+0aqNGqY4cQuxAPAKwfpSArxPEAA">Playground Link</a>.</p>
<p>There you go! A <code>Module</code> here is just an opaque set of Providers. Again, the
benefit is that you don’t need to care about what is actually in it.</p>
<p>You can use it to group multiple configuration values together, or to mock
<em>all the things</em> at once.</p>
<p>There is one last pitfall though: we only have one registry per container, and
the way it is implemented means that whatever you get now depends on the
<strong>order</strong> in which you register things. And there could be potentially nasty
surprises depending on your modules, because, well, modules are supposed to be
opaque. In more complex setups you would still have to manually tune
the registration order.</p>
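<p>The ordering pitfall can be sketched in a few lines. This is a minimal toy container (the <code>Token</code> and <code>Container</code> names here are made up for illustration, not the API from the post), where the last provider registered for a token silently wins:</p>
<pre data-lang="ts" class="language-ts"><code class="language-ts" data-lang="ts">// A phantom-typed token so lookups stay type-safe.
interface Token&lt;T&gt; {
  readonly name: string;
  readonly _type?: T;
}

class Container {
  private registry = new Map&lt;string, () =&gt; unknown&gt;();

  register&lt;T&gt;(token: Token&lt;T&gt;, provider: () =&gt; T): void {
    // Later registrations simply overwrite earlier ones.
    this.registry.set(token.name, provider);
  }

  get&lt;T&gt;(token: Token&lt;T&gt;): T {
    const provider = this.registry.get(token.name);
    if (!provider) throw new Error(`no provider for ${token.name}`);
    return provider() as T;
  }
}

const Config: Token&lt;string&gt; = { name: "config" };
const container = new Container();
container.register(Config, () =&gt; "defaults");
container.register(Config, () =&gt; "mocked"); // a module registered later wins
container.get(Config); // "mocked"
</code></pre>
<p>Which module's provider you end up with depends entirely on registration order, exactly the kind of surprise described above.</p>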
<p>Anyway, this is it. Actually writing this blog post showed me even more how
simple the basic concepts behind DI actually are!</p>
<p>It does get more complicated if you want a richer API, which is <code>async</code>,
supports circular dependencies (please don’t), and if you want to handle
dependencies and memoization in the <code>Container</code> itself.</p>
Rust 2021: Confidence (2019-11-05)
https://swatinem.de/blog/rust-2020/
<p>It is that time of year again. The Rust Team
<a href="https://blog.rust-lang.org/2019/10/29/A-call-for-blogs-2020.html">is soliciting ideas</a>
about Rust’s roadmap for the coming years, so here goes mine.</p>
<p>Thinking about the tagline I would give the coming editions, it is this:</p>
<ul>
<li>Rust 2015: <strong>Stability</strong></li>
<li>Rust 2018: <strong>Productivity</strong></li>
<li>Rust 2021: <strong>Confidence</strong></li>
</ul>
<p>I will explain what this means to me, but let’s digress a bit first.</p>
<h1 id="about-me"><a class="anchor-link" href="#about-me" aria-label="Anchor link for: about-me">#</a>
About me</h1>
<p>I would describe myself as a Rust developer by heart, but a TypeScript developer
by profession. What does this mean?</p>
<p>Well, I have been <em>watching</em> the Rust project ever since the 2010 Mozilla Summit,
when Graydon Hoare presented a new programming language in a small overcrowded
room full of curious developers.
(Was this the first ever <em>public</em> announcement? I’m not sure…)</p>
<p>I have been <em>following along</em> since Rust still had a <code>libuv</code>-based runtime, and
the syntax was full of strange sigils.
And I am super excited about the progress made since then, and all the great
things the language itself, and especially the community have done.</p>
<p>But there is one problem though: So far, I have been mostly idly standing by,
since apart from some tiny PRs here and there, I don’t really write any
<em>serious</em> Rust. <em>Why is that?</em></p>
<h1 id="the-social-perspective"><a class="anchor-link" href="#the-social-perspective" aria-label="Anchor link for: the-social-perspective">#</a>
The social perspective</h1>
<p>Well, simply put, there haven’t been any real employment opportunities so far.
I feel that, so far, Rust has been a kind-of grassroots, bottom-up movement.
Most developers I meet have at least heard of Rust, and some are also
actively experimenting with it. But so far I have only met resistance when talking
about Rust to potential employers.</p>
<p>As developers, I would say we have an intrinsic urge to try and experiment with
exciting new tech. Companies have other priorities.</p>
<p>It is only now that I start to see demand from enterprise for engineers that
are working with Rust. However, it is still far from where I would like it to be.
I see a few reasons for that, and most of them have to do with <em>confidence</em>.
Over my professional career, I have mostly worked in smaller companies or teams,
around 2-5 people. Only recently have I been part of a larger team of 15-20
engineers, spread across multiple countries, with a diverse set of backgrounds
and skills. I was in a lead position, mostly focused on product quality,
developer happiness and mentoring. One internal challenge was to gradually
introduce static typing in the form of TypeScript, to teach and educate the
rest of the team, and to see first-hand the challenges others have with it.
I would also like to thank my former colleagues for all the great feedback I got :-)
While I haven’t done any direct management duties, I have learned a lot about
the bigger picture.</p>
<p>See, one priority of a company focused on a long-term project is to actually find
engineers that can maintain a project over the years to come, and deliver
features in a timely fashion with reasonable quality.</p>
<p>For other languages, there is a huge pool of engineers to hire from, but I do
have the feeling that most of them are quite novice. The cynic in me says that,
well, companies get what they pay for. And the salaries here in Vienna are not
that great, with most companies not willing to pay good money for good engineers.</p>
<p>There is a joke about Java, that it makes both experts and novice engineers into
mediocre engineers. I do feel the same about the JS community by now, where it
is very easy to find <em>some</em> engineers, but really hard to find truly <em>good</em> ones.</p>
<p>I am hopeful that this will be different with Rust. I just have the feeling that
both the language itself, as well as the larger community produce some really
<em>high quality</em> software, and just do things <em>right</em>. There is a lot of inspiration
that I take from the Rust ecosystem that I try to apply to TypeScript where
possible.</p>
<p>I do feel that knowing/following Rust has made me a better developer in general.
And this is the hope that I have for the future; that Rust <em>empowers</em> people
to become better.</p>
<p>Instead of lamenting the steep learning curve, and fighting the compiler, we
should rather see it as an opportunity to become better engineers.</p>
<hr />
<p>Hm, I might have digressed a little.
The point I wanted to make is that companies
need to have the <em>confidence</em> to find great talent that works with Rust, and
to have the <em>confidence</em> that betting on Rust is a good choice.
As a developer, I want to have the <em>confidence</em> that learning Rust is a great
investment. That there are good employment opportunities, and that learning
Rust actually makes me a better engineer :-)</p>
<p>And well… maybe employers will someday realize they need to pay good salaries
to get good people :-)</p>
<h1 id="the-ecosystem-perspective"><a class="anchor-link" href="#the-ecosystem-perspective" aria-label="Anchor link for: the-ecosystem-perspective">#</a>
The ecosystem perspective</h1>
<p>From the ecosystem perspective, I can still see that large parts of it are
still very immature and experimental.
It is not as bad as JS, where there are memes about having a new framework every
day of the week.
For some usecases though, I feel there is just too much choice right now.</p>
<p>I want to be <em>confident</em> that the framework / library that I pick will be well
supported in the future, and that it is high quality.</p>
<p>I also want to be <em>confident</em> that whatever I do is correct. Here, it is very
important to have good defaults.</p>
<p>One specific thing I can point out that can be improved here is <code>wasm</code>
optimization. I see quite a few guides about this, which mess with compiler flags
and a variety of external tools to make wasm binaries smaller.
TBH, I don’t feel <em>confident</em> doing all that by hand. I want wasm code to be optimal
by default, and to integrate well into the host system’s exception mechanism, etc.</p>
<h1 id="the-technical-perspective"><a class="anchor-link" href="#the-technical-perspective" aria-label="Anchor link for: the-technical-perspective">#</a>
The technical perspective</h1>
<p>From a technical perspective, I feel that Rust is generally on the right track.
Great progress has been made so far, and is still underway. Now it is just a
matter of getting things over the finish line. I will just dump a list of
projects here that are already well defined and I’m excited to read more progress
on them: rust-analyzer, salsa, demand-driven compiler, chalk, polonius, async in
traits, wasi, wasm interface types, etc…</p>
<p>I would like to thank the Rust teams that have recently started to give regular
status updates. It is really great to follow along!</p>
<p>Also, please do take your time! Good things need time! I wish for the Rust developers
to be <em>confident</em> in being able to deliver high quality features when they are
ready, without any time pressure :-)</p>
<h1 id="tldr"><a class="anchor-link" href="#tldr" aria-label="Anchor link for: tldr">#</a>
TLDR</h1>
<p>I want Rust to become the <em>obvious</em> choice when deciding on the technology for
a new project or a rewrite.</p>
<p>For this, I wish that enterprises can be <em>confident</em> to find good talent,
to be <em>confident</em> that the libraries and frameworks they choose are high quality
and well maintained.
I want both newcomer and experienced developers to be
<em>confident</em> that learning Rust is a great investment, and have the <em>confidence</em>
that they might find employment working with Rust.</p>
<p>Looking into the future, I hope that by 2021, Rust will be <strong>mainstream</strong>, and
not only a niche thing that only enthusiasts know about.</p>
Comparing Cypress and Puppeteer (2019-10-08)
https://swatinem.de/blog/cypress-puppeteer/
<p><em>Note:</em> I actually wrote most of this post 2 months ago when I did a deep dive
into comparing <a href="https://www.cypress.io/">cypress</a> and <a href="https://pptr.dev/">puppeteer</a>. Unfortunately I cannot give a clear
recommendation on either. You will have to make up your mind yourself, but I hope
I can help a bit by presenting the learnings I did with this experiment,
so here goes…</p>
<h1 id="cypress-api"><a class="anchor-link" href="#cypress-api" aria-label="Anchor link for: cypress-api">#</a>
Cypress API</h1>
<p>First off, one of the main selling points of cypress is that it is a very
convenient all-in-one solution for e2e testing.
It comes with its own way to structure your tests, and with its own test runner.</p>
<p>The tests themselves are also written in a very special way, which certainly
seems very strange and not quite intuitive at times.</p>
<p>Part of the confusion comes from the fact that cypress hides that
every interaction with the website is by definition asynchronous.</p>
<p>Cypress test code might frequently look like this:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span>cy</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(</span><span style="color:#86b300;">"#A"</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">click</span><span>()</span><span style="color:#61676ccc;">;
</span><span>cy</span><span style="color:#ed9366;">.</span><span style="color:#f07171;">get</span><span>(</span><span style="color:#86b300;">"#B"</span><span>)</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">should</span><span>(</span><span style="color:#86b300;">"be.visible"</span><span>)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>First of all, I personally dislike the usage of <code>jQuery</code> for selectors and basically
everything else. Second, it uses <code>chai-jQuery</code> in the background, and the way it does
that is horrible: cypress’ <code>should</code> method basically takes the chai chainable
<strong>as a string</strong>. So long, type checking; hello, typos!
I have also seen the usage of <code>.then()</code> inside some tests. But guess what, while
you can <code>return</code> a <code>Promise</code> from that function, the return value of that
function itself is <strong>not</strong> a <code>Promise</code>. You can’t <code>await</code> that.</p>
<p>It is just now that I write this blog post that I begin to understand how
cypress actually works. Essentially, you just can’t think of the test code you
write as real <em>code</em> that is <em>executed</em>. Rather, the code you write just
defines a list of commands and assertions. And cypress is free to run them and
re-run them however it sees fit. Wow, mind blown.</p>
<p>This is extremely unintuitive for someone who is used to writing imperative async
code. You have no control over what is actually run when, and in which context.</p>
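<p>That mental model can be sketched as a toy command queue (a hypothetical <code>Cy</code> class, nothing like Cypress’s actual internals): each call merely records a command, and a separate runner later executes them, free to retry each step.</p>
<pre data-lang="ts" class="language-ts"><code class="language-ts" data-lang="ts">class Cy {
  private queue: { label: string; run: () =&gt; Promise&lt;void&gt; }[] = [];

  get(selector: string): this {
    // No DOM access happens here; only the intent is recorded.
    this.queue.push({ label: `get ${selector}`, run: async () =&gt; {} });
    return this;
  }

  // The runner later drains the recorded commands in order.
  async replay(): Promise&lt;string[]&gt; {
    const log: string[] = [];
    for (const { label, run } of this.queue) {
      await run(); // a real runner would retry here until a timeout
      log.push(label);
    }
    return log;
  }
}

const cy = new Cy();
cy.get("#A");
cy.get("#B");
// Only now does anything actually "happen":
cy.replay().then((log) =&gt; console.log(log)); // ["get #A", "get #B"]
</code></pre>
<p>Seen this way, the test file is a program that <em>builds</em> a test, not one that <em>runs</em> it, which is why <code>await</code>-based intuitions break down.</p>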
<h1 id="puppeteer-api"><a class="anchor-link" href="#puppeteer-api" aria-label="Anchor link for: puppeteer-api">#</a>
Puppeteer API</h1>
<p>Puppeteer is <strong>very</strong> different! To start with, puppeteer is not a testing
framework. It is an <strong>imperative, async</strong> API to automate a browser. Everything
else is up to you, such as using it for building e2e tests.</p>
<p>In this sense, the puppeteer API makes <strong>a lot more sense</strong>. You know exactly
<em>what</em> runs <em>when</em>, and you have control over the <code>context</code> your code runs in.
You have code which is evaluated <strong>inside</strong> the browser frame, and your normal
testing code runs <strong>outside</strong> the browser.</p>
<p>The distinction, at least to me, is very clear and <em>just makes sense</em>. However,
there are also some limitations and pitfalls to be aware of.
Puppeteer has so called <em>page functions</em> which are evaluated in the context of
the page/frame. But they are defined in your code just like normal JS functions.
But they <strong>can’t</strong> reference any values from their containing scope!
You have to explicitly pass everything as additional parameters.
It can also lead to surprising and unintuitive errors when using
<a href="https://github.com/facebook/jest/issues/7962">code coverage</a>. At least there are
well-documented workarounds for this,
and with time you will get used to spotting and treating page functions
differently.</p>
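<p>The closure restriction exists because a page function is shipped to the browser as source text, together with only its explicitly passed arguments. Here is a toy model of that behavior (<code>runInPage</code> is a made-up stand-in for <code>page.evaluate</code>, and JSON is a crude stand-in for the real argument serialization):</p>
<pre data-lang="ts" class="language-ts"><code class="language-ts" data-lang="ts">function runInPage&lt;A extends unknown[], R&gt;(
  pageFn: (...args: A) =&gt; R,
  ...args: A
): R {
  // Re-creating the function from its source text discards its closure,
  // just like serializing it to another process would.
  const detached = new Function(`return (${pageFn.toString()})`)() as (
    ...a: A
  ) =&gt; R;
  // Only the explicitly passed arguments travel along (copied, not shared).
  return detached(...(JSON.parse(JSON.stringify(args)) as A));
}

const greeting = "hello";
// Works: the value is passed explicitly as an argument.
runInPage((g: string) =&gt; g.toUpperCase(), greeting); // "HELLO"
// Would throw: `greeting` is not visible inside the detached function.
// runInPage(() =&gt; greeting.toUpperCase());
</code></pre>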
<p>There are two more pain points with the puppeteer API.
The first one is that the API is <em>too</em> async. You know you can extend Native
Promises to offer a conveniently chainable API right? Maybe I will blog about
that separately.</p>
<p>The other annoyance is that the API feels a bit inconsistent at times. Up until
<code><1.20</code>, some convenience methods like <code>page.select(selector, ...options)</code>
and <code>page.click(selector)</code> were only available on the <code>Page</code> object. It would
make a lot more sense to provide such helpers on a generic <code>Parent</code> or <code>Container</code>
type, which could be used to scope everything to a DOM subtree, such as a modal
dialog, because right now such scoping is a huge pain.</p>
<p>Combining such lazy chainable Promises with a <code>Container</code>-focused API, I could
imagine an API like this:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">await </span><span>page
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f29718;">$</span><span>(</span><span style="color:#86b300;">"#my-modal"</span><span>)
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f29718;">$</span><span>(</span><span style="color:#86b300;">".some-other-container"</span><span>)
</span><span> </span><span style="color:#ed9366;">.</span><span style="color:#f07171;">click</span><span>(</span><span style="color:#86b300;">".nested-child"</span><span>)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>As you can see, it is absolutely possible to chain methods onto a custom Promise
type and just await the final result.</p>
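<p>A minimal sketch of such a lazily chainable, awaitable wrapper (the <code>Chain</code> class and <code>FakeNode</code> tree are made up for illustration; real puppeteer <code>ElementHandle</code>s work differently):</p>
<pre data-lang="ts" class="language-ts"><code class="language-ts" data-lang="ts">// Tiny in-memory node tree standing in for the DOM, so this runs anywhere.
interface FakeNode {
  id: string;
  children: FakeNode[];
}

class Chain {
  constructor(private readonly node: Promise&lt;FakeNode&gt;) {}

  // Queue a child lookup; nothing resolves until the caller awaits.
  $(selector: string): Chain {
    return new Chain(
      this.node.then((n) =&gt; {
        const hit = n.children.find((c) =&gt; `#${c.id}` === selector);
        if (!hit) throw new Error(`not found: ${selector}`);
        return hit;
      }),
    );
  }

  // Implementing `then` makes the whole chain directly awaitable.
  then&lt;R&gt;(
    onFulfilled?: (n: FakeNode) =&gt; R | PromiseLike&lt;R&gt;,
    onRejected?: (err: unknown) =&gt; R | PromiseLike&lt;R&gt;,
  ): Promise&lt;R&gt; {
    return this.node.then(onFulfilled, onRejected);
  }
}

const page: FakeNode = {
  id: "page",
  children: [
    { id: "my-modal", children: [{ id: "nested-child", children: [] }] },
  ],
};

// Usage: chain lookups freely, await only the final result.
new Chain(Promise.resolve(page))
  .$("#my-modal")
  .$("#nested-child")
  .then((n) =&gt; console.log(n.id)); // prints "nested-child"
</code></pre>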
<p>But with great power also comes great responsibility. You definitely have more
control with puppeteer, but you also have to take care to correctly use it.
While cypress automatically retries commands until it hits a timeout, with puppeteer
I had to insert explicit <code>.waitForSelector</code> or <code>.waitForNavigation</code> calls quite
frequently.
While this might be tedious and inconvenient, it also makes sense. And it kind
of highlights that you should actually optimize your app for more instantaneous
interactions :-)</p>
<h2 id="running-non-headless"><a class="anchor-link" href="#running-non-headless" aria-label="Anchor link for: running-non-headless">#</a>
Running Non-Headless</h2>
<p>One big issue I had with puppeteer was the fact that it is basically a different
browser depending on if you run it headless vs when you really have a browser
window.</p>
<p>One problem was <strong>language</strong>. Running a headless puppeteer apparently has no
language at all. I don’t really know what <code>Accept-Language</code> header it provides,
but express’ <code>.acceptsLanguages()</code> turns it into <code>["*"]</code>, which revealed
<a href="https://github.com/eversport/intl-codegen/issues/39">a bug</a> in a library that I
maintain, which I then promptly fixed.</p>
<p>Anyhow, <em>headless</em> puppeteer has <em>no</em> language by default, and setting it needs
to be done via the startup parameter <code>--lang=en</code>.
But that on the other hand does not work with <em>non-headless</em> puppeteer, which
instead uses the <code>LANG</code> environment variable.
Well, this took me some time to figure out, so as a recommendation, you better
set an explicit language via both the startup parameter and the environment variable.</p>
<h2 id="working-with-multiple-tabs"><a class="anchor-link" href="#working-with-multiple-tabs" aria-label="Anchor link for: working-with-multiple-tabs">#</a>
Working with multiple tabs</h2>
<p>One thing that puppeteer supports, which is not possible in cypress is to run
multiple <code>Page</code> objects / tabs in parallel, well kind of.
In <em>headless</em> puppeteer, you can do that mostly without problems. But when
running in <em>non-headless</em> mode, only the currently focused foreground tab will
actually do anything. Background tabs will just hang indefinitely.
To make it work, you will have to call <code>page.bringToFront()</code> every time you want
to switch focus between pages. And of course make sure that your testing
framework of choice does not run multiple tests in parallel.
This has also caused me a lot of headaches. Depending on what you want to
test, and what testing tools you are using, it might not be worth the hassle to
use multiple pages.
So essentially when your usecase is to run E2E tests, you should try to work with
only one page object / tab.</p>
<h2 id="integrating-with-jest"><a class="anchor-link" href="#integrating-with-jest" aria-label="Anchor link for: integrating-with-jest">#</a>
Integrating with Jest</h2>
<p>Speaking of tools, since we use Jest for all of our other tests, I thought it
would be a good idea to stick with it, since most engineers are already
familiar with it. So I went ahead and set up <a href="https://github.com/smooth-code/jest-puppeteer">jest-puppeteer</a>, which was a bit
tedious but otherwise quite straight forward.</p>
<p>Since we have other limitations about testing a website running in a separate
process, and the tests not being independent of each other in the first place,
I went with running jest with <code>--runInBand</code> anyway.
But coming back to what I just said about being limited to only one tab that has
focus, I’m not quite sure how <em>non-headless</em> mode would actually work with the
normal way that jest splits up tests into multiple worker processes.</p>
<p>Oh, and I also
<a href="https://github.com/smooth-code/jest-puppeteer/issues/272">filed a bug</a> with
<code>expect-puppeteer</code> which fails to work when using a different <code>Page</code> instance.</p>
<p>Another really severe bug I found was in jest itself, which just
<a href="https://github.com/facebook/jest/issues/8688">ignores thrown errors in async <code>beforeAll</code></a>.
Wow!</p>
<p>What I also noticed is that sometimes the stack traces of errors are just swallowed up
somehow. Not sure why, but I get hit by the infamous
<code>Node is either not visible or not an HTMLElement</code> quite often without knowing
which command, selector or element is responsible because the error has no stack trace.
This makes it a nightmare to debug. Especially if things run fine for 90% of the
time when run locally but fail all the time when run on CI.</p>
<p>But the problem of unreliable and flaky tests can happen with any tool. Both the
website you are testing and the test code itself need to be written
in a way that either minimizes random failures, or explicitly accounts for them.</p>
<h1 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h1>
<p>It has definitely been a bumpy ride, but I learned a lot. In the end I am still
quite disappointed with the current state of tools.
I am also still not very confident in all of this considering that it took quite
some time to get tests to pass on CI that were successful locally.</p>
<ul>
<li>
<p>To summarize, puppeteer definitely has the more intuitive imperative async API.</p>
<p>IMO, a declarative test syntax such as cucumber can make a lot of sense, but
not when you are writing JS code and really expect things to be imperative.</p>
</li>
<li>
<p>Puppeteer is a lot less opinionated, so you can use whatever test runner and
assertion library you want.</p>
<p>Which ofc means that you have to invest time into that. Also, I am not quite
happy with <code>jest-puppeteer</code> and <code>expect-puppeteer</code>, so I might recommend to
just roll your own.</p>
</li>
<li>
<p>Puppeteer also forces you to be more explicit, especially around the
different cases of <code>waitForXXX</code>. While this might be more tedious at first,
I think in the end it’s a good thing to think about this, and to optimize the
app under test itself to avoid long wait times.</p>
</li>
<li>
<p>Really think about if you want to use multiple tabs. It might not be worth the
hassle. Here also the app we are testing is the problem, because the app loses
state when you refresh the page or open the same URL in a different tab.</p>
</li>
<li>
<p>The debugging experience, at least with the other tools I use puppeteer with
is horrible. As I said in the previous section, I had to struggle a lot with
errors that had absolutely no context, which makes it impossible to debug.</p>
</li>
<li>
<p>In the end, the choice is yours. I might still prefer cypress when the goal is
to write E2E tests. The visual test runner that you can pause, and inspect the
real DOM is really convenient to write and debug testcases. The automatic video
recordings also add incredible value for tests run on CI.</p>
</li>
<li>
<p>I just wish cypress had made better technological choices. I mean, it
largely uses jQuery and is itself still largely written in coffeescript, FFS.
Also, it does not yet have any predictable release schedule.
I hope Electron’s move to a predictable release schedule will propagate to all
the projects that depend on it.</p>
</li>
<li>
<p>Oh, and another note on cypress is that its pricing for its premium service
is based around the number of <em>test recordings</em>, which they define as:
<code>[…] each time the it() function is called […]</code>. This definition is totally
broken because you can easily game it by just putting <em>everything</em> into one
giant <code>it</code> function, which runs contrary to the general notion of keeping your
test cases as small as possible. A much better metric would be total runtime
or something like that, since they save all your video recordings which scale
with the runtime of your tests.</p>
</li>
</ul>
Lets talk about Pagination (2019-08-06)
https://swatinem.de/blog/pagination/
<p>Just recently, I have had to tackle some quite challenging problems related to
pagination. And during some discussions I noticed that some concepts are not
yet clear to some engineers. So I will try to explain all of these.</p>
<p>I will focus on both <strong>page-based</strong>, as well as <strong>cursor-based</strong> pagination,
and explain how both methods work. I will also focus on some common
questions an API consumer might have, and how easy it is to implement, with a
special look at SQL. And lastly, I will look at how these pagination methods
behave when the data itself changes in certain ways.</p>
<h1 id="prerequisites"><a class="anchor-link" href="#prerequisites" aria-label="Anchor link for: prerequisites">#</a>
Prerequisites</h1>
<p>To make any kind of pagination useful at all, we need to have a sorted list of
entities with some kind of stable sorting order. We do not want the entities to
randomly re-order, unless we explicitly change entities inside of that list.</p>
<p>Let’s try to visualize such a list, so behold my awesome ascii box drawing skills:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>E = Entity, K: Sort Key, W: Pagination Window, C: Cursor
</span><span> ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>E: │ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span> ├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>K: │ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span> └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> └ W ┘ ↑
</span><span> C:9
</span></code></pre>
<p>So here we have 11 <em>Entities</em>, sorted by their <em>Sort Key</em>. And we have a view into
that list, which I call the <em>Pagination Window</em>. I will also talk about a
left-to-right traversal order, so anything <em>left</em> in the drawing can also be
called <em>before</em> or <em>front</em>, and likewise <em>after</em> or <em>end</em> for the things on
the right.
Also, to make things easier, I will assume a <em>Pagination Window</em> of <code>2</code> elements.
And we assume that we are walking this list from front to back.</p>
<p>We also have some meta information that we want to know about, as stated above.</p>
<ul>
<li>Are there <em>any</em> items <em>before</em> the pagination window, and <em>how many</em>?</li>
<li>Are there <em>any</em> items <em>after</em> the pagination window, and <em>how many</em>?</li>
<li>Can I <em>jump</em> to an <em>absolute</em> or <em>relative</em> position in the list? <em>Absolute</em>
here means to the item <em>at position N</em>, and <em>relative</em> means to the item
<em>with key K</em>.</li>
</ul>
<p>I will also consider a few mutations to the list, such as:</p>
<ul>
<li>Adding a new entity in <em>front</em> of the pagination window</li>
<li>Removing an entity <em>before</em> the pagination window</li>
<li>Moving an entity</li>
</ul>
<p>As we are walking the list front to back, adding/removing entities at
the <em>back</em> of the list is neither observable, nor do we really care about it,
unless it’s a <em>move</em>.</p>
<h1 id="page-based-pagination"><a class="anchor-link" href="#page-based-pagination" aria-label="Anchor link for: page-based-pagination">#</a>
Page-Based Pagination</h1>
<p>Implementing this kind of pagination is very simple, you just split the list
into equally-sized slices called pages. Obviously, the pagination window is the
page size. Depending on the implementation, you might also choose to allow an
arbitrary offset, but for simplicity we will not.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> └ 1 ┘ └ 2 ┘ └ 3 ┘ └ 4 ┘ └ 5 ┘ └ 6 ┘
</span></code></pre>
<ul>
<li>Answering <em>how many</em> items are <em>before</em> the pagination window is trivial:
It is exactly <code>(p - 1) * w</code> where <code>p</code> is the page number and <code>w</code> is the
page size.</li>
<li>Answering <em>how many</em> items are <em>after</em> the pagination window is trickier, as
we need the total number of items, <code>N</code>: <code>N - (p * w)</code>.</li>
<li>There is one neat shortcut if you only care about <em>if</em> there are <em>any</em> items
<em>after</em> the pagination window. Just select <code>w + 1</code> items, and if the resulting
slice has more than <code>w</code> items, you know there are items following the
pagination window. Just remove that superfluous item and you are done.</li>
<li>Jumping to an <em>absolute</em> position in the list is trivial, it is on page
<code>floor(p / w) + 1</code>, assuming zero-indexed position <code>p</code>.</li>
<li>Jumping to a <em>relative</em> position is non-trivial, it basically requires you to
do a binary search, usually in <code>O(log n)</code> time.</li>
</ul>
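<p>The first three points can be sketched over an in-memory list. The made-up <code>getPage</code> helper below mirrors what <code>LIMIT w + 1 OFFSET (p - 1) * w</code> would do in SQL, including the “select one extra item” trick:</p>
<pre data-lang="ts" class="language-ts"><code class="language-ts" data-lang="ts">function getPage&lt;T&gt;(items: T[], page: number, pageSize: number) {
  const offset = (page - 1) * pageSize;
  // Fetch one item more than needed; a too-long slice means more pages follow.
  const slice = items.slice(offset, offset + pageSize + 1);
  return {
    items: slice.slice(0, pageSize),
    hasPrev: page &gt; 1, // exactly (p - 1) * w items precede the window
    hasNext: slice.length &gt; pageSize,
  };
}

const entities = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K"];
getPage(entities, 3, 2); // { items: ["E", "F"], hasPrev: true, hasNext: true }
getPage(entities, 6, 2); // { items: ["K"], hasPrev: true, hasNext: false }
</code></pre>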
<h2 id="changing-data"><a class="anchor-link" href="#changing-data" aria-label="Anchor link for: changing-data">#</a>
Changing Data</h2>
<p>One problem with page-based pagination is how it reacts to changes in the data.</p>
<p>Let’s look at insertion and deletion <em>before</em> the pagination window.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> └ W ┘
</span><span> ↓ inserted here
</span><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ X│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 1│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> └ W ┘
</span></code></pre>
<p>As you can see, we moved one page right, but the whole list was shifted by
one position, which means we see <code>F</code> twice. Not good, not terrible.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> ↑ └ W ┘
</span><span> └ removed this
</span><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> └ W ┘
</span></code></pre>
<p>Again, the whole list was shifted, but in this case we have a more severe
problem, since element <code>G</code> was skipped completely.</p>
<h2 id="implementation"><a class="anchor-link" href="#implementation" aria-label="Anchor link for: implementation">#</a>
Implementation</h2>
<p>Implementing page-based pagination is really trivial, both in memory, as well as
in SQL, as you have the dedicated syntax <code>LIMIT</code> / <code>OFFSET</code> exactly for this
use case. However, jumping to a relative position is complex.</p>
<h1 id="cursor-based-pagination"><a class="anchor-link" href="#cursor-based-pagination" aria-label="Anchor link for: cursor-based-pagination">#</a>
Cursor-Based Pagination</h1>
<p>With cursor based pagination, you have a <em>Cursor</em> that points to a position
on the <strong>sort axis</strong>. With cursors, you can usually select items <em>before</em> and
<em>after</em> the cursor, since we focus on left to right traversal, we will only
consider items <em>after</em> the cursor.</p>
<p>Navigating using cursors is possible when, in addition to the item itself, we
return the item’s <code>cursor</code> as well. In the next step, we set our new cursor
to the cursor of the <em>last</em> item, and query again.</p>
<p>One popular example of cursor-based pagination is the
<a href="https://facebook.github.io/relay/graphql/connections.htm">relay connections</a>
specification which is popular when dealing with graphql.</p>
<h2 id="unique-cursors"><a class="anchor-link" href="#unique-cursors" aria-label="Anchor link for: unique-cursors">#</a>
Unique Cursors</h2>
<p>One important restriction cursor based pagination has is that the cursor itself
needs to be <em>unique</em>! Why? Let’s demonstrate with an example.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> ↑ └ W ┘
</span><span> C:7
</span><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> ↑ ↑ └ W ┘
</span><span> C:10
</span></code></pre>
<p>Here, first we were at cursor <code>7</code>, and selected 2 items <em>after</em> it. No problem
so far. The last item, <code>F</code>, has the cursor <code>10</code>.
Let’s move one step further, selecting 2 items <em>after</em> <code>F</code>, using its cursor <code>10</code>.
Now we have the problem that we skipped <code>G</code>, because its cursor duplicates
that of <code>F</code>.</p>
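<p>The skip is easy to reproduce in code (a TypeScript sketch using the data from the diagram, with an exclusive “strictly after” selection):</p>

```ts
const items = [
  { id: "E", cursor: 9 },
  { id: "F", cursor: 10 },
  { id: "G", cursor: 10 }, // duplicate cursor!
  { id: "H", cursor: 15 },
  { id: "I", cursor: 20 },
];

// Select up to `w` item ids strictly after the cursor.
const after = (cursor: number, w: number) =>
  items.filter((it) => it.cursor > cursor).slice(0, w).map((it) => it.id);

console.log(after(7, 2));  // → ["E", "F"]  (no problem so far)
console.log(after(10, 2)); // → ["H", "I"]  ("G" was silently skipped)
```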
<p>So depending on your use case, if the property you want to sort by is not unique,
such as a typical <em>modification time</em>, you have to combine it with another
property that is, such as a <em>uuid</em>.</p>
<h2 id="getting-at-metadata"><a class="anchor-link" href="#getting-at-metadata" aria-label="Anchor link for: getting-at-metadata">#</a>
Getting at metadata</h2>
<p>Answering some meta questions becomes a lot more complicated with cursor based
pagination.</p>
<ul>
<li>Knowing <em>how many</em> items are <em>before</em> or <em>after</em> the pagination window
requires you to <em>actually count them</em>.</li>
<li>Depending on how the cursor itself is implemented, it might be easier to
count <em>all</em> the items and use math: <code>before = all - after - w</code>.</li>
<li>The same shortcut to know <em>if</em> there are any items <em>after</em> the pagination
window also applies to cursor-based pagination. Just select <code>w + 1</code> elements,
and if the result set has more than <code>w</code>, there <em>are</em> items <em>after</em> the
pagination window.</li>
<li>I also found a shortcut to know <em>if</em> there are any items <em>before</em> the cursor.
Here we rely on the fact that the cursor itself needs to be unique. So instead
of selecting only items following the cursor, I use an <em>inclusive</em> selection
which might also return the item that <em>exactly</em> matches the cursor. If it does,
I know there are items <em>before</em> the cursor. Also make sure to remove this
superfluous item. If there is no item matching the cursor <em>exactly</em>, you have
to fall back to counting however.</li>
<li>Jumping to a <em>relative</em> position in the list is essentially the same as normal
cursor based navigation. Your cursor just happens to be the relative position
you want to jump to.</li>
<li>Jumping to an <em>absolute</em> position however is more complex. I think it can be
done with a binary search, but each iteration would need to answer the question
<em>what is my absolute position</em>, which itself is not a trivial question as shown
above.</li>
</ul>
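<p>The two <em>if</em>-shortcuts from the list above can be sketched in TypeScript (in-memory, with plain numbers as cursors; the <code>pageAfter</code> helper is made up, and it assumes cursors are always taken from real items; if the cursor item no longer exists, you would have to fall back to counting as described above):</p>

```ts
interface Page { items: number[]; hasBefore: boolean; hasAfter: boolean }

function pageAfter(all: number[], cursor: number, w: number): Page {
  // Inclusive selection: may also return the item matching the cursor exactly.
  const selected = all.filter((v) => v >= cursor).slice(0, w + 2);

  // If the first item *is* the cursor, that item sits before the window,
  // so there are items before it; drop the superfluous item.
  const hasBefore = selected[0] === cursor;
  const items = hasBefore ? selected.slice(1) : selected;

  // Selecting (at least) one extra element tells us whether more items follow.
  const hasAfter = items.length > w;
  return { items: items.slice(0, w), hasBefore, hasAfter };
}

console.log(pageAfter([2, 3, 5, 7, 9, 10, 12], 7, 2));
// → { items: [9, 10], hasBefore: true, hasAfter: true }
```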
<h2 id="changing-data-1"><a class="anchor-link" href="#changing-data-1" aria-label="Anchor link for: changing-data-1">#</a>
Changing Data</h2>
<p>As the cursor denotes a <em>relative</em> position inside the list, it is immune to
changes in the data. Even removing the element under the cursor is not a problem.</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│10│12│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> └ W ┘
</span><span> ↑ removed this
</span><span>┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
</span><span>│ A│ B│ C│ D│ E│ G│ H│ I│ J│ K│
</span><span>├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
</span><span>│ 2│ 3│ 5│ 7│ 9│12│15│20│28│99│
</span><span>└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
</span><span> ↑└ W ┘
</span><span> C:10
</span></code></pre>
<p>Since the cursor of <code>F</code> is <code>10</code>, we can still query everything <em>after</em> <code>10</code>.</p>
<h2 id="implementation-1"><a class="anchor-link" href="#implementation-1" aria-label="Anchor link for: implementation-1">#</a>
Implementation</h2>
<p>Since cursors themselves give a <em>relative</em> position inside the list, implementing
this in-memory requires doing a binary search.</p>
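<p>A sketch of that binary search (TypeScript; an “upper bound” that finds the first element strictly greater than the cursor):</p>

```ts
// Index of the first element strictly greater than `cursor` in a sorted array.
function upperBound(sorted: number[], cursor: number): number {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] <= cursor) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const values = [2, 3, 5, 7, 9, 10, 12, 15];
const start = upperBound(values, 7);
console.log(values.slice(start, start + 3)); // → [9, 10, 12]
```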
<p>Implementation in SQL is also quite simple, at least for a simple non-compound
cursor: you just apply a simple <code>WHERE</code> condition.
I have found
<a href="https://stackoverflow.com/questions/38017054/mysql-cursor-based-pagination-with-multiple-columns/38017813#38017813">this stack overflow post</a>
to be really good at explaining how to deal with compound cursors.</p>
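<p>For a compound <code>(mtime, id)</code> cursor, the expanded condition can be sketched like this (TypeScript; the table and column names here are made up):</p>

```ts
// WHERE fragment for a compound (mtime, id) cursor. The expanded form
//   mtime > ?  OR  (mtime = ? AND id > ?)
// also works on databases without row-value comparisons.
function compoundCursorCondition(mtime: number, id: string) {
  return {
    sql: "(mtime > ?) OR (mtime = ? AND id > ?)",
    params: [mtime, mtime, id],
  };
}

const cond = compoundCursorCondition(10, "F");
// e.g. `SELECT * FROM items WHERE ${cond.sql} ORDER BY mtime, id LIMIT ?`
```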
<p>In general, after implementing cursor based pagination, together with the
optimizations explained above, I can say it is a lot of work, with a lot of
details you have to pay attention to.</p>
<h1 id="conclusion-and-recommendations"><a class="anchor-link" href="#conclusion-and-recommendations" aria-label="Anchor link for: conclusion-and-recommendations">#</a>
Conclusion and Recommendations</h1>
<p>Let’s recap some important points.</p>
<p>First of all, both approaches require you to have a stable sort order.
Furthermore, cursor-based pagination additionally needs a <em>unique</em>
sort key for each entity, which you also have to expose.</p>
<p>So for this reason, good performance requires that you either have the data
pre-sorted, or have some kind of index, especially in the case of SQL.</p>
<p>This also means that sorting by some computed property or one that is not indexed
can result in bad performance.</p>
<p>Cursor-based pagination makes <em>relative</em> jumps easier, while page-based
pagination makes <em>absolute</em> jumps easier.</p>
<p>Both methods make it easy to implement the typical <em>previous</em> / <em>next</em>
navigation. But cursor-based navigation makes it more resilient to changing data.</p>
<p>Like always, it is a matter of tradeoffs and use-cases.</p>
<ul>
<li>If you want to support <em>absolute</em> jumps or <em>page links</em>, go with <em>page-based</em> pagination.</li>
<li>If you want to support <em>relative</em> jumps, go <em>cursor-based</em>.</li>
<li>If you want more consistent data with typical <em>previous</em> / <em>next</em>
buttons, use <em>cursor-based</em> pagination.</li>
<li>If you want something that is easy both to use and to implement, and
possibly more performant, go <em>page-based</em>.</li>
</ul>
<p>And to summarize the problems around changing data:</p>
<ul>
<li>Considering left-to-right traversal, both are fine with <em>appending</em> data.</li>
<li><em>Removing</em> data can lead to skipped items when using <em>page-based</em> pagination.</li>
<li><em>Prepending</em> data can lead to <em>double</em> processing of items when using <em>page-based</em> pagination.</li>
</ul>
<p>Considering these limitations, when you are dealing with <em>prepend-only</em> data,
starting at the front again once you have finished traversing should ensure you
get <em>all</em> the data.</p>
<hr />
<p>After implementing both page-based and cursor-based pagination, and now that I
have written down all my thoughts about it, I do feel a bit more comfortable
with the choice we made for <em>cursor-based</em> pagination.
But for <em>one reason only</em>: the data we return depends on <em>the current time</em>, so
it can have unpredictable data removal, which can lead to missed items when
using page-based pagination as shown above.</p>
<p>Otherwise I would probably go with <em>page-based</em> pagination, for reasons of
performance and ease of implementation. This also reminds me to rant about the
fact that cursor-based pagination is really complicated to implement in SQL.</p>
<p>One thing we still struggle with is performance. Not because of the pagination
method we chose but rather that our sort order depends on a non-indexed compound
value. Meh. So we come full circle to the matter of caching and cache invalidation :-)</p>
Announcing intl-codegen 22019-07-10T00:00:00+00:002019-07-10T00:00:00+00:00
Unknown
https://swatinem.de/blog/intl-codegen-2/<p>I have been thinking for a long time about how <a href="https://github.com/eversport/intl-codegen">intl-codegen</a> 2 would look like,
and some time ago I went about implementing it.
Since then I have validated the concepts by migrating the <a href="https://eversports.com">eversports</a> codebase
to it. The migration was quite painless, with some mechanical steps.</p>
<p>Since it is a proper <em>version 2</em>, it does have some breaking changes, together
with some exciting features, so let’s dive in.</p>
<h1 id="changes-in-v2"><a class="anchor-link" href="#changes-in-v2" aria-label="Anchor link for: changes-in-v2">#</a>
Changes in v2</h1>
<h2 id="fluent-syntax-support"><a class="anchor-link" href="#fluent-syntax-support" aria-label="Anchor link for: fluent-syntax-support">#</a>
Fluent syntax support</h2>
<p>One very big item here is support for the <a href="https://projectfluent.org/">fluent</a> syntax. Well, limited support
that is. Fluent has some features that are not yet supported by <code>intl-codegen</code>,
but might be in the future. One example is missing support for <code>terms</code> and
referencing other messages. Another feature that is missing is support for fluent
<code>attributes</code>.</p>
<p><code>MessageFormat</code> is still supported, and has some nice improvements in this
release.</p>
<p>But in general, I consider fluent to be the better format, and I will
likely drop <code>MessageFormat</code> support at some point once the tooling around fluent
matures.</p>
<h2 id="proper-typing-support"><a class="anchor-link" href="#proper-typing-support" aria-label="Anchor link for: proper-typing-support">#</a>
Proper typing support</h2>
<p>Another very big item is support for proper types. Every placeholder that is
used in translations needs to be declared beforehand. The easiest way to do so
is via doc comments in fluent syntax.
There is a <a href="https://github.com/projectfluent/fluent/issues/140">proposal</a> to
properly add these type of comments to the fluent syntax, but it is not final
yet.</p>
<pre data-lang="fluent" style="background-color:#fafafa;color:#61676c;" class="language-fluent "><code class="language-fluent" data-lang="fluent"><span># $value (monetary)
</span><span>fluent-monetary = a monetary value: { $value }
</span></code></pre>
<p>So far, it supports the types <code>string</code>, <code>number</code>, <code>datetime</code>, <code>monetary</code> and
<code>element</code>. Together with these changes, there was also a split to separate the
message <code>template</code> declaration from the translations.</p>
<p>When using <code>MessageFormat</code>, there is an explicit API to declare messages and
the placeholder types.</p>
<p>Having type declarations improves the typescript side of things, giving
better code completion and errors.
But it also made it possible to better check the correctness of the translations
themselves.
At <a href="https://eversports.com">eversports</a>, we have a small team of translators, which are not engineers
and do struggle with the <code>MessageFormat</code> syntax a bit and sometimes translate
parts of the syntax itself.</p>
<p>I plan to further improve this, such as validating the plural rules, since
translators that struggle with the syntax actually translate the <code>one</code> or <code>other</code>
selectors.</p>
<p>Some examples of useful errors:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>test.tsx (15,24): Type 'number' is not assignable to type 'string'.
</span><span>[wrong-type: template/msgfmt-string-as-plural]: Messageformat `plural` selector is only valid for type "number", but parameter `param` has type `string`.
</span><span>> 1 | {param} {param,plural,
</span><span> | ^^^^^^^^^^^^^
</span><span>> 2 | one {parameter}
</span><span> | ^^^^^^^^^^^^^^^^^
</span><span>> 3 | other {parameters}
</span><span> | ^^^^^^^^^^^^^^^^^
</span><span>> 4 | }
</span><span>| ^^
</span><span>[missing-other: template/msgfmt]: MessageFormat requires an `other` case to be defined.
</span><span>> 1 | selector: {param, select, foo {its foo} bar {its bar}}.
</span><span> | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
</span></code></pre>
<p>All these checks have actually caught some real bugs, both in our typescript
code and especially inside of the translation strings themselves.</p>
<h2 id="proper-language-detection"><a class="anchor-link" href="#proper-language-detection" aria-label="Anchor link for: proper-language-detection">#</a>
Proper language detection</h2>
<p>One pain point with <code>intl-codegen 1.x</code> was that it was only able to load a
predefined list of locales, and one had to build language detection around it.
Version 2 now ships with a small runtime that uses <a href="https://www.npmjs.com/package/fluent-langneg">fluent-langneg</a> to do proper
language detection, based on either the <code>Accept-Language</code> header,
or the <code>navigator.languages</code> property.</p>
<h2 id="better-formatting-and-pluralization"><a class="anchor-link" href="#better-formatting-and-pluralization" aria-label="Anchor link for: better-formatting-and-pluralization">#</a>
Better formatting and pluralization</h2>
<p>Related to this, version 1 also had a severe design limitation, as it hardcoded
the formatting based on the translation. So it would use the same formatting
for the <code>de</code> language, even though the formatting differs quite a bit based on
the locale. When formatting monetary values, you have <code>1.234,56 €</code> for <code>de-de</code>,
<code>€ 1.234,56</code> for <code>de-at</code> and <code>CHF 1’234.56</code> for <code>de-ch</code>, but it is still the
same language.</p>
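<p>The built-in <code>Intl.NumberFormat</code> demonstrates exactly this locale dependence (the precise spacing and apostrophes depend on the ICU data of your runtime):</p>

```ts
const fmt = (locale: string, currency: string) =>
  new Intl.NumberFormat(locale, { style: "currency", currency }).format(1234.56);

console.log(fmt("de-DE", "EUR")); // something like "1.234,56 €"
console.log(fmt("de-AT", "EUR")); // something like "€ 1.234,56"
console.log(fmt("de-CH", "CHF")); // something like "CHF 1’234.56"
```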
<p>In version 2, the <code>loaded</code> language and the locale used for the <code>formatter</code> are
now decoupled, so you should always get the correct formatting for the locale
you requested.</p>
<p>Apart from formatting, version 2 also has proper support for pluralization. This
means you can use the <code>one</code> selector instead of an explicit <code>=0</code>, or any of the
other <strong>6</strong> cases. There is also support for <code>ordinal</code> cases. This all depends
on platform support for <code>Intl.PluralRules</code>, so the developer needs to provide
appropriate polyfills if support for old platforms is a priority.</p>
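<p>For illustration, <code>Intl.PluralRules</code> is the platform API that provides these categories:</p>

```ts
const cardinal = new Intl.PluralRules("en");
console.log(cardinal.select(1)); // "one"
console.log(cardinal.select(4)); // "other"

// English uses four of the six categories for ordinals: 1st, 2nd, 3rd, 4th.
const ordinal = new Intl.PluralRules("en", { type: "ordinal" });
console.log(ordinal.select(2)); // "two"
console.log(ordinal.select(3)); // "few"

// Other languages need more categories, e.g. Polish:
console.log(new Intl.PluralRules("pl").select(5)); // "many"
```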
<h2 id="split-out-react-support"><a class="anchor-link" href="#split-out-react-support" aria-label="Anchor link for: split-out-react-support">#</a>
Split out react support</h2>
<p>All the react-specific codegen is now output into a separate <code>react</code> file. So
it is possible to use <a href="https://github.com/eversport/intl-codegen">intl-codegen</a> without react, at least when you do not
declare <code>element</code>-type placeholders.</p>
<h1 id="some-recommendations-and-best-practices"><a class="anchor-link" href="#some-recommendations-and-best-practices" aria-label="Anchor link for: some-recommendations-and-best-practices">#</a>
Some recommendations and best practices</h1>
<p>There are some clear dos and don’ts that jump out at you when you are
involved with localization tools, which are not that obvious to other engineers.</p>
<h2 id="put-everything-into-translations"><a class="anchor-link" href="#put-everything-into-translations" aria-label="Anchor link for: put-everything-into-translations">#</a>
Put <em>everything</em> into translations</h2>
<p>I still often see engineers that are translating single words, and then building
those fragments into sentences in code.
A constructed example would be <code>_("Hello, ") + name + _(". How are you?")</code>.
A more common, and less obvious, case is when you are just combining formatted
values with some whitespace and punctuation such as <code>{date} - {time}</code>.</p>
<p>The problems you could potentially have here are not that obvious if you are
primarily working with germanic languages. But there are other languages out
there, which change the order of some placeholders based on grammar. Or
right-to-left languages. Some languages may want to use different punctuation
symbols. And so on… So the easiest thing to do is to just put <strong>everything</strong>
into a translation string. Also your translators will thank you, because it
gives them both more freedom and more context to know what needs to be done.</p>
<h2 id="use-formatters"><a class="anchor-link" href="#use-formatters" aria-label="Anchor link for: use-formatters">#</a>
Use formatters</h2>
<p>Similar to the case above, I still see engineers that are not using formatters
properly. Things like formatting a <code>datetime</code> value ahead-of-time, and putting
it into the translation as a <code>string</code>.</p>
<p>Or the quite frequent case where engineers are not aware of the builtin
<code>monetary</code> support, and are creating translation strings such as
<code>{value}{currency}</code>, which will be wrongly formatted for 2 out of the 3 German
locales I highlighted above.</p>
<p>One problem, both related to formatters and to translation context is
<code>element</code>-type placeholders.
I considered experimenting with a feature called <a href="https://github.com/eversport/intl-codegen/issues/15">DOM Overlays</a>, but decided to
postpone it to later. Essentially, DOM Overlays would give a much larger context
to translators, and would make it more easily possible to put some placeholders
into styled elements, with proper typing support. Maybe :-D</p>
<h2 id="establish-guidelines"><a class="anchor-link" href="#establish-guidelines" aria-label="Anchor link for: establish-guidelines">#</a>
Establish guidelines</h2>
<p>Apart from the two cases above (putting <em>everything</em> into the translations, and
properly using formatters), there is also the question of how to structure
translations. How to name them? How to deal with conditional placeholders?</p>
<p>How should you name your translation keys?
<a href="https://github.com/eversport/intl-codegen">intl-codegen</a> is a little bit opinionated already in this regard. Mainly because
there is a syntactic difference between identifiers in js, and translation keys.
<code>a-translation-key</code> will become <code>aTranslationKey()</code>.
In general, I would recommend using dashed translation keys, as in this example.
Give the keys descriptive names that give some context. Do not name the key
<code>continue</code>, but rather <code>registration-finished-continue-button</code>, or something. :-D</p>
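<p>That key-to-identifier mapping can be sketched in a few lines (this is just an illustration, not intl-codegen’s actual implementation):</p>

```ts
// Turn a dashed translation key into a JS identifier,
// e.g. "a-translation-key" → "aTranslationKey".
function keyToIdentifier(key: string): string {
  return key.replace(/-([a-z0-9])/g, (_, ch: string) => ch.toUpperCase());
}

console.log(keyToIdentifier("registration-finished-continue-button"));
// → "registrationFinishedContinueButton"
```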
Database Access2019-06-18T00:00:00+00:002019-06-18T00:00:00+00:00
Unknown
https://swatinem.de/blog/database-access/<p>I had the pleasure recently of looking at a few js/typescript solutions to
database access, mostly focused on SQL.</p>
<p>I was tasked with evaluating <a href="https://typeorm.io/">TypeORM</a>, which is gaining quite some traction,
in comparison to <a href="https://knexjs.org/">knex</a>, which we currently use.</p>
<p>While knex is more focused on being a generic query-builder, typeorm is a fully
featured ORM that can model entities and their relation, but with additional
functionality for query building.</p>
<p>Some people may know that I’m a big rust fan, even though I still didn’t manage
to write any significant amount of production code in that language.
But the broader rust community does a lot of things really well, one of them
is <a href="https://diesel.rs/">diesel</a>, the go-to database access and query builder library. I will also
mention some things about diesel and what we can learn from it.</p>
<h2 id="type-safety"><a class="anchor-link" href="#type-safety" aria-label="Anchor link for: type-safety">#</a>
Type Safety</h2>
<p>One of the major problems with <a href="https://knexjs.org/">knex</a> in particular is that it is inherently
untyped. Everything is based on strings, which can either break because of typos,
or because of refactoring mistakes. Also the return value of a knex query is
<code>any</code> by definition, which is really bad. Engineers will have to manually type
the result, which of course is prone to bugs and type mismatches.</p>
<p>This is one of the points that is even embedded in the name of <a href="https://typeorm.io/">typeorm</a>, but
it does not quite deliver on its promise. While yes, when you work with decorated
entities, you have strict typing for results and for <em>simple</em> find conditions.
But once you dig into the lower level querybuilder, things also become
<code>stringly-typed</code>, and all bets are off. Not quite what I had hoped for.</p>
<h2 id="code-organization"><a class="anchor-link" href="#code-organization" aria-label="Anchor link for: code-organization">#</a>
Code Organization</h2>
<p>Another problem we have is that our business domain is very broad, and each feature
we need to build needs to hook very deeply into the database. Apart from that,
there are also challenges around organizing the code itself. Where to put code,
if and when to use query builders, how to make the code more maintainable etc.
A lot of these problems are also very related to the API that the database
access library provides.</p>
<p>Here, both <a href="https://knexjs.org/">knex</a> and <a href="https://typeorm.io/">typeorm</a> are coming up short.
What I essentially want, is to define an <em>abstract representation</em> of what I want
to query, and then execute it.
Well for <a href="https://knexjs.org/">knex</a>, I need an established connection to be able to use the query
builder. For <a href="https://typeorm.io/">typeorm</a> as well, one only gets an <code>entity manager</code> or <code>repository</code>
from an established connection.
Things are even worse when it comes to transactions.
Also, is the connection or transaction <em>per request</em> scoped? I think it is, in
order to offer better consistency guarantees. It would certainly be bad if one
query hits the <code>master</code> inside of a transaction and a different query hits a
<code>read replica</code>. Good luck debugging the resulting problems.</p>
<p>In contrast, I quite like how in <a href="https://diesel.rs/">diesel</a>, you can statically define your queries,
which then have a <code>execute(connection)</code> method. There, the application author
has more control over which queries run on which connection.</p>
<p>This plays a lot nicer together with prepared statements.
Essentially, the SQL server has to parse a query, run it through the query
optimizer and then run it with some parameters. Wouldn’t it be nice if we could
re-use the first two steps over a few executions? Just like how the JS JIT engines
can optimize some hot code better if it runs more frequently. I’m an expert in
neither SQL servers nor the libraries I am writing about here.
But as far as the JS APIs are structured, it seems like they are building
a fresh query string each and every time you call the query builder. Which is
bad for the performance of JS code in the first place, but I think the SQL side
of things could also be improved with a better structure.</p>
<h2 id="performance-pitfalls"><a class="anchor-link" href="#performance-pitfalls" aria-label="Anchor link for: performance-pitfalls">#</a>
Performance Pitfalls</h2>
<p>One problem we face increasingly with the growing amount of data we handle
is fetching too much data.
Here, all three APIs that I mentioned mostly offer the same methods to access
your data, let’s call them <code>findOne</code> and <code>findAll</code>. None of the APIs has a streaming
interface as a first-class citizen, which is quite bad, because you can quite
easily DoS a complete service when one query fetches so much data at once that
the complete process runs OOM.
While I do love <a href="https://diesel.rs/">diesel</a> in general, it pre-dates Rust’s <code>async/await</code> story by
quite some time, and only offers <em>synchronous</em> calls, which requires you to
manually manage a thread pool. But I hope this will all be solved when <code>async/await</code>
in Rust becomes stable.
Also, async iteration is only supported since <code>node 10</code>, and the js projects have
been around for quite a bit longer than that.</p>
<p>What I would like essentially is an API that at least has first class support
for streaming, or even goes one step further and offers <em>only</em> streaming in its
base API. Things like <code>findOne</code> and <code>findAll</code> can be built on top of that.</p>
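<p>Such a streaming-first API could look roughly like this (a hypothetical sketch; none of this is actual knex, typeorm or diesel API):</p>

```ts
// The base API only streams rows; findOne / findAll are thin wrappers.
interface Query<T> {
  stream(): AsyncIterable<T>;
}

async function findOne<T>(q: Query<T>): Promise<T | undefined> {
  // Stop pulling from the stream after the first row.
  for await (const row of q.stream()) return row;
  return undefined;
}

async function findAll<T>(q: Query<T>): Promise<T[]> {
  const rows: T[] = [];
  for await (const row of q.stream()) rows.push(row);
  return rows;
}

// Usage with a fake in-memory query:
const fake: Query<number> = {
  async *stream() { yield* [1, 2, 3]; },
};
findAll(fake).then((rows) => console.log(rows)); // [1, 2, 3]
```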
<p>Also, none of the APIs offer a good solution to pagination, but more on that
later.</p>
<h2 id="do-one-thing-and-do-it-right"><a class="anchor-link" href="#do-one-thing-and-do-it-right" aria-label="Anchor link for: do-one-thing-and-do-it-right">#</a>
Do one thing and do it right.</h2>
<p>Something else that kind of bothers me, is that both js projects have a far too
broad scope. Yes, <a href="https://typeorm.io/">typeorm</a> is a fully featured ORM, but still both come with
solutions for managing migrations, and additional features that a focused
project could serve better.</p>
<h1 id="dreaming-up-an-ideal-api"><a class="anchor-link" href="#dreaming-up-an-ideal-api" aria-label="Anchor link for: dreaming-up-an-ideal-api">#</a>
Dreaming up an ideal API</h1>
<p>So with all these things in mind, let’s dream up an API that can do better than
what we have right now. First things first, let’s focus on one thing only:
Creating a type safe query builder, which makes it possible to deliver good
performance and flexibility for developers.</p>
<p>Already more than half a year ago, I experimented a bit with creating a
<a href="https://diesel.rs/">diesel</a>-like project in typescript called <a href="https://github.com/Swatinem/motorina/">motorina</a>. (Some people will get the
reference :-)</p>
<p>Sadly enough, I haven’t had any time or motivation to continue that effort.
Its most important goal was to be completely typesafe, and convenient as well.
It should make typos and type mismatches impossible.</p>
<p>A second goal was to be high performance. I wanted to create <em>abstract queries</em>
on the toplevel scope, which have a strictly typed set of <em>placeholders</em> and a
strictly typed result, even with the possibility to define custom type conversions.
Since well you know, mysql does not even have a native boolean type, and it does
not map well to typescript enums.</p>
<p>Defining <em>abstract queries</em> in the toplevel scope also decouples queries from
connections. I don’t really want to care if a query runs on a read-only replica,
or inside a transaction.
This should also in theory play very well with prepared statements. You can cache
the query on a per-connection basis, and reuse it a thousand times, with different
parameters each time.</p>
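<p>A hypothetical sketch of such decoupled, per-connection-cached queries (the <code>Connection</code> and <code>Prepared</code> interfaces are made up; real database drivers differ):</p>

```ts
interface Prepared { run(params: unknown[]): Promise<unknown[]> }
interface Connection { prepare(sql: string): Promise<Prepared> }

// An abstract query defined at the toplevel, decoupled from any connection.
class AbstractQuery<P extends unknown[], R> {
  private cache = new WeakMap<Connection, Promise<Prepared>>();
  constructor(private sql: string) {}

  // Prepare once per connection, then reuse the statement across executions.
  async execute(conn: Connection, params: P): Promise<R[]> {
    let prepared = this.cache.get(conn);
    if (!prepared) {
      prepared = conn.prepare(this.sql);
      this.cache.set(conn, prepared);
    }
    return (await prepared).run(params) as Promise<R[]>;
  }
}
```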
<p>One problem here is the sql data model itself, which at least for mysql does not
allow to actually pass in an <em>array</em> for a <code>IN(?)</code> placeholder. Meh :-(</p>
<p>Apart from the query builder itself, I think a connection wrapper, which exposes
a streaming interface as first class citizen, with maybe helpers to just
return a single entity might also be a good idea.</p>
<h1 id="excursion-cursor-based-pagination"><a class="anchor-link" href="#excursion-cursor-based-pagination" aria-label="Anchor link for: excursion-cursor-based-pagination">#</a>
Excursion: Cursor based pagination</h1>
<p>I said I will also briefly talk about cursor based pagination, which is not a
first class citizen in sql unfortunately. Page-based pagination with <code>limit</code> and
<code>offset</code> is the standard. But that does have some problems when the data
underneath changes. Cursor-based pagination itself also has some problems, but
at least is a bit more stable. I am currently working on generalizing such a
cursor-based pagination system, but it seems to be quite a bit more complex than
I initially thought. But I don’t think a generic query builder should itself
provide such functionality, but it should be easy to extend it with such
functionality.</p>
GraphQL Code generators2019-05-25T00:00:00+00:002019-05-25T00:00:00+00:00
Unknown
https://swatinem.de/blog/graphql-codegen/<p>Prompted by a recent meetup with former work colleagues, I had another short look at <a href="https://facebook.github.io/relay/en/">relay</a>
as an alternative to <a href="https://www.apollographql.com/docs/react">apollo</a> which we currently use and with which we are not quite that happy.</p>
<p>Both libraries have support for typescript code generation, but relay puts it more in the center.</p>
<p>All in all, the problems, or rather improvements and dreams I have come from the fact that
strict typing and code generation was added as an afterthought.
You can use both libraries a) with untyped JS and b) without code generation at all.</p>
<p>Apart from the very obvious problem that the apollo-cli itself is horrible, both from a usage as well
as a project viewpoint (I mean it pulls in <code>yarn</code> as a dependency transitively, wtf?), I see
the following problems or lack of features in both relay and apollo when it comes to typescript
support and codegen.</p>
<h2 id="there-is-a-lot-of-footguns"><a class="anchor-link" href="#there-is-a-lot-of-footguns" aria-label="Anchor link for: there-is-a-lot-of-footguns">#</a>
There are a lot of footguns</h2>
<p>Let’s create an example slightly changed from the <a href="https://facebook.github.io/relay/docs/en/type-emission">relay docs</a>.</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">import </span><span>{ ExampleQuery } </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"__generated__/ExampleQuery.graphql"
</span><span>
</span><span style="color:#fa6e32;">const </span><span>QUERY </span><span style="color:#ed9366;">= </span><span style="color:#f29718;">graphql</span><span style="color:#86b300;">`
</span><span style="color:#86b300;"> query ExampleQuery($artistID: ID!) {
</span><span style="color:#86b300;"> artist(id: $artistID) {
</span><span style="color:#86b300;"> name
</span><span style="color:#86b300;"> }
</span><span style="color:#86b300;"> }
</span><span style="color:#86b300;"> `
</span><span>
</span><span style="color:#ed9366;"><</span><span>QueryRenderer</span><span style="color:#ed9366;"><</span><span>ExampleQuery</span><span style="color:#ed9366;">>
</span><span> query</span><span style="color:#ed9366;">=</span><span>{QUERY}
</span><span> variables</span><span style="color:#ed9366;">=</span><span>{{ artistID</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">'banksy' </span><span>}}
</span><span style="color:#ed9366;">/>
</span></code></pre>
<p>As this example shows, code generation and typescript support revolves around importing a generated
type definition and manually providing it as optional type parameter to the <code>QueryRenderer</code> component.
This perfectly demonstrates the two points I made before. Remove the <code>import</code> and the type parameter,
and the code becomes valid untyped JS.</p>
<p>However, there are a couple of obvious footguns even with this very simple example.</p>
<ul>
<li>The type parameter is optional and defaults to <code>any</code>. Developers are lazy, and using codegen in reality is a
lot more tedious than this example.</li>
<li>The type parameter needs to be provided <em>manually</em>, which means developers could potentially mess up,
the typescript compiler will not warn about this.</li>
<li>One way to mess up is to mix up the <code>query</code> prop with a wrong type parameter.</li>
<li>It is not <em>DRY</em>, I will explain later on.</li>
</ul>
<p>The example from the <a href="https://www.apollographql.com/docs/react/recipes/static-typing">apollo docs</a> is even worse, since the <em>results</em> and the <em>variables</em> are provided as two
separate type parameters. So there is even more opportunity to mess up.</p>
<h2 id="digression-component-based-api"><a class="anchor-link" href="#digression-component-based-api" aria-label="Anchor link for: digression-component-based-api">#</a>
Digression: Component-based API</h2>
<p>This might just be a personal preference, but I consider Components with a render-prop API a horrible design pattern in general.
Especially for usage with graphql, I find it extremely unergonomic and tedious.</p>
<p>Using react <code>hooks</code> is a far better alternative. But true, both relay and apollo are older than the hooks API, so I will
let this one slide.</p>
<h2 id="types-are-mis-used"><a class="anchor-link" href="#types-are-mis-used" aria-label="Anchor link for: types-are-mis-used">#</a>
Types are mis-used</h2>
<p>One problem I see with the generated types in our codebase is that developers are actually mis-using them.
This is not really specific to the way graphql libraries work, but a problem I see in general when it comes to typed
react code.</p>
<p>One very good design pattern in react is to split code into <em>presentational</em> components, that just <em>display</em> some data,
and <em>container</em> components, which might include some business logic and data fetching logic.
In my opinion, a <em>presentational</em> component should itself declare what kind of props it wants to receive.
And the type checker makes sure that the <em>container</em> components provides valid props.</p>
<p>Well in reality, developers will just happily import graphql-generated types for use in their presentational components,
which I personally consider to be a red flag and a mis-use of those types.</p>
<p>Example:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">import </span><span>{ VenueLogoWrapperVenueQuery_venue_logo } </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"../container/genTypes/VenueLogoWrapperVenueQuery"</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">interface </span><span style="color:#399ee6;">LogoWrapperProps </span><span>{
</span><span> logo</span><span style="color:#ed9366;">?: </span><span style="color:#399ee6;">VenueLogoWrapperVenueQuery_venue_logo </span><span style="color:#ed9366;">| </span><span style="font-style:italic;color:#55b4d4;">null</span><span style="color:#61676ccc;">;
</span><span>}
</span></code></pre>
<p>This surely gets a huge <em>facepalm</em> from me.</p>
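<p>For contrast, a sketch of the pattern I would consider correct (all names are illustrative): the presentational component declares the props it needs itself, and the container is responsible for mapping query results onto them.</p>

```typescript
// The presentational component owns its prop types, instead of
// importing a query-generated type from some container.
interface Logo { url: string; alt: string }

interface LogoWrapperProps {
  logo?: Logo | null;
}

// Rendering reduced to a string for brevity; the type checker now
// verifies the container-to-presentation mapping at the boundary.
function renderLogo({ logo }: LogoWrapperProps): string {
  return logo ? `<img src="${logo.url}" alt="${logo.alt}">` : "";
}
```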
<h2 id="it-does-not-support-custom-scalars"><a class="anchor-link" href="#it-does-not-support-custom-scalars" aria-label="Anchor link for: it-does-not-support-custom-scalars">#</a>
It does not support custom scalars</h2>
<p>GraphQL itself has a <em>super simple</em> concept. You just provide a <code>query</code> string and optional <code>variables</code> and you get back some
json <code>data</code>. You don’t even need a big-ass library, all you need is <code>fetch</code>.
But this also means you are limited to datatypes representable as json.
Which means you have <code>null</code> instead of <code>undefined</code>, and you don’t have rich types such as <code>Date</code>.
There is even an old <a href="https://github.com/apollographql/apollo-tooling/issues/622">feature request</a> for apollo to generate
<code>undefined</code> instead of <code>null</code>, but I think it is unlikely that will ever happen.</p>
<p>For all of the sophistication these libraries have, they are actually quite dumb in this regard.
All they do is pass on the result json unaltered.</p>
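<p>Concretely, the scalar conversion has to be written by hand today. A sketch (the field names are assumed for illustration) of the boilerplate this forces on every consumer:</p>

```typescript
// Raw GraphQL results are plain JSON: dates arrive as strings and
// absent values as `null` rather than `undefined`.
interface RawUser { lastLogin: string | null }
interface User { lastLogin?: Date }

// The manual conversion step that the libraries do not do for you:
function deserializeUser(raw: RawUser): User {
  return {
    lastLogin: raw.lastLogin === null ? undefined : new Date(raw.lastLogin),
  };
}
```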
<h1 id="a-dream-takes-form"><a class="anchor-link" href="#a-dream-takes-form" aria-label="Anchor link for: a-dream-takes-form">#</a>
A dream takes form</h1>
<p>With all these issues in mind, let’s brainstorm and dream up an ideal API.</p>
<p><strong>What if</strong>, a next generation graphql code generator generated some code like this:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">import </span><span>{ getVenue</span><span style="color:#61676ccc;">, </span><span>getMe</span><span style="color:#61676ccc;">, </span><span>requestPasswordReset } </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"./some-generated-file"</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">const </span><span>venue </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">await </span><span style="color:#f29718;">getVenue</span><span>(conn</span><span style="color:#61676ccc;">, </span><span>{ id</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"…" </span><span>})</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// venue = { name: string, logo?: Image, … }
</span><span>
</span><span style="color:#fa6e32;">const </span><span>me </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">await </span><span style="color:#f29718;">getMe</span><span>(conn)</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// me = { lastLogin: Date, … }
</span><span>
</span><span style="color:#fa6e32;">const </span><span>result </span><span style="color:#ed9366;">= </span><span style="color:#fa6e32;">await </span><span style="color:#f29718;">requestPasswordReset</span><span>(conn</span><span style="color:#61676ccc;">, </span><span>{ email</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"…" </span><span>})</span><span style="color:#61676ccc;">;
</span><span style="font-style:italic;color:#abb0b6;">// …
</span></code></pre>
<p>So my proposed next-generation graphql code generator will generate <em>ready to use</em>, <em>strongly typed</em>
async functions, which you can <em>just call</em> with a <code>connection</code> (essentially a parameterized <code>fetch</code>,
<code>apollo-client</code>, or whatever), and some <em>strongly typed</em> variables.</p>
<p>These functions will embed the query string itself, and will do <em>type conversions</em> for both <em>input</em> and
<em>result</em> types, in the example above the <code>Image</code> and <code>Date</code> scalars, which would be automatically deserialized
to typescript classes, or enums, or whatever.</p>
<p>This dreamed-up example was not react-specific, but it would be super easy to also create <em>ready to use</em> hooks
which would consume the <code>connection</code> via <code>context</code> and thus be <em>even easier</em> to use.</p>
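<p>Internally, one such generated function might look roughly like this sketch (the <code>Connection</code> shape and all the names are my assumptions, not real generator output):</p>

```typescript
// A connection is essentially a parameterized fetch.
type Connection = (query: string, variables?: object) => Promise<any>;

const GET_ME_QUERY = "query GetMe { me { lastLogin } }";

interface GetMeRaw { me: { lastLogin: string } }
interface Me { lastLogin: Date }

// The generated function embeds the query string and converts
// scalars (here a Date) as part of the call.
async function getMe(conn: Connection): Promise<Me> {
  const raw: GetMeRaw = await conn(GET_ME_QUERY);
  return { lastLogin: new Date(raw.me.lastLogin) };
}
```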
<p>Essentially, it will solve the following problems, plus give even more advantages:</p>
<ul>
<li>simple, ready-to-use API</li>
<li>automatic scalar deserialization</li>
<li>strongly typed, with no opportunity for mis-use</li>
<li>DRY, because it embeds the <code>query</code> portion right in the function</li>
<li>very friendly to dead-code-elimination / tree-shaking</li>
</ul>
<h2 id="inspiring-other-projects"><a class="anchor-link" href="#inspiring-other-projects" aria-label="Anchor link for: inspiring-other-projects">#</a>
Inspiring other projects</h2>
<p>Now that I have brainstormed the graphql code generator of my dreams, it also got me thinking about another one of my
projects. Another code generation success story is <a href="https://github.com/eversport/intl-codegen">intl-codegen</a>, which similarly generates <em>ready to use</em> functions out
of <code>MessageFormat</code> strings, with the goal of both avoiding shipping a <code>MessageFormat</code> parser to runtime as well as providing
<em>not quite so strong</em> typing.</p>
<p>I could imagine that instead of having a single <code>Localized</code> component, it could actually generate a separate component for
each message.</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="font-style:italic;color:#abb0b6;">// now:
</span><span style="color:#ed9366;"><</span><span>Localized id</span><span style="color:#ed9366;">=</span><span style="color:#86b300;">"some-message-id" </span><span>params</span><span style="color:#ed9366;">=</span><span>{{ name</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"some name" </span><span>}} </span><span style="color:#ed9366;">/>
</span><span style="font-style:italic;color:#abb0b6;">// future?:
</span><span style="color:#ed9366;"><</span><span>SomeMessageId name</span><span style="color:#ed9366;">=</span><span style="color:#86b300;">"some name" </span><span style="color:#ed9366;">/>
</span></code></pre>
<p>I’m not quite convinced yet whether this would be a worthwhile change, but it certainly would make it impossible to mis-use
the current implementation by using a dynamic <code>id</code>, which circumvents typing.</p>
<p>In general, I have neglected <a href="https://github.com/eversport/intl-codegen">intl-codegen</a> quite a bit recently, but I do have a lot of ambitious ideas about its future.
But those ambitious ideas are also hard to implement so it will take quite some more time to think about those.</p>
Error Handling Considerations — 2019-05-15
Unknown
https://swatinem.de/blog/error-handling/<p>Error handling is a really important aspect of software engineering, and also a
very complex one with a lot of sides that need to be considered.
Let’s try to break this down into a few key aspects.</p>
<h2 id="types-of-errors"><a class="anchor-link" href="#types-of-errors" aria-label="Anchor link for: types-of-errors">#</a>
Types of Errors</h2>
<p>By <em>types</em>, I do mean the distinction between <em>recoverable errors</em> and
<em>unrecoverable errors</em>, or <em>expected errors</em> and <em>unexpected errors</em>.
In general the difference is that <em>expected</em> / <em>recoverable</em>
errors are handled explicitly by code, and <em>unrecoverable</em> errors usually crash
the program or are handled at a much coarser granularity, such as per-thread
or per-request. One important thing to note first is that this distinction is up to the developers
on a per-project basis.
For example, a missing file can be treated as an unrecoverable error when reading
configuration on program start, where a crash will instruct the developers to
correctly configure it. In other parts of the program, a missing file must be
handled explicitly, because it might depend on user input and must not crash
the running program.</p>
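<p>A small TypeScript sketch of that distinction, using the missing-file example (paths and the handling strategy are illustrative):</p>

```typescript
import { existsSync, readFileSync } from "fs";

// At startup: a missing config file is unrecoverable — let the
// exception propagate and crash, so developers fix the deployment.
function loadConfig(path: string): string {
  return readFileSync(path, "utf8"); // throws, intentionally unhandled
}

// On user input: the same condition is an expected error that must
// be handled explicitly instead of crashing the running program.
function loadUserFile(path: string): string | null {
  return existsSync(path) ? readFileSync(path, "utf8") : null;
}
```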
<h2 id="errors-as-data"><a class="anchor-link" href="#errors-as-data" aria-label="Anchor link for: errors-as-data">#</a>
Errors as Data</h2>
<p>Especially <em>expected errors</em> need to be displayed to end users. For example with
a nicely formatted Error Box in the case of web UI.</p>
<p>One very important aspect here, which sadly almost no one gets right, is that
<em>error messages need to be localizable</em> <strong>(!!!)</strong>. For this to work, the error
needs to be serializable, including every kind of meta information that might
be displayed to the user. For the example above, an abstract representation of
the error should include information that <strong>opening</strong> a file with a certain
<strong>path/name</strong> failed (and possibly some more).</p>
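<p>In code, such an error-as-data representation might look like this sketch (the <code>kind</code> tag and the message templates are made up for illustration):</p>

```typescript
// The error carries a stable kind plus the metadata a translation
// layer needs, instead of a pre-rendered English string.
interface FileOpenError {
  kind: "file-open-failed";
  path: string;
}

// The UI picks a localized template by kind and fills in the metadata.
const templates: Record<string, (e: FileOpenError) => string> = {
  en: (e) => `The file “${e.path}” could not be opened.`,
  de: (e) => `Die Datei „${e.path}“ konnte nicht geöffnet werden.`,
};

function renderError(error: FileOpenError, locale: string): string {
  return templates[locale](error);
}
```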
<p>I would argue that this is not really important, or even bad for <em>unrecoverable</em>
errors. These should be <em>logged</em> instead, because they are most likely caused
by some developer fault. A stack trace should be provided as a form of metadata
to help developers fix this error. However, these kinds of errors should only be
shown to end-users in a very opaque way, and avoid leaking internal implementation
details such as stack traces which could be used to maliciously attack a service.</p>
<h2 id="control-flow-of-errors"><a class="anchor-link" href="#control-flow-of-errors" aria-label="Anchor link for: control-flow-of-errors">#</a>
Control Flow of Errors</h2>
<p>Here I see a distinction between <em>explicit</em> and <em>implicit</em> error
handling. This is very tightly coupled to language syntax features or idioms.
In general, <em>explicit</em> error handling comes at the expense of more boilerplate
but can also potentially lead to better software in my opinion.</p>
<p>But what do I mean by <em>explicit</em> or <em>implicit</em> at all?
Well, per my definition <em>explicit</em> means that functions will <code>return</code> or pass as
<code>callback parameter</code> a strictly typed error object, and/or a value.
<em>Implicit</em> error handling in contrast means that errors are <code>throw</code>n <em>somewhere</em>
and unwind the call stack up to a <code>catch</code> block. This way they lose most/all of
their type information. However this reduces a lot of boilerplate, because code
does not deal with statements that could potentially throw everywhere.</p>
<h2 id="comparing-different-languages"><a class="anchor-link" href="#comparing-different-languages" aria-label="Anchor link for: comparing-different-languages">#</a>
Comparing different Languages</h2>
<p>Now let’s look at some examples in different programming languages. Most of them
support both explicit and implicit error handling. And in most cases, explicit
handling does not need to be a language feature itself, but can also be
implemented as a library.</p>
<h3 id="example-haskell"><a class="anchor-link" href="#example-haskell" aria-label="Anchor link for: example-haskell">#</a>
Example: Haskell</h3>
<p>First off: I don’t really know this language well, so I might be wrong about
some points.</p>
<p>Haskell is a very strict functional language, and claims to be very <em>safe</em>.
Instead of nullable values, it has a <code>Maybe</code> type that either has <code>Just</code> a value
or <code>Nothing</code>. Similarly, it uses the type <code>Either</code> to denote a <code>Right</code> value or
a <code>Left</code> error.
The language has special syntax to chain functions together that either work with
a <code>Just</code> or <code>Right</code> value, or short circuit and just return the <code>Nothing</code> / <code>Left</code>.</p>
<p>I might be wrong about this, but I think Haskell and other very strict functional
languages don’t even have the notion of <code>throw</code> that unwinds the stack.</p>
<h3 id="example-go"><a class="anchor-link" href="#example-go" aria-label="Anchor link for: example-go">#</a>
Example: Go</h3>
<p>Again, I don’t really know Go; this only summarizes some things I have read online.</p>
<p>In Go, most functions will return a compound value, with a <code>value</code> and an <code>error</code>.</p>
<pre data-lang="go" style="background-color:#fafafa;color:#61676c;" class="language-go "><code class="language-go" data-lang="go"><span>value</span><span style="color:#61676ccc;">, </span><span>error </span><span style="color:#ed9366;">:= </span><span style="color:#f29718;">someFn</span><span>()
</span><span style="color:#fa6e32;">if </span><span>error </span><span style="color:#ed9366;">!= </span><span style="color:#ff8f40;">nil </span><span>{
</span><span> </span><span style="color:#fa6e32;">return </span><span style="color:#ff8f40;">nil</span><span style="color:#61676ccc;">, </span><span>error
</span><span>}
</span></code></pre>
<p>I don’t really know how strict the Go typechecker is. But having to explicitly
check for <code>nil</code> everywhere is a real anti-pattern IMO.</p>
<p>Again, I don’t know if Go actually has the concept of <code>throw</code>; however, I have
never seen any example of it.</p>
<h3 id="example-ts"><a class="anchor-link" href="#example-ts" aria-label="Anchor link for: example-ts">#</a>
Example: TS</h3>
<p>TypeScript actually supports different kinds of error handling patterns.</p>
<p>The callback-based style that is common in <code>node</code> and in older libraries will
look similar to the <code>go</code> example. You will provide a callback function with
two parameters, and need to explicitly check and early-return on errors.</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#f29718;">someFn</span><span>((</span><span style="color:#ff8f40;">error</span><span style="color:#61676ccc;">, </span><span style="color:#ff8f40;">value</span><span>) </span><span style="color:#fa6e32;">=> </span><span>{
</span><span> </span><span style="color:#fa6e32;">if </span><span>(error) {
</span><span> </span><span style="color:#fa6e32;">return </span><span style="color:#f29718;">callback</span><span>(error)</span><span style="color:#61676ccc;">;
</span><span> }
</span><span> </span><span style="font-style:italic;color:#abb0b6;">// …
</span><span>})</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>More modern <code>async/await</code> based code has support for <code>try/catch</code>.</p>
<p>Apart from this, some code might also use explicit return types such as
<code>Haskell</code>s <code>Maybe</code>.</p>
<p>But using either callbacks or explicit library provided types such as <code>Maybe</code> has
the significant drawback that basically <strong>any</strong> code can just <code>throw</code> and punch
through that abstraction layer.</p>
<p>Also, since the support for these explicit styles has no dedicated language/syntax
support, they come with some boilerplate and inconvenience.</p>
<p>The problem with <code>try/catch</code> however is, that there is absolutely no guarantee on
the value in a catch block by definition.
TS even has an explicit compiler error that states that
<code>Catch clause variable cannot have a type annotation.</code></p>
<p>You can just <code>throw 1</code> and that is valid code. This by itself can cause a lot of
problems. We actually shipped code to production that ended up <code>throw</code>-ing inside of a
<code>catch</code> block because it made wrong assumptions about the shape of the object it caught.</p>
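<p>The defensive pattern we should have used is to treat the caught value as completely untyped and narrow it explicitly — a sketch, not our actual production code:</p>

```typescript
// Anything can be thrown, so narrow before touching the value.
function describeError(e: unknown): string {
  if (e instanceof Error) {
    return e.message;
  }
  return `non-Error thrown: ${String(e)}`;
}

function risky(): never {
  throw 1; // perfectly valid TypeScript — and exactly the problem
}

let message = "";
try {
  risky();
} catch (e) {
  message = describeError(e);
}
```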
<h3 id="example-rust"><a class="anchor-link" href="#example-rust" aria-label="Anchor link for: example-rust">#</a>
Example: Rust</h3>
<p>Even though I have not actually <em>used</em> Rust that much, I read a lot about it and
I admire a lot of the things it does.</p>
<p>IMO, Rust gets <em>a lot</em> of things <em>just right</em>. Error handling is no different.</p>
<p>It basically has two mechanisms, a Haskell-esque <code>Result</code> type for recoverable
errors, and a <code>throw</code>-like mechanism (called <code>panic!</code>) for unrecoverable errors.</p>
<p>Apart from this, it has dedicated syntax (<code>?</code>) to make explicit error handling
extremely convenient. It is also possible to implement the special <code>From</code>/<code>Into</code>
trait to convert between error types <em>completely automatically</em>, without any
additional boilerplate.</p>
<pre data-lang="rust" style="background-color:#fafafa;color:#61676c;" class="language-rust "><code class="language-rust" data-lang="rust"><span> </span><span style="color:#fa6e32;">let mut</span><span> s </span><span style="color:#ed9366;">= </span><span style="font-style:italic;color:#55b4d4;">String</span><span style="color:#ed9366;">::</span><span>new()</span><span style="color:#61676ccc;">;
</span><span> File</span><span style="color:#ed9366;">::</span><span>open(</span><span style="color:#86b300;">"hello.txt"</span><span>)</span><span style="color:#ed9366;">?.</span><span style="color:#f07171;">read_to_string</span><span>(</span><span style="color:#ed9366;">&</span><span style="color:#fa6e32;">mut</span><span> s)</span><span style="color:#ed9366;">?</span><span style="color:#61676ccc;">;
</span><span> </span><span style="font-style:italic;color:#55b4d4;">Ok</span><span>(s)
</span></code></pre>
<p>Here, both <code>open</code>ing the file and <code>read</code>ing can potentially error, and just
chaining the <code>?</code> operator will early-return a <code>Result</code> and <em>automatically</em>
convert the IoError into your application specific Error type if a corresponding
<code>From</code>/<code>Into</code> implementation exists.</p>
<p>In contrast to that, the <code>panic!</code> mechanism will unwind the callstack in the
case of unrecoverable errors.</p>
<p>In general, there is also the community consensus that <em>libraries</em> should always
return <code>Result</code>s. It is the choice of the application if and how to handle those
errors. An application can use <code>unwrap</code> or <code>expect</code> to essentially <code>throw</code> on
errors.</p>
<p><a href="https://doc.rust-lang.org/book/ch09-00-error-handling.html">Read more</a> on how
error handling in Rust works.</p>
<h2 id="where-to-handle-errors"><a class="anchor-link" href="#where-to-handle-errors" aria-label="Anchor link for: where-to-handle-errors">#</a>
Where to handle Errors</h2>
<p>There is quite some controversy in our team around where to actually <em>handle</em>
these errors.</p>
<p>Let’s take a simple Database Repository as an example. Let’s assume there is a
<code>findOne</code> method. By definition, this will return a <code>nullable</code> type. At least
if the <em>manual type declaration</em> is correct. Sadly, most database libraries are
completely untyped in TS :-(</p>
<p>Currently we have three different patterns around this:</p>
<ul>
<li>First, the type definition might just be wrong, and assume a non-nullable type
which is actually nullable and might result in the typical
<code>undefined is not a function</code> kind of errors.</li>
<li>Developers might use the non-null assertion operator (<code>!</code>) and <em>consciously</em>
decide to implicitly throw an <code>undefined is not a function</code> error.</li>
<li>A developer might add an <code>if</code> with an explicit <code>throw</code>. This is <em>a lot</em> of
boilerplate.</li>
</ul>
<p>When we go back one step and say that <em>libraries</em> should return the most correct
types, it means that the types need to be marked as nullable, so the first case
is definitely wrong.</p>
<p>But let’s focus on <strong>where</strong> we are in the program flow.</p>
<p>When we are at the <strong>IO</strong> boundary to some user provided data, such as an <code>id</code>,
we have a <em>recoverable</em> error in the sense that we can provide the user with a
meaningful error message, such as a <code>404</code> error.</p>
<p>However, in a deeper layer of the application, I would argue that this case should
not occur, and if it does, it would be a programmer logic error.
In my opinion, doing explicit error handling in this layer is way too much
boilerplate and is actually harmful to the readability and understandability of
the code logic.</p>
<p>For this reason, I would argue that once user input is validated, any deeper code
should just use non-null assertions, or maybe a more explicit <code>.expect()</code> function
and throw with a normal JS error.</p>
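<p>Such an <code>expect()</code> helper is tiny; a possible sketch (the name and message convention are borrowed from Rust, not an existing JS API):</p>

```typescript
// Turns a nullable value into a non-nullable one, throwing a plain
// JS error with a descriptive message instead of a silent `!`.
function expect<T>(value: T | null | undefined, message: string): T {
  if (value === null || value === undefined) {
    throw new Error(message);
  }
  return value;
}

// Deeper layers, after input validation, could then write e.g.:
// const venue = expect(maybeVenue, "venue must exist at this point");
```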
<h2 id="summary"><a class="anchor-link" href="#summary" aria-label="Anchor link for: summary">#</a>
Summary</h2>
<p>We currently have a mix of different error handling patterns. I also experimented
with returning a <code>Result</code>-ish type using TS discriminated unions, which is just
too inconvenient in TS to be viable.
The first conclusion thus is to just stick to <code>throw</code> for the control flow.</p>
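<p>A <code>Result</code>-ish discriminated union of the kind I mean might look like this sketch — workable, but every call site needs an explicit check, and without a <code>?</code>-like operator that boilerplate repeats everywhere:</p>

```typescript
type Result<T, E> =
  | { ok: true; value: T }
  | { ok: false; error: E };

function parsePort(input: string): Result<number, string> {
  const n = Number(input);
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    return { ok: false, error: `invalid port: ${input}` };
  }
  return { ok: true, value: n };
}

// The per-call-site ceremony that makes this style tedious in TS:
const r = parsePort("8080");
const port = r.ok ? r.value : 0;
```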
<p>I would also rephrase the distinction between <em>recoverable</em> and <em>unrecoverable</em>
errors, to better understand what the goals and requirements are.</p>
<p>Lets use the words <em>user facing error</em> in contrast to <em>developer facing error</em>.</p>
<p>A <em>user facing error</em>:</p>
<ul>
<li>must be <em>translatable</em>, and thus needs to include enough metadata to be able
to do so.</li>
<li>must be <em>serializable</em>, for example to be returned by an API to its users.</li>
<li>should not leak any implementation detail of the application.</li>
<li>should provide a link to user input if possible.</li>
<li>should make it possible to group / display multiple errors at once.</li>
</ul>
<p>In contrast, a <em>developer facing error</em>:</p>
<ul>
<li>must be <em>debuggable</em>, with enough metadata, such as a <em>stack trace</em>.</li>
<li>need not be translatable, since only developers should ever see it.</li>
<li>likely has no correlation to user input.</li>
<li>IMO, should actually <em>not</em> be translated, to make it easier to communicate
across teams and to search for solutions online.</li>
</ul>
<hr />
<p>From these requirements, I would first conclude that <em>unexpected</em> /
<em>unrecoverable</em> / <em>developer facing</em> errors should use the standard
<code>throw new Error()</code> pattern. I am also strongly in favor of just using non-null
assertions to cut down on unnecessary boilerplate. When both inputs and logic
assumptions are sufficiently validated, these kinds of errors should ideally
never occur, so adding boilerplate for them is unnecessary.</p>
<p>In contrast, for <em>expected</em> / <em>recoverable</em> / <em>user facing</em> errors, I would rather
create a custom error type, which might, but need not necessarily <code>extends Error</code>.
This type <em>must</em> be <em>serializable</em>, and include enough metadata to make translation
possible. It should also include metadata to link back to any user input.
Apart from that, it should be possible to return <em>multiple</em> such errors, even
though that needs to be done explicitly by developers.</p>
<p>These two different kinds of errors should also be handled separately in <code>catch</code>
blocks depending on the specific usecase.</p>
<p><em>User facing</em> errors should be explicitly checked for and either re-thrown when
deep inside of the application or returned to the user explicitly on a
<em>per request</em> / <em>per operation</em> basis. These kinds of errors will most likely
both <em>happen</em> and be <em>handled</em> close to the user. Since <em>translation</em> is also
one requirement, this should also happen as close to the IO-layer as possible.</p>
<p><em>Anything else</em> should definitely be <em>logged</em> at least. Then it is up to the
developer how to handle these, and how coarse-grained the handling should be.
Possibilities are to just retry the operation, or maybe to ignore it completely.
But also these errors <em>must</em> be caught on a <em>per request</em> basis and converted to
an <em>opaque</em> user facing error.</p>
<h2 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h2>
<p>Well, error handling is a really big and controversial topic. Most of the hard
decisions really depend on the specific application usecase.</p>
<p>What makes me kind of sad is that most solutions fail my most important requirement
of <em>user facing</em> errors. They make translating errors really hard.
Especially for libraries that are focused on validating user input, this is a
<em>must have</em> requirement!</p>
Enforcing Rules — 2019-05-10
Unknown
https://swatinem.de/blog/enforcing-rules/<p>Previously I have written extensively about the <a href="../dx-challenges">problems I face</a>
and some ideas of how I would like to <a href="../managing-intermediates">organize a large codebase</a>.</p>
<p>Another challenge is to integrate this all into a monorepo but in a way that
best isolates <em>new</em> code from <em>old</em> code.
The best way to do that is via linting, so let’s look at that in more depth.</p>
<p>Put very simply, a linter is a tool to enforce certain rules on your code. From
formatting concerns to more sophisticated rules such as making sure that you
handle async promise-based code correctly.</p>
<p>Linting can happen at a few distinct phases during development:</p>
<ul>
<li>During development inside an IDE, which can point the developer to problems
immediately, and can do formatting and fix some auto-fixable lints directly at
<em>format on save</em> time.</li>
<li>At commit-time via a pre-commit hook. To make sure developers only commit valid
code.</li>
<li>On a CI server, to make sure that only code that conforms to the set of rules
will land in the repository.</li>
</ul>
<p>These use-cases are also listed in the order of <em>responsiveness expectations</em>.
The in-editor usage should be instant, while on the other hand you don’t really
care how long linting will take on CI.</p>
<p>And here comes the problem that I already mentioned in my previous posts.
I <em>suspect</em> that my editor integration via <a href="https://github.com/Microsoft/typescript-tslint-plugin">typescript-tslint-plugin</a> might be
the cause why the language server frequently becomes super slow, and might even
reliably OOM when I change a lot of files at once, for example by changing branches.</p>
<p>Also, the pre-commit hook is super slow depending on how many files are being changed.</p>
<p>One reason for this slowness might be that the linter needs to start up from scratch
every time, and needs to typecheck the code again every time. There is no way
to share some state between these tools.</p>
<p>Back in the days when I was still using vim, I used <a href="https://github.com/mantoni/eslint_d.js">eslint_d</a> which starts a
long running process and communicates with that via a socket. That way it can
avoid all the startup and warmup costs of node. (One very good reason why you
should bundle your code as much as possible)</p>
<hr />
<p>One of the problems with at least the <code>vscode</code> integration of <code>tslint</code> was that
it did not support rules that relied on typechecking. That was the main usecase
of <a href="https://github.com/Microsoft/typescript-tslint-plugin">typescript-tslint-plugin</a>. But now <code>tslint</code> is officially deprecated in
favor of <a href="https://github.com/typescript-eslint/typescript-eslint">typescript-eslint</a>.</p>
<p>I haven’t tried that one yet, but I do know that there is no deep integration
with the language server yet, I have
<a href="https://github.com/typescript-eslint/typescript-eslint/issues/254">asked specifically</a>.
And I am not sure yet if the <code>eslint</code> plugin of vscode can correctly work with
rules that depend on typechecking. I would like to think that it does.</p>
<p>But even so, that would make things even worse, since it means I would have two
long-running processes in the background, both doing redundant work. Maybe someone
will write a <code>tsserver</code> integration at some point, maybe even myself.</p>
<hr />
<p>But enough about that. The challenge at hand is to isolate code inside a
monorepo in a way that can guarantee you can’t cross the import barrier from
<em>new</em> to <em>old</em> code.</p>
<p>I <em>hope</em> that a combination of eslints <a href="https://eslint.org/docs/rules/no-restricted-imports">no-restricted-imports</a> together with
<code>eslint-plugin-import</code>s <a href="https://github.com/benmosher/eslint-plugin-import/blob/master/docs/rules/no-restricted-paths.md">no-restricted-paths</a> will be enough in that regard.</p>
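<p>A hedged sketch of what that configuration might look like (the directory names are made up, and I have not verified this exact setup yet):</p>

```javascript
// .eslintrc.js — assuming old code lives in src/legacy and new code
// in src/next; both rules would need tuning for a real repo layout.
module.exports = {
  plugins: ["import"],
  rules: {
    // forbid reaching into legacy modules via bare import patterns…
    "no-restricted-imports": ["error", { patterns: ["**/legacy/*"] }],
    // …and forbid imports that cross the new→old directory boundary:
    "import/no-restricted-paths": [
      "error",
      { zones: [{ target: "./src/next", from: "./src/legacy" }] },
    ],
  },
};
```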
<p>Other than that, I think the story here is quite good. I still need to spend
some time evaluating all the linting rules we have, again. :-D</p>
Managing Intermediate Artifacts — 2019-05-09
Unknown
https://swatinem.de/blog/managing-intermediates/<p>In my last post, I talked about <a href="../dx-small-projects">small public</a> projects.
When the project gets bigger however, the workflows I presented quickly become
a pain. As I showed in the <a href="../dx-challenges">first post of the series</a> we have
reached a size where typechecking, testing and linting have slowed to a crawl,
even to the point that the language intelligence of my IDE keeps crashing when
I switch branches.</p>
<p>As I also showed in the first post, another problem is that the developer tools
often do duplicate work, which both makes things slower and opens the door for
bugs.
We use one set of tools to compile serverside code for production, a different
one to run the code in local testing; jest does its own thing to run tests, and
webpack does its own thing when bundling code for the web.</p>
<p>Now I want to define some goals I would like to achieve, as well as to define
some rules.</p>
<ul>
<li>First off, I would like to start with a <strong>clean slate</strong>
<ul>
<li>This implies that we only care about <em>TypeScript</em> code, which will be
important in a sec.</li>
</ul>
</li>
<li>There should be <em>as little difference as possible</em> between running code for
local development and running code in production</li>
<li>Running code in local development should be <strong>convenient</strong>
<ul>
<li>It should be <strong>fast</strong></li>
<li>It should involve as little boilerplate as possible</li>
</ul>
</li>
<li>It should support code that targets <code>web</code> as well as <code>node</code>!</li>
<li>It should make it easier and convenient to organize code
<ul>
<li>Specifically, it should support deep <code>import X from "deep/within/other/modules"</code></li>
<li>(Yes, I absolutely believe that <em>small, public</em> libraries should only support
a single entry point and hide their internal structure! But this usecase is different.)</li>
</ul>
</li>
<li>It should have strong rules in place to enforce best practices</li>
</ul>
<p>After extensively exploring the problem space, I think it will become necessary
to rethink some of the conveniences that I came to rely on coming from <em>small</em>
projects. I think it is necessary to explicitly manage intermediate artifacts.</p>
<p>I think this will come with some significant advantages for local development
as well as running code in production. But it comes with one significant
disadvantage, which is that most development workflows are not <em>self contained</em>
anymore, but rely on other steps.</p>
<h2 id="tsc-to-compile-files"><a class="anchor-link" href="#tsc-to-compile-files" aria-label="Anchor link for: tsc-to-compile-files">#</a>
<code>tsc</code> to compile files</h2>
<p>First step here would be to use <code>tsc</code> <em>explicitly</em> to transpile to code that
runs natively in node 8 and modern browsers, with one very important twist:
The code will use native <code>esm</code> modules instead of <code>commonjs</code>! To make this work
in node, I propose to use the <a href="https://github.com/standard-things/esm">esm</a> module to be able to natively load those.</p>
<p>I am very wary of using such require hooks in production, but I really want to
give this one a shot. Apart from this one, we already use <code>source-map-support</code>.</p>
<p>Using <code>tsc</code> in <code>--watch</code> mode, combined with <a href="https://github.com/standard-things/esm">esm</a> would mean the following:</p>
<ul>
<li>We would run <strong>the exact same</strong> code in local development as we will do in
production!</li>
<li>We wouldn’t need <em>any</em> webpack loader at all. Webpack/rollup can consume native
<code>esm</code> modules. So we would also run the same code on the web as we do on the
server.</li>
<li>Things would be <em>fast</em>: Since we don’t need any <em>transpiling</em> require hook or
webpack loader anymore, hot reloads should actually get faster.</li>
<li><strong>BUT</strong>: we would need to have <code>tsc --watch</code> running in the background at all
times, which is an inconvenience.</li>
</ul>
<hr />
<p>Now that we have decided to actually have <code>tsc</code> <code>emit</code> something, combined with
the fact that we will deal with TS files <em>only</em>, we can use <a href="https://www.typescriptlang.org/docs/handbook/project-references.html">project references</a>
which will hopefully significantly reduce the resource usage and startup time
of the IDE.</p>
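<p>A per-package <code>tsconfig.json</code> for this setup could look roughly like this; the exact <code>target</code> is an assumption on my part, depending on what node 8 and the supported browsers handle natively, and the referenced path is a placeholder:</p>

```json
{
  "compilerOptions": {
    "composite": true,
    "module": "esnext",
    "target": "es2017",
    "declarationMap": true,
    "sourceMap": true
  },
  "references": [{ "path": "../components" }]
}
```

<p>Setting <code>composite</code> implies <code>declaration</code>, so each package emits the <code>.d.ts</code> files its dependents need for fast, incremental typechecking.</p>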
<h2 id="code-organization"><a class="anchor-link" href="#code-organization" aria-label="Anchor link for: code-organization">#</a>
Code Organization</h2>
<p>We currently use path mapping, which needs to be set up separately for <code>tsc</code> and
<code>jest</code>, plus a custom require hook using <a href="https://github.com/dividab/tsconfig-paths">tsconfig-paths</a>, which I had to patch
myself BTW because it was both horribly slow and buggy.</p>
<p>After some time, I have come to the conclusion that relying on path mapping was not
a good idea. Apart from the problems with <a href="https://github.com/dividab/tsconfig-paths">tsconfig-paths</a> itself and the need
to correctly set it up, it was also a source of problems because the code behaved
differently in local development than it did in production.</p>
<p>So far, we also used npm packages which were published to a private registry,
which in itself has caused us a lot of problems every now and then. Instead of
consuming code via npm, we decided to just put the whole <em>monorepo</em> (a more
fitting name would be <em>code dump</em>) into a docker image, to make us independent
from an npm registry.</p>
<p>However, I still think using npm packages, or more specifically <code>node_modules</code>
has its merits.</p>
<p>So we established that we want to use the <em>exact same</em> code in production as in
local development, and that we don’t want to rely on path mapping anymore. And
we would like to have both <em>convenient</em> import paths and <em>deep</em> import paths.
One of the reasons path mapping caused problems was the fact that we had <code>src</code>
and <code>dist</code> folders, which would allow deep import paths in local development but
fail in non-obvious ways when running in production.</p>
<p>My proposal here, which I would still have to validate with a running example,
is to remove the <code>src</code>/<code>dist</code> folders, and have <code>tsc</code> emit its artifacts right
in the root folder. You would end up with a structure like this:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>| some-package
</span><span>+- README.md (maybe)
</span><span>+- package.json
</span><span>+- tsconfig.json
</span><span>+- .eslintrc.js (maybe)
</span><span>+- index.ts
</span><span>+- index.js
</span><span>+- index.js.map
</span><span>+- index.d.ts
</span><span>\- index.d.ts.map (not quite sure if these can be inlined?)
</span></code></pre>
<p>Yes, this does look very untidy. At least in <code>vscode</code>, the IDE can be configured
to hide all the output artifacts if a corresponding <code>.ts</code> file exists. Not sure
about other editors.</p>
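<p>In <code>vscode</code>, that boils down to a <code>files.exclude</code> entry in the workspace settings; the <code>when</code> clause hides a <code>.js</code> file only if a sibling <code>.ts</code> file of the same name exists:</p>

```json
{
  "files.exclude": {
    "**/*.js": { "when": "$(basename).ts" },
    "**/*.js.map": true,
    "**/*.d.ts.map": true
  }
}
```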
<p>I think there are ways to organize things differently, for example by moving the
<code>package.json</code> file into a different folder that would be the <em>output</em> folder
with the intermediate artifacts, separate from the source files.
But I think that would be more confusing than beneficial.</p>
<p>Also note that since we will not rely on publishing to an npm registry anymore,
the <code>package.json</code> is free to define arbitrary names, such as this:</p>
<pre data-lang="json" style="background-color:#fafafa;color:#61676c;" class="language-json "><code class="language-json" data-lang="json"><span>{
</span><span> </span><span style="color:#86b300;">"private"</span><span style="color:#61676ccc;">: </span><span style="color:#ff8f40;">true</span><span style="color:#61676ccc;">,
</span><span> </span><span style="color:#86b300;">"name"</span><span style="color:#61676ccc;">: </span><span style="color:#86b300;">"~components"
</span><span>}
</span></code></pre>
<p><code>yarn</code> workspaces or <a href="https://pnpm.js.org/">pnpm</a> would make sure that a deep import such as
<code>import X from "~components/Button"</code> would find the correct file.</p>
<h2 id="digression-code-generators"><a class="anchor-link" href="#digression-code-generators" aria-label="Anchor link for: digression-code-generators">#</a>
Digression: Code Generators</h2>
<p>One other thing that is causing me a lot of concerns recently is how to deal
with other intermediate artifacts, such as code created via code generation.
We use <a href="https://github.com/eversport/intl-codegen">intl-codegen</a> and <a href="https://github.com/apollographql/apollo-tooling">apollo codegen</a> to produce code that depends on other
source files. I have written the former myself, and I’m not quite sure how happy
I am with the latter.</p>
<p>We have multiple problems with the way we use these tools currently.</p>
<ul>
<li>The generated files are currently committed to git, and cause a lot of churn
and merge conflicts.</li>
<li>The files can get out of sync, since developers are not <em>forced</em> to re-generate
and commit them.</li>
<li>Generating these files can break either the typechecking, or far worse, the
code itself in unpredictable ways. Which is both inconvenient when CI builds
suddenly turn red, and dangerous when things are shipped to production.</li>
<li>Translators often mess up the <code>MessageFormat</code> syntax, which will only break
when a developer runs the codegen.</li>
</ul>
<p>I think to solve this problem, it would be a good idea to <code>.gitignore</code> these
files and rather integrate them better with a file watcher running in the background.</p>
<p>For <a href="https://github.com/eversport/intl-codegen">intl-codegen</a>, this should be easy and straightforward, but apollo is more
complex, since it relies on a graphql schema, which itself depends on running
your code first.
In this case, I propose to actually commit the schema, but write an automated
test that runs the schema creation on CI and fails when the cached schema file
differs.</p>
<h2 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h2>
<p>I think the proposal shown here would solve quite some problems while introducing
only minimal inconveniences. I would really love to explore this further.</p>
DX on Small Projects2019-05-09T13:00:00+00:002019-05-09T13:00:00+00:00
Unknown
https://swatinem.de/blog/dx-small-projects/<p>Continuing my series, I will take a look at what tools and workflows I use to
manage my small projects. I will also explain some of the very opinionated
guidelines that I follow.</p>
<p>We will specifically talk about code that will be <em>published</em>, and can be consumed
<em>publicly</em> by anyone. This has some implications on the structure of the code.</p>
<h2 id="maintaining-a-public-api"><a class="anchor-link" href="#maintaining-a-public-api" aria-label="Anchor link for: maintaining-a-public-api">#</a>
Maintaining a public API</h2>
<p>I do have quite a strong opinion on bundling and how to best publish / expose
code that you write, which has implications on how you <em>consume</em> that code.</p>
<p>Writing a <em>small</em> and <em>focused</em> library means that you should ideally have only
one, or very limited and <em>explicit</em> set of entry points.</p>
<p>The problem is that <em>in theory</em>, people can just import any file that is included
in an npm package. And others will start relying on internal
implementation details they really shouldn’t. And they will complain if you
break things by re-organizing your internal code.</p>
<p>People will just happily <code>import { SomeInternalClass } from "your-library/some/internal/file"</code>.</p>
<h2 id="bundling-code"><a class="anchor-link" href="#bundling-code" aria-label="Anchor link for: bundling-code">#</a>
Bundling Code</h2>
<p>One way to avoid this is to bundle your code, which I highly recommend everyone
should do.
I am a big fan of, and an early adopter of and contributor to <a href="https://rollupjs.org/">rollup</a>, and one
project I would like to show off to highlight some of my recommendations is
<a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a>, which you can use to also bundle up TS type definitions alongside
your code.</p>
<p>The README of <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a> shows a clear example of how to best use it.</p>
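<p>Its basic shape is roughly the following, a sketch with illustrative paths: one config entry produces the JS bundle, a second one uses the plugin to roll all type definitions up into a single <code>.d.ts</code> file:</p>

```typescript
// rollup.config.ts — illustrative paths and file names
import dts from "rollup-plugin-dts";

export default [
  // the library bundle itself, with a single entry point
  { input: "src/index.ts", output: [{ file: "dist/index.js", format: "es" }] },
  // a matching, bundled type definition file
  { input: "src/index.ts", output: [{ file: "dist/index.d.ts", format: "es" }], plugins: [dts()] },
];
```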
<h2 id="managing-dependencies"><a class="anchor-link" href="#managing-dependencies" aria-label="Anchor link for: managing-dependencies">#</a>
Managing Dependencies</h2>
<p>I am still surprised how often people get this wrong. Or how little thought they
put into it.</p>
<p>For example, there are
<a href="https://renovatebot.com/docs/dependency-pinning/#ranges-for-libraries">multiple</a>
<a href="https://doc.rust-lang.org/cargo/faq.html#why-do-binaries-have-cargolock-in-version-control-but-not-libraries">sources</a>
out there that explain why <em>libraries</em> should not pin their dependencies, but
rather delegate that choice to the <em>users</em> of that library.</p>
<p>Another important thing to understand is the difference between direct
<code>dependencies</code> and <code>peerDependencies</code>.
<a href="https://yarnpkg.com/blog/2018/04/18/dependencies-done-right/">The yarn blog</a>
has a good article about that.
TLDR: When your library’s users should not care or even know about a dependency, put it into
<code>dependencies</code>. If your library is used <em>alongside</em> or <em>together with</em> some other
dependency, put it into <code>peerDependencies</code>.
For example, <code>rollup-plugin-dts</code> puts both <code>rollup</code> and <code>typescript</code> into
<code>peerDependencies</code>, because it can’t work independently of those two, and someone
using <code>rollup-plugin-dts</code> will have to use the other two as well.</p>
<p>Something else I see quite often, which I think is just wrong is that some
libraries are putting <code>@types</code> into their <code>dependencies</code>. This is <strong>only</strong> valid
for other <code>@types</code> packages!</p>
<p>Why? Because <code>@types</code> should never ever be used in production. They are by
definition <code>devDependencies</code>. Just because users of <code>your-library</code> happen to
also use typescript and are getting typechecking errors because they are missing
<code>@types/node</code> does not mean that <code>@types/node</code> belongs into <code>dependencies</code>.
The code will run in production without that!</p>
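<p>Putting both points together, the manifest of <code>rollup-plugin-dts</code> would be shaped roughly like this; the version ranges are illustrative, not the actual ones:</p>

```json
{
  "name": "rollup-plugin-dts",
  "peerDependencies": {
    "rollup": "^1.0.0",
    "typescript": "^3.4.0"
  },
  "devDependencies": {
    "@types/node": "^11.0.0"
  }
}
```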
<h2 id="support-targets"><a class="anchor-link" href="#support-targets" aria-label="Anchor link for: support-targets">#</a>
Support Targets</h2>
<p>A little bit related to dependencies: I would recommend <em>publishing</em>
code in the most recent JS dialect possible, one that runs <em>natively</em> in the
<em>most up-to-date</em> runtimes. Let your users pick a support target. Don’t force
transpiled code or polyfills on them.
A short test showed that <em>not</em> transpiling async/await but rather using it
natively cut down the bundle size by <em>~10%</em>, but more importantly, it cut the
<em>startup time</em> of the code by <strong>~25%</strong>.</p>
<p>This sadly is one of the disadvantages of JS being a language that relies on a
runtime. :-(</p>
<p>When talking about a <em>target</em>, I would also encourage people to publish code
that targets a <em>standard</em> module system, by which I mean <em>native</em> <code>import/export</code>
syntax. That way the user of your library has the choice how to best consume it,
such as by bundling it with the rest of their code.</p>
<p>Sadly though, this goal is at odds with being able to run that code in node
natively <em>sadface</em>. One solution is to publish the code both as commonjs, and as
native modules, which however is also at odds with using deep import paths, which
I previously argued you should avoid anyway :-).</p>
<p>But the takeaway is to publish code in a way that is friendly to bundlers. Which
also has implications on the <em>dependencies</em>, which also need to be friendly to
bundlers, which sadly most code still is not.</p>
<h2 id="testing-and-linting"><a class="anchor-link" href="#testing-and-linting" aria-label="Anchor link for: testing-and-linting">#</a>
Testing and Linting</h2>
<p>Todays post is about <strong>small</strong> libraries, such as <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a>, of which
I wrote, say <em>~98%</em> myself. This means I don’t really need a complex linting
setup, apart from <em>format on save</em> that the IDE provides. I will focus more on
linting in a future post.</p>
<p>Being a small and focused library also means it’s easy to test, which I usually
put quite some effort into, getting as close to 100% coverage as possible.</p>
<p>For this I use <a href="https://jestjs.io/">jest</a> in combination with <a href="https://kulshekhar.github.io/ts-jest/">ts-jest</a>. Being <strong>small</strong> also means
that the convenience of having a single command to both typecheck and test my
code including code coverage outweighs the disadvantage of that workflow being
slow.
Running the testsuite takes around <em>~6 seconds</em> for <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a> and maybe
<em>~10 seconds</em> for <a href="https://github.com/eversport/intl-codegen">intl-codegen</a>. The slowness probably comes from the fact that
both tools do typechecking using TS as <em>part of the tests themselves</em>, rather
than the tooling itself.</p>
<p>One nice thing about software engineering itself is that you are constantly
challenged, and you need to re-evaluate all your decisions and opinions all the
time, which makes you grow. There is a saying that if you are not somewhat
ashamed of your own code you wrote a year ago, you didn’t really grow as a
developer. But I digress.</p>
<p>So, while yes, I do like <a href="https://jestjs.io/">jest</a> for its convenience and especially for its
<code>expect</code> matchers and snapshot testing, I also dislike it at the same time.
It is an overly large behemoth that tries to do too much, and creates a lot of
problems doing so.</p>
<p>One example is <a href="https://github.com/facebook/create-react-app/issues/5868">buggy handling of TS code</a>,
because well jest aims to support TS out of the box, but fails to do so ever so
subtly.
And while it supports file-level mocking, the way it does that is not always
obvious and can lead to quite some surprising problems. You can mock the local
<code>./send.ts</code> file by creating a <code>./__mocks__/send.ts</code>, but it will then also use
that mock for <code>import X from "send"</code> in a completely different part of your
codebase. This was surprising. But once I figured it out, it also kind of
explains why jest spews tons of warnings when you have two mocks named
<code>./__mocks__/index.ts</code> in different parts of your codebase.</p>
<p>Learning from this, I would recommend to just avoid file based mocking. I will
also re-evaluate my opinion about having a <code>test runner</code> <em>running</em> your tests.
Maybe it would be a better idea to build a testsuite as a dedicated executable
that you can <em>run</em>, which <em>explicitly uses</em> a <em>testing library</em> internally for
organizational purposes, a concept that for example <a href="https://github.com/lorenzofox3/zora">zora</a> advocates.
I grew quite wary of tools that force you to organize your code in a certain way.</p>
<p>I think I will experiment with this concept in <a href="https://github.com/Swatinem/rollup-plugin-dts">rollup-plugin-dts</a> and
<a href="https://github.com/eversport/intl-codegen">intl-codegen</a> in the future.</p>
DX Challenges of TS/JS Projects2019-05-08T00:00:00+00:002019-05-08T00:00:00+00:00
Unknown
https://swatinem.de/blog/dx-challenges/<p>Well… It has actually been more than one year since my last blog. Back then I
bid farewell to programming for a while, and had plans to switch to Rust
development. Fast forward more than a year, I am back working with TypeScript,
and haven’t written any production Rust code yet.
While I still have high hopes for Rust, it was just more convenient for me to
stick to something I have a lot of experience with, while also trying to learn
from things that Rust does well.
Anyway, I digress.</p>
<hr />
<p>What I want to do right now is to start a small blog series, in
<em>rubber duck debugging</em> style, to document my thoughts and opinions on how to
manage JS/TS projects, and maybe in the process come up with some viable
solutions to the problems we face.</p>
<p>Let’s start explaining the challenges we currently face by giving some
statistics on our codebase.</p>
<p>According to <a href="https://github.com/XAMPPRocky/tokei">tokei</a>, we currently have <strong>~4000</strong> source files with <strong>>300kLOC</strong>,
excluding external dependencies, tests and automatically generated code.
We are going all-in on TypeScript and use it for all the <em>new</em> code we write.</p>
<p>A <code>tsc</code> invocation gives the following statistics right now:</p>
<pre style="background-color:#fafafa;color:#61676c;"><code><span>Files: 3780
</span><span>Memory used: 1060813K
</span><span>Total time: 31.56s
</span></code></pre>
<p>Running <code>yarn</code> to install all external dependencies takes <strong>~50 seconds</strong>,
and pre-compiling <strong>~1500</strong> of our source files another <strong>~18 seconds</strong>.</p>
<p>Our unit and integration test suites run for <strong>~3 minutes</strong> each, with another
<strong>~12 minutes</strong> for our end-to-end tests running in <a href="https://github.com/cypress-io/cypress">cypress</a> (on CI).
BTW, we have mixed opinions on cypress, mostly related to their inability to
upgrade their browser to a version that is more recent than <em>electron 1</em> <em>*sigh*</em>.</p>
<p>Running a full lint with both <code>eslint</code> and <code>tslint</code> takes <strong>~2 minutes</strong>.</p>
<p>And since we are creating both more code, and more tests, things will only get
slower.</p>
<p>For local development, I now switched from vim to <a href="https://code.visualstudio.com/">vscode</a>, with which I am super
happy in general. If it were not for the TS language server.
I showed some stats above that running the TS typechecker itself takes <em>~30 seconds</em>
and consumes some <em>~1G</em> of RAM. The language server itself, combined with the
<a href="https://github.com/Microsoft/typescript-tslint-plugin">typescript-tslint-plugin</a> plugin takes an incredibly long time to start, while
consuming in excess of <strong>2G</strong> of RAM. In my last blog post I was lamenting the
way JS apps waste resources like CPU and RAM. Well today I am using more than
<strong>8G</strong> of my 16G just to run a browser, IDE, email client and desktop environment,
all written in JS <em>*sigh*</em>.</p>
<p>But that isn’t even that much of a problem, if at least it would be stable. Hint:
It’s not! While I do love all the code assist features, including the integration
with tslint, it frequently takes <em>ages</em> to give me code-completion or even expand
snippets. Worst of all, I have reached a level of slowness where switching branches
would reliably kill <code>tsserver</code> after a minute due to OOM <em>*sadface*</em>. By now
this starts to slow me down significantly.</p>
<p>Also, it kind of bothers me that these separate tools, such as IDE integration,
linting (for example on pre-commit) and actually running your code share no
state, and thus are doing a lot of redundant work that makes them slow.
More on that in a later post.</p>
<hr />
<p>Apart from these problems that revolve around DX tooling, we also face problems
when pushing code into production, related to building / bundling the code and
around how to deal with external and internal dependencies.</p>
<hr />
<p>This is it for now. In the followup posts, I want to go into more detail on how
I would like to better manage these challenges.</p>
Farewell WebTech2017-03-18T00:00:00+00:002017-03-18T00:00:00+00:00
Unknown
https://swatinem.de/blog/farewell-webtech/<h3 id="tldr"><a class="anchor-link" href="#tldr" aria-label="Anchor link for: tldr">#</a>
TLDR:</h3>
<p>End of this month, I will quit web programming and programming in general for
some time, taking a much deserved time off and decide on my future direction.</p>
<p>It has been a nice ride, let me share a few things.</p>
<h2 id="the-beginnings"><a class="anchor-link" href="#the-beginnings" aria-label="Anchor link for: the-beginnings">#</a>
The beginnings</h2>
<p>I edited my first piece of HTML back when using Netscape Navigator in Windows 95.
It quickly grew into a hobby and starting in 2000, I was managing a few
game-related fan pages.</p>
<p>Back in those days <code>&lt;table&gt;</code> based layouts with <code>bgcolor</code> and <code>&lt;img&gt;</code> were all
the rage. There was no CSS back then, and no one was using JS for anything
really. You basically did some PHP3, maybe with a MySQL 3 in the backend. No
real classes in the backend code, no transactional database, no client side
code.</p>
<p>At one point, I made the move from IE6 to Firebird as it was called back then,
simply because it supported these semitransparent images called <code>PNG</code>. It was
crazy, really.</p>
<p>Apart from some game-related fanpages, I also contributed to a browser-based
MMO and did some occasional paid work when I was not in school. I also
contributed to bigger pieces of software, both closed source and open source.</p>
<p>I was always the one embracing bleeding edge technologies. Table-less layouts,
pure-CSS dropdowns, <code>-moz-border-radius</code>, etc.
I actually wrote my Bachelors theses about WebGL, in a time when it was just an
experimental thing in Firefox only, back in 2010.</p>
<p>Before that, I also switched to Linux full-time (apart from gaming), mostly
motivated by the horribly bad Windows Vista. Well thanks for that actually :-)</p>
<p>Along the way, I developed a strong focus on frontend code and JavaScript. My
Masters Thesis was about implementing a sophisticated code completion engine
for JS. I know the language inside and out, hell I even wrote some of it. That
refers to some of the ES2015 features I implemented in SpiderMonkey, the JS
engine running Firefox. One of the achievements I am most proud of.</p>
<h2 id="problems-ensue"><a class="anchor-link" href="#problems-ensue" aria-label="Anchor link for: problems-ensue">#</a>
Problems ensue</h2>
<p>But the web ecosystem itself moves in a direction I am not happy with and that
I also do not want to be part of anymore, really.</p>
<p>One of the problems is the community that is increasingly moving towards a
Java-esque mindset where it is ok to just layer abstraction over abstraction,
all that after you have to install hundreds of megabytes of tooling, all while
wrestling with packagers and half-broken solutions to things like hot module
reloading and things like that.</p>
<p>You are left with either choosing library A that can’t handle usecase X you
have, or with library B that sucks because of completely different reasons.
Things are buggy and incomplete, and nodejs still has not figured out what to
do about ES2015 modules, predictions are it will take a year to sort it out. It
is a real mess.</p>
<p>People jokingly call all this <em>JS fatigue</em>, but it really is draining all your
energy, especially if you have been at it as long as I have.</p>
<h2 id="the-web-as-a-platform"><a class="anchor-link" href="#the-web-as-a-platform" aria-label="Anchor link for: the-web-as-a-platform">#</a>
The web as a platform</h2>
<p>Proponents always say that the web is a single platform that can finally
achieve the <em>“write once, run everywhere”</em> thing we always wanted. Sadly, it is
far from it. It is actually four main browser engines running on different
operating systems on different hardware and form factors. All of those break in
surprising ways and have subtle incompatibilities that haunt you any time you
want to do anything interesting. Not to mention legacy systems.</p>
<p>For the last 2+ years I have been working on
<a href="https://pagestrip.com">pagestrip</a>, which provides a system for digital
publishing which brings user authored content to the web and scales it to the
available viewport pixel-perfectly. It does sound pretty easy, and that is the
reaction I got from most people who asked what I (used to) do for a living.</p>
<p>But the devil is in the detail, and making those things work on all those
different platforms with halfway decent performance and visual fidelity is, at
least for certain things we want to do, simply impossible because of platform
limitations. Chasing down all the edge cases is really frustrating work. Up to
the point where I just <em>snapped</em> and said I could not take it any more.</p>
<p>(As a funny side story, a former colleague of mine quit a few months prior, after
a very long time struggling with implementing proper snapping in the editor
part of our platform. He just <em>snapped</em> :-)</p>
<p>I am also jokingly referred to as the <strong>CSO</strong> of the company, the <em>chief
scrolling officer</em>, because one of my recent projects was to implement proper
scrolling with snapping support. You know, because browser-native scroll
snapping is a changing spec that only has some incomplete, incompatible and
buggy implementations in browsers. And we kind of need to have that stuff
working.</p>
<p>But oh boy is it hard! With all those different browsers, devices and input
methods. It’s a real mess, believe me! One might rightfully say that it really
was what broke me at last.</p>
<h2 id="the-future-of-software"><a class="anchor-link" href="#the-future-of-software" aria-label="Anchor link for: the-future-of-software">#</a>
The Future of Software</h2>
<p>But lets get back to software development and programming languages in general.
There is one big trend I see in recent times, which is to write more and more
software in javascript.</p>
<p>Actually, I think the main goal was to have one language that you could use to
write software for all platforms. Something Java failed to do (thank god for
that), but now JS is kind of achieving, for better or for worse.</p>
<p>I can see this trend looking at the proliferation of both
<a href="https://electron.atom.io/">Electron</a> for desktop development and
<a href="http://facebook.github.io/react-native/">react-native</a> for mobile. And I
really see the appeal of both approaches. You can share, say 80-90%, of your
code across platforms. And electron itself is much more of a <em>single platform</em>
than the mess of incompatible browsers is.</p>
<p>But I am deeply opposed to this trend. Simply because JS / web technology is
not the right technology to use for this. As much as I have worked with JS over
the years, my own opinion about things like type systems has also changed a lot
over the years. And by now I am ready to say that JS is a horrible language,
just as the web platform is a horrible platform.</p>
<p>One of the reasons is the nature of web technology. It is extremely flexible
and dynamic. But that is also one of the weaknesses. Because it is only ever
going to grow. Let me give you an example.</p>
<p>The number of properties of <code>CSSStyleDeclaration</code> (aka <code>Element.style</code>) will
only ever grow, it will never ever decrease. There is no such thing as
deprecations or semver-major releases of web tech. Features will never ever be
removed.</p>
<p>The goal of <em>pay for what you use</em> does not work in the context of web
technology. It is impossible just by the nature of the technology. Browsers
will only ever get more complex. They will use up more memory and system
resources, just because of the standards development. And that does not even
mention the ever increasing bloated libraries that pile up layers of
abstraction in a language that <em>has no zero cost abstractions</em>.</p>
<p>I mean we now have 4 cores and 4G of RAM in our phones, and easily 8 cores
and up to 32G of RAM in our desktop computers. Because we actually need that
much to run all the bloated software. Sure, <em>“unused memory is wasted memory”</em>,
but maybe I actually want to multitask and be able to run more than one program at
the same time.</p>
<p>Not to mention startup cost. I would argue that before the advent of SSDs,
system bootup and app startup was mostly IO limited. But now with more and more
desktop apps moving over to things like electron, things become CPU bound again
and a simple text editor needs
<a href="https://github.com/atom/atom/pull/13916">up to one second</a> to start. This is
simply unacceptable. I want the PC to wait for me, not the other way around.
And speaking of SSDs, storage might also become a bottleneck when every piece
of software needs a ~100-200M electron bundle. At least on Linux,
distribution-packaged software can deduplicate that. I have no idea how big
react-native packages are, but I bet they aren’t light either.</p>
<h2 id="opposing-trends"><a class="anchor-link" href="#opposing-trends" aria-label="Anchor link for: opposing-trends">#</a>
Opposing trends</h2>
<p>Speaking of react-native, I am actually happy that it will hopefully kill this
thing called mobile web apps. Contrary to what other Mozilla fanboys might
think of Firefox OS, I consider mobile web apps a terrible idea: they were never
able to match either the performance or the look-and-feel of native mobile apps.
React-native takes care of that by at least providing a native look-and-feel.
And react-native is also increasingly moving performance critical things (such
as animations) to native code, because, oh what a surprise, JS sucks when it
comes to performance.</p>
<p>Electron is kind of the opposite: it provides a unified (and themeable, which
really is a plus) look for one app <em>across</em> platforms, although
electron apps will not really match the look-and-feel of native apps on each
desktop platform. And like I mentioned, people are apparently happy with subpar
performance on desktop for whatever reason.</p>
<h2 id="the-hero-we-deserve"><a class="anchor-link" href="#the-hero-we-deserve" aria-label="Anchor link for: the-hero-we-deserve">#</a>
The hero we deserve</h2>
<p>Well, I am personally betting quite heavily on
<a href="https://www.rust-lang.org/">Rust</a>, a language with extremely high
performance (on par with, or even beating, C++), but one that is also safe (no
crashes) and, most importantly, easy to use. At least once its ergonomics are improved a bit,
which is an
<a href="https://blog.rust-lang.org/2017/03/02/lang-ergonomics.html">ongoing task</a>.
I am especially excited about
<a href="https://github.com/rust-lang/rust-roadmap/issues/16">non-lexical lifetimes</a>.</p>
<p>I am also happy to see some progress for the general problem of 2D graphics and
performance, a problem that former(?) Qt developer Zack Rusin was talking about
<a href="http://zrusin.blogspot.co.at/2010/11/2d-musings.html">7 years ago</a>. It
really surprises me every time that modern games can render millions of polygons
and extremely high quality graphics at buttery smooth framerates while web pages
still stutter. I do think that we finally have the proper technologies to move
2D content to the GPU. Not surprisingly, parts of that stack are written in Rust:
<a href="https://github.com/pcwalton/pathfinder">Pathfinder</a>, a high quality GPU font
renderer, and <a href="https://github.com/servo/webrender">webrender</a>, a 2D scene graph
renderer optimized for web content.</p>
<p>Also the GNOME/GTK community is making progress towards hardware accelerated
drawing in GTK4 via
<a href="https://www.bassi.io/articles/2016/07/05/gsk-demystified-1/">GSK</a>, and doing
experiments around bringing GTK and Rust closer together, which I am very
excited about.</p>
<p>Since electron clearly shows that desktop apps are moving toward a custom
themeable look-and-feel across platforms instead of platform native
look-and-feel, maybe the time would be ripe for a completely new toolkit
powered by webrender, with a css inspired theming solution, but still with a
clear strategy on deprecation and purging/detoxing of obsolete features.</p>
<p>Coupled with a language that actually supports <em>zero cost abstractions</em> and
<em>pay for what you use</em> resource usage.</p>
<p>Or maybe it will rather happen in the form of react-native for desktop powered
by WebAssembly, so you can still use a decent language.</p>
<h2 id="in-closing"><a class="anchor-link" href="#in-closing" aria-label="Anchor link for: in-closing">#</a>
In closing</h2>
<p>Exciting times, certainly. Only time will tell, but I sure am too tired to be
along for the ride. I will just wait this one out, practicing my Rust game in the
meantime, hoping that things will turn out for the better in the long run.</p>
<p>Peace out.</p>
Inadequacies of typed JavaScript2016-09-10T00:00:00+00:002016-09-10T00:00:00+00:00
Unknown
https://swatinem.de/blog/inadequacies-of-typed-javascript/<p>The last week I have been playing extensively with both <a href="https://flowtype.org/">flow</a> and
<a href="https://www.typescriptlang.org/">TypeScript</a>. And I have noticed two things that I consider to be bugs in
both of them. So is there something wrong with my expectations, maybe? Let’s
analyze both of the cases.</p>
<h2 id="non-nullable-class-members"><a class="anchor-link" href="#non-nullable-class-members" aria-label="Anchor link for: non-nullable-class-members">#</a>
non-nullable class members</h2>
<p>TypeScript (TS) introduced non-nullable types in version 2, and flow has had them
for quite a while, I think. There have been tons of stories about how bad the
null type/value is. Some call it the worst mistake in computer science. And
indeed, most of the errors we have at runtime are related to null. Accessing
properties of null, or calling null. Having to check values for null all the
time is tedious and as the runtime errors prove, null can slip through at all
kinds of places.</p>
<p>It is even worse in JavaScript because it does not have one null type but
actually two: <code>null</code> and <code>undefined</code>. To make things worse, <code>typeof null === "object"</code> and <code>typeof undefined === "undefined"</code>, so you can’t even check for
both cases at the same time; well, actually you can use <code>value == null</code>,
since that also works for <code>undefined</code>. Go figure.</p>
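<p>A quick sketch of these quirks, runnable as plain JS:</p>

```ts
// typeof reports two different things for the two "null" values
console.log(typeof null);      // "object" -- the famous historical accident
console.log(typeof undefined); // "undefined"

// loose equality against null matches both of them, and nothing else
console.log(null == null);      // true
console.log(undefined == null); // true
console.log(0 == null);         // false
console.log("" == null);        // false
```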
<p>To make matters worse still, JS also has boolean coercion. Which itself can be
convenient, but you can very easily trip up. If you have a value that has the
type <code>number | null</code>, using a simple <code>if (value)</code> will fail for the value <code>0</code>.
Probably not the thing you intended. And you <em>will</em> get it wrong at some point.
I have done so over and over again.</p>
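<p>A small sketch of this footgun (the function names here are made up for illustration):</p>

```ts
function describeLoose(value: number | null): string {
  // BUG: 0 is falsy, so this branch also swallows a perfectly valid 0
  if (value) {
    return "number";
  }
  return "null";
}

function describeStrict(value: number | null): string {
  // an explicit null check keeps 0 on the number path
  if (value !== null) {
    return "number";
  }
  return "null";
}

console.log(describeLoose(0));  // "null" -- wrong!
console.log(describeStrict(0)); // "number"
```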
<p>So, to come back to the topic at hand. Both flow and TS now prevent you from
tripping up on most of the null issues, except for the boolean coercion. But
both of them get one case wrong:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">class </span><span style="color:#399ee6;">A </span><span>{
</span><span> prop</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">string</span><span style="color:#61676ccc;">;
</span><span>}
</span><span style="color:#fa6e32;">const </span><span>a </span><span style="color:#ed9366;">= new </span><span style="color:#399ee6;">A</span><span>()</span><span style="color:#61676ccc;">;
</span><span>a</span><span style="color:#ed9366;">.</span><span>prop</span><span style="color:#ed9366;">.</span><span>length</span><span style="color:#61676ccc;">; </span><span style="font-style:italic;color:#abb0b6;">// typechecks just fine, fails at runtime
</span></code></pre>
<p>This simple code will typecheck just fine in both flow and TS, but will fail at
runtime with a TypeError because you are accessing a property on null.
I have reported this on the
<a href="https://github.com/Microsoft/TypeScript/issues/10827">TS issue tracker</a> but it
was closed as a duplicate of a <em>wontfix</em>ed issue. Apparently it is too
difficult to correctly cover all the different ways in which constructors can
behave.</p>
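<p>One way to sidestep this particular hole, sketched here rather than prescribed (the class name <code>SafeA</code> is mine): assign every property in the constructor, so there is never a window in which an instance exists without it:</p>

```ts
class SafeA {
  prop: string;
  // the property is assigned before the constructor returns,
  // so every instance really does carry a string
  constructor(prop: string) {
    this.prop = prop;
  }
}

const safe = new SafeA("hello");
console.log(safe.prop.length); // 5 -- no TypeError at runtime
```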
<p>As far as I see it, the root of the problem is twofold.
First, JS objects were never meant to have a guaranteed shape; it’s only these
type checkers that kind of try to enforce those things. And the second problem
is that JS constructors work by assigning/creating properties on <code>this</code>, which
can be aliased or delegated to other functions, etc…</p>
<p>In other languages with stricter guarantees, such as Rust, you do not have
<code>this</code> or constructor functions at all. The language itself guarantees that
creating an object of a certain type is atomic via an object literal that is
guaranteed to include all properties. (You can still use object spread for
convenience)</p>
<h3 id="ideas-to-solve-the-problem"><a class="anchor-link" href="#ideas-to-solve-the-problem" aria-label="Anchor link for: ideas-to-solve-the-problem">#</a>
Ideas to solve the Problem</h3>
<p>Both flow and TS can hold up the typesystem guarantees if you are using object
literals instead of classes. Use plain functions and objects instead of classes.
I have heard that one before. The whole functional programming crowd advocates
this. Maybe they have a point? But I do like having methods on
objects/prototypes. So maybe we can combine object literals that are correctly
checked for null properties with methods somehow. Well there is this special
<code>__proto__</code> property that was specified for ES2015. I tried using it with TS,
but failed. The method works perfectly in JS, but typechecking in TS fails.
Well unless I want to copy every method into the object, which completely
defeats the purpose of shared prototype methods.</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">A </span><span style="color:#ed9366;">= </span><span>{
</span><span> a</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">string</span><span style="color:#61676ccc;">;
</span><span>}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">function </span><span style="color:#f29718;">A</span><span>(</span><span style="color:#ff8f40;">a</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">string</span><span>)</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">A </span><span>{
</span><span> </span><span style="color:#fa6e32;">return </span><span>{
</span><span> __proto__</span><span style="color:#61676ccc;">: </span><span>AProto</span><span style="color:#61676ccc;">, </span><span style="font-style:italic;color:#abb0b6;">// can’t assign to unknown property
</span><span> a
</span><span> }</span><span style="color:#61676ccc;">;
</span><span>}
</span><span>
</span><span style="color:#fa6e32;">const </span><span>AProto </span><span style="color:#ed9366;">= </span><span>{
</span><span> </span><span style="color:#f29718;">method</span><span>(</span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">: </span><span style="color:#399ee6;">A</span><span>) {
</span><span> </span><span style="color:#fa6e32;">return </span><span style="font-style:italic;color:#55b4d4;">this</span><span style="color:#ed9366;">.</span><span>a</span><span style="color:#ed9366;">.</span><span>length</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">const </span><span>a </span><span style="color:#ed9366;">= </span><span style="color:#f29718;">A</span><span>(</span><span style="color:#86b300;">"a"</span><span>)</span><span style="color:#61676ccc;">;
</span><span>a</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">method</span><span>()</span><span style="color:#61676ccc;">; </span><span style="font-style:italic;color:#abb0b6;">// can’t call unknown method
</span></code></pre>
<p>So no luck here :-(</p>
<h2 id="aliasing-literal-union-types"><a class="anchor-link" href="#aliasing-literal-union-types" aria-label="Anchor link for: aliasing-literal-union-types">#</a>
Aliasing literal / union types</h2>
<p>Both flow and TS have the notion of literal types and union types. Union types
are really simple. The type <code>A | B</code> can either be <code>A</code> or <code>B</code>. But since those
are not native to JS, you have to have some way to actually assert the type at
runtime. For primitives its easy, since you can just check via <code>typeof</code>. But
discriminating objects needs to be done manually with a property used to <em>tag</em>
the object. In the following example, <code>U</code> is such a union type and it can be
discriminated by looking at the <code>type</code> property. And here we are actually using
strings as types: literal types. So in this case <code>U.type</code> has the type <code>"a" | "b"</code>, so it can either be the string <code>"a"</code> or <code>"b"</code>, but nothing else.</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">A </span><span style="color:#ed9366;">= </span><span>{ type</span><span style="color:#ed9366;">: </span><span style="color:#86b300;">"a"</span><span style="color:#61676ccc;">; </span><span>propA</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">string </span><span>}</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">B </span><span style="color:#ed9366;">= </span><span>{ type</span><span style="color:#ed9366;">: </span><span style="color:#86b300;">"b"</span><span style="color:#61676ccc;">; </span><span>propB</span><span style="color:#ed9366;">: </span><span style="font-style:italic;color:#55b4d4;">number </span><span>}</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">type </span><span style="color:#399ee6;">U </span><span style="color:#ed9366;">= </span><span style="color:#399ee6;">A </span><span style="color:#ed9366;">| </span><span style="color:#399ee6;">B</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>Both flow and TS handle these things quite well. If you use <code>.type</code> in a
<code>switch</code> or <code>if</code> statement, you can match for the exact type. But both
typecheckers fail if you want to alias that property, for example via
destructuring: <code>const {type} = X;</code>. Here, the type of <code>type</code> is widened to
<code>string</code> and it can no longer be used to discriminate the union type.
I also reported this in the
<a href="https://github.com/Microsoft/TypeScript/issues/10830">TS issue tracker</a> and
got the answer that it would be too complicated to implement this, because it
would involve tracking the dependency between the two variables, increasing the
depth of the dependency as more aliases are added.</p>
<p>Well at least this can be easily fixed by just always using the property
instead of extracting it into a local.</p>
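<p>For completeness, a sketch of the pattern that does work in both checkers, using the <code>U</code> union from above (the <code>describe</code> function is just for illustration):</p>

```ts
type A = { type: "a"; propA: string };
type B = { type: "b"; propB: number };
type U = A | B;

function describe(u: U): string {
  // switching on the property itself (not on a local alias of it)
  // narrows u to A or B in each branch
  switch (u.type) {
    case "a":
      return u.propA;
    case "b":
      return u.propB.toFixed(2);
  }
}

console.log(describe({ type: "b", propB: 1.5 })); // "1.50"
```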
<h2 id="closing-remarks"><a class="anchor-link" href="#closing-remarks" aria-label="Anchor link for: closing-remarks">#</a>
Closing remarks</h2>
<p>I wish one of the compile-to-js languages would actually support <em>real</em> union
types and a <em>real</em> match expression. Automatically tagging variants or even
supporting null-pointer optimization so you don’t have to have unnecessary
indirections if a simple <code>typeof</code> check would be enough.</p>
<p>There are already compile-to-js languages that support things like this. For
example, <a href="http://elm-lang.org/">Elm</a> and <a href="http://www.purescript.org/">PureScript</a> (both based on Haskell).
And possibly <a href="https://clojurescript.org/">ClojureScript</a>, even though I know too little about that
language. With the recently released <a href="https://bloomberg.github.io/bucklescript/">BuckleScript</a>, you can also
compile OCaml to JS, with quite readable code, although I don’t quite like that
it encodes structures as arrays and thus misses out on hidden class
optimizations of JS engines (I could be wrong though). With <a href="https://facebook.github.io/reason/">Reason</a>
you can even write readable OCaml.</p>
<p>And of course, I am very excited to see steady, although very slow, progress
towards compiling Rust to WebAssembly. Although I’m unsure about the FFI
between WebAssembly/asm.js and real JS. It relies on a preallocated
<code>ArrayBuffer</code> as a kind of heap, which might be good for low level performance
but does not play well with consuming and outputting plain JS objects.</p>
<p>So the future is still very much open. I really hope there will be a readable,
convenient, fast and <em>safe</em> compile-to-js language that can solve all these
problems and integrates seamlessly into existing JS projects. One can dream
though.</p>
Individualismus2016-06-12T00:00:00+00:002016-06-12T00:00:00+00:00
Unknown
https://swatinem.de/blog/individualismus/<p>Prompted by recent events, I would like to explain my opinion on various
topics that are close to my heart. Since I can order my thoughts better in
writing, I prefer that to a spoken discussion for now.</p>
<h2 id="mich-stort"><a class="anchor-link" href="#mich-stort" aria-label="Anchor link for: mich-stort">#</a>
What bothers me</h2>
<p>I often see patterns of behavior in individuals and in society that genuinely
bother me and that I consider dangerous. Here are some examples, some current,
some old.</p>
<ul>
<li>Person A excludes people from their circle of friends because they vote for
party X.</li>
<li>Person B would love to beat up people who vote for party X.</li>
<li>Person C goes to a demonstration with the intention of insulting people.</li>
<li>Persons D and E mock me because I am <em>not against</em> organization Y.</li>
<li>People boycott and agitate against an IT conference because person F is
giving a talk there.</li>
<li>People agitate against person G because, years earlier, they donated money
to a political cause.</li>
<li>Person H would love to see party X banned.</li>
<li>Person G calls all voters of party X <em>dumb</em>.</li>
<li>Website W deletes and censors comments with certain content.</li>
</ul>
<p>Generalizing a bit, you could call this behavior bullying. And believe me, I
know bullying. I have been an outsider all my life, and I have been bullied all
my life. That is exactly why I want to warn about the dangers of these patterns
of behavior.</p>
<h2 id="die-demokratische-grundordnung-in-gefahr"><a class="anchor-link" href="#die-demokratische-grundordnung-in-gefahr" aria-label="Anchor link for: die-demokratische-grundordnung-in-gefahr">#</a>
The democratic order in danger</h2>
<p>At first glance it seems far-fetched, but I find that this behavior, in the
context of politics, turns the entire democratic order on its head.</p>
<p>Suppose person H gets their wish and party X is banned. Maybe because party X
is <em>too far right</em>. Next we ban party Y as well, because it is <em>too communist</em>.
Then party Z, because it is <em>too green</em>. We ban party W too, for some other
reason. And before we know it, we have sham elections with only one option on
the ballot.</p>
<p>And what about surveillance and censorship? There is in fact increasing
censorship on the internet, under the pretext of deleting <em>right-wing</em> or
<em>hate-spreading</em> comments. But it already goes a step further. A friend
recently described to me how he was driven to delete his Facebook account.
After he commented on a post by a political party, Facebook suddenly wanted to
verify his identity via a copy of his passport. I myself consider it one of my
principles in life never to be on Facebook, and for years I have called
Facebook one of the greatest dangers to the free internet.</p>
<p>The question is also where we draw the line. Today we already have person C,
who deliberately insults people for their political opinion; we already have
smear campaigns against individuals, parties, or even IT conferences.
Where is the line? What happens when person B actually starts beating up
supporters of party X? At some point we will be setting their houses on fire,
and finally murdering them.</p>
<p>Is all of this so far-fetched? 70-80 years ago it surely also started with
bullying. With exclusion from the circle of friends. With insults and the
boycott of events. How it ended is written in the history books.</p>
<p>Bullying really is a powerful weapon. It can plant the thought in a person’s
head that they should be ashamed of their body. Or of their sexual orientation.
And now even of their political opinion? If that, too, is manipulated through
bullying, I see the democratic principles in danger for that reason as well.
Good thing there is still a secret ballot. Or should we soon abolish that,
too?</p>
<h2 id="ein-apell-fur-ein-friedliches-zusammenleben"><a class="anchor-link" href="#ein-apell-fur-ein-friedliches-zusammenleben" aria-label="Anchor link for: ein-apell-fur-ein-friedliches-zusammenleben">#</a>
An appeal for peaceful coexistence</h2>
<p>I do have many prejudices myself. But I still try to treat every person with
respect and dignity, regardless of their skin color, gender, religion, and
<em>political opinion</em>. Provided they show me the same respect :-)</p>
<p>I find it a great pity that many of the people I know, who call themselves
oh so tolerant, are incapable of this, and instead insult people with different
political opinions, or would love to beat them up.
Of all people, I would expect them to know better.</p>
<p>I would classify the people around me as rather <em>left</em>, if you will forgive
me the pigeonholing. But the amount of hate these people spray at the
<em>right</em> somewhat exceeds my tolerance.
And not only that. People who are <em>not against</em> the right also get some of
that hate. Like me.</p>
<p>I am not <em>against</em> the right. I am also not <em>for</em> the right. I am for
every person having the right to develop freely and to be treated with
dignity. And a right to free expression and so on, even if many people do not
like that opinion.</p>
<p>Still, one must keep in mind that one’s own freedom ends where it restricts
the freedom of other people. In that spirit: be kind to one another :-)</p>
The one killer feature icon fonts have over svg2016-01-31T00:00:00+00:002016-01-31T00:00:00+00:00
Unknown
https://swatinem.de/blog/the-one-killer-feature-of-icon-fonts/<p>The reason I write about this now is that GitHub’s icons started to look like shit on
my screen/system since a week or so ago. I asked myself why, and looked at the
source to find out they switched from using an icon font to inline svgs.
There is quite some controversy about whether to use an icon font or svgs for
icons. Just <a href="https://www.google.at/search?q=svg+vs+icon+font">google for it</a> to
find some articles about that topic.</p>
<p>What neither of those articles mentions is the one killer feature that icon
fonts have over svg. Actually, it’s two features, but they are a bit related.
It is this: <a href="https://en.wikipedia.org/wiki/Font_hinting"><strong>hinting</strong></a> and
<a href="https://en.wikipedia.org/wiki/Subpixel_rendering"><strong>subpixel anti-aliasing</strong></a>.</p>
<p>In plain English, hinting means that the font rendering will <em>snap straight lines to
device pixels</em>, to make sure that fonts always look crisp. And subpixel AA
means that the font rendering can triple the horizontal resolution, due to
the way lcd hardware actually works.</p>
<p>Both of these features are unique to fonts, they do not apply to svg. Hinting
can actually distort the proportions somewhat to make sure straight lines snap
to pixels, you clearly do not want that for svg. And although subpixel AA may
be possible for svg, I have never seen that been done so far.</p>
<p>There are quite some hoops you have to jump through to make svg, and especially
<code>&lt;canvas&gt;</code>, look good and crisp. If you want a line that is <code>1px</code> <em>wide</em> to look
good, you have to place it at a <code>0.5px</code> coordinate, so that when you add the width,
it ends up at <code>[0, 1]</code> and therefore looks crisp. You do <em>not</em> want to do that
for a <code>2px</code> line though, because you end up with <code>[-0.5, 1.5]</code>, which looks like
shit again. It’s hard. Believe me, I’ve been through that already.</p>
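<p>The snapping rule above can be captured in a tiny helper (a sketch; the name <code>snapLineCenter</code> is mine):</p>

```ts
// Snap a line's center coordinate so its edges land on whole device pixels.
// Odd widths (1px, 3px, ...) need the center on a half-pixel;
// even widths need it on a whole pixel.
function snapLineCenter(coord: number, lineWidth: number): number {
  if (lineWidth % 2 === 1) {
    return Math.floor(coord) + 0.5;
  }
  return Math.round(coord);
}

console.log(snapLineCenter(10, 1)); // 10.5 -- edges at [10, 11]
console.log(snapLineCenter(10, 2)); // 10   -- edges at [9, 11]
```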
<p>The point I’m trying to make here, though, is that in order to make svgs look
good, you actually have to give up on one key selling point of svgs:
<strong>scalability</strong>! You have to author svgs, specifically for one size to make
sure that lines snap to device pixels. Fonts will just do the work for you.</p>
<h2 id="caveats"><a class="anchor-link" href="#caveats" aria-label="Anchor link for: caveats">#</a>
Caveats</h2>
<p>All that I have written about here applies to my personal system, which has a
device pixel density of 1. For low dpi screens, hinting matters <em>a lot</em>. The
problems I described here are <em>much</em> less of a problem on high resolution
screens.</p>
<p>The second thing to note is that <strong>content authors do not control font
rendering</strong> all the way. Sure, there are css properties like <code>text-rendering</code>
and friends, but for example the hinting behavior can not be controlled by the
author, but is a system wide setting. On my system, I have hinting turned up
quite high because I prefer light crisp fonts rather than thick blurry ones.
One of the hits you find on google for that topic mentions that icon fonts look
blurry compared to svgs. They might, <em>depending on your font rendering
settings</em>. On the other hand though, svgs look blurry if lines do not line up
with physical pixels, pun intended.</p>
<p>Fact is that font rendering settings also very much depend on your personal
taste. Some people for example have font smoothing / anti-aliasing disabled
altogether, most likely because they don’t know such settings exist. If I have
to use someone else’s system, turning on font AA is the first thing I do,
because it really hurts my eyes.</p>
<p>I have also recently seen a developer post a
<a href="https://github.com/rust-lang/rfcs/pull/243#issuecomment-172473100">screenshot</a>
without font AA that also hurt my eyes really bad. Come on, it’s 2016 for fucks
sake! There is absolutely no reason to not activate this single most important
feature of a modern desktop!</p>
Doing the impossible: choosing a Material Design framework2015-07-26T00:00:00+00:002015-07-26T00:00:00+00:00
Unknown
https://swatinem.de/blog/doing-the-impossible-choosing-a-material-design-framework/<p>The story starts like this: I am about to start developing a native-looking
mobile App. And I would like to use Material Design for it, since it’s awesome
and I want a native look and feel.</p>
<p>And just as a precursor: Similar to Paul Lewis, I am also known to <a href="https://aerotwist.com/blog/polymer-for-the-performance-obsessed/">hate
everything that looks and smells like a
framework</a>.
Even though Material Design is just about styling, most of the Material Design
Frameworks do come with some baggage in the form of JS and Framework lock-in.
I am also extremely opinionated when it comes to JS Frameworks and Conventions.
I kind of feel like I am chasing perfection instead of getting shit done.
But well that’s just how I roll :-(</p>
<p>So I was mainly looking at
<a href="https://github.com/dogfalo/materialize/">Materialize</a> and
<a href="https://github.com/callemall/material-ui">Material-UI</a> and the paper elements
of <a href="https://github.com/PolymerElements/polymer-starter-kit/">Polymer</a>. And then
just during my research google released
<a href="https://github.com/google/material-design-lite">Material Design Lite</a>. I also
looked at some other smaller contestants, but those four came out top.</p>
<p>Before I start dissecting the choices, let’s just say that every one of those is
missing some things that I kind of need. It is kind of disappointing, really. So
many contestants, but all of them are lacking in some ways. :-(</p>
<p>So let’s start.</p>
<h2 id="materialize"><a class="anchor-link" href="#materialize" aria-label="Anchor link for: materialize">#</a>
Materialize</h2>
<p>Seeing how materialize was a bunch of jQuery plugins, I quickly removed it from
the list of options. jQuery was nice ten years ago, but I think we can do
without it by now.</p>
<p>I actually did try the other three, Material-UI, MDL and Polymer.</p>
<h2 id="material-ui"><a class="anchor-link" href="#material-ui" aria-label="Anchor link for: material-ui">#</a>
Material-UI</h2>
<p>While I do like the concept and the composability of a react-like library,
react itself is a bit too big for my taste. Putting material-ui into the mix, a
simple page with just an AppBar comes to 1M of code. As opposed to all the others,
I am actually talking about unminified, ungzipped code here. LOC would be a
better measure, but I don’t have exact numbers, except that it is HUGE.</p>
<p>It also has some kind of boilerplate:</p>
<pre data-lang="ts" style="background-color:#fafafa;color:#61676c;" class="language-ts "><code class="language-ts" data-lang="ts"><span style="color:#fa6e32;">import </span><span>React </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"react"</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">import </span><span>injectTapEventPlugin </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"react-tap-event-plugin"</span><span style="color:#61676ccc;">;
</span><span style="color:#fa6e32;">import </span><span>{ Styles</span><span style="color:#61676ccc;">, </span><span>AppBar</span><span style="color:#61676ccc;">, </span><span>IconButton } </span><span style="color:#fa6e32;">from </span><span style="color:#86b300;">"material-ui"</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#f29718;">injectTapEventPlugin</span><span>()</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">const </span><span>ThemeManager </span><span style="color:#ed9366;">= new </span><span style="color:#399ee6;">Styles</span><span style="color:#ed9366;">.</span><span style="color:#399ee6;">ThemeManager</span><span>()</span><span style="color:#61676ccc;">;
</span><span>
</span><span style="color:#fa6e32;">class </span><span style="color:#399ee6;">App </span><span style="color:#fa6e32;">extends </span><span style="color:#399ee6;">React</span><span style="color:#ed9366;">.</span><span style="text-decoration:underline;color:#399ee6;">Component </span><span>{
</span><span> </span><span style="color:#f29718;">getChildContext</span><span>() {
</span><span> </span><span style="color:#fa6e32;">return </span><span>{
</span><span> muiTheme</span><span style="color:#61676ccc;">: </span><span>ThemeManager</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">getCurrentTheme</span><span>()
</span><span> }</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>
</span><span> </span><span style="color:#f29718;">render</span><span>() {
</span><span> </span><span style="color:#fa6e32;">return </span><span>(
</span><span> </span><span style="color:#ed9366;"><</span><span style="color:#ff8f40;">AppBar
</span><span> title</span><span style="color:#ed9366;">=</span><span style="color:#86b300;">"Foo"
</span><span> iconElementLeft</span><span style="color:#ed9366;">=</span><span>{
</span><span> <IconButton iconClassName</span><span style="color:#ed9366;">=</span><span style="color:#86b300;">"material-icons"</span><span style="color:#ed9366;">></span><span>arrow_back</span><span style="color:#ed9366;"></</span><span>IconButton</span><span style="color:#ed9366;">>
</span><span> }
</span><span> </span><span style="color:#ed9366;">/>
</span><span> )</span><span style="color:#61676ccc;">;
</span><span> }
</span><span>}
</span><span>
</span><span>App</span><span style="color:#ed9366;">.</span><span>childContextTypes </span><span style="color:#ed9366;">= </span><span>{
</span><span> muiTheme</span><span style="color:#61676ccc;">: </span><span>React</span><span style="color:#ed9366;">.</span><span>PropTypes</span><span style="color:#ed9366;">.</span><span>object
</span><span>}</span><span style="color:#61676ccc;">;
</span><span>
</span><span>React</span><span style="color:#ed9366;">.</span><span style="color:#f29718;">render</span><span>(<</span><span style="color:#399ee6;">App</span><span> /></span><span style="color:#61676ccc;">, </span><span>document</span><span style="color:#ed9366;">.</span><span>body)</span><span style="color:#61676ccc;">;
</span></code></pre>
<p>As far as I understand, this means that you can change the theme dynamically,
which would be awesome. But the boilerplate is a bit annoying nonetheless.</p>
<p>On the plus side, Material-UI has some nice special elements like date and time
pickers. It seems to be quite complete.</p>
<p>What I don’t like about React is that you can’t have arrays of elements, except
for <code>children</code>, and each element gets only one <code>children</code> array. So
Material-UI passes some of its child elements as regular props instead, which
feels wrong to me, even though I know a lot of React libraries use this pattern.</p>
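<p>The pattern is easy to sketch without pulling in React at all. The toy
<code>createElement</code> below is hypothetical (a stand-in for what JSX compiles to, not
React’s actual internals), but it makes visible how an element passed through a
prop like <code>iconElementLeft</code> ends up in a different place than one passed as a
child:</p>
<pre><code>// Toy stand-in for React.createElement, just to illustrate the shape
// of the data. All names here are hypothetical.
function createElement(type, props, ...children) {
  return { type, props: props || {}, children };
}

// Material-UI style: the back button travels through an ordinary prop.
const backButton = createElement(
  "IconButton",
  { iconClassName: "material-icons" },
  "arrow_back"
);
const viaProp = createElement("AppBar", {
  title: "Foo",
  iconElementLeft: backButton,
});

// The "normal" way: the same element passed as a child.
const viaChildren = createElement("AppBar", { title: "Foo" }, backButton);

console.log(viaProp.props.iconElementLeft.type); // "IconButton"
console.log(viaChildren.children[0].type); // "IconButton"
</code></pre>
<p>Either way the same element tree comes out, which is exactly why the
prop-based variant works; it just isn’t what <code>children</code> was designed for.</p>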
<h2 id="polymer"><a class="anchor-link" href="#polymer" aria-label="Anchor link for: polymer">#</a>
Polymer</h2>
<p>I started out with the full Polymer starter. Maybe that was a problem, since
it’s also huge: it downloaded 300M of npm packages, plus a lot of things from
Bower, and generated hundreds of files whose purpose I couldn’t figure out. I
simply couldn’t understand all that code. And that’s not even library code;
it’s supposed to be your own app code.</p>
<p>Also, I’m not really a fan of HTML Imports. I would rather have ES6 modules
and wire up as much as possible through JS, maybe even the CSS. Having
everything as HTML modules with code inside script tags just feels kind of
backwards to me. Maybe I should start with an empty project so I won’t be
overwhelmed from the start.</p>
<p>I am also kind of ambivalent when it comes to web components in general. Sure,
they are supposed to be implemented natively in the browser, but you still pay
a real cost: the browser has to (recursively) resolve the imports and load
everything. HTTP/2 Push will help a lot, but still. There is also a lot of JS
involved at app start. While it doesn’t matter for the project at hand, I think
web components, depending on how you use them, might not be that great for SEO,
serving static HTML, and load times. But I might be mistaken completely. For
now, the whole machinery of HTML Imports and the tools used to vulcanize them
seems like overkill to me.</p>
<p>I actually haven’t looked into how big the resulting code is, but I would
probably have to use it in conjunction with a React-like library anyway so that
would grow the size considerably as well.</p>
<h2 id="material-design-lite"><a class="anchor-link" href="#material-design-lite" aria-label="Anchor link for: material-design-lite">#</a>
Material Design Lite</h2>
<p>MDL was actually released right when I was starting my research. Being mainly
CSS with just a little JS involved (sadly you can’t quite do without), and
coming directly from Google, it felt like it might be the best contestant. But
it is also quite new and incomplete: it has no list components as of yet, and
the app bar kind of has a menu button hardcoded; I’m not sure how to replace
that menu button with a back button.</p>
<p>Also, something that really annoys me is that it is <a href="https://github.com/google/material-design-lite/issues/833">not yet easily embeddable
in a typical app based on CJS or ES6
modules</a>. The JS
code relies on being imported globally, which I don’t particularly like.
Hopefully this will be fixed, but until then it’s just annoying.</p>
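<p>Until then, the usual workaround is a small wrapper module. The sketch below
shows the idea; <code>componentHandler</code> is the global MDL’s script really attaches,
but the attaching step is faked here so the sketch is self-contained, and the
method body is a placeholder:</p>
<pre><code>// Sketch of wrapping a script that only attaches a global, so that
// CJS/ES6 consumers can import it like any other dependency.
const globalScope = typeof window !== "undefined" ? window : globalThis;

// Pretend material.js has run (the real script does this as a side effect;
// the method body here is a stand-in):
globalScope.componentHandler = {
  upgradeAllRegistered() {
    return "upgraded";
  },
};

// The wrapper would require() the script for its side effect and then
// re-export the global:
const componentHandler = globalScope.componentHandler;
module.exports = componentHandler;

console.log(componentHandler.upgradeAllRegistered()); // "upgraded"
</code></pre>
<p>It works, but having to launder a global through <code>module.exports</code> is exactly
the kind of annoyance I mean.</p>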
<p>The JS code comes to roughly 110K unminified and is not too bad to read. The
CSS amounts to roughly the same size.</p>
<p>Again, I would use it in conjunction with a React-like library so that would
add some weight. All in all, it feels to be the simplest base if I happen to
do a lot of custom things. And since it has no lists yet, I would have to do at
least those myself.</p>
<h2 id="conclusion"><a class="anchor-link" href="#conclusion" aria-label="Anchor link for: conclusion">#</a>
Conclusion</h2>
<p>I am still not happy with the choices, to be honest, and I still feel like I’m
standing in my own way: I want to use something that I actually like using,
something that feels like doing the right thing, as opposed to getting things
done while feeling like it’s all a big hack.</p>
<p>So I’m basically standing still for now. Maybe I will give a clean Polymer
project another chance, maybe I will go with MDL and just implement certain
things myself. Most likely I would have to implement quite a few things myself
anyway.</p>
New Blog2015-07-23T00:00:00+00:002015-07-23T00:00:00+00:00
Unknown
https://swatinem.de/blog/new-blog/<p>I have decided to start over.
This time, instead of rolling my own static site generator or blogging system, I
decided to try Hexo with a slight modification of its default theme.</p>
<p>And from now on I think I will blog more about technical things rather than
personal or philosophical ones, and do so in English.</p>
<p>So enjoy!</p>