Pitfalls of fallible Iterators
— 5 minI wanted to write about this topic quite some time ago, but it seems my tendency to procrastinate won in the end. Until today, so lets get started.
The topic at hand are fallible Iterators, and this post is motivated by a real world problem that I fixed both at the producer end in wasmparser and the consumer in symbolic.
When parsing some binary data files, for example wasm
as in the example above,
errors can happen all the time. Files might be truncated, they might be
corrupted either by some random bitflips or bad compression, by a faulty writer
or they might be malicious and created by an attacker specifically to exploit
bugs in the parser.
Especially for the last reason, parsers in general need to be very robust. As we are dealing with Rust we are lucky because in safe Rust at least we can’t corrupt our internal program state and thus we should be safe against executing untrusted code. But as we will see, bad things can still happen.
When parsing such files, we at Sentry want to be as forgiving as possible though. We want to extract as much usable information from a file as possible. Even if our parsers are incomplete and can’t handle some cases, or the original producers are buggy and produce invalid files (yes, this happens more often than I would like to admit), we still want to make use of the stuff that was correct and that we can parse.
We don’t want to reject the complete file because a single string was not utf-8 encoded for example. We also don’t want to early-return on the first invalid string, but rather skip ahead to next valid one.
So what are fallible Iterators? There is two general patterns that I saw:
next(&mut self) -> Option<Result<T, E>>
next(&mut self) -> Result<Option<T>, E>
They are both very similar, and you can almost convert between the two using
transpose
, but I will argue that they express slightly different intentions
that I will also highlight. So lets look at both in some detail.
#
next(&mut self) -> Option<Result<T, E>>
This pattern is nice because we are dealing with a real impl Iterator
that
we can use in for
loops.
Lets see what else we can do with these:
// we can propagate errors early:
for item in iter {
let item = item?;
// ...
}
// break on first error:
for item in iter {
let item = match item {
Ok(item) => item,
Err(_) => break,
};
// ...
}
// or we can skip over errors:
for item in iter {
let item = match item {
Ok(item) => item,
Err(_) => continue,
};
// ...
}
// or even simpler, since `Result` implements `IntoIterator`:
for item in iter.flatten() {
// ...
}
We can also directly collect
this into a Result<Vec<T>, E>
which might
or might not be useful.
#
next(&mut self) -> Result<Option<T>, E>
This pattern is slightly more cumbersome to deal with, as it is not a real
impl Iterator
. But there is even the fallible-iterator
crate
to make this more convenient.
// we can propagate errors early:
while let Some(item) = iter.next()? {
// ...
}
// stop on first error:
while let Ok(Some(item)) = item.next() {
// ...
}
// or we can skip over errors in a weird way:
loop {
let item = match iter.next() {
Ok(Some(item)) => item,
Ok(None) => break,
_ => continue,
};
// ...
}
# Whats the difference?
Well not a whole lot to be honest. And I would say it comes down to a matter of taste which variant people chose.
However, in my opinion they do express different intentions, let me explain.
To me, the Option<Result<T, E>>
pattern signals that the produces knows there
is something to parse, or phrased the other way around: the producer knows when
the end is reached without actively parsing something.
The Result<Option<T>, E>
pattern however says that the producer has no idea if
it is at the end, unless it tries to parse more, which can obviously fail.
# Can I ignore errors?
Well, its complicated.
If you want to be safe, the answer is no. You gotta propagate the first error
you see. As the two issues I linked in the beginning show, you can never
assume that the producer is well behaved. It might just return the same parse
error over and over again till infinity, which is a super bad place to be in.
Or it might skip ahead to the next parsable item. Or it might behave like a
"fused" Iterator and return None
afterwards. You really can’t tell. And that
is the problem I wanted to highlight.
# Conclusion
Parsing is hard, errors will inevitably happen. How to deal with those depends on the use cases. In general, I do have some tips from experience:
- As a consumer, assume the worst and always propagate errors by default.
- If you want to be lenient, audit the producer code carefully to make sure it recovers or terminates after errors correctly.
- Be mindful that any update can break these assumptions!
- If in doubt, error out.
- As a producer, make sure that your iterators always terminate no matter what.
- It might be good to have tests that just do
for _ in iter {}
or the equivalent forResult<Option<T>, E>
to check that iterators do not loop infinitely. - The above test is perfect for fuzzing btw ;-)
- Document the behavior of your iterators!
Well this is it. Which of these two pattern do you prefer? Do you agree with my assessment about which intentions these patterns express?