Howto Design an infallible algorithm that records errors

7 July 2021 — 5 min

A quick note first: The "howto" does not mean I present a solution here, but rather that I am searching for a good solution.

With that out of the way, let me explain what I want to do. I would like to have an algorithm that always succeeds and produces some kind of result. And I want to simultaneously output any kind of error that happens while running that algorithm.

I can quickly think about two solutions here (please ignore any syntax or type errors here):

fn solution_1(input: Input) -> (Output, Vec<Box<dyn Error>>);

trait ErrorCollector {
    fn record_error<E: Error>(&mut self, error: E) {}
}

fn solution_2(input: Input, error_collector: &mut dyn ErrorCollector) -> Output;

The first solution returns a list of errors along with the normal output. One disadvantage of this is that it allocates and boxes the errors, so it won’t work in a no_std environment. Although that is not necessary for my particular use-case, even though it might be nice to have in general. The other disadvantage is that is will allocate and return the list unconditionally, even if the caller just doesn’t care.

The second solution takes a collector that we feed errors into. The caller is responsible to either collect the errors or to just ignore them. I kind of favor this solution.

One more question would be: Do I always want to continue processing when an error happens? With the ErrorCollector solution, it might be possible to give feedback whether the processor should continue, or early-exit. The processor might as well continue to return a Result as well, making a difference between recoverable or non-recoverable errors.

# Some prior Art

I do have a thing for programming language tooling, and there is quite some prior art for resilient parsers there. It is a question of how convenient it is to work with the results however.

There is a difference between parsers outputting an abstract syntax tree, and ones that rather output a concrete syntax tree. The ones that work with CSTs seem to rather output token trees, with special Error tokens which can happen anywhere in the CST. On the other hands, previous AST-based parsers would rather output a list of errors along with the normal AST. The thing that makes these inconvenient is that all the AST properties are optional, which makes it a bit hard to use.

I have worked with the output of the TypeScript parser a lot. It is a resilient parser that can as well parse invalid code as much as possible. However for my use-case, I only work with valid code. And for that use-case, having to always match on, or unwrap the TypeScript equivalent of Option is a bit of a pain.

I have heard some good things about the incremental and resilient parser of Rust-Analyzer, which produces a token-based CST, as far as I know. I have never directly worked with it, so I can’t speak to how convenient it is to use and match on code structures though.

# My use-case

At sentry, we deal a lot with native executable and debugging formats. The two major formats there are PDB, the debugging format on Windows, and DWARF, the one on Linux and Mac. The low level parsers for these formats can spew all kinds of errors from almost every single function/accessor/iterator they provide.

However, we want to extract as much high-level information from these formats as possible. Parsing these formats can fail for all kinds of different reasons. For one, the compilers producing these formats might have bugs and they generate some invalid records in these files. Or our parsers might be buggy, or they might fail if they encounter some records that are not supported.

We want to be as resilient as possible. We shouldn’t fail processing the whole debug information file, if only one record deep inside of it is faulty, for whatever reason. We also want to get detailed errors so we know how to improve our parsers to add support for new formats and fix bugs in it.

This is a hard problem to solve. Also, how resilient do we want to be? How do we know if we can parse 99% of the file correctly, but just fail one single entry, vs if we can only parse a single entry, and fail to parse 99% (the rest) of the file.

How do we want to continue when we encounter an error? Do we skip the whole record, or do we rather emit some kind of sentinel record? To give a concrete example. If our job is to match an instruction address to a (Function, File, Line) tuple, should we rather say "we have no record", or should we rather return a record such as (function "foo", in "unknown file", on line "unknown")?

I think we can extend the ErrorCollector pattern, and also give it a context of what we try to process, or even give it the possibility to return a sentinel value?

I’m struggling a bit with figuring out how this would actually work. Also, I’m not sure we want to give the collector all that much choice, or if we should just make these decisions ourselves. If we encounter an error parsing the filename of a function record, just return "unknown", if we encounter an error parsing the whole function record, just "skip and continue with the next one"?

The question is also, how specific should this be to our own use-case? We have a higher-level library wrapping a low-level one. But we are not the only user of this higher-level library. There are other users as well. How fine-grained should the control be that we expose? How can we evolve this without forcing breaking changes on other users?

# Some Brainstorming

Lets sketch up a possible API for this:

trait ErrorCollector {
    fn record_error<E: Error>(&mut self, error: E) -> Result<(), E>;
}

struct AlwaysError {};
impl ErrorCollector for AlwaysError {
    fn record_error<E: Error>(&mut self, error: E) -> Result<(), E> {
        Err(error)
    }
}

struct AlwaysIgnore {};
impl ErrorCollector for AlwaysIgnore {
    fn record_error<E: Error>(&mut self, error: E) -> Result<(), E> {
        Ok(())
    }
}

#[derive(Default)]
struct CollectErrors {
    errors: Vec<Box<dyn Error>>;
}

impl CollectErrors {
    pub fn into_inner(self) -> Vec<Box<dyn Error>> {
        self.errors
    }
}

impl ErrorCollector for CollectErrors {
    fn record_error<E: Error>(&mut self, error: E) -> Result<(), E> {
        self.errors.push(Box::new(error));
        Ok(())
    }
}

fn err_or_with<Collector, F, T, E>(collector: &mut Collector, result: Result<T, E>, ok: F)
where
    Collector: ErrorCollector,
    F: FnOnce() -> T
{
    // hm, do we have such a combinator on `Result` already?
    match result {
        Err(err) => {
            match collector.record_error(err) {
                Ok(_) => Ok(ok()),
                _ => _
            }
        }
        ok => ok,
    }
}

macro_rules! err_or_continue {
    ($collector:expr, $expr:expr $(,)?) => {
        std::result::Result::Err(err) => {
            match collector.record_error(err) {
                Ok(_) => continue,
                _ => _
            }
        }
        ok => ok,
    }
}

fn process<C: ErrorCollector>(collector: C) -> Result<Vec<&str>, Error> {
    let mut names = vec![];
    for res in some_iter {
        let res = err_or_continue(collector, res)?;

        let name = err_or_with(collector, res.get_name(), || "unknown")?;
    }
    Ok(names)
}

I would really appreciate some feedback on this.