Error Handling Considerations

15 May 2019 — 11 min

Error handling is a really important aspect of software engineering, and also a very complex one with many sides to consider. Let’s try to break it down into a few key aspects.

# Types of Errors

By types, I mean the distinction between recoverable and unrecoverable errors, or between expected and unexpected errors. In general, expected / recoverable errors are handled explicitly by code, while unrecoverable errors usually crash the program or are handled at a much coarser granularity, such as per thread or per request. One important thing to note first is that this distinction is up to the developers on a per-project basis. For example, a missing file can be treated as an unrecoverable error when reading configuration on program start, where a crash tells the developers to fix the configuration. In other parts of the program, a missing file must be handled explicitly, because it might depend on user input and must not crash the running program.
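To make the missing-file example concrete, here is a minimal TypeScript sketch. All names (fakeFiles, readFakeFile, loadConfigOrCrash, previewUserFile) are made up for illustration, and a small in-memory object stands in for the real file system. The point is only that the same condition is treated differently depending on where it occurs.

```typescript
// Hypothetical in-memory stand-in for the file system, so the sketch is self-contained.
const fakeFiles: { [path: string]: string } = {
  "config.json": '{"port": 8080}',
};

function readFakeFile(path: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(fakeFiles, path) ? fakeFiles[path] : undefined;
}

// Program start: a missing configuration file is treated as unrecoverable.
// The thrown error is meant to crash the program and point developers at the fix.
function loadConfigOrCrash(path: string): { port: number } {
  const raw = readFakeFile(path);
  if (raw === undefined) {
    throw new Error(`Missing configuration file: ${path}`);
  }
  return JSON.parse(raw) as { port: number };
}

// Request time: the path comes from user input, so the same condition is an
// expected error that is handled explicitly and reported back as a value.
type PreviewOutcome =
  | { kind: "found"; content: string }
  | { kind: "missing"; path: string };

function previewUserFile(path: string): PreviewOutcome {
  const content = readFakeFile(path);
  if (content === undefined) {
    return { kind: "missing", path };
  }
  return { kind: "found", content };
}
```

Which of the two treatments applies is a project decision; nothing about the missing file itself makes it one kind of error or the other.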

# Errors as Data

Expected errors in particular need to be displayed to end users, for example with a nicely formatted error box in a web UI.

One very important aspect here, which sadly almost no one gets right, is that error messages need to be localizable (!!!). For this to work, the error needs to be serializable, including every kind of meta information that might be displayed to the user. For the example above, an abstract representation of the error should include the information that opening a file with a certain path/name failed (and possibly some more).

I would argue that this is not really important, or even bad, for unrecoverable errors. These should be logged instead, because they are most likely caused by some developer fault. A stack trace should be provided as a form of metadata to help developers fix the error. However, these kinds of errors should only be shown to end users in a very opaque way, without leaking internal implementation details such as stack traces, which could be used to maliciously attack a service.
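As a rough sketch of what such an abstract representation could look like in TypeScript: the error is a stable code plus parameters, and the message is only produced at display time. The error code, the catalog, and the renderError helper are all assumptions made up for this example; a real project would most likely plug in an i18n library instead.

```typescript
// The error as data: a stable code plus every parameter a message might need.
type ErrorCode = "file.open_failed";
type LocaleCode = "en" | "de";

type ErrorData = {
  code: ErrorCode;
  params: { [name: string]: string };
};

// Hypothetical message catalog. The Record types make the compiler check that
// every locale provides a template for every error code.
const catalog: Record<LocaleCode, Record<ErrorCode, string>> = {
  en: { "file.open_failed": "Could not open the file {path}." },
  de: { "file.open_failed": "Die Datei {path} konnte nicht geöffnet werden." },
};

// Translation happens late, from the data, in whatever locale the user needs.
function renderError(error: ErrorData, locale: LocaleCode): string {
  const template = catalog[locale][error.code];
  return template.replace(/\{(\w+)\}/g, (placeholder: string, name: string) => {
    const value = error.params[name];
    return value === undefined ? placeholder : value;
  });
}

// Because it is plain data, the error survives a round trip through JSON.
const failure: ErrorData = { code: "file.open_failed", params: { path: "report.csv" } };
const received = JSON.parse(JSON.stringify(failure)) as ErrorData;
```

Calling renderError(received, "de") on the client then produces the German message without the server ever having known the user's language.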

# Control Flow of Errors

Here I see a distinction between explicit and implicit handling of errors. This is very tightly coupled to language syntax features or idioms. In general, explicit error handling comes at the expense of more boilerplate, but can also potentially lead to better software in my opinion.

But what do I mean by explicit or implicit at all? Well, per my definition, explicit means that functions return, or pass as callback parameters, a strictly typed error object and/or a value. Implicit error handling in contrast means that errors are thrown somewhere and unwind the call stack up to a catch block. This way they lose most/all of their type information. However, this removes a lot of boilerplate, because code does not have to deal with every statement that could potentially throw.
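Here is a small sketch of the two styles side by side in TypeScript. The function names and the ParseResult type are invented for the example; only the shape matters.

```typescript
// Explicit: the possible failure is part of the return type.
type ParseResult =
  | { kind: "ok"; value: number }
  | { kind: "err"; error: { code: "not_a_number"; input: string } };

function parseAgeExplicit(input: string): ParseResult {
  const value = Number(input);
  if (input.trim() === "" || isNaN(value)) {
    return { kind: "err", error: { code: "not_a_number", input } };
  }
  return { kind: "ok", value };
}

// Implicit: the signature only promises a number; the failure travels via throw.
function parseAgeImplicit(input: string): number {
  const value = Number(input);
  if (input.trim() === "" || isNaN(value)) {
    throw new Error(`Not a number: ${input}`);
  }
  return value;
}

// Explicit callers must look at the error before they can touch the value.
function describeExplicit(input: string): string {
  const result = parseAgeExplicit(input);
  if (result.kind === "err") {
    return `invalid (${result.error.code})`;
  }
  return `age ${result.value}`;
}

// Implicit callers can ignore the failure and leave it to a catch block further up.
function describeImplicit(input: string): string {
  return `age ${parseAgeImplicit(input)}`;
}
```

The explicit caller is longer, but the compiler knows exactly which error can occur. The implicit caller is a one-liner, and whoever eventually catches the error learns nothing about it from the types.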

# Comparing different Languages

Now let’s look at some examples in different programming languages. Most of them support both explicit and implicit error handling. And in most cases, explicit handling does not need to be a language feature itself, but can also be implemented as a library.

# Example: Haskell

First off: I don’t really know this language well, so I might be wrong about some points.

Haskell is a very strict functional language, and claims to be very safe. Instead of nullable values, it has a Maybe type that either has Just a value or Nothing. Similarly, it uses the type Either to denote a Right value or a Left error. The language has special syntax to chain functions together that either work with a Just or Right value, or short-circuit and just return the Nothing / Left.
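To keep all examples in one language, here is a rough TypeScript analogue of that short-circuiting chain. It is not Haskell and not how Haskell implements it; the Either type, the andThen helper, and the two example steps are assumptions written just for this sketch.

```typescript
// A value is either a Left (error) or a Right (success).
type Either<L, R> =
  | { tag: "left"; error: L }
  | { tag: "right"; value: R };

// Runs the next step only for a Right; a Left is passed through untouched.
function andThen<L, A, B>(
  input: Either<L, A>,
  next: (value: A) => Either<L, B>
): Either<L, B> {
  return input.tag === "left" ? input : next(input.value);
}

function parseNumber(text: string): Either<string, number> {
  const value = Number(text);
  return text.trim() === "" || isNaN(value)
    ? { tag: "left", error: `not a number: ${text}` }
    : { tag: "right", value };
}

function reciprocal(value: number): Either<string, number> {
  return value === 0
    ? { tag: "left", error: "division by zero" }
    : { tag: "right", value: 1 / value };
}

// The chain stops at the first Left, so reciprocal never sees invalid input.
function reciprocalOf(text: string): Either<string, number> {
  return andThen(parseNumber(text), reciprocal);
}
```

Without dedicated syntax, every additional step needs another andThen call, which is exactly the kind of boilerplate the language-level support removes.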

Haskell does have exceptions as well (for example via the error function or the Control.Exception module), but as far as I can tell, idiomatic pure code strongly favors Maybe and Either over throwing.

# Example: Go

Again, I don’t really know Go; this only summarizes some things I have read online.

In Go, functions that can fail conventionally return multiple values: the result and an error.

value, err := someFn()
if err != nil {
    // Propagate the error to the caller.
    return nil, err
}

I don’t really know how strict the Go typechecker is. But having to explicitly check for nil everywhere is a real anti-pattern IMO.

Go does have a throw-like mechanism, panic together with recover, but as far as I can tell it is reserved for truly exceptional situations, and I have never seen it used in typical examples.

# Example: TS

TypeScript actually supports different kinds of error handling patterns.

The callback-based style that is common in Node.js and in older libraries looks similar to the Go example. You provide a callback function with two parameters, and need to explicitly check for errors and return early.

someFn((error, value) => {
  if (error) {
    return callback(error);
  }
  // …
});

More modern async/await based code has support for try/catch.

Apart from this, some code might also use explicit return types such as Haskell’s Maybe.

But using either callbacks or explicit library-provided types such as Maybe has the significant drawback that basically any code can just throw and punch through that abstraction layer.

Also, since these explicit styles have no dedicated language/syntax support, they come with some boilerplate and inconvenience.

The problem with try/catch, however, is that by definition there is absolutely no guarantee about the value in a catch block. TS even has an explicit compiler error that states “Catch clause variable cannot have a type annotation”.

You can just throw 1 and that is valid code. This by itself can cause a lot of problems. We actually shipped code to production that threw inside of a catch block, because it made wrong assumptions about the shape of the object it caught.
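A defensive way to deal with this is to treat the caught value as unknown and inspect it before touching any property. The following is only a sketch; describeThrown and runSafely are names I made up for it.

```typescript
// Anything can be thrown, so the caught value has to be inspected before use.
function describeThrown(thrown: unknown): string {
  if (thrown instanceof Error) {
    return `Error: ${thrown.message}`;
  }
  if (typeof thrown === "string") {
    return `string: ${thrown}`;
  }
  return `unexpected ${typeof thrown}: ${String(thrown)}`;
}

function runSafely(action: () => void): string {
  try {
    action();
    return "ok";
  } catch (thrown) {
    // Reading thrown.message directly would give undefined for `throw 1`
    // and would itself throw for `throw null`.
    return describeThrown(thrown);
  }
}
```

For example, runSafely(() => { throw 1; }) returns "unexpected number: 1" instead of blowing up inside the catch block.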

# Example: Rust

Even though I have not actually used Rust that much, I read a lot about it and I admire a lot of the things it does.

IMO, Rust gets a lot of things just right. Error handling is no different.

It basically has two mechanisms, a Haskell-esque Result type for recoverable errors, and a throw-like mechanism (called panic!) for unrecoverable errors.

Apart from this, it has dedicated syntax (?) to make explicit error handling extremely convenient. It is also possible to implement the From/Into traits to convert between error types completely automatically, without any additional boilerplate.

  let mut s = String::new();
  File::open("hello.txt")?.read_to_string(&mut s)?;
  Ok(s)

Here, both opening the file and reading it can potentially fail. Chaining the ? operator will early-return a Result and automatically convert the io::Error into your application-specific error type if a corresponding From implementation exists.

In contrast to that, the panic! mechanism will unwind the call stack in the case of unrecoverable errors.

In general, there is also the community consensus that libraries should always return Results. It is the choice of the application if and how to handle those errors. An application can use unwrap or expect to essentially throw on errors.

Read more on how error handling in Rust works.

# Where to handle Errors

There is quite some controversy in our team around where to actually handle these errors.

Let’s take a simple database repository as an example, and assume there is a findOne method. By definition, this will return a nullable type. At least if the manual type declaration is correct. Sadly, most database libraries are completely untyped in TS :-(

Currently we have three different patterns around this:

1. The type definition might simply be wrong and declare a non-nullable type for a value that is actually nullable, which results in the typical undefined is not a function kind of errors.
2. Developers might use the non-null assertion operator (!) and consciously decide to implicitly throw an undefined is not a function error.
3. A developer might add an if with an explicit throw. This is a lot of boilerplate.

When we go back one step and say that libraries should return the most correct types, it means that the types need to be marked as nullable, so the first case is definitely wrong.

But let’s focus on where we are in the program flow.

When we are at the IO boundary to some user provided data, such as an id, we have a recoverable error in the sense that we can provide the user with a meaningful error message, such as a 404 error.
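A minimal sketch of that boundary case in TypeScript, with an in-memory array standing in for the database. The User type, the findOne implementation, the HttpReply shape, and the error code are all assumptions for illustration, not any particular framework's API.

```typescript
type User = { id: string; name: string };

// Hypothetical in-memory repository; findOne is honest about being nullable.
const users: User[] = [{ id: "42", name: "Ada" }];

function findOne(id: string): User | undefined {
  return users.filter((user) => user.id === id)[0];
}

type HttpReply = { status: number; body: unknown };

// IO boundary: the id comes straight from the user, so "not found" is an
// expected error and gets a meaningful, translatable answer.
function getUserHandler(requestedId: string): HttpReply {
  const user = findOne(requestedId);
  if (user === undefined) {
    return {
      status: 404,
      body: { code: "user.not_found", params: { id: requestedId } },
    };
  }
  return { status: 200, body: user };
}
```

The explicit check is cheap here because it happens exactly once, at the place where a missing record is a normal outcome.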

However, in a deeper layer of the application, I would argue that this case should not occur, and if it does, it would be a programmer logic error. In my opinion, doing explicit error handling in this layer is way too much boilerplate and is actually harmful to the readability and understandability of the code logic.

For this reason, I would argue that once user input is validated, any deeper code should just use non-null assertions, or maybe a more explicit .expect() function and throw with a normal JS error.
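Such an .expect()-style function does not exist in JS/TS out of the box, so the following is just one possible sketch of it. The name expectPresent and the surrounding account example are made up.

```typescript
// Returns the value if it is present, otherwise throws a normal JS error.
// Meant for places where absence can only be a programmer logic error.
function expectPresent<T>(value: T | null | undefined, message: string): T {
  if (value === null || value === undefined) {
    throw new Error(message);
  }
  return value;
}

type Account = { id: string; balance: number };

// Hypothetical in-memory repository with an honestly nullable findOne.
const accounts: Account[] = [{ id: "a1", balance: 100 }];

function findOne(id: string): Account | undefined {
  return accounts.filter((account) => account.id === id)[0];
}

// Deeper layer: the id was already validated at the IO boundary, so a missing
// account here is a bug, not something to report back to the user.
function balanceAfterFee(validatedId: string, fee: number): number {
  const account = expectPresent(
    findOne(validatedId),
    `account ${validatedId} vanished after validation`
  );
  return account.balance - fee;
}
```

Compared to a bare non-null assertion, the helper costs one function call but produces an error message that says what assumption was violated, instead of a generic undefined is not a function.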

# Summary

We currently have a mix of different error handling patterns. I also experimented with returning a Result-ish type using TS discriminated unions, which is just too inconvenient in TS to be viable. The first conclusion thus is to just stick to throw for the control flow.

I would also rephrase the distinction between recoverable and unrecoverable errors, to better understand what the goals and requirements are.

Let’s use the terms user facing error and developer facing error instead.

A user facing error:

- must be translatable, and thus needs to include enough metadata to be able to do so.
- must be serializable, for example to be returned by an API to its users.
- should not leak any implementation detail of the application.
- should provide a link to user input if possible.
- should make it possible to group / display multiple errors at once.

In contrast, a developer facing error:

- must be debuggable, with enough metadata, such as a stack trace.
- need not be translatable, since only developers should ever see it.
- likely has no correlation to user input.
- IMO, should actually not be translated, to make it easier to communicate across teams and to search for solutions online.

From these requirements, I would first conclude that unexpected / unrecoverable / developer facing errors should use the standard throw new Error() pattern. I am also strongly in favor of just using non-null assertions to cut down on unnecessary boilerplate. When both inputs and logic assumptions are sufficiently validated, these kinds of errors should ideally never occur, so adding boilerplate for them is unnecessary.

In contrast, for expected / recoverable / user facing errors, I would rather create a custom error type, which may, but need not, extend Error. This type must be serializable, and include enough metadata to make translation possible. It should also include metadata to link back to any user input. Apart from that, it should be possible to return multiple such errors, even though that needs to be done explicitly by developers.
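One way such a custom error type could look is sketched below. This is an assumption-laden example, not a finished design: the UserFacingIssue shape, the error codes, and the validateSignup function are all invented. The variant shown does extend Error so it can travel via throw, while the payload stays plain data.

```typescript
// One user facing problem: a stable code for translation, parameters for the
// message, and an optional pointer back to the offending input field.
type UserFacingIssue = {
  code: string;
  params: { [name: string]: string | number };
  field?: string;
};

// Carrier that can be thrown; several issues can be reported at once.
class UserFacingError extends Error {
  readonly issues: UserFacingIssue[];

  constructor(issues: UserFacingIssue[]) {
    super("User facing error: " + issues.map((issue) => issue.code).join(", "));
    this.name = "UserFacingError";
    this.issues = issues;
  }
}

// Collects all problems explicitly instead of stopping at the first one.
function validateSignup(input: { email: string; age: number }): void {
  const issues: UserFacingIssue[] = [];
  if (input.email.indexOf("@") === -1) {
    issues.push({ code: "signup.email_invalid", params: { value: input.email }, field: "email" });
  }
  if (input.age < 18) {
    issues.push({ code: "signup.too_young", params: { minimum: 18 }, field: "age" });
  }
  if (issues.length > 0) {
    throw new UserFacingError(issues);
  }
}

// Serialization only touches the plain data, never the message or stack trace.
function toApiPayload(error: UserFacingError): string {
  return JSON.stringify({ errors: error.issues });
}
```

A client receiving this payload can translate each code with its params and attach the message to the matching form field.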

These two different kinds of errors should also be handled separately in catch blocks, depending on the specific use case.

User facing errors should be explicitly checked for and either re-thrown when deep inside of the application, or returned to the user explicitly on a per-request / per-operation basis. These kinds of errors will most likely both happen and be handled close to the user. Since translation is also one requirement, this should also happen as close to the IO layer as possible.

Anything else should definitely be logged at least. Then it is up to the developer how to handle these errors, and how coarse-grained the handling should be. Possibilities are to just retry the operation, or maybe to ignore the error completely. But these errors, too, must be caught on a per-request basis and converted to an opaque user facing error.
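Put together, a per-request catch block along these lines could look like the following sketch. Everything in it is hypothetical: the simplified UserFacingError, the developerLog array standing in for a real logger, and the Reply shape.

```typescript
class UserFacingError extends Error {
  readonly code: string;

  constructor(code: string) {
    super(code);
    this.name = "UserFacingError";
    this.code = code;
  }
}

// Checked via the name, so the guard does not depend on the prototype chain.
function isUserFacingError(error: unknown): error is UserFacingError {
  return error instanceof Error && error.name === "UserFacingError";
}

type Reply = { status: number; body: unknown };

// Hypothetical sink for developer facing details; a real service would use its logger.
const developerLog: string[] = [];

function handleRequest(operation: () => unknown): Reply {
  try {
    return { status: 200, body: operation() };
  } catch (error) {
    if (isUserFacingError(error)) {
      // Expected: hand the translatable code back to the user.
      return { status: 400, body: { code: error.code } };
    }
    // Unexpected: keep the details for developers and answer opaquely.
    const details = error instanceof Error ? error.stack ?? error.message : String(error);
    developerLog.push(details);
    return { status: 500, body: { code: "internal_error" } };
  }
}
```

The important property is that the 500 reply carries nothing from the original error, while the log keeps everything a developer needs.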

# Conclusion

Well, error handling is a really big and controversial topic. Most of the hard decisions really depend on the specific application use case.

What makes me kind of sad is that most solutions fail my most important requirement for user facing errors: they make translating errors really hard. Especially for libraries that are focused on validating user input, this is a must-have requirement!