Forms of blocking and non-blocking I/O

… and how they relate to languages and services

15 November 2020 — 9 min

My fiancée often challenges me to explain some programming topic in simple terms that non-programmers like her can understand. We call these stories Programmers Fairytales. I was surprisingly successful at explaining to her the differences between blocking I/O (input/output) and the different forms of non-blocking I/O. These thoughts came back to me recently, while thinking about a problem at work that needs some brainstorming.

# Programmers Fairytale

So our programmers fairytale starts with us having to endure the overly bureaucratic and annoying process of getting married. We frequently have to go to the Standesamt (apparently that’s the civil registration office in English). There are different ways we can talk to them, which is how I explained I/O to her.

# Blocking I/O

Hello there. We would like you to get this work done, we will wait here until you are done.

Blocking I/O is like knocking at someone’s door, giving them some work to do, and then waiting patiently at the door until that work is done. Depending on the kind of work, this is obviously a bad idea: you just keep waiting (blocking) until the work is finished, and you can’t do anything meaningful yourself in the meantime.
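In code, waiting at the door could look roughly like this. It is a minimal Python sketch, not taken from any real system: the `office` function, the pipe between us and the 0.1 second delay are all made up for illustration.

```python
import os
import threading
import time


def office(write_fd: int) -> None:
    """The office (hypothetical): works for a while, then hands over the result."""
    time.sleep(0.1)  # pretend the paperwork takes some time
    os.write(write_fd, b"certificate")
    os.close(write_fd)


def wait_at_the_door() -> bytes:
    read_fd, write_fd = os.pipe()
    worker = threading.Thread(target=office, args=(write_fd,))
    worker.start()
    # os.read() blocks: this thread stands at the door and does nothing
    # else until the office has written its answer into the pipe.
    result = os.read(read_fd, 1024)
    os.close(read_fd)
    worker.join()
    return result


print(wait_at_the_door())
```

The calling thread is stuck inside `os.read()` for the whole time the office is busy.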

# Polling-based I/O

Hello there. We have some work for you. We will check back tomorrow.

Tomorrow: Yo, you about done? Tick tock, motherfucker!

Non-blocking I/O based on polling means periodically checking back to see whether the work you wanted to get done is finished yet. That’s already a lot better, because we can spend the meantime doing other things. It is still not ideal, though, because we have to walk over to the office and ring the doorbell, or alternatively make a phone call and wait for it to get answered. The other person might also be interrupted in whatever they were doing at that time.
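Checking back now and then could be sketched like this. Again, this is only an illustration: the `office` function, the socket pair and the intervals are assumptions of mine, and the `time.sleep()` in the loop stands in for "doing other things".

```python
import socket
import threading
import time


def office(conn: socket.socket) -> None:
    """The office (hypothetical): needs a moment before it answers."""
    time.sleep(0.05)
    conn.sendall(b"certificate")


def poll_until_done(conn: socket.socket, interval: float = 0.01):
    """Ask 'are you done yet?' until the answer is there."""
    conn.setblocking(False)
    visits = 0
    while True:
        visits += 1
        try:
            # In non-blocking mode recv() returns immediately: either with
            # data, or by raising BlockingIOError when nothing is there yet.
            return conn.recv(1024), visits
        except BlockingIOError:
            time.sleep(interval)  # placeholder for doing something useful


def check_back_regularly():
    ours, theirs = socket.socketpair()
    worker = threading.Thread(target=office, args=(theirs,))
    worker.start()
    try:
        return poll_until_done(ours)
    finally:
        worker.join()
        ours.close()
        theirs.close()


result, visits = check_back_regularly()
print(result, "after", visits, "visit(s)")
```

Every failed `recv()` is one wasted trip to the office, which is exactly the cost of polling.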

# Completion-based I/O

Us: Hello there. We have some work for you.

Office: Cool. Leave your number, we will call you when it’s done.

With this form of I/O, we register a callback to be notified when the work is complete. In our case, we just leave our phone number and email address with the office. There are fewer interruptions from checking back every day, and eventually the work gets done. This works fine for software, although not so well when dealing with civil offices.
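Leaving your number could look like this in Python, using a thread pool from the standard library as the office. The names `office_work` and `phone_rings` are hypothetical; `add_done_callback` is the real `concurrent.futures` call that registers the callback.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def office_work(request: str) -> str:
    """The office (hypothetical): does the actual work."""
    return f"certificate for {request}"


def leave_our_number(request: str) -> list:
    received = []

    def phone_rings(future: Future) -> None:
        # Called once the work is complete; we never asked "are you done?".
        received.append(future.result())

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(office_work, request)
        future.add_done_callback(phone_rings)
        # ... we are free to do other things here ...
    # Leaving the with block waits for the worker, so the phone has rung by now.
    return received


print(leave_our_number("us"))
```

Note that the callback runs whenever the work finishes, not when it is convenient for us, which is the usual price of this model.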

# Queueing (io_uring)

Us: We have some work for you, we will leave it in your letter box. When you are done, put the reply in our letter box please.

Office: I have some spare time right now, I guess I can check my letter box.

Us: Might as well check my letter box on the way out. Oh, there is the reply.

Now we have introduced something in between, a letter box, to hold the messages and the work instructions. We don’t interrupt anyone by ringing the doorbell or calling. Whenever they have some spare time, they will just check their letter box. In terms of software, I would call this an ideal solution. Every party can just focus on doing its own thing, with no distractions.

There is one complication though. You have this new concept of a letter box to think about, and in particular what you want to do when it fills up with messages that you don’t have time to reply to.
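To make the "letter box is full" problem concrete, here is a small sketch with a bounded queue. Rejecting the letter is only one possible policy (you could also wait, or throw away old letters); the capacity of 2 and the function name are made up.

```python
import queue


def drop_letters(letter_box: queue.Queue, letters: list):
    """Try to post every letter; report which ones did not fit."""
    accepted, rejected = [], []
    for letter in letters:
        try:
            letter_box.put_nowait(letter)  # never blocks
            accepted.append(letter)
        except queue.Full:
            # The box is full: here we simply take the letter back home.
            rejected.append(letter)
    return accepted, rejected


letter_box = queue.Queue(maxsize=2)
accepted, rejected = drop_letters(letter_box, ["form A", "form B", "form C"])
print("accepted:", accepted, "rejected:", rejected)
```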

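Interestingly enough, this form of I/O was recently introduced in the Linux kernel under the name io_uring. It is based around a submission queue (the office’s letter box) and a completion queue (our letter box), both of which live in memory shared between the program and the kernel. With the kernel set up to poll the submission queue, submitting requests and collecting the results can work without a syscall or context switch (ringing the doorbell) for every single operation.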
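The two letter boxes can be modelled with two plain queues and a worker thread. To be clear, this is not io_uring itself: the real thing uses ring buffers shared with the kernel, while this sketch only mimics the shape of the idea in ordinary Python, with made-up names and a trivial "upper-case the request" job.

```python
import queue
import threading


def office(submissions: queue.Queue, completions: queue.Queue) -> None:
    """Works through its letter box and posts replies into ours."""
    while True:
        request = submissions.get()
        if request is None:  # sentinel: no more letters today
            break
        request_id, payload = request
        completions.put((request_id, payload.upper()))


def exchange_letters(requests: list) -> dict:
    submissions = queue.Queue()  # the office's letter box
    completions = queue.Queue()  # our letter box
    worker = threading.Thread(target=office, args=(submissions, completions))
    worker.start()

    for item in enumerate(requests):  # (request_id, payload) pairs
        submissions.put(item)
    submissions.put(None)
    worker.join()

    # On the way out, empty our own letter box.
    replies = {}
    while not completions.empty():
        request_id, reply = completions.get_nowait()
        replies[request_id] = reply
    return replies


print(exchange_letters(["marriage", "passport"]))
```

The request id travels with each letter so that replies can be matched to requests even if they come back in a different order.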

# Spectre for non-programmers

A small digression here. I was even able to kind of explain Spectre to my fiancée. Remember when we ring the doorbell of the office? It would be super bad if they opened the door and we could see other people’s paperwork lying around. So naturally, they first get that stuff out of the way before opening the door for us. Kind of like how the CPU has to clear out some of its cached state when switching contexts, so that other processes are not able to do cache-timing attacks.

Anyway, io_uring can avoid most of those syscalls, so it does not need to do that cleanup nearly as often. Which means it’s super fast. And I’m certainly very excited about its potential. Since it is quite young, it’s not yet possible to use it for everything, and programs and runtimes also need to be updated to actually make use of it.

# Languages and Runtimes

In another programmers fairytale, I tried to explain async/await to her, although I think I was not as successful as before. Interestingly, you can mix different forms of async.

Let’s start with some work that you want to get done asynchronously. This is called a Future in Rust, and a Promise in JS. It’s a promise that the work you requested may eventually be done in the future (or not).

The old way of using these futures was with callbacks, the completion-based model from above. You leave your number by providing a function or closure. It’s just a block of code that’s separate from your other code.

The new way of doing things is by using the await keyword. In Rust, it goes after an expression; in JS, it goes in front. It makes the code look like blocking, like the “we will wait here” from above, hence the name await.

However, under the hood it’s implemented completely differently in different languages. Rust futures are based on polling, while JS promises use a completion callback. This has interesting implications, both good and bad.

In Rust, it’s kind of bad that you have to poll all the time, but on the other hand, if you don’t care about the result anymore, you just stop polling and the work stops as well. In JS, you can easily say “I don’t really care, just do it”. On the other hand, it’s a lot harder to actually stop doing something that you have already started. Sometimes the office just burns down and you have no idea why.
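The difference can be imitated in Python, with a big caveat: this is an analogy, not how Rust or JS actually implement things. A generator plays the poll-driven task (it only makes progress when someone asks it to), and a thread pool plays the eager, promise-like task (it runs whether or not anyone still cares). All names here are made up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def paperwork(log: list):
    """A poll-driven task: does one step each time it is polled."""
    for step in ("form A", "form B", "form C"):
        log.append(step)
        yield  # hand control back to whoever is polling


def poll_twice_then_lose_interest() -> list:
    log = []
    task = paperwork(log)
    next(task)    # first poll: "form A" gets done
    next(task)    # second poll: "form B" gets done
    task.close()  # we stop polling, so "form C" never happens
    return log


def start_eagerly_then_try_to_cancel():
    log = []
    started = threading.Event()
    release = threading.Event()

    def eager_paperwork() -> None:
        started.set()
        release.wait(timeout=5)
        log.extend(["form A", "form B", "form C"])

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(eager_paperwork)
        started.wait(timeout=5)
        cancelled = future.cancel()  # too late: the work is already running
        release.set()
    return cancelled, log


print(poll_twice_then_lose_interest())
print(start_eagerly_then_try_to_cancel())
```

In the first case the work stops the moment nobody polls anymore. In the second, `cancel()` reports failure and all three forms get filled in anyway.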

Coming back to our fairytale, I think the office is more like Rust futures. You constantly have to bug them and say yes, we still want that stuff to get done, or they just won’t move a muscle. It’s sad actually :-(

There are also differences at the operating system level. While io_uring is the new hotness on Linux, previously things were based on polling (select, poll, epoll), although in a more optimized way, where you hand over a whole list of things and ask “is any of these done yet?” As far as I know, Windows is completion-based (I/O completion ports). This is interesting, because a language or runtime has to work in a uniform way across operating systems, although under the hood it does completely different things.
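Asking about a whole list at once is what Python’s `selectors` module wraps (it picks epoll, kqueue or select depending on the platform). The following is a small sketch with two hypothetical requests, of which only one has been answered.

```python
import selectors
import socket


def which_are_done(pending: dict, timeout: float = 1.0) -> list:
    """Ask once, for the whole list: 'is any of these done yet?'"""
    sel = selectors.DefaultSelector()
    try:
        for name, conn in pending.items():
            sel.register(conn, selectors.EVENT_READ, data=name)
        # select() returns the connections that have something to read.
        return sorted(key.data for key, _events in sel.select(timeout))
    finally:
        sel.close()


def ask_the_office() -> list:
    marriage_ours, marriage_theirs = socket.socketpair()
    passport_ours, passport_theirs = socket.socketpair()
    try:
        passport_theirs.sendall(b"done")  # only the passport is finished
        return which_are_done(
            {"marriage": marriage_ours, "passport": passport_ours}
        )
    finally:
        for conn in (marriage_ours, marriage_theirs,
                     passport_ours, passport_theirs):
            conn.close()


print(ask_the_office())
```

One question covers any number of pending requests, which is what makes this so much cheaper than polling each one separately.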

# Tradeoffs

It’s interesting to think a bit more about tradeoffs though. Depending on the work you want to get done, simply blocking might be the best idea. When latency matters, it can be best to make the trip to the office once and wait a little, because having to come back the next day just delays things unnecessarily. Also, providing phone numbers (callbacks) or installing letter boxes (queues) increases complexity. Even polling has its complexity, as the other person may just ask “who were you again?”

# Networked Services

All of these concepts map to networked services as well. The difference might be that instead of ringing our next-door neighbor’s doorbell, we actually have to walk 10 minutes down the street to the office. The round trip time is a lot longer. But you can still decide whether you want to wait right there or come back tomorrow. Or even both: wait there for a certain amount of time, and when you get bored, decide to check back later instead.

This is exactly how symbolicator works right now. Or rather, symbolicator is like the office.

Here, take a number and have a seat. I might have your answer right away, otherwise check back tomorrow.
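The "wait a bit, then check back later" pattern could be sketched as follows. This is not symbolicator’s actual API, just the general idea in Python: a future stands in for the number you were handed, and `patience` is a made-up parameter for how long we are willing to sit there.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def wait_then_check_back(future, patience: float):
    """Wait at the office for `patience` seconds, then give up for now."""
    try:
        return "answered on the spot", future.result(timeout=patience)
    except FutureTimeout:
        return "check back later", None


def visit_the_office_twice():
    release = threading.Event()

    def slow_office() -> str:
        release.wait(timeout=5)  # the office is busy until released
        return "certificate"

    with ThreadPoolExecutor(max_workers=1) as pool:
        ticket = pool.submit(slow_office)
        first_visit = wait_then_check_back(ticket, patience=0.05)
        release.set()  # overnight, the office gets its work done
        second_visit = wait_then_check_back(ticket, patience=5)
    return first_visit, second_visit


print(visit_the_office_twice())
```

The caller has to remember the ticket between visits, and the office has to keep the result around until someone picks it up, which is where the extra complexity comes from.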

This system is starting to be a problem, and we are looking into ways to improve it. I tend to favor the queueing solution. Coming back to our story, when it comes to networked services, the letter box itself is a separate service, like a post office. You submit your work request to the post office nearest to you, and they deliver the request to the post office box nearest to the office that will serve that request. In that scenario, symbolicator just walks to its post office box whenever it has nothing to do, gets the job done and brings the result back to the post office. Kind of like the submission queue and completion queue from io_uring. No distractions, although in this case the distractions don’t matter that much. It is more the concept of “take a number and check back” that bothers me personally.

Overall, I think the question is rather: do we want to build and manage a new post office, and have contingency plans in place for what to do when the letter boxes start piling up and eventually overflow? Also, should we introduce another service, someone who is responsible for deciding on the right letter box and delivering the messages to it?

Well, that’s it for today. I think the main takeaway here is this:

Public offices are like Rust futures. You have to poll them constantly or they won’t be doing any work. :-(