Arpad Borsos Blog Resume

Lets talk about Pagination

— 11 min

Just recently, I have had to tackle some quite challenging problems related to pagination. And during some discussions I noticed that some concepts are not yet clear to some engineers. So I will try to explain all of these.

I will focus on both page-based, as well as cursor-based pagination, and explain how both methods work. I will also focus on some common questions an API consumer might have, and how easy it is to implement, with a special look at SQL. And lastly, I will look at how these pagination methods behave when the data itself changes in certain ways.

Prerequisites

To make any kind of pagination useful at all, we need to have a sorted list of entities with some kind of stable sorting order. We do not want the entities to randomly re-order, unless we explicitly change entities inside of that list.

Lets try to visualize such list, so behold my awesome ascii box drawing skills:

E = Entity, K: Sort Key, W: Pagination Window, C: Cursor
   ┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
E: │ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
   ├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
K: │ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
   └──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
          └ W ┘  ↑
                C:9

So here we have 11 Entities, sorted by their Sort Key. And we a view into that list, which I call the Pagination Window. I will also talk about a left-to-right traversal order, so anything left in the drawing can also be called before or front, and likewise after or end for the things on the right. Also, to make things easier, I will assume a Pagination Window of 2 elements. And we assume that we are walking this list from front to back.

We also have some meta information that we want to know about, as stated above.

I will also consider a few mutations to the list, such as:

As we are walking the list front to back, adding/removing new entities to the back of the list are neither observable, nor do we really care about them. Unless its a move.

Page-Based Pagination

Implementing this kind of pagination is very simple, you just split the list into equally-sized slices called pages. Obviously, the pagination window is the page size. Depending on the implementation, you might also chose to allow an arbitrary offset, but for simplicity we will not.

┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
 └ 1 ┘ └ 2 ┘ └ 3 ┘ └ 4 ┘ └ 5 ┘ └ 6 ┘

Changing Data

One problem with page-based pagination is how it reacts to changes in the data.

Lets look at insertion and deletion before the pagination window.

┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
             └ W ┘
  ↓ inserted here
┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ X│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 1│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
                   └ W ┘

As you can see, we moved one page right, but the whole list was shifted by one position, which means we see F twice. Not good, not terrible.

┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
  ↑          └ W ┘
  └ removed this
┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
                   └ W ┘

Again, the whole list was shifted, but in this case we have a more severe problem, since element G was skipped completely.

Implementation

Implementing page-based pagination is really trivial, both in memory, as well as in SQL, as you have the dedicated syntax LIMIT / OFFSET exactly for this use case. However, jumping to a relative position is complex.

Cursor-Based Pagination

With cursor based pagination, you have a Cursor that points to a position on the sort axis. With cursors, you can usually select items before and after the cursor, since we focus on left to right traversal, we will only consider items after the cursor.

Navigating using cursors is possible when in addition to the item itself, we return the cursor of the item as well. In the next step, we set our new cursor to the cursor of the last item, and query again.

One popular example of cursor-based pagination is the relay connections specification which is popular when dealing with graphql.

Unique Cursors

One important restriction cursor based pagination has is that the cursor itself needs to be unique! Why? Lets demonstrate with an example.

┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
           ↑ └ W ┘
          C:7
┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│10│10│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
                 ↑  ↑ └ W ┘
                 C:10

Here, first we were at cursor 7, and selected 2 items after it. No problem so far. The last item F has the cursor 10. Lets move one step further, selecting 2 items after F, using its cursor 10. Now we have the problem that we skipped G, because its cursor is duplicated to the one of F.

So depending on your use case, if the property you want to sort by is not unique, such as a typical modification time, you have to combine it with another property that is, such as uuid.

Getting at metadata

Answering some meta questions becomes a lot more complicated with cursor based pagination.

Changing Data

As the cursor denotes a relative position inside the list, it is immune to changes in the data. Even removing the element under the cursor is not a problem.

┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ F│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│10│12│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
             └ W ┘
                 ↑ removed this
┌──┬──┬──┬──┬──┬──┬──┬──┬──┬──┐
│ A│ B│ C│ D│ E│ G│ H│ I│ J│ K│
├──┼──┼──┼──┼──┼──┼──┼──┼──┼──┤
│ 2│ 3│ 5│ 7│ 9│12│15│20│28│99│
└──┴──┴──┴──┴──┴──┴──┴──┴──┴──┘
               ↑└ W ┘
              C:10

Since the cursor of F is 10, we can still query everything after 10.

Implementation

Since cursors themselves give a relative position inside the list, implementing this in-memory requires doing a binary search.

Implementation in SQL is also quite simple, at least for a simple non-compound cursor. You just apply a simple WHERE condition. I have found this stack overflow post to be a really good in explaining how to deal with compound cursors.

In general, after implementing cursor based pagination, together with the optimizations explained above, I can say it is a lot of work, with a lot of details you have to pay attention to.

Conclusion and Recommendations

Lets repeat some important things.

First of all, both approaches require you to have a stable sort order. Furthermore, cursor-based pagination additionally needs to have a unique sort key for each entity, that you also have to expose.

So for this reason, good performance requires that you either have the data pre-sorted, or have some kind of index, especially in the case of SQL.

This also means that sorting by some computed property or one that is not indexed can result in bad performance.

Cursor-based pagination makes relative jumps easier, while page-based pagination makes absolute jumps easier.

Both methods make it easy to implement the typical previous / next navigation. But cursor-based navigation makes it more resilient to changing data.

Like always, it is a matter of tradeoffs and use-cases.

And also summarize the problems around changing data:

Considering these limitations, when you are dealing with prepend only data, starting at the front again once you finished traversing should make sure you get all the data.


After implementing both page-based and cursor-based pagination, and now that I have written down all my thoughts about it, I do feel a bit more comfortable with the choice we made for cursor-based pagination. But for one reason only: the data we return depends on the current time, so it can have unpredictable data removal, which can lead to missed items when using page-based pagination as shown above.

Otherwise I would probably go with page-based pagination, for reasons of performance and ease of implementation. Which also reminds me to rant about the fact that cursor-based pagination is really complicated to implement using SQL.

One thing we still struggle with is performance. Not because of the pagination method we chose but rather that our sort order depends on a non-indexed compound value. Meh. So we come full circle to the matter of caching and cache invalidation :-)