Hacker News new | ask | show | jobs
Building APIs: A Comparison Between Cursor and Offset Pagination?
37 points by kasrakhosravi2 1896 days ago
I have been mostly relying on offsets for implementing paging. It is more straightforward in implementation but I find cursor based pagination more intuitive in usage; plus it has better support for real-time data but on the negative side you are forced to use infinite scroll which is not always the best solution.

I was curious if anybody has any experience with these two approaches in their projects? And what are some other advantages and disadvantages on this? [1]

[1] I wrote this piece a while back but was thinking of updating it further: https://betterprogramming.pub/building-apis-a-comparison-between-cursor-and-offset-pagination-88261e3885f8

8 comments

> you are forced to use infinite scroll which is not always the best solution.

You're not forced to. You could instead Show "8964 Results Total: Previous 100 | Next 100 ... Last 100" which in practice is the same as pagination.

Sure the user can't click in the MIDDLE of the page set, but what kind of a use case is that, anyway.

Anyway, cursor offsets are fundamentally O(1), and offsets are fundamentally O(N) where N is the offset. As a younger programmer, I used to wonder what is this algorithmic magic the database uses, so it knows how to compute offset 10000 without sorting and looking at the first 10000 values I sort by. Well? No magic, it just hustles through it and delivers. But slowly.

It can be useful for time ordered content where you can binary search for the appropriate time. Of course it would be nice if sites let you directly go to a time period but that seems to be nonexistent.
Seems like a "before" and "after" fields would serve users better, but I do understand the use case.
A skip list would allow a user to skip to the Nth page, but I don't know of any database that offers one in the form of an index.
We can technically offer the users Nth page with cursors+offset hybrid, if we're fine with the computational demands (smaller than pure offset, but bigger than just prev|next|last).

Let's say we want to offer the user 5 pages, 100 records per page:

prev (7) | next (9) | 10 | 11 | 12 | 13 | 14 | ... | last

If you click "13", you'd need to query with cursor for page 9 (i.e. id > last rec of page 8) , and offset 400 limit 100.

This way offset never get gigantic, because if you show up to 5 pages to the user, you need to offset up to 5 * 100 records at a given time (combined with the cursor).

You can also just query 5*100 IDs (keysets actually) and insert them into temporary table, all in a single SQL query, retrieve 100 full records in 2nd query, and get cursors for next/prev pages in third query (I'm not sure about this part, would need some windowing query I guess). It may seem wasteful, but in my experience it's fast (though I don't really have "Big Data" to test on). I do something similar to check if there's next/prev page. Temporary tables in SQLite are session local, deleted on close.
It may be fast on its own, but I doubt it's faster than cursor+offset. Your idea is to materialize a slice starting from the cursor and offsetting within it. This means we still do offset, but without the cursor.

The thing is, the cursor is a lookup on an indexed sorted index, it's O(log N), it's effectively free already. And we add the cost of materializing the slice (if even just in memory), and introducing the potential of DoS-ing our server with this temporary state if we're not careful.

I don't understand your point. I don't do any offset. I just pick 500 keys by cursor.

insert into temp select id, date from tbl where id >= c_id and date >= c_date order by date desc, id desc limit 500

I don't see doing any offset over the whole dataset.

> And we add the cost of materializing the slice (if even just in memory)

The alternative is to transfer 400 full rows of data that will be used just to compute pages.

> and introducing the potential of DoS-ing our server with this temporary state if we're not careful.

I don't see any DoS vector specific to this but I'd be thrilled to learn of one.

Cursor based pagination is much better from a performance point of view against large amounts of data.

It's also sometimes called keyset pagination - there's a good article about that here: https://use-the-index-luke.com/no-offset

I implemented it in https://datasette.io - it's quite tricky to build if you want to support arbitrary sort orders.

I built a utility for Graphene + Django to do it properly with arbitrary sorts, though one does need to be careful about exposing sorts orders which aren’t backed by any indexes.
I think viewing the problem of pagination as a choice between offset and keyset is a limiting mindset.

Two things that come to mind:

1. Depending on the technology there may be alternatives: https://www.citusdata.com/blog/2016/03/30/five-ways-to-pagin...

2. There may be ways to provide pagination through filtering instead of offsetting

Probably most common ordering on big resultsets are either by time (eg. emails) or alphabetic (eg. a dictionary).

When searching in a dictionary you do a binary search followed by page navigation when close: open it approximatively where you think the target word might be, then jump in the appropriate direction a number of times then go page by page until the target is found. If you have tabs to jump to the appropriate starting letter or group of starting letters that can be implemented as filtering but the UI can look like pagination.When going through pages of emails, to the user, jumping through pages or jumping through days may be the same thing.

This of course means pages may no longer be the same length.

Infinite scrolling is a poor UI for multiple reasons and it is an antagonistic UI (you don't want users to browse long datasets so you make it HARD). A determined user will just generate more requests. Instead of making it hard to explore a dataset, make it easy. Make it intuitive to filter and make it easy to reverse the ordering (costs nothing for the DB).

If you only provide one ordering and infinite scrolling, the only effect is frustration, broken scroll wheels, and plenty of requests. An example of this is FB Messenger. You can only scroll or search by words. So if I don't remember keywords, just the approximate time and I have confidence I can reach the chat then the only way is to scroll and scroll. Compare Telegram which lets you filter by date.

Another positive example is any Discourse based forum which provides a timeline for scrolling. Jumping in the middle of the time interval is not the same as jumping in the middle of the resultset, but it serves the same purpose and is usually closer to what the user actually wants.
I've used cursor pagination on daemons doing large batch processing. It worked fine.

If you're using a cursor, you should be iterating the entire (or at least most of) the result set over the course of the connection. I wouldn't think they're appropriate to API's, because maintaining the cursor state between requests sounds painful.

Offset is the only way to paginate arbitrarily sorted data sets, unless you plan on going hard on indexing. The performance falls off as the offset grows. You'll also have issues with apparent 'duplicates' between pages.

Keyset pagination is preferred if the data is usually sorted in one way. Most infinite scrolling falls into this category. Each row has a unique 'counter', which is (usually) monotonic. To skip pages, filter rows whose counter is later than the last row in the current page. This is fast and has no duplicates. However, your sorting options are constrained, as you must have an index for each sort option.

Keyset and offset can be combined to eliminate the duplication defect of offset alone. The performance issue will still be there. This becomes an engineering decision - in most use cases, the large offset perf hits don't really happen (aside from abuse, which almost always happens).

Important to call out here, but thanks to GraphQL’s unfortunate habit of using words which already have established meanings, when a lot of people talk about “cursor-based pagination”, they actually mean keyset pagination and not something that requires a server to maintain a stateful cursor.

I’m guessing this is the meaning the OP was referring to.

I think it's common to call it cursor regardless of where it's maintained. I agree that it may confuse some people since there are cursors in SQL APIs. But the underlying concept is the same. Keyset does not convey the specific meaning.
When I’ve worked with APIs that implemented pagination using cursors (Square), I really loved the ease of integration with my app. Writing a while loop that checks if the cursor is populated was better/faster than checking if the number of results from the last request was less than the requested number of rows.
I actually built some of those Square APIs! The reason we used cursors was correctness. If new records get added (say, in your transaction history) and you're consuming records newest first, limit+offset pagination will cause the client to miss or double-count some records. That might be OK when a user is scrolling through a listing, but it's a real problem when you're writing an app to compute your sales tax obligation.
Offset pagination is hard on the backends. You can distribute some of the sorting load (see: https://engineering.medallia.com/blog/posts/sorting-and-pagi... ), but it's tricky, and less efficient in any case.

Cursor based pagination is way better, since the backend aggregation bucket size is capped by the page size times the number of nodes used for a distributed sort (essentially you aggregate on page sized priority queues, one per node, and then aggregate them).

How does keyset based pagination deal with deletes? If the cursor/current key is deleted you lose your paging state. Or am I wrong?
AFAU you use keyset as upper/lower bounds in your SQL query, so deletion is not problem. Example:

Cursor ($ID:$DATE) = "123:2012-01-01"

Query: select * from tbl where id >= $ID and date >= $DATE limit 100

Interessant-Interesting