by momokoko 2008 days ago
This is trumpeted around and actually put into production every once in a while.

The reason opaque pagination is an antipattern is that you can’t optimistically fetch resources.

So your customer, the person paying you for your product, needs to wait for some number of synchronous reads.

With non-opaque offsets these can be done in parallel. If the typical request requires 4 pages, these can be done 4 at a time, and if it is fewer than 4 pages the extras can be discarded.
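
To make the parallel case concrete, here is a minimal sketch of speculative fetching with plain offsets. Everything in it is an assumption for illustration: `fetch_page` is a hypothetical in-memory stand-in for a remote call, and the item count and page size are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a remote API; a real client
# would issue an HTTP request inside fetch_page instead.
ITEMS = list(range(25))

def fetch_page(offset, limit):
    """Return up to `limit` items starting at `offset` (empty past the end)."""
    return ITEMS[offset:offset + limit]

def fetch_pages_parallel(num_pages, limit):
    """Request `num_pages` pages at once and discard the empty ones."""
    offsets = [i * limit for i in range(num_pages)]
    with ThreadPoolExecutor(max_workers=num_pages) as pool:
        # map() yields results in the same order as `offsets`.
        pages = list(pool.map(lambda off: fetch_page(off, limit), offsets))
    # Pages past the end of the data come back empty and are dropped.
    return [page for page in pages if page]

pages = fetch_pages_parallel(num_pages=4, limit=10)
```

The client can compute every offset up front, so no request depends on the response to an earlier one.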

This is a clever hack that ends up being user-hostile in actual practice. Remember, APIs are designed for the benefit of the consumer, not for the benefit of the maintainers.

5 comments

If the API takes a cursor and a number of items to fetch, the idea of a "page" or how large it is exists entirely on the client. You can fetch 40 results in one query and say it corresponds to the next four "pages", if you're configured to show ten items per page.
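
A rough sketch of that idea, assuming a hypothetical `fetch(cursor, limit)` call backed by an in-memory list. The cursor here is just a stringified position for illustration; a real API would define its own cursor format.

```python
# Hypothetical data and API; the names and the cursor format are assumptions.
ITEMS = [f"item-{i}" for i in range(100)]

def fetch(cursor, limit):
    """Return (items, next_cursor); cursor=None starts from the beginning."""
    start = 0 if cursor is None else int(cursor)
    batch = ITEMS[start:start + limit]
    end = start + len(batch)
    # No further cursor once the data is exhausted.
    next_cursor = str(end) if end < len(ITEMS) else None
    return batch, next_cursor

PER_PAGE = 10  # how many items the UI shows per "page"

# One request covers four UI pages; paging happens purely on the client.
batch, next_cursor = fetch(cursor=None, limit=4 * PER_PAGE)
ui_pages = [batch[i:i + PER_PAGE] for i in range(0, len(batch), PER_PAGE)]
```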

It seems worth noting that a high number of concurrent queries to the same database shard as part of the same overall page load can be very wasteful of CPU as database load increases, due to the cost of context switching. https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-... dives into that.

> So your customer, the person paying you for your product, needs to wait for some number of synchronous reads.

The customer is not always right.

It's also not clear what the practical issue with sequential requests is. Multiple parallel requests may get caught in a rate limiter, and impose much more work on the backend than a cursor (multiple unnecessary sorts/discards). It's not a given that spamming a service gets you all the data any faster than using a cursor.

> With non-opaque offsets these can be done in parallel. If the typical request requires 4 pages, these can be done 4 at a time, and if it is fewer than 4 pages the extras can be discarded.

If the typical request requires 4 pages, then your page size is suboptimal.

I think supporting an explicit (large) page size addresses your concern. It’s not that you want to make four separate requests, it’s that you want the backend to give you more than a tiny amount of data. If you’re going to induce the load of 400 results on the backend, backend implementers may as well give you all the results in one go (assuming the output fits in whatever request/response limits you have).

I definitely agree that just having opaque page tokens without the ability to say “I want up to 100” leads to needless pain for clients and overall system inefficiency.

I think that cursors should be used more with realtime data, and more stable data like products in an e-commerce shop should use paginated queries. There are many UX benefits to paginated queries. It gives the customer a clear overview of how many products there are, and the customer can also navigate faster to get a better understanding of what general prices are. If the table/list doesn't have good filters, the customer can find what they are looking for faster too. Same with highscores, forum threads and other similar things. Say you are browsing highscores and you want to see what scores are around the 10000th position, etc.

As a user I find having only prev/next buttons a bit claustrophobic in this case. I think UX should trump whatever performance gains there are from it.

non backend dev here. could someone elaborate what exactly is so opaque about pagination? and what makes offsets less opaque? does this term have meaning I don't know about?

and why does the DB need to wait for some number of synchronous reads?

"offset" can be passed transparently to db to retrieve a range of records while "cursor" is customarily implemented in the API layer. I think that's what "opaque" meant in GP's context.

A starting read is needed to obtain the first cursor, hence the synchronous read.
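
As a sketch of why the reads end up sequential: each request needs the cursor returned by the previous one, so the round trips cannot overlap. The `fetch` function and the cursor format below are hypothetical stand-ins, not any particular API.

```python
# Hypothetical in-memory API; the cursor is a stringified position
# purely for illustration.
ITEMS = list(range(35))

def fetch(cursor, limit):
    """Return (items, next_cursor); cursor=None starts from the beginning."""
    start = 0 if cursor is None else int(cursor)
    batch = ITEMS[start:start + limit]
    end = start + len(batch)
    next_cursor = str(end) if end < len(ITEMS) else None
    return batch, next_cursor

def fetch_all(limit):
    """Walk the whole collection, counting round trips along the way."""
    results, cursor, round_trips = [], None, 0
    while True:
        # The next request cannot be built until this response arrives.
        batch, cursor = fetch(cursor, limit)
        round_trips += 1
        results.extend(batch)
        if cursor is None:
            break
    return results, round_trips

results, round_trips = fetch_all(limit=10)
```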

“opaque” here means we are hiding the real ID of the “cursor” item.

gotcha, thank you!

if you ask for the next page of results with parameters like offset=20&limit=10, then you, as a client, can reason about and manipulate those parameters: ask for multiple pages in parallel, ask for offset=18, make calculations on them, etc. If you're only providing a token, like "next_page=abcdef1234" with some encoded structure, you're limiting what your client can actually do, but simultaneously simplifying the backend architecture and making it more forward compatible with future changes in the backend (after all, that next_page token can actually be just an offset and limit encoded)

ah i see so some kind of tradeoff between power to the API consumer vs power to the API maintainer. thank you!
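
The parenthetical about a next_page token being just an encoded offset and limit could look roughly like this. The token format (base64-wrapped JSON) and the `list_items` function are assumptions for illustration; a real backend may encode something else entirely, which is the point of keeping it opaque.

```python
import base64
import json

def encode_token(offset, limit):
    """Pack offset and limit into an opaque-looking URL-safe string."""
    raw = json.dumps({"offset": offset, "limit": limit}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_token(token):
    """Recover (offset, limit) from a token made by encode_token."""
    data = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return data["offset"], data["limit"]

# Hypothetical backend data and endpoint.
ITEMS = list(range(50))

def list_items(next_page=None, default_limit=10):
    """Return (items, next_page_token); the token is None on the last page."""
    if next_page is None:
        offset, limit = 0, default_limit
    else:
        offset, limit = decode_token(next_page)
    batch = ITEMS[offset:offset + limit]
    new_offset = offset + len(batch)
    token = encode_token(new_offset, limit) if new_offset < len(ITEMS) else None
    return batch, token

first, token = list_items()
second, _ = list_items(next_page=token)
```

Because clients only ever pass the token back unchanged, the backend could later switch what it encodes without breaking them.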

i don't see anyone arguing for "next_page=abcdef1234". realistically it's more like "cursor=abcdef1234&limit=10". slightly less opaque. your point about asking for multiple pages in parallel still stands, though.

i think i agree with mostly everyone here in that this is a fine tradeoff to make and so would favor cursors over offsets. (unclear how cursors relate to "keysets")