Hacker News new | ask | show | jobs
by feike 1397 days ago
Reminds me of Markus Winand who hands out stickers on database conferences banning offset.

His site is a great resource for anyone wanting to take a deeper dive on SQL performance:

https://use-the-index-luke.com/sql/partial-results/fetch-nex...

4 comments

So basically do it like Reddit?

https://old.reddit.com/?count=25&after=t3_wtpvdp

I noticed Reddit's pagination has that "after" parameter, which points to the last post on the current page.

It glitches out if the last item is deleted by moderators, but otherwise it works smoothly.

On Reddit I frequently see the "next" page having the same posts as the previous page. Not all the same but many of the same. Like, maybe after is being respected but the sorting is different or something.
You took some time to read the page and while reading it, the homepage changed and some new posts got added to the top. Therefore some posts get moved to the 2nd page.
I see that on Hacker News a decent amount as well when going through the top stories across multiple pages. My assumption has always been that the order changes between when I load the page and when I move to the next one (which sometimes is not for another several minutes).
The problem with Reddit is that the sorting isn't by time, so the content order changes.

The only solution I can think of for that is to track which individual posts have been shown to which users, which is quite a lot of work to do.

Do you know if this is specific to MySQL or does it also apply to other RDBMS like PostgreSQL?
Can't he just use rowversion?

No need for row value syntax and it works with MS SQL Server

how to jump to an arbitrary page?
You can do that with postgres histograms https://www.citusdata.com/blog/2016/03/30/five-ways-to-pagin... - go to the section "Keyset with Estimated Bookmarks"

> As we saw, plain keyset pagination offers no facility to jump a certain percentage into the results except through client guesswork. However the PostgreSQL statistics collector maintains per-column histograms of value distribution. We can use these estimates in conjunction with limits and small offsets to get fast random-access pagination through a hybrid approach.

Jumping to a specific page is a bit of an ambiguous / undefined term in this case. Like asking for a specific page in a book that's still being written. Maybe today the plot twist occurred on page 100, but then the author decides chapter 1 needs more backstory, and now the plot twist happens on page 115.

Unless you can guarantee your data is static, or that the sorting order cannot be mutated and only append later values, the concept of what data belongs in which page could be changing every millisecond.

Do you really need to jump to an arbitrary page and land on the exact item? For many applications an approximate jump is fine. If your column is fairly uniformly distributed you can guess the index for any arbitrary page.
Yes, my business users will feel like they don't have sufficient access to their data if they can't.

> If your column is fairly uniformly distributed you can guess the index for any arbitrary page.

I don't think that'll work in a multi-tenancy situation with complex filters.

I bet your users does not always know what is best for them.
Some people thoroughly enjoy a linear saccade search! See for example any social media app.

It definitely isn’t in the users’ best interest to have any method of scrolling through a lot of records.

You don't, but instead you can jump to an arbitrary place in the results. For example, you could show results starting from the letter P, or show results starting from 2022-04-02.
Spoiler: you can’t.
But the entire concept is that this is an adaptation to the fact that data may be added to or removed from the database. If that's true, there would be no benefit in jumping to a specific page - there's no guarantee that that page will display any particular data.