Hacker News new | ask | show | jobs
by drdec 1398 days ago
I was hoping to read about how they handled cleaning up cursors, what the effect on the server was when having a bunch of open, long-running cursors, etc. Unfortunately the article only treated the subject at a superficial level.

So, anyone here implement pagination via cursors? What do you find to be the drawbacks and how do you mitigate them?

3 comments

You probably assume they are talking about database cursors. The cursor is just a record ID in this article. There is no long term storage of database cursors on the server. Assuming you can sort all your data, next query just returns all records after the given record ID, plus the record with that ID.

One corner case would be if the cursor record is deleted. I don't see it mentioned how they handle it.

Does that mean you have to scan the entire results set to get the right page? So if I am on page 100, I have to query pages 1-99 and discard them?

Or is there a trick here I’m missing?

I think the point is that clients are not even given the option to think in terms of "page numbers".

What's the use case for needing the 100th page of a query result, that also doesn't allow you to cache the 100th page locally to retrieve it later?

There are no pages anymore. You fetch a record by ID and next N records.
Right, but in your database query doesn’t that require you to get a monster result set and filter it down in your application layer?

There’s no SQL clause for “WHERE IT WOULD COME AFTER id=ah73d IN THE RESULT SET” as far as I’m aware.

Yah, another downside of cursor-based pagination is: what happens when the record the cursor refers to is deleted?

Do you just crash and ask the user to start over?

Do you have to nudge open cursors on every delete?

My implementation of cursors works by encoding the primary ID of the last row on the page, along with additional information corresponding to the sort order if that's needed.

That way it doesn't matter if the record is deleted - I can still return the next page by showing records that come after that provided cursor value.

There's an example on this page: https://latest.datasette.io/fixtures/sortable?_sort=sortable

Since the table is sorted by the "sortable" column, the next page link includes this:

    ?_next=15%2Cg%2Cz
15 is the last value for "sortable" on the page, then g,z are the compound primary key for that last row.
Interesting… what happens if new records are added with that 15 value? Do you need an implied secondary sort with the record created time?

Also what if there are more than $page_size records with that 15 value?

Yes, if you want to support new records being added it's up to you to include something like a created date as part of your specified sort order.

More than page_size records with that value works fine - that's why the primary key is included as a tie-breaker.

Doesn’t that take you back to the original problem though: if you delete the primary key the query fails?

Would the created_at be a better option than the primary key? That can’t be deleted or changed, and that would give you a stable secondary sort?

Instead of the cursor being an ID, it could directly encode whatever column(s) you are sorting by. Then you don’t have to locate any record in particular, you can always return records that sort after the cursor.
It’s not a cursor as in a relational database cursor. It’s a cursor, as in an ID of an item in a stably ordered set. There’s no long-running anything to be worried about.