by ppeetteerr 1998 days ago
Pagination of an immutable collection is one thing, and it can be parallelized. Pagination of a mutable collection (e.g. a database table), on the other hand, is risky, since two requests might return intersecting data if new data was added between them.
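A minimal simulation of that race, assuming a plain Python list stands in for the table and pages are offset-based (`fetch_page` is a hypothetical helper, not a real API):

```python
def fetch_page(rows, page, size):
    """Offset-based pagination: return the slice for a 0-based page number."""
    start = page * size
    return rows[start:start + size]

# Newest-first listing, as in a typical feed.
rows = ["e", "d", "c", "b", "a"]

first = fetch_page(rows, 0, 2)   # ['e', 'd']
rows.insert(0, "f")              # a new row arrives between the two requests
second = fetch_page(rows, 1, 2)  # ['d', 'c']: 'd' shows up on both pages

print(first, second)
print("overlap:", set(first) & set(second))
```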

Accurate result sets require relative page tokens, plus a synchronization mechanism if the software demands it.
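A sketch of what a relative page token could look like, assuming rows are ordered by a unique, immutable integer `id`; the token format and the helper names are hypothetical. Instead of a page number, each response carries an opaque token encoding the last key seen, and the next request resumes strictly after that key, so an insert elsewhere does not shift the page.

```python
import base64
import json

def encode_token(last_id):
    """Wrap the last-seen key in an opaque, URL-safe page token."""
    payload = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_token(token):
    """Recover the last-seen key from a page token."""
    return json.loads(base64.urlsafe_b64decode(token.encode()))["after"]

def list_items(rows, size, token=None):
    """Return (page, next_token); rows must be sorted ascending by unique 'id'."""
    after = decode_token(token) if token else None
    remaining = [r for r in rows if after is None or r["id"] > after]
    page = remaining[:size]
    # Only hand out a token when more rows are left after this page.
    next_token = encode_token(page[-1]["id"]) if len(remaining) > size else None
    return page, next_token

rows = [{"id": i} for i in range(1, 6)]
page1, token = list_items(rows, 2)
rows.insert(0, {"id": 0})  # a new row sorting before the cursor arrives between requests
page2, token = list_items(rows, 2, token)
print([r["id"] for r in page1], [r["id"] for r in page2])  # [1, 2] [3, 4]
```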


Intersecting data is fine provided there's a unique ID for each result that can be used to de-duplicate them.

Ideally I'd want a system that guarantees at-least-once delivery of every item. I can handle duplicates just fine; what I want to avoid is an item being missed entirely because of the way I break up the data.
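A sketch of the consumer side of that arrangement, assuming each item carries a unique `id` field: pages may overlap (at-least-once), and a set of seen IDs drops the repeats so each item is processed once locally.

```python
def consume(pages):
    """Yield each item once, even if pages overlap; items need a unique 'id'."""
    seen = set()
    for page in pages:
        for item in page:
            if item["id"] in seen:
                continue  # duplicate caused by a shifted page boundary
            seen.add(item["id"])
            yield item

pages = [
    [{"id": "e"}, {"id": "d"}],
    [{"id": "d"}, {"id": "c"}],  # 'd' repeated after an insert shifted the pages
    [{"id": "b"}, {"id": "a"}],
]
ids = [item["id"] for item in consume(pages)]
print(ids)  # ['e', 'd', 'c', 'b', 'a']
```

De-duplication only removes repeats; it cannot recover an item that no page ever returned, so the at-least-once part has to come from how the pages are cut.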

It's more than just de-duplicating, tho. Imagine you query a dataset and get back something like a page count and a chunk size. That page count cannot be trusted if the dataset is mutable. If an item is inserted at the beginning of the set, every later item shifts by one position, and the stale page count can leave the last item out entirely.
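A small simulation of that failure mode, assuming offset pages over an in-memory list that stands in for the dataset. The page count is computed once up front, then an item is inserted at the front after the first page has been read:

```python
import math

rows = [1, 2, 3, 4, 5, 6]
original = set(rows)
size = 3
page_count = math.ceil(len(rows) / size)   # 2, computed once and then trusted

fetched = rows[0:size]                     # page 0 -> [1, 2, 3]
rows.insert(0, 0)                          # new item lands at the beginning
for page in range(1, page_count):          # remaining pages per the stale count
    start = page * size
    fetched.extend(rows[start:start + size])  # page 1 -> [3, 4, 5]

print(fetched)                             # [1, 2, 3, 3, 4, 5]
print("missed:", original - set(fetched))  # missed: {6}
```

Item 6 existed for the whole run but is never returned, and de-duplicating the repeated 3 does not bring it back.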

Pagination is hard

For dynamic use cases, DynamoDB implements pagination with something called LastEvaluatedKey - https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
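A sketch of the usual loop, assuming a boto3 DynamoDB `Table` resource (or anything with the same `scan` keyword arguments): each response may include `LastEvaluatedKey`, which is passed back as `ExclusiveStartKey` until it is absent. `scan_all` is a hypothetical helper name, and `FakeTable` is only an offline stand-in so the example runs without AWS.

```python
def scan_all(table, **scan_kwargs):
    """Collect every item from a DynamoDB-style scan, following LastEvaluatedKey."""
    items = []
    while True:
        response = table.scan(**scan_kwargs)
        items.extend(response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no key means this was the final page
        scan_kwargs["ExclusiveStartKey"] = last_key  # resume strictly after it


class FakeTable:
    """Offline stand-in that returns fixed-size pages keyed by 'pk'."""

    def __init__(self, items, page_size=2):
        self.items = sorted(items, key=lambda i: i["pk"])
        self.page_size = page_size

    def scan(self, ExclusiveStartKey=None, **_):
        start = ExclusiveStartKey["pk"] if ExclusiveStartKey else None
        rest = [i for i in self.items if start is None or i["pk"] > start]
        page = rest[:self.page_size]
        response = {"Items": page}
        if len(rest) > self.page_size:
            response["LastEvaluatedKey"] = {"pk": page[-1]["pk"]}
        return response


items = scan_all(FakeTable([{"pk": n} for n in range(5)]))
print([i["pk"] for i in items])  # [0, 1, 2, 3, 4]
```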

This is different from LIMIT/OFFSET paging in an RDBMS.
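To make the contrast concrete, a sketch using Python's built-in `sqlite3` with a hypothetical `items` table listed newest first. `LIMIT ... OFFSET` counts positions, so a row inserted at the top shifts the window, while a keyset condition (`WHERE id < ?` for descending order) resumes after the last key returned, much like an exclusive start key does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 7)])

def ids(rows):
    """Flatten sqlite3 result tuples into a list of ids."""
    return [r[0] for r in rows]

# Page 1, newest first; both approaches return the same first page.
page1 = conn.execute("SELECT id FROM items ORDER BY id DESC LIMIT 3").fetchall()
last_id = page1[-1][0]

conn.execute("INSERT INTO items (id) VALUES (7)")  # arrives between requests

# OFFSET counts positions, so the new row pushes everything down by one.
offset_page2 = conn.execute(
    "SELECT id FROM items ORDER BY id DESC LIMIT 3 OFFSET 3").fetchall()

# Keyset pagination resumes strictly after the last id returned.
keyset_page2 = conn.execute(
    "SELECT id FROM items WHERE id < ? ORDER BY id DESC LIMIT 3", (last_id,)).fetchall()

print(ids(page1), ids(offset_page2), ids(keyset_page2))
# [6, 5, 4] [4, 3, 2] [3, 2, 1]
```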

Wouldn’t this pattern solve the complexity you are talking about?

That's one way, for sure. You can do this with IDs, dates, etc.
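When the key is a date, values are rarely unique, so one common approach is a composite key of (date, id) with the id as a tie-breaker. A minimal in-memory sketch, assuming ISO-8601 date strings (which sort correctly as text) and a hypothetical `next_page` helper:

```python
def next_page(rows, size, after=None):
    """Keyset page over rows ordered by (created, id); `after` is the last (created, id) seen."""
    ordered = sorted(rows, key=lambda r: (r["created"], r["id"]))
    if after is not None:
        # Tuple comparison is lexicographic, so ties on date fall back to id.
        ordered = [r for r in ordered if (r["created"], r["id"]) > after]
    page = ordered[:size]
    cursor = (page[-1]["created"], page[-1]["id"]) if page else after
    return page, cursor

rows = [
    {"id": 1, "created": "2020-01-01"},
    {"id": 2, "created": "2020-01-02"},
    {"id": 3, "created": "2020-01-02"},  # same date as id 2
    {"id": 4, "created": "2020-01-03"},
]

page1, cursor = next_page(rows, 2)          # ids 1, 2
page2, cursor = next_page(rows, 2, cursor)  # ids 3, 4
print([r["id"] for r in page1], [r["id"] for r in page2])
```

Filtering on `created > "2020-01-02"` alone would skip id 3; the id tie-breaker is what keeps it in.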