| HN Mirror



	by ppeetteerr 1998 days ago
	Pagination of an immutable collection is one thing and can be parallelized. Pagination of a mutable collection (e.g. a database table), on the other hand, is risky, since two requests might return intersecting data if new data was added between the requests being executed. Consistent result sets require page tokens relative to the last item returned, plus a synchronization mechanism if the software demands it.


simonw 1998 days ago

Intersecting data is fine provided there's a unique ID for each result that can be used to de-duplicate them.

Ideally I'd want a system that guarantees at-least-once delivery of every item. I can handle duplicates just fine; what I want to avoid is an item being missed entirely because of the way I break up the data.
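
A minimal sketch of that de-duplication step, assuming every item carries a unique `id` field. The helper name `dedupe_pages` and the sample pages are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def dedupe_pages(pages: Iterable[List[Dict]], key: str = "id") -> Iterator[Dict]:
    """Yield each item once, keeping the first occurrence of every unique ID."""
    seen = set()
    for page in pages:
        for item in page:
            if item[key] in seen:
                continue  # already delivered by an earlier, overlapping page
            seen.add(item[key])
            yield item


# Two pages that overlap on id 3 (hypothetical data).
pages = [
    [{"id": 1}, {"id": 2}, {"id": 3}],
    [{"id": 3}, {"id": 4}],
]
unique = list(dedupe_pages(pages))
print([item["id"] for item in unique])  # [1, 2, 3, 4]
```

This only handles the duplicate side of at-least-once delivery. It does nothing for items that were never returned by any page.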


ppeetteerr 1998 days ago

It's more than just de-duplicating, tho. Imagine you query a dataset and get something like a page count and a chunk size. That page count cannot be trusted if the dataset is mutable. If an item is inserted at the beginning of the set, you're going to miss the last item.

Pagination is hard
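
To make that failure mode concrete, here is a rough sketch of offset paging over a plain list standing in for a table. `fetch_page`, the row values, and the page size are all invented for the example.

```python
def fetch_page(rows, page, size):
    """Offset-style paging: return rows[page * size : (page + 1) * size]."""
    start = page * size
    return rows[start:start + size]


original = [1, 2, 3, 4, 5, 6]
rows = list(original)          # the mutable "table"
size = 3
page_count = -(-len(rows) // size)  # ceiling division, computed up front: 2

seen = fetch_page(rows, 0, size)    # first request returns [1, 2, 3]

rows.insert(0, 0)                   # a new item lands at the beginning

# Remaining requests trust the stale page count.
for page in range(1, page_count):
    seen += fetch_page(rows, page, size)

missed = [r for r in original if r not in seen]
duplicates = sorted({r for r in seen if seen.count(r) > 1})
print(seen)        # [1, 2, 3, 3, 4, 5]
print(missed)      # [6]
print(duplicates)  # [3]
```

The duplicate (3) can be filtered out by ID, but the last item (6) was pushed onto a page the client never asks for.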


the_arun 1998 days ago

For dynamic use cases, DynamoDB implements pagination with something called LastEvaluatedKey - https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

This is different from LIMIT in an RDBMS.

Wouldn’t this pattern solve the complexity you are talking about?


ppeetteerr 1997 days ago

That's one way, for sure. You can do this with IDs, dates, etc.
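
A sketch of the ID-based variant (often called keyset or cursor pagination), using an in-memory SQLite table. The table name `items`, the helper `fetch_after`, and the sample rows are assumptions for the example. DynamoDB's LastEvaluatedKey, passed back as ExclusiveStartKey on the next request, follows the same idea of resuming after the last key seen.

```python
import sqlite3


def fetch_after(conn, last_id, limit):
    """Return up to `limit` rows whose id is strictly greater than `last_id`."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, f"item{i}") for i in range(1, 7)],
)

seen_ids = []
last_id = -1          # cursor: start before the smallest id
inserted = False
while True:
    page = fetch_after(conn, last_id, 3)
    if not page:
        break
    seen_ids += [row[0] for row in page]
    last_id = page[-1][0]  # resume after the last row actually returned
    if not inserted:
        # Simulate a concurrent insert at the "beginning" between requests.
        conn.execute("INSERT INTO items VALUES (?, ?)", (0, "late"))
        inserted = True

print(seen_ids)  # [1, 2, 3, 4, 5, 6]
```

The row inserted behind the cursor (id 0) is not picked up by this pass, but none of the rows that existed at the start are skipped or repeated. Rows inserted ahead of the cursor would be returned normally.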
