Hacker News new | ask | show | jobs
by alexchamberlain 1896 days ago
A skip list would allow a user to skip to the Nth page, but I don't know of any database that offers one in the form of an index.
1 comments

We can technically offer the users Nth page with cursors+offset hybrid, if we're fine with the computational demands (smaller than pure offset, but bigger than just prev|next|last).

Let's say we want to offer the user 5 pages, 100 records per page:

prev (7) | next (9) | 10 | 11 | 12 | 13 | 14 | ... | last

If you click "13", you'd need to query with cursor for page 9 (i.e. id > last rec of page 8) , and offset 400 limit 100.

This way offset never get gigantic, because if you show up to 5 pages to the user, you need to offset up to 5 * 100 records at a given time (combined with the cursor).

You can also just query 5*100 IDs (keysets actually) and insert them into temporary table, all in a single SQL query, retrieve 100 full records in 2nd query, and get cursors for next/prev pages in third query (I'm not sure about this part, would need some windowing query I guess). It may seem wasteful, but in my experience it's fast (though I don't really have "Big Data" to test on). I do something similar to check if there's next/prev page. Temporary tables in SQLite are session local, deleted on close.
It may be fast on its own, but I doubt it's faster than cursor+offset. Your idea is to materialize a slice starting from the cursor and offsetting within it. This means we still do offset, but without the cursor.

The thing is, the cursor is a lookup on an indexed sorted index, it's O(log N), it's effectively free already. And we add the cost of materializing the slice (if even just in memory), and introducing the potential of DoS-ing our server with this temporary state if we're not careful.

I don't understand your point. I don't do any offset. I just pick 500 keys by cursor.

insert into temp select id, date from tbl where id >= c_id and date >= c_date order by date desc, id desc limit 500

I don't see doing any offset over the whole dataset.

> And we add the cost of materializing the slice (if even just in memory)

The alternative is to transfer 400 full rows of data that will be used just to compute pages.

> and introducing the potential of DoS-ing our server with this temporary state if we're not careful.

I don't see any DoS vector specific to this but I'd be thrilled to learn of one.

> I don't understand your point. I don't do any offset. I just pick 500 keys by cursor

So you pick 500 keys and put them into a temp table. Then what? You still have to show a page that's 100 of those 500 keys.

> The alternative is to transfer 400 full rows of data that will be used just to compute pages.

The alternative is the hybrid cursor+offset approach, which doesn't transfer 400 full rows of anything.

> I don't see any DoS vector specific to this but I'd be thrilled to learn of one.

I generate enough sessions to fill your RAM with temp tables.

> Then what? You still have to show a page that's 100 of those 500 keys.

I don't need to use offset. I can use limit. And also, offset in 500 rows dataset never was a problem.

> I generate enough sessions to fill your RAM with temp tables.

o_O These tables contain only keys. In your approach you're going to have 500 keys in memory as well. Moreover, I'm not wasting resources by transferring them in my application. I'm in fact saving resources. You're making things up.

EDIT: Are you arguing about performance impact of scanning 500 rows vs 100 rows (though I don't understand your approach either, you're skipping - scanning - 400 rows as well)? The problem of paging is scanning whole dataset.