Hacker News new | ask | show | jobs
by ajross 573 days ago
Skiplist discussions that don't talk about heap effects are probably incomplete. The big limitation with skiplists for anything that isn't a "mostly-write-once" workload (something that, it should be noted, doesn't get much benefit from the atomic implementation discussed) is that the metadata record is variable sized, which does bad things to heap fragmentation and cache locality (two things you probably care deeply about if you're going so far as to junk your AVL tree for a lockless data structure) and isn't amenable to the slab-style optimizations used elsewhere.

I love skiplists for their simplicity, especially compared with the balanced trees against which they compete. But the truth is they're very one-trick-pony data structures and don't bring lot to the table as a general choice. Their home is pretty limited.

2 comments

Skiplists can be very effective in disk structures for few of reasons.

For one, a sorted sequence of keys can be written to file in a single pass, and it's easy to interleave with the data itself. For example, Lucene uses multi-level skip lists in compressed posting list files in order to quickly jump to a term and then jump to the lowest matching document ID for that term. Since these files are immutable, the tree is only built once and all the data can be stored in sorted order, which has the added benefit that these files can also be merged in a single pass.

Append-only is a very convenient corner case for skiplists to optimize. I call it "skiplogs" because it looks really different from the general case. An average node can be 3 bytes, for example.
Doesn't the author mention using arenas?
Arenas are just heaps, and subject to all the same problems. It's true that in the LSM use case in question (and I need to point out that databases are not my area!) there is a clear "flush" operation that will put a limit on how much heap churn and fragmentation you need to tolerate. And maybe that's an answer that makes skiplists more desirable in this context. But things like long-lived app data and language runtimes don't have that, and skiplists turn into headaches.

A simpler consequence of the same problem: I mostly work in embedded stuff and skiplists are pretty much a non-starter as they can't easily be made intrusive without a bunch of waste.

Different sizes would be allocated in different arenas.
It's my understanding that in the context of append-only skip lists, like what RocksDB uses for memtables, you can see wasted space at the end of a block within an arena but no fragmentation between records within a block. SSTables are another story.