Hacker News new | ask | show | jobs
by HenryR 2801 days ago
Here's a recent comparison of Masstree to ART: https://twitter.com/andy_pavlo/status/986647389820747776?s=2...

ART looks to be better in most cases. It's on my list of K-V stores to review: https://www.the-paper-trail.org/page/reading-list/

3 comments

Cool, looking forward. Please put Judy in as well :)

Seriously, the 17 year old judy is still pretty good, despite it's lack of use of cool vector load / compare / etc instructions for quickly traversing tree nodes that are too small for a full radix search (ART does "simultaneous search", i.e. compares to all stored keys in a single instruction, while judy afaik runs a linear search).

It would be pretty cool if someone vectorized that in judy, and replaced null-terminated strings by a binary-safe representation. Unfortunately, all implementations are old and very hard to read.

I could either figure out how Judy works, or review another three papers :)
:)

My super high-level partial understanding of Judy is the following:

Start with ART, which is pretty simple. Then do the obvious improvements: First, we want fast access to "element number N in sorted order", i.e. you also store number of descendants. Next, you do key compression: Storing the portion of the key that can be reconstructed from the tree traversal is silly. Next: an 8-bit tag for signaling one out of 4 node types? Are you, like, filthy rich? More node types it is. Next: Spending 64 bit for a node pointer? Are you crazy? Put the type-tag into the parents pointer (earlier resolution of the branch!), and use a custom allocator to get smaller pointers (you allocate big segments and your node pointers are offsets).

You end up with an unintelligible monstrosity. Give it a cute name, forget about SIMD because it is 2001, and you end up with something like Judy.

For me the requirement is: plain C interface, usable under MIT/BSD or less-restrictive license

Something gets bonus points for not needing other libraries or static data, so I could put it in a kernel, but usually I don't really need this.

For example, Judy is LGPL. Darn. That alone is trouble, but Judy makes it more questionable due to non-trivial code in *.h files.

Any good options?

tangential , Andy's DB lectures are rocking my world

https://www.youtube.com/channel/UCHnBsf2rH-K7pn09rb3qvkA