Cool, looking forward. Please put Judy in as well :)
Seriously, the 17 year old judy is still pretty good, despite it's lack of use of cool vector load / compare / etc instructions for quickly traversing tree nodes that are too small for a full radix search (ART does "simultaneous search", i.e. compares to all stored keys in a single instruction, while judy afaik runs a linear search).
It would be pretty cool if someone vectorized that in judy, and replaced null-terminated strings by a binary-safe representation. Unfortunately, all implementations are old and very hard to read.
My super high-level partial understanding of Judy is the following:
Start with ART, which is pretty simple. Then do the obvious improvements: First, we want fast access to "element number N in sorted order", i.e. you also store number of descendants. Next, you do key compression: Storing the portion of the key that can be reconstructed from the tree traversal is silly. Next: an 8-bit tag for signaling one out of 4 node types? Are you, like, filthy rich? More node types it is. Next: Spending 64 bit for a node pointer? Are you crazy? Put the type-tag into the parents pointer (earlier resolution of the branch!), and use a custom allocator to get smaller pointers (you allocate big segments and your node pointers are offsets).
You end up with an unintelligible monstrosity. Give it a cute name, forget about SIMD because it is 2001, and you end up with something like Judy.
Seriously, the 17 year old judy is still pretty good, despite it's lack of use of cool vector load / compare / etc instructions for quickly traversing tree nodes that are too small for a full radix search (ART does "simultaneous search", i.e. compares to all stored keys in a single instruction, while judy afaik runs a linear search).
It would be pretty cool if someone vectorized that in judy, and replaced null-terminated strings by a binary-safe representation. Unfortunately, all implementations are old and very hard to read.