Hacker News new | ask | show | jobs
by vinkelhake 4456 days ago
Google has an open source implementation of b-trees in C++. You gain performance and memory efficiency (less overhead per element stored). You lose the iterator stability that std::map/std::set guarantees. Other than that the interfaces are pretty much equal.

https://code.google.com/p/cpp-btree/

2 comments

(top article author). I wasn't aware of that, it looks like they come to similar conclusions "B-trees are widely known as data structures for secondary storage, because they keep disk seeks to a minimum. For an in-memory data structure, the same property yields a performance boost by keeping cache-line misses to a minimum. C++ B-tree containers make better use of the cache by performing multiple key-comparisons per node when searching the tree. Although B-tree algorithms are more complex, compared with the Red-Black tree algorithms, the improvement in cache behavior may account for a significant speedup in accessing large containers."
BTrees are cache-aware, and, in a sense, cache-oblivious.

There are more interesting structures, like fractal tree indices, which are cache-oblivios in the very formal sense.

My own opinion is that Btrees are nice, but not very effective in current world. Structures like LSM trees can be more effective. And you can construct high efficiency search structures over runs of LSM tree which will mimic Btrees.

A sidenote: your blog post is from the future - unless we're already in May? :)
Fixed. I'd like to say it was a subtle implication that it contains wisdom from the future ... but actually it was just me being boneheaded.
Funny how everything Google does is dogma. They move from complicated hash DS to B+trees, a decade later, and everyone follows. Better DS are currently Log-merge for heavy writes and data parallel BSTs for read-most.

At least they are not stubborn for long, I give them that.

I don't know how you got that impression, but inside Google people use hash tables far, far more often than B+trees. In fact, I don't think I ever used B+tree once in my entire career inside Google.

It's pretty hard to beat "expected O(1)".