|
|
|
|
|
by thesz
254 days ago
|
|
We are talking about cache friendliness here, including compact data representation. B-trees algorithm requires leaf nodes to be filled up to some prespecified fill factor, usually from 50% to 70%. This is not compact by any measures. Both LSM trees and COLA allow for much more compact storage. They also pretty cache friendly in the "cache oblivious" sense. |
|
B-tree fill factors are a parameter that can be tuned to avoid extra splits depending on your insertion patterns, an in-memory variant doesn't need to be tuned like a disk-based variation.
Also nothing preventing the "bottom" of an LSM variation to be a dense B-tree like structure that just gets less updates.
The COLA paper also never mentions threading, an in-memory B-tree should scale better with size if there is multi-threading while I don't see an easy way to avoid big locks with COLA, maybe why the COLA paper was from 2000 and we haven't seen much additional development on it? (LSM-tree's work beautifully of course but then you basically have the double-space issue like with semi-space garbage collectors).
In the end, what you choose is dependent on your application and patterns, COLA is a clever structure that probably shines in some scenarios.
Like my scenario was a mostly heavy on consecutive insertions as well as lookups of potentially arbitrary compound keys, perfect for cola like since those are cheap.