|
|
|
|
|
by jandrewrogers
1794 days ago
|
|
There is, but it isn’t what I would call intuitive. Traditional tree-based indexing algorithms embed a diverse set of assumptions about the nature of the hardware that aren’t actually true in modern systems. SSDs are also quasi-sequential devices in important ways that need to be respected for performance reasons. The design problem is that storage density and bandwidth has increased massively and traditional indexing algorithms are poorly suited for that environment. The relative memory wastefulness of many indexing algorithms also means that you can’t load the indexes into memory on machines with high storage density, they literally won’t fit. This leads to a page fault cascade in practice that causes a very rapid degradation of performance beyond a relatively ordinary scale. Modern indexing algorithms (not the ones in the article) give up a small amount of selectivity and balanced trees in exchange for reducing the memory footprint of indexes by multiple orders of magnitude and greatly increasing write scalability. Given the storage bandwidth available in modern systems, that is a big win in terms of throughput. Balanced trees are an anachronism generally in current design; it doesn’t matter how unbalanced an index tree is if it fits in CPU cache while searching countless terabytes of data. 20 years ago, I was using all of the indexing structures in the article for high-performance systems. They started going out of style about 10 years ago and I honestly can’t remember the last time I worked on a new system that implemented them. That transition happened slowly and then all at once. |
|