| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jandrewrogers 1460 days ago

The property being optimized for, relative to B+trees, is extreme compactness of representation. In the pantheon of possible indexing algorithms, B+trees are pretty far on the bloated end of the spectrum in terms of the ratio between data space and index space. All indexes have a scaling performance cliff due to the index structure filling up and eventually overflowing available cache, crowding out the data and forcing page faults for almost every index write. In B+tree indexes this happens relatively early and often.

Radically improving index compactness is achieved by loosening design constraints on B+trees: the indexes represent a partial order which only converges on a total order at the limit and the search structure is unbalanced. In the abstract these appear slightly less efficient but it enables the use of selectivity-maximizing succinct representations of the key space that can get pretty close to the information theoretic limits. Scalability gains result from the radical reduction in cache footprint when represented this way.

Optimal compressive indexes are not computable (being equivalent to AI), so the efficient approximation strategies people come up with tend to be diverse, colorful, and sometimes impractical. Tangentially, some flavors have excellent write performance. It is not a trivial algorithm problem but there are a few design families that generalize well to real databases engines. I wouldn't describe this as a fully solved problem but many ordinary cases are covered.

There isn't much incentive to design a relational database engine that can use these types of indexes, since the types of workloads and data models that recommend them usually aren't relational. Someone could, there just isn't much incentive. It is more de rigueur for graph, spatiotemporal, and some types of analytical databases, where there is no other practical option if scalability matters at all.