|
|
|
|
|
by jltsiren
1974 days ago
|
|
Some context: A few years ago, there was a paper from Google (https://dl.acm.org/doi/10.1145/3183713.3196909) that made learned data structures popular for a while. They started from the idea that indexes such as B-trees approximate an increasing function with one-sided error. By using that perspective and allowing two-sided error, they were able to make the index very small (and consequently quite fast). Many data structure researchers got interested in the idea and developed a number of improvements. The PGM-index is one of those. Its main idea is to use piecewise linear approximations (that can be built in a single quick pass over the data) instead of the machine learning black box the Google paper was using. |
|
You may find interesting these other papers of ours:
- The ALENEX21 paper "A 'learned' approach to quicken and compress rank/select dictionaries" (http://pages.di.unipi.it/vinciguerra/publication/learned-ran..., https://github.com/gvinciguerra/la_vector), where we introduce a compressed bitvector supporting efficient rank and select queries, which is competitive with several well-established implementations of succinct data structures.
- The ICML20 paper "Why are learned indexes so effective?" (http://pages.di.unipi.it/vinciguerra/publication/learned-ind...) where we prove that, under some general assumptions on the input data, the space of the PGM-index is actually O(n/B^2) whp (versus Θ(n/B) of classic B-trees).