Only watched the video, was disappointed by https://youtu.be/gCKJ29RaggU?t=408 , where they are comparing against tiny b*tree page sizes that nothing uses any more - 4k, 16k and 64k are way more common
Hi @jabberwcky! The plot refers to a B+tree implementation optimised for main-memory (https://panthema.net/2007/stx-btree/). We didn't show the performance for larger/smaller page sizes because in our machine they performed poorly. Indeed, you can see from the figure that the fastest B+tree configuration had page size set to 512 bytes. The one using 1024-byte pages is already much slower, that's why we didn't clutter the plot with page sizes larger than 1k ;)
This sounds like a great advancement, however an implementation in RDBMS products may be some way away yet - MSSQL uses 8KB pages by default, and I believe (without checking) that most other RDBMSes use at least 4KB. B+ tree index implementations on RDBMS products may be here to stay for a while yet unless these performance issues can be minimised, or unless there is a paradigm shift to use smaller pages - which would have a knock-on effect to query performance unrelated to indexes, such as number of page lookups, increased I/O for non-contiguous page reads...
I think an implementation in RDBMS is worth exploring. After all, the page size of the PGM-index can be tuned to match the one of the media you are storing your data to (or to match the default setting of the RDBMS).
About the paradigm shift to use smaller pages, it's worth mentioning technological innovations like the Intel's Optane DC Persistent Memory, in which the access granularity is 256 bytes. I expect that similar products will be available in the near future and that new/updated RDBMS implementations will take advantage of them.
I assumed that a bigger page size would incur a worse query performance. You can already see the trend in the figure. So the index size comparison is based on the b+-tree which has a similar query performance with the proposed learned index.