Hacker News new | ask | show | jobs
by thesz 1965 days ago
Their slides https://pgm.di.unipi.it/slides-pgm-index-vldb.pdf about PGM index, page 21.

They stop at page size of 1024 bytes - that indicates they are tested in-memory situation. And, which is worse, their compression ratio advantage almost halves when block size is doubled. Thus, what about B-tree with blocks of 16K or even 256K?

Also, what about log-structured merge trees where bigger levels can use bigger pages and, which is quite important, these bigger levels can be constructed using (partial) data scan. These bigger levels can (and should) be immutable, which enables simple byte slicing of keys and RLE compression.

So, where's a comparison with more or less contemporary data structures and algorithms? Why beat half a century old data structure using settings of said data structure that favors your approach?

My former colleague once said "give your baseline some love and it will surprise you". I see no love for B-trees in the PGM work.

1 comments

Hi @thesz!

The experiment you are referring to is done in main memory with an optimised in-memory B+tree implementation. We didn't plot the performance for larger page sizes because in our machine they performed poorly, as you can already see from the configuration with 1024-byte pages. So we're not favouring our approach at all.

Note also that next-gen memories have smaller and smaller access granularities. For example, the Intel's Optane DC Persistent Memory accesses blocks of 256 bytes, while the Intel's Optane DC SSD accesses blocks of 4 KB. I guess that data structures with blocks of 16K-256K are disproportionate in these cases.

About LSM-trees, nothing prevents you to use a PGM-index (which you can construct during the compaction of levels, thus without scanning data twice) to speed up the search on a long immutable level. Or also, to use a PGM-index on data which is organised into RLE-compressed disk pages ;)

These blocks of 256 bytes most probably are stored in wear-leveling database of some sort hidden inside NVME. These databases are often LSM-tree-based. So, writing larger blocks still has benefits, especially when you use compression.

If you think you only need 256 byte pages, average price for 10G hard disk drive is ~$300 [1] and average price for 2G SSD drive is also ~$300 [2].

[1] https://pcpartpicker.com/trends/internal-hard-drive/?__cf_ch...

[2] https://pcpartpicker.com/trends/internal-hard-drive/?__cf_ch...

Five times price/Gb difference. If you need large storage, you need hard disks.

B-trees are good with hard disks, is PGM index good with them too?

From a Big-Oh point of view, the answer is a big yes. No matter the memory technology or the disk page size, be it 256B or 16KB, the PGM-index can scale as B-trees or even better (see my comment here https://news.ycombinator.com/item?id=25901889).
Can you provide us with (preferably drop-in) replacement of LMDB as a proof?

Because your big-O looks like big-O of cache-oblivious algorithm and I saw no proof of that.

Are you aware of any contemporary on-disk or in memory DB system that typically uses a page size of 4KiB or less? And if not, are you expecting that to become a trend in the future?