Hacker News new | ask | show | jobs
by gvinciguerra 1977 days ago
Hi @thesz!

The experiment you are referring to is done in main memory with an optimised in-memory B+tree implementation. We didn't plot the performance for larger page sizes because in our machine they performed poorly, as you can already see from the configuration with 1024-byte pages. So we're not favouring our approach at all.

Note also that next-gen memories have smaller and smaller access granularities. For example, the Intel's Optane DC Persistent Memory accesses blocks of 256 bytes, while the Intel's Optane DC SSD accesses blocks of 4 KB. I guess that data structures with blocks of 16K-256K are disproportionate in these cases.

About LSM-trees, nothing prevents you to use a PGM-index (which you can construct during the compaction of levels, thus without scanning data twice) to speed up the search on a long immutable level. Or also, to use a PGM-index on data which is organised into RLE-compressed disk pages ;)

2 comments

These blocks of 256 bytes most probably are stored in wear-leveling database of some sort hidden inside NVME. These databases are often LSM-tree-based. So, writing larger blocks still has benefits, especially when you use compression.

If you think you only need 256 byte pages, average price for 10G hard disk drive is ~$300 [1] and average price for 2G SSD drive is also ~$300 [2].

[1] https://pcpartpicker.com/trends/internal-hard-drive/?__cf_ch...

[2] https://pcpartpicker.com/trends/internal-hard-drive/?__cf_ch...

Five times price/Gb difference. If you need large storage, you need hard disks.

B-trees are good with hard disks, is PGM index good with them too?

From a Big-Oh point of view, the answer is a big yes. No matter the memory technology or the disk page size, be it 256B or 16KB, the PGM-index can scale as B-trees or even better (see my comment here https://news.ycombinator.com/item?id=25901889).
Can you provide us with (preferably drop-in) replacement of LMDB as a proof?

Because your big-O looks like big-O of cache-oblivious algorithm and I saw no proof of that.

Are you aware of any contemporary on-disk or in memory DB system that typically uses a page size of 4KiB or less? And if not, are you expecting that to become a trend in the future?