Hacker News new | ask | show | jobs
by jsherer 3695 days ago
This looks like an interesting finding. Unfortunately, the trade-off for this type of efficient small data storage is real:

> Note that LSM-trie uses hash functions to organize its data and accordingly does not support range search.

Range search, while not directly applicable to all data sets, is an important feature of the LSM data stores compared (LevelDB & RocksDB).

The authors acknowledge this and say:

> There are techniques available to support the command by maintaining an index above these hash-based stores

So, don't plan on using an LSM-Trie for a direct replacement for your LevelDB or other LSM-Tree based projects that rely on Range searches without considering the additional complexity of building and maintaining an index to perform those Range searches.

2 comments

For folks curious how you might build up a range index atop a hash store, here's a general scheme: https://www.eecs.berkeley.edu/~sylvia/papers/pht.pdf
For the use-case that the paper presents, I don't think that range search is that important. Most use-cases for LSM-trees so far have been keeping in-memory metadata costs down (i.e. replacing storage engines such as Bitcask), and converting random writes into sequential writes so as to make the most of disk throughput and not wear out SSDs. LSM-trees have had shortcomings in the past with write amplification, and I think this paper does well in addressing that (besides pushing the envelope in terms of reducing in-memory metadata costs).