Hacker News new | ask | show | jobs
by halayli 4768 days ago
I am surprised they are using levelDB for a key/value store when they could have used a DBM variant like Kyoto Cabinet HashDB, which is way faster for such a task. Why do you need sorted keys for key/value storage?
2 comments

Sorted keys are extremely useful for time series data. For example, if a key has a timestamp in it and you'd like to do an aggregation over a few days of "keys"...it's very simple to do an iteration from a STARTKEY to a STOPKEY (the keys in-between don't have to be defined...which is very important). Not only is it simple to do the iteration, it's very fast. This kind of use case (very common) is difficult without sorted keys and iteration capabilities.
Agreed - but they are doing a full scan when doing searches so I don't see the benefit.
Searching is not the same thing as a range iteration, normally if you want to search a Key/Value than you'll need to scan the entire dataset.

A range iteration allows you to scan a subset of the data, and as long as they didn't fundamentally change how LevelDB works, than they will still support ranges.

> Searching is not the same thing as a range iteration, normally if you want to search a Key/Value than you'll need to scan the entire dataset.

In this case, you lost the sorted keys advantage that BTree gives you.

My point is that this project leans more towards exact key -> value lookups and a BTree is an overkill here.

I guess it depends on your use case, most of the use cases I've seen have been time series using range iterations which is incredibly fast, but I understand your concern if you're only using it for random gets.
Probably providing the option to create a btree or hash map would be nice.
hash indexes tend to have really terrible write performance because the locations of the writes on disk are random. lsm trees have way better write performance.