| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by halayli 4768 days ago
	I am surprised they are using levelDB for a key/value store when they could have used a DBM variant like Kyoto Cabinet HashDB, which is way faster for such a task. Why do you need sorted keys for key/value storage?

2 comments

monstrado 4768 days ago

Sorted keys are extremely useful for time series data. For example, if a key has a timestamp in it and you'd like to do an aggregation over a few days of "keys"...it's very simple to do an iteration from a STARTKEY to a STOPKEY (the keys in-between don't have to be defined...which is very important). Not only is it simple to do the iteration, it's very fast. This kind of use case (very common) is difficult without sorted keys and iteration capabilities.

link

halayli 4768 days ago

Agreed - but they are doing a full scan when doing searches so I don't see the benefit.

link

monstrado 4768 days ago

Searching is not the same thing as a range iteration, normally if you want to search a Key/Value than you'll need to scan the entire dataset.

A range iteration allows you to scan a subset of the data, and as long as they didn't fundamentally change how LevelDB works, than they will still support ranges.

link

halayli 4768 days ago

> Searching is not the same thing as a range iteration, normally if you want to search a Key/Value than you'll need to scan the entire dataset.

In this case, you lost the sorted keys advantage that BTree gives you.

My point is that this project leans more towards exact key -> value lookups and a BTree is an overkill here.

link

monstrado 4768 days ago

I guess it depends on your use case, most of the use cases I've seen have been time series using range iterations which is incredibly fast, but I understand your concern if you're only using it for random gets.

link

halayli 4768 days ago

Probably providing the option to create a btree or hash map would be nice.

link

jeffffff 4768 days ago

hash indexes tend to have really terrible write performance because the locations of the writes on disk are random. lsm trees have way better write performance.

link