by aphyr 5588 days ago

>Bitcask is an in memory datastore because it keeps a copy of the entire dataset in memory.

Begging your pardon, but I think you may be misunderstanding bitcask. The bitcask keydir is stored in memory, but the values are stored on disk. The keydir is a hash mapping each key to a file ID and the offset/size in that file at which the value is stored. The only time values are held in memory is when the kernel's fs cache or readahead buffer provides them.

http://downloads.basho.com/papers/bitcask-intro.pdf
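
To make the keydir idea concrete, here is a minimal sketch of the layout described above. It is not Bitcask's actual code: the `TinyCask` class, the single fixed data file, and the `demo` helper are all assumptions for illustration, and the real format also stores CRCs, timestamps, and the key alongside each value.

```python
import os
import tempfile


class TinyCask:
    """Toy Bitcask-style store: values on disk, key index (keydir) in memory."""

    def __init__(self, directory):
        self.file_id = 0  # assumption: one data file; the real store rotates files
        self.path = os.path.join(directory, f"{self.file_id}.data")
        self.keydir = {}  # key -> (file_id, offset, size)
        open(self.path, "ab").close()  # create the data file if it is missing

    def put(self, key, value):
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of file = append position
            f.write(value)
        # Only the location is kept in memory, never the value itself.
        self.keydir[key] = (self.file_id, offset, len(value))

    def get(self, key):
        _file_id, offset, size = self.keydir[key]  # in-memory lookup
        with open(self.path, "rb") as f:
            f.seek(offset)       # one disk seek
            return f.read(size)  # one disk read


def demo():
    with tempfile.TemporaryDirectory() as d:
        store = TinyCask(d)
        store.put("a", b"hello")
        store.put("b", b"world!")
        store.put("a", b"hi")  # overwrite appends; keydir points at the newest copy
        return store.get("a"), store.get("b"), store.keydir["a"]
```

A read is one dictionary lookup followed by one seek and one read, which is why memory use grows with the number of keys rather than with the size of the values.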

Since a filesystem directory listing is likely held by your OS cache, you should see similar performance between bitcask and files on disk: an in-memory lookup to obtain the inode/offset, and a disk seek+read.

Riak will be substantially slower than directly using bitcask, however, because you may need to talk over the network to as many as N nodes, wait for all their responses, compute the resultant vclock/metadata, and then serialize it for HTTP (huge TCP overhead) or protocol buffers (relatively fast). If you're running on a single machine, then you may incur additional time for that machine to run what would normally be distributed over three nodes. Without knowing more about your benchmark, however, it's difficult to say.

davidhollander 5588 days ago

>you may be misunderstanding bitcask

Yes, I was; I apologize. It must have been MongoDB that required the total data to be no larger than the amount of RAM.

The test was run using protocol buffers and the Python client on the same computer, testing reads vs. a naive map reduce. The naive map reduce was storing 4,000 files in a folder, treating the filename as the key and parsing the text file contents from JSON into a Python dictionary to see if an attribute matches. Basically I figure doing a loop of thousands of blocking disk accesses from a laptop hard drive on a standard filesystem, buffering nothing in memory, should always be much slower than any database.
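
For reference, the naive scan described above would look roughly like the sketch below. This is a reconstruction, not the original benchmark code: the function name `scan_matching` and its parameters are made up for illustration.

```python
import json
import os


def scan_matching(folder, attribute, expected):
    """Return {filename: record} for every JSON file whose attribute matches."""
    matches = {}
    for name in os.listdir(folder):  # the filename doubles as the key
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:  # one blocking disk read per key
            record = json.load(f)
        # Only dictionaries can carry the attribute we are filtering on.
        if isinstance(record, dict) and record.get(attribute) == expected:
            matches[name] = record
    return matches
```

Every call opens and parses each file in turn, so the cost is one open, one read, and one JSON parse per key, with no index to skip non-matching records.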

This was last year so maybe Riak's performance has increased since then. I'd be interested if TokyoCabinet was added as a backend.

aphyr 5588 days ago

Riak, by default, uses a replication value of three. Your single test machine has to do ~three times the work, so you should expect slower performance here. (I'm oversimplifying somewhat.)

You'll see significantly improved performance on a linear test (in my informal testing, 3-4x speedups) by adding an extra two nodes. Parallelized tests pretty much scale linearly with nodes.

In practice, I've found Riak to be slightly slower than MySQL. Direct reads/writes tend to be fast, but JSON parsing can bite you and denormalization requires more writes. The major advantage is that the Riak system can scale linearly with nodes, and that it can fail in predictable and resolvable ways.

As an example, the feed system I'm currently building on Riak will survive a total network partition and allow full reads and writes from every node with no data lost. Everything is automatically merged when the partition ends. The vclock-tagged multi-value functionality of Riak is exceptionally powerful when you want to design these types of systems, and is, in my mind, worth the performance hit and additional design complexity for certain classes of problems.
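
As a rough illustration of what vclock-tagged multi-values make possible, here is a sketch of detecting concurrent writes and merging them. It is not Riak's implementation or client API: clocks are modelled as plain dicts of actor to counter, and the set-union merge in `resolve` is only an assumption about how a feed might combine siblings.

```python
def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def concurrent(a, b):
    """Neither clock descends from the other: the writes conflict."""
    return not descends(a, b) and not descends(b, a)


def merge_clocks(a, b):
    """Per-actor maximum, so the result descends from both inputs."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0))
            for actor in a.keys() | b.keys()}


def resolve(siblings):
    """siblings: list of (clock, set of feed item ids).

    Assumed merge rule: a feed is a set of items, so the union of all
    siblings loses nothing that either side of a partition wrote.
    """
    clock, items = {}, set()
    for sibling_clock, sibling_items in siblings:
        clock = merge_clocks(clock, sibling_clock)
        items |= sibling_items
    return clock, items
```

Two sides of a partition that each accept writes end up with concurrent clocks; once they can talk again, both values are kept as siblings and a merge like `resolve` produces a single value whose clock descends from both.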

>This was last year so maybe Riak's performance has increased since then. I'd be interested if TokyoCabinet was added as a backend.

There are also InnoDB and multiple in-memory backends, which may provide performance characteristics more in line with what you are looking for.