by davidhollander 5588 days ago
>you may be misunderstanding bitcask

Yes I was, I apologize. It must have been MongoDB that required the total data set to be no larger than the amount of RAM.

The test ran on a single computer, using protocol buffers and the Python client, and compared Riak reads against a naive map reduce. The naive version stored 4,000 files in a folder, treated each filename as the key, and parsed each text file's JSON contents into a Python dictionary to check whether an attribute matched. Basically, I figured that a loop of thousands of blocking disk accesses on a laptop hard drive, on a standard filesystem with nothing buffered in memory, should always be much slower than any database.
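For anyone who wants to reproduce the file-based side of that comparison, here is a rough sketch of what such a naive scan might look like. It is not the original test code: the function names, the `color` attribute, and the sample data are all made up for illustration.

```python
import json
import os
import tempfile


def naive_scan(folder, attribute, wanted):
    """Return the keys (filenames) whose JSON document has attribute == wanted."""
    matches = []
    # One blocking open/read/parse per file, nothing cached between calls.
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        with open(path, "r", encoding="utf-8") as handle:
            record = json.load(handle)
        if record.get(attribute) == wanted:
            matches.append(name)
    return matches


def build_sample(folder, count=4000):
    """Write `count` small JSON files; the filename doubles as the key (hypothetical data)."""
    for i in range(count):
        path = os.path.join(folder, f"key{i:04d}")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"id": i, "color": "red" if i % 100 == 0 else "blue"}, handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        build_sample(folder)
        print(len(naive_scan(folder, "color", "red")), "matching keys")
```

Note that on a repeat run the operating system's page cache will likely serve most of these files from memory, so the "nothing buffered" assumption only really holds for a cold first pass.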

This was last year so maybe Riak's performance has increased since then. I'd be interested if TokyoCabinet was added as a backend.

1 comment

Riak, by default, uses a replication value of three. Your single test machine has to do ~three times the work, so you should expect slower performance here. (I'm oversimplifying somewhat.)

You'll see significantly improved performance on a linear test (in my informal testing, 3-4x speedups) by adding an extra two nodes. Parallelized tests pretty much scale linearly with nodes.

In practice, I've found Riak to be slightly slower than MySQL. Direct reads/writes tend to be fast, but JSON parsing can bite you and denormalization requires more writes. The major advantage is that the Riak system can scale linearly with nodes, and that it can fail in predictable and resolvable ways.

As an example, the feed system I'm currently building on Riak will survive a total network partition and allow full reads and writes from every node with no data lost. Everything is automatically merged when the partition ends. The vclock-tagged multi-value functionality of Riak is exceptionally powerful when you want to design these types of systems, and is, in my mind, worth the performance hit and additional design complexity for certain classes of problems.
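To make the merge step concrete, here is a minimal sketch of the kind of application-side resolution this relies on. It deliberately does not use the Riak client: the siblings are plain Python lists, and the entry fields (`id`, `ts`, `text`) are hypothetical. The assumption is that each side of the partition kept accepting feed entries and that, once it heals, a read hands back both divergent versions for the application to combine.

```python
def merge_siblings(siblings):
    """Union the feed entries from every divergent version of a feed.

    `siblings` is a list of versions; each version is a list of entry dicts
    with a unique "id" and a timestamp "ts" (hypothetical schema).
    """
    merged = {}
    for version in siblings:
        for entry in version:
            # Entries are append-only and keyed by id, so a union loses nothing.
            merged[entry["id"]] = entry
    # Return one resolved feed, ordered by timestamp and then id for stability.
    return sorted(merged.values(), key=lambda e: (e["ts"], e["id"]))


if __name__ == "__main__":
    side_a = [
        {"id": "p1", "ts": 1, "text": "written before the split"},
        {"id": "p2", "ts": 2, "text": "written on side A"},
    ]
    side_b = [
        {"id": "p1", "ts": 1, "text": "written before the split"},
        {"id": "p3", "ts": 3, "text": "written on side B"},
    ]
    for entry in merge_siblings([side_a, side_b]):
        print(entry["id"], entry["text"])
```

A set-union merge like this only works because feed entries are never edited in place; data that can be updated or deleted needs a more careful resolution rule.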

>This was last year so maybe Riak's performance has increased since then. I'd be interested if TokyoCabinet was added as a backend.

There are also InnoDB and multiple in-memory backends, which may provide performance characteristics more in line with what you are looking for.