by grncdr
5474 days ago
I've heard this about Riak and I was quite excited to test it out for a new project, but in the limited testing I've done, Cassandra and HBase both absolutely smoke Riak in terms of write performance. Not really apples to apples, I suppose, but I was really surprised at how slow Riak was when handling millions of small writes. We haven't finished our testing/profiling phase yet, so any hints on how to optimize a large number of small writes (about a dozen bytes each) would be appreciated.
When you hit a certain traffic level, scalability, latency and robustness become far more important than single-node ops/s. I need to be able to add nodes and repair failed nodes while under load, and I need the 99.9th-percentile latency to stay around 100ms while doing so. I don't really care how many bajillions of ops a second your database can do in some concocted scenario, b/c you're not going to do that many in the real world anyway (trust me, we tried). The disk subsystem is going to give you a few hundred IOPS, maybe a few thousand if you're lucky, then your latency will spike to hell and your phone will wake you up at night.
Maybe in the world where 99% of ops are reads, you will put up impressive numbers, but now you're just showing you are pretty good at using the disk cache. That's a relatively easy problem.
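To make the "99.9% latency mark" concrete, here is a minimal sketch of checking it against a ~100ms target. The sample latencies are made up for illustration, and the nearest-rank percentile method is my choice here, not something any particular database or benchmark tool prescribes.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical request latencies in milliseconds (not measured data):
# 9,980 fast requests and 20 slow ones, e.g. during a node repair.
latencies_ms = [5.0] * 9980 + [250.0] * 20

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p999_ms = percentile(latencies_ms, 99.9)

TARGET_MS = 100.0  # the ~100ms mark from the comment above
meets_target = p999_ms <= TARGET_MS

print(f"mean={mean_ms:.2f}ms p50={p50_ms}ms p99.9={p999_ms}ms ok={meets_target}")
```

With these made-up numbers the mean and median look great while the 99.9th percentile blows through the target, which is why averages and peak ops/s say so little about the tail.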
The Riak guys seem to get all this better than most: http://blog.basho.com/2011/05/11/Lies-Damn-Lies-And-NoSQL/
So, to give you a short answer to your direct question:
Use SLC SSDs in RAID-0 under md (Linux software RAID). Have at least 5 nodes. Use Bitcask, but realize that it keeps every key in an in-memory index, so your keys will need to fit in RAM. Also, realize that really small values aren't a great fit for Riak in some ways b/c the overhead per value is at least a few hundred bytes.
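A rough back-of-the-envelope for why tiny values are a poor fit. Every number below is an assumption picked for illustration: 300 bytes stands in for "a few hundred bytes" of per-value overhead, 10 million writes for "millions", and the 16-byte key size and 40-byte per-key index overhead are placeholders, so check the Bitcask capacity planning docs for real figures.

```python
def storage_estimate(num_values, value_bytes, overhead_bytes):
    """Return (payload_bytes, total_bytes, payload_fraction) for values
    that each carry a fixed per-value overhead."""
    payload = num_values * value_bytes
    total = num_values * (value_bytes + overhead_bytes)
    return payload, total, payload / total

def key_index_bytes(num_keys, key_bytes, entry_overhead_bytes):
    """Rough RAM needed when every key must be held in an in-memory index."""
    return num_keys * (key_bytes + entry_overhead_bytes)

NUM_WRITES = 10_000_000   # assumed: "millions" of small writes
VALUE_BYTES = 12          # "on the order of a dozen bytes each"
OVERHEAD_BYTES = 300      # assumed stand-in for "a few hundred bytes"
KEY_BYTES = 16            # assumed key size
ENTRY_OVERHEAD_BYTES = 40 # assumed per-key index overhead (placeholder)

payload, total, fraction = storage_estimate(NUM_WRITES, VALUE_BYTES, OVERHEAD_BYTES)
index_ram = key_index_bytes(NUM_WRITES, KEY_BYTES, ENTRY_OVERHEAD_BYTES)

print(f"payload: {payload / 1e6:.0f} MB, stored: {total / 1e6:.0f} MB "
      f"({fraction:.1%} is actual data)")
print(f"in-memory key index: {index_ram / 1e6:.0f} MB per full copy of the keys")
```

Under these assumptions less than 4% of what you store is your own data, and that is before replication multiplies everything.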
Also, it's important to note that this reflects where I am right now, which may not be where you are. Riak may not make you happy at server #1, but it will make you pretty happy at server #10 and server #100.
Riak's sweet spot is people with scaling pains. If you only need a server or two to try some stuff, and you don't have any users yet, you might cause yourself more headaches than you need. Sometimes you don't need a locomotive, you need a motorcycle.
(These guys have a pretty great motorcycle: http://rethinkdb.com/ )