| Without going too much into specifics and picking on individual databases (which I could do, boy do I have the scars...) When you hit a certain traffic level, scalability, latency and robustness become far more important than single-node ops/s. I need to be able to add nodes and repair failed nodes while under load--I need the 99.9% latency mark to stay ~100ms while doing so. I don't really care how many bajillions of ops a second your database can do in some concocted scenario, b/c you're not going to do that many in the real world anyway (trust me, we tried). The disk subsystem is going to give you a few hundred, maybe a few thousand if you're lucky, IOPS, then your latency will spike to hell and your phone will wake you up at night. Maybe in the world where 99% of ops are reads, you will put up impressive numbers, but now you're just showing you are pretty good at using the disk cache. That's a relatively easy problem. The riak guys seem to get all this better than most: http://blog.basho.com/2011/05/11/Lies-Damn-Lies-And-NoSQL/ So, to give you a short answer to your direct question: Use SLC SSDs + md + RAID-0. Have at least 5 nodes. Use bitcask, but realize that your keys will need to fit in memory. Also, realize that really small values aren't a great fit for Riak in some ways b/c the overhead per value is at least a few hundred bytes. Also, it's important to note this is where I'm at right now, but maybe not where you (generally) are at. Riak may not make you happy at server #1, but it will make you pretty happy at server 10 and server 100. Riak's sweet spot is people with scaling pains. If you only need a server or two to try some stuff, and you don't have any users yet, you might cause yourself more headaches than you need. Sometimes you don't need a locomotive, you need a motorcycle. (These guys have a pretty great motorcycle: http://rethinkdb.com/ ) |
The test hasn't been a "concocted scenario", it's measuring the performance[1] of a prototype implementations for what will be an essential piece of our infrastructure and process (bulk loads of large numbers of small records, very read heavy after the initial load). Riak's write performance was completely adequate, just nowhere near what we got out-of-the-box with the bulk insert operations available in Cassandra and HBase. I asked on #riak channel on freenode and got told to use protocol buffers (which we already were), I'd really appreciate advice beyond this.
> Also, realize that really small values aren't a great fit for Riak in some ways b/c the overhead per value is at least a few hundred bytes.
This is pretty much what I've chalked it up to. It's unfortunate because that is the use case for which we currently need to provide a solution for right now, and once we've got some of our data in one distributed data store, it's convenient (and considered less risky) to use that same technology for the next project. (This is really a culture thing though, it's taking us months to get the necessary buy-in and approval for a postgres 8.2 -> 9.0 upgrade rolled out for a different product, where we know it would solve a specific issue we have).
[1] We've been running our tests on a 4 node cluster, each node has an 8 core 2.8ghz xeon, 32gb of ram, and a woefully inadequate disk: the machines were repurposed from a system that required them to have redundancy and didn't require write performance, so the drives are RAID1. We also need to make recommendations to IT for their hardware purchase plan after our testing.