by grncdr 5481 days ago
For everything you've said about Riak re: stability and happy scalability, that's exactly why I was excited to include it in this test. I would like to prevent myself/my team from acquiring too many of those scars, so please elaborate on what caused them ;) Especially if those scars came from Cassandra or HBase!

The test hasn't been a "concocted scenario"; it's measuring the performance[1] of prototype implementations of what will be an essential piece of our infrastructure and process (bulk loads of large numbers of small records, very read-heavy after the initial load). Riak's write performance was completely adequate, just nowhere near what we got out of the box with the bulk insert operations available in Cassandra and HBase. I asked in the #riak channel on freenode and was told to use protocol buffers (which we already were), so I'd really appreciate advice beyond this.

> Also, realize that really small values aren't a great fit for Riak in some ways b/c the overhead per value is at least a few hundred bytes.

This is pretty much what I've chalked it up to. It's unfortunate because that is the use case we need to solve right now, and once we've got some of our data in one distributed data store, it's convenient (and considered less risky) to use that same technology for the next project. (This is really a culture thing though: it's taking us months to get the necessary buy-in and approval for a Postgres 8.2 -> 9.0 upgrade for a different product, where we know it would solve a specific issue we have.)

[1] We've been running our tests on a 4-node cluster. Each node has an 8-core 2.8 GHz Xeon, 32 GB of RAM, and a woefully inadequate disk: the machines were repurposed from a system that required redundancy and didn't require write performance, so the drives are RAID1. We also need to make recommendations to IT for their hardware purchase plan after our testing.

2 comments

> The test hasn't been a "concocted scenario"

Btw, b/c I can totally see why you'd read it that way: that particular barb wasn't directed at you; it was directed more at some of the public benchmarks touted by (non-distributed) NoSQL database systems.

Re: Cassandra, please see my reply to the sibling comment on this thread.

Also, feel free to email me at jamie@bu.mp if I can answer any specific questions about things we ran into with various database systems.

Can you make a blog post about the issues instead? It would greatly benefit everyone.
I'd love to, but some of this is politically complicated in the very small startup world. So maybe one day, but I can't do it right now.
Ah, that's too bad. Thanks anyway.
Have you looked into using ets storage instead of bitcask storage? Millions of keys with ~12-byte values would fit just fine in memory without the bitcask overhead, and as long as your N value is > 1 you shouldn't have to worry about data loss unless your whole cluster loses power.
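
For a rough back-of-envelope check of the "fits in memory" claim, here is a minimal Python sketch. Only the ~12-byte value size, the 4 nodes and the 32 GB of RAM per node come from this thread. The key count (10 million), the per-value overhead (300 bytes, standing in for "a few hundred bytes") and N = 3 are assumptions picked for illustration, and the function names are hypothetical. Real overhead for an in-memory backend may well differ, so treat the result as an order-of-magnitude estimate only.

```python
def estimate_footprint(num_keys, value_bytes, overhead_bytes, n_val):
    """Return (raw payload bytes, total bytes including overhead and replicas).

    overhead_bytes is an assumed flat per-value cost, not a measured figure.
    """
    payload = num_keys * value_bytes
    total = num_keys * (value_bytes + overhead_bytes) * n_val
    return payload, total


def per_node_gib(total_bytes, nodes):
    """Spread the total evenly across nodes and convert to GiB."""
    return total_bytes / nodes / 2**30


# Assumed inputs: 10M keys, 12-byte values, 300 bytes overhead, N = 3, 4 nodes.
payload, total = estimate_footprint(10_000_000, 12, 300, 3)
print(f"raw payload:        {payload / 2**20:.1f} MiB")
print(f"total with N=3:     {total / 2**30:.2f} GiB")
print(f"per node (4 nodes): {per_node_gib(total, 4):.2f} GiB")
print(f"stored bytes per payload byte (one replica): {(12 + 300) / 12:.0f}x")
```

Under those assumptions the overhead dwarfs the payload, yet the per-node share is still only a few GiB, well under 32 GB. Plug in your real key count and a measured per-value overhead before relying on it.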