by JelteF 3078 days ago
Main developer of Keevo at Stream here, and you ask a very good question. I've asked myself the same question quite a lot, and I think it actually warrants its own dedicated blog post at some point. To give some idea though, the main reasons are cost, simplicity, control, and wanting to have a very good understanding of the database internals.

Cassandra is very scalable, but it's not very efficient. The hosting costs for our Cassandra cluster were so high that it was infeasible to run it in another region as well.

Apart from that, we've had (partial) downtime a couple of times because one node just started going crazy for unclear reasons.

Keevo solves this by not trying to be Cassandra, but by being much simpler. It doesn't do schemas or indexes. It is simply a very fast ordered key-value store that is stored to disk and replicated automatically to multiple servers (using Raft). Any other features we need, we build on top of this, usually outside of Keevo itself. This simplicity saves us a lot of hosting costs and makes its performance much more predictable and easier to debug.

Last but not least, a very important advantage is control and understanding of the database internals. Because we built Keevo ourselves, we know the performance and consistency tradeoffs it has and can change or improve them when needed.

I hope this helps in understanding our choice. It's definitely not something I would recommend for most companies, but since our product is storage at its core, it makes sense for us.


> Apart from that, we've had (partial) downtime a couple of times because one node just started going crazy for unclear reasons.

Do you have GC logs? GC lockups are the most common cause. Have you used G1GC + Java 8?

Thanks for sharing, super interesting stuff. I'd be curious to hear more about the design around availability in the case of a zone outage :D
I also think it's really interesting stuff and love working on it. I'll definitely write something about that if/when I do a blog post about it. But it's quite simple at its core. As the post mentions, we use Raft to do it. We simply have a cluster of 3 nodes, each in a different zone. If one zone goes down, there's still a majority of nodes up, which is enough to keep the cluster running. I recommend reading the Raft paper for more details; it's one of the easiest papers to read and understand I've ever found. https://raft.github.io/raft.pdf
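To make the majority argument above concrete, here is a minimal sketch of the quorum arithmetic in Python. It is not Keevo code; the function names are made up for illustration, and it only models the counting rule (a strict majority of the configured nodes must be reachable), not Raft itself.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, nodes_up: int) -> bool:
    """True if enough nodes are reachable to keep committing writes."""
    return nodes_up >= majority(cluster_size)


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return cluster_size - majority(cluster_size)


# Hypothetical layout from the comment: 3 nodes, one per zone.
# Losing one zone leaves 2 of 3 nodes, which is still a majority.
zone_outage_ok = has_quorum(cluster_size=3, nodes_up=2)
two_zones_down_ok = has_quorum(cluster_size=3, nodes_up=1)
```

With one node per zone, a 3-node cluster survives the loss of any single zone but not two. Going to 5 nodes would raise the tolerated failures to 2.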
> Keevo solves this by not trying to be Cassandra, but by being much simpler.

Sounds awfully similar to Riak. :)

I have to say I'm not very familiar with Riak. A quick Google search makes it look similar in functionality. I'm not sure about speed though; RocksDB is really, really fast and storage efficient, and we tried out a lot of other embedded KV stores. Also, Riak does seem to miss one important feature for us: iterating through the keys in lexical order. We use that a lot to build features on.

Even if this feature was/is available in Riak, I still think this was the better choice for us. Bringing this core component in house has been a real boon in gaining good and, more importantly, predictable performance for our API.
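For readers wondering why lexical key iteration matters so much, here is a rough Python sketch of a prefix scan over an ordered key space. It is an in-memory toy, not Keevo's or RocksDB's API; the `OrderedKV` class and the `feed:<user>:<id>` key layout are hypothetical and only illustrate the idea.

```python
import bisect


class OrderedKV:
    """Toy ordered key-value store: keys are kept sorted in memory."""

    def __init__(self):
        self._keys = []    # sorted list of keys (bytes compare lexically)
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep the key list sorted
        self._values[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in order."""
        # Seek to the first key >= prefix, then walk forward until the
        # prefix no longer matches. No secondary index is needed.
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._values[key]


kv = OrderedKV()
kv.put(b"feed:bob:0001", b"hello from bob")
kv.put(b"feed:alice:0002", b"second post")
kv.put(b"feed:alice:0001", b"first post")

alice_posts = list(kv.scan_prefix(b"feed:alice:"))
```

Because related keys sort next to each other, one seek plus a sequential read returns everything under a prefix. That is the kind of building block that range queries and similar features can be layered on outside the store itself.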

Iterating through keys is not trivial in Riak; you need a 2i (secondary index) to have that. If you have the manpower to maintain your solution and there are people with performance engineering skills, it is probably a good choice. Riak has very predictable performance, is easy to tune, and is a rock-solid system, which is why it is used in several healthcare systems and for the game that has the largest player base at the moment as well. The consistent hashing layer on top of LevelDB, which is essentially the foundation of RocksDB, makes it a super nice system. Anyway, I hope Keevo will be available for us as well.
The "you need 2i to have that" is too heavy.

See YugaByte, which does Cassandra + Keevo + RocksDB + Raft.

Thanks for the comment, and looking forward to the dedicated blog post :)

Was Riak KV ever considered?

Not as far as I know. We did benchmark a lot of embedded storage engines before deciding on RocksDB though.
Wasn't Riak, like, less efficient than Cassandra?
What do you mean? I think Riak is much simpler, and after tuning it beats Cassandra on the same hardware and the same use case, in my anecdotal experience.