by _y4bi 2715 days ago
Fellow RethinkDB user here. I’ve been looking at Cassandra and FoundationDB as replacements. I’m genuinely curious: what didn’t you like about Cassandra?
3 comments

Cassandra

To be honest, I don't like anything about Cassandra. Beginning with the naming: back when I was trying to learn about Cassandra, I couldn't get past the obscure and bizarre naming (super-columns?). When I dealt with systems using it, I never quite understood how you can keep saying that "the later timestamp wins" and speak of consistency with a straight face: in a distributed system, there is no such thing as a "later timestamp". Or speak of transactions which aren't really transactions at all.
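To make the timestamp complaint concrete, here is a minimal sketch of last-write-wins merging when two nodes' clocks disagree. It is a simplified model, not Cassandra's actual code: the `Cell` type, the `lww_merge` function, the tie-breaking rule, and the 500 ms skew are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # writer-supplied wall-clock time in ms (hypothetical model)


def lww_merge(a: Cell, b: Cell) -> Cell:
    """Keep whichever cell carries the larger timestamp (on a tie, keep a)."""
    return a if a.timestamp >= b.timestamp else b


# Assumed scenario: node B's clock runs 500 ms behind node A's.
SKEW_MS = 500

# Real time 1000: a client writes "old" through node A, whose clock is accurate.
first = Cell("old", timestamp=1000)

# Real time 1200: a client writes "new" through node B, whose clock reads 700.
second = Cell("new", timestamp=1200 - SKEW_MS)

winner = lww_merge(first, second)
print(winner.value)  # the write that happened earlier in real time survives
```

In this model the write that was issued second is silently discarded, because "later" is decided by whatever each clock happened to read, not by the order in which the writes actually occurred.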

Then I read the Jepsen reports about Cassandra. Yes, Cassandra has made progress since then, but still.

I think of Cassandra as an outdated piece of technology at this point: we can (and do) build better distributed databases today, with better consistency guarantees, and proper transactions in the case of FoundationDB. Cassandra was designed for a specific use case and then outgrew its initial design, because there was nothing else at the time. But I see no reason to stick with it any longer.

Even now when you need massive multi-region scalability there is little to choose from — if you want it to be open-source, there's pretty much only FoundationDB left.

FoundationDB does not support true geo-replicated multi-region distribution the way Cassandra, Spanner, Cockroach, etc. do, at least not without paying huge latency/round-trip costs. If you want to avoid that, the best you can have is a separate failover region, and, with FoundationDB 6, you can get closer-to-LAN latencies for failover deployments to separate regions (but only one region) while retaining ACID semantics. You could build truly global geo-distribution on top of it, but that would have to be its own layer that implements 2PC/Paxos or something similar between regions. Ultimately you have to pay the toll somewhere in a truly consistent system like that if you want global availability (unless you're Spanner and have incredible hardware engineering that can be deployed across the globe).

Cassandra/Scylla are the only open source key-value stores that scale linearly by simply adding nodes, even in huge, geo-distributed settings, as far as I know, but they are ultimately AP systems. And Scylla just has absurd performance compared to Cassandra or FoundationDB. You just have to know what you're getting into. (But yes, ACID transactions are a good model for developers, and FDB's linearizable transactions and high scalability make it an obvious choice for many CP systems, if you ask me.)

That is an excellent summary. There is no silver bullet and you can't have your cake and eat it, too. The approach to multi-region that FoundationDB 6 takes suits my needs (I'm not Google) and I like the compromises they made.

Since most of what I do (or consult with) does not need massive performance, I'd rather pick databases with compromises favoring consistency and correctness. This is why I like what I see in FoundationDB so far.

There is nothing else open-source that does multi-region active/active clusters like Cassandra/Scylla.

The rest either have a single cluster that you can try to stretch across regions (usually with bad results or incredibly high latency) or offer multi-region as an enterprise feature that uses complicated log-shipping to apply updates everywhere.

If you’re considering Cassandra, it’s probably also worth considering Scylla. It’s a drop-in replacement for Cassandra, so it shares some of its flaws, but it is considerably more pleasant to run in production.
Probably async
"Probably async"? Could you expand on it?
To achieve "availability", Cassandra lowers consistency, making developers' lives extremely hard.

Long: https://docs.yugabyte.com/latest/comparisons/cassandra/

TLDR: Google, FB, etc. went the other way, using HBase and Bigtable, which are strongly consistent.

Cassandra has monotonic reads so it is strongly consistent as long as you have a quorum of nodes. I haven't yet read the link you posted, but it's coming from a vendor that is aiming to replace Cassandra, so I'll take it with a pinch of salt.
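For anyone unfamiliar with the quorum argument, a rough sketch of the usual reasoning: if the write set and the read set always share at least one replica, a read is guaranteed to contact a node that saw the latest acknowledged write. The function below is a hypothetical brute-force check of that overlap property, not any database's API, and it says nothing about concurrent writes or timestamp ordering.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Return True if every w-replica write set intersects every r-replica read set."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Replication factor 3, quorum (2) writes and quorum (2) reads: always overlap.
print(quorums_always_overlap(3, 2, 2))  # True

# Replication factor 3, single-replica writes and reads: may miss each other.
print(quorums_always_overlap(3, 1, 1))  # False
```

The check passes exactly when r + w > n, which is why the claim only holds when both reads and writes use quorum-level settings.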
No salt required. You can replace it with any strongly consistent system that has transactions and it's the same.
I see. Has Amazon replaced Dynamo with Postgres? Has Google replaced BigTable with MySQL?