
by CHY872 1069 days ago

I have >5 years of experience using Cassandra in production, involving thousands of clusters storing petabytes of data. My conclusion from that time is that Cassandra is simply not robust enough to be a general purpose database (the team are working on it but they're coming from a really rough starting place) - there are lots of ways to cause data corruption, and Cassandra does enough dynamic repairing that it can be hard to catch this before your backups are dropped due to time windowing. Unfortunately, the juice may still be worth the squeeze - Cassandra's storage model lends itself very nicely to disaster recovery workflows in a way which something like Oracle or FoundationDB does not (and it's Cassandra so you'll need it!), while the ability to horizontally scale gets you out of so many operations issues. If you've got a schema which works well in Cassandra, you've probably solved a lot of the issues you might have.

Example of fairly standard Cassandra bug (don't know if present on latest release, certainly was a year or two ago): When you add a new node to the cluster, it 'bootstraps', where it copies ~1/n the data from other nodes. When you are done bootstrapping, it's copied a bunch of data from other nodes, but the other nodes still contain that data. You then run 'cleanups' on the other nodes to remove the (now stale and unusable) data so as to get your disk space back.

If you accidentally run a cleanup on the new node as it is being bootstrapped, it will succeed, you will delete all the data that's been copied over so far, and Cassandra will _not_ terminate the bootstrap. Everything will be green, but your new node will suddenly be using 0 disk space. When the bootstrap finishes, possibly days later, your cluster will be immediately corrupted due to violated replication guarantees - but only on data that hasn't been read or written over that period, because if it was written it'll be re-replicated, and if it was read Cassandra will silently repair at this time. Repairs resolve the issue, but if you've made this mistake due to scripting, if you get unlucky it's possible to just delete all replicas of some data between repairs.
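
For anyone scripting cleanups, a guard along these lines would be one way to avoid the mistake above. This is only a sketch, not something from the setup described here: it assumes `nodetool netstats` prints a `Mode:` line (e.g. `Mode: NORMAL` or `Mode: JOINING`) and that `nodetool` is on the PATH. The function names (`parse_mode`, `run_nodetool`, `safe_cleanup`) are hypothetical.

```python
import subprocess
from typing import Callable, Optional


def parse_mode(netstats_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line (e.g. 'NORMAL', 'JOINING'), or None."""
    for line in netstats_output.splitlines():
        if line.strip().startswith("Mode:"):
            return line.split(":", 1)[1].strip().upper()
    return None


def run_nodetool(*args: str) -> str:
    """Run nodetool against the local node and return its stdout."""
    result = subprocess.run(
        ["nodetool", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def safe_cleanup(run: Callable[..., str] = run_nodetool) -> bool:
    """Run 'nodetool cleanup' only if the node reports Mode: NORMAL.

    Returns True if cleanup was invoked, False if it was refused.
    """
    mode = parse_mode(run("netstats"))
    if mode != "NORMAL":
        # Assumption: a bootstrapping node reports a non-NORMAL mode such as JOINING.
        print(f"refusing cleanup: node mode is {mode!r}")
        return False
    run("cleanup")
    return True
```

The `run` parameter exists so the check can be exercised without a live cluster. Treating a missing `Mode:` line as "refuse" is a deliberate choice: when the script cannot tell what state the node is in, skipping the cleanup is the cheaper mistake.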

Example of another Cassandra bug (again, might be outdated): Cassandra nodes identify themselves on startup by IP, and the owned token ranges are not persisted; they're streamed from other nodes in the cluster. If you've deployed Cassandra in K8s, reboot multiple nodes in one go, and they swap IPs upon reboot, you may find yourself in a split-brain situation in which nodes magically forget they own certain data ranges and think they own each other's data (or maybe it's that the nodes still think they own the right ranges but other nodes think they own the wrong ranges). Wasn't close enough to fully debug that one.
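
A toy model of that failure mode, for illustration only. It is not Cassandra's actual gossip or token logic; it assumes (as a simplification) that peers map an announced identity to a token range, and every name in it is made up.

```python
from typing import Dict


def believed_ownership(ring: Dict[str, str], announced: Dict[str, str]) -> Dict[str, str]:
    """Map each physical node to the range peers believe it owns,
    given the identity that node announces on startup."""
    return {node: ring[identity] for node, identity in announced.items()}


# Ring metadata keyed by IP: identity changes whenever the IP does.
ring_by_ip = {"10.0.0.1": "range-A", "10.0.0.2": "range-B"}
before = {"pod-a": "10.0.0.1", "pod-b": "10.0.0.2"}
after_swap = {"pod-a": "10.0.0.2", "pod-b": "10.0.0.1"}  # IPs swapped on reboot

# Ring metadata keyed by a stable identifier: an IP swap does not matter.
ring_by_id = {"id-a": "range-A", "id-b": "range-B"}
stable = {"pod-a": "id-a", "pod-b": "id-b"}

print(believed_ownership(ring_by_ip, before))      # pod-a -> range-A, pod-b -> range-B
print(believed_ownership(ring_by_ip, after_swap))  # pod-a -> range-B, pod-b -> range-A
print(believed_ownership(ring_by_id, stable))      # unchanged by any IP swap
```

In the swapped case pod-a still has range-A's data on disk, but the rest of the cluster now believes it owns range-B, which is the mismatch described above.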

It's a mess. Would seek to avoid problem spaces where I might need to use it again, though if by chance ended up in a space where it made sense, probably wouldn't avoid the tech.

1 comment

hardwaresofton 1069 days ago

Thanks for this insight -- this is one of the first times I've heard of someone contrasting FoundationDB and Cassandra, which is nice.

> Example of fairly standard Cassandra bug (don't know if present on latest release, certainly was a year or two ago): When you add a new node to the cluster, it 'bootstraps', where it copies ~1/n the data from other nodes. When you are done bootstrapping, it's copied a bunch of data from other nodes, but the other nodes still contain that data. You then run 'cleanups' on the other nodes to remove the (now stale and unusable) data so as to get your disk space back.

Interesting, seems like there is a bunch of little knowledge like this needed to run a service properly... Managed Cassandra has more added value to provide I guess.

> If you accidentally run a cleanup on the new node as it is being bootstrapped, it will succeed, you will delete all the data that's been copied over so far, and Cassandra will _not_ terminate the bootstrap. Everything will be green, but your new node will suddenly be using 0 disk space. When the bootstrap finishes, possibly days later, your cluster will be immediately corrupted due to violated replication guarantees - but only on data that hasn't been read or written over that period, because if it was written it'll be re-replicated, and if it was read Cassandra will silently repair at this time. Repairs resolve the issue, but if you've made this mistake due to scripting, if you get unlucky it's possible to just delete all replicas of some data between repairs.

This seems... really bad -- I don't think I have the skill to run a Cassandra cluster (and not enough use cases to run it as a hobby to find these edges)...

This sounds like the space for a consultancy to make a tidy killing though.