Hacker News
by justinsb 5913 days ago
You choose your quorum by trading off cost/complexity against risk tolerance. You make sure that failing to form a quorum is impossible in the scenarios you care about. For example, you may decide it's OK not to form a quorum if the entire US power grid goes offline.
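To make that concrete, here's a rough sketch of checking whether a quorum can still form after losing a whole failure domain, such as a datacenter. It assumes simple strict-majority quorums; the function names and the replica placements are hypothetical, not from any particular database.

```python
# Sketch: majority-quorum arithmetic plus a single-failure-domain check.
# Assumes strict-majority quorums; names and placements are hypothetical.


def majority_quorum(n_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if n_replicas < 1:
        raise ValueError("need at least one replica")
    return n_replicas // 2 + 1


def failures_tolerated(n_replicas: int) -> int:
    """Replicas that can be lost while a majority quorum can still form."""
    return n_replicas - majority_quorum(n_replicas)


def survives_any_single_domain_loss(placement: dict[str, int]) -> bool:
    """True if losing any one failure domain (e.g. one datacenter)
    still leaves enough replicas to form a majority quorum."""
    total = sum(placement.values())
    needed = majority_quorum(total)
    return all(total - count >= needed for count in placement.values())


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, majority_quorum(n), failures_tolerated(n))
    print(survives_any_single_domain_loss({"dc1": 2, "dc2": 2, "dc3": 1}))
    print(survives_any_single_domain_loss({"dc1": 3, "dc2": 2}))
```

Spreading 5 replicas 2/2/1 across three datacenters survives losing any one of them. A 3/2 split across two datacenters does not, because losing the larger site leaves 2 replicas when 3 are needed. Whether you also plan for a continent-wide outage is the cost/risk call described above.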

The broad problem is that you're trying to apply the mathematical proof of the CAP theorem to the real world. For example, the proof of the CAP theorem treats single-node failures as a case of network partitioning, which is logically elegant. But in the real world, it's just not realistic to treat a dropped TCP connection as equivalent to the failure of a datacenter, as you seem to be doing.


Er, no. I'm just not differentiating between the various reasons why a single node may be unavailable. It doesn't really matter _why_ the node is unavailable... it just is.

FWIW, databases like Cassandra expose the consistency tradeoff to the client. You can do quorum reads/writes with Cassandra. You can't with MySQL or PostgreSQL.

Edit: with Cassandra you can choose between quorum reads/writes and stronger or weaker consistency levels; with MySQL or PostgreSQL you can't.
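Here's a toy model of why quorum reads/writes matter. It is not the Cassandra driver API; the helper names are hypothetical, and it models only which replicas overlap, not timestamps, repair, or multiple datacenters. For a single datacenter, Cassandra documents QUORUM as floor(RF / 2) + 1 replicas. If the read count R and write count W satisfy R + W > RF, every read set shares at least one replica with every write set, so a read always reaches a replica that acknowledged the latest write.

```python
# Toy model of read/write quorum overlap; not a real Cassandra client.
from itertools import combinations


def quorum_size(replication_factor: int) -> int:
    # Majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def overlap_guaranteed(rf: int, r: int, w: int) -> bool:
    """Closed-form rule: every R-replica read meets every W-replica write."""
    return r + w > rf


def overlap_by_brute_force(rf: int, r: int, w: int) -> bool:
    """Check the same property by enumerating every read and write set."""
    replicas = range(rf)
    return all(
        set(read_set) & set(write_set)  # empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def levels(rf: int) -> dict[str, int]:
    # Replica counts for a few Cassandra-style consistency level names.
    return {"ONE": 1, "QUORUM": quorum_size(rf), "ALL": rf}


if __name__ == "__main__":
    rf = 3
    lv = levels(rf)
    for read_level, r in lv.items():
        for write_level, w in lv.items():
            ok = overlap_guaranteed(rf, r, w)
            print(f"read={read_level:<6} write={write_level:<6} overlap={ok}")
```

With RF = 3, QUORUM reads plus QUORUM writes overlap (2 + 2 > 3), and so do ONE reads plus ALL writes. ONE plus ONE (1 + 1 <= 3) does not overlap, which is the weaker end of the tradeoff.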

I'm not going to treat a cosmic ray corrupting one single network packet the same way I treat a hurricane cutting off power to a datacenter for 2 weeks. I do see the intellectual appeal in doing so, but we'll just have to agree to disagree!
Uhm, of course they're not the same thing... but they have the same effect. The point is that the system remains available even if a node becomes unavailable for _whatever_ reason. I'm not sure what you're disagreeing with. There are common and uncommon modes of failure, and of course we should prioritize handling the common ones. But if we can handle all of them at once, that's ideal. And, as I said in an earlier comment, when you're doing a million operations a second, failures that are one-in-a-million happen every second.
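The back-of-the-envelope arithmetic behind that last point, assuming each operation fails independently with the same probability (a simplification, since real failures are often correlated): the expected number of failures is rate × probability × time, and the chance of seeing at least one failure in n operations is 1 − (1 − p)^n. The function names here are just illustrative.

```python
import math


def expected_failures(ops_per_second: float, p_failure: float, seconds: float = 1.0) -> float:
    """Mean number of failed operations over the interval."""
    return ops_per_second * p_failure * seconds


def p_at_least_one_failure(ops_per_second: int, p_failure: float, seconds: int = 1) -> float:
    """1 - (1 - p)^n for independent failures, with 0 <= p < 1.

    Uses log1p/expm1 so the result stays accurate when p is tiny.
    """
    n = ops_per_second * seconds
    return -math.expm1(n * math.log1p(-p_failure))


if __name__ == "__main__":
    rate, p = 1_000_000, 1e-6
    print(expected_failures(rate, p))          # roughly 1 per second
    print(p_at_least_one_failure(rate, p))     # roughly 1 - 1/e, about 0.63
    print(expected_failures(rate, p, 86_400))  # roughly 86,400 per day
```

So a "one-in-a-million" failure shows up in roughly 63% of seconds, and over a single minute it is all but certain.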