|
|
|
|
|
by sethammons
2786 days ago
|
|
This was my first thought, you beat me to it. It seems like they contradict each other. Is it Raft with quorum or is it a single replica node that is caught up with master? You can't have both (unless you only have three nodes). |
|
Interestingly enough this seems to be how managed sql solutions do it. You just have one primary and one failover replica in another zone. So a quorum write is just a synchronous write to both (2 out of 2). You don't have the split-brain problem because the original primary can't make progress if it can't contact the replica.
https://cloud.google.com/sql/docs/mysql/high-availability
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Conce...
Github's problem is that they were trying to be too smart and allow any of their replicas to be candidate masters without increasing their quorum size. In theory this is higher availability than the managed sql solutions (they can be available even if their entire coast gets nuked) but they do it at the cost of consistency in the more common failure scenarios.