Hacker News new | ask | show | jobs
by zawerf 2786 days ago
> unless you only have three nodes

Interestingly enough this seems to be how managed sql solutions do it. You just have one primary and one failover replica in another zone. So a quorum write is just a synchronous write to both (2 out of 2). You don't have the split-brain problem because the original primary can't make progress if it can't contact the replica.

https://cloud.google.com/sql/docs/mysql/high-availability

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Conce...

Github's problem is that they were trying to be too smart and allow any of their replicas to be candidate masters without increasing their quorum size. In theory this is higher availability than the managed sql solutions (they can be available even if their entire coast gets nuked) but they do it at the cost of consistency in the more common failure scenarios.