|
Like a lot of designs that use Raft/Zookeeper/Paxos/whatever as a building block, the full system doesn't inherit all of the safety properties of the underlying consensus algorithm. I don't think that makes this code useless by any means, but I think it's important to be aware of the edge cases. Consensus algorithms are popular because they're supposed to solve the difficult problem of guaranteeing consistency while attempting to provide liveness, in the presence of arbitrary node or connection failures. Etcd itself can provide this for operations on its own datastore, but that doesn't mean it can be used as a perfect failure detector for another system (which is impossible in the general case). In particular, if the database master becomes partitioned from the etcd leader for more than 30 seconds but is still accessible to clients, boom -- split brain. (You can attempt to mitigate this with timeouts, but that's not foolproof if your system can experience clock skew or swapping/GC delays. Exactly this kind of faulty assumption has caused critical bugs in e.g. HBase in the past, turning what would otherwise be a temporary period of unavailability into data loss.) EDIT: If I'm reading the code correctly, compose.io doesn't make any attempt to mitigate this failure scenario. If the Postgresql master can't contact etcd, it continues acting as a master indefinitely, even after 30 seconds have expired and another server might have taken over. This appears to be what happens in the "no action. not healthy enough to do anything." case in ha.py. I'd be happy to be corrected if there's something I'm missing. |