Hacker News new | ask | show | jobs
by bkirwi 4502 days ago
It looks like the protocol is broken; can anyone confirm?

Suppose a handful of the processes trigger an election simultaneously, each with a `currentEpoch` of `n`, so that each process has voted but no proposal has a majority. Now all processes have a `lastVoteEpoch` of `n`.

According to the description, no process will vote more than once in a single epoch. However, a process will can only propose a new value using its `currentEpoch`. Since each proposal must have a `currentEpoch` of `n`, but no process is still allowed to vote in that epoch, no more state transitions are possible and the frequency cannot change.

3 comments

Check the "retry" part, basically when there is a split-brain during an election, a new process will try again and eventually will succeed. This is copied from Raft.

It works because when some instance retries, it increments again the currentEpoch, so ti is a new voting process from the point of view of the voters.

Btw it is not needed that to retry is the instance that did it the previous time, because even if the voting failed, the currentEpoch of the process that tried to get elected, will be propagated across the system, so every process will get the new epoch, and can try with an updated epoch to get voted.

Right. For the same reason, the Paxos protocol allows multiple prepare votes from each acceptor. It makes this safe by requiring that each acceptor tell the proposer of the value for the highest ballot number it has accepted, and requires the proposer to switch proposal values when receiving such a message. This is, at first glance, one of the strange edges of the Paxos (synod) protocol, but it makes a lot of sense once you see an example where it is needed.

Raft has somewhat similar constructs, but explained slightly differently.

Hey, actually Raft uses randomized retries, since I copied this part (and most of the rest) from Raft. Voting can result into split brain as this condition is recovered, and a retry time that is big enough compared to RTT guarantees liveness.
It looked similar. Can you highlight the differences from Raft? Is it just that state is small so it can accompany the various voting/heartbeat messages?
Because the state is small and does not depend on the previous computation, the problem becomes trivial, and is accomplished just by broadcasting the full information with an attached unique version. No need for a log, rules to apply or not entries from the log to state machine, and so forth.
Hmm -- if a process bumps its `currentEpoch` for every proposal, and the delays are randomized, it looks like this would fail only probabilistically and may eventually succeed. I'll have to take a closer look later today...
Exactly that. It works only as long as the retry time is randomized and much larger than then network RTT.