You're probably aware of this, but for the sake of others reading: crashing after an fsync failure isn't sufficient either if you're using buffered IO. Dirty pages in the page cache could cause your consensus implementation to, for example, allow voting for two different leaders in the same term if the machine crashes at some later point and a dirty page contained the log record for that vote. After the machine restarts, your process would no longer have access to the dirty page and could vote again if asked.

Edit: Thought I'd add the series of steps for this to happen to make it clearer:

1. Process X receives an RPC from Process Y to vote for Y as leader in term 1.
2. Process X writes a log record containing the vote information to the page cache and calls fsync.
3. Process X receives EIO in response to fsync (if it is lucky...) and crashes.
4. Process X restarts and receives a retry of the same RPC from Process Y for term 1. This time it responds affirmatively because it can see it already voted for Process Y in term 1, which _should_ be safe.
5. Process X crashes because the machine hosting X experiences a temporary hardware failure.
6. Process X restarts after the hardware failure, receives an RPC from Process Z asking for its vote in term 1, and responds affirmatively because the dirty page that was available back at step 4 never actually made it to disk.

Consensus is now (potentially) broken by electing two leaders for the same term.
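For anyone who wants to poke at it, the six steps can be replayed against a toy model. This is only a sketch: `Machine`, `Voter`, and `run_scenario` are made-up names, the "page cache" and "disk" are plain dicts, and it assumes (as the steps do) that the unsynced page stays readable across a process restart but is lost when the machine goes down. It does not talk to a real kernel or file system.

```python
import errno


class Machine:
    """Toy host: reads are served from the page cache, only the disk survives a reboot."""

    def __init__(self):
        self.page_cache = {}      # what any process on this machine can read
        self.disk = {}            # what is actually durable
        self.fsync_fails = False  # hypothetical switch to inject EIO

    def write(self, key, value):
        self.page_cache[key] = value  # buffered write: page is dirty, not durable

    def fsync(self):
        if self.fsync_fails:
            raise OSError(errno.EIO, "simulated fsync failure")
        self.disk.update(self.page_cache)

    def read(self, key):
        return self.page_cache.get(key)

    def reboot(self):
        # Machine crash: everything that never reached the disk is gone.
        self.page_cache = dict(self.disk)


class Voter:
    """Toy Process X: grants at most one vote per term, based on its log."""

    def __init__(self, machine):
        self.machine = machine

    def request_vote(self, term, candidate):
        voted_for = self.machine.read(("vote", term))
        if voted_for is not None:
            return voted_for == candidate  # re-grant only to the same candidate
        self.machine.write(("vote", term), candidate)
        self.machine.fsync()  # raises OSError on EIO; treated as a process crash
        return True


def run_scenario():
    machine = Machine()
    granted = []

    # Steps 1-3: vote request from Y, buffered write, fsync returns EIO, X crashes.
    machine.fsync_fails = True
    x = Voter(machine)
    try:
        x.request_vote(1, "Y")
    except OSError:
        pass  # process X crashes here

    # Step 4: X restarts on the same machine; the unsynced page is still readable.
    x = Voter(machine)
    if x.request_vote(1, "Y"):
        granted.append("Y")

    # Step 5: temporary hardware failure takes the whole machine down.
    machine.fsync_fails = False
    machine.reboot()

    # Step 6: X restarts again, has no record of the vote, and grants Z in term 1.
    x = Voter(machine)
    if x.request_vote(1, "Z"):
        granted.append("Z")

    return granted


if __name__ == "__main__":
    print(run_scenario())  # ['Y', 'Z']: two votes granted by X in term 1
```

In this model the second vote goes away if step 4 refuses to trust anything it cannot confirm is on disk, which is the point of the comment: crashing on EIO alone does not give you that.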
Think of all the extra disk IO the world's databases are doing to defend against step 3 :)