by brickbrd 1803 days ago
In practice, for the systems where I built a replication system from the ground up, once you factor in all the performance, scale, storage layer and networking implications, this Paxos vs. Raft thing is largely a theoretical discussion.

Basic Paxos is, well, too basic, and people mostly run modifications of it to get higher throughput and better latencies. After those modifications, it does not look very different from Raft with modifications applied for storage integration and so on.


> Basic Paxos is, well, too basic, and people mostly run modifications of it to get higher throughput and better latencies. After those modifications, it does not look very different from Raft.

Alan Vermeulen, one of the founding AWS engineers, calls inventing newer solutions to distributed consensus an exercise in re-discovering Paxos.

https://youtu.be/QVvFVwyElLY?t=2367

Even in AWS, it's not a direct paper knock-off.
This is exactly my take as well. Multi-Paxos and Raft seem very similar to me. Calling out what the exact differences and tradeoffs are would be good blog/research fodder.
I think the differences become more stark and more valuable/surprising the closer you get to understanding the protocols. There are some major availability and performance tradeoffs involved in the choice between Multi-Paxos and Raft, as you go from paper to production. This can be the difference between your cluster remaining available, and the loss of an entire cluster merely because of a latent sector error.

For example, UW-Madison's paper "Protocol-Aware Recovery for Consensus-Based Storage" [1] won best paper at FAST '18 and describes simple scenarios where an entire LogCabin, Raft, Kafka or ZooKeeper cluster can become unavailable far too soon, or even suffer global cluster data loss.

[1] https://www.usenix.org/conference/fast18/presentation/alagap...