by lpage 1015 days ago
Maelstrom [1], a workbench for learning distributed systems from the creator of Jepsen, includes a simple (model-checked) implementation of Raft and an excellent tutorial on implementing it.

Raft is a simple algorithm, but as others have noted, the original paper includes many correctness details often brushed over in toy implementations. Furthermore, the fallibility of real-world hardware (handling memory/disk corruption and grey failures), the requirements of real-world systems with tight latency SLAs, and the need for things like flexible quorums and dynamic cluster membership make implementing it for production a long and daunting task. The commit histories of etcd and hashicorp/raft, likely the two most battle-tested open source implementations of Raft, both of which still surface correctness bugs on the regular, tell you all you need to know.
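
To make the flexible quorum point concrete, here is a minimal sketch of the safety condition as I understand it: the quorum used to elect a leader and the quorum used to commit an entry must always share at least one node. The function names are my own, purely for illustration, and are not taken from etcd or hashicorp/raft.

```python
def majority(n: int) -> int:
    """Smallest quorum size for which any two quorums of n nodes overlap."""
    return n // 2 + 1


def quorums_intersect(n: int, election_quorum: int, replication_quorum: int) -> bool:
    """Flexible quorum condition (hypothetical helper): every election quorum
    must share at least one node with every replication quorum."""
    return election_quorum + replication_quorum > n


if __name__ == "__main__":
    n = 5
    print(majority(n))                  # classic majority quorum size
    print(quorums_intersect(n, 4, 2))   # larger election quorum, smaller commit quorum
    print(quorums_intersect(n, 3, 2))   # quorums can miss each other: unsafe
```

Shrinking the replication quorum cuts commit latency at the cost of needing more nodes to elect a leader, which is exactly the kind of trade-off that makes production implementations harder than the paper suggests.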

The TigerBeetle team talks in detail [2] about the real-world aspects of distributed systems on imperfect hardware/non-abstracted system models, and why they chose viewstamped replication, which predates Paxos but looks more like Raft.

[1]: https://github.com/jepsen-io/maelstrom/

[2]: https://github.com/tigerbeetle/tigerbeetle/blob/main/docs/DE...


> [Viewstamped replication] predates Paxos but looks more like Raft.

Heidi Howard and Richard Mortier’s paper[1] on the topic of Paxos vs Raft has (multi-decree) Paxos and Raft written out in a way that makes it clear that they are very, very close. I’m very far from knowing what consequences (if any) this has for the implementation concerns you state, but the paper is lovely and I wanted to plug it. (There was also a presentation[2], but IMO the text works better when you want to refer back and forth.)

[1] https://doi.org/10.1145/3380787.3393681

[2] https://www.youtube.com/watch?v=0K6kt39wyH0

The viewstamped replication paper was surprisingly readable - I'd never looked at consensus algorithms before in my life and I found I could kind of follow it after a couple of reads.

https://dspace.mit.edu/bitstream/handle/1721.1/71763/MIT-CSA...

Don't forget heterogeneous network topologies and the fact that some members make absolutely terrible leaders.

Looks like a great playground for getting familiar with distributed systems.