| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sriram_malhar 632 days ago

I have had the opposite trajectory. Used to teach Paxos, but was so relieved to switch to Raft when the paper came out.

I discuss distributed data structures in the context of maps and sequences.

For maps, I discuss key-value stores (NUMA, Redis). I have them implement cache coherence (MESI protocol, TARDIS 2.0), then linearizable, fault-tolerant, wait-free shared memory registers (the Attiya/Bar-Noy/Dolev algorithm[1]).

For sequences, I cover shared logs and state machine replication, including database log shipping, Kafka, queues and Raft.

I like Raft because it cuts down the design space by making certain very intuitive and pragmatic choices, like using timeouts (which are almost beneath Lamport to discuss :), or idioms like "follow the leader", "if the leader is unreachable, stand for election", "elect the latest & most informed leader", (how I wish that was true in real life!), "always append" etc. There are simple mechanisms to preserve invariants.

The problem with Paxos is that there is such a large range of papers that there is no one paper that makes the leap in easy digestible chunks from Basic to MultiPaxos. When I got students to implement MultiPaxos, I never could get sufficient confidence that it was done right (esp. the "disorderly" filling in of log slots).

Paxos is like Monads; when you get it, you feel compelled to write a "Paxos explained" paper :)

[1]https://groups.csail.mit.edu/tds/papers/Attiya/PODC90.pdf

1 comments

withinboredom 632 days ago

I feel like you are doing your students a disservice. (multi-)Paxos, while complex to wrap your head around, enables far more modes of consensus. The possibilities and papers out there are amazing.

Raft essentially only allows a single mode. Moreover, you are starting to see people putting things on top of Raft instead of something like Paxos, in the enterprise, because they don't know any better nor have the foundation to understand what they are doing is "wrong."

> When I got students to implement MultiPaxos, I never could get sufficient confidence that it was done right

Testing this is fairly straightforward, they should be able to join an already existing cluster. If they got it wrong, it shouldn't take down the cluster, and they should be able to step through their own code. There aren't any timeouts, so they can take their time, going through each step of the process until a value is committed.

At that point, you simply explain each step as an individual algorithm, not the sum of its parts. You can even build each part individually because an existing cluster should recover from a misbehaving peer.

From there, it is a rather simple visualization process to see what is going on.

The hard part of paxos is building it from scratch.

link

sriram_malhar 632 days ago

Can you tell more about what other interesting modes multiPaxos allows? For an example of what I find uninteresting, it is proposers or acceptors not having a local disk (I know there are some uses for it, but there are relatively straightforward ways of solving that issue without requiring a whole new protocol). In all examples I have seen, multipaxos and raft are fairly alike, except for parts of their leadership election as Howard and Mortier also describe.

Raft's simplicity and the fact that there's exactly one way to do it is what I find most comforting in the most important component of a distributed system. I can look at an implementation of raft and immediately understand what's going on, and what is missing.

link

withinboredom 631 days ago

What makes multiPaxos a better learning tool isn't multiPaxos for the sake of multiPaxos, but rather Paxos itself. The write-once consensus primitive is a valuable tool (whether you are implementing Raft or multiPaxos) or thing to know. Especially when it comes to flexible / compartmentalized Paxos. Don't get me wrong, the same things can theoretically be done with Raft (if they haven't already), but that single-degree primitive makes it make that much more sense. This is missing from Raft.

It's like learning about a byte and never learning about endianness because everything is pretty much little-endian these days.

link

sriram_malhar 631 days ago

There are many algorithms to implement a single wait-free shared register. I like Basic Paxos, don't get me wrong. I also love the Part-time parliament paper's description of it. What I don't care much for is the generalization to multipaxos; too many things left to the imagination, which, in the hands of people like me without Lamport's reasoning skills, is headed for disaster. There is a reason why they say Paxos is too hard, but few people say that about Raft. A raft implementation is maintainable by ordinary mortals, because like I said, there is only one documented recommended way to do it.

link