Hacker News new | ask | show | jobs
by hinkley 1951 days ago
Author spends some time on determinism, which got me thinking of CRDTs, which are trying to make state changes commutative. They are in effect trying to get from linearalized state changes to causality. If we find a way to solve that problem, we have a more efficient consensus than Raft, one that can withstand small gaps in availability.

I keep waiting for someone to bring a Turing, Gödel, or Shannon into this and point out it’s not computable.

That worry aside, this thought process also brought me to monoids, which are used to share state in some functional languages. I’m curious how much information about concurrent state change is locked up in that space that people trying to solve the general problem don’t have ready access to.

3 comments

Are you familiar with Joe Hellerstein and Peter Alvaro's work in this space? CALM provides something of a unifying theory of mergeable operations, of which CRDTs are an "object orientated" special case: https://arxiv.org/pdf/1901.01930.pdf

In practice, designers choose coordination-heavy protocols like consensus for a number of reasons. One is because writes don't or can't be merged. Operations as basic as simple assignment (x = 1;) can't be merged, so that's very real. Another is because readers can't tolerate weak consistency, because their business logic needs to make decisions at a particular point in time.

You're right that the thinking behind CRDTs (and CALM) is useful in reasoning through determinism in this context. The determinism problem, though, is easier than the general monotonicity problem, because only determinism is required and not associativity or commutativity.

"we have a more efficient consensus than Raft,"

Except that in the Multi-Paxos family of consensus and replication protocols, Raft is probably the least efficient. It's to the extreme and foregoes several simple optimizations, because of design decisions taken in the name of understandability and simplicity.

I would suggest that VR is at least as understandable as Raft, while also being more efficient and with the lowest latency for leader election in the common case. There's no need for Raft's carefully tuned random timeouts because split votes are not possible with VR. VR also lets you do some really cool things like warm up the next leader, or prioritize synchronous replication under Flexible Paxos to the next leader with asynchronous replication amongst the remaining followers, or decide who you want the next leader to be to optimize for geographical placement etc.

VR's view change protocol for leader election is also entirely in-memory so it's far more fault tolerant compared to Raft if you have a storage fault model and not only a network fault model. For example, Raft requires strong persistence guarantees for correctness of the leader election phase. If anything goes wrong with your disk (a ghost write or misdirected write) then Raft's implementation as written would be unsafe. Raft also has liveness issues if all nodes have even a single block failure at any point in their log.

If you're going to reach for a consensus algorithm, there are a lot of good reasons to do a survey of the literature first. There's a whole spectrum to choose from.

Joran, good to see your name come up!

> VR's view change protocol for leader election is also entirely in-memory so it's far more fault tolerant compared to Raft if you have a storage fault model and not only a network fault model.

Only in the fail-stop model, right? Or does this property extend to other models (like omissions)?

Thanks Marc, I'm glad, you too!

Ever since I came across https://brooker.co.za/blog/2018/01/01/balls-into-bins.html, I've been really excited to see and follow all the fantastic design work you've been doing.

>> VR's view change protocol for leader election is also entirely in-memory so it's far more fault tolerant compared to Raft if you have a storage fault model and not only a network fault model.

> Only in the fail-stop model, right? Or does this property extend to other models (like omissions)?

You're far more knowledgeable about the domain... but I was thinking that with VRR's completely in-memory view change and replication protocol, this would then extend past the fail-stop model to include byzantine disk storage faults (misdirected reads/writes, corruption, latent sector errors) since VRR requires no guarantees from the disk in order for the consensus protocol to be correct, whereas Raft does.

To me this is just another reason that makes VRR such a fantastic protocol.

To be fair, I guess we could say that Raft assumes a fail-stop disk model, but then again Raft is supposed to be a practical implementation and disks are not fail-stop in reality. I'm sure you're also well familiar with WISC's Protocol-Aware Recovery for Consensus-Based Storage: https://www.usenix.org/system/files/conference/fast18/fast18...

Beyond that, I've also been wondering about this from the perspective of taking VRR's in-memory view change for the leader election phase, but then combining this with disk-based persistence for the replication phase, along with CTRL from WISC.

I would love to hear your thoughts on this. Did I understand you correctly regarding what you meant with extending "to other models (like omissions)"?

Would be great to catch up next time you are in the Cape!

If we find a way to solve that problem [...]

We will not. And you are hinting at the reason yourself.

[...] CRDTs, which are trying to make state changes commutative.

But this is slightly wrong, when you are using CRDTs you are not trying to make state changes commutative, you are limiting the allowed state changes to commutative ones. But some state changes are inherently non-commutative and if your system requires those, then you can not build it with CRDTs.