by _benedict 315 days ago
Do you elaborate anywhere on what you mean by leaderless, and how this affects the semantics and guarantees you offer?

As far as I understand, both Kafka and Pulsar use (leader-based) consensus protocols to deliver some of their features and guarantees. So to match these, you must either have developed a leaderless consensus protocol, or modified the guarantees you offer, or else still use a leader-based consensus protocol?

In one of your other answers, you mention that you rely on Apache BookKeeper, which appears to be leader-based?

I ask because I am aware of only one industry leaderless consensus protocol under development (and I am working on it), and it is always fun to hear about related work.


Whoa, a leaderless consensus protocol sounds pretty revolutionary!! So many questions -- do you have any resources on this you could share?
Revolutionary may be an overstatement; it just affords different system characteristics. There's plenty of literature on the topic though, starting generally with EPaxos[1]. The protocol that we are developing is for Apache Cassandra, is called Accord[2], and forms the basis of our new distributed transaction feature[3]. I will note that the whitepaper linked in [3] is a bit out of date, and there was a bug in the protocol specification at that time. We hope to publish an updated paper in a proper venue in the near future.

[1] https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf
[2] https://github.com/apache/cassandra-accord
[3] https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-15...

https://www.vldb.org/pvldb/vol15/p1337-lee.pdf

Is this you also or total coincidence?

Not even a coincidence really, it's a very different kind of system. It's an implementation of Hermes with network layer integration. Hermes is designed with very different goals in mind, specifically within-DC consensus with minimal failures (with the caveat that I am not intimately familiar with it):

- Every replica must acknowledge a write, which is undesirable in a WAN setting, due to having to wait for replies from the furthest region

- At most one concurrent "read-modify-write" operation may succeed, so peak throughput is limited by request latency

- Failure of any replica requires reconfiguration before any request can succeed (equivalent to leader election), so the leaderless property here does not improve tail latencies; indeed they are likely harmed, since your workload is exposed to more required reconfigurations

Cassandra is designed for multi-DC deployments (usually with the DCs quite far apart) that want to maximise availability and minimise latency, and where failure is expected. Here a quorum system is typically preferable for request latency.
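
To make the latency argument concrete, here is a rough sketch in Python. The round-trip times and function names are made up for illustration; they are not measurements or APIs from Hermes, Cassandra or Accord. A simple majority stands in for "a quorum", which is an assumption and not necessarily the quorum size any of these protocols actually uses.

```python
def write_all_latency(rtts_ms):
    """Every replica must acknowledge, so we wait for the slowest one."""
    return max(rtts_ms)


def quorum_latency(rtts_ms):
    """Only a simple majority must acknowledge, so we wait for the
    q-th fastest reply, where q = floor(n / 2) + 1."""
    q = len(rtts_ms) // 2 + 1
    return sorted(rtts_ms)[q - 1]


def serial_rmw_peak_throughput(latency_ms):
    """If only one read-modify-write can be in flight at a time,
    operations per second are bounded by 1 / latency."""
    return 1000.0 / latency_ms


# Hypothetical round-trip times (ms) from a coordinator to 5 replicas
# spread over three regions: two nearby, two mid-distance, one far away.
rtts = [1, 2, 70, 80, 150]

print(write_all_latency(rtts))  # 150: gated by the furthest region
print(quorum_latency(rtts))     # 70: gated by the third-fastest reply
print(serial_rmw_peak_throughput(write_all_latency(rtts)))
```

With these invented numbers, waiting on every replica costs as much as the furthest region, while a majority quorum only pays for the third-fastest reply. The last line shows how fully serialised operations at WAN latency cap out at a handful per second.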