by daenney 3171 days ago
There's also RobustIRC, which is pretty interesting[0]. It bills itself as "IRC without netsplits" and uses Raft-style consensus underneath (which in this case you're getting through etcd instead).

[0]: https://robustirc.net/


Netsplits on IRC are such a visceral experience of distributed consensus failure. I remember as a little teenager on IRC being like "omg what is happening, why are there parallel worlds where weird things happen and everything drifts in and out?!"
I'm not convinced this is a solution to the problem though.

As you mentioned, with traditional IRC during an actual netsplit (not a single node disconnecting, but two large networks separating) you will see two networks that drift out of sync over time. In this case (assuming it is implemented correctly) etcd will just shut down and not allow anyone to use it until the netsplit ends.
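To make the "shut down until the netsplit ends" behaviour concrete, here is a minimal sketch of the majority rule that Raft-style systems rely on. The function name and the 5-node cluster are made up for illustration; this is not etcd's actual code. Whichever side of the split lacks a strict majority stops accepting writes, and if neither side has one, the whole cluster stalls.

```python
def has_quorum(reachable_members: int, cluster_size: int) -> bool:
    """Return True if this side of a partition still holds a strict majority."""
    return reachable_members > cluster_size // 2


# Hypothetical 5-node cluster split 3/2 by a netsplit.
cluster_size = 5
majority_side = has_quorum(3, cluster_size)  # this side can still commit writes
minority_side = has_quorum(2, cluster_size)  # this side has to refuse writes

# An even split of a 4-node cluster leaves neither side with a majority.
even_split = has_quorum(2, 4)
```

So instead of two diverging worlds you get at most one side that keeps working, and the other side is simply unavailable.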

Another issue here is that consensus protocols (Raft, Paxos, etc.) expect low latency. It doesn't make much sense to distribute them geographically; that will only result in bad performance.

The biggest issue with netsplits is getting out of sync, and then syncing back. Many networks solve this with ownership and channel/nick services.

I kind of like how IRCNet solved this problem without involving services:

- nick conflict - each user gets a unique numeric ID based on server id and connection id; this ID is guaranteed to be unique even during a netsplit. On nick collision, servers just reset the nick to that ID

- channels - they actually had two solutions. The first is !channels, which are designed in a somewhat similar way: the channel receives a unique ID, and after a netsplit one might end up with two channels with different IDs. The second solution is the +R flag, which only works if there's no netsplit; if no one has operator status, it will grant it to one person matching one of the masks there.
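A rough sketch of the nick-collision rule described above, just to show the idea. The class, field names, ID formats and the `merge_after_netsplit` helper are all made up for illustration and are not taken from the IRCNet server code; nick comparison via `lower()` is also a simplification of IRC's real case mapping.

```python
from dataclasses import dataclass


@dataclass
class User:
    server_id: str      # hypothetical ID of the server the user connected to
    connection_id: str  # hypothetical per-connection ID on that server
    nick: str

    @property
    def uid(self) -> str:
        # Unique without coordination: no two servers share a server_id,
        # and a server never reuses a connection_id.
        return self.server_id + self.connection_id


def merge_after_netsplit(side_a, side_b):
    """Rejoin two sides; every user whose nick collides is reset to their ID."""
    users = list(side_a) + list(side_b)
    counts = {}
    for user in users:
        key = user.nick.lower()
        counts[key] = counts.get(key, 0) + 1
    for user in users:
        if counts[user.nick.lower()] > 1:
            user.nick = user.uid
    return users


side_a = [User("001A", "AAAAB", "alice"), User("001A", "AAAAC", "bob")]
side_b = [User("002B", "AAAAB", "Alice")]
merged_nicks = [u.nick for u in merge_after_netsplit(side_a, side_b)]
# merged_nicks == ["001AAAAAB", "bob", "002BAAAAB"]
```

Nobody "wins" the nick, so neither side has to ask the other (or a service) who owned it first.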

I like that approach because it tries to solve the problem without a central authority, though it's still arguable which is better.

Egalitarian Paxos (EPaxos) has much better WAN performance than previous protocols.

  https://www.cs.cmu.edu/~dga/papers/epaxos-sosp2013.pdf

Most real-world Paxos implementations, like Raft (what etcd uses), simply use Paxos for electing a leader. Afterward, all updates are serialized through the leader. This was a performance optimization, but it works well only on LANs; it sucks for WANs because you've at least doubled the round-trip time for every commit and the leader becomes a bottleneck, so it's especially important to have low latency.

EPaxos was published before Raft and I couldn't find any benchmarks comparing them directly, but in the paper above EPaxos outperforms Mencius (a rotating single leader design) even within a single EC2 cluster. And EPaxos maintains much more consistent performance during latency spikes. So EPaxos may help close the performance gap between LAN and WAN clusters. In the context of IRC where you're geographically distributed _anyhow_, something like EPaxos may impose no appreciable penalty at all, at least if there's sufficient redundancy to ensure a quorum.
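Back-of-the-envelope version of the round-trip argument, as a sketch only. The three sites and the RTT numbers are invented, and the "leaderless" case assumes a conflict-free command that commits after one round trip to a plain majority, which is a simplification of what EPaxos actually does.

```python
# Hypothetical round-trip times between sites, in milliseconds.
RTT_MS = {
    ("eu", "us"): 90,
    ("eu", "asia"): 250,
    ("us", "asia"): 160,
}
SITES = ["eu", "us", "asia"]


def rtt(a: str, b: str) -> int:
    if a == b:
        return 0
    return RTT_MS[(a, b)] if (a, b) in RTT_MS else RTT_MS[(b, a)]


def quorum_rtt(coordinator: str) -> int:
    """RTT until the coordinator has heard from a majority (itself included)."""
    others = sorted(rtt(coordinator, s) for s in SITES if s != coordinator)
    acks_needed = len(SITES) // 2  # acks required besides the coordinator's own
    return others[acks_needed - 1]


def leader_commit_ms(client_site: str, leader: str) -> int:
    # Client talks to the (possibly remote) leader, which then talks to a quorum.
    return rtt(client_site, leader) + quorum_rtt(leader)


def leaderless_commit_ms(client_site: str) -> int:
    # Client talks to its local replica, which talks to a quorum directly.
    return quorum_rtt(client_site)


asia_via_us_leader = leader_commit_ms("asia", "us")  # 160 + 90 = 250
asia_leaderless = leaderless_commit_ms("asia")       # 160
```

With these made-up numbers a client in "asia" pays 250 ms per commit through a leader in "us", versus 160 ms when its local replica can coordinate the commit itself. Clients sitting next to the leader see no difference.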

> Most real-world Paxos implementations, like Raft (what etcd uses), simply use Paxos for electing a leader.

Ah, the ambiguity of natural language. Just to make this absolutely clear: Raft is not an implementation of Paxos. The "like" here refers to the "electing a leader" bit.

(I assume you already know this, it's just that the sentence as written is ambiguous and I wanted to clarify. I initially read it the wrong way and felt an urge to 'correct' you.)

Yeah, that's a good point. Federated networks like IRC have their benefits. Most chatting doesn't need total consensus. Disordered messages are sometimes confusing but mostly harmless.
So do they just lock the channel for the minority side when a partition is detected?