| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rystsov 1170 days ago

Strongly consistent protocols such as a Paxos and Raft always choose consistency over availability and when consistency isn't certain they refuse to answer.

Raft & Paxos: any number of nodes may be down, as soon as the majority is available a replicated system is available and doesn't lie.

Kafka as it's described in the post(): any number of nodes may be down, at most one power outage is allowed (loss of unsynced data), as soon as the majority is available a replicated system is available and doesn't lie.

The counter-example simulates a single power outage

() https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-...

1 comments

judofyr 1170 days ago

It just feels like two widely different scenarios we're talking about here.

https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... talks about the case of a single failure and it shows how (a) Raft without fsync() loses ACK-ed messages and (b) Kafka without fsync() handles it fine.

This post on the other hand talks about a case where we have (a) one node being network partitioned, (b) the leader crashing, losing data, and combing back up again, all while (c) ZooKeeper doesn't catch that the leader crashed and elects another leader.

I think definitely the title/blurb should be updated to clarify that this is only in the "exceptional" case of >f failures.

I mean, the following paragraph seems completely misleading:

> Even the loss of power on a single node, resulting in local data loss of unsynchronized data, can lead to silent global data loss in a replicated system that does not use fsync, regardless of the replication protocol in use.

The next section (and the Kafka example) is talking about loss of power on a single node combined with another node being isolated. That's very different from just "loss of power on a single node".

link

rystsov 1170 days ago

We can't ignore or pretend that network partitioning doesn't happen. When people talk about choosing two out of CAP the real question is C or A because P is out of our control.

When we combine network partitioning with single local data suffix loss it either leads to a consistency violation or to a system being unavailable desperate the majority of the nodes being are up. At the moment Kafka chooses availability over consistency.

Also I read Kafka source and the role of network partitioning doesn't seem to be crucial. I suspect that it's also possible to cause similar problem with a single node power-outage https://twitter.com/rystsov/status/1641166637356417027 and unfortunate timing

link

comet-engine 1170 days ago

For what it’s worth, this form of loss wouldn’t be possible under KRaft since they (ironically?) use Raft for the metadata and elections. Ain’t nobody starting a new cluster with Zookeeper these days.

link

slt2021 1170 days ago

you are right, this is a failure of both zookeeper and leader node, so two independent failures at the same time

link

frant-hartm 1170 days ago

Exactly my thoughts.

link