|
|
|
|
|
by rystsov
1122 days ago
|
|
Strongly consistent protocols such as a Paxos and Raft always choose consistency over availability and when consistency isn't certain they refuse to answer. Raft & Paxos: any number of nodes may be down, as soon as the majority is available a replicated system is available and doesn't lie. Kafka as it's described in the post(): any number of nodes may be down, at most one power outage is allowed (loss of unsynced data), as soon as the majority is available a replicated system is available and doesn't lie. The counter-example simulates a single power outage () https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... |
|
https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... talks about the case of a single failure and it shows how (a) Raft without fsync() loses ACK-ed messages and (b) Kafka without fsync() handles it fine.
This post on the other hand talks about a case where we have (a) one node being network partitioned, (b) the leader crashing, losing data, and combing back up again, all while (c) ZooKeeper doesn't catch that the leader crashed and elects another leader.
I think definitely the title/blurb should be updated to clarify that this is only in the "exceptional" case of >f failures.
I mean, the following paragraph seems completely misleading:
> Even the loss of power on a single node, resulting in local data loss of unsynchronized data, can lead to silent global data loss in a replicated system that does not use fsync, regardless of the replication protocol in use.
The next section (and the Kafka example) is talking about loss of power on a single node combined with another node being isolated. That's very different from just "loss of power on a single node".