|
|
|
|
|
by jvanlightly
1135 days ago
|
|
It's a common misconception about Kafka and fsyncs. But the Kafka replication protocol has a recovery mechanism, much in the same way that Viewstamped Replication Revisited does (except it's safer due to the page cache), which allows Kafka to write to disk asynchronously. The trade-off is that we need fault domains (AZs in the cloud), but if we care about durability and availability, we should be deploying across AZs anyway. We've seen plenty of full region outages, but zero power loss events in multiple AZs in six years. Kafka and fsyncs: https://jack-vanlightly.com/blog/2023/4/24/why-apache-kafka-... |
|
Then, in Kafka, what if the leader dies with power failure and came back instantaneously?
i.e.: Let's say there are 3 replicas A(L), B(F), C(F) (L = leader, F = follower)
- 1) append a message to A
- 2) B, C replicas the message. The message is committed
- 3) A dies and came back instantaneously before zk.session.timeout elapsed (i.e. no leadership failover happens), with losing its log prefix due to no fsync
Then B, C truncates the log and the committed message could be lost? Or is there any additional safety mechanism for this scenario?