by ocadaruma
1135 days ago
As far as I can tell from the blog post, it assumes the scenario where "a replica dies (and loses its log prefix due to no fsyncs) and comes back instantaneously (before another replica catches up to the leader)". Then, in Kafka, what if the leader dies from a power failure and comes back instantaneously? I.e.:
Let's say there are 3 replicas A(L), B(F), C(F) (L = leader, F = follower):
1) A message is appended to A.
2) B and C replicate the message. The message is committed.
3) A dies and comes back instantaneously, before zk.session.timeout elapses (i.e. no leadership failover happens), having lost its log prefix due to no fsync.
Would B and C then truncate their logs, so that the committed message could be lost? Or is there any additional safety mechanism for this scenario?
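To make the scenario concrete, here is a toy simulation of the three steps. It is only a sketch under stated assumptions, not Kafka's actual replication protocol: the `Replica` class, `fetch_from`, and `simulate` are hypothetical names, and the model simply assumes that a follower truncates its log to the leader's log end before fetching, which is exactly the behavior the question is asking about.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)  # in-memory view (page cache)
    flushed: int = 0                          # entries already fsynced to disk

    def append(self, msg):
        self.log.append(msg)

    def fsync(self):
        self.flushed = len(self.log)

    def crash_and_restart(self):
        # Power failure: everything that was not fsynced is lost.
        self.log = self.log[:self.flushed]

    def fetch_from(self, leader):
        # ASSUMPTION of this toy model: a follower that is ahead of the
        # leader truncates to the leader's log end, then copies the rest.
        if len(self.log) > len(leader.log):
            self.log = self.log[:len(leader.log)]
        self.log.extend(leader.log[len(self.log):])


def simulate(fsync_on_leader):
    a, b, c = Replica("A"), Replica("B"), Replica("C")  # A is the leader

    # 1) Append a message to A.
    a.append("m1")
    if fsync_on_leader:
        a.fsync()

    # 2) B and C replicate the message; it is now on all replicas.
    b.fetch_from(a)
    c.fetch_from(a)
    committed = all("m1" in r.log for r in (a, b, c))

    # 3) A crashes and restarts before any failover, still as leader.
    a.crash_and_restart()
    b.fetch_from(a)
    c.fetch_from(a)
    survived = any("m1" in r.log for r in (a, b, c))
    return committed, survived


if __name__ == "__main__":
    print("no fsync:  ", simulate(fsync_on_leader=False))
    print("with fsync:", simulate(fsync_on_leader=True))
```

In this model the committed message disappears from every replica when the leader does not fsync, and survives when it does. Whether real Kafka behaves like the `fetch_from` assumption here is precisely the open question.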
|
One safety mechanism I can think of is that the replicas detect that the leader is down and trigger a leader election themselves. Or that, upon restart, the leader realizes it has restarted and triggers a leader election in a way that B ends up as the leader. (Not sure either is actually being done.)
As I think about it more, even if there’s a solution I think I’ll stick to running Redpanda or running Kafka with fsync.