by TheHydroImpulse 3563 days ago
But co-locating them won't actually remove a class of errors because Zk is not HA. The Kafka brokers need to communicate with the leader in the Zk cluster.

If we have K1,Z1 -- K2,Z2 -- K3,Z3 -- and one node goes down, you've now taken down both a broker and a Zk node. Remember, the brokers don't care about connecting to just any Zk node; they want the leader. So you aren't gaining any more fault tolerance by co-locating them.
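
To make the failure math concrete, here's a rough sketch (not anything from Kafka or Zk themselves). The layouts, machine names, and the `survivors` / `has_quorum` helpers are all made up for illustration; the only real rule used is that a Zk ensemble needs a strict majority of its members up.

```python
# Hypothetical layouts: machine name -> services running on that machine.
CO_LOCATED = {"m1": ["K1", "Z1"], "m2": ["K2", "Z2"], "m3": ["K3", "Z3"]}
SEPARATE = {
    "m1": ["K1"], "m2": ["K2"], "m3": ["K3"],
    "m4": ["Z1"], "m5": ["Z2"], "m6": ["Z3"],
}


def survivors(layout, failed_machine):
    """Return (brokers, zk_nodes) still running after one machine fails."""
    alive = [
        service
        for machine, services in layout.items()
        if machine != failed_machine
        for service in services
    ]
    brokers = [s for s in alive if s.startswith("K")]
    zk_nodes = [s for s in alive if s.startswith("Z")]
    return brokers, zk_nodes


def has_quorum(zk_alive, ensemble_size):
    """A Zk ensemble stays available only with a strict majority alive."""
    return len(zk_alive) > ensemble_size // 2


# Losing m1 in the co-located layout costs a broker AND a Zk node.
print(survivors(CO_LOCATED, "m1"))
# Losing m1 in the separate layout costs only a broker.
print(survivors(SEPARATE, "m1"))
```

In both layouts the ensemble still has quorum after a single failure, but the co-located one has burned its only spare Zk node on what could have been just a broker outage.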

If there's a network partition between the leader Zk node and other nodes, the local Kafka broker won't actually be able to do much because the Zk cluster will elect a new leader, on another node, so again, you aren't gaining anything.

Moreover, you're now tying the scalability of Kafka to Zk. Zk doesn't scale linearly, so there are only so many nodes you can have in a cluster. Kafka, on the other hand, scales linearly. So if you're co-locating them and you have to bump up Kafka, do you still start up Zk on those nodes (even though they don't actually join the cluster)? You're now special-casing and adding more edge cases.
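
A quick back-of-the-envelope sketch of why Zk doesn't scale the way Kafka does, assuming the usual majority-quorum rule for voting members. The helper names here are mine, not part of any Zk API.

```python
def quorum_size(ensemble_size):
    """Voting members that must acknowledge a write (strict majority)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Members that can be down while a majority is still reachable."""
    return (ensemble_size - 1) // 2


for n in (3, 4, 5, 7, 9):
    print(f"ensemble={n} quorum={quorum_size(n)} "
          f"tolerates={tolerated_failures(n)}")
```

Every extra voting member raises the number of acks each write has to wait for, so growing the ensemble buys failure tolerance rather than write throughput. That's roughly why you'd keep the Zk ensemble small and fixed while the broker count grows.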