Hacker News
by takeda 3185 days ago
> For example, it's extremely easy to destroy a Kafka cluster by bringing a new, empty ZK server online with newer but incorrect data in its volume. ZK will happily trash the entire cluster thinking it has new instructions.

How does that happen? I mean, how does a new, empty ZK server end up with newer data than the rest of the cluster?

Also, please note that ZK is not meant to be a database but a coordination service. Its guarantee is that all nodes are always in a consistent state, and none of its nodes will accept changes when there's no quorum. So if a new node somehow has more recent data with a higher serial number, it's expected that the remaining nodes will sync to it.
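
To make the two ideas above concrete (majority quorum, and nodes syncing to whichever peer reports the highest serial number), here is a deliberately simplified toy model. It is not ZooKeeper's actual election or sync code; the `Node` class, the `zxid` field name, and both functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    zxid: int  # hypothetical field: highest transaction serial number this node has seen


def has_quorum(reachable: int, ensemble_size: int) -> bool:
    """A strict majority of the ensemble must be reachable to accept writes."""
    return reachable > ensemble_size // 2


def sync_to_newest(nodes: List[Node]) -> Node:
    """Toy rule: every node adopts the state of the node with the highest serial number."""
    newest = max(nodes, key=lambda n: n.zxid)
    for n in nodes:
        n.zxid = newest.zxid
    return newest


# A node that joins with a higher serial number "wins", whether or not its data is correct.
cluster = [Node("zk1", 10), Node("zk2", 10), Node("rogue", 99)]
winner = sync_to_newest(cluster)
```

Under this toy rule the serial number is the only thing compared, which is why a node carrying "bad" but "new" data can overwrite a healthy ensemble.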

Exactly right - in my case the situation was another team accidentally bringing a new ZK node with "bad" but "new" data online. Had there been network isolation, no issues. Had there been static cluster identifiers, also no issues. It was a messy environment, and it should have been prevented by operational diligence, but my point is redis is "harder to mess up". As an on-call engineer, I'll always go with simpler, foolproof tools. Another quibble is how gnarly the client-side driver for Kafka is...

I don't hate Kafka, I just don't like ZK and find redis has better tooling and a better track record at my shops :)

In order to connect a ZK host to the cluster, its IP needs to be included in the configuration of all the nodes.
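
As a rough illustration of that requirement, the sketch below checks whether a host appears in the `server.N=host:port:port` lines of every node's static configuration. The function names and the sample addresses are hypothetical, and this only models a statically configured ensemble, not setups that use dynamic reconfiguration or external service discovery.

```python
from typing import List, Set


def configured_hosts(cfg_text: str) -> Set[str]:
    """Collect the host part of every `server.N=host:port:port` line."""
    hosts = set()
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("server.") and "=" in line:
            _, value = line.split("=", 1)
            hosts.add(value.split(":")[0].strip())
    return hosts


def listed_everywhere(host: str, configs: List[str]) -> bool:
    """True only if `host` is listed in each node's configuration (and there is at least one)."""
    return bool(configs) and all(host in configured_hosts(c) for c in configs)


# Hypothetical configs for a three-node ensemble; 10.0.0.9 is not listed anywhere.
cfg = "tickTime=2000\nserver.1=10.0.0.1:2888:3888\nserver.2=10.0.0.2:2888:3888\nserver.3=10.0.0.3:2888:3888\n"
all_configs = [cfg, cfg, cfg]
```

With these sample configs, `listed_everywhere("10.0.0.2", all_configs)` holds while `listed_everywhere("10.0.0.9", all_configs)` does not, which is the sense in which an unlisted host should not be able to join by accident.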

It's hard to accidentally add a node to a cluster. A person who can "accidentally" add a ZK node has enough permissions to do a lot more devastating things accidentally.

Yep. All it takes is service discovery and a jr. sysadmin who isn't totally familiar with ZK.

This is all in service to my point about simplicity and safety.