| Another FYI-type comment, I guess. :) Some of my more gripey statements here may be outdated, so DYOR.

Dynamic reconfig in 3.5 addresses the "restarting every ZooKeeper instance" problem. [0] You stand up an initial quorum with a seed config, then tie in new servers with "reconfig -add". Not sure how well it would tie into cloudy autoscaling stuff though. I wouldn't start there myself.

A much bigger pain IMO is the handling of DNS in the official Java ZK client earlier than 3.4.13/3.5.5 (and by association, Curator, ZkClient, etc.). [1] The former was released mid-2018 and the latter this year, so there's tons of stuff out there that just won't find a host if IPs change. If you "own" all the clients it's maybe not a problem, but if you've got a lot of services owned by a ton of teams it's ... challenging.

Even with the fix for ZOOKEEPER-2184 in place, I'm pretty sure DNS lookups are only retried if a connect fails, so there's still the issue of IPs "swapping" unexpectedly at the wrong time in cloud environments. That can lead to a ZK server in cluster A talking to a ZK server in cluster B (or worse: clients of cluster A talking to cluster B, mistakenly thinking that they're talking to cluster A). I'm sure this problem's not unique to ZK though. Authentication helps prevent the worst-case scenarios, but I'm not sure if it helps from an uptime perspective.

TL;DR: ZK in the cloud can get messy (even if you play it relatively "safe").

[0] https://zookeeper.apache.org/doc/r3.5.5/zookeeperReconfig.ht...
[1] https://issues.apache.org/jira/browse/ZOOKEEPER-2184 |
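
To make the DNS point a bit more concrete, here's a rough Python sketch of the two behaviors described above. It is not the ZK client's actual code: the class names, the `resolve` callback, and the `reachable` callback are all made up for illustration, and the "re-resolve only after a failed connect" logic just models what the comment guesses the ZOOKEEPER-2184 fix does.

```python
from typing import Callable, List

# Hypothetical callback types, for illustration only.
Resolver = Callable[[str], List[str]]   # hostname -> list of IP strings
Reachable = Callable[[str], bool]       # IP string -> can we connect?


class ResolveOnceClient:
    """Models the older behavior: resolve at startup, keep the IPs forever."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self._ips = resolve(host)

    def connect(self, reachable: Reachable) -> str:
        for ip in self._ips:
            if reachable(ip):
                return ip
        raise ConnectionError("no cached address is reachable")


class ReResolveOnFailureClient:
    """Models the assumed post-fix behavior: look the name up again,
    but only after every cached address has failed to connect."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self._host = host
        self._resolve = resolve
        self._ips = resolve(host)

    def connect(self, reachable: Reachable) -> str:
        for ip in self._ips:
            if reachable(ip):
                return ip  # a stale-but-reachable IP still wins here
        # All cached addresses failed: refresh from DNS and retry once.
        self._ips = self._resolve(self._host)
        for ip in self._ips:
            if reachable(ip):
                return ip
        raise ConnectionError("no address is reachable after re-resolving")
```

If the old IP simply goes dead, the first client never recovers and the second one does. If the old IP gets handed to some other machine that happens to accept connections (the "swapping" case), the second client keeps using it without ever asking DNS again, which is the cluster A / cluster B mix-up.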