by Ygor
4055 days ago
Zombie ZooKeeper nodes that still appear as healthy members of the cluster after an OOM can cause major problems. There are two quick fixes on the operational side that prevent this:

- Run each ZK server node with the JVM OnOutOfMemoryError flag, e.g. -XX:OnOutOfMemoryError="kill -9 %p"
- Have your monitoring detect an OOM in the zookeeper.out log, and use your supervisor to restart the failing ZK node.

ZooKeeper is designed to fail fast, so any OOM should cause an immediate process shutdown, followed of course by an automatic start of a new process by whatever is supervising it.
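For the second option, the log check could look roughly like the sketch below. It is only an illustration, not a tested monitoring setup: the function names are made up here, the marker is the string the JVM prints for an out-of-memory error, and the restart command is whatever your own supervisor expects (that part is purely an assumption).

```python
import subprocess
from typing import Iterable, List, Tuple

# String the JVM prints when it reports an out-of-memory condition.
OOM_MARKER = "java.lang.OutOfMemoryError"


def contains_oom(lines: Iterable[str]) -> bool:
    """Return True if any log line mentions an OutOfMemoryError."""
    return any(OOM_MARKER in line for line in lines)


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Read what was appended to the log since byte `offset`.

    Returns the new lines and the offset to pass in on the next poll.
    """
    with open(path, "rb") as log:
        log.seek(offset)
        data = log.read()
    text = data.decode("utf-8", errors="replace")
    return text.splitlines(), offset + len(data)


def restart_node(command: List[str]) -> int:
    """Run the supervisor's restart command and return its exit code.

    The command is an assumption: pass in whatever your supervisor uses.
    """
    return subprocess.run(command, check=False).returncode
```

A poller would call read_new_lines on zookeeper.out every few seconds, keep the returned offset, and call restart_node when contains_oom is true. Log rotation is not handled here, so the offset would need resetting when the file shrinks.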