|
|
|
|
|
by treffer
869 days ago
|
|
I am somehow puzzled on how this goes into a downtime as implied in the article. The control plane can be fully down - as in: you can shut it down - and everything continues to run. I've been in that situation multiple times with large clusters. E.g. one etcd node having disk issues, pretty much turning it into a lame service (even worse than true down). Kubelets got randomly regarded as non-healthy due to update latency. But everything continued to run. Another time API servers were leaking memory, and with that crashing, causing some herding that would crash each server as it comes up. No issues whatsoever, migrate to larger instances. It's a pretty cool feature of kubernetes. I am wondering what was done to let this cascade like this. The only thing I could imagine is that someone _wiped_ the etcd state, then brought it up, casing all things to go down. |
|