by Apocalypse_666
2459 days ago
After having just spent most of yesterday trying to nurse a failing Kubernetes cluster back to health (taking down all of our production websites in the process), I’ve come to completely loathe it. It felt more like practicing medicine than engineering: trying different actions to suppress symptoms instead of figuring out a root cause to actually prevent this from happening again. While it is a pretty sweet system when it does run, I would strongly advise against anyone trying to manage their own cluster, as it is simply too complex to debug on your own, and there is precious little information out there to help you.
|
If only I could!! That’s exactly the frustrating part: there seems to be no way of grokking what goes on under the hood, and there are so many different ways of setting up a cluster, very few of which have any information about them online whatsoever.
As a practical example, what happened yesterday was that all of a sudden my pods could no longer resolve DNS lookups (took a while to figure out that that was what was going on, no fun when all your sites are down and customers are on the phone). Logging into the nodes, we found out about half of them had iptables disabled (but still worked somehow?). You try to figure out what’s going on, but there are about 12 containers running in tandem to enable networking in the first place (what’s Calico again? KubeDNS? CoreDNS? I set it up a year ago, can’t remember now...) and Googling is to no avail, because your setup is unique and nobody else was harebrained enough to set up their own cluster and blog about it. Commence the random sequence of commands I’ll never remember until by some miracle things seem to fix themselves. Now it’s just waiting for this to happen again, and not being one step closer to fixing it.