| Out of interest, what was wrong with it and how did you fix it? In 4 years I've never come across a cluster I was unable to fix, nor has one really broken without someone taking an inadvisable action on it. This may simply be because I started early enough that I was forced to configure the components manually, and so came to understand the underlying system well enough. Over time I have seen some interesting things though: - Changing the overlay network on running servers was probably the silliest thing I've done. This wasn't on production, but figuring out where all the files were and deleting them was pretty much undocumented. - A few years back somebody ran an HA cluster without setting it as HA, which resulted in occasional races where services kept changing IP addresses. I believe the ability to do this was patched out. - An upgrade once caused a doubling of all pods. This was back when deployments were alpha/beta and they changed how they were referenced in the underlying system, causing deployments to forget their replicasets, etc. Overall though, in 4 years I've spent very little time debugging clusters and more time debugging apps, which is what we want. |
You’re basically saying “tool X is fine, you’re just inexperienced/undisciplined and using it wrong”. Which would be a fair critique if I were an intern, but I have a decade+ of experience in development and operations and I look at Kubernetes in disbelief - why should things be that complicated? I get it, everything is pluggable and configurable, but surely this must be balanced out by making it more approachable and convenient?
You can’t sneeze in Kubernetes without it requiring you to generate some SSL certs, to the point where it’s just cargo-culting without any consideration of purpose and security.
And what’s up with the dozens and dozens of bloated YAMLs and Go files? The fresh, 30-odd-commit “official” Flink operator is 3 THOUSAND lines of Go and 5 THOUSAND lines of YAML. How is that reasonable? In which universe is that reasonable? All it does is run a for-loop that overwrites a bunch of pods to keep their spec in sync with the desired config. There’s like a 1000:1 boilerplate ratio in Kubernetes and it’s considered good somehow?
Sorry for the rant, I’m just angry that we’re six decades into software engineering and the newest, hottest project in the newest, hottest line of work behaves like everybody should be paid per line of code they produce.