| I want someone to check me on a strongly held belief, and I promise I have an open mind. There’s nothing wrong with experimenting with k8s in your org, and nothing wrong with k8s itself; it’s complex, yes, but that’s not inherently problematic. But it’s been my experience over the last several jobs that many orgs are throwing mission-critical workloads onto k8s, hinging a lot of people’s productivity on it, and asking operators and engineers to essentially figure it out as they go, which is sheer madness to me. From the rants and vents I see in other tech communities, I’m not the only one making this observation. Change my mind: unless you have razor-sharp engineers who know k8s well enough to give you more than a few days of availability before liveness probes start falling off the face of the planet, AND who can make this their sole focus, maybe you’re not ready for k8s? |
Running k8s at scale is very challenging because of its complexity, and it comes with a learning curve that you probably don't want to be climbing in the middle of live production incidents.
It's a big tool for a big problem that most people don't have.
I have seen k8s used in production three times. The first was a very slow, measured, long-term rollout on owned hardware, which was generally quite successful. It paid dividends against the previous home-rolled Ansible/Docker-based solution with manual container scheduling: it improved allocation, moved networking definition to a much more declarative, "shift-left" model where engineers define their network topology directly, and improved insight into the system using off-the-shelf tools.
I've also seen k8s used in a very basic fashion on GKE in a mostly painless way - basically just send it and it works.
The worst k8s situation I've seen is one where a startup's GKE infrastructure was migrated to a self-hosted k8s cluster that had been cobbled together and had never been scaled up before. Nobody understood the failure points of the system, trivial mistakes caused frequent outages, and as engineers lost faith in the system they started blaming k8s for application-level issues. Diving headfirst into a complex system with a production workload is a recipe for pain.