Hacker News
by dvtrn 1386 days ago
I want someone to check me on a strongly held belief, I promise I have an open mind:

There’s nothing wrong with experimenting with k8s in your org. Nothing wrong with k8s, it’s complex yes but that’s not inherently problematic.

But it’s been my experience over the last several jobs that many orgs are throwing mission critical workloads and hinging a lot of people’s productivity onto k8s and asking operators and engineers to ostensibly figure it out as they go. And from rants and vents I see on other tech communities, I’m not the only one making this observation.

Which is sheer madness to me.

Change my mind: unless you have razor-sharp engineers who know k8s well enough to give you more than a few days of availability before liveness probes start falling off the face of the planet, AND who can make this their sole focus, maybe you’re not ready for k8s?

8 comments

Yes, I 100% agree with this take. Engineers need to understand how k8s works at least at a surface level to engineer their service topology, and whatever devops/SRE/ops team will support k8s needs to scale up on test workloads and simulated production workloads and understand how to debug the system before it can be operated at scale.

Running k8s at scale is very challenging due to its complexity and comes with a learning curve that you probably don't want to be ramping through in live-site production issues.

It's a big tool for a big problem that most people don't have.

I have seen k8s used in production 3 times. The first was a very slow, measured, long-term rollout on owned hardware, which was generally speaking quite successful. It paid dividends over the previous home-rolled Ansible/Docker based solution with manual container scheduling by improving allocation, by moving network definition to a much more declarative / "shift-left" model where engineers define their network topology directly, and by improving insight into the system using off-the-shelf tools.

I've also seen k8s used in a very basic fashion on GKE in a mostly painless way - basically just send it and it works.

The worst k8s situation I've seen is one where a startup's GKE infrastructure was migrated into a self-hosted k8s cluster which was cobbled together and had never been scaled up before. Nobody understood the failure points of the system, trivial mistakes caused frequent outages, and as engineers lost faith in the system they started blaming k8s for application level issues. Diving headfirst into a complex system with a production workload is a recipe for pain.

> It's a big tool for a big problem that most people don't have.

A thousand times yes. To a lesser extent, I would apply this to containerization generally.

If your needs are met by some Ansible/Chef/Puppet plays/recipes on a VPS, then you should embrace the simplicity and immediate flexibility you have, and just go that route.

I have seen so much over-engineering to run simple workloads in containers that never needed to be scaled like that in the first place.

I would take this a step further to say that an extremely high % (like 95%) of organizations using k8s do not need it and would be much better off not using it.

k8s is really just a lot of Linuxisms put together in a semi-novel, intelligent, and pluggable way. That list of features u/orf posted is all possible with plain Linux. You can start with a really minimal k8s, almost like plain containers on Linux. But going from one machine and one container/workload to literally 100k containers, 100s of machines and workloads working together in secure ways is really difficult without the organization that k8s APIs, interfaces, automations, and concepts impose on Linux.

I've helped an 80% new team survive Black Friday and make $10M revenue in 3 hours in a 6 week engagement. I helped a 100% new team build their first CI/CD system to build/run/test their aircraft embedded system in containers. I’ve helped credit card companies scale dozens of clusters to 150k services under management backed by over 250k containers, per cluster.

And it’s the same code that will run on a 10 year old PC or RPi3B as a single instance. It really is a story about the power of Linux.

I think you have a lot of people coming into the general backend/business-logic tech space who are not nearly enough of "computer people" or engineering-minded to actually be responsible for the uptime of a mission critical workload. They might think they are, but they don't know what it actually takes. (Your "operators and engineers", ostensibly, CAN handle this, but they need to be up to the task too.)

K8s among other things is a way of trying to abstract away the most computer/OS/network-literate parts of a software organization, so that you can be productive, as an org, hiring some computer people along with plenty of the readily-available "smart people who can write code." The former keeps your software running, the latter represent your business as code. It's a way of solving for the hiring market you HAVE, rather than the hiring market of 20 yrs ago. But it hinges on those deep-tech people being up to the task.

I don't think it's just orgs, as there are a lot of engineers that seem to have bought into the `k8s == awesome` mentality. I like Kubernetes and I think it has a lot to recommend it, but it's not a single silver bullet for all problems.

> But it’s been my experience over the last several jobs that many orgs are throwing mission critical workloads and hinging a lot of people’s productivity onto <INSERT_TECH_STACK_HERE> and asking operators and engineers to ostensibly figure it out as they go.

Some FANGs even pride themselves on how they get their army of inexperienced SDEs that on average stay 2 to 3 years with the company to JIT-onboard onto anything.

I think people want to pretend that deployment best practices "don't matter" with k8s but of course they do

As a start, see how many people just take the biggest Docker image they can find and install several irrelevant packages to run their service. They don't optimize or limit memory or CPU usage. Their service doesn't follow the twelve-factor app principles, not even at the basic level (like fail fast, etc.).
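To make "fail fast" concrete, here is a rough sketch of what it could look like at service startup: read configuration from the environment and refuse to start if anything is missing or malformed, so the orchestrator sees a failed start immediately instead of a half-working service. The setting names (`DATABASE_URL`, `PORT`) and the helper names are hypothetical, picked only for illustration.

```python
import os
import sys


class ConfigError(Exception):
    """Raised when required configuration is missing or invalid."""


def load_config(env):
    """Build a config dict from an environment-like mapping, failing fast on bad input.

    DATABASE_URL and PORT are hypothetical setting names used for illustration.
    """
    # Collect every missing setting so the error message lists them all at once.
    missing = [name for name in ("DATABASE_URL", "PORT") if not env.get(name)]
    if missing:
        raise ConfigError("missing required settings: " + ", ".join(missing))

    try:
        port = int(env["PORT"])
    except ValueError:
        raise ConfigError("PORT must be an integer, got %r" % env["PORT"])

    if not 0 < port < 65536:
        raise ConfigError("PORT out of range: %d" % port)

    return {"database_url": env["DATABASE_URL"], "port": port}


def main():
    """Return a process exit code: 0 on a clean start, 1 if configuration is bad."""
    try:
        config = load_config(os.environ)
    except ConfigError as exc:
        # Exit non-zero right away rather than starting in a broken state.
        print("startup aborted: %s" % exc, file=sys.stderr)
        return 1
    print("starting on port %d" % config["port"])
    return 0
```

Calling `sys.exit(main())` from the entry point means a misconfigured container exits with a non-zero status as soon as it starts, which is much easier to spot than a service that boots and then fails on its first request.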

Complexity is inherently problematic - where does the opposite idea stem from?

> Complexity is inherently problematic

I wouldn't necessarily agree, but curious to hear this thought developed some more, if you'd be willing to indulge me? Maybe I'm missing some implied context here that you have experience with that could change my mind.

> where does the opposite idea stem from?

Good question. For me personally, it stems from my subscription to the dichotomy of accidental vs. essential complexity as written about here (and other places) https://simplicable.com/new/accidental-complexity-vs-essenti...

Sometimes complexity is unavoidable, but I don't conflate something being complex with a thing being complicated. Complicated systems, I would agree, are inherently problematic.

(also, your username just made me very hungry for a falafel sandwich. Excuse me for a moment)

Our ability to build, understand, analyse, evolve, secure and troubleshoot systems depends on our ability to reason about and understand them. The more complex a system is, the less we are able to do the aforementioned things.

This is a wonderful point, and it doesn’t mean that some people can’t hold “all of k8s” in their heads. At least theoretically some can, most can’t.

The simpler a system is, typically the easier it is to explain and maintain. In many businesses that takes a front seat. Some do need this level of complexity though. They’re both valid when applied appropriately.