by nostrebored 869 days ago
My question when looking at Kubernetes for small teams is always the same. Why?

In the blog, there are multiple days of downtime, a complete cluster rebuild, a description of how individual experts have to be crowned as the technology is too complex to jump in and out of in any real production environment, handling versioning of helm and k8s, a description of managing the underlying scripts to rebuild for disaster (I'm assuming there's a data persistence/backup step here that goes unmentioned!), and on, and on and on.

When you're already using cloud primitives, why not use your existing expertise there, their serverless offerings, and learn the IaC tooling of choice for that provider?

Yes, it will be more expensive on your cloud bill. But when you measure the TCO, is it really?
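
To make the TCO question concrete, here is a back-of-the-envelope sketch in Python. Every figure in it (bills, hours, hourly rate) is an invented placeholder, not data from the blog post or from anyone in this thread; the only point is that engineer time spent operating the platform belongs in the same sum as the cloud bill.

```python
# Rough total-cost-of-ownership comparison.
# All numbers are made-up placeholders for illustration only.

def monthly_tco(cloud_bill: float, ops_hours: float, hourly_rate: float) -> float:
    """Cloud bill plus the cost of engineer time spent operating the platform."""
    return cloud_bill + ops_hours * hourly_rate

# Hypothetical self-managed Kubernetes: lower bill, more hours of upkeep.
k8s = monthly_tco(cloud_bill=2_000, ops_hours=60, hourly_rate=100)

# Hypothetical serverless/managed setup: higher bill, little upkeep.
managed = monthly_tco(cloud_bill=3_500, ops_hours=10, hourly_rate=100)

print(f"k8s: ${k8s:,.0f}/month, managed: ${managed:,.0f}/month")
```

With these placeholder inputs the setup with the higher bill comes out cheaper overall; with different inputs it may not, which is why the hours have to be measured rather than guessed.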

10 comments

My experience with Kubernetes has been mostly bad. I always see an explosion of complexity and there is something that needs fixing all the time. The knowledge required comes on top of the existing stack.

Maybe I'm biased and just have the wrong kind of projects, but so far everything I encountered could be built with a simple tech stack on virtual or native hardware. A reverse proxy/webserver, some frontend library/framework, a backend, database, maybe some queues/logs/caching solutions on any server Linux distribution. Maintenance is minimal, dirt cheap, no vendor lock-in and easy to teach. Is everyone building the next Amazon/Netflix/Google and needing to scale to infinity? I feel there is such a huge amount of software and companies that will never require or benefit from Kubernetes.

Company CTOs, in my experience, are very easily sold on the idea of infinite scalability. In practice not many companies reach that point, but many that go down this road have to build on top of dozens of layers of compute/networking abstractions that only a few experts on the team, if any, can manage competently.

I think the cost of self-managed Linux VMs and monoliths is smaller than the cloud vendors made it seem.

Containers are nice when you have to deal with a language like Python and its packaging ecosystem, but when Go/Rust/.Net/etc binaries are placed in containers as well... I think we have kind of lost sight of what we're trying to solve in real life.

Monoliths are so much easier for smaller teams. No additional tooling needed, no service discovery, instead of network calls you have function calls, can share resources, etc. Much less overhead as well, so you may not even need to scale. The number of requests a single Go/Rust server can handle on a dedicated machine is insanely high with modern hardware.
Same exact question I ask every single time. We just decided against k8s, again, in 2024. We are going to go with AWS ECS and Azure Container Apps (the infra has to exist in both clouds).

ECS and Container Apps provide all the benefits of k8s without the cons. What we want is to be able to execute container (Docker) images with autoscaling and control which groups of instances can talk to each other. What we do not want to do:

- learn all of the error modes of k8s

- learn all the network modes of k8s

- learn the tooling of k8s (and the pitfalls)

- learn how to embed yaml into yaml the right way (I have seen some of the tools doing this)

- do upgrades of k8s and figure out what has changed in a backward-incompatible way

- learn how to manage certificates for k8s the right way

- learn how to debug DNS issues in a distributed system (https://github.com/kubernetes/kubernetes/issues/110550 and many more)

I could go on and on but many people and companies figured out the hard way that k8s complexity is not justified.

Do people try to push it that strongly for small teams? Lots of us work on bigger teams and enjoy more of the benefits.

However, I also still use Kubernetes for my personal projects, because I really appreciate the level of abstraction it supplies. Everyone always points out that you can do all the things k8s does in other ways, but what I like about it is that it defines a common way to do everything. I don't care that there are 50 ways to do it, I just like having one way.

What this allows is for tools to seamlessly work together. It is trivial to have all sorts of cool functionality with minimal configuration.

> because I really appreciate the level of abstraction it supplies

which are?

I am seriously asking. I use docker-compose for some of the things I do, but it never occurred to me during my 20 years in systems engineering that k8s offers any kind of great abstraction. For small systems it is easy to use docker (for example running a database for testing). For larger projects there are so many alternatives to k8s that are better, including the major cloud vendor offerings, that I have a really hard time justifying even considering k8s. This comes after years of the carnage it left behind: failure after failure, and even customers reaching out to me in a panic because of timeouts or other issues that nobody could resolve, after they had been sold the idea that k8s has a "great level of abstraction" and had put it into production.

> I don't care that there are 50 ways to do it, I just like having one way.

Seeing everything as a nail...

>> because I really appreciate the level of abstraction it supplies

> which are?

When I am creating a new service/application, I just need to define in my resource what I need... listening ports, persistent storage, CPU, memory, ingress, etc... then I am free to change how those are provided without having to change the app. If a new, better, storage provider comes along, I can switch it out without changing anything on my app.
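
As a minimal sketch of what "define in my resource what I need" can look like, here is a Deployment plus a PersistentVolumeClaim built as plain Python dicts and dumped as JSON. The service name, image, port, and sizes are hypothetical placeholders, not anything from the commenter's setup. The claim asks for storage by size only; with no storageClassName set, the cluster's default storage class decides what backs it, which is one way the provider can be swapped without touching the app.

```python
import json

# Hypothetical service: name, image, port, and sizes are placeholders.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "my-service"},
    "spec": {
        "replicas": 1,
        "selector": {"matchLabels": {"app": "my-service"}},
        "template": {
            "metadata": {"labels": {"app": "my-service"}},
            "spec": {
                "containers": [{
                    "name": "app",
                    "image": "registry.example.com/my-service:1.0.0",
                    "ports": [{"containerPort": 8080}],        # listening port
                    "resources": {                              # CPU and memory
                        "requests": {"cpu": "250m", "memory": "256Mi"},
                        "limits": {"memory": "512Mi"},
                    },
                    "volumeMounts": [{"name": "data", "mountPath": "/var/lib/app"}],
                }],
                "volumes": [{
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": "my-service-data"},
                }],
            },
        },
    },
}

# Persistent storage requested by size; no storage class is named here.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "my-service-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# Wrap both objects in a List so they can be submitted together.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [deployment, pvc]}, indent=2
)
print(manifest_json)
```

Piping the output into `kubectl apply -f -` is one way to submit it, since kubectl accepts JSON as well as YAML.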

At my work, we have on premise clusters as well as cloud clusters, and I can move my workloads between them seamlessly. In the cloud, we use EBS backed volumes, but my app doesn't need to care. On the on-prem clusters, we use longhorn, but again my app doesn't care. In AWS, we use the ELB as our ingress, but my app doesn't care... on prem, I use metallb, but my app doesn't care.

I just specify that I need a cert and a URL, and each cluster is set up to update DNS and get me a cert. I don't have to worry about DNS or certs expiring. When I deploy my app to a different cluster, that all gets updated automatically.
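
The comment does not say which tools provide this. One common arrangement, assumed here purely for illustration, is cert-manager issuing the certificate and ExternalDNS publishing the hostname, both driven by a single Ingress resource. The sketch below builds such an Ingress as a Python dict; the hostname, service name, and issuer name are hypothetical.

```python
import json

def build_ingress(host: str, service: str, issuer: str) -> dict:
    """Build an Ingress that requests a TLS cert and routes `host` to `service`.

    Assumes cert-manager is installed and a ClusterIssuer called `issuer`
    exists in the cluster; both are assumptions, not details from the thread.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": service,
            # cert-manager watches this annotation and issues the certificate.
            "annotations": {"cert-manager.io/cluster-issuer": issuer},
        },
        "spec": {
            # The issued certificate is stored in this Secret.
            "tls": [{"hosts": [host], "secretName": f"{service}-tls"}],
            "rules": [{
                "host": host,  # a DNS controller can publish this name
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service, "port": {"number": 80}}},
                }]},
            }],
        },
    }

ingress = build_ingress("app.example.com", "my-service", "letsencrypt")
print(json.dumps(ingress, indent=2))
```

Deploying the same resource to a different cluster leaves the manifest unchanged; only what the controllers do behind it differs.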

I also get monitoring for free. Prometheus knows how to discover my services and gather metrics, no matter where I deploy. For log processing, when a new tool comes out, I can plug it in with a few lines of configuration.

The kubernetes resource model provides a standard way to define my stuff. Other services know how to read that resource model and interact with it. If I need something different, I can create my own CRD and controller.

I am able to run a database using a cluster controller with my on prem cluster without having to manage individual nodes. Anyone who has run a database cluster manually knows hardware maintenance or failure is a whole thing... with controllers and k8s nodes, I just need to use node drain and my controller will know how to move the cluster members to different nodes. I can update and upgrade the hardware without having to do anything special. Hardware patching is way easier.

The k8s model forces you to specify how your service should handle node failure, and nodes coming in or out are built into the model from the beginning. It forces you to think about horizontal scaling, failover, and maintenance from the beginning, and gives a standard way for it to work. When you do a node drain, every single app deployed to the cluster knows what to do, and the maintainer doesn't have to think about it.
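
One piece of that "standard way" is the PodDisruptionBudget, which tells a node drain how many pods of an app must stay up. Below is a minimal sketch with a hypothetical app label and threshold, plus a deliberately simplified version of the eviction check (the real controller also accounts for pod readiness and percentage-based budgets).

```python
import json

# Hypothetical budget: keep at least 2 pods labelled app=my-service running.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "my-service"},
    "spec": {
        "minAvailable": 2,
        "selector": {"matchLabels": {"app": "my-service"}},
    },
}

def can_evict(healthy_pods: int, min_available: int) -> bool:
    """Simplified rule: an eviction is allowed only if the app would still
    have at least `min_available` healthy pods afterwards."""
    return healthy_pods - 1 >= min_available

print(json.dumps(pdb, indent=2))
print(can_evict(healthy_pods=3, min_available=pdb["spec"]["minAvailable"]))
print(can_evict(healthy_pods=2, min_available=pdb["spec"]["minAvailable"]))
```

With three healthy pods the drain may evict one; with only two it has to wait until a replacement is running elsewhere.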

>> I don't care that there are 50 ways to do it, I just like having one way.

> Seeing everything as a nail...

I don't think that is a fair comparison, because you can create CRDs if your model doesn't fit any existing resource. However, even when you create a CRD, it is still a standard resource that hooks into all of the k8s lifecycle management, and you become part of that ecosystem.

> listening ports, persistent storage, CPU, memory, ingress

These exist without k8s. I do not need a complex abstraction hiding the ways I need to talk to persistent storage. In fact, I believe it is impossible to create such an abstraction without very serious compromises.

> In AWS, we use the ELB as our ingress, but my app doesn't care

Your app does not care without k8s. Running python -m http.server does not even know what ELB is. I get it though. You are using k8s as IaC.

> These exist without k8s.

That was exactly my earlier point... of course you can do everything in k8s in other ways, but in the end you have to pick ONE way your company/team is going to do it... why not pick a well defined way, that new hires can already know, that has a ton of tooling available, and works together cohesively?

Yes, I can build each part myself, but why?

> Your app does not care without k8s. Running python -m http.server does not even know what ELB is. I get it though. You are using k8s as IaC.

Sure, but I still need a way to deploy my app, and to move it to a different location when I do hardware maintenance, and a way to get a DNS address that routes to my app.

At my shop, using k8s, I can deploy a brand new service, with a cert, a url, and a place to run it, in a few minutes. I don't have to talk to anyone, I don't have to use any other tools or have to click on any buttons, I just helm install or kubectl apply and my service is running. I don't have to ask the datacenter ops people to find me a server, or get budget for a new AWS instance. I can deploy to an existing cluster and use a small bit of the infrastructure. I don't have to scale my individual service, I can scale the whole cluster for all services.

It is just so much easier to be a developer in this world.

> I do not need a complex abstraction hiding the ways I need to talk to persistent storage. In fact, I believe it is impossible to create such an abstraction without very serious compromises.

That's a pretty interesting take, considering EBS is itself a block device abstraction over network-attached storage, and a pretty complex one at that, with a huge price premium.

I mean, the file system itself is an abstraction.
Yes. Whenever I look at a company with less than 20 people with EKS in their stack, I don't go any further. It is such a colossal waste of velocity for a small business or early startup.

As someone who is very pro cloud -- one of my worst experiences working at a cloud provider was a push from on high to sell our customers on a 'cloud modernization initiative' that centered on managed kubernetes. At the time, most of my customers were struggling with creating a stateless app, much less horizontal scaling and managing an enterprise-grade compute abstraction layer.

I think K8S is a great tool with a dedicated team and a platform built around it to meet the way that your company ships infrastructure. But what I've just mentioned only makes sense fiscally in the high X00's count or more of engineers.

> Do people try to push it that strongly for small teams?

Yes. You have to understand that a lot of people without the benefit of experience will often base their technology choices on blog posts. K8S has a lot of mindshare and blog attention, so it gets seen as the only way to run a container in a production environment, while all the important aspects of it are ignored.

I get that, but I just get frustrated in the same way I get frustrated with all the "you don't need it" responses to any topic... what about all of us that DO work for bigger companies and DO need to use this stuff? Where can we gather to talk about it without being constantly told we don't need the features?
They don’t read those blogs. And if they do, the decision makers have enough experience to know that “your dog blog doesn’t need k8s” doesn’t apply to their 100000 mau app
I am literally one of the decision makers at a larger company, with more than 10000 servers in hundreds of data centers around the world.

Yes, I am experienced and smart enough to know the statements that don't apply to me. My frustration is that I want to discuss the best tools and techniques the industry is exploring, but every time I start to have those conversations, someone comments that I don't need it.

You're in the wrong spaces. I don't know where you should be to have those conversations, but I imagine it involves (social / interpersonal) networking. You need to be talking to people in the same role or at the same level as you.

Places like hacker news, or reddit, or twitter, are all full of random people, many of whom are just beginning their journey. Recommending multi node orchestration when they'd struggle to get nginx running on its own would be inappropriate. They don't need k8s. There's a significant danger of cargo culting here.

> I also still use Kubernetes for my personal projects

Of course. You're going with what you know for the foundation, then building the thing you're interested in (personal projects) on top of that.

It's a good way of using your time efficiently. Nothing at all wrong with that.

This. It's the npm install 100 packages and do everything with JS vs Rails arguments all over again.
> But when you measure the TCO, is it really?

I'm in the ML space, and at every small company I try to avoid EKS. Then I hate my life. Sagemaker, for example, is a giant abstracted-away mess with random holes (i.e. these types of jobs don't work on this GPU type, etc.) compared to just running things on EKS. The same goes for trying to deploy a more complex third-party application. I could just deploy their Helm chart or I could spend a lot of time deploying it somehow in our environment.

I had to sell Sagemaker and I agree. It is the wrong abstraction layer without the right escape hatches.

I am super pro Ray for handling these types of workloads now. Huge shout out to anyone here working on maintaining that project.

I see it all the time at different layers of the stack. At some point some knowledge is lost due to people turnover and the solution is to change the technology, instead of paying somebody full time to re-understand it. Why not rewrite XXX part in YYY language as nobody understands XXX anymore? Linux VMs require a good sysadmin with a taste for digging into existing scripts and playbooks. With Kubernetes we can start from scratch and we only need a Kubernetes expert! (or so they say).

Right now I'm working a lot with an oversized maven configuration that nobody understands; I'm paid only to dig into it and maybe refactor some parts. It's made way too complicated for the task and does a lot of non-standard stuff to work around problems it created itself. But when I arrived people were blaming jenkins and wanted to move to gitlab because jenkins was becoming too complicated to work around maven (also!). Next thing you know somebody could try kubernetes or moving to the cloud or switching from RedHat to NixOS or whatever, and the problem would still be maven.

k8s is half-baked at best but people enjoy copy-paste yaml recipes, which half-baked products lend themselves to, so it is loved
I work for a US subsidiary of a very large oil company. We are migrating from Azure to AWS for many things (it is deemed "OneCloud"). A very large number of our new EC2 instances, and even our EKS instances, were provisioned within the last 6 months as T2 instances. Some, if we were lucky, were T3. T3 was released 10 years ago. Copy + paste indeed.
Think of the cost savings though!
I would think it depends more on technology requirements than on the size of the team. If all you need is some variation of a LAMP stack, then you'd probably be better off with a PaaS like render, fly or the like.
Totally a valid point!

I think size of team matters, as the impact of k8s ownership as a fraction of your development velocity changes immensely once you're able to afford a platform team who can build tooling to remove the cognitive load of deploying to and managing k8s. At a ~400-engineer company I worked at, k8s bugs that actually impacted our team were in the single digits over a year, but a large part of that was the platform team that managed the ecosystem around k8s deployments.

Especially considering that the author seems to be using some Azure specific features anyway:

> While being vendor-agnostic is a great idea, for us, it came with a high opportunity cost. After a while, we decided to go all-in on AKS-related Azure products, like the container registry, security scanning, auth, etc. For us, this resulted in an improved developer experience, simplified security (centralized access management with Azure Entra Id), and more, which led to faster time-to-market and reduced costs (volume benefits).

We're starting to use k8s as a small team because the simpler offerings with GPUs available don't meet our needs. It's clear they're either built for someone else or are less reliable than an EKS cluster would be.
I'd encourage you to look at the problem space and evaluate if ECS or an external abstraction layer (like Ray) meets your needs.

I've seen both work in completely separate domains (e.g. inference on real time video streams vs. model building) -- but obviously ymmv, tech is a big domain and pretending I understand exactly what you're doing would be silly. Sometimes there is a real answer to the why!

Ah, well today I learned about ECS. I guess we’ll migrate to that once I need to add complexity to our EKS setup.

I’m new to this stuff, so it’s hard to dig through all of the possible different solutions.

I looked into Ray a bit but it seemed a little too complicated vs. just running a CUDA accelerated docker container. Most of the streamlined solutions in this space are not made for full stack web developers deploying a service that happens to need a GPU. They’re for ML devs who are trying to own the production side of their part of the product.

Our setup works very very well.

And in smaller setups you would have a shared cluster or a fully managed one like GKE etc.