These costs are highly inflated. Not sure why you need 4 people to operate a small 6 node cluster. From my own personal experience, one guy can do that part time. Your cluster has redundancy, so most problems can wait to be dealt with during regular hours.
Having 4 people is actually not enough I would argue.
BUT!
That's because I advocate for people being able to go on vacations without worrying about work and to be able to not be on-call 24/7/365.
I.e. you don't need them full time and you also need them for all other options, because you need to cover on-call for anything you use. Nobody at AWS or Google will so much as blink when your precious infrastructure blows up on you and you don't know what to do. You need brains to work on those things yourself. You want to only have people on-call every 5 or 6 weeks.
How exactly to calculate the percentages on those salaries, that I am not sure of, but it definitely isn't a full time job to babysit infrastructure like that with 4 full-time brains doing nothing but watching your monitoring dashboards!
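For what it's worth, here is a rough sketch of that calculation. It assumes a simple weekly rotation and that each person spends only a fraction of their time on cluster ops. The function name and the 10% figure are made up for illustration; they do not come from the article.

```python
from dataclasses import dataclass


@dataclass
class OnCallEstimate:
    weeks_between_shifts: int          # each person is on call once every N weeks
    shifts_per_person_per_year: float  # weekly shifts per person per year
    ops_fte: float                     # full-time equivalents spent on cluster ops


def estimate_on_call(team_size: int, ops_fraction_per_person: float) -> OnCallEstimate:
    """Estimate on-call load for a weekly rotation (an assumed model).

    ops_fraction_per_person is the share of each salary attributed to
    operating the cluster. It is a guess you have to supply yourself.
    """
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    if not 0.0 <= ops_fraction_per_person <= 1.0:
        raise ValueError("ops_fraction_per_person must be between 0 and 1")
    return OnCallEstimate(
        weeks_between_shifts=team_size,
        shifts_per_person_per_year=52 / team_size,
        ops_fte=team_size * ops_fraction_per_person,
    )


# Hypothetical numbers: 5 people in the rotation, each spending 10% on ops.
estimate = estimate_on_call(team_size=5, ops_fraction_per_person=0.10)
print(estimate)
```

With those made-up inputs you get a rotation where everyone is on call every fifth week, but you only attribute about half of one salary to the cluster instead of four full ones.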
As someone who operates four medium-sized (20-200 nodes) clusters as a very small part of my job on a two-person team, I agree. I spend most of my days writing new operators to automate infrastructure, not manually managing kube. The same team also owns several large Kafka brokers handling several billion messages a day, the full CI/CD pipeline, the logging/metrics/tracing stack, a sharded Postgres operator, authn/z, Terraform automation, etc.
We need more people, but kube is the least of our headaches.
As a counterpoint from my own personal experience, time spent managing kubernetes is time NOT spent introducing new features.
I once had the displeasure of watching an upstart data science team at a boutique portfolio management firm break under the weight of k8s management. The team was great at critical thinking, risk-modeling, and statistical analysis. They knew very little about infrastructure, such as networking fundamentals or CPU/Memory management. This team went from helping our firm rapidly develop a sentiment analysis model that sifts through social media for trading signals to wrestling with kubectl all day. Productivity bombed. Team was disbanded about a year after they started using k8s.
It's a shame the author emphasized and inflated the discrete costs of operating a cluster because that emphasis and exaggeration distracts from the TRUE cost ... opportunity cost. We want our engineering teams doing what they do best, writing code and adding new features. Not bogged down in managing the plumbing/infrastructure. Happy to have that abstracted away.
Why do I need four people for on call with a self hosted cluster but only one for a managed cluster? My application still needs the on call support.
I think this cost estimate does not add up.
You're right, having only one person on call for a managed cluster doesn't make a lot of sense. We should probably have planned with at least 2 people for a managed cluster too to cover 24/7/365 operations.
I think our thought process here is that developers are also involved in on call support for the service availability and the k8s cluster availability is mostly managed by the provider, but the cluster can still fail even if the control plane is managed.
A self-managed cluster needs networking and some kind of persistent volume storage, and the nodes themselves need to be somewhat maintained.
I think you could get one person to be on-call for all those things, personally. But then I think that person should not be on-call for application support (i.e., not for the things running inside k8s; they would be the person the on-call application developer/administrator calls if they couldn't debug issues with networking, for instance).
> For all these offerings [GKE, EKS, AKS], there are no automatic version updates or auto-recovery and you still need to pay for the computing resources like CPU, memory, and ephemeral storage that your worker pods consume.
I don’t know about the others, but GKE certainly has automatic updates (for masters and nodes). There’s also auto-repair and a backup feature (is this “auto-recovery”?). GKE has also had autopilot for nearly a year, which bills based on Pod requests.
This comes off as a bit of a FUD piece to sell the idea of serverless, like what Microsoft did with their Linux TCO play in the 2000s.
When Google was just Larry and Sergey, they didn't need an additional SRE person to help them set up and maintain their linux servers - they had enough competence to do so themselves. For someone else, renting a managed Windows server with a support agreement would be exactly the right choice.
Likewise, the 1xSRE for managed and 4xSRE for unmanaged K8s isn't substantiated. In my experience, that can be anything from 0 dedicated people and up depending on needs and circumstances. I imagine things like GKE Autopilot (auto-updates) and Fargate for EKS (serverless K8s pods) make it even easier.
It would be more interesting to me to see how their offering compares to other serverless vendors in price and features.
The AKS control plane is provided cost free. And you can run stateless applications in k8s for years without them needing any maintenance.
I agree that almost nothing in the article is realistic, even if I have reached a similar conclusion in different ways.
- Kubernetes is easy if you run a large number of stateless applications. There are easier ways to do that in the cloud, though. I like the pattern of running the database in whatever way the DBA is used to (probably in a VM that they SSH/RDP into).
- Kubernetes is great if you deploy your product for clients in different clouds. You tell them you need a k8s cluster of API version XYZ and their staff provision one for you. That way you can support Azure, GCP, Oracle and AWS without having to learn many of their APIs. That abstraction is leaky, though; hopefully your clients can set up the ingress controllers and storage correctly.
- If you are running your app for your own org in the cloud and you don't intend to move, you probably don't need the extra layers of complexity. Running managed k8s is a cloud API over a cloud API.
- If you don't run anything stateful in Kubernetes, it's much easier. There is plenty of SaaS you can use, for example: RDS for the DB, Datadog for monitoring, email providers, managed Redis. Use Kubernetes only for code you've written and whose behaviour you understand. Don't be tempted to install many things with Helm, just the bare essentials.
- Kubernetes is great if you are a really large organization, because if you standardise on it you standardise your infrastructure, and you can move engineers between projects. And you can have an ops silo again! /s
- If you are a big org with a DevOps team, you can create a custom PaaS over Kubernetes. I.e. when a dev runs make deploy staging1, that actually wraps kubectl and magically builds, tests and deploys the current code.
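To make the last bullet a bit more concrete, here is a minimal sketch of the kind of wrapper a make deploy staging1 target might call. Everything specific in it is an assumption for illustration: the image name, the k8s/ manifest directory, the make test target, and the convention of one namespace per environment. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def deploy_commands(environment: str, image: str,
                    manifest_dir: str = "k8s/") -> list[list[str]]:
    """Return the build, test and deploy steps as argument lists.

    Assumes one Kubernetes namespace per environment (e.g. "staging1")
    and a `make test` target in the repository; both are hypothetical.
    """
    return [
        ["docker", "build", "-t", image, "."],   # build the current code
        ["make", "test"],                        # run the test suite
        ["docker", "push", image],               # publish the image
        ["kubectl", "apply", "-f", manifest_dir, "--namespace", environment],
    ]


def deploy(environment: str, image: str, dry_run: bool = True) -> None:
    """Print each step and, unless dry_run is set, execute it in order."""
    for cmd in deploy_commands(environment, image):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


# Dry run with a made-up registry and tag: prints the four commands only.
deploy("staging1", "registry.example.com/myapp:abc123")
```

The point is that developers never have to touch kubectl directly; the DevOps team owns this one script and can change what happens behind it.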
Meaningless analysis. If you're the kind of org where running a 6 node cluster needs 4 people performing continuous ops, you're not going to succeed. Good luck.
I believe this is nonsense and most of all clickbait...
Most clients I encounter care more about being able to run their workloads dynamically than about what it costs to administer their clusters. Tools like Crossplane, Argo and a lot of others make it simple to (almost) run your clusters on autopilot.
Kubernetes is not cheap, but being your "own hoster" never really was. I can tell you what is going to be expensive for you, though: vendor lock-in to a non-open-source service, and putting all of your eggs into one basket. Even if that basket claims to have servers internationally.
Just look at what happened with the creation of the Cluster API: we are moving into a future where all you need to truly autopilot your workloads is a little bit of <insert language of your choice>. I just don't get how, in a time where everything can easily be automated, from code commit through testing to canary deployments and rollbacks, and with a community that delivers top-of-the-line code on a constant basis, for free, this could even be compared to what I would consider more or less shared hosting.
Stating that you need a 4-person DevOps team for a 6-node cluster is just sad. Also, you seem to be replacing your nodes annually too, so that makes extra sense =)
I get that you like to market your product, but as someone who has been doing freelance DevOps for a while now: please don't be ridiculous.
I think they could get around the problem of requiring a third-party cluster manager if they really thought through the chicken-and-egg problems, and concentrated the circular dependencies into a very small core.
I’ve had the experience of working through the design of a system over and over until circular dependencies can be tamed through a bootstrap procedure.
We focused on estimating the minimum/entry-level cost of Kubernetes here.
If you have a data intensive service, it surely would add up, but it's not specific to Kubernetes. If you go with VMs or a Serverless deployment, you'll have to pay it too.
If you're speaking about the storage and data transfers related to the Kubernetes control plane itself, I don't believe it represents a significant cost, even with a large cluster.