Hacker News
by mikeocool 2 days ago
I made this decision at a startup (albeit when the eng team was ~30 people, and we had a monolith with ~10 supporting services). I wouldn’t do it again, even for the reasons stated in the article.

The uniformity is nice: we were moving from apps running directly on EC2 instances provisioned with Ansible. Each time we spun up a new service it was a process to get the EC2 instances provisioned just so.

But k8s is such a pain in the ass. One thing that I think people new to it don’t realize is that it’s not at all batteries included - to get a basic managed cluster setup, you’re still going to be installing a bunch of additional controllers (ingress, cert-manager, external dns to start). And then you’re on the hook for making sure all those processes stay up (hope the admission webhook controller for a critical resource doesn’t go down!). Then you’ve got to do a major upgrade on not only your cluster, but all of those controllers every ~3 months. And no one is shy about introducing breaking changes.

Also you’re introducing a huge amount of complexity with the k8s networking and dns layer that most startups have zero need for (if you’re on EKS, make sure to read about scaling and monitoring CoreDNS).

I think there is a real hole in the market for a simple solution that lets you deploy some containers to some instances in a declarative fashion without all of that complexity and does decent LTS versions. I imagine there’s something out there that does this, but k8s has really sucked up all the oxygen.


Pretty sure if there was a simple alternative, people would hate it.

Everyone initially wants thing A. But then they want to customize it to do all permutations and combinations n of A, B, C. They want it to be extensible. They want redundancy. They want orchestration. They want integration.

It’s why practically every config file format eventually becomes its own scripting language. Even HTML started off simple — now ridiculously complex — all the more ironic since practically nobody writes it by hand. Instead of CSS simplifying it, it became more complex.

There is another thing that is extremely customizable and extensible. It’s called a programming language. People write programs to solve specific problems.

There seems to be a perverse trend of cobbling together a Byzantine mesh of libraries, plugins, and services with complex configuration files to make it do practically everything possible. We just used to write software for such purposes…

And for anyone who thinks HTML is simple… the A (anchor) tag has a “ping” attribute that results in POST requests to a list of URLs when a link is clicked! The list of attributes and resulting variations in behavior is quite mind-boggling. It was supposed to be a damn link! https://html.spec.whatwg.org/multipage/links.html

I don't think you can provide all the features of Kubernetes while reducing the complexity. What is possible is to support a subset of the features of Kubernetes while making it easy to use.

https://github.com/openrundev/openrun is a project I am building. It supports declarative deployments, on a single node with Docker or onto Kubernetes. The target use case is limited to standalone web apps, like internal tools. There is no support for stateful services; you manage those yourself. With that simplification, OpenRun provides a much easier developer experience.

I look forward to the evolution of your project into a less standardized Kubernetes as end users request more and more features of your project.
Targeting a specific use case (internal tools) should hopefully help avoid feature creep. Also, the goal is that an OpenRun config should work on a single-node with Docker and with Kubernetes. That limits the types of features which can be implemented (for example no Docker Compose support, no Helm support).
Wow, I didn't know about ping attributes.

Advertisers have really shaped the Web right down to its core specifications.

There are already simpler alternatives, and yes people hate them too. Usually for the opposite reason of k8s: something they need isn’t included, and now bringing it is difficult or impossible.

Fargate and Cloud Run first come to mind.

Totally agree with you. K8s ends up being the simplest solution for a very complex problem
+1 on the problem of moving complexity from programming languages to configuration.

One of the main problems here is that programming languages typically have lots of tools to help validate correctness, whereas configuration tools are typically either much less mature or woefully underused.

There is nothing more frustrating than something failing due to a misconfiguration when you've no idea what the correct value should be.

My helm adventures (a while back now, so maybe better these days):

> You have an error in your config on line 1. Good luck.

Wait, did it actually say "Good luck" in the error message? If so, that is hilarious.
It was supposed to primarily be a link target as well as possibly a source. Berners-Lee guessed the ratio of in to out wrong there.
>Pretty sure if there was a simple alternative, people would hate it.

>Everyone initially wants thing A. But then they want to customize it to do all permutations and combinations n of A, B, C.

Oh, I wouldn't be so sure of that. Think about Eclipse vs IntelliJ. Ten years ago Eclipse had all the features, and IntelliJ didn't, so it was fast. Most developers don't use all the features, so most developers were very happy to move to IntelliJ, even though it was not free. Then IntelliJ spent the next decade building lots of features it didn't have. Now everyone wants off of IntelliJ, because it's no longer fast. Now it's got a lot of "useless" features like Eclipse too.

not true. The market OFTEN prefers simple things over complicated things.
But there is money in complexity.
There is money in simplicity as well. The market demands and prefers it.
This is so untrue that it's not even wrong.
Ahem:

“Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.” ― Edsger Wybe Dijkstra

I really wish there was an 80% kubernetes. I think you could get there with some changes:

1. No overlay networks. 1 IP per machine. pods use dynamically allocated ports, and the kubelet enforces pods listen only on their assigned ports using seccomp.

2. No kube-proxy or equivalent Layer-4 "load-balancer". It's not good, but it's often used. You should use some kind of Layer-7 load balancing instead. Also you need to look up the port number from (1). This also greatly lessens the need for DNS.

3. A better config language. YAML and helm templates are terrible. kustomize is built into kubectl, but it's frustratingly limiting and also still very complicated. Something like nix would have been great. This can make it easier to upgrade third party configs since you can have more logic to validate and merge your settings with upstream defaults or templates.

4. Maybe an eBPF-like for the api server? If the built-in k8s objects don't have a setting for something, then you need to write an operator or control loop yourself and then run that too, which is a big lift. Over time, k8s just keeps adding more and more built-in things and then revising them, which creates a ton of churn. If you could easily script simple operations, then they wouldn't have to build in every permutation ahead of time. E.g. the HorizontalPodAutoscaler has 24 config object types with several fields each, but all it does is set replicas based on data read from the api-server, so it could be replaced by some kind of flexible script that runs in the control plane.
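For a sense of how small that core logic is: the Kubernetes docs describe the HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default). A minimal Python sketch of that calculation, of the kind of script the comment imagines, is below; the function name and the min/max defaults are hypothetical, and this is not the actual controller code.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Sketch of the documented HPA scaling rule (hypothetical helper)."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        # Scale proportionally to how far the metric is from its target.
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas at 90% CPU against a 50% target gives ceil(4 × 1.8) = 8 replicas.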

Unless you hate HCL, 1, 2, and 3 pretty much describe Nomad exactly. We run over 100k production applications on Nomad. But migration to AWS from private data centers, our HashiCorp bill, and the severe lack of Nomad talent, have finally pushed us to k8s (EKS).
Unfortunate that Nomad hasn't gotten the attention it deserves.
I wish there was an EKS-like for Nomad!
1. You can't force third-party software to do that. There are programs with hardcoded ports. There are programs which require XML modifications and container rebuilds to change the port number. If your platform does not support launching unmodified containers, it is severely restricted and not suitable for general use. All my programs always use port 8080 for HTTP; I don't make it configurable because I have no reason to.

2. Does not work for all protocols. Again, your solution restricts you to HTTP-based protocols. That might work for many uses, but the restriction still doesn't sound very good. A universal load balancer is much simpler conceptually.

3. YAML is not terrible. YAML is awesome. Kubernetes manifests are terrible, that I agree with. Docker Compose is nice, for example. Kubernetes manifests felt like they were designed to be generated from something, but everyone ended up writing them directly or with templates. Though I think that XML is generally a superior format, so I'd vote for XML in the end.

Overall your suggestions look like you want to shift complexity from the cluster operator to the software developer. I'm not sure the industry supports that; recently it seems to be moving in the opposite direction, but that's an interesting perspective. I guess with some wrappers for some containers it could be made usable.

But honestly you just want to throw away years of progress in containers and network namespaces. I understand that kubernetes mechanisms are somewhat complicated, but the core idea is to make pods look like virtual machines and I think this is very worthy idea.

Even with all its complexity, k8s doesn’t solve every problem — good luck running an FTP server or anything that needs to dynamically allocate a large range of ports on k8s.

I would absolutely trade flexibility for simplicity, particularly for edge cases like hardcoded ports.

I believe this is more like Borg if anything.
If these ideas served some useful purpose, they would already be implemented in kubernetes. The platform is quite extensible.
This reminds me of the joke about the economists who spot a $100 bill on the sidewalk.
I don't know... running a startup sized kubernetes is relatively easy and pain free these days (k3s). Especially when it comes to scaling up.

CNPG is an absolute monster (in a good way). cert-manager is easier than the docker alternative, and calico has never failed me (except in BGP mode, which has some footguns, like not being able to come back from a dead state because of a chicken-and-egg problem unless you point it at the external load balancer, which I would have known if I had read the documentation). traefik is all you need. Talos Linux largely mitigates the bare metal problems and comes pre-hardened and pre-optimized.

I solo most of my development projects and have used k3s for all of them. The only complaint is that cert-manager by default will fail silently and your certificates will expire. I largely mitigated this by having proper visibility set up via Grafana and automated alerts (warning if certificates are about to expire), which I should have done anyway.
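The comment doesn't say how the expiry alert is built, and Grafana is what was actually used. As an independent, standard-library-only sketch of the same check, the Python below connects to a host, reads the served certificate's expiry and warns when fewer than a threshold of days remain. The function names and the 14-day threshold are assumptions for illustration.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if already expired).

    not_after uses the format Python's ssl module reports, e.g. "Jun 11 12:00:00 2030 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the served certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def check_host(host: str, warn_days: float = 14) -> bool:
    """Print a warning and return False if the certificate expires within warn_days."""
    remaining = days_until_expiry(fetch_not_after(host))
    if remaining < warn_days:
        print(f"WARNING: certificate for {host} expires in {remaining:.1f} days")
        return False
    return True
```

Because the default context verifies certificates, an already-expired certificate makes `fetch_not_after` raise `ssl.SSLCertVerificationError` instead of returning a date, which a cron job or monitoring wrapper would surface as a failure too.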

Two years ago I'd agree, today with LLMs everything I have runs talos with fully automated updates and I haven't had to be on-call for almost a year.

I think the parent would wish for something close to what Heroku represented (what would that be today?).

K8s is easier at smaller scales (I understand k3s is a packaged version?), but you still need one or two people on your team who properly understand all of the concepts and inner workings of k8s, and who are able to dive neck-deep into it if/when shit hits the fan.

For a small team that's a lot of commitment for something that is usually not their bread and butter, and that they wish they could build once and only slightly tweak every year or so.

even with just k3s and a few plugins/operators, it still takes someone dedicated to babying it. I've been running a k3s cluster at home for years and I dread upgrading all the things running on it, and all the things needed to keep it running.

and more to that last point, we haven't talked about maintaining the actual nodes themselves yet.

if you don't use alpha or beta annotations you rarely have to worry about updates, kubernetes has a very strong "do not break it" policy on non beta/alpha annotations.
Not using beta ingress was probably a non-starter for a lot of people, since it was the only option for 4 years.

Then there was an upgrade process that required a fair amount of coordination between when you changed your manifests, when you upgraded your cluster and when you upgraded your ingress controller.

PodSecurityPolicies also gained a lot of traction and didn’t really have an alternative before it was deprecated.

Also, custom operators don’t all subscribe to the don’t break non-beta resources in the same way core does.

> cert-manager is easier than the docker alternative

  MDomain blog.kronis.dev
I'm not saying that cert-manager isn't nice, but with regular Docker/Compose/Swarm setups you can just run a web server/load balancer on whatever ports you want. With mod_md the above is all I really need in a regular .conf file to provision LetsEncrypt certs for my blog (very similar with something like Caddy too). And it's the same in Docker as it is when running the web server directly, I think that's why starting with Docker is really nice, because it has fewer custom abstractions and sometimes regular software does things elegantly already.
I’d take a cronjob running cert-bot and some monitoring to ensure a domain’s cert isn’t about to expire over cert-manager any day.

IIRC cert-manager has about three layers of custom resources to comb through when figuring out why a cert isn’t renewing.

> I think there is a real hole in the market for a simple solution that lets you deploy some containers to some instances in a declarative fashion without all of that complexity and does decent LTS versions

HashiCorp's Nomad is basically just that, and it supports various ways of running stuff too, which is neat. Shame about the license change, which basically killed all my interest in it, so it seems the hole is indeed still unfilled.

For simple cases I just launch podman containers on long lived hosts with ansible.

You can still add pods if needed and the systemd integration works.

Plus you can actually improve isolation by co-hosting services under separate UIDs.

Like any container setup it is just co-hosting, and elasticity is a bit slower with autoscaling instances, but it removes most of the complexity of K8s, which very few orgs benefit from or have the culture to support.

Yeah I’ve always meant to check out nomad and never had an opportunity.

Though as I recall, it makes heavy use of Consul, which I have used in anger, and that makes me a little wary (though that experience is likely very out of date).

It doesn't require Consul IIRC, but a bunch of features do depend on it, like service discovery and related stuff. But Nomad is totally usable without Consul for simpler setups.
They’ve now had nomad-native services without consul for a while, including health checks!
Oh neat, wasn't aware! Cheers for providing the update, now if they only could revert the license it might be something one could use again.
I've been using Nomad for years without Consul. Maybe if you have complex networking requirements it is worth it; otherwise you don't really need it.
Nomad has gained basic service discovery and K/V store without Consul. However, health checking is extremely limited.
AWS ECS and GCP Cloud Run are this. Run a container on abstract compute. But they aren't "without all that complexity" because it turns out all that complexity is required for even simple use cases: load balancing with SSL certs, cloud API keys, deployment pipelines, sidecars, etc.
Those are hosted services? Completely different class of solution.
As CTO of a small startup trying to cut costs, I find HashiCorp Nomad on bare metal a joy to work with.

Some self-reloading HAProxy in nomad to automatically assign URLs to services when needed. Could have used Consul but meh.
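The comment doesn't show its setup. In Nomad this kind of thing is commonly done with a job's `template` block (which can query Nomad's native service catalog via the `nomadService` function) and a reload of HAProxy whenever the rendered file changes. The Python below only illustrates the rendering step: it turns a hypothetical map of service name → (public hostname, instance addresses) into HAProxy host-based routing. The input shape, names and addresses are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical input: service name -> (public hostname, [(ip, port), ...]).
Services = Dict[str, Tuple[str, List[Tuple[str, int]]]]


def render_haproxy(services: Services) -> str:
    """Render an HAProxy frontend that routes by Host header, plus one backend per service."""
    lines = ["frontend http_in", "    bind *:80"]
    for name in sorted(services):
        hostname, _ = services[name]
        # Match the Host header case-insensitively and send it to the service's backend.
        lines.append(f"    acl host_{name} hdr(host) -i {hostname}")
        lines.append(f"    use_backend be_{name} if host_{name}")
    for name in sorted(services):
        _, instances = services[name]
        lines.append("")
        lines.append(f"backend be_{name}")
        for i, (ip, port) in enumerate(instances):
            # Dynamically allocated ports end up here, one server line per instance.
            lines.append(f"    server {name}-{i} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_haproxy({
        "api": ("api.example.com", [("10.0.0.5", 23456), ("10.0.0.6", 24001)]),
    }))
```

Sorting the service names keeps the output stable, so a reload is only triggered when the set of services or instances actually changes.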

Tailscale for private networking.

> One thing that I think people new to it don’t realize is that it’s not at all batteries included - to get a basic managed cluster setup, you’re still going to be installing a bunch of additional controllers (ingress, cert-manager, external dns to start).

And if you can do this again, what's your solution to reverse proxy, certificate management, DNS...etc? I guess you can docker-compose some custom stack on a single machine, maybe add one more machine then you can say it's HA enough for small scale. But you can also spend the same amount of time to install those kubernetes controllers with zero customization. In my experience, if you go with the default configuration, most of the well-maintained k8s components are boring as hell these days.

> (if you’re on EKS, make sure to read about scaling and monitoring CoreDNS)

If load to your service increases, you need to scale up/out your service. This is universally true. Do you have a proprietary solution that's easier and more reliable than bumping up the replicas count in kubernetes?

There are lots of design decisions in Kubernetes that I hate. But if you want me to choose between Kubernetes and any proprietary stack, in 2026, I would definitely choose Kubernetes.

I use NixOS with nginx + acme / caddy, coredns and no docker anywhere. It's extremely homogeneous, easy to scale out (add another flake output, deploy to a new server, update DNS records). You could easily automate some of that with more nix, but I don't bother because that's already only like 50 lines of config.

I have a strong preference for renting bare metal and it has served me extremely well.

I totally believe this works for you. But in your case, isn't NixOS just another declarative orchestration system like Kubernetes? Similarly I can just run a standalone nginx, caddy with acme, and a coredns pod in a bare minimum k8s cluster.

Personally, I think the complexity is on the same level.

It really isn't comparable. Sure, nixpkgs is huge, but the surface area for what you need to understand and work with is considerably smaller. They aren't even really in the same domain anyways. I was able to get very comfortable with Nix(OS) in a single weekend, but it took me months to get to a similar level with the K8s ecosystem.
NixOS has no "runtime" or controlplane to maintain.
I don’t have an answer I’m in love with today, I basically just want less moving parts.

As for EKS, having to monitor and manually scale the built in DNS service or else my queries are just going to stop resolving is not the type of thing I expect to have to manage on a managed service. I see they have finally released autoscaling for CoreDNS, though it took them 6 years.
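For context on what "autoscaling CoreDNS" usually means: the upstream cluster-proportional-autoscaler, often used for cluster DNS, has a "linear" mode that sizes replicas from cluster size, replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica)), clamped to min/max, with at least 2 replicas on multi-node clusters when preventSinglePointFailure is set. Whether EKS's managed CoreDNS autoscaling uses exactly this rule isn't stated here. A sketch of that formula follows; treating `max_replicas=0` as "no upper bound" is an assumption of this sketch.

```python
import math


def linear_replicas(cores: int, nodes: int,
                    cores_per_replica: float = 256,
                    nodes_per_replica: float = 16,
                    min_replicas: int = 1,
                    max_replicas: int = 0,
                    prevent_single_point_failure: bool = True) -> int:
    """Sketch of the cluster-proportional-autoscaler 'linear' sizing rule."""
    # Whichever dimension (CPU cores or node count) demands more replicas wins.
    replicas = max(math.ceil(cores / cores_per_replica),
                   math.ceil(nodes / nodes_per_replica))
    if prevent_single_point_failure and nodes > 1:
        # Never run a single DNS replica on a multi-node cluster.
        replicas = max(replicas, 2)
    replicas = max(replicas, min_replicas)
    if max_replicas > 0:  # 0 means "no upper bound" in this sketch
        replicas = min(replicas, max_replicas)
    return replicas
```

With the defaults above, a 100-node, 1024-core cluster gets max(4, 7) = 7 replicas.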

Accidental complexity and essential complexity. There is no working system that achieves all the stated aims with fewer parts. [1]

[1] https://en.wikipedia.org/wiki/No_Silver_Bullet

I've been building multi-cluster Kubernetes for some time, and things like External DNS and Ingress controllers per app are just non-starters. Having K8S orchestrate things external to the cluster always felt kludgy; they're anti-patterns IMO.
Docker swarm is that simple solution you're looking for. But people don't need simple solutions. They want scalable solutions and Kubernetes fits this niche perfectly. You can deploy it on single server today and scale to 100 servers managed cluster tomorrow.

Just to provide a similar example: a Linux system is insanely complicated. The kernel alone has thousands of options. Distros have tens of thousands of packages. Wherever you look, everything is hard and complicated: firewall, containers, init system, filesystem hierarchy, storage layers. One would think that some people desire a simpler operating system. But everybody uses Linux despite all these complexities. Try to find OpenBSD in production, for example. It's not easy.

People think they want infinitely scalable solutions. But what I think they fail to realize is that by the time they actually hit a scale where it matters, they’re going to have to change so much about their infra that prematurely using the scalable solution didn’t really buy them much except a bunch of headaches when they didn’t have scale.
At our shop we're still cleaning up load-bearing tech debt 20 years later, a lot of which seems to have been built with an expectation "nah no way will this still be running next year"
Kubernetes zeitgeist aside, the naming confusion between the old and new things called Docker Swarm doomed the new one.
There was once a time when we could deploy software without spinning up 3 etcd databases, multiple controller processes constantly running event loops, and a virtual networking layer, before you even get off the ground.

Perhaps those days are behind us.

No way. In the future people will be able to vibe code entire businesses in optimized assembler
It's a shit blog article. A shell script is what 99% of businesses need.
k8s is not a pain. I would never return to something like Puppet / Ansible / etc ... to deploy bare EC2 instances; it's just reinventing the wheel badly.

Just use ECS / Fargate with an ALB in front if you need a simpler use case.

I've had the opposite experience. I used to run k8s on bare metal and was troubleshooting something at least once a month (DNS going down was a recurring favorite). The breaking point was the churn in the ecosystem: I got bitten by the deprecation of the community darling Weave Net CNI plugin, and the killing of the nginx ingress was the nail in the coffin. I had so many annotations tied to the ingress that migrating them would take longer than going the Ansible way, with k8s imposing tight upgrade schedules on top of that. While I agree Ansible feels a lot dirtier than k8s, I spend much less time on infrastructure, sleep better at night, and handling things like databases is much simpler.
Don't run k8s on bare metal if you don't have a good platform team though. Using a cloud provider ( eks / aks / gke ) is trivial and "problem" free.

I've never had DNS go down; the only thing I've seen is an app that created too many iptables entries when flooding DNS with requests (an app problem).

>I think there is a real hole in the market for a simple solution that lets you deploy some containers to some instances in a declarative fashion without all of that complexity

That's how I see it as well but it's really tough to go against the grain. I have a small enthusiastic community of users around Uncloud (https://github.com/psviderski/uncloud) who went full circle - fed up with k8s and came back to simple, boring declarative Compose deployments across a handful of interconnected hosts.

Uncloud is essentially a cluster version of Docker Compose without a control plane and cluster management overhead.

We started our core product on ECS, which is a declarative way to run a containerized service. It has been nice and reliable, but it has limitations (slow scaling, weird AWS quotas if you have ephemeral tasks).

We're moving our non-critical components onto EKS (pipelines, tooling, etc). We had one outage from runaway IP allocation in a subnet, but otherwise it's been pretty stable.
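The comment doesn't say what caused the runaway allocation, but subnet arithmetic is the usual constraint on EKS: with the default AWS VPC CNI, every pod gets a real VPC IP from the subnet, the plugin also keeps a warm pool of spare addresses attached to each node, and AWS reserves five addresses in every subnet. A small sketch of that headroom calculation follows; the per-node figure is a hypothetical input you would estimate from your own pods plus warm pool.

```python
import ipaddress

# AWS reserves the first four addresses (network, VPC router, DNS, future use)
# and the last one (broadcast) in every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Addresses left for nodes and pods after AWS's per-subnet reservation."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def max_nodes(cidr: str, ips_per_node: int) -> int:
    """How many nodes fit if each consumes ips_per_node addresses (node + pods + warm pool)."""
    return usable_ips(cidr) // ips_per_node
```

A /24 leaves 251 usable addresses, so at a hypothetical 30 addresses per node it fits only 8 nodes, which is how a busy subnet can run dry quickly.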

I do hear vague horror stories so I'm really not excited about moving our prod stack to it, but it's actually been really good for installing 3rd party software so far.

The pattern is pervasive. A big corp promotes a solution that fits its needs. People read about it and think adopting the big-corp solution means they are doing the right thing. Few people have big-corp needs, and even big corps differ from one another. And then endless hours are spent fighting a big-corp solution to a not-so-big-corp problem.
Isn't fargate or ecs that simple service?
Google's Cloud Run is also pretty simple.
I find them just as complicated as k8s.
Working with k8s myself I'm somewhere in between you and the article on opinion. I think k8s is good when you can afford to hire a person dedicated to managing it (or at least find someone with experience in running it that can make it part of their MO)

That is, k8s is probably best considered when you are beginning to consider having an infrastructure department, or if one of your early hires knows Kubernetes and is opinionated in a way that is less "throw cool and complex stuff at the wall"* and more "the 5 things I want in a k8s cluster that I don't want to spend much time on and should just work"

My understanding of the 2000s and 2010s was that there was a big focus on inventing self service deployment systems for developers, and k8s is that solution(!), for the same scale that would begin considering re-inventing the wheel internally anyways

We also had similar problems at our small-scale startup. We tried k8s for a pilot project, and our observation was the same: the complexity was not worth it for us. We needed something simpler, so instead we adopted Nomad, which actually fit our use case. It had its own issues and bugs, but overall, it was much more straightforward to work with.
Is the simpler thing not Lambda/Cloud Run (with terraform)?

On the application developer side k8s is awesome, but then you look inside the box and it melts your face off.

I'm not sure a middle ground exists unfortunately. It's either full service like Lambda or bag of knives like k8s.

Without Kubernetes you end up gluing a bunch of plumbing together anyway. It’s just that people don’t see the glue because they’re so used to it, and learning something outside of that framework creates friction.

This isn’t so different from say Linux vs BSD. You can roll your own things and call it a system. Or you can just use something that targets a spec to provide a (mostly) cohesive and consistent layer to build upon.

As the article mentions towards the end, AWS EKS, GCP GKE, and other competitors have made k8s setup turnkey. You can deploy a new cluster with all the controllers you mentioned in a single click / Terraform.
> I think there is a real hole in the market for a simple solution

Unless of course, all of the busywork that comes with kubernetes IS the value (to the engineer). Perhaps a bunch of engineers know at some level that locking the company into an overcomplicated cloud-within-a-cloud setup that has all sorts of weekly issues and requires constant work gives them a lot of job safety that they wouldn't get if they just used an AWS autoscaling group and you're done for the next 5 years.

Because simpler solutions DO exist (like a loadbalancer in front of an autoscale group, and not making a giant SOA for an app that orders you taxis, or books you a bnb or whatever nonsense).

Kamal is somewhere in the middle. Probably a little closer to a bunch of bash scripts. But it’ll get your container going pretty quick. Can take a bit of fiddling with SSH/docker-login. Plus it handles deployments very well.
I built canine.sh for exactly that reason, gives you a sensible deployment platform on top of k8s with one install, and you can customize it once you outgrow it.
Your portainer link is broken.
Nomad, Consul and Vault all running on VMs that you manage with Terraform.

The problem is that when you run this long enough you want K8s features anyway.

And your starter “production” deployment of the Nomad/Consul/Vault stack is literally 12 VMs, comprising three independent Raft clusters. There is no decent way to do zero-downtime instance replacement without building your own orchestration layer, but also they’ve had a years-long track record of shipping bad upgrades and following up with only manual remediations or workarounds instead of a fix.

As someone who has productionized and maintained truly hundreds of those clusters across several jobs, it is hard at this point for me to recommend Consul, Nomad, or Vault to anyone serious about building reliable applications. Too many broken upgrades and manual click-ops tasks just to keep them online. (…and I’ve said nothing of the actual product!)

This is a timely post. We are going to use Consul to replace the need for Internal Load Balancers. What issues do you have with it?
I'm in a similar boat and only somewhat agree. The gist of my post was that this exists but maybe just use Kubernetes anyway.

I don't entirely agree with your statement about zero-downtime instance replacement though. We built our terraform around doing one-at-a-time instance replacement and removing/adding nodes in Hashicorp Raft clusters is pretty much the easiest thing I've ever done with infrastructure.
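Why one-at-a-time replacement is the safe pattern follows from Raft's majority quorum: a cluster of n voters needs floor(n/2) + 1 of them to commit writes, so a 3-server cluster tolerates exactly one server being down. A minimal sketch of that arithmetic, plus a pre-flight check a replacement script might run before taking a node out, is below; the function names are hypothetical and not part of any HashiCorp tool.

```python
def quorum(voters: int) -> int:
    """Votes needed for a Raft majority among `voters` servers."""
    return voters // 2 + 1


def fault_tolerance(voters: int) -> int:
    """How many voters can be unavailable while the cluster keeps a quorum."""
    return (voters - 1) // 2


def safe_to_take_one_down(voters: int, healthy: int) -> bool:
    """True if stopping one healthy voter still leaves a quorum of the configured voters."""
    return healthy - 1 >= quorum(voters)
```

For a 3-server cluster, `safe_to_take_one_down(3, 3)` is true but `safe_to_take_one_down(3, 2)` is false, which is why a replacement script should wait for the new server to join and become healthy before moving on to the next one. Note that a 4-server cluster still tolerates only one failure.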

That's really always been the biggest selling point around Hashicorp's stuff for me. They made bootstrap and maintenance operations easy enough that a caveman could do it. Even recovering from problems isn't terribly hard unless you're already doing something stupid (Roblox outage).

I also have deployed and managed _hundreds_ of these over the last 8 years or so and I'm not really having the same problems that you do. But we don't upgrade to the latest and greatest because it _does_ take them a few versions to get their feature launches correct. This is mainly a Nomad problem now though -- consul and vault are pretty brainless to operate.

Still though, we _also_ use Kubernetes and I prefer it. Most of our software engineers don't though because they don't actually want to take the time to understand it, they just want to run binaries and forget about it.

> there is a real hole in the market for a simple solution that lets you deploy some containers to some instances in a declarative fashion without all of that complexity and does decent LTS versions

There's Nomad for this; I wish more teams would run Nomad.

to what extent would AWS EKS auto mode solve those problems?
"completely" in my experience
There was: Heroku

It was glorious.

>I think there is a real hole in the market for a simple solution that lets you deploy some containers

Containers? In this climate? What's the kernel LPE rate at after copyfail and copyfail2? No containers, VM or harden. No half measures.

If there's going to be something new, it needs to be topical, and containers are out.

> I think there is a real hole in the market for a simple solution that lets you deploy some containers to some instances in a declarative fashion without all of that complexity and does decent LTS versions. I imagine there’s something out there that does this, but k8s has really sucked up all the oxygen.

I mean, it's CDK and whatever equivalents other providers have, isn't it? If you fully embrace all the stuff they give you then it's straightforward to declare everything and it all works together. The downside is the vendor lock-in but unless you actively deploy to multiple environments, which most people don't, you're probably locked in in various ways without knowing about it.

I had exactly the same experience, and we went into the same direction (ansible + ec2). No regrets.
Docker swarm has been providing the simple solution for over a decade but nobody wants to use it because it's not k8s.
Dokku?
> k8s has really sucked up all the oxygen.

Because anything else involves making opinionated decisions that will be wrong for many users.

People who don’t understand why k8s is so widespread don’t understand all the problems it’s solving.

GCP cloud run is pretty close to this. We’re using it now and I’ve a lot of experience with gke.

They’ve announced persistent “instances” recently which solves a big problem for us - sometimes you want continual long running workloads.