by vundercind 869 days ago
In the bad old days of self-managing some servers with a few libvirt VMs and such, I’d have considered a 3-day outage such a shockingly bad outcome that I’d have totally reconsidered what I was doing.

And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?

4 comments

I've been running Kubernetes in production for two years and have never experienced anything remotely close to this. The worst is a node dies every now and then and, on a rare occasion, a workload doesn't happily migrate.

Of course, my experience is in no way authoritative, but referencing this type of incident as common is pretty foreign to me and may be mostly relegated to self-managed clusters.

GKE since 2017 here. Healthcare. I think we had one major outage that involved the cluster itself. It resolved itself and we never discovered what caused it. That was in the early days, so I recall very little.

Now I'm using Fly.io. They both have their advantages. Folks tend to make Kubernetes sound way more difficult than it is. It can be overkill, but it can also solve so many challenges out of the box. At least when it's managed. It'll cost you though.

> may be mostly relegated to self-managed clusters.

Foreign to me too, but it's not surprising that people report issues as common. There are a lot of footguns in Kubernetes that come from a lack of understanding.

You can build a robust Kubernetes cluster that hosts an application that’s nearly impossible to bring offline without an act of god; it just takes some know-how and a tiny bit of effort/experience.

> And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?

I'm honestly convinced it's half CV-driven development, and half just the fact that it's become the standard workaround for Python dependency hell. Python is still the easiest way to write software, and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't), so you have to use Docker, and apparently Kubernetes is the standard way you deploy Docker containers.

> apparently Kubernetes is the standard way you deploy Docker containers.

I bet more people actually use docker compose because the buy-in is that much smaller.

Anecdata, but in my experience, it's been podman for new deployments. Plenty of old stuff on Docker though. It's easier to grow out of Podman and into k8s than it is to go from compose, to swarm, then k8s. Easier to get buy-in for the ease of Docker from ops, easier to get leadership buy-in on the security of Podman. Such is life.

> Python is still the easiest way to write software

Try dotnet then

There are good things about dotnet (I'm more of a Scala person these days, but I have plenty of respect for F#), but there's nothing in there that lets you get up and running remotely as quickly as Python. (I mean, you don't even get a REPL without doing some messing around.)

> and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't)

This is complete bullshit.

I remember we were running 500 Solaris zones and 5000 VMware VMs over 2 datacenters with 0 major outages over 4 years. I remember a (single) VM crashing and it was a really big deal (it turned out to be a config issue, in retrospect a funny one, although our (internal) client lost some data). And I remember we were in "crisis mode" for a couple of weeks because of SAN storage issues, but there was no client interruption of more than 1 minute over those 2 weeks. One of our clients was running our app in a cross-datacenter cluster on bare metal with no interruption for over 20 years.

I'm not advocating for any of those specific solutions, and given the choice I would probably use something else. But when I see that my previous CTO wanted kube for single-VM deployments, and a former architect colleague wanted kube for apps that were going to be used by 3 to 5 clients maximum (and in both cases to be run by very small and untrained teams), I think the Kool-Aid has been more than drunk, and I'm now avoiding it like the plague.

Complexity and cost aren't bad when they help produce something of value that we wouldn't have otherwise.

For $150 I can fly round trip from New York to San Francisco, on a massively costly and complex giant noisy metal tube with two blades sticking out the sides that are so strong you could put a tank on each one and the blades still wouldn't droop. Why does it have to be so costly and complex, if I could do something simpler, like take a bus? Well, mostly to keep me from dying. But also to carry lots of luggage, keep costs down, and get me there 15x faster.

K8s does provide great value (as a dev tool), but lacks value in production features, and its design is shit. So I wouldn't say complexity and cost are the downside; it's the lacking production value that's the downside.