HN Mirror


	by vundercind 869 days ago
	In the bad old days of self-managing some servers with a few libvirt VMs and such, I’d have considered a 3-day outage such a shockingly bad outcome that I’d have totally reconsidered what I was doing. And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?

4 comments

aliasxneo 869 days ago

I've been running Kubernetes in production for two years and have never experienced anything remotely close to this. The worst is a node dies every now and then and, on a rare occasion, a workload doesn't happily migrate.

Of course, my experience is in no way authoritative, but referencing this type of incident as common is pretty foreign to me and may be mostly relegated to self-managed clusters.

goosejuice 869 days ago

GKE since 2017 here. Healthcare. I think we had one major outage that involved the cluster itself. It resolved itself and we never discovered what caused it. That was in the early days, so I recall very little.

Now I'm using Fly.io. They both have their advantages. Folks tend to make Kubernetes sound way more difficult than it is. It can be overkill, but it can also solve so many challenges out of the box, at least when it's managed. It'll cost you though.

JohnMakin 869 days ago

> may be mostly relegated to self-managed clusters.

Foreign to me too, but it's not surprising that people report issues as common. There are a lot of footguns in Kubernetes that come from a lack of understanding.

You can build a robust Kubernetes cluster that hosts an application that's nearly impossible to bring offline without an act of god; it just takes some know-how and a tiny bit of effort/experience.

lmm 869 days ago

> And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?

I'm honestly convinced it's half CV-driven development, and half just the fact that it's become the standard workaround for Python dependency hell. Python is still the easiest way to write software, and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't), so you have to use Docker, and apparently Kubernetes is the standard way you deploy Docker containers.

sshine 869 days ago

> apparently Kubernetes is the standard way you deploy Docker containers.

I bet more people actually use Docker Compose because the buy-in is that much smaller.

a_vanderbilt 869 days ago

Anecdata, but in my experience, it's been podman for new deployments. Plenty of old stuff on Docker though. It's easier to grow out of Podman and into k8s than it is to go from compose, to swarm, then k8s. Easier to get buy-in for the ease of Docker from ops, easier to get leadership buy-in on the security of Podman. Such is life.

hardware2win 869 days ago

> Python is still the easiest way to write software

Try dotnet then

lmm 868 days ago

There are good things about dotnet (I'm more of a Scala person these days, but I have plenty of respect for F#), but there's nothing in there that lets you get up and running remotely as quickly as Python. (I mean, you don't even get a REPL without doing some messing around.)

elzbardico 869 days ago

> and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't)

This is complete bullshit.

bionsystem 869 days ago

I remember we were running 500 Solaris zones and 5000 VMware VMs over 2 datacenters with 0 major outages over 4 years. I remember a (single) VM crashing and it was a really big deal (it turned out to be a config issue, in retrospect a funny one, although our internal client lost some data). And I remember we were in "crisis mode" for a couple of weeks because of SAN storage issues, but there was no client interruption of more than 1 minute over those 2 weeks. One of our clients was running our app in a cross-datacenter cluster on bare metal with no interruption for over 20 years.

I'm not advocating for any of those specific solutions, and given the choice I would probably use something else. But when I see that my previous CTO wanted kube for single-VM deployments, and a former architect colleague wanted kube for apps that were going to be used by 3 to 5 clients maximum (and in both cases to be run by very small and untrained teams), I think the Kool-Aid has been more than drunk, and I'm now avoiding it like the plague.

0xbadcafebee 869 days ago

Complexity and cost aren't bad when they help produce something of value that we wouldn't have otherwise.

For $150 I can fly round trip from New York to San Francisco, on a massively costly and complex giant noisy metal tube with two blades sticking out the sides that are so strong you could put a tank on each one and the blades still wouldn't droop. Why does it have to be so costly and complex, if I could do something simpler, like take a bus? Well, mostly to keep me from dying. But also to carry lots of luggage, keep costs down, and get me there 15x faster.

K8s does provide great value (as a dev tool), but lacks value in production features, and its design is shit. So I wouldn't say complexity and cost are the downside; it's the lacking production value that's the downside.
