| HN Mirror


by 0xbadcafebee 869 days ago

You aren't a real K8s admin until your self-managed cluster crashes hard and you have to spend 3 days trying to recover/rebuild it. Just dealing with the certs once they start expiring is a nightmare.
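As an editorial aside on the cert-expiry point: a minimal sketch of an early-warning check, assuming you have already collected each certificate's `notAfter` string (the value `openssl x509 -noout -enddate` prints after `notAfter=`). The certificate names and the 30-day threshold are hypothetical, and this is only one way to do it, not a recommended procedure for any particular cluster.

```python
import ssl
import time
from typing import Dict, List, Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter timestamp.

    `not_after` must look like "Jun  1 12:00:00 2030 GMT", the format
    that the stdlib helper ssl.cert_time_to_seconds() parses.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


def expiring_soon(
    certs: Dict[str, str],
    threshold_days: float = 30.0,
    now: Optional[float] = None,
) -> List[str]:
    """Names of certificates that expire within `threshold_days`, sorted."""
    return sorted(
        name
        for name, not_after in certs.items()
        if days_until_expiry(not_after, now) < threshold_days
    )
```

Run on a schedule from somewhere outside the cluster, a check like this turns "the certs started expiring" from an outage into a ticket.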

To avoid chicken-and-egg, your critical services (Drone, Vault, Bind) need to live outside of K8s in something stupid simple, like an ASG or a hot/cold EC2 pair.
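To make the chicken-and-egg problem concrete, here is a small sketch that treats startup dependencies as a graph and reports whether a cold start is possible at all. The dependency edges are invented for illustration (for example, assuming the cluster needs Vault for secrets and Bind for DNS); the thread does not describe any real topology.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def boot_order(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return a valid cold-start order, or None if the dependencies form a cycle.

    `deps` maps each component to the set of components that must be
    running before it can start.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Hypothetical layout 1: Vault and Bind run inside the cluster they support.
inside = {
    "k8s": {"vault", "bind"},
    "vault": {"k8s"},
    "bind": {"k8s"},
}

# Hypothetical layout 2: Vault and Bind live on plain VMs outside the cluster.
outside = {
    "k8s": {"vault", "bind"},
    "vault": set(),
    "bind": set(),
}
```

With the first layout `boot_order` returns `None`, because nothing can start first; with the second it returns an order that ends with `"k8s"`.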

I've mostly come to think of K8s as a development tool. It makes it quick and easy for devs to mock up a software architecture and run it anywhere, compared to trying to adopt a single cloud vendor's SaaS tools, and giving devs all the Cloud access needed to control it. Give them access to a semi-locked-down K8s cluster instead and they can build pretty much whatever they need without asking anyone for anything.

For production, it's kind of crap, but usable. It doesn't have any of the operational intelligence you'd want a resilient production system to have, doesn't have real version control, isn't immutable, and makes it very hard to identify and fix problems. A production alternative to K8s should be much more stripped-down, like Fargate, with more useful operational features, and other aspects handled by external projects.

5 comments

nvarsj 869 days ago

It's kind of the modus operandi of Kubernetes since inception. The core model is okay, but ops was always a barely constructed afterthought. And the network stack (kube-proxy) was literally a summer of code project.

I'm thinking a lot of that was by design - both Red Hat and Google had incentives to get you onto their value-add to get an actual production ready system.

It also created an entire cottage industry, although much of this has faded as everyone moved to purely managed solutions. Because anything else is absolutely insane.

treflop 869 days ago

I’m not sure if it’s intentional. I don’t find the other container orchestrators that much better either.

No one ever cares about making tooling in any software project. You’re always using something by a dead-ass random third-party.

Microsoft is probably the company where I am actually using Microsoft-made tools to manage Microsoft-made products. And maybe Adobe back in the day.

vundercind 869 days ago

In the bad old days of self-managing some servers with a few libvirt VMs and such, I’d have considered a 3-day outage such a shockingly bad outcome that I’d have totally reconsidered what I was doing.

And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?

aliasxneo 869 days ago

I've been running Kubernetes in production for two years and have never experienced anything remotely close to this. The worst is a node dies every now and then and, on a rare occasion, a workload doesn't happily migrate.

Of course, my experience is in no way authoritative, but referencing this type of incident as common is pretty foreign to me and may be mostly relegated to self-managed clusters.

goosejuice 869 days ago

GKE since 2017 here. Healthcare. I think we had one major outage that involved the cluster itself. It resolved itself and we never discovered what caused it. That was in the early days, so I recall very little.

Now I'm using Fly.io. They both have their advantages. Folks tend to make kubernetes sound way more difficult than it is. It can be overkill but it can also solve so many challenges out of the box. At least when it's managed. It'll cost you though.

JohnMakin 869 days ago

> may be mostly relegated to self-managed clusters.

Foreign to me too, but not surprising people report issues as common. There are a lot of footguns in kubernetes that come from a lack of understanding.

You can build a robust kubernetes cluster that hosts an application that’s nearly impossible to bring offline without an act of god, it just takes some know-how and a tiny bit of effort/experience.

lmm 869 days ago

> And k8s is supposed to make that situation better, but these multi-day outage stories are… common? Why are we adding all this complexity and cost if the result is consumer-PC-tower-in-a-closet-with-no-IAC uptime (or worse)?

I'm honestly convinced it's half CV-driven development, and half just the fact that it's become the standard workaround for Python dependency hell. Python is still the easiest way to write software, and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't), so you have to use Docker, and apparently Kubernetes is the standard way you deploy Docker containers.

sshine 869 days ago

> apparently Kubernetes is the standard way you deploy Docker containers.

I bet more people actually use docker compose because the buyin is that much smaller.

a_vanderbilt 869 days ago

Anecdata, but in my experience, it's been podman for new deployments. Plenty of old stuff on Docker though. It's easier to grow out of Podman and into k8s than it is to go from compose, to swarm, then k8s. Easier to get buy-in for the ease of Docker from ops, easier to get leadership buy-in on the security of Podman. Such is life.

hardware2win 869 days ago

>Python is still the easiest way to write software

Try dotnet then

lmm 868 days ago

There are good things about dotnet (I'm more of a Scala person these days, but I have plenty of respect for F#), but there's nothing in there that lets you get up and running remotely as quickly as Python. (I mean, you don't even get a REPL without doing some messing around)

elzbardico 869 days ago

> and it's still basically impossible to make an application that works reliably on more than one machine because of how Python dependency management works (or rather doesn't)

This is complete bullshit.

bionsystem 869 days ago

I remember we were running 500 solaris zones and 5000 vmware VMs over 2 datacenters with 0 major outages over 4 years. I remember a (single) VM crashing and it was a really big deal (turned out it was a config issue, in retrospect a funny one although our (internal) client lost some data). And I remember we were in "crisis mode" for a couple weeks because of SAN storage issues but there was no client interruption of more than 1 minute over those 2 weeks. One of our clients was running our app in a cross-datacenter cluster on bare metal with no interruption for over 20 years.

I'm not advocating for any of those specific solutions and given the choice I would probably use something else, but when I see that my previous CTO wanted kube for single-VM deployments, and a former architect colleague wanted kube for apps that were going to be used by 3 to 5 clients maximum (and in both cases to be run by very small and untrained teams), I think the kool aid has been more than drunk, and I'm now avoiding it like the plague.

0xbadcafebee 869 days ago

Complexity and cost aren't bad when they help produce something of value that we wouldn't have otherwise.

For $150 I can fly round trip from New York to San Francisco, on a massively costly and complex giant noisy metal tube with two blades sticking out the sides that are so strong you could put a tank on each one and the blades still wouldn't droop. Why does it have to be so costly and complex, if I could do something simpler, like take a bus? Well, mostly to keep me from dying. But also to carry lots of luggage, keep costs down, and get me there 15x faster.

K8s does provide great value (as a dev tool), but lacks value in production features, and its design is shit. So I wouldn't say complexity and cost are the downside; it's the lacking production value that's the downside.

brightball 869 days ago

Personally, I’m a big fan of it for QA review sites. Deploy multiple low traffic full site clones to a cluster and spin them up and down as needed. Manual review, automated scans, etc. It’s great for that use case IMO.

In production I always want dedicated resources though.

throwboatyface 869 days ago

Honestly in this day and age rolling your own k8s cluster is negligent. I've worked at multiple companies using EKS, AKS, GKE, and we haven't had 10% of the issues I see people complaining about.

dilyevsky 869 days ago

I've picked up my fair share of outages on managed k8s solutions. The difference there is once it's hosed, your fate is 100% in the hands of cloud support and well... good luck with that one. The cloud apologists in this thread will ofc try to shame you for not buying into their marketing

catchnear4321 869 days ago

if your fate is in the hands of one of the cloud gods, what right does anyone have to blame you for what transpires?

mere mortals are not privy to all of the internal downstream impacts from that public-facing service outage. it would be like shouting into the void and expecting an answer, and, more, liking it.

no, it is easier to recognize one’s place, pay the tithes, and enjoy one god’s blessings and curses alike. do not stray and attempt to please two, it will only end in misery. (three is right out.)

jauntywundrkind 869 days ago

Once your team has upgrades down, everything is pretty rote. This submission (Urbit, lol) seemed particularly incompetent at managing cert rotation.

The other capital lesson here? Have backups. The team couldn't restore a bunch of their services effectively, cause they didn't have the manifests. Sure, a managed provider may have fewer disruptions/avoid some fuckups, but the whole point of Kubernetes is Promise Theory, is Desired State Management. If you can re-state your asks, put the manifests back, most shit should just work again, easy as that. The team had seemingly no operational system so their whole cluster was a vast special pet. They fucked up. Don't do that.
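As an editorial illustration of the desired-state idea: a toy sketch that compares a backed-up set of manifests against whatever is live and works out what has to be created, updated, or deleted. The resource keys and fields are made up, and this is not how Kubernetes controllers are actually implemented; it only shows why having the manifests makes a rebuild mechanical.

```python
from typing import Any, Dict, List


def plan(desired: Dict[str, Any], live: Dict[str, Any]) -> Dict[str, List[str]]:
    """Diff desired manifests against live state.

    Both arguments map a resource key such as "deployment/web"
    (a hypothetical naming scheme) to that resource's spec.
    """
    return {
        # In the backup but missing from the cluster.
        "create": sorted(k for k in desired if k not in live),
        # Present in both, but the live spec has drifted.
        "update": sorted(k for k in desired if k in live and desired[k] != live[k]),
        # Running in the cluster but absent from the backup.
        "delete": sorted(k for k in live if k not in desired),
    }


# A freshly rebuilt, empty cluster: everything in the backup is a "create".
backup = {"deployment/web": {"replicas": 3}, "service/web": {"port": 80}}
fresh_cluster_plan = plan(backup, {})
```

Without the `backup` dictionary there is nothing to diff against, which is roughly the position the team in the submission found itself in.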

nyolfen 869 days ago

this is actually a separate project from urbit, called urb-it https://urb-it.webflow.io/

anotherhue 869 days ago

Different Urbit.

ikiris 869 days ago

What's drone?

dabber 869 days ago

https://www.drone.io/

> Automate Software Build and Testing Drone is a self-service Continuous Integration platform for busy development teams.

0xbadcafebee 869 days ago

Simplest possible CI tool that exists, as far as I'm aware. Gives you just barely everything you need, everything is stupid simple, and it just works.

There's an OSS fork in development (https://woodpecker-ci.org/) but it's far behind in terms of features and stability.