Hacker News
by elktown 676 days ago
I think some are just pretty sick and tired of the explosion of needless complexity we've seen in the last decade or so in software, and rightly so. This is an industry-wide problem of deeply misaligned incentives (& some amount of ZIRP gold rush), not specific to this particular case - if this one is even a good example of this to begin with.

Honestly, as it stands, I think we'd be seen as pretty useless craftsmen in any other field because of an unhealthy obsession with our tooling and meta-work - consistently throwing any kind of sensible resource usage out of the window just so we get to work with certain tooling. It's some kind of a "Temporarily embarrassed FAANG engineer" situation.


Fair point, but I think the key distinction here is unnecessary versus necessary complexity. Are zero-downtime deployments and load balancing unnecessary? Perhaps for a personal project, but for any company with a consistent userbase I'd argue they are non-negotiable, or should be anyway. Where that is the expectation, k8s seems like the simplest answer, or near enough to it.
There are many ways to do deployments without downtime, and load balancing is easy to configure without k8s.
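A minimal sketch of the idea in the reply above, assuming a hypothetical in-process `RoundRobinBalancer` rather than any particular proxy: requests rotate across backends, and a rolling deploy drains, upgrades and health-checks one backend at a time while the others keep serving. It needs at least two backends, and it does not model waiting for in-flight requests to finish, which a real drain would do.

```python
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Hypothetical in-process balancer: rotates over backends, skipping drained ones."""

    def __init__(self, backends: Iterable[str]) -> None:
        self.backends = list(backends)
        self.available = set(self.backends)  # backends currently in rotation
        self._next = 0

    def drain(self, backend: str) -> None:
        # Stop sending new requests to this backend (e.g. before restarting it).
        self.available.discard(backend)

    def restore(self, backend: str) -> None:
        # Put the backend back into rotation.
        self.available.add(backend)

    def pick(self) -> str:
        # Walk the ring at most once, starting where the last pick left off.
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend in self.available:
                return backend
        raise RuntimeError("no backends available")


def rolling_deploy(balancer: RoundRobinBalancer,
                   deploy: Callable[[str], None],
                   is_healthy: Callable[[str], bool]) -> None:
    """Upgrade one backend at a time so the rest keep serving traffic."""
    for backend in list(balancer.backends):
        balancer.drain(backend)
        deploy(backend)  # e.g. restart the process with the new version
        if not is_healthy(backend):
            # Leave the bad backend drained and stop the rollout.
            raise RuntimeError(f"{backend} failed its health check")
        balancer.restore(backend)
```

With `lb = RoundRobinBalancer(["app-1:8080", "app-2:8080"])` and `app-1:8080` drained, every `lb.pick()` returns `"app-2:8080"` until it is restored. In practice an existing reverse proxy such as nginx or HAProxy usually plays this role; the sketch only shows the logic.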
I agree with this somewhat. The other day I was driving home and saw a broken sprinkler head on the side of the road spraying water everywhere. It made me think: why aren't sprinkler systems designed with HA in mind? Why aren't there dual water lines with dual sprinkler heads everywhere, plus an electronic component that detects a break in a line and automatically switches to the backup line? It's because the downside of water spraying everywhere and the grass becoming unhealthy or dying is smaller than what it would cost to make the system HA.

In the software/tech industry it's commonplace to just accept that your app can't be down for any amount of time, no matter what. No one checked how much more it would cost (engineering time and infra costs) to deploy the app as HA, so no one checked whether it would be worth it.
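
The check described above is mostly arithmetic. A rough sketch, where `ha_net_benefit` is a hypothetical helper and every input is an estimate you would have to supply yourself: downtime per year, what a minute of downtime costs, what HA adds per year in infra and engineering time, and what fraction of downtime HA actually removes (assumed here to default to 0.9, since HA setups have failure modes of their own).

```python
def ha_net_benefit(downtime_min_per_year: float,
                   cost_per_downtime_min: float,
                   ha_extra_cost_per_year: float,
                   downtime_reduction: float = 0.9) -> float:
    """Hypothetical estimate: yearly downtime cost HA would avoid, minus what HA adds.

    downtime_reduction is the assumed fraction of downtime HA removes; it is below
    1.0 because HA setups have their own failure modes.
    A positive result means HA pays for itself under these assumptions.
    """
    avoided_cost = downtime_min_per_year * downtime_reduction * cost_per_downtime_min
    return avoided_cost - ha_extra_cost_per_year
```

With made-up numbers for an internal tool (30 minutes of downtime a year, $10 per minute, $20,000 a year for HA), `ha_net_benefit(30, 10, 20_000)` comes out around -19,730, so HA would not pay for itself. The same formula can easily flip positive for a service where a minute of downtime is expensive.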

I blame this logic on the low interest rates for a decade. I could be wrong.

This week we had a few minutes of downtime on an internal service because of a node rotation that triggered an alert. The responding engineer started to put together a plan to make the service HA (which would have tripled the cost to serve). I asked how frequently the service went down and how many people would be inconvenienced if it did. They didn't know, but when we checked the metrics it had single-digit minutes of downtime this year and fewer than a dozen daily users. We bumped the threshold on the alert to longer than it takes for a pod to be re-scheduled and resolved the ticket.
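The alert change described above amounts to debouncing: only page when the service has been down continuously for longer than a pod reschedule takes. A minimal sketch, assuming health-check samples arrive as `(timestamp, is_up)` pairs; `should_alert` is a hypothetical helper and the threshold is a placeholder. If you use Prometheus, the `for` duration on an alerting rule gives similar behavior, though the comment doesn't say what alerting stack was involved.

```python
from datetime import datetime, timedelta


def should_alert(samples: list[tuple[datetime, bool]], threshold: timedelta) -> bool:
    """Hypothetical helper: fire only if the latest outage has lasted at least `threshold`.

    `samples` are (timestamp, is_up) health-check results in chronological order.
    """
    down_since = None
    for ts, is_up in samples:
        if is_up:
            down_since = None  # any successful check resets the outage clock
        elif down_since is None:
            down_since = ts  # first failed check of the current outage
    if down_since is None:
        return False  # currently up (or no data at all)
    # Outage length is measured from the first failed check to the latest check.
    return samples[-1][0] - down_since >= threshold
```

With checks every 30 seconds and a threshold of, say, 5 minutes (comfortably above a hypothetical 1 to 2 minute reschedule), a 90-second blip during a node rotation stays quiet, while a genuine outage still pages once it passes the threshold.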
This is the most sensible thing I’ve read on here in a while. Engineers’ obsession with tinkering and perfection is the slow death of many startups. If you’re doing something important like banking or air traffic control, fair enough, but a CRUD app for booking hair appointments will survive a bit of downtime.
You're assuming that the teams running these systems achieve acceptable uptime, and that companies aren't issuing refunds for missed uptime targets where contracts enforce them, or losing customers. Many companies do have a vision for HA, but they are struggling to achieve it both with and without k8s.
Why would wanting redundancy be a ZIRP? Is blaming everything on ZIRP like Mercury was in retrograde but for economics dorks?
It depends on the cost of the complexity you're adding. Adding another database or whatever really isn't that complex, so yeah, sure, go for it.

But a lot of companies are building distributed systems purely because they want this ultra-low downtime. Distributed systems are HARD. You get an entire set of problems you don't get otherwise, and the complexity explodes.

Often, in my opinion, this is not justified. Saving a few minutes of downtime in exchange for making your application orders of magnitude more complex is just not worth it.

Distributed systems solve distributed problems. They're overkill if you just want better uptime or crisis recovery. You can do that with a monolith and a database and get 99.99% of the way there. That's good enough.
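
If you read "99.99%" as an availability target, the arithmetic behind this argument is easy to sketch. The helpers below are illustrative, not from the thread: `series` multiplies availabilities because every component in the request path must be up, and `parallel` assumes replica failures are independent, which shared networks, configs and deploys often violate. That assumption is part of why the availability gained by distributing a system is harder to get than the formula suggests.

```python
import math

# 365 days; ignores leap years, which is close enough for a budget estimate.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed at a given availability (e.g. 0.9999)."""
    return (1 - availability) * MINUTES_PER_YEAR


def series(*availabilities: float) -> float:
    """Availability of a chain where every component must be up (e.g. app + database)."""
    return math.prod(availabilities)


def parallel(availability: float, replicas: int) -> float:
    """Availability of N replicas where any one suffices, assuming independent failures."""
    return 1 - (1 - availability) ** replicas
```

`downtime_budget_minutes(0.9999)` is about 52.6 minutes a year; a monolith and a database at 99.9% each give `series(0.999, 0.999)` of about 99.8%, and two independent 99% replicas give `parallel(0.99, 2)` of about 99.99%.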

Redundancy, like most engineering choices, is a cost/benefit tradeoff. If the costs are distorted, the result of the tradeoff study will be distorted from the decisions that would be made in "more normal" times.
Because the company overhired to the point where people were sitting around dreaming up useless features just to justify their workday.
> It's some kind of a "Temporarily embarrassed FAANG engineer" situation.

FAANG engineers made the same mistake, too, even though the analogy implies comparative competency or value.

Any software engineer who thinks k8s is complex shouldn’t be a software engineer. It’s really not that hard to manage.
I think the key word is “needless” in terms of complexity. There are a lot of k8s projects that could probably benefit from a simpler orchestration system, especially at smaller firms.
For me it was DC/OS with Marathon and Mesos! It worked, it was a tank, and its model was simple. There were also some nice third-party open-source systems around Mesos that were also simple to use. Unfortunately, Kube won.

While Nomad can be interesting, again, it's a single "smallish" vendor pushing an "open" source project (see the Terraform debacle).

Do you have a simpler orchestration system you'd recommend?
How is it more simple?
Every time I read about Nomad, I wonder the same. I swear I'm not trolling here, I honestly don't get how running Nomad is simpler than Kubernetes. Especially considering that there are substantially more resources and help on Kubernetes than Nomad.
No, it just looks and feels like enterprisy SOAP XML