Hacker News
by throw_5202 1077 days ago
It's so frustrating to talk to Kubernetes enthusiasts.

enthusiast: if you migrate service X from a small set of VMs to a K8s cluster (which will take N man-months because of reasons) it will auto-scale

Old grumpy man: but the load is low and predictable, we don't need autoscale, and if load grows we will just create two more VMs

enthusiast: auto-scaling will save time in the distant and unlikely future and CTO agrees that K8s is the best way to run software so you have to migrate anyway

6 comments

We are running a busy self-service ERP for 400k employees (and growing), on a single database instance for 7 years now. Currently at 256 GB RAM, 8 TB database, 32 P9 cores. I got 99 problems but the single database server ain't one of them!
Parent was probably talking about the application server, which is the typical example of K8s usage, whereas most databases don't fit well there and are usually separate from the K8s cluster.
Yeah. They must be. I’ve had several k8s enthusiasts dissuade me from putting a DB in k8s for performance reasons. This may change over time, but I believe there is a performance bottleneck in disk I/O (or at least that’s how it was explained to me).
That really depends on your k8s environment. If you're running on bare metal and use host bind mounts for storage, you won't have a problem. But as soon as you introduce shared storage it gets messy...
Doesn't that defeat the promise of k8s in the first place?

Because then it won't scale.

And the moment you do you run into other bottlenecks.

Meh, not having to worry about configuring load balancers and letsencrypt alone is worth the effort to get k8s running.
We've just migrated our ERP from a regular deployment on VMs into Kubernetes. We have a very bursty self-service component, so being able to scale up easily for the week we need it is quite nice. We've taken advantage of this ability multiple times and seen great success each time.

We're not running the DB in Kubernetes though.

Just curious, what backup/replication/disaster recovery solution(s) are you running with that?
Used to run Tivoli Storage Manager, now Commvault. Backup can be used for certain types of issues that are isolated to the database.

For DR, the entire stack is continually synced with a set of hot servers in a separate datacentre. We run a full DR exercise yearly and it works well. It's a fairly proprietary AIX / DB2 method though, so not sure you'd find value unless you're using the exact same technology stack :-/

Quote them some Henry David Thoreau[1]: "Why should we live with such hurry and waste of life? We are determined to be starved before we are hungry. Men say that a stitch in time saves nine, and so they take a thousand stitches today to save nine tomorrow. As for work, we haven't any of any consequence."

[1] From Walden: read at https://www.gutenberg.org/files/205/205-h/205-h.htm

The irony is the last time I fell for this "SRE/K8s/Platform Engineering" crap, they promised auto-scaling and used it as a motivation for the tech. Yet when the time came, and services inevitably needed to scale due to load, they had to manually log in to increase some random config to let it grow.
create two more vms? just give it some more cores/ram/network/disks...
> Old grumpy man: but the load is low and predictable we don't need autoscale and if load will grow we will just create two more VMs

That's perfectly ok, if your deployment costs are irrelevant and/or your company gladly pays up your infrastructure costs without a second thought.

This is not the case in some organizations, and toggling a setting to auto scale a deployment can automatically save you thousands of dollars per month.

Would you still be so casual about infrastructure costs if you had to bankroll the extra capacity you need to add to your baseline to support peaks?

The decision process involved in managing your single-box deployment is not the same as the one that goes into managing global deployments with dozens of instances per region. Cloud providers charge a premium, and that premium is a lot.

It's like the thermostat in your office. If you're just running an AC in a single room then you can just set it to full blast to keep it at a certain temperature day and night. Once there's a decision to cut costs, you start to talk about the best time to turn an AC unit off or on.
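
To put rough numbers on the baseline-versus-peak argument, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the instance price, the 730-hour month, and both load profiles are made up, and the function names are hypothetical.

```python
# Hypothetical numbers throughout: the instance price and load profiles are
# invented for illustration, not taken from any real deployment or price list.
HOURS_PER_MONTH = 730
PRICE_PER_INSTANCE_HOUR = 0.20  # assumed on-demand price, USD


def static_cost(peak_instances: int) -> float:
    """Monthly cost when capacity is fixed at the peak requirement."""
    return peak_instances * HOURS_PER_MONTH * PRICE_PER_INSTANCE_HOUR


def autoscaled_cost(baseline: int, peak: int, peak_hours: int) -> float:
    """Monthly cost when the extra instances run only during peak hours."""
    off_peak_hours = HOURS_PER_MONTH - peak_hours
    instance_hours = baseline * off_peak_hours + peak * peak_hours
    return instance_hours * PRICE_PER_INSTANCE_HOUR


def monthly_savings(baseline: int, peak: int, peak_hours: int) -> float:
    """How much autoscaling saves compared with provisioning for peak."""
    return static_cost(peak) - autoscaled_cost(baseline, peak, peak_hours)


# Bursty service: 4 instances normally, 40 for one week (168 h) per month.
bursty = monthly_savings(baseline=4, peak=40, peak_hours=168)
# Flat service: 4 instances all the time, so there is no peak to shave.
flat = monthly_savings(baseline=4, peak=4, peak_hours=0)
print(f"bursty: ${bursty:,.2f}/month, flat: ${flat:,.2f}/month")
```

With these made-up inputs the bursty profile saves a few thousand dollars a month, while the flat profile saves nothing, so the answer depends entirely on how spiky the load really is.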

Their entire point was that their load was static, and you've immediately started talking about how it changes constantly.
VM deployment has a drawback though. If the machine runs out of memory, it will lock up (at least that's my experience with EC2). And then you need to set up some sort of health checks and auto scaling (!) for EC2 based on those health checks. This is much more cumbersome and fragile than just containerising the app and setting it up in a k8s cluster which will automatically take care of this scenario.

Also, deploying a new build on VMs is extremely manual compared to k8s, unless again you set up some sort of home-brew Rube Goldberg machine to auto-deploy. It's just way better to use k8s in tandem with a simple GA workflow.

I think grumpy old man knows about the drawbacks with VMs, but grumpy old man also knows that for his particular service k8s has drawbacks. Grumpy old man is thinking that if his particular service goes down, some users just tell him it's not working and it's not a huge problem like people are losing $10 million per hour because of it. And in the event that happens sometime in the next 23 months (if it happens at all) he'll just do that little manual process for a couple more VMs, and that's all the time he'll spend on it. Contrast that with the time it would take to get it ready for k8s and keep someone around who understands k8s well enough to take care of it, even though their need for it might be so low.
And grumpy old man also knows that compared to the 5-year overhead costs of some overcomplicated k8s system, it's quite reasonable to overprovision the bejesus at the colo bare metal>hypervisor layer as cheap insurance.
Every time I read something like this I think that there has to be a tipping point where a well-written (i.e. compiled, not interpreted) solution running on a beefy server has to be more cost-efficient than running the same thing spread over several containers.

Definitely not an expert, but I get the impression that this point is a lot higher than most people assume it is.

Spoiler alert: a well-written application on a beefy server almost always beats the k8s Rube Goldberg machines on cost. I still don't get why people refuse to think for themselves and just jump on the latest hype train.

K8s makes little sense unless you run on bare metal. Once you jump to VMs you are injecting another abstraction layer and taking on a herculean level of ops without understanding what you're getting into.

VMs are nice when you can't fill a machine with a single task (plus whatever redundancy). Once you get to a single machine, you want to scale up that one machine; you can go a long way where scaling up is cheaper than scaling out. But at some point, scaling out gets cheaper.

I'm not sure where the checkpoints are now, but typical points where cost jumps are desktop -> server socket, single socket -> two sockets, two sockets -> four sockets, four -> eight sockets. AFAIK, AMD EPYC isn't offered at more than two sockets, and going to four sockets used to be possible off the shelf but very expensive, and eight sockets was very expensive whether off the shelf or custom engineered. Sometimes RAM costs go way up for the highest density too.
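
A rough way to find where scaling out starts to win is to tabulate both options and look for the crossover. The sketch below does that with entirely invented prices and capacities. Only the shape (cost jumping at each socket tier) follows the description above. The fleet overhead figure and all function names are assumptions as well.

```python
from typing import Optional

# All prices and capacities are invented placeholders, not quotes for any
# real hardware. Tiers are listed from smallest to largest machine.
SCALE_UP_TIERS = [
    # (label, capacity in arbitrary "load units", price in USD)
    ("desktop", 1, 1_000),
    ("1-socket server", 4, 5_000),
    ("2-socket server", 8, 12_000),
    ("4-socket server", 16, 60_000),
    ("8-socket server", 32, 250_000),
]

# Scaling out: a fleet of 2-socket servers plus an assumed fixed overhead
# for load balancing, orchestration, and so on.
NODE_CAPACITY = 8
NODE_PRICE = 12_000
FLEET_OVERHEAD = 20_000


def scale_up_cost(load: int) -> Optional[int]:
    """Price of the smallest single machine that fits the load, if any."""
    for _label, capacity, price in SCALE_UP_TIERS:
        if capacity >= load:
            return price
    return None  # no single machine in the table is big enough


def scale_out_cost(load: int) -> int:
    """Price of enough identical nodes to carry the load, plus overhead."""
    nodes = -(-load // NODE_CAPACITY)  # ceiling division
    return nodes * NODE_PRICE + FLEET_OVERHEAD


def crossover_load(max_load: int = 64) -> Optional[int]:
    """Smallest load at which scaling out is cheaper or the only option."""
    for load in range(1, max_load + 1):
        up = scale_up_cost(load)
        if up is None or scale_out_cost(load) < up:
            return load
    return None


print("scaling out wins from load", crossover_load())
```

With these placeholder numbers the crossover lands right after the two-socket tier, but a larger fleet overhead or cheaper big machines would push it further out.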