Hacker News
by alex_young 33 days ago
Clusters are almost never the right answer for most problems: https://yourdatafitsinram.net/
2 comments

You're drawing an incorrect conclusion from that site. Aside from the fact that "fitting in RAM" is not the only criterion for needing a cluster, the fact that it's possible to fit data into RAM on a single machine doesn't mean that's the most cost-effective, practical, or sensible solution.

A big advantage of clusters, and horizontal scaling in general, is the ability to scale easily and dynamically to meet demand.

If you're running a system on a single machine that has N GB of memory and you need to scale to N+1, what do you do? Provision a new machine and migrate everything over?

No one operates online real-time systems like this. Clusters make it much easier and less expensive to handle this.

On top of that, it's probably true that in some pure numerical problem-count sense, "most problems" don't need a cluster, but that's misleading. It's like saying "most businesses are mom-and-pop shops." Perhaps true, but it ignores hundreds of thousands of larger businesses, or even small businesses that have big data needs.

There are plenty of problems that involve large amounts of data, and that's increasingly true with ML applications.

I'm at a company of ~100 people which you've probably never heard of (classified as a "small" company in government stats, so not included in the hundreds of thousands figure I mentioned above.) We have 1.9 PB of data for our main environment. When we run processes that deal with it all, the clusters scale to thousands of vCPUs and tens of terabytes of RAM.

Several processes that run daily scale to 500+ vCPUs and many TB of RAM. For the latter, the data itself could probably fit in RAM on a humongous machine, but the CPUs wouldn't fit on a single machine. And we'd have to size the machines carefully every time we start them up. Clusters can scale up dynamically according to the demands of the jobs they're executing.
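
To put rough numbers on that, here's a minimal back-of-the-envelope sketch. The node sizes (64 vCPUs, 512 GB RAM) and the job figures are made-up assumptions for illustration, not our actual setup, and `nodes_needed` is a hypothetical helper. It just shows that whichever resource runs out first decides how many machines a job needs.

```python
import math


def nodes_needed(job_vcpus, job_ram_gb, node_vcpus, node_ram_gb):
    """Smallest number of identical nodes that covers both the CPU and RAM demand.

    All inputs are illustrative assumptions; real schedulers also account
    for overhead, bin-packing, and per-node reservations.
    """
    by_cpu = math.ceil(job_vcpus / node_vcpus)
    by_ram = math.ceil(job_ram_gb / node_ram_gb)
    # The scarcer resource determines the node count.
    return max(by_cpu, by_ram)


# A hypothetical daily job: 500 vCPUs and 3 TB of RAM on 64-vCPU / 512 GB nodes.
daily = nodes_needed(500, 3000, 64, 512)

# A hypothetical full-dataset job: 2000 vCPUs and 20 TB of RAM on the same nodes.
full = nodes_needed(2000, 20000, 64, 512)

print(daily, full)
```

With these assumed numbers the daily job is CPU-bound and the full run is RAM-bound, which is why a fixed machine size is hard to pick in advance.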

Not all clusters are elastic. Cloud infrastructure can be, but HPC setups before the cloud were not.
Even in a physical hardware, on-premise scenario, it's still easier to scale horizontally than vertically in almost all cases, for all the reasons I mentioned. That's a big reason why Kubernetes was adopted at an unprecedented pace at medium to large organizations - because it helps manage that approach.
They could have chosen Mesos instead. Kubernetes had other characteristics that allowed it to be adopted far and wide besides the ability to scale horizontally.
I said a big reason, not the only reason.

Besides, Mesos wasn't a good alternative for most companies, so saying "they could have chosen it instead" is a bit theoretical. Mesos was ambitious, but that made it less suitable as a plug & play system that fit easily into existing corporate systems, which had already adopted containers heavily.[1] Another reason for Kubernetes' popularity is that it didn't try to be a big leap forward the way Mesos did.

[1] The Marathon container support for Mesos was released about a year after Kubernetes, but if you were going to set up a system for distributed orchestration of containers, it didn't make much sense to bring Mesos along for the ride. There's a reason Mesos is in Apache heaven now (the Attic).

That is not very insightful. Your thesis started with the idea of elastic horizontal scaling.

Mesos was designed around ideas from HPC and catered to that world: large hardware capex, which is not elastic. Containers did not make sense in that world when Mesos was architected, and container support was retrofitted onto its architectural foundation.

Kubernetes was designed for a wide variety of workloads, and designed to be composable, versatile, and extensible. It has a far more decentralized approach, an antithesis to Mesos's Data Center as an OS approach. It isn't that Kubernetes did not try to do too much. It's that they laid a much more flexible foundation.

It turns out Kubernetes was a better fit for many more use cases, even beyond large and medium-sized enterprises. Kubernetes works pretty well at the edge and locally as well, and it runs many ML workloads at the other end of the scale.

This is not theoretical.

that's... kind of not true. they weren't elastic in the sense that you never had to think about how big they were. but you had, say, 64k nodes, and people would launch jobs with 1000 of them, or 10000, or if they could clear the decks, all of them. or if they were just debugging, maybe 5 of them.

so I guess idk what you mean by 'elastic' here.

Most data problems don't need to fit in RAM.