Hacker News new | ask | show | jobs
by asdfasgasdgasdg 2118 days ago
Even if you can do a particular problem on a single machine, sometimes that isn't the right call. In a work scheduled cluster environment, a task that wants the entire machine may have trouble getting a slot unless it has the priority to preempt everything else going on on that machine. We call such VMs "picky" and they don't get scheduling guarantees.
2 comments

Heh, sure. But Borg was not part of the stated question, just of the expected answer. It wasn't even close to being needed to meet the stated parameters (IIRC the whole data set would fit into a single SSD card, so one could even have room for growth without scaling out).
I'm having trouble thinking of how you'd end up with a "cluster" that couldn't provide the power of a single machine?
To make it concrete: consider a cluster of ten machines with 256G of ram. Team A and team B both schedule jobs with a requirement of 129G of RAM and five replicas each. The replicas get scheduled, one on each machine. Team C wants to schedule a task that takes an entire 256GB of ram. If they don't have sufficient priority to preempt the other jobs, then they will not schedule.

In real heterogenous-workload production clusters, every available machine likely has several VMs scheduled on it if the cluster isn't idle. There is never a full machine that's free unless some special effort has been taken to make it so.

The knapsack problem is known to be NP complete, and your algorithm that fixates on getting jobs the size of a single machine to run, will fail to run multi-machine jobs as successfully as an algorithm that does not. It's a far more interesting algorithm to think about than sorting lists of numbers. Job priorities are easy enough to add in, but the far more practical issue is with noisy neighbors. Even limiting things to single jobs on a single machines, network and storage bandwidth has bottlenecks the cluster scheduler has to optimize for.

To make things more complicated, a long-lived cluster is going to be made up of different classes of machines, from different CPU micro-architecture, so 'single machine' is overly constraining. Eg it's not interesting that a job with a 4 MiB requirement can always run if your job needs 32 GiB.

http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...

"Rather than making your computation go faster, the systems introduce substantial overheads which can require large compute clusters just to bring under control.

In many cases, you’d be better off running the same computation on your laptop."

My limited experience fits this in that a bit of smarts on a single box beats a bunch of boxes.

(the link is a very good read BTW)

The biggest problem with that write-up is ignoring availability. "Fast all the way up to the crash" can be much worse than slow and steady.

Of course, for a batch job with a runtime under 11 minutes, that probably doesn't really matter too much. Just don't generalize that too much.