
I don't have any customer data on hand at the moment, but I'd roughly describe it like this: basic scheduling in strict priority order gets the system to 80+% usage, and then backfill boosts that to 90+%. Careful backfill tuning, plus care in defining the priority structure, gets you to 95+%.
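To make the strict-priority vs. backfill distinction concrete, here is a minimal sketch of a single scheduling pass in the style of conservative "EASY" backfill. It is an illustration under assumptions, not how any particular scheduler does it: the `Job` class, the `schedule_now` function, and the toy numbers are all hypothetical, and real schedulers track far more state.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # requested wall time, arbitrary time units

def schedule_now(total_nodes, running, queue, now=0, backfill=True):
    """Return the names of jobs to start at time `now`.

    running: list of (end_time, nodes) for jobs already on the machine.
    queue:   waiting jobs, highest priority first.
    Hypothetical helper for illustration only.
    """
    free = total_nodes - sum(n for _, n in running)
    pending = list(queue)
    started = []

    # Strict priority: start jobs in order, stop at the first that does not fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running = running + [(now + job.runtime, job.nodes)]
        started.append(job.name)
    if not backfill or not pending:
        return started

    # Reserve a start time (the "shadow time") for the blocked head job.
    head = pending[0]
    avail, shadow = free, None
    for end, n in sorted(running):
        avail += n
        if avail >= head.nodes:
            shadow = end
            break
    if shadow is None:  # head job is larger than the whole machine
        return started
    extra = avail - head.nodes  # nodes still idle once the head job starts

    # Backfill: lower-priority jobs may start only if they cannot delay the head.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if now + job.runtime <= shadow:
            free -= job.nodes       # finishes before the reservation
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes       # runs past it, but only on spare nodes
            extra -= job.nodes
            started.append(job.name)
    return started
```

With 10 nodes, 6 of them busy until t=10, and an 8-node job at the head of the queue, strict priority starts nothing and leaves 4 nodes idle; the backfill pass can slip in a 4-node job that finishes by t=10 without delaying the head job.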

97% is the highest specific value I can recall for any of the larger sites with a heavily mixed workload, absent a nearly infinite supply of short+small jobs on hand to fill those gaps. And a lot of users aren't trying to chase that: for these large-scale "capability" systems the goal is to scale out as large as you can anyway, and the smaller stuff is usually relegated to "capacity" systems elsewhere with a less expensive architecture.

One thing that at least some schedulers can manage is the idea of a min+max runtime for a job, combined with a min+max node/cpu count. If you have users willing to 'scavenge' otherwise wasted time by running under such a regime, that can put you closer to full usage.
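As a rough sketch of how that min+max idea could work, assume the scheduler knows the size of a gap (idle nodes, and time until the next reservation) and tries to shape a flexible job to fit it. The `FlexibleJob` class and `fit_into_gap` function below are hypothetical names for illustration, not any scheduler's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FlexibleJob:
    min_nodes: int
    max_nodes: int
    min_runtime: int  # shortest useful run, e.g. minutes
    max_runtime: int  # longest run the job can use

def fit_into_gap(job: FlexibleJob, free_nodes: int,
                 window: int) -> Optional[Tuple[int, int]]:
    """Pick the largest (nodes, runtime) shape that fits the gap.

    Returns None when the gap is smaller than the job's minimums.
    Hypothetical helper for illustration only.
    """
    if free_nodes < job.min_nodes or window < job.min_runtime:
        return None
    # Take as much of the gap as the job is able to use.
    return min(free_nodes, job.max_nodes), min(window, job.max_runtime)
```

A job declared as 2 to 16 nodes for 30 to 240 minutes would be given 6 nodes for 90 minutes in a 6-node, 90-minute gap, and would be skipped entirely if only 1 node were idle. The wider the ranges users are willing to accept, the more gaps such a job can soak up.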