
Kubernetes supports this too: CPU "requests" are implemented as (a) input to the scheduler, which will not over-schedule a box, and (b) configuration of CPU shares in the kernel's CPU cgroup. So that's effectively what we're relying on now with --cpu-cfs-quota. If you have an 8-hyperthread box and three jobs requesting 4, 2, and 2 CPUs, they use up the entire machine from the point of view of Kubernetes' scheduler, and the first job gets twice as many CPU shares as each of the other two.
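To make the 4/2/2 example concrete, here's a rough sketch of the arithmetic. It assumes the cgroup v1 convention of 1024 shares per CPU with a floor of 2, which as far as I know is how the kubelet converts requests; the function names and job labels are made up for illustration, and the contention model is the idealized case where every job is runnable on every CPU at once.

```python
SHARES_PER_CPU = 1024  # assumed: cgroup v1 cpu.shares per full CPU requested
MIN_SHARES = 2         # assumed: smallest cpu.shares value the kernel accepts


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Convert a CPU request in millicores to a cpu.shares value."""
    if milli_cpu == 0:
        return MIN_SHARES
    return max(MIN_SHARES, milli_cpu * SHARES_PER_CPU // 1000)


def contended_cpus(requests_milli: dict, machine_cpus: float) -> dict:
    """CPUs each job receives when all jobs are fully CPU-bound at once."""
    shares = {name: milli_cpu_to_shares(m) for name, m in requests_milli.items()}
    total = sum(shares.values())
    return {name: machine_cpus * s / total for name, s in shares.items()}


# Hypothetical job names; requests are 4, 2, and 2 CPUs on an 8-hyperthread box.
requests = {"a": 4000, "b": 2000, "c": 2000}
print({name: milli_cpu_to_shares(m) for name, m in requests.items()})
# {'a': 4096, 'b': 2048, 'c': 2048}
print(contended_cpus(requests, 8))
# {'a': 4.0, 'b': 2.0, 'c': 2.0}
```

Note that shares only matter under contention: if "b" and "c" are idle, nothing here stops "a" from using all 8 hyperthreads.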

The problem is that we're running services, not batch jobs, so we do want them to make meaningful forward progress, which means we can't set the shares to zero. We just don't want a misconfigured or runaway job to starve out the rest of the machine, even when the other jobs are not trying to use 100% CPU.

A specific sub-case here is capacity planning. Right now, even with CPU shares, you can request one CPU for a computationally intensive multi-threaded or multi-process task, and if the rest of the box is running internal web services with sporadic traffic, that task will easily be able to use the whole machine. But then if you launch eight instances of that same job on the machine, they'll all perform much worse. So ideally we want to proactively limit CPU usage so that application developers/operators get realistic expectations about performance, and in turn, we get realistic information about how heavily our cluster is actually used.
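A toy model of that capacity-planning gap, under loud assumptions: identical CPU-bound instances with equal shares, nothing else busy on the box, and a hard limit (e.g. a CFS quota) modeled as a simple ceiling. The function and parameter names are hypothetical, and this is back-of-the-envelope arithmetic rather than a simulation of the kernel scheduler.

```python
def cpus_per_instance(machine_cpus, instances, threads_per_instance, limit=None):
    """CPUs each identical, CPU-bound instance ends up with.

    Assumes equal shares, so the machine is split evenly, and that an
    instance cannot use more CPUs than it has runnable threads.
    """
    fair_split = machine_cpus / instances
    usable = min(fair_split, threads_per_instance)
    if limit is not None:
        # A hard cap applies even when the rest of the machine is idle.
        usable = min(usable, limit)
    return usable


# Shares only: one instance grabs the whole idle box, eight get one CPU each.
alone = cpus_per_instance(8, instances=1, threads_per_instance=8)
packed = cpus_per_instance(8, instances=8, threads_per_instance=8)
print(alone / packed)  # how much slower each instance gets once the box fills up

# With a 1-CPU limit, per-instance CPU is the same in both situations.
capped_alone = cpus_per_instance(8, instances=1, threads_per_instance=8, limit=1)
capped_packed = cpus_per_instance(8, instances=8, threads_per_instance=8, limit=1)
print(capped_alone == capped_packed)
```

The point of the cap is the second comparison: what you measure with one instance is what you get with eight, at the cost of leaving idle CPU unused.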