by jrockway 1376 days ago
I think the boundary makes a lot of sense. Cluster autoscaling only responds to scheduling pressure; if there are pending pods, a new node is added to the cluster so those pods can run. Meanwhile, horizontal pod autoscaling is a totally different system; it adds pods for a service when system-level metrics indicate that it should. Vertical pod autoscaling is again mostly unrelated; if metrics indicate that a certain pod should be bigger, a bigger version is scheduled.
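To make the separation concrete, here is a rough Python sketch of the two triggers as independent functions. The replica formula is the one given in the Kubernetes HPA documentation, minus tolerance and stabilization; the node-count function is a simplified first-fit packing that I am assuming for illustration, not the real cluster autoscaler logic. The function names and the millicore units are my own.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float) -> int:
    """Horizontal pod autoscaling: scale the replica count by the metric ratio.

    Looks only at metrics; knows nothing about nodes.
    """
    return math.ceil(current_replicas * current_metric / target_metric)


def nodes_needed_for_pending(pending_pod_millicpu: list[int],
                             node_millicpu: int) -> int:
    """Cluster autoscaling: looks only at pods that could not be scheduled.

    Simplified first-fit-decreasing packing onto new, empty nodes.
    Assumes every pending pod fits on a single empty node.
    """
    free: list[int] = []  # free millicores on each node we would add
    for cpu in sorted(pending_pod_millicpu, reverse=True):
        for i, remaining in enumerate(free):
            if cpu <= remaining:
                free[i] -= cpu
                break
        else:
            # No new node has room, so one more node is needed.
            free.append(node_millicpu - cpu)
    return len(free)
```

Neither function calls the other. The only coupling is indirect: if the first one creates pods that do not fit anywhere, they show up as the input to the second.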

I do see why more integration would be useful, though, including disruption budgets. Mostly for consolidating the incremental cluster autoscaling results onto one node from time to time, without waiting for the workload to naturally disappear or decrease in scale. Also, it would be nice to say "hey if ARM spot nodes are cheaper than AMD64, just reschedule these workloads onto ARM". Basically, it's still the very early days of optimizing cost, latency, and throughput.


The cluster autoscaler will do pod compaction. It would be nice to be able to specify when to favor compaction over expansion, because you know the traffic is going to fall off after a certain time of day.

The main thing the integration helps with is reducing the startup time when there is scheduling pressure. If you know your increase in number of pods will always mean an expansion in the nodegroup, you can proactively and optimistically expand the nodegroup.

What's a case where you need to scale up pods but not nodes? (I think the case where you need to scale vertically vs horizontally is easily imagined, though)

Anything that horizontally scales. Remember that scheduling pressure doesn't only add nodes; it can also preempt lower-priority workloads.

So maybe you have an application server that uses 1 CPU and 1G of RAM per instance, and can handle 10,000 requests per second. If you are getting 30,000 requests per second, then you'll want at least 2 more replicas to handle the load with an acceptable response time. You also run fuzz tests in the background at a very low priority. So scheduling 2 more application server replicas will cause those jobs to be preempted, and give your application 3 CPUs and 3G of RAM.
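Here is a toy Python model of that scenario, assuming a single 3-CPU node that currently runs one application server and two 1-CPU fuzz jobs. The `Pod` type, the priority values, and the `schedule` function are hypothetical names for illustration; the real Kubernetes scheduler's preemption also considers things like disruption budgets and picks among many nodes.

```python
import math
from typing import NamedTuple


class Pod(NamedTuple):
    name: str
    priority: int  # higher wins
    cpus: int


def replicas_needed(total_rps: int, rps_per_replica: int) -> int:
    """Smallest replica count that covers the request rate."""
    return math.ceil(total_rps / rps_per_replica)


def schedule(node_cpus: int, running: list[Pod], new_pod: Pod):
    """Place new_pod on the node, evicting strictly lower-priority pods if needed.

    Returns (pods now running, pods evicted). Raises if the pod cannot fit
    even after preemption, i.e. it would stay pending.
    """
    kept = sorted(running, key=lambda p: p.priority)  # lowest priority first
    evicted: list[Pod] = []
    while sum(p.cpus for p in kept) + new_pod.cpus > node_cpus:
        if not kept or kept[0].priority >= new_pod.priority:
            raise RuntimeError(f"{new_pod.name} stays pending")
        evicted.append(kept.pop(0))
    return kept + [new_pod], evicted


# One app server plus two background fuzz jobs fill the 3-CPU node.
running = [Pod("app-0", 1000, 1), Pod("fuzz-0", 0, 1), Pod("fuzz-1", 0, 1)]

# 30,000 rps at 10,000 rps per replica: 3 replicas total, so 2 more.
extra = replicas_needed(30_000, 10_000) - 1

all_evicted: list[Pod] = []
for i in range(extra):
    running, evicted = schedule(3, running, Pod(f"app-{i + 1}", 1000, 1))
    all_evicted += evicted
```

After the loop, both fuzz jobs have been evicted and the three application replicas hold all 3 CPUs, with no node added.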

Basically, with this sort of autoscaling, you are always using 100% of your computers for something, but when there is some business to do or money to be made, you can give the revenue-critical stuff priority.

As always, there are technicalities as to why you wouldn't want to do this. Maybe you think your fuzz testing is going to find a container escape and destroy the VM that it's running on, so you don't want production traffic anywhere near it. Or maybe the 10,000 requests per second that your application server can handle with 1 CPU actually use all of the network capacity on the node, so you have to scale across other machines in order to handle any more requests. It all depends, but the flexibility is there to get yourself high utilization of your physical hardware.

If you have some spare capacity on another node.

This can happen organically due to scaling different systems at different times.

During the day you handle a lot of user load, and during the night you do a lot of batch processing. As one workload scales down, the other can scale up, without needing more virtual machine instances.
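A quick back-of-the-envelope version in Python, with made-up numbers (CPUs needed in each 6-hour block of the day; none of these figures come from a real system):

```python
# Hypothetical CPU demand per 6-hour block: night, morning, afternoon, evening.
user_load = [2, 2, 8, 8]
batch = [8, 6, 2, 2]

# Sharing one cluster: capacity must cover the worst combined block.
shared = max(u + b for u, b in zip(user_load, batch))

# Separate clusters: each must cover its own peak.
separate = max(user_load) + max(batch)
```

With these numbers the shared cluster peaks at 10 CPUs, while separate clusters would need 16 between them. The pods scale up and down all day, but the node count barely has to move.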