What's a case where you need to scale up pods but not nodes? (I think the case where you need to scale vertically vs horizontally is easily imagined, though)
Anything that horizontally scales. Remember that scheduling pressure doesn't only add nodes; the scheduler can also preempt lower-priority workloads.
So maybe you have an application server that uses 1 CPU and 1G of RAM per instance, and can handle 10,000 requests per second. If you are getting 30,000 requests per second, then you'll want at least 2 more replicas to handle the load with an acceptable response time. You also run fuzz tests in the background at a very low priority. So scheduling 2 more application server replicas will cause those jobs to be preempted, and give your application 3 CPUs and 3G of RAM.
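To make the arithmetic concrete, here is a rough sketch of that scenario in Python. It is a toy model, not how the Kubernetes scheduler is actually implemented: the single 3-CPU node, the pod names, the priority values, and the `schedule` helper are all assumptions made up for illustration.

```python
import math

def replicas_needed(requests_per_second, capacity_per_replica):
    """Smallest replica count that covers the incoming request rate."""
    return math.ceil(requests_per_second / capacity_per_replica)

def schedule(node_cpus, running, pod):
    """Toy single-node scheduler (hypothetical, for illustration only).

    Pods are (name, priority, cpus) tuples. Lower-priority pods are
    evicted, lowest first, until the new pod fits. Returns the new list
    of running pods and the list of evicted pods.
    """
    name, priority, cpus = pod
    free = node_cpus - sum(p[2] for p in running)
    victims = sorted((p for p in running if p[1] < priority), key=lambda p: p[1])
    evicted = []
    for victim in victims:
        if free >= cpus:
            break
        evicted.append(victim)
        free += victim[2]
    if free < cpus:
        return running, []  # does not fit even after preemption
    kept = [p for p in running if p not in evicted]
    return kept + [pod], evicted

# Assumed setup: one 3-CPU node, one app replica, two low-priority fuzz jobs.
total = replicas_needed(30_000, 10_000)
running = [("app-1", 1000, 1), ("fuzz-1", 1, 1), ("fuzz-2", 1, 1)]
preempted = []
for i in range(2, total + 1):
    running, evicted = schedule(3, running, (f"app-{i}", 1000, 1))
    preempted += evicted

app_cpus = sum(cpus for name, _, cpus in running if name.startswith("app-"))
```

With these assumed numbers, both fuzz jobs end up preempted and the application holds all 3 CPUs, without any node being added.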
Basically, with this sort of autoscaling, you are always using 100% of your computers for something, but when there is some business to do or money to be made, you can give the revenue-critical stuff priority.
As always, there are technicalities as to why you wouldn't want to do this. Maybe you think your fuzz testing is going to find a container escape and destroy the VM that it's running on, so you don't want production traffic anywhere near it. Or maybe the 10,000 requests per second that your application server can handle with 1 CPU actually uses all of the network capacity on the node, so you have to scale across other machines in order to handle any more requests. It all depends, but the flexibility is there to get yourself high utilization of your physical hardware.
This can happen organically due to scaling different systems at different times.
During the day you serve a lot of user load; during the night you do a lot of batch processing. As one workload scales down, another can scale up, without needing more virtual machine instances.
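As a back-of-the-envelope sketch of that day/night trade-off: the cluster size, the per-pod CPU figures, and the hourly replica counts below are invented numbers, and `batch_workers` is a hypothetical helper, not part of any Kubernetes API.

```python
def batch_workers(cluster_cpus, web_replicas, cpus_per_web=1, cpus_per_batch=1):
    """Batch workers that fit in whatever the web tier is not using."""
    free = max(cluster_cpus - web_replicas * cpus_per_web, 0)
    return free // cpus_per_batch

# Assumed fixed 10-CPU cluster; web replica counts at 03:00 and 14:00.
cluster_cpus = 10
web_by_hour = {3: 2, 14: 8}
plan = {hour: (web, batch_workers(cluster_cpus, web))
        for hour, web in web_by_hour.items()}
```

Under these assumptions the pod counts swing between 2 web plus 8 batch at night and 8 web plus 2 batch during the day, while the node count stays the same.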