| IMHO there are some serious problems here: this won't apply to many situations, it is not really "waste" in the way claimed, and it will probably result in greater spend.

> Memory waste: request - actual usage [0]

Memory "requests" are hints to the kube-scheduler for placement, not a target for expected usage.

> # Memory over-provisioning: limit > 2x request [1]

Memory limits are for enforcement, typically deciding when to invoke the OOM killer.

Neither placement nor OOM-kill limits should have anything to do with normal operating parameters.

> The memory request is mainly used during (Kubernetes) Pod scheduling. On a node that uses cgroups v2, the container runtime might use the memory request as a hint to set memory.min and memory.low [2]

By choosing to label the delta between these two as "waste" you will absolutely suffer from Goodhart's law: you will teach your dev team to not just request memory, but to allocate it and never free it, so that they fit inside this invalid metric's assumptions.

It is going to work against the more reasonable goals of having developers set their limits as low as possible without negative effects, protecting the node and pod from memory leaks, and still gaining the advantages of over-provisioning, which is where the big gains are to be made.

[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[1] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[2] https://kubernetes.io/docs/concepts/configuration/manage-res... |
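To make the Goodhart concern concrete, here is a minimal Python sketch of the two quoted checks (request minus usage, and limit > 2x request). It is not the wozz script itself; the `ContainerMemory` type, the function names, and all the numbers are hypothetical and only illustrate how the metrics would score two invented workloads.

```python
from dataclasses import dataclass


@dataclass
class ContainerMemory:
    """Hypothetical per-container memory figures, all in MiB."""
    name: str
    request_mib: int
    limit_mib: int
    usage_mib: int


def request_minus_usage(c: ContainerMemory) -> int:
    """The quoted 'memory waste' metric: request - actual usage."""
    return c.request_mib - c.usage_mib


def limit_exceeds_twice_request(c: ContainerMemory) -> bool:
    """The quoted 'over-provisioning' check: limit > 2x request."""
    return c.limit_mib > 2 * c.request_mib


# Invented workloads: one bursts occasionally and relies on a high limit,
# the other allocates nearly its whole request and never frees it.
bursty = ContainerMemory("bursty-worker", request_mib=512, limit_mib=2048, usage_mib=300)
hoarder = ContainerMemory("hoarder", request_mib=512, limit_mib=512, usage_mib=500)

for c in (bursty, hoarder):
    print(c.name, request_minus_usage(c), limit_exceeds_twice_request(c))
```

Under these made-up numbers the bursty worker is flagged by both checks, while the container that hoards memory looks nearly perfect, which is the behavior the metric would end up rewarding.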
If I request 8GB for a pod that uses 1GB, the autoscaler spins up nodes to accommodate that 8GB reservation. That 7GB gap is capacity the company is paying for but cannot use for other workloads.
Valid point on Goodhart's Law, though the goal shouldn't be to fill the RAM, but rather to lower the request to match the working set so we can bin-pack tighter.
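A rough Python sketch of the bin-packing arithmetic behind that claim. The node size, pod count, and the lowered request are all hypothetical, and the model only considers memory requests; it ignores CPU, system-reserved memory, DaemonSets, and everything else a real scheduler and autoscaler take into account.

```python
import math


def nodes_needed(pod_count: int, request_mib: int, node_allocatable_mib: int) -> int:
    """Nodes required if pods are packed purely by memory request."""
    pods_per_node = node_allocatable_mib // request_mib
    if pods_per_node == 0:
        raise ValueError("a single pod's request does not fit on one node")
    return math.ceil(pod_count / pods_per_node)


NODE_MIB = 32 * 1024  # assumed 32 GiB of allocatable memory per node
PODS = 40             # assumed number of identical pods

# 8 GiB request for a pod that actually uses about 1 GiB.
oversized = nodes_needed(PODS, request_mib=8 * 1024, node_allocatable_mib=NODE_MIB)

# Request lowered to 1.5 GiB: the 1 GiB working set plus some assumed headroom.
right_sized = nodes_needed(PODS, request_mib=1536, node_allocatable_mib=NODE_MIB)

print(oversized, right_sized)
```

With these assumed figures the 8 GiB request fits 4 pods per node and the 1.5 GiB request fits 21, so the same 40 pods go from 10 nodes to 2.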