
My example was for non-prod and saving money there, as I found that our development clusters tended to be the most underutilized per dollar spent. In development it was OK to pack as many idle pods as possible onto the nodes. If there was a spike, then yes, you could get new nodes, but I found that the clusters scaled those nodes back down quite often.

My apologies in advance, as this advice can be terrible depending on your environment and services. What follows is not an exact science: you are juggling requests and limits while trying to find optimal performance.

For production you need to calculate the minimum, average and maximum CPU/Memory for a pod.

  1) Set your replicas to 1

  2) Determine what your true maximum CPU/Memory is for a pod. 

  Set your limits very high and performance-test your pod. If your response time slows to a crawl, your limit is too high and your code may not be able to handle that much load. If your response time is still good while hitting the limit, increase the limit until performance starts to degrade.

  3) Get the minimum CPU/Memory your pod needs to start.

  4) Get your average CPU/Memory DURING THE SPIKES. You should be able to get this from past metrics, although it can be difficult because the load might be spread over several pods in your metrics.

  5) I use the following formulas:

     requests = (min + average)/2
     limits = (average + max)/2

  6) You now have a baseline that you can tweak in the future.

  7) Set your autoscaler target to something high, like 80% CPU. You want this value to stay constant. I think GKE sets it to 60%, but I found that far too low and wasteful.

  8) Observe and tweak the values to see whether you can make things 'better' for your needs.
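
The request/limit formulas from the steps above can be sketched in Python. This is a minimal illustration, not a tool from my setup: the Usage and baseline names and the sample numbers are made up, and it assumes you already have the startup minimum, the average during spikes and the performance-tested maximum in consistent units (e.g. CPU millicores or memory MiB).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Observed usage of one resource for a single pod (hypothetical helper)."""
    minimum: float  # usage needed for the pod to start
    average: float  # average usage DURING THE SPIKES, from past metrics
    maximum: float  # highest usage before response times degrade under load


def baseline(u: Usage) -> tuple[float, float]:
    """Return (request, limit) using the midpoint heuristic."""
    if not (u.minimum <= u.average <= u.maximum):
        raise ValueError("expected minimum <= average <= maximum")
    request = (u.minimum + u.average) / 2  # requests = (min + average)/2
    limit = (u.average + u.maximum) / 2    # limits = (average + max)/2
    return request, limit


if __name__ == "__main__":
    # Made-up numbers: CPU in millicores, memory in MiB.
    cpu_req, cpu_lim = baseline(Usage(minimum=100, average=300, maximum=900))
    mem_req, mem_lim = baseline(Usage(minimum=128, average=256, maximum=512))
    print(f"cpu: requests={cpu_req:.0f}m limits={cpu_lim:.0f}m")        # cpu: requests=200m limits=600m
    print(f"memory: requests={mem_req:.0f}Mi limits={mem_lim:.0f}Mi")   # memory: requests=192Mi limits=384Mi
```

Because min <= average <= max, the request always lands at or below the limit, which is what Kubernetes expects for a container. Treat the output as the starting baseline, then keep observing and tweaking.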

There are two other things I always do in production that help with stability and reliability.

  - Set the autoscaler behaviour to scale up quickly and scale down slowly. It stops the chaotic cycles of adding 3 pods, removing 1, adding 2, then removing 3 within short periods during spikes. The behavior field was added to the HorizontalPodAutoscaler resource a couple of releases ago.

  - Set your minimum replicas to 2 for redundancy. I always do this in production.
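
Put together with the 80% CPU target from the earlier steps, these two settings might look like the following autoscaling/v2 HorizontalPodAutoscaler. It is a sketch built as a Python dict and printed as JSON (kubectl apply -f accepts JSON as well as YAML). The Deployment name my-app, maxReplicas of 10 and the specific window and policy numbers are hypothetical values for illustration, not recommendations; for reference, the default scale-down stabilization window is 300 seconds.

```python
import json


def hpa_manifest(name: str, min_replicas: int = 2, max_replicas: int = 10,
                 cpu_target: int = 80) -> dict:
    """Build an HPA manifest for a Deployment called `name` (hypothetical helper)."""
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,  # at least 2 for redundancy
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    # Target average CPU utilization, relative to the pods' requests.
                    "target": {"type": "Utilization", "averageUtilization": cpu_target},
                },
            }],
            "behavior": {
                # Scale up quickly: no stabilization delay, and allow the
                # replica count to grow by up to 100% every 15 seconds.
                "scaleUp": {
                    "stabilizationWindowSeconds": 0,
                    "policies": [{"type": "Percent", "value": 100, "periodSeconds": 15}],
                },
                # Scale down slowly: act on the highest recommendation seen in the
                # last 10 minutes, and remove at most 1 pod every 2 minutes.
                "scaleDown": {
                    "stabilizationWindowSeconds": 600,
                    "policies": [{"type": "Pods", "value": 1, "periodSeconds": 120}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Redirect to a file and run: kubectl apply -f hpa.json
    print(json.dumps(hpa_manifest("my-app"), indent=2))
```

The asymmetry is the point: a long scale-down window with a small per-period pod budget is what damps the add/remove churn during spikes, while the scale-up side stays unrestricted.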

I hope this helps, and I apologize once again for the hand-waviness of it all.