by mlthoughts2018
2223 days ago
You can’t have it both ways. If you need to monitor it and take corrective action (which you do), then you shouldn’t rely on it. This is an argument for making your liveness probe == readiness probe. It should just check pod availability in a minimal way. If continuing to send the pod traffic based on this indicator turns out badly because of congestion, you want that to surface as errors you can see and react to, not have Kubernetes quietly take the pod out of service for new traffic. You want liveness and readiness to check the same thing, and that thing should be a non-trivial check of service health that is also very low latency. As long as that check is passing, keep sending traffic. When it fails, it should always be for a “hard down” reason: the pod fundamentally cannot accept traffic, regardless of traffic levels, because something inside it is broken.