by rsanders 2103 days ago
Can you expand on this: "AWS healthchecks each kubernetes node, but not your pods themselves".

Are you talking about a keep-alive connection to an unhealthy pod that is reused for multiple requests? If I understand you correctly, the failure modes are: a) the ALB keeps sending requests over an established keep-alive HTTP connection that terminates in an unhealthy pod, which it still sees as healthy because the node is healthy and can route traffic to another, healthy pod; and b) because the health of an established keep-alive connection is judged by the node rather than the destination pod, a node that becomes unhealthy can cause the ALB to terminate a keep-alive connection unnecessarily.

We had to switch to target-type=instance because of issues with pods not being deregistered. I'd prefer to use target-type=ip, but it seemed that preventing 500s during rollouts required some testing and tuning with a very specific approach, e.g. introducing a longish delay on pod termination with a lifecycle hook and using the pod readiness gate support recently added to alb-ingress-controller.
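A minimal sketch of the lifecycle-hook half of that approach, assuming a Deployment named `web`, an image `example/web:latest`, and a container port of 8080 (all hypothetical), with a 20-second sleep picked only for illustration. The container image must ship `sh` and `sleep`. The readiness-gate wiring is left out because its exact form depends on the controller version.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # The grace period covers the preStop hook plus the app's own shutdown,
      # so it must be longer than the sleep below.
      terminationGracePeriodSeconds: 60
      containers:
        - name: web
          image: example/web:latest  # hypothetical image
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              exec:
                # Keep serving for a while after termination starts, giving the
                # ALB time to stop routing to this pod before SIGTERM is sent.
                command: ["sh", "-c", "sleep 20"]
```

Kubernetes counts the preStop hook against `terminationGracePeriodSeconds`, so a sleep that is too long relative to the grace period just gets the container SIGKILLed mid-hook.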


You've got it exactly right. Your problem of pods not being deregistered is a real one, but it has a quick fix: the default deregistration delay for ALB target groups is 300 seconds, while a Kubernetes pod's terminationGracePeriodSeconds defaults to 30 seconds. This means your load balancer can keep trying that pod for four and a half minutes after it has been hard-killed.

Here's the annotation that I used to fix that:

    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30,slow_start.duration_seconds=30
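For context, a sketch of where that annotation sits on an Ingress, assuming a cluster that serves the `networking.k8s.io/v1` Ingress API and a Service named `web` on port 80 (hypothetical names). With `target-type: instance` the ALB targets node ports, so the backing Service needs to be of type NodePort.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web                                  # hypothetical name
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: instance
    # Drain deregistered targets for 30 s instead of the 300 s default, and
    # ramp traffic to newly registered targets over 30 s.
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30,slow_start.duration_seconds=30
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web                    # hypothetical NodePort Service
                port:
                  number: 80
```

Setting the deregistration delay to 30 seconds lines it up with the default 30-second termination grace period, so the ALB stops draining a target at roughly the same time the pod is actually gone.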