by glenjamin 458 days ago
The fundamental gap, in my opinion, is that k8s has no mechanism (that I am aware of) to notify the load balancer (whether that's a service, ingress or gateway) that it intends to remove a node, and for the load balancer to confirm that the removal is complete.

This is how all pre-k8s rolling deployment systems I've used have worked.

So instead we move the logic to the application, and put a sleep in the shutdown phase to account for the time it takes for the load balancer to process/acknowledge the shutdown and stop routing new traffic to that node.
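For illustration, here is a minimal sketch of that application-level sleep, using only the Python standard library. The 5-second delay, the port and all function names are assumptions, not something from a particular framework. On SIGTERM the server keeps answering requests for the delay and only then stops.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumption: 5 seconds is long enough for the load balancer to stop routing here.
DRAIN_DELAY_SECONDS = 5


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

    def log_message(self, format, *args):
        pass  # keep the example quiet


def shutdown_after_delay(server, delay, sleep=time.sleep):
    """Keep serving for `delay` seconds, then stop the server."""
    sleep(delay)
    server.shutdown()


def install_sigterm_handler(server, delay=DRAIN_DELAY_SECONDS):
    def on_sigterm(signum, frame):
        # shutdown() blocks until serve_forever() returns, and the signal
        # handler runs in the serving thread, so the wait goes in its own thread.
        threading.Thread(target=shutdown_after_delay, args=(server, delay)).start()

    signal.signal(signal.SIGTERM, on_sigterm)


def main():
    server = ThreadingHTTPServer(("0.0.0.0", 8080), Handler)
    install_sigterm_handler(server)
    server.serve_forever()  # returns once the delayed shutdown has run
    server.server_close()
```

Calling `main()` starts the server. The delay has to fit inside the pod's termination grace period, otherwise the process is killed before it finishes draining.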

2 comments

K8s made simple things complicated, yet it lacks obvious safety (or sanity) mechanisms, making everyday life a PITA. I wonder why it was adopted so quickly despite its flaws, and the only thing that comes to mind is, like Java in the 90s, massive marketing and propaganda that it's "inevitable".

> put a sleep in the shutdown phase to account for the time it takes for the load balancer to process/acknowledge the shutdown and stop routing new traffic to that node.

Again, I don't see why the sleep is required. You're removed from the load balancer when the last connection from the LB closes.

That’s how you’d expect it to work, but that’s not how pod deletion works.

The pod delete event is sent out, and the load balancer and the pod itself both receive and react to it at the same time.

So unless the LB switchover is very quick, or the pod shutdown is slow - you get dropped requests - usually 502s.
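To put rough numbers on that race, here is a toy model. It assumes both sides start reacting at the moment of the delete event. The function name and the timings are made up for illustration.

```python
def dropped_window(lb_update_delay, pod_shutdown_delay):
    """Seconds during which the LB may still route to a pod that has already stopped.

    Both delays are measured from the pod delete event.
    """
    return max(0.0, lb_update_delay - pod_shutdown_delay)


# LB takes 2s to stop routing, pod exits after 0.5s: 1.5s of possible 502s.
print(dropped_window(2.0, 0.5))  # 1.5

# Same LB delay, but the pod sleeps 5s before exiting: no gap.
print(dropped_window(2.0, 5.5))  # 0.0
```

The sleep doesn't make the LB any faster, it just makes the pod the slower of the two.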

Try googling for "graceful k8s deploys" and every article will say you have to put a preStop sleep in.
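For reference, the preStop sleep is a `lifecycle.preStop` hook on the container. Here is a sketch that builds the relevant pod spec fragment as a Python dict and prints it as JSON. The container name, image, 10-second sleep and the helper function are placeholders, and the `exec` form assumes the image ships a `sleep` binary.

```python
import copy
import json


def with_prestop_sleep(container, seconds):
    """Return a copy of a container spec with a preStop hook that sleeps."""
    patched = copy.deepcopy(container)
    patched["lifecycle"] = {
        "preStop": {"exec": {"command": ["sleep", str(seconds)]}}
    }
    return patched


# Hypothetical container; the name and image are placeholders.
container = {"name": "web", "image": "example/web:1.0"}

pod_spec = {
    # Must be longer than the sleep, or the container is killed mid-drain.
    "terminationGracePeriodSeconds": 30,
    "containers": [with_prestop_sleep(container, 10)],
}

print(json.dumps(pod_spec, indent=2))
```

SIGTERM is only sent to the container after the preStop hook finishes, so the app keeps serving normally during the sleep.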