| HN Mirror

by glenjamin 505 days ago

The fundamental gap, in my opinion, is that k8s has no mechanism (that I am aware of) to notify the load balancing mechanism (whether that's a service, ingress or gateway) that it intends to remove a node, and for the load balancer to confirm that the removal is complete.

This is how all pre-k8s rolling deployment systems I've used have worked.

So instead we move the logic to the application, and put a sleep in the shutdown phase to account for the time it takes for the load balancer to process/acknowledge the shutdown and stop routing new traffic to that node.
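A minimal sketch of that application-side delay, assuming a plain Python HTTP server. The names `DRAIN_SECONDS`, `Handler`, `drain_then_stop` and `main`, the port, and the 10 second value are all illustrative assumptions, not anything k8s prescribes. On SIGTERM the server keeps answering requests for the drain period and only then stops accepting connections.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Assumed value: roughly how long the load balancer needs to stop routing here.
DRAIN_SECONDS = 10


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def drain_then_stop(server, drain_seconds, sleep=time.sleep):
    """Wait drain_seconds while the server keeps serving, then stop it."""
    sleep(drain_seconds)  # the "sleep in the shutdown phase"
    server.shutdown()     # makes serve_forever() return


def main():
    server = ThreadingHTTPServer(("0.0.0.0", 8080), Handler)

    def on_sigterm(signum, frame):
        # The handler runs in the main thread, which is inside serve_forever();
        # calling shutdown() from here directly would deadlock, so use a thread.
        threading.Thread(
            target=drain_then_stop, args=(server, DRAIN_SECONDS)
        ).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()
    server.server_close()
```

Calling `main()` starts the server. The drain period has to be shorter than the pod's termination grace period, otherwise the process is killed before it finishes.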

2 comments

kunley 505 days ago

K8s made simple things complicated, yet it doesn't have obvious safety (or sanity) mechanisms, making everyday life a PITA. I wonder why it was adopted so quickly despite its flaws, and the only thing coming to my mind is, like Java in the 90s: massive marketing and propaganda that it's "inevitable".

deathanatos 505 days ago

> put a sleep in the shutdown phase to account for the time it takes for the load balancer to process/acknowledge the shutdown and stop routing new traffic to that node.

Again, I don't see why the sleep is required. You're removed from the load balancer when the last connection from the LB closes.

glenjamin 505 days ago

That’s how you’d expect it to work, but that’s not how pod deletion works.

The pod delete event is sent out, and the load balancer and the pod itself both receive and react to it at the same time.

So unless the LB switchover is very quick, or the pod shutdown is slow, you get dropped requests, usually 502s.

Try googling for graceful k8s deploys and every article will say you have to put a preStop sleep in.
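The race can be put in numbers with a toy model. This is only a sketch under simplifying assumptions: both sides see the delete event at time zero, the LB stops routing after a fixed delay, and the pod is gone after its preStop sleep plus its own exit time. The function name `dropped_window` and its parameters are made up for illustration.

```python
def dropped_window(lb_update_seconds, pod_exit_seconds, prestop_sleep_seconds=0.0):
    """Seconds during which the LB still routes to a pod that has already exited."""
    # Both sides react to the same delete event at t = 0.
    pod_gone_at = prestop_sleep_seconds + pod_exit_seconds
    return max(0.0, lb_update_seconds - pod_gone_at)


# No sleep: the pod exits in 0.5s but the LB needs 3s to switch over.
print(dropped_window(3.0, 0.5))  # 2.5

# With a 5s preStop sleep the pod outlives the LB switchover.
print(dropped_window(3.0, 0.5, prestop_sleep_seconds=5.0))  # 0.0
```

Any request routed during a non-zero window lands on a pod that is no longer listening, which is where the 502s come from. The sleep works only as long as it covers the slowest LB update you actually see.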
