by lolc 1543 days ago
In my view it is a clear flaw that the signal to terminate can arrive while the server is still getting new requests. Being able to steer traffic based on your knowledge of the state of the system is one of the reasons why you'd want to set up an integrated environment where the load-balancer and servers are controlled from the same process.

The timing of the signal is entirely under the control of the managing process. It could synchronize with the load-balancer before sending pods the term signal, and I'm unclear why this isn't done.

I don't think there is anything reasonable to synchronize with that will guarantee no new connections. You can remove the address from the control plane synchronously, but the stale config might live on in the kubelet or kube-proxy distributed throughout the cluster. I don't think you want to have blocking synchronization with every node every time you want to stop a pod.

The alternative is that you wait some amount of time before dying instead of explicit synchronization, which is exactly what this lame-duck period is. You find out that you should die ASAP, and then you decide how long you want to wait until you actually die.
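To make the lame-duck idea concrete, here is a minimal sketch in Python. The `LameDuck` class, its method names, and the 15-second value are all made up for illustration and are not taken from any real server. The clock is injectable only so the logic can be checked without sleeping.

```python
import signal
import time


class LameDuck:
    """Tracks shutdown state for a server that keeps serving for a while after SIGTERM."""

    def __init__(self, lame_duck_seconds, clock=time.monotonic):
        self.lame_duck_seconds = lame_duck_seconds  # assumed value, tune per cluster
        self._clock = clock
        self._term_at = None  # time the term signal arrived, if any

    def on_term(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects.
        # Only the first signal starts the countdown.
        if self._term_at is None:
            self._term_at = self._clock()

    def ready(self):
        # What a health/readiness endpoint would report: fail as soon as
        # termination is requested, so balancers that notice stop sending traffic.
        return self._term_at is None

    def should_exit(self):
        # Keep serving stragglers with stale config until the period has elapsed.
        if self._term_at is None:
            return False
        return self._clock() - self._term_at >= self.lame_duck_seconds


if __name__ == "__main__":
    duck = LameDuck(lame_duck_seconds=15)
    signal.signal(signal.SIGTERM, duck.on_term)
    while not duck.should_exit():
        time.sleep(0.1)  # stand-in for handling requests
```

On Kubernetes the wait has to fit inside the pod's `terminationGracePeriodSeconds`, otherwise the process is killed before the lame-duck period ends.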

I don't really see an issue with adding synchronisation. There's no fundamental reason why having endpoint consumers acknowledge updates before removed pods are terminated would be horrifically expensive, especially with endpoint slices.

With 10,000 nodes running kube-proxy it is a bit expensive and, more importantly, error-prone. If acks were required, a problem on a single node that wasn't even talking to the app could stop that app from exiting indefinitely. Clusters this size already do gigabits of traffic in endpoints watches.
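A toy model of that failure mode, as a hedged sketch: `can_terminate` and the node names are hypothetical, and no such ack protocol exists in kube-proxy as far as this thread describes. It only shows that requiring every ack blocks on one stuck node, and that adding a timeout to fix this brings back a fixed wait.

```python
def can_terminate(nodes, acked, elapsed, timeout=None):
    """Decide whether a removed pod may be stopped.

    nodes:   set of node names that consume the endpoint list
    acked:   set of node names that confirmed they dropped the endpoint
    elapsed: seconds since the endpoint was removed
    timeout: None means wait for every ack with no upper bound
    """
    if nodes <= acked:  # every consumer has acknowledged
        return True
    # Otherwise only a timeout can unblock termination.
    return timeout is not None and elapsed >= timeout


nodes = {f"node-{i}" for i in range(10_000)}
acked = nodes - {"node-42"}  # one unhealthy node never acks

# Strict acks: still blocked an hour later.
blocked = not can_terminate(nodes, acked, elapsed=3600)

# Acks plus a 30 s timeout: unblocked at 30 s, much like a lame-duck wait.
unblocked = can_terminate(nodes, acked, elapsed=30, timeout=30)
```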

Additionally, there are no acks possible for clients of headless services, so having just kube-proxy handle this doesn't go far enough.

But yeah, maybe accept that as a tradeoff for ClusterIP services, and integrate more deeply with the real load balancer options.