by glenjamin
458 days ago
The fact that the state-of-the-art container orchestration system requires you to run a sleep command in order to not drop traffic on the floor is a travesty of system design. We had perfectly good rolling deploys before k8s came on the scene, but k8s's insistence on a single-phase deployment process means we end up with this silly workaround. I yelled into the void about this once and was told that it was inevitable because it's an eventually consistent distributed system. I'm pretty sure it could still have had a two-phase pod shutdown by encoding a timeout on the first stage. Sure, it would have made some internals require more complex state, but isn't that the point of k8s? Instead, everyone has to rediscover the sleep hack over and over again.
Thus you get this entire system of "mark me not ready, wait for the ALB/NLB to realize I'm not ready and stop sending traffic, wait for that to finish, terminate, and Kubernetes continues with the rollout."
You would have the same problem if you just started up new images in an autoscaling group, then randomly SSHed into the old ones and ran "shutdown -h now". The ALB would be shocked by the sudden departure of the VMs, and you would probably get traffic going to the old VMs until health checks caught up.
EDIT: Azure/GCP have the same issue if you use their provided load balancers.
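To make the sequence above concrete, here is a rough Python sketch of the "not ready, wait, then terminate" ordering. It is a toy model, not how Kubernetes or any load balancer is actually implemented: the `Pod` class, the `lb_delay_s` parameter, and the log strings are all made up for illustration. The point is only that the process keeps serving while the load balancer catches up, which is what the sleep buys you.

```python
import time
from typing import Callable, List


class Pod:
    """Toy model of a pod with a two-phase shutdown (hypothetical, for illustration)."""

    def __init__(self, lb_delay_s: float,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # lb_delay_s is an assumed knob: how long we guess the load balancer
        # needs to notice the failed readiness check and stop routing to us.
        self.lb_delay_s = lb_delay_s
        self._sleep = sleep
        self.ready = True
        self.accepting = True
        self.log: List[str] = []

    def readiness(self) -> bool:
        """What a readiness/health check would see."""
        return self.ready

    def handle(self, request: str) -> str:
        """Serve a request, or drop it if the process has already stopped."""
        if not self.accepting:
            raise ConnectionError("dropped: " + request)
        return "ok: " + request

    def shutdown(self) -> None:
        # Phase 1: fail readiness but keep serving, because the load
        # balancer may still send traffic until its health checks catch up.
        self.ready = False
        self.log.append("not ready")
        self._sleep(self.lb_delay_s)  # this is the "sleep hack"
        self.log.append("load balancer caught up")
        # Phase 2: only now stop accepting connections and exit.
        self.accepting = False
        self.log.append("terminated")
```

A request that arrives during the wait is still answered; one that arrives after `shutdown()` returns raises `ConnectionError`, which is the dropped traffic you get if you skip the wait entirely.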