by toast0 1276 days ago
This time of year, you do get the pages for things that were always broken, but nobody noticed before, because they only show up when the system has been running without changes for more than two weeks.

This actually happened to us last week.

The lack of deployments revealed that a legacy background processor loses its connection to the message queue after a while and then gets stuck in a state where it never reconnects.

Until then, deployments had always cycled the pods before the issue could manifest.
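
To illustrate the failure mode: the stuck state usually comes from treating a dropped connection as something nobody has to handle. Below is a minimal sketch of the alternative, not our actual processor. `connect` is a hypothetical callable that opens a queue connection or raises `ConnectionError`, and the attempt count and delays are made-up defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_backoff(
    connect: Callable[[], T],  # hypothetical: returns a connection or raises ConnectionError
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry connect() with capped exponential backoff, re-raising after max_attempts."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts - 1:
                # Give up loudly so the process exits instead of hanging forever.
                raise
            # Delays grow as base_delay * 1, 2, 4, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

Re-raising on the last attempt is deliberate: a crashed pod gets restarted by Kubernetes, whereas a pod that silently stops consuming just sits there until someone notices.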

A (now former) colleague of mine pointed out that the Kubernetes descheduler can enforce a maximum pod lifetime[0], which effectively forces periodic restarts. So if your system cannot tolerate running continuously for a long time, this is one way to gracefully restart long-running pods.

[0]: https://github.com/kubernetes-sigs/descheduler#podlifetime
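
The idea behind a maximum lifetime is simple enough to sketch. This is not the descheduler's code, only an illustration of the selection rule: `pods_over_lifetime` and its `start_times` mapping are hypothetical names, and returning the oldest pods first is a choice made for this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def pods_over_lifetime(
    start_times: Dict[str, datetime],  # hypothetical: pod name -> start time (timezone-aware)
    max_lifetime: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of pods older than max_lifetime, oldest first."""
    if now is None:
        now = datetime.now(timezone.utc)
    expired = [
        (started, name)
        for name, started in start_times.items()
        if now - started > max_lifetime
    ]
    # Sorting the (start time, name) pairs puts the oldest pods first.
    return [name for _, name in sorted(expired)]
```

With a limit shorter than your longest quiet period (say, a week against a two-week holiday freeze), every pod gets cycled even when nobody deploys.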