| HN Mirror



	by toast0 1276 days ago
	This time of year, you do get the pages for things that were always broken, but nobody noticed before, because they only show up when the system has been running without changes for more than two weeks.


purrcat259 1276 days ago

This actually happened to us last week.

A stretch without deployments revealed that a legacy background processor loses its connection to the message queue and gets stuck in a state where it never reconnects.

Deployments always cycled the pods before the issue manifested.
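
For anyone hitting the same thing, here is a rough sketch of the kind of reconnect loop the processor was missing. It is not our actual code: `QueueConnectionError`, `connect`, and `consume` are hypothetical stand-ins for whatever your queue client provides, and the backoff numbers are arbitrary.

```python
import time


class QueueConnectionError(Exception):
    """Hypothetical error raised when the queue connection is lost."""


def run_with_reconnect(connect, consume, max_backoff=60.0,
                       sleep=time.sleep, max_attempts=None):
    """Keep (re)connecting until consume() returns normally.

    connect and consume are caller-supplied callables (assumptions,
    not a real client API). Returns the number of connection attempts.
    """
    backoff = 1.0
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            conn = connect()
            backoff = 1.0   # connection is healthy again, reset the delay
            consume(conn)   # blocks until done or the connection drops
            return attempts
        except QueueConnectionError:
            # Wait, then retry with exponential backoff up to a cap,
            # instead of sitting in a dead state forever.
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff)
    raise RuntimeError("gave up reconnecting to the message queue")
```

Passing `sleep` in as a parameter is just so the loop can be exercised in a test without actually waiting.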


jldugger 1275 days ago

This is something a (now former) colleague of mine pointed out: the Kubernetes descheduler can enforce a maximum pod lifetime[0], which sort of forces continual reboots. So if your system cannot tolerate running for a long time continuously, this is one method to gracefully restart long-running pods.

[0]: https://github.com/kubernetes-sigs/descheduler#podlifetime
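
To illustrate the idea only (this is not the descheduler's code or config, just a toy sketch with made-up names), the selection step amounts to picking every pod whose age exceeds some maximum:

```python
from datetime import datetime, timedelta, timezone


def pods_over_lifetime(pod_start_times, max_lifetime, now=None):
    """Return the names of pods that have run longer than max_lifetime.

    pod_start_times is a hypothetical mapping of pod name to a
    timezone-aware start time; a real setup would read this from the
    cluster rather than from a dict.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, started in pod_start_times.items()
        if now - started > max_lifetime
    )


if __name__ == "__main__":
    now = datetime(2024, 1, 15, tzinfo=timezone.utc)
    pods = {
        "worker-a": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "worker-b": datetime(2024, 1, 14, 12, 0, tzinfo=timezone.utc),
    }
    # Candidates for a graceful restart under a 7-day maximum lifetime.
    print(pods_over_lifetime(pods, timedelta(days=7), now=now))
```

Whatever gets selected is then evicted and rescheduled, so nothing runs long enough to hit the "only breaks after two weeks" class of bug.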
