by zaphar 806 days ago
This is correct. You need some kind of running check on the environment and, where possible, code that handles exceptional cases.

Sometimes that's as simple as a service that shoots other services in the head to restart them. Other times it's more complicated. But lots of places can't afford to get more complicated than "alert a human and have them look at it".
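
For illustration only, here's a rough sketch of the simple end of that spectrum: one pass of a watchdog that restarts anything failing its health check and falls back to alerting a human once restarts stop helping. Everything in it (`watchdog_pass`, the `health_checks`, `restart` and `alert` callables, the `max_restarts` threshold) is a hypothetical name made up for the example, not any particular tool's API.

```python
from typing import Callable, Dict


def watchdog_pass(
    health_checks: Dict[str, Callable[[], bool]],
    restart: Callable[[str], None],
    alert: Callable[[str], None],
    failures: Dict[str, int],
    max_restarts: int = 3,
) -> None:
    """Run one round of checks; restart unhealthy services, then escalate.

    `failures` maps service name to its count of consecutive failed checks
    and is updated in place so state carries over between passes.
    """
    for name, is_healthy in health_checks.items():
        try:
            healthy = is_healthy()
        except Exception:
            # A check that blows up is treated the same as an unhealthy service.
            healthy = False

        if healthy:
            failures[name] = 0
            continue

        failures[name] = failures.get(name, 0) + 1
        if failures[name] <= max_restarts:
            # The "shoot it in the head" step.
            restart(name)
        else:
            # Restarting hasn't helped; hand it to a human.
            alert(name)
```

In practice you'd call this from a loop or a timer, with `restart` wrapping whatever your init system or orchestrator gives you and `alert` wrapping your paging system.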