by cramjabsyn 1056 days ago
> The reasonable way to notice is to have alerts for any unexpected restarts. Relying on noticing intermittent service disruption is bound to fail.

I'd argue that unexpected restarts should alert only beyond a threshold. Alerting on every occurrence is too noisy. If a single unit's failure causes a service disruption, architecture improvements are needed.
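
To illustrate the threshold idea, here is a minimal sketch in Python. The class name, the parameters, and the numbers in the usage note are all hypothetical and not taken from any particular monitoring tool; it assumes restart events arrive in time order with a timestamp in seconds.

```python
from collections import deque


class RestartAlerter:
    """Hypothetical helper: alert only when restarts in a sliding window exceed a threshold."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold            # restarts tolerated per window
        self.window_seconds = window_seconds  # length of the sliding window
        self._restarts = deque()              # timestamps of recent restarts

    def record_restart(self, timestamp: float) -> bool:
        """Record one unexpected restart; return True if an alert should fire."""
        self._restarts.append(timestamp)
        # Drop restarts that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._restarts and self._restarts[0] <= cutoff:
            self._restarts.popleft()
        # "Beyond a threshold": strictly more restarts than the tolerated count.
        return len(self._restarts) > self.threshold
```

With `RestartAlerter(threshold=2, window_seconds=600)`, two restarts inside ten minutes stay quiet and the third one alerts. Each restart could still be logged separately, so the quiet ones remain available for later investigation.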

1 comment

Don't you want to know if something restarts unexpectedly? It's a bug that should be understood and fixed. (If it's not a bug then it's not unexpected.)