by pikahumu
1056 days ago
The reasonable way to notice is to alert on any unexpected restart. Relying on someone noticing intermittent service disruption is bound to fail. And so is "remembering to check for this":

> in the future I'm going to want to remember to check for this

Whenever you catch yourself thinking that sentence, treat it as a red flag and rethink your approach. You will forget. And if not you, then somebody else on your team. You need automation for anything you can forget; otherwise your mental checklists grow too large to handle and become a distraction.
|
I'd argue that unexpected restarts should only alert beyond a threshold. Alerting on every occurrence is too noisy. And if the failure of an individual unit causes a service disruption, architecture improvements are needed.
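
To make the threshold idea concrete, here is a minimal sketch of a sliding-window restart counter. The class name `RestartAlert`, the parameters `threshold` and `window_seconds`, and their default values are all made up for illustration; in practice you would probably express the same rule in whatever monitoring system you already run rather than hand-roll it.

```python
from collections import deque


class RestartAlert:
    """Hypothetical helper: fire only when restarts exceed a threshold
    within a sliding time window, not on every single restart."""

    def __init__(self, threshold=3, window_seconds=600.0):
        # Assumed defaults, chosen arbitrarily for the example.
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._restarts = deque()  # timestamps of recent restarts, oldest first

    def record_restart(self, timestamp):
        """Record one unexpected restart (timestamp in seconds).

        Returns True when the number of restarts inside the window is
        strictly greater than the threshold, i.e. an alert should fire.
        """
        self._restarts.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop restarts that have fallen out of the window.
        while self._restarts and self._restarts[0] <= cutoff:
            self._restarts.popleft()
        return len(self._restarts) > self.threshold
```

With `RestartAlert(threshold=2, window_seconds=60)`, restarts at t=0 and t=10 stay quiet, a third at t=20 fires, and a lone restart much later (say t=200) is quiet again because the earlier ones have aged out of the window.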