by kazinator
236 days ago
This sounds like a case of badly tuned alerts. If you are distracted by a high CPU alert that turns out to be just an expected spike, the alert needs to filter out spikes and report only persistent high-CPU situations. Think of how the body and brain report pain: sudden new pain is more noticeable than a chronic background level of pain. Maybe CPU alarms should work like that: CPU activity which is (a) X percent above the 24-hour average and (b) persistently so for a certain duration is alarm-worthy. So 100% CPU wouldn't be alarm-worthy on a node that is always at 100% CPU as an expected condition: a very busy system, constantly loaded down. But 45% CPU for 30 minutes could be alarm-worthy on a machine that averages 15%. That kind of thing.
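To make the rule concrete, here is a rough Python sketch, not a finished alerting system. The function name `cpu_alarm` and its parameters are made up for illustration. It assumes one CPU reading per minute, that the 24-hour average is computed elsewhere and passed in, and that "X percent above" means X percentage points above that average (it could just as well be read as a relative increase).

```python
from typing import Sequence


def cpu_alarm(recent: Sequence[float], baseline_avg: float,
              margin: float, min_samples: int) -> bool:
    """Hypothetical rule: alarm only if the last `min_samples` readings
    are ALL more than `margin` percentage points above `baseline_avg`.

    recent       -- CPU readings in percent, oldest first (assumed 1 per minute)
    baseline_avg -- 24-hour average CPU in percent (computed elsewhere)
    margin       -- the "X" in the rule, taken here as percentage points
    min_samples  -- how many consecutive readings count as "persistent"
    """
    if min_samples <= 0 or len(recent) < min_samples:
        return False  # not enough history to call anything persistent
    threshold = baseline_avg + margin
    # A short spike fails this test because older readings in the
    # window are still below the threshold.
    return all(v > threshold for v in recent[-min_samples:])


# Machine that averages 15% sitting at 45% for 30 minutes: alarm.
print(cpu_alarm([45.0] * 30, baseline_avg=15.0, margin=20.0, min_samples=30))   # True
# Node that is always at 100%: nothing is above its own baseline, no alarm.
print(cpu_alarm([100.0] * 30, baseline_avg=100.0, margin=20.0, min_samples=30))  # False
# Two-minute spike to 95% on the 15% machine: filtered out.
print(cpu_alarm([15.0] * 28 + [95.0] * 2, baseline_avg=15.0, margin=20.0, min_samples=30))  # False
```

The margin of 20 points and the 30-sample window are placeholders; the point is only that the threshold is relative to the machine's own baseline and that a single spike cannot trip it.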