by aristofun 236 days ago
> investigating "false positive" infrastructure alerts?

Gradually, with each false positive (or negative), you learn to tweak your alerts and update your dashboards to reduce the noise as much as possible.

So it's really a manual, iterative process, which means there should be room for something to be done about it.
You learn pretty quickly. For example, I don't alert on CPU; I alert on load average, which is more realistic. I'm also a solo dev, so I use the 15-minute average, and it needs to be above a pretty high threshold three times in a row. I don't monitor RAM usage, but swap instead. When it triggers, something usually needs to be fixed.
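
For anyone who wants to try the "three times in a row" rule, here is a rough Python sketch of how it could look. The class name, the threshold of 8.0, and the helper `sample_load15` are made up for illustration; they are not from any particular monitoring tool, and the right threshold depends on your core count.

```python
import os


class ConsecutiveBreachAlert:
    """Fire only after `required` consecutive samples exceed `threshold`."""

    def __init__(self, threshold: float, required: int = 3) -> None:
        self.threshold = threshold
        self.required = required
        self.streak = 0

    def observe(self, value: float) -> bool:
        # Any sample at or below the threshold resets the streak,
        # so a single spike never pages anyone.
        self.streak = self.streak + 1 if value > self.threshold else 0
        return self.streak >= self.required


def sample_load15() -> float:
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    return os.getloadavg()[2]


if __name__ == "__main__":
    # Hypothetical threshold; pick one that fits your machine.
    alert = ConsecutiveBreachAlert(threshold=8.0, required=3)
    if alert.observe(sample_load15()):
        print("15-minute load average has been high for 3 checks in a row")
```

In practice you would call `observe` once per check interval from whatever scheduler you already have, keeping the same `alert` object alive between checks.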

Also look for a monitoring solution with quorum; that way you don't get bothered by false positives caused by a peering issue between your monitoring location and your app (which you have no control over).
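
To make the quorum idea concrete, here is a minimal sketch, assuming each probe location reports a simple up/down result. The function name `quorum_down` and the location names are hypothetical; real monitoring services implement this on their side and expose it as a setting.

```python
from typing import Dict, Optional


def quorum_down(results: Dict[str, bool], quorum: Optional[int] = None) -> bool:
    """Return True only if enough probe locations agree the app is down.

    `results` maps a probe location to True when its check succeeded.
    `quorum` defaults to a strict majority of the locations.
    """
    failures = sum(1 for ok in results.values() if not ok)
    needed = quorum if quorum is not None else len(results) // 2 + 1
    return failures >= needed


if __name__ == "__main__":
    # One location failing looks like a network problem on its side, not an outage.
    print(quorum_down({"us-east": False, "eu-west": True, "ap-south": True}))
    # Two of three failing is treated as a real outage.
    print(quorum_down({"us-east": False, "eu-west": False, "ap-south": True}))
```

A single failing location is ignored, which is exactly the peering case: the path from that one probe is broken, not the app.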