by ericd 5022 days ago
Actually, false positives with alarms are really, really bad - I will start to ignore them very quickly if they're not reliable indicators of an actual problem.

EDIT: If you have reasonable default tolerances or the ability to set tolerances on tasks, I'm pretty interested in trying it out. Do you integrate, or have plans to integrate, with PagerDuty, or do you simply fire off an email?


What I'm saying is, in the rare case that we are down and your service can't check in, we would send a false positive that one time. We wouldn't be flapping between off and on and sending you a lot of false positives. The resolution on our checks is so coarse (the smallest is an hour) that we won't be flooding you with emails in any case.

We don't replace something like monit, which makes sure your process continues to run. We are validation that one-off periodic things run... things that are easy to forget about but are important.
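
To make the check-in idea above concrete, here is a minimal sketch of how a missed check-in might be detected with some slack. It is not the service's actual implementation: the function names `is_overdue` and `should_alert` and the 10-minute tolerance are assumptions for illustration; only the one-hour minimum interval comes from the comment above.

```python
from datetime import datetime, timedelta

# Hypothetical defaults: the one-hour interval matches the smallest check
# resolution mentioned above; the 10-minute tolerance is an assumed value.
DEFAULT_INTERVAL = timedelta(hours=1)
DEFAULT_TOLERANCE = timedelta(minutes=10)


def is_overdue(last_check_in, now,
               interval=DEFAULT_INTERVAL, tolerance=DEFAULT_TOLERANCE):
    """Return True if the task has not checked in within interval + tolerance."""
    return now - last_check_in > interval + tolerance


def should_alert(last_check_in, now, already_alerted,
                 interval=DEFAULT_INTERVAL, tolerance=DEFAULT_TOLERANCE):
    """Alert once per missed check-in rather than on every poll (no flapping)."""
    return is_overdue(last_check_in, now, interval, tolerance) and not already_alerted


last = datetime(2024, 1, 1, 12, 0)
on_time = should_alert(last, datetime(2024, 1, 1, 13, 5), already_alerted=False)
late = should_alert(last, datetime(2024, 1, 1, 13, 15), already_alerted=False)
repeat = should_alert(last, datetime(2024, 1, 1, 14, 15), already_alerted=True)
```

With these assumed values, a check-in that is five minutes late stays quiet, one that is fifteen minutes late triggers a single alert, and later polls do not send it again.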

Yeah, I'm not terribly worried by that kind of failure; if that happens, a bunch of other things have already gone wrong. I would like something to keep track of all processes, though, that can back up monit. Good to know it's got some slack in the tolerances, I'll give it a try.