Hacker News
by marcosdumay 4509 days ago
> and have set up a basic ssmtp check to SMS us if there is an issue.

And what will happen when the network (or the alert server) is down?

You must put some check outside your network, on independent infrastructure. Adding another protocol on the same network is still subject to Murphy's law.


Independent infrastructure is a good idea, but it's not always feasible for everyone. At OpsGenie, to solve this problem, we came up with a solution we refer to as "heartbeat monitoring". It lets monitoring tools send periodic heartbeat messages to us, indicating that the tool is up and can reach us. If we don't receive a heartbeat from a tool within 10 minutes, we generate an alert and notify the admins. It's not out-of-band management, but it does the trick to prevent situations like the one jsmeaton described.

http://support.opsgenie.com/customer/portal/articles/759603-...
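For anyone who wants to see the heartbeat ("dead man's switch") idea concretely, here is a minimal sketch in Python. It assumes the receiving side runs on infrastructure separate from the monitored tool and calls `check()` periodically. `HeartbeatMonitor`, `record_heartbeat`, and `check` are hypothetical names for illustration, not OpsGenie's actual API. The 600-second timeout mirrors the 10-minute window mentioned above.

```python
import time
from typing import Callable, Dict, List, Optional, Set


class HeartbeatMonitor:
    """Hypothetical receiver: alerts when a monitored tool stops sending heartbeats."""

    def __init__(self, timeout_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic,
                 alert: Optional[Callable[[str, float], None]] = None) -> None:
        self.timeout_s = timeout_s  # 10 minutes, as in the comment above
        self.clock = clock          # injectable so it can be tested without sleeping
        # Default alert just prints; a real setup would page or SMS the admins.
        self.alert = alert or (lambda src, silence: print(
            f"ALERT: no heartbeat from {src} for {silence:.0f}s"))
        self.last_seen: Dict[str, float] = {}
        self.alerted: Set[str] = set()

    def record_heartbeat(self, source: str) -> None:
        """Called whenever a heartbeat arrives from a monitored tool."""
        self.last_seen[source] = self.clock()
        self.alerted.discard(source)  # source is alive again; re-arm its alert

    def check(self) -> List[str]:
        """Alert once for each source that has been silent longer than the timeout."""
        now = self.clock()
        newly_overdue = []
        for source, seen in self.last_seen.items():
            silence = now - seen
            if silence > self.timeout_s and source not in self.alerted:
                self.alerted.add(source)
                self.alert(source, silence)
                newly_overdue.append(source)
        return newly_overdue


if __name__ == "__main__":
    now = [0.0]  # fake clock so the demo runs instantly
    monitor = HeartbeatMonitor(timeout_s=600, clock=lambda: now[0])
    monitor.record_heartbeat("nagios")
    now[0] = 601.0
    monitor.check()  # prints: ALERT: no heartbeat from nagios for 601s
```

The key property is that the alert fires on the *absence* of a message, so it still triggers when the monitored network or alert server is down entirely. Each outage alerts only once, and the next heartbeat from that source re-arms it.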