by what2 774 days ago
You need to implement a dead man's switch. For example, if you use Prometheus, you can configure it to hit an HTTP endpoint of a dead man's switch service every X seconds. When that service detects that it has not been hit for some time, it alerts you.

For example: https://blog.ediri.io/how-to-set-up-a-dead-mans-switch-in-pr...
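To make the idea concrete, here is a minimal sketch of the timeout logic on the switch side. It is not taken from the linked post; the names `DeadMansSwitch`, `heartbeat`, `is_tripped`, and `check` are made up for illustration, and a real service would call `heartbeat()` from its HTTP handler and run `check()` on a timer.

```python
import time


class DeadMansSwitch:
    """Trips when no heartbeat has arrived within `timeout_seconds` (hypothetical helper)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.last_heartbeat = clock()

    def heartbeat(self):
        # Called each time the monitored system (e.g. Prometheus) checks in.
        self.last_heartbeat = self.clock()

    def is_tripped(self):
        # True once the monitored system has been silent for longer than the timeout.
        return self.clock() - self.last_heartbeat > self.timeout_seconds


def check(switch, alert):
    """Run periodically by the switch service; `alert` is any notification callback."""
    if switch.is_tripped():
        alert("no heartbeat received within timeout")
        return True
    return False
```

The point of the design is that the alert fires on the absence of a signal, so it still works when the monitoring system itself is the thing that died.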


And now you need to monitor the deadman switch service!
Oh, you just run two instances of those and point them at each other.
And then a third instance in case those both go down at the same time, and a fourth just in case there's a major worldwide outage... it's monitoring instances all the way down.
Even number so that everything is symmetrical.
You want an odd number (2n+1) so you can still have a quorum in case of a network partition.
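A rough sketch of the arithmetic behind that, assuming a simple strict-majority quorum (the helper names `majority` and `has_quorum` are made up here):

```python
def majority(n):
    # Smallest number of nodes that forms a strict majority of n.
    return n // 2 + 1


def has_quorum(partition_size, n):
    # A partition can keep making decisions only if it holds a strict majority.
    return partition_size >= majority(n)
```

With 4 nodes split 2/2, `has_quorum(2, 4)` is false on both sides, so nobody can act. With 3 nodes split 2/1, `has_quorum(2, 3)` is true for the larger side. Going from 3 to 4 nodes also doesn't let you survive any more failures, since both tolerate losing just one.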
I think you're missing the vibe (sarcasm).
Typically that's why people use a dead man's switch as a service. You don't assume that they'll never go down, but you're paying for someone whose failures are very likely uncorrelated with your own.