Hacker News
by nathancahill 3904 days ago
I've been using Dead Man's Snitch[0] in production for a few years. It's been a lifesaver. Not affiliated, just a happy customer.

[0] https://deadmanssnitch.com/

Seconded. DMS is the easiest thing to just drop in on the Nth cron job you add. Eventually you might need something more complicated for monitoring/outages/etc., and that something is probably a whole lot of Nagios and baling wire and/or PagerDuty, but DMS is perfect for "I really need Tarsnap backups to not just silently fail."
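
The drop-in pattern works because the job itself checks in. You request the snitch's URL after the command, so a failed or skipped run produces silence, and the service alerts when the expected check-in never arrives. Here is a minimal Python sketch of such a wrapper. It assumes a generic check-in URL; the `example.com` URL, the backup script path, and the names `run_and_check_in` and `http_ping` are hypothetical, not part of DMS's API.

```python
import subprocess
import urllib.request
from typing import Callable, Sequence


def http_ping(url: str) -> None:
    """Send a GET to the check-in URL; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def run_and_check_in(
    cmd: Sequence[str],
    ping_url: str,
    ping: Callable[[str], None] = http_ping,
) -> bool:
    """Run a job and check in only if it exited with status 0.

    If the job fails (or cron never runs it), no check-in arrives and the
    monitoring service alerts once the expected interval has passed.
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        return False  # stay silent; the missing check-in is the alert
    ping(ping_url)
    return True


# Example (hypothetical script path and URL, not executed here):
# run_and_check_in(["/usr/local/bin/backup.sh"],
#                  "https://example.com/check-in/abc123")
```

In a crontab, the same idea is usually a one-liner such as `/usr/local/bin/backup.sh && curl -fsS <your check-in URL>`. The `&&` ensures the check-in only happens when the backup exits 0.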

I also end up creating a lot of Twilio scripts, which are either positive control or negative control for the call/SMS, depending on how critical the thing I'm monitoring is. For example, one of my sites updates an /api/healthcheck result with a timestamp every five minutes if everything is going peachy, and another box polling that endpoint blows up my phone if it fails to get HTTP 200 and a timestamp from within the last five minutes. (This works, but I swear I need to tweak it just a wee bit, as today I had my once-a-quarter woken-up-at-4-AM-because-gremlins-ate-a-single-HTTP-request.)
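
The polling rule above (HTTP 200 plus a timestamp no older than five minutes) can be sketched in Python. This assumes the endpoint returns JSON such as `{"timestamp": 1700000000}` in Unix seconds. That format and the names `check_once`, `is_fresh`, and `Alerter` are hypothetical. `Alerter` is one possible version of the tweak mentioned: it pages only after a hypothetical `threshold` of consecutive failed checks, so a single dropped request no longer triggers a 4 AM call.

```python
import json
import time
import urllib.request
from typing import Optional

MAX_AGE_SECONDS = 5 * 60  # "a timestamp within the last five minutes"


def is_fresh(body: object, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if body is {"timestamp": <unix seconds>} no older than max_age."""
    if not isinstance(body, dict):
        return False
    ts = body.get("timestamp")
    if not isinstance(ts, (int, float)):
        return False
    return 0 <= now - ts <= max_age


def check_once(url: str, now: Optional[float] = None) -> bool:
    """One poll: HTTP 200 and a fresh timestamp, else a failure."""
    now = time.time() if now is None else now
    try:
        # urlopen raises HTTPError (an OSError subclass) on non-2xx codes.
        with urllib.request.urlopen(url, timeout=10) as resp:
            if resp.status != 200:
                return False
            body = json.load(resp)
    except (OSError, ValueError):  # network errors, timeouts, bad JSON
        return False
    return is_fresh(body, now)


class Alerter:
    """Page only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one result; return True only on the check that should page."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        # Fire exactly once per outage, when the streak reaches the threshold.
        return self.failures == self.threshold
```

Calling `check_once` every minute and feeding each result to `Alerter.record` would page once per outage. With `threshold=2` and a one-minute poll, the cost is roughly one extra minute of detection delay.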

This reminds me of https://docs.google.com/a/gravitant.com/document/d/199PqyG3U... on how you should only wake up engineers when there really is a problem. I'd suggest logging based on error messages, though I get it: if a problem occurs upstream, you wouldn't know about it unless you had also polled for it as a data point. HN comments on that doc: https://news.ycombinator.com/item?id=8450147

Shameless plug: https://healthchecks.io (same idea, open source)

Healthchecks.io looks really interesting, both because it's an open-source Django project and because I was disappointed with Dead Man's Snitch. DMS forces me to live within their timing for running checks: if you have something that has to run at 3 AM every morning, you won't know it failed until midnight UTC later that day, or until a customer calls to complain.

Healthchecks handles this a lot more sensibly. I might throw it on a Linode and give it a shot. Thanks for releasing it.
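
The difference comes down to when a missed run gets noticed. Below is a rough Python model of the two behaviors as this comment describes them. It is not code from either service. With a fixed daily window, a failure is noticed only when the UTC day closes. With a per-check period plus grace time (the settings Healthchecks.io exposes per check), the alert fires as soon as the next check-in is overdue by more than the grace. The function names are hypothetical, and all times are treated as UTC for simplicity.

```python
from datetime import datetime, timedelta, timezone


def next_midnight_utc(t: datetime) -> datetime:
    """First midnight UTC strictly after t (expects a timezone-aware datetime)."""
    t = t.astimezone(timezone.utc)
    day_start = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return day_start + timedelta(days=1)


def alert_time_daily_window(expected_run: datetime) -> datetime:
    """Fixed daily window: a missed run is only flagged when the UTC day ends."""
    return next_midnight_utc(expected_run)


def alert_time_period_grace(
    last_ping: datetime, period: timedelta, grace: timedelta
) -> datetime:
    """Period + grace: flag once the next check-in is overdue by more than grace."""
    return last_ping + period + grace
```

Consider a job that last checked in at 03:00 UTC on June 1 and silently fails on June 2. The daily window flags it at 00:00 UTC on June 3, while a 24-hour period with a 1-hour grace flags it at 04:00 UTC on June 2.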

Wow, that's awesome. That really is the biggest problem with DMS. I asked them about that feature a couple of years ago, and they said it was on the roadmap. Might ping them again.

I'll throw in a vote for DMS. I use it at work to verify that our cron jobs ran successfully. Dead simple and very effective.