Hacker News
by patio11 3904 days ago
Seconded. DMS is the easiest thing to just drop in on the Nth cron job you add. Eventually you might need something more complicated for monitoring/outages/etc., and that something is probably a whole lot of Nagios and baling wire, PagerDuty, or both, but DMS is perfect for "I really need Tarsnap backups to not just silently fail."
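
For anyone who hasn't wired one up: a minimal sketch of the cron-job side, assuming DMS behaves like a generic dead man's switch (the job pings a per-job URL only when it succeeds, and the service alerts when an expected ping doesn't show up). `CHECKIN_URL` and `run_and_check_in` are hypothetical names for illustration, not DMS's actual API.

```python
import subprocess
import urllib.request

# Hypothetical per-job check-in URL; a real service would issue its own.
CHECKIN_URL = "https://example.com/checkin/abc123"


def http_ping(url: str) -> None:
    """Send a bare GET to the check-in URL; raises on network/HTTP errors."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()


def run_and_check_in(cmd, ping=http_ping, url=CHECKIN_URL) -> int:
    """Run the job's command and check in only if it exits with status 0.

    A job that fails (nonzero exit) or hangs (subprocess.run never returns)
    simply never checks in, so the *missing* ping is what raises the alarm.
    Nothing in this wrapper has to detect the failure itself.
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        ping(url)
    return result.returncode
```

From cron you'd call something like `run_and_check_in(["/usr/local/bin/nightly-backup.sh"])`, where the script path is a placeholder. The nice property, and the reason it's so easy to drop in, is that a silently failing backup needs no special handling: silence is the signal.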

I also end up creating a lot of Twilio scripts, which act as either positive or negative control for the call/SMS, depending on how critical the thing I'm monitoring is. For example, one of my sites updates an /api/healthcheck result with a timestamp every five minutes if everything is going peachy, and another box polls that endpoint and blows up my phone if it fails to get HTTP 200 and a timestamp from within the last five minutes. (This works, but I swear I need to tweak it just a wee bit: today I had my once-a-quarter woken-up-at-4-AM-because-gremlins-ate-a-single-HTTP-request.)
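
A rough sketch of the polling box described above, under a few assumptions not in the comment: the endpoint returns JSON like `{"timestamp": 1700000000}` (Unix seconds), `HEALTHCHECK_URL` is a placeholder, and `alert` is whatever callable places the Twilio call or SMS (not shown here). The `failures_before_alert` threshold is one hypothetical way to stop a single eaten HTTP request from paging anyone; it is not something the original setup is known to do.

```python
import http.client
import json
import time
import urllib.error
import urllib.request

HEALTHCHECK_URL = "https://example.com/api/healthcheck"  # placeholder
MAX_AGE_SECONDS = 5 * 60  # timestamp must be from within the last five minutes


def fetch(url: str):
    """Return (status, body) for a GET; network-level failures give status None."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a code
        return err.code, b""
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return None, b""


def is_healthy(status, body: bytes, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Healthy means HTTP 200 *and* a timestamp no older than max_age seconds."""
    if status != 200:
        return False
    try:
        ts = float(json.loads(body)["timestamp"])  # assumed response shape
    except (ValueError, KeyError, TypeError):
        return False  # unparseable body counts as unhealthy
    return now - ts <= max_age


class Watchdog:
    """Pages once after N consecutive failed polls (hypothetical tweak)."""

    def __init__(self, alert, failures_before_alert: int = 2):
        self.alert = alert
        self.failures_before_alert = failures_before_alert
        self.consecutive_failures = 0

    def record(self, healthy: bool) -> None:
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
            return
        self.consecutive_failures += 1
        # Fire exactly once when the streak first hits the threshold.
        if self.consecutive_failures == self.failures_before_alert:
            self.alert(f"healthcheck failed {self.consecutive_failures} times in a row")


def run_forever(alert, url: str = HEALTHCHECK_URL, poll_every: float = 60) -> None:
    dog = Watchdog(alert)
    while True:
        status, body = fetch(url)
        dog.record(is_healthy(status, body, time.time()))
        time.sleep(poll_every)
```

Polling every minute with a threshold of two means a real outage still pages within a few minutes, while one dropped request just resets on the next successful poll.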


This reminds me of https://docs.google.com/a/gravitant.com/document/d/199PqyG3U... on how you should only wake up engineers when there really is a problem. I'd suggest logging based on error messages, though I get it: if a problem occurs upstream, you wouldn't know about it unless you'd also polled for it as a data point. HN comments on that doc: https://news.ycombinator.com/item?id=8450147