by ydant 1433 days ago
Thank you for the response!

I saw you make that suggestion on this issue - https://github.com/healthchecks/healthchecks/issues/525#issu...

----

Thinking about it, this does solve the issue as I described it. I do like being able to distinguish the states:

  - started, but never finished (no error reported)
  - started, and finished with error reported ("crash") (need immediate alert)
  - finished (without crashing), but not 100% successful (data not fetched)
  - finished successfully
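
A minimal Python sketch of how a client could report these four states to Healthchecks.io under the /log suggestion. It assumes the standard ping URL (`https://hc-ping.com/<uuid>`), where the bare URL signals success and the `/start`, `/fail` and `/log` suffixes cover the other cases. The `Outcome` names and the `ping_url` helper are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    """The four run states listed above (names are hypothetical)."""
    STARTED = "started"  # run began; a missing follow-up ping means "never finished"
    CRASHED = "crashed"  # finished with an explicit error: needs an immediate alert
    PARTIAL = "partial"  # finished without crashing, but data was not fetched
    SUCCESS = "success"  # finished successfully


# Suffix appended to the check's ping URL for each outcome.
_SUFFIX = {
    Outcome.STARTED: "/start",
    Outcome.CRASHED: "/fail",
    Outcome.PARTIAL: "/log",  # logged as an event, counts as neither success nor failure
    Outcome.SUCCESS: "",      # the bare ping URL signals success
}


def ping_url(check_url: str, outcome: Outcome) -> str:
    """Return the URL the client should request for the given outcome."""
    return check_url.rstrip("/") + _SUFFIX[outcome]
```

For example, `ping_url("https://hc-ping.com/<uuid>", Outcome.PARTIAL)` yields the `/log` URL, so a partial run is recorded but never resets the "no success within X time" clock.
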

As you mention, it makes sense to have the alerts be:

  - no successful completion (regardless of number of attempts) within X time
  - explicit error occurred
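
The two rules can be modelled as a small function over a check's ping history. This is a simplified toy model of the proposed behaviour, not Healthchecks.io's actual implementation; `Ping`, `alert_reason` and the event labels are hypothetical. Only success pings reset the clock, so repeated starts or `/log` pings cannot hide a stretch with no real success.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Ping:
    at: datetime
    kind: str  # "start", "fail", "log" or "success" (hypothetical labels)


def alert_reason(pings: List[Ping], now: datetime, window: timedelta) -> Optional[str]:
    """Return why an alert should fire, or None if the check looks healthy.

    Rule 1: an explicit failure newer than the last success alerts immediately.
    Rule 2: no successful completion within `window`, however many attempts were made.
    (In this toy model a check with no successes at all is treated as overdue.)
    """
    successes = [p.at for p in pings if p.kind == "success"]
    last_success = max(successes) if successes else None

    # Rule 1: an explicit error since the last success means "alert NOW!".
    for p in pings:
        if p.kind == "fail" and (last_success is None or p.at > last_success):
            return "explicit error"

    # Rule 2: "start" and "log" pings do not count; only successes reset the clock.
    if last_success is None or now - last_success > window:
        return "no success within window"
    return None
```
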

I think your /log approach also has the advantage of keeping an explicit error alert that fires regardless of duration: a critical "alert NOW!" state.

The only (weak) argument I see against this approach (and it is an argument for making this a configuration option) is that I started using Healthchecks.io precisely because it's incredibly simple to set up for a cron job. Moving this logic to the client means slightly more complicated error handling, since the client has to call the right endpoint for each type of failure.

The counter-argument is that by the time you move from calling just "/success" to calling multiple endpoints, you're already committed to more complicated client-side logic. If you want the simple "just run at least once every X hours" approach, all you need to do is never call "fail" and set the grace period appropriately.
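
A sketch of the client-side wrapper, to show roughly how much extra logic the multi-endpoint approach needs and how the simple mode just stays silent on failure and lets the grace period do the alerting. `run_with_pings`, `PartialResult` and `make_pinger` are hypothetical names; the `/start`, `/fail` and `/log` suffixes are the Healthchecks.io ping endpoints, and the bare URL signals success.

```python
import urllib.request
from typing import Callable


class PartialResult(Exception):
    """Hypothetical: raised by the job when it ran but did not fetch all data."""


def run_with_pings(job: Callable[[], None], ping: Callable[[str], None], simple: bool = False) -> str:
    """Run `job` and report its outcome through `ping(suffix)`.

    simple=True: only report success; failures stay silent and the grace period alerts.
    simple=False: report start, crash (/fail), partial run (/log) and success separately.
    Returns the outcome name so the caller can pick an exit code.
    """
    if not simple:
        ping("/start")
    try:
        job()
    except PartialResult:
        if not simple:
            ping("/log")  # recorded, but counts as neither success nor failure
        return "partial"
    except Exception:
        if not simple:
            ping("/fail")  # explicit error: alert immediately
        return "crashed"
    ping("")  # bare ping URL = success
    return "success"


def make_pinger(check_url: str) -> Callable[[str], None]:
    """Build a ping function that requests check_url + suffix."""
    def ping(suffix: str) -> None:
        try:
            # Short timeout so a monitoring hiccup never hangs the job itself.
            with urllib.request.urlopen(check_url.rstrip("/") + suffix, timeout=10):
                pass
        except OSError:
            pass  # URLError subclasses OSError; a failed ping must not crash the job
    return ping
```

From cron, one might call `run_with_pings(fetch_data, make_pinger("https://hc-ping.com/<uuid>"))`, where `fetch_data` stands in for the real job, and exit non-zero unless it returns `"success"`. Passing `ping` in as a function keeps the control flow testable without network access.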

For our use case, our rules for when to alert became much more complicated than what I described here, so moving them into our own code still made sense. But I think there are other cases where we'd benefit from your proposal.