| Thank you for the response! I saw you make that suggestion on this issue - https://github.com/healthchecks/healthchecks/issues/525#issu...

----

Thinking about it, this does solve the issue as I described it. I do like being able to distinguish the states:

- started, but never finished (no error reported)
- started, and finished with error reported ("crash") (need immediate alert)
- finished (without crashing), but not 100% successful (data not fetched)
- finished successfully

As you mention, it makes sense to have the alerts be:

- no successful completion (regardless of number of attempts) within X time
- explicit error occurred

I think your /log approach does have the advantage of still allowing an explicit error alert regardless of duration - a critical "alert NOW!" state.

The only (weak) argument I see against this approach (and it is an argument for making this a configuration option) is that the reason I started using HealthChecks.io is that it's incredibly simple to set up for a cron job. Moving this logic to the client means slightly more complicated error handling to call the right endpoint for each type of failure. The counter-argument is that by the time you move from calling just "/success" to calling multiple endpoints, you're already in that position of more complicated client-side logic. If you want the simple "just run at least once every X hours" approach, then all you need to do is never call "fail" and set the grace period appropriately.

For our use-case, our logic for when to alert or not got much more complicated than described, so moving the rules into our code still made sense, but I think there are other instances where we'd benefit from your proposal. |
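
For illustration, here is a rough sketch of what that client-side logic might look like in Python. It assumes the four states above map onto the `/start`, `/fail` and `/log` ping endpoints plus the plain success ping. The `Outcome` enum, the `run_job` wrapper, the convention that the job returns `True` or `False`, and the placeholder ping URL are all hypothetical names made up for this example, not anything from the project itself.

```python
import urllib.request
from enum import Enum

# Placeholder: substitute the ping URL of your own check.
PING_URL = "https://hc-ping.com/your-uuid-here"


class Outcome(Enum):
    STARTED = "started"        # job began running
    CRASHED = "crashed"        # explicit error, alert immediately
    INCOMPLETE = "incomplete"  # finished, but data not fetched
    SUCCESS = "success"        # finished successfully


# Assumed mapping from job state to ping endpoint suffix.
SUFFIX = {
    Outcome.STARTED: "/start",
    Outcome.CRASHED: "/fail",
    Outcome.INCOMPLETE: "/log",  # recorded, but does not count as success
    Outcome.SUCCESS: "",
}


def ping_url(outcome, base=PING_URL):
    """Build the URL to ping for a given outcome."""
    return base + SUFFIX[outcome]


def report(outcome, message="", timeout=10):
    """POST the outcome (and an optional message body) to the check."""
    req = urllib.request.Request(
        ping_url(outcome), data=message.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_job(job, send=report):
    """Run job() and report its state; job returns True on full success."""
    send(Outcome.STARTED)
    try:
        ok = job()
    except Exception as exc:
        send(Outcome.CRASHED, repr(exc))
        raise
    send(Outcome.SUCCESS if ok else Outcome.INCOMPLETE)
    return ok
```

With something like this, an incomplete run only produces a log entry, so the "no successful completion within X time" alert is still driven by the grace period, while a crash pings `/fail` right away.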