by ttymck 774 days ago
My applications are monitored with Prometheus + Grafana, and we alert when certain metrics have reported no data in the past 5 minutes, since that indicates a malfunction in the subsystem.

You can use a metric as a heartbeat: a monotonic counter works, and so does a timestamp. In your monitoring system, you alert when the heartbeat value has not increased in X minutes.
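
To make the heartbeat check concrete, here is a rough sketch in plain Python of the "has not increased in X minutes" logic. It is not how Prometheus or Grafana evaluate alerts internally; `HeartbeatWatcher`, `observe`, `should_alert`, and `max_silence_seconds` are names I made up for illustration, and the injectable `clock` is only there so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class HeartbeatWatcher:
    """Hypothetical helper: tracks a heartbeat value and flags when it stalls."""

    def __init__(
        self,
        max_silence_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock
        self._last_value: Optional[float] = None
        # Start the window at construction time so a subsystem that never
        # reports at all is eventually flagged too.
        self._last_increase_at = clock()

    def observe(self, value: float) -> None:
        """Record a sampled heartbeat value (a counter or a timestamp)."""
        if self._last_value is None or value > self._last_value:
            # First sample or a strict increase: this counts as a beat.
            self._last_value = value
            self._last_increase_at = self._clock()
        elif value < self._last_value:
            # Assumption: a decrease means the counter was reset (e.g. a
            # restart). Lower the baseline, but do not count it as a beat.
            self._last_value = value

    def should_alert(self) -> bool:
        """True when the value has not increased within the allowed window."""
        silence = self._clock() - self._last_increase_at
        return silence > self.max_silence_seconds


# Example with a fake clock and a 5 minute (300 second) window.
now = [0.0]
watcher = HeartbeatWatcher(max_silence_seconds=300, clock=lambda: now[0])
watcher.observe(1)
now[0] = 200.0
watcher.observe(2)                   # increased: the window restarts at t=200
now[0] = 501.0
watcher.observe(2)                   # unchanged: still counting from t=200
stalled = watcher.should_alert()     # 301 seconds of silence exceeds 300
```

If the heartbeat is a timestamp instead of a counter, the same check applies, since a live subsystem keeps writing a larger value.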