|
|
|
|
|
by embedding-shape
38 days ago
|
|
Why wouldn't your monitoring system alert you when metrics suddenly disappear? Sounds like you need a better monitoring system, prometheus is not gonna magically solve that problem for you. No wonder you were having issues with prometheus... |
|
Of course the systems that have to alert me to failure have to be designed with mechanisms to alert me to the fact that they themselves are failing.
Zabbix, Nagios, Munin -- practically everything that existed before: understood this.
Prometheus doesn't because it optimised intentionally for being easy to deploy and for there being a hierarchy of prometheus's in a tree-like formation. Which makes sense, but forces a much more distributed and difficult to reason model.
Monitoring systems can't be designed for the happy path. By definition, they only matter when things are going wrong- which is precisely when the happy path isn't available. Prometheus is excellent when everything is fine (scaling aside). That's not when you need your monitoring system to be excellent.