|
|
|
|
|
by dijit
44 days ago
|
|
Yes, my monitoring system not alerting me when the systems it runs on are failing is the entire problem. That's not a general "software breaks when disks fail" situation: that's a monitoring system failing at its one job. Your monitoring system failing silently when your infrastructure is under stress is precisely the failure mode that monitoring exists to prevent. Zabbix solves this with native HA and self-checks. Prometheus makes it your problem to solve with external tooling, and most people don't, until they need it. |
|