| > Hi dozie,

It's two "z" there.

> [...] icinga aims to maintain backwards compatibility with its historically-derived nagios configuration file syntax which is difficult to understand and hard to parse in an automated fashion.

Parsing Icinga/Nagios configuration files is easy, even if you count object
templates (register=0 and use foo) and use a handcrafted parser instead of a
generated one. The syntax is not a complicated one. I don't know what problems
you have encountered.

> On the other hand, there are examples of architectural choices that I believe icinga gets right: It implements an approach to secure and authenticated metrics collection that virtually every other monitoring system leaves as an "exercise" for the user.

Oh, this is a more or less easy task if your monitoring system has some secret
(e.g. an X.509 certificate) exchanged with the monitored hosts, and it can be
bolted on to pretty much any monitoring system with some stunnel-fu (which shows
that it's not a matter of the system's architecture at all). It's sharing that
secret in a robust and automatic way that is quite difficult. I doubt Icinga
does anything better than the rest of the crowd.

> It provides checks and alerts and notification thresholds by default, which many other monitoring systems don't.

Once you have an established flow of monitoring messages, then thresholds,
alerts, and notifications become simple stream processing and consumption.
Sure, Icinga and others give you this simple processing out of the box, and
some systems give you a few more queries than others (e.g. originally Nagios
only processed what I call "state", while Cacti only processed metrics, and
Zabbix processes both). But this processing is rarely complex. And none of
them gives you the ability to process the data stream itself.

And then there is this almost universally shared requirement that you need to
define all the instances of hosts and services beforehand, with monitoring
systems differing only in how their template mechanism is implemented.

You can't just start collecting data about servers as they get installed and
about services and resources as they emerge and disappear. No, you need to
tell the monitoring system that it should expect data from this host and this
service (collectd, Graphite, and InfluxDB got it right here).

It is sometimes useful for a monitoring system to expect some data to show up
(and possibly to alert when it's missing), but only sometimes, not all the time.
Usually other data can easily cover the scenario where something silently goes
down, and there's still the "stream processing" I mentioned, which can simply
detect that some data has stopped arriving. |
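To illustrate the claim that a handcrafted parser copes with this syntax, templates included, here is a minimal Python sketch. It assumes a simplified subset of the Nagios-style object syntax: `define <type> {` on its own line, one `directive value` pair per line, `}` on its own line, comments starting with `#` or `;`, templates identified by `name` and marked `register 0`, and a single parent per `use`. Multiple comma-separated templates, custom `_` variables, additive `+` inheritance and `cfg_file`/`cfg_dir` includes are not handled. The function names `parse_objects` and `resolve` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Opening line of an object definition, e.g. "define host {".
DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{$")


def parse_objects(text: str) -> List[dict]:
    """Parse 'define <type> { ... }' blocks into dicts of directive -> value."""
    objects: List[dict] = []
    current: Optional[dict] = None
    for raw in text.splitlines():
        # Simplification: drop everything after ';' and skip '#' comment lines.
        line = raw.split(";", 1)[0].strip()
        if not line or line.startswith("#"):
            continue
        if current is None:
            match = DEFINE_RE.match(line)
            if match is None:
                raise ValueError(f"expected 'define <type> {{', got {raw!r}")
            current = {"__type__": match.group(1)}
        elif line == "}":
            objects.append(current)
            current = None
        else:
            # One "directive value" pair per line, separated by spaces or tabs.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    if current is not None:
        raise ValueError("unterminated object definition")
    return objects


def resolve(objects: List[dict]) -> List[dict]:
    """Apply single-parent 'use' inheritance and drop templates (register 0)."""
    templates: Dict[Tuple[str, str], dict] = {
        (obj["__type__"], obj["name"]): obj for obj in objects if "name" in obj
    }

    def flatten(obj: dict, seen: frozenset) -> dict:
        parent = obj.get("use")
        if parent is None:
            return dict(obj)
        key = (obj["__type__"], parent)
        if key in seen:
            raise ValueError(f"template loop involving {parent!r}")
        if key not in templates:
            raise ValueError(f"unknown template {parent!r}")
        merged = flatten(templates[key], seen | {key})
        merged.update(obj)  # the object's own directives win over inherited ones
        return merged

    resolved = []
    for obj in objects:
        if obj.get("register") == "0":
            continue  # templates are not real objects
        flat = flatten(obj, frozenset())
        for bookkeeping in ("name", "use", "register"):
            flat.pop(bookkeeping, None)
        resolved.append(flat)
    return resolved


CONFIG = """
define host {
    name                generic-host   ; template name
    check_interval      5
    max_check_attempts  3
    register            0
}

define host {
    use                 generic-host
    host_name           web01
    address             192.0.2.10
    check_interval      1
}
"""

hosts = resolve(parse_objects(CONFIG))
print(hosts)
```

The two-pass design (parse everything, then resolve inheritance) is what makes templates easy: `use` may refer to a template defined later in the file, so resolution only happens once all objects are known.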
You can do that in Zabbix. It has two discovery functions: one discovers hosts (called "network discovery"), the other discovers "features" such as running services or switch ports (called "low level discovery"), and it applies templates based on certain query results (open ports, SNMP values, etc.). Alerting on the lack of new incoming data is also possible. I used Zabbix a lot until a year ago and liked it much more than Nagios and its descendants.
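The first comment's argument, that thresholds, alerts and "data stopped arriving" checks are simple stream processing, and that hosts need not be declared up front, can be sketched in Python. Everything here is hypothetical: the `Sample` message shape, the `StreamMonitor` class and its `thresholds`/`stale_after` parameters are illustrative and are not the API of Zabbix, Icinga, or any other system mentioned above. New host/metric pairs are tracked on first sight, every sample is checked against a per-metric upper threshold, and a periodic sweep reports series that have gone quiet.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One incoming monitoring message (hypothetical shape)."""
    host: str
    metric: str
    value: float
    timestamp: float  # seconds since epoch


class StreamMonitor:
    """Hypothetical stream processor: no hosts or services are declared up front."""

    def __init__(self, thresholds: Dict[str, float], stale_after: float):
        self.thresholds = thresholds    # metric name -> upper limit
        self.stale_after = stale_after  # seconds without data before reporting
        self.last_seen: Dict[Tuple[str, str], float] = {}

    def process(self, sample: Sample) -> List[str]:
        """Consume one sample and return the events it triggers."""
        events = []
        key = (sample.host, sample.metric)
        if key not in self.last_seen:
            # Implicit registration: the series is tracked from its first sample.
            events.append(f"new series {sample.host}/{sample.metric}")
        self.last_seen[key] = sample.timestamp
        limit = self.thresholds.get(sample.metric)
        if limit is not None and sample.value > limit:
            events.append(f"{sample.host}/{sample.metric}={sample.value} exceeds {limit}")
        return events

    def sweep(self, now: float) -> List[str]:
        """Report every series whose last sample is older than stale_after."""
        return [
            f"no data from {host}/{metric} for {now - ts:.0f}s"
            for (host, metric), ts in sorted(self.last_seen.items())
            if now - ts > self.stale_after
        ]


mon = StreamMonitor(thresholds={"load1": 4.0}, stale_after=60.0)
events: List[str] = []
events += mon.process(Sample("web01", "load1", 0.7, 1000.0))
events += mon.process(Sample("db01", "load1", 5.2, 1005.0))
events += mon.process(Sample("web01", "load1", 0.9, 1030.0))
events += mon.sweep(now=1080.0)
for event in events:
    print(event)
```

Here `db01` is reported both for crossing the threshold and for going silent, while `web01` is never declared anywhere yet is still tracked and checked. Expecting specific series (as in the traditional model) would only be an extra list consulted by `sweep`, not a precondition for collecting data.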