by Sin2x 1367 days ago
Prometheus and other modern, application-monitoring-oriented time-series stacks seem to be overkill for simple server infrastructure monitoring. Zabbix is easier to set up and use, and has all the batteries included.
4 comments

I hadn't known about Zabbix (or netdata) and I'm currently evaluating our options for monitoring. I was going to explore in the direction of Prometheus (+ Grafana), so I'm glad to know about these other seemingly more straightforward and simple options. Thanks for mentioning it.
Currently using Zabbix for monitoring my own servers. It's... okay.

The UI sometimes feels a bit dated and not everything is as straightforward as one might expect, but for my use case (monitoring a bunch of GNU/Linux hosts) it's sufficient.

Some of the things that are good about it:

  - can be run in containers if you want to, and uses a familiar DB like MySQL/MariaDB/PostgreSQL
  - the agent installation on the hosts that you want to manage is also pretty simple
  - supports both active (monitored host sends data to Zabbix) and passive (Zabbix asks the host for data) configurations
  - depending on the template that you use, has lots of built-in metrics out of the box
  - easy to integrate with something like e-mail or SMS messaging for alerting, other plugins exist AFAIK
  - also has built-in alerts, such as when disk space is low, CPU load is high, memory usage is high, swap space is low, host is unreachable etc.
  - has the ability to build dashboards with graphs, network maps etc.
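
To make the built-in alerts mentioned above a bit more concrete, here is a minimal Python sketch of threshold-style triggers. It is not how Zabbix implements them: the `Trigger` class, the metric names, and every threshold value are made up for illustration.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Trigger:
    """A named condition on a single metric (hypothetical, not a Zabbix type)."""
    name: str
    metric: str
    fires: Callable[[float], bool]


# Hypothetical thresholds, chosen only for the example.
TRIGGERS = [
    Trigger("Disk space is low", "disk_free_pct", lambda v: v < 10.0),
    Trigger("CPU load is high", "load_per_cpu", lambda v: v > 1.5),
    Trigger("Memory usage is high", "mem_used_pct", lambda v: v > 90.0),
    Trigger("Swap space is low", "swap_free_pct", lambda v: v < 50.0),
]


def evaluate(metrics: Dict[str, float], triggers: List[Trigger] = TRIGGERS) -> List[str]:
    """Return the names of triggers that fire; metrics that are missing are skipped."""
    return [
        t.name
        for t in triggers
        if t.metric in metrics and t.fires(metrics[t.metric])
    ]


def local_disk_free_pct(path: str = "/") -> float:
    """Percentage of free space on the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total
```

Calling `evaluate({"disk_free_pct": local_disk_free_pct()})` would check just the disk trigger on the local machine; a real monitoring system also handles history, hysteresis and escalation, which this sketch ignores.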

Some of the less nice points about it:

  - last I checked (older version, now using Uptime Kuma for this use case) the web monitoring didn't send alerts by default
  - last I checked (since haven't bothered to set up again) maintenance windows straight up didn't work and still sent notifications about the agent going down
  - in general the UI can be a bit cumbersome and there is some legacy to be found, at least a while back there were legacy graphs and the "modern" ones, each of which had different sets of functionality available
  - sometimes certain parts of the OS template also decided not to work, e.g. currently have disk usage not showing up in like 1/6 graphs, even though the configuration is pretty much the same for all of the nodes in question
  - in general, it's just not as popular as other solutions and might not have as many tutorials around setting things up in it

I'd probably compare Zabbix against something like Nagios or LibreNMS, rather than netdata or Prometheus/Grafana, but perhaps that's just me.

Edit: That said, the docs of Zabbix are pretty decent.

Here's what configuring something in it typically looks like in the UI: https://www.zabbix.com/documentation/current/en/manual/web_m...

Here's an example of some of the dashboard functionality: https://www.zabbix.com/documentation/6.2/en/manual/web_inter...

And here's maps that you can embed in the dashboard: https://www.zabbix.com/documentation/6.2/en/manual/web_inter...

> - supports both active (monitored host sends data to Zabbix) and passive (Zabbix asks the host for data) configurations

But this requires different templates for each mode.

Why they did it this way is beyond me... though I have a suspicion that they aren't using it themselves on anything other than a tiny lab with a couple dozen hosts.
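
For anyone unfamiliar with the active/passive distinction quoted above, here is a toy Python model of the two data flows. It is only a sketch of who initiates the exchange, not Zabbix's actual protocol or template mechanism; the `Agent` and `Server` classes and the item key are invented for the example.

```python
from typing import Callable, Dict, List, Tuple


class Server:
    """Toy monitoring server that stores (host, key, value) records."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, str, float]] = []

    def receive(self, host: str, key: str, value: float) -> None:
        """Store one value; in active mode the agent calls this itself."""
        self.history.append((host, key, value))

    def poll(self, agent: "Agent", key: str) -> None:
        """Passive mode: the server asks the agent for one item."""
        self.receive(agent.host, key, agent.answer(key))


class Agent:
    """Toy agent running on a monitored host."""

    def __init__(self, host: str, items: Dict[str, Callable[[], float]]) -> None:
        self.host = host
        self.items = items  # item key -> function that reads the value

    def answer(self, key: str) -> float:
        """Passive mode: respond to a request that the server initiated."""
        return self.items[key]()

    def push_all(self, server: Server) -> None:
        """Active mode: the agent initiates and sends every item it knows."""
        for key, read in self.items.items():
            server.receive(self.host, key, read())
```

Both paths end up storing the same kind of record; the difference is only which side opens the connection, which matters for firewalls and for how load is spread between server and agents.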

Thanks for the thorough review and for sharing your experience. Very helpful!
I'm working on a side project that will essentially cover uptime and metrics monitoring for servers, as I find that trying to roll out Prometheus monitoring across large numbers of servers across multiple organisations and setups is too much effort. I want to be able to quickly and easily add servers to a dashboard, set a number of alarms, and then not have to worry. If anything happens (an alarm trips, a server goes offline, etc.), I get a notification by email, app, Slack, Discord and so on.
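
As a rough illustration of the "add servers, trip an alarm, get notified" idea, a bare-bones uptime check could look like the Python sketch below. This is not ServerDuty's implementation; the function names, the TCP-connect probe and the message format are all assumptions made for the example.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (hostname, TCP port)


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_servers(
    servers: Iterable[Target],
    probe: Callable[[str, int], bool] = tcp_probe,
    notify: Callable[[str], None] = print,
) -> List[Target]:
    """Probe every server, call `notify` for each one that is down, return the down list."""
    down: List[Target] = []
    for host, port in servers:
        if not probe(host, port):
            down.append((host, port))
            notify(f"ALARM: {host}:{port} is unreachable")
    return down
```

Here `notify` defaults to `print`, but it could be swapped for a function that sends an email or posts to a chat webhook. Running `check_servers` on a schedule (cron, a systemd timer, or a loop with a sleep) is left out of the sketch.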

I looked around but didn't really find anything that fit for me. There are a lot of complicated (albeit powerful) options, but I want simple, easy, lightweight, quick. These days I'm juggling so much, I want to be as efficient as possible with my time.

It's still early days, but I'm hoping to be able to onboard people towards the end of the year. Anyone who is interested, feel free to join the waitlist: https://serverduty.co

Interestingly, I'm using ServerDuty to monitor ServerDuty as I build ServerDuty. I mean, if that isn't dogfooding, I don't know what is.

In what way do you think it's overkill? As in, what's too much / gets in the way? I've gone exactly the opposite way: I don't want to deal with a highly opinionated and integrated thing like Zabbix if I can put 3 small things together (Grafana, Influx, Telegraf in my case) and have simple system monitoring that can also handle anything I want to throw at it.
You already need to deal with 3 things as opposed to one, and learn PromQL to boot. If you want to have dashboards you also need to learn and use Grafana. I've used both the Prometheus+Thanos+Grafana stack and Zabbix in production and can say with certainty that the latter is much easier to use and set up for infra monitoring than the former. You need a whole dedicated observability team to use Prometheus efficiently.
I'd disagree, as someone running Grafana+Influx+Telegraf, for a few reasons.

Dealing with 3 things... feels about as much work as Zabbix. That one requires server + DB + agents. Basically the same components I'm running.

PromQL/Flux - while I know it, I almost never use it - instead, in Grafana I click the database name, metric name, aggregation and I'm done.
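
To show roughly what those clicks amount to, here is a small Python sketch that builds the kind of InfluxQL text Grafana's visual query editor produces for an InfluxDB data source. It assumes the InfluxQL editor (not Flux); `panel_query` is a hypothetical helper, and `$timeFilter` and `$__interval` are Grafana macros that only get filled in by Grafana itself.

```python
# Aggregation functions offered here; all of these exist in InfluxQL.
ALLOWED_AGGREGATIONS = {"mean", "max", "min", "sum", "count", "last"}


def panel_query(measurement: str, field: str, aggregation: str = "mean") -> str:
    """Build an InfluxQL query string in the shape of a Grafana panel query.

    The result is a template: Grafana replaces $timeFilter and $__interval
    with the dashboard's time range and a suitable grouping interval.
    """
    if aggregation not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"unsupported aggregation: {aggregation}")
    return (
        f'SELECT {aggregation}("{field}") FROM "{measurement}" '
        "WHERE $timeFilter GROUP BY time($__interval) fill(null)"
    )
```

For example, `panel_query("cpu", "usage_idle")` corresponds to picking a `cpu` measurement, a `usage_idle` field and the mean aggregation in the editor; the measurement and field names here assume Telegraf's stock CPU input.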

While you need people to use it effectively, it's the same for everything, including MySQL for Zabbix. You can dump Prometheus/Influx somewhere with no configuration and survive for quite a long time.

Agree with this take, depending on just how simple a home setup is.

If you're going to be working with it on the job, Prometheus is relatively simple to provision, though, and can provide some experience if you use it at home.