Hacker News
by Jedd 2144 days ago
We've got a somewhat similar landscape on a pretty sizeable network: a big investment in Zabbix, and we're looking to move, perhaps slowly and perhaps only in part, towards Prometheus.

Coming from a monitoring system that supports push and pull with elegant auto-discovery, we're struggling to work out a sane architecture around (effectively pull-only) Prometheus.


Yeah, I think we've looked at that. It provides push for the last mile, and I suppose you could wrangle some auto-discovery using that tooling, but Prometheus is still pulling from that server (or those servers).
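A minimal sketch of the "push for the last mile" pattern, assuming the gateway discussed in this thread is the Prometheus Pushgateway: a short-lived job pushes its final metrics over HTTP in the Prometheus text format, and Prometheus then scrapes the gateway like any other target. It uses only the Python standard library and builds the request without sending it; the gateway hostname, job name, metric names, and the `build_push_request` helper are hypothetical.

```python
from __future__ import annotations

import urllib.request
from urllib.parse import quote

# Hypothetical gateway address; 9091 is the Pushgateway's default port.
GATEWAY = "http://pushgateway.example.internal:9091"


def build_push_request(gateway: str, job: str, metrics: dict[str, float],
                       grouping: dict[str, str] | None = None) -> urllib.request.Request:
    """Build, but do not send, a push of `metrics` for `job` to a Pushgateway.

    PUT replaces every metric in the job's group; POST would replace only
    the metrics whose names appear in the body.
    """
    parts = ["metrics", "job", job]
    for label, value in (grouping or {}).items():
        parts += [label, value]
    if any("/" in p for p in parts):
        # '/' in values needs the gateway's special (base64) handling; not covered here.
        raise ValueError("job and grouping values must not contain '/'")
    url = gateway.rstrip("/") + "/" + "/".join(quote(p, safe="") for p in parts)
    # Prometheus text exposition format: one "name value" sample per line,
    # and the body must end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


req = build_push_request(
    GATEWAY,
    "nightly_backup",
    {"backup_last_success_unixtime": 1700000000, "backup_bytes": 52428800},
    grouping={"instance": "db01"},
)
# Actually sending it needs a reachable gateway:
# urllib.request.urlopen(req, timeout=10)
print(req.get_method(), req.full_url)
```

PUT fits a batch job that owns its whole group and wants stale metrics from a previous run cleared; the gateway itself then sits in the normal pull path, which is exactly the "still doing pull" point above.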

We're still a bit stuck trying to replicate all the make-life-easy functionality we get with Zabbix sitting on a honking great PostgreSQL / Timescale database, with a bunch of proxies, and automated agent installs that auto-register.

There are places where that doesn't work well (k8s, for example), but for conventional fleet metrics it's difficult to abandon.

Yeah, true. It's easy for us because we're using K8s annotations for Prometheus scrape target discovery, so the gateway is just another target, and we're not running so many ephemeral jobs that we need more than one gateway.
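Annotation-driven discovery usually relies on a community convention rather than a built-in Prometheus feature: pods opt in with a `prometheus.io/scrape: "true"` annotation and may override port and path via `prometheus.io/port` and `prometheus.io/path`, which `relabel_configs` rules under `kubernetes_sd_configs` then apply. The Python below only models that selection logic, to show why a gateway pod ends up as "just another target"; it does not talk to the Kubernetes API. `Pod`, `scrape_targets`, and the sample pods are hypothetical, and the fallback to the first declared container port is a simplification of what the real pod role does.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Annotation keys follow the common community convention (an assumption,
# not something Prometheus enforces on its own).
SCRAPE = "prometheus.io/scrape"
PORT = "prometheus.io/port"
PATH = "prometheus.io/path"


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Kubernetes pod object."""
    name: str
    ip: str
    annotations: dict[str, str] = field(default_factory=dict)
    container_ports: list[int] = field(default_factory=list)


def scrape_targets(pods: list[Pod]) -> list[str]:
    """Return scrape URLs for pods that opted in via annotations."""
    targets = []
    for pod in pods:
        ann = pod.annotations
        if ann.get(SCRAPE) != "true":
            continue  # opt-in model: unannotated pods are never scraped
        # An annotated port wins; otherwise fall back to the first declared port.
        port = ann.get(PORT)
        if port is None and pod.container_ports:
            port = str(pod.container_ports[0])
        if port is None:
            continue  # no port known, nothing to scrape
        path = ann.get(PATH, "/metrics")
        targets.append(f"http://{pod.ip}:{port}{path}")
    return targets


pods = [
    # A gateway pod is discovered exactly like any other annotated pod.
    Pod("pushgateway-0", "10.1.2.3", {SCRAPE: "true", PORT: "9091"}),
    Pod("api-7f9c", "10.1.2.4", {SCRAPE: "true", PATH: "/internal/metrics"}, [8080]),
    Pod("batch-runner", "10.1.2.5", {}, [8000]),  # not annotated, so skipped
]
print(scrape_targets(pods))
```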

What are the benefits of, or the main reason for, replacing Zabbix with Prometheus, especially since Zabbix introduced Prometheus checks?

Good and valid question.

I expect we won't replace it outright, but rather augment it, especially in spaces where a host-centric tool like Zabbix isn't ideal.

Partly it's driven by a need to monitor things like k8s (in the form of OpenShift) and pub/sub systems (e.g. Kafka), and to integrate with other data sources (e.g. Elastic).

Possibly more compelling is the need to do more sophisticated things with our data than we can conveniently accomplish with the Zabbix data store. The problem isn't DB performance or scalability (PostgreSQL and optionally TimescaleDB) so much as dealing with the schema: even mildly sophisticated wrangling of our data ranges from difficult to impossible.

There are a few ways around that: bespoke tooling to facilitate ad hoc queries against the DB, duplicating the data at ingest time into multiple datastores, or frequent ETL of the Zabbix SQL data into long-term (time series) storage. None of these are great options. Plus we're fans of Grafana, so some of our decisions are, and will be, based on maintaining or improving the end-user experience of that tool. While the Zabbix integration is excellent, the Prometheus integration is even better, so (on the end-user side) that's a highly compelling path.

Take a look at VictoriaMetrics. It supports both pull and push models. It is inspired by Prometheus and supports MetricsQL [0], a PromQL-inspired query language.

[0] https://victoriametrics.github.io/MetricsQL.html

Yup - it's on our radar for evaluation.