by wbh1 2606 days ago
Surprised to see Prometheus hasn't been mentioned yet, and even Nagios is being mentioned as a better alternative. My company (higher-ed, ~100k combined students/fac/staff) is desperately trying to get away from Nagios. Once you get Nagios to the scale where you have to implement mod_gearman, you've gone too far.

I'd recommend taking a look at Prometheus[1]. It has its own _very_ performant TSDB, there are exporters for just about everything, it's the de facto way that things like Kubernetes expose metrics, and it has first-class support in Grafana for visualization.

We POC'd Zabbix, Icinga, ScienceLogic, Instana, Sensu, and Prometheus. Prometheus was our favorite. Take a look at the comparison between it and other popular monitoring products to see if it fits your needs though [2].

[1] https://github.com/prometheus/prometheus

[2] https://prometheus.io/docs/introduction/comparison/


The problem I have with Prometheus is that most of my nodes sit in very closed networks that I don't control (healthcare), and I can't set up proxies so Prometheus can reach them; I can only make outbound connections. So for now my best option seems to be InfluxDB, which doesn't look bad to me.

I've been using InfluxDB for ~3 years now for storing metrics (almost exclusively via Telegraf, plus a few custom ones), and it has been great! It replaced a collectd setup and dramatically decreased load across my fleet.

When I first started using it, it was pretty early and had some issues. In fact, I nearly trashed it. I also didn't like Prometheus's pull model, as opposed to push. They ended up resolving the InfluxDB issues I was having right as I was about to give up on it, and it's been solid since. I use it with Grafana to generate graphs of system use. I set it up before TICK was a thing.
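To make the push side concrete: with push, the node opens the outbound connection and writes points itself, so nothing needs to reach into the closed network. Here is a minimal sketch, assuming the InfluxDB 1.x HTTP `/write` endpoint and its line protocol. The host, database name, and measurement are hypothetical, and in practice Telegraf does all of this for you.

```python
from urllib.parse import quote
from urllib.request import Request


def _escape(value, chars):
    """Backslash-escape the characters line protocol treats as special."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=val field=val timestamp."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = f"{value}i"  # integers carry an 'i' suffix
        elif isinstance(value, float):
            rendered = repr(value)
        else:
            text = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{text}"'
        parts.append(_escape(key, ",= ") + "=" + rendered)
    return f"{head} {','.join(parts)} {timestamp_ns}"


def build_write_request(base_url, database, lines):
    """Build (but do not send) the outbound POST that pushes the points."""
    url = f"{base_url}/write?db={quote(database)}&precision=ns"
    return Request(url, data="\n".join(lines).encode("utf-8"), method="POST")


# Hypothetical usage; urllib.request.urlopen(req) would actually send it.
line = to_line_protocol(
    "cpu", {"host": "node 1", "dc": "east"}, {"usage": 0.5, "cores": 4},
    1700000000000000000,
)
req = build_write_request("http://influx.example:8086", "metrics", [line])
```

The only network requirement is that the node can reach the database's HTTP port, which is why push fits locked-down networks better than scraping does.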

I was starting to like InfluxDB, but ever since people began saying it eats memory and your data, I've stopped caring.

https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/FAQ

("How does VictoriaMetrics compare to InfluxDB?")

That hasn't been my experience. I've been running it for ~3 years in our dev, stg, and prod environments. Prod is using 1.5GB of RAM on a 5GB instance. I've never had a data loss issue.

I'd recommend giving VictoriaMetrics a try. It requires fewer hardware resources (RAM, CPU, disk) than InfluxDB [1], and it supports PromQL, a much nicer query language for typical time series queries than InfluxQL or Flux [2]. It can be used as a drop-in replacement for InfluxDB on the ingestion path [3].

[1] https://medium.com/@valyala/insert-benchmarks-with-inch-infl...

[2] https://medium.com/@valyala/promql-tutorial-for-beginners-9a...

[3] https://github.com/VictoriaMetrics/VictoriaMetrics/wiki/Sing...