| I try to monitor everything because it makes weird issues much easier to debug when sh*t hits the fan. > Do you also keep tabs on network performance, processes, services, or other metrics? Everything :) > What's your take: would you trust a simple, bespoke agent, or would you feel more secure with a well-established solution? I went with collectd [1] and Telegraf [2] simply because they support tons of modules and are very stable. However, I have a couple of bespoke agents for cases where neither collectd nor Telegraf fits. > Lastly, what's your preference for data collection: do you prefer an agent that pulls data or one that pushes it to the monitoring system? We can argue to death, but I'm for push-based agents all the way down. Push is much easier to scale, and things are painless to manage when the right tool is used (I'm using Riemann [3] for shaping, routing, and alerting). I used to run a Zabbix setup, and scaling was always the issue (Zabbix is pull-based). I'm still baffled by how pull-based monitoring gained traction, probably because newer generations need to repeat the mistakes of the past. [1] https://www.collectd.org/ [2] https://www.influxdata.com/time-series-platform/telegraf/ [3] https://riemann.io/ |
For us, the value is knowing immediately which targets should have returned data on the last scrape but didn't respond. What mistakes are you referring to?
(I am genuinely curious, not baiting you into an argument!)
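
To make the "who didn't respond" point concrete, here is a rough sketch of the check I have in mind. The host names and the `missing_targets` helper are made up for illustration; a real pull-based system would get the expected list from its own inventory or service discovery.

```python
def missing_targets(expected, responded):
    """Return targets that were expected on the last scrape but did not respond."""
    # Set difference: everything we meant to scrape minus everything that answered.
    return sorted(set(expected) - set(responded))


# Hypothetical inventory and hypothetical results of the last scrape.
expected = ["web-1", "web-2", "db-1"]
responded = ["web-1", "db-1"]

print(missing_targets(expected, responded))  # ['web-2']
```

The point is only that the puller already holds the list of expected targets, so a gap shows up on the very next scrape.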