by baby_souffle
667 days ago
> I'm still baffled how pull-based monitoring gained traction, probably because modern gens need to repeat mistakes from the past.
For us, knowing immediately who should have had data on the last scrape but didn’t respond is the value. What mistakes are you referring to? (I am genuinely curious, not baiting you into an argument!)
Maybe I don't understand your use case well, but with tools like Riemann, you can detect stalled metrics (per host or service), see which hosts or services didn't send data on time, etc.
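To make the stalled-metrics idea concrete, here is a minimal sketch in plain Python. It is not Riemann's API; the `StalenessTracker` class, the host names, and the 30-second TTL are all made up for illustration. The receiver remembers when each (host, service) pair last pushed and flags the ones that have gone quiet for longer than the TTL.

```python
from dataclasses import dataclass, field


@dataclass
class StalenessTracker:
    """Hypothetical helper: tracks the last push time per (host, service)."""

    ttl_seconds: float
    last_seen: dict = field(default_factory=dict)

    def record(self, host: str, service: str, now: float) -> None:
        # Called whenever a pushed metric arrives.
        self.last_seen[(host, service)] = now

    def stalled(self, now: float) -> list:
        # Every (host, service) whose last push is older than the TTL.
        return sorted(
            key
            for key, seen in self.last_seen.items()
            if now - seen > self.ttl_seconds
        )


tracker = StalenessTracker(ttl_seconds=30.0)
tracker.record("web-1", "cpu", now=0.0)
tracker.record("web-2", "cpu", now=0.0)
tracker.record("web-1", "cpu", now=25.0)  # web-1 keeps reporting, web-2 goes quiet

print(tracker.stalled(now=40.0))  # [('web-2', 'cpu')]
```

The point is only that a push receiver can answer "who should have reported but didn't" as long as it has seen each sender at least once.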
> What mistakes are you referring to?
Besides scaling issues and the more complex architecture, in Zabbix's case there were issues with predictability: it was hard to tell when the server would actually start pulling the metrics (different metrics could have different cadences) after the main Zabbix service was reloaded, had connection issues, or was saturated with stuck threads because some agents took longer to respond than others. This is not Zabbix-specific but a common challenge whenever a central place has to go around, query things, and wait for a response.
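A toy model of the stuck-threads effect, in plain Python. This is not how Zabbix schedules its pollers; the function name, the two-worker pool, and the response times are assumptions picked to show the shape of the problem. A fixed pool of workers polls agents in order, and each poll starts only when a worker is free.

```python
import heapq


def poll_start_times(response_times_ms, workers):
    """Return when each agent's poll starts (ms) with a fixed worker pool."""
    free_at = [0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    starts = []
    for duration in response_times_ms:
        start = heapq.heappop(free_at)  # earliest available worker
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# Six agents, two workers, everyone answers in 100 ms.
fast = poll_start_times([100] * 6, workers=2)

# Same setup, but the first two agents hang for 10 s each.
slow = poll_start_times([10_000, 10_000, 100, 100, 100, 100], workers=2)

print(fast)  # [0, 0, 100, 100, 200, 200]
print(slow)  # [0, 0, 10000, 10000, 10100, 10100]
```

In this model, two slow agents push every other agent's scrape back by about ten seconds, so the actual collection time depends on how the rest of the fleet behaves rather than on the configured cadence.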