by cheald 2156 days ago
Prometheus has Alertmanager, which provides a framework for incident notification (we route incidents to Mattermost and PagerDuty, for example; PD ends up being our big incident response tool, which lets us cascade into a variety of "wake the sysadmin up" methods). It doesn't do APM, but it wouldn't be difficult to expose a Prometheus exporter for your APM (just like you'd expose metrics for anything else you want to monitor).
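
To make the "expose metrics for your APM" idea concrete, here's a rough sketch of what that could look like, using only the Python standard library. Everything named here is made up for illustration: the `myapp_` prefix, the `APM_STATS` dict, `record_request`, and the port are all assumptions, and a real setup would more likely use an official Prometheus client library. The only part that isn't invented is the general shape of the plain-text format Prometheus scrapes from a `/metrics` endpoint.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process APM counters; the names are illustrative only.
APM_STATS = {"requests_total": 0, "request_seconds_sum": 0.0}


def record_request(duration_seconds):
    """Update the counters after handling one request (hypothetical hook)."""
    APM_STATS["requests_total"] += 1
    APM_STATS["request_seconds_sum"] += duration_seconds


def render_metrics(stats):
    """Render the counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(stats.items()):
        metric = f"myapp_{name}"  # "myapp_" prefix is an assumption
        lines.append(f"# TYPE {metric} counter")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters on /metrics for Prometheus to scrape."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(APM_STATS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving metrics on localhost (port is an assumption)."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

You'd call `record_request(...)` from wherever your app times its requests, run `serve()` in a background thread, and point a Prometheus scrape job at that port. Alerting rules on those series then flow through Alertmanager like any other metric.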

I appreciate new tools, but I do think it's fair to ask what a new one does better than the existing tools. Prometheus' biggest problem is its learning curve, IMO, so there might be some gains to be made there. After using it, though, I think the learning curve is a function of its architecture, which is a large part of what makes it so resilient. If the learning curve can be improved while maintaining (or improving on) resilience, awesome, but I personally know that I won't sleep well at night if my monitoring service isn't rock-solid.