by gedrap 3697 days ago
Looks like a nice release! One thing I really miss in grafana, and it seems like it's not included, is alerts.

It would be so damn convenient to have data visualization and alerts on the same system because they are usually strongly related from the user's point of view. And, well, one thing less to set up and maintain.

However, I am aware of the debate over whether alerts actually belong in grafana or whether it should be responsible for visualization only, and it seems like they have settled on the latter. Which definitely makes sense, because once you start to expand into alerting it's a whole new world, and I respect the choice. So yeah, I am a bit sad as a user, but I totally get the authors.

Maybe it will be available as a plugin?

That being said... What tools are HNers using for placing alerts on data stored in graphite?

5 comments

I'd love to hear how others manage it too. I have a bunch of little python scripts in cron jobs that pull / compare numbers from graphite and then post to slack. Ad hoc, but at least there was nothing much to set up / maintain and it's totally flexible.

As things expand though I'd definitely like to move to something to look after it for me.
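For anyone curious what such a cron script might look like, here is a minimal sketch, not the commenter's actual code. It assumes Graphite's render API with `format=json` and a Slack incoming webhook; the host, webhook URL, metric target and threshold are all placeholders invented for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: both URLs are placeholders, not real endpoints.
GRAPHITE_URL = "http://graphite.example.com"
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def latest_value(series):
    """Return the most recent non-null value of one Graphite series, or None."""
    # Graphite's JSON output lists datapoints as [value, timestamp] pairs.
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None


def find_breaches(render_json, threshold):
    """Return (target, value) pairs whose latest value exceeds the threshold."""
    breaches = []
    for series in render_json:
        value = latest_value(series)
        if value is not None and value > threshold:
            breaches.append((series["target"], value))
    return breaches


def fetch(target, window="-5min"):
    """Query the Graphite render API and return the decoded JSON."""
    query = urllib.parse.urlencode(
        {"target": target, "from": window, "format": "json"}
    )
    with urllib.request.urlopen(f"{GRAPHITE_URL}/render?{query}", timeout=10) as resp:
        return json.load(resp)


def post_to_slack(text):
    """Send a plain-text message to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10).close()


def main():
    # Hypothetical metric path and threshold; adjust to your own data.
    threshold = 4.0
    for target, value in find_breaches(fetch("servers.*.loadavg.01"), threshold):
        post_to_slack(f"{target} is at {value} (threshold {threshold})")

# main()  # call this from the script that cron runs
```

Keeping the comparison in `find_breaches` separate from the network calls makes the check easy to test without a live Graphite or Slack.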

We're extremely happy campers here combining the Grafana dashboard with https://prometheus.io/ Datastore and Alerts. That project has some serious traction and is one of just a few that seems actually built for the cloud and distributed systems first, as it's primarily role and not host based.
Can I ask how you manage logs?

My minimal understanding is that prometheus is time-series only so you'd have to supplement with something like ElasticSearch to aggregate logs. Does this mean you are alerting only on metrics or have multiple alert systems or ...?

We run Grafana for production dashboards with a KairosDB (we are a C* shop) and use ELK for text logs. Grafana can add annotations from ElasticSearch, but beyond that we are looking at our ES alerting options.
Interesting. I briefly looked at it previously but have never had the chance to play around.

Did you (or has anyone else) migrate your legacy data from graphite to prometheus? What's the grafana support like on top? Do you need to run a proxy or something to support the same querying or does it require reworking all the existing graphs?

Prometheus developer here.

> Did you (or has anyone else) migrate your legacy data from graphite to prometheus?

Prometheus isn't intended as a long-term data store, so there's usually not much point. The data model is also quite different.

> What's the grafana support like on top?

Grafana supports Prometheus as a first-class integration, and when I spoke with the Raintank team in person they were very supportive of Prometheus.

> Do you need to run a proxy or something to support the same querying or does it require reworking all the existing graphs?

The data model and query language are more powerful and very different, so you have to redo all the graphs.

To aid the transition we have https://github.com/prometheus/graphite_exporter which will take in graphite-formatted exports from your clients and convert them to the format Prometheus likes.
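To give a rough feel for that kind of conversion, here is an illustrative sketch. It is not the exporter's code and ignores its configurable mapping rules. It assumes Graphite's plaintext line format (`path value timestamp`) and simply flattens the dotted path into a label-free metric name; the function names are made up for this example.

```python
import re


def parse_graphite_line(line):
    """Split one Graphite plaintext line into (path, value, timestamp)."""
    path, value, timestamp = line.split()
    return path, float(value), int(timestamp)


def to_prometheus_name(path):
    """Replace characters that are not valid in a Prometheus metric name."""
    # Assumption: a naive flattening with no labels extracted from the path.
    return re.sub(r"[^a-zA-Z0-9_:]", "_", path)


def to_exposition_line(line):
    """Render a Graphite line as a 'name value' sample in Prometheus text format."""
    path, value, _timestamp = parse_graphite_line(line)
    return f"{to_prometheus_name(path)} {value}"
```

For example, `to_exposition_line("servers.web01.cpu.user 12.5 1465839830")` gives `servers_web01_cpu_user 12.5`. In practice you would want parts of the path (such as the host) turned into labels, which is what the exporter's mapping configuration is for.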

The one thing that holds me back is the pull nature of Prometheus. We are pushing metrics already, so moving to a pull model feels like a return to the 20th century.
It actually tends to work a lot better. You almost never need metric granularity so fine that push is necessary. If you do, you're usually using a purpose-built tool for that. It also saves you from accidentally DOSing yourself when someone unexpectedly starts emitting more metrics than you can ingest (due to a bug, more traffic, etc). All of that is doable with push-based systems, but they tend to end up using queues or something similar to compensate, which is its own kind of pain.
Actually, you probably want to have both. Push is great when you are collecting results of some operations, otherwise you would have to save data somewhere until it is fetched.
The https://github.com/prometheus/pushgateway serves those cases, primarily service-level batch jobs.
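As a minimal sketch of what a batch job pushing its results could look like: this assumes the Pushgateway's HTTP interface, where a PUT to `/metrics/job/<job>` carries samples in the Prometheus text format. The host, job name and metric names are placeholders invented for illustration, and the official client libraries offer helpers that are likely a better choice in real code.

```python
import urllib.parse
import urllib.request

# Assumption: placeholder address, not a real Pushgateway.
PUSHGATEWAY_URL = "http://pushgateway.example.com:9091"


def build_push_request(job, metrics):
    """Build a PUT request that replaces all metrics pushed for the given job."""
    # One 'name value' sample per line; the body must end with a newline.
    lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"{PUSHGATEWAY_URL}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": "text/plain"}
    )


def push(job, metrics):
    """Send the metrics and return the HTTP status code."""
    with urllib.request.urlopen(build_push_request(job, metrics), timeout=10) as resp:
        return resp.status
```

A nightly job would call something like `push("nightly_backup", {"backup_duration_seconds": 42.5})` as its last step, and Prometheus then scrapes the Pushgateway like any other target.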
We use Sensu with the graphite plugins[1] for alerting based on graphite queries. It works quite well, but you do need to set up a Sensu server, which is very straightforward in my experience.

[1]: https://github.com/sensu-plugins/sensu-plugins-graphite

I use icinga2 (nagios-like monitoring) to store perfdata/metrics in graphite, then use Grafana to visualize those stats.
I've only had a cursory glance over it, but there's Moira for creating alerts based on graphite metrics:

http://moira.readthedocs.io/en/latest/

Alerting is in the pipeline according to https://github.com/grafana/grafana/issues/2209