Hacker News
by user5994461 3490 days ago
Quick note for those who are tired of the giant clusterfuck of open-source tools for monitoring + alerting + storage + other, which includes no less than:

- statsd

- collectd

- graphite

- whisper

- carbon

- prometheus

- grafana

- seyren

- riemann

- nagios

- icinga

- zabbix

There are several modern SaaS products that will do all of that in a single tool, with better integrations, more polish, less work and no maintenance.

1) See https://www.datadoghq.com and the latest news at https://techcrunch.com/2016/01/12/investors-feed-datadog-a-h...

2) https://signalfx.com/ and the latest news at https://techcrunch.com/2015/03/12/signalfx-emerges-from-stea...

3) http://www.bmcsoftware.uk/it-solutions/truesight.html if you're not anti-enterprisey (that was the "Boundary" startup, bought by BMC a few years ago and integrated into their offerings).

And don't think that they are "new" fancy tools. They've been around for many years.


Agreed that the SaaS offerings are a lot more turnkey; it's the integrations and polish that make all the difference.

What you call a 'clusterfuck' is really a wider ecosystem. It would be pretty crazy for a single organization to use all or even most of the tools that you list.

Right now, people accept high costs (especially at-scale users) and lock-in in exchange for the convenience of SaaS. Or they go open source (which, to your point, certainly is an investment in time).

Watch out for what team Grafana will be doing in 2017. Our plan is to provide a fully turnkey, hosted offering based around Grafana (and a handful of other open source tools). OpenSaaS.

We hope that for many users, this can be a third choice, and in some ways the best of both worlds.

No offense, but Grafana is only as good as the weakest piece in the monitoring chain.

Having nice graphs is nice... until they fall apart because the source is unavailable.

And that doesn't help with alerts either. (I tested the alerts in the v4 beta; they're just not comparable to the better alerting tools out there.)

No offense taken ;) You're spot on about needing a solid and scalable backend; it’s more than 'nice graphs'. We think Grafana is a great piece in the chain to start with. We're trying to put as much momentum behind it as our burgeoning company will support.

The alerting in v4.0 is just the beginning. Torkel and the team have tried to optimize for the "relatively simple" 80% of alert use cases.

We are fans of other, more sophisticated open source alerting tools like Bosun, and you can be sure that we'll be improving our alerting capabilities in 4.x.
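To make the "relatively simple" class of alerts concrete: such a rule reduces a window of samples to one number and compares it with a threshold. The sketch below is a hypothetical illustration of that shape of rule, not Grafana's implementation; the reducer names, the `evaluate` function and the choice to treat "no data" as a separate state are assumptions.

```python
from statistics import mean, median
from typing import Callable, Dict, List, Optional

# Reducers collapse a window of samples into a single number.
# This set is illustrative, loosely modelled on avg/min/max/last-style rules.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "avg": mean,
    "min": min,
    "max": max,
    "last": lambda xs: xs[-1],
    "median": median,
}


def evaluate(samples: List[float], reducer: str, threshold: float,
             above: bool = True) -> Optional[bool]:
    """Return True if the rule fires, False if it is OK, None if there is no data."""
    if not samples:
        return None  # treat "no data" as its own state rather than OK
    value = REDUCERS[reducer](samples)
    return value > threshold if above else value < threshold


# Hypothetical last five minutes of CPU samples; fire if the average exceeds 80%.
cpu = [72.0, 85.0, 91.0, 88.0, 79.0]
print(evaluate(cpu, "avg", 80.0))  # average is 83.0, so True
print(evaluate([], "avg", 80.0))   # None
```

Rules that combine several series or need state across evaluations do not fit this single-threshold shape, which is roughly where the more sophisticated tools come in.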

What are you missing compared to other alerting tools?

As long as enterprises understand that they can get support options for Grafana (on-prem, SaaS, etc.), it just comes down to choosing the most economical option. I see a benefit in that symmetry for enterprises that run hybrid or are still mostly in their own datacenter.

For installations of a few hundred instances or more, some of the SaaS offerings cost more than the engineering salaries it would take to maintain the OSS tools.

It's a shame that many of the OSS tools don't have any sort of corporate sponsorship, or, if they do, that it doesn't cover all the work that goes into releasing OSS in this space.

Note: I am one of the maintainers of Diamond, a metrics collection tool written in Python. https://github.com/python-diamond/Diamond

Unfortunately, the post doesn't share things like how much infra is needed and what it costs, how long it took to set up, how much maintenance it needs, how long upgrades take, how much time hacking in missing features will take, and so on. Once all of that is honestly taken into account, I suspect most, if not all, of the savings would be lost.

Having been on the maintainer side of the OSS tools, I can say your statement is untrue.

The OSS tools cost a fortune in people to maintain and another fortune in hardware to run.

Datadog will cost you $165,600 a year for 600 hosts. That is roughly what a very well-paid engineer costs. So no, the statement is not untrue.

(I picked 600 because that was the approximate number of machines we had at my last job, where we used Graphite maintained by one guy, part time).
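Working the quoted figure down to a per-host rate makes it easier to compare with other offers or with hosting costs. A minimal sketch, assuming flat per-host list pricing with no volume discount (real contracts may differ); `yearly_cost` is a hypothetical helper, not a vendor pricing formula.

```python
# Figures quoted in the comment: $165,600 a year for 600 hosts.
ANNUAL_COST = 165_600
HOSTS = 600

per_host_year = ANNUAL_COST / HOSTS   # 276.0 dollars per host per year
per_host_month = per_host_year / 12   # 23.0 dollars per host per month


def yearly_cost(hosts: int, rate_per_host_month: float = per_host_month) -> float:
    """Yearly bill assuming flat per-host list pricing (no volume discount)."""
    return hosts * rate_per_host_month * 12


print(per_host_year, per_host_month)  # 276.0 23.0
print(yearly_cost(600))               # 165600.0
print(yearly_cost(10_000))            # 2760000.0
```

At that assumed flat rate, 10,000 hosts would come to $2.76M a year, which is why volume discounts matter for large fleets.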

You included a LOT of redundancy in your OSS list. Multiple timeseries databases. Multiple collection daemons. Multiple dashboards. Multiple alerting systems (Who in their right mind would use Nagios AND Icinga?). You're effectively arguing about maintaining multiple monitoring stacks, some of which are quite aged.

Yikes. I'm sure there's discount pricing available, but some of us have tens of thousands of hosts to monitor. The pricing you quoted doesn't scale. For me it might be cheaper to collect with OSS and graph with SaaS.

Indeed, I gave a list of all the tools; you only need a stack of about 4 to 8 of them to get the job done.

Let's say statsd + collectd (metrics collection) + graphite (aggregation) + carbon/whisper (graphite storage) + icinga (alerting) + grafana (graphing). That doesn't exactly come easy.
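For a sense of how those pieces fit: applications send counters and timers to statsd, which aggregates them and flushes to carbon; collectd can ship host metrics to carbon directly (for example with its write_graphite plugin); carbon stores the data in whisper files; Grafana queries Graphite to draw dashboards. The sketch below only formats and sends the two wire formats at the entry points, assuming the default ports (8125/UDP for statsd, 2003/TCP for carbon's plaintext listener); the metric names and helper functions are hypothetical.

```python
import socket
import time
from typing import Optional, Tuple

STATSD_ADDR: Tuple[str, int] = ("127.0.0.1", 8125)  # statsd's default UDP port
CARBON_ADDR: Tuple[str, int] = ("127.0.0.1", 2003)  # carbon's default plaintext port


def statsd_line(bucket: str, value: float, metric_type: str = "c") -> str:
    """statsd datagram '<bucket>:<value>|<type>' (c=counter, ms=timer, g=gauge)."""
    return f"{bucket}:{value}|{metric_type}"


def carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Graphite plaintext line '<path> <value> <unix timestamp>' plus a newline."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_statsd(line: str, addr: Tuple[str, int] = STATSD_ADDR) -> None:
    # UDP is fire-and-forget; a missing statsd normally goes unnoticed here.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), addr)


def send_carbon(line: str, addr: Tuple[str, int] = CARBON_ADDR) -> None:
    # TCP: raises an OSError (e.g. ConnectionRefusedError) if carbon is down.
    with socket.create_connection(addr, timeout=2) as sock:
        sock.sendall(line.encode("ascii"))


print(statsd_line("app.checkout.requests", 1))
print(carbon_line("servers.web01.loadavg", 0.42, 1480000000), end="")
```

On a host running statsd, `send_statsd(statsd_line("app.checkout.requests", 1))` would increment that counter, and statsd would forward the aggregate to carbon on its next flush.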

No offense, but a single Graphite instance is not a monitoring solution. It's just the tip of the iceberg. Monitoring does take a lot of engineering work and a lot of maintenance. You won't get away with operating 600 hosts on the cheap; just think about how much the hosts themselves cost.

Let's talk about how much Amazon will charge you for 600 instances a year...

For microservices-based architectures, things like OpenTracing can go a long way toward de-cluttering the clusterfuck. Of course, that requires developers to be up to speed on distributed tracing, which isn't the case across the board. http://opentracing.io
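The core idea behind distributed tracing is that every request carries a trace id across service boundaries, and each unit of work records a span pointing at its parent, so a backend can reassemble the call tree. The sketch below shows only that propagation idea using the standard library; the `Span` class, the `inject`/`extract` helpers and the header names are hypothetical and are not the OpenTracing API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation; spans sharing a trace_id form one request's trace."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])

    def child(self, name: str) -> "Span":
        # A child span keeps the trace id and records its parent.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def inject(self) -> Dict[str, str]:
        # Context to send to the next service, e.g. as HTTP headers.
        # These header names are hypothetical.
        return {"x-trace-id": self.trace_id, "x-parent-span-id": self.span_id}


def extract(headers: Dict[str, str], name: str) -> Span:
    """Continue an incoming trace in a downstream service."""
    return Span(name=name, trace_id=headers["x-trace-id"],
                parent_id=headers["x-parent-span-id"])


finished: List[Span] = []  # stand-in for a tracing backend

root = Span(name="GET /checkout", trace_id=uuid.uuid4().hex)
db = root.child("SELECT orders")
remote = extract(root.inject(), "payment-service: charge")
finished.extend([db, remote, root])

for span in finished:
    print(span.name, "child of root:", span.parent_id == root.span_id)
```

The in-process child and the span created in the "remote" service end up in the same trace, which is what lets a tracing backend show one request's path across services.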
s/clusterfuck/ecosystem/

A typical system doesn't use all of the tools above. You use what fits you, and many of the tools play pretty well together. For example, I've had luck with Icinga2 and Grafana lately; they integrated quite smoothly out of the box.

And you have to include in the price the problems of a paid app:

- customization will be very expensive, if not impossible

- you must have people for the procurement process (x10 more costly if you are in a gov agency)

- weird failures due to not finding the license

- your cheap personnel who install software won't be able to do it

- you'll have problems creating testing environments because you don't have licenses

- you won't be able to do some things immediately because there aren't enough licenses.

And these are just the problems that came to my mind right now. All of them are real problems that I've run into with commercial software.

> - customization will be very expensive, if not impossible

You've got a full API and integrations with a hundred different tools and services out of the box.

Seriously, my coworker was skeptical at first (so was I). Then we configured the full integrations with AWS/the-agent/statsd/postgres/mysql/cassandra/elasticsearch/riak/nginx/haproxy/redis/memcache/pagerduty/slack and some more.

My coworker concluded, in front of my CEO, "it was 2 orders of magnitude faster [than anything else we've ever tried for monitoring]". And that's not even counting the additional features and customization we couldn't have dreamed of.

> - you must have people for the procurement process (x10 more costly if you are in a gov agency)

True. That's the only major problem I can see: People who can't buy the software they need. That's a social problem, not a software problem.

> - weird failures due to not finding the license

It's only one API key to put in the agent config file.

> - your cheap personnel who install software won't be able to do it

I don't know who you're talking about. Monitoring has our best people working on it. At other places I've seen, it's done by devops consultants raking in £600 a day.

There are no cheap personnel involved. (Maybe you're thinking of cheap interns who add alerts? That's an anti-pattern.)

> - you'll have problems creating testing environments because you don't have licenses

Same license. Put a tag environment=<environment> in the config and you're done: all metrics, all servers and all alerts will be tagged.
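The tagging approach described here can be sketched as follows: every metric line carries an `environment` tag, and dashboards or alerts filter on it, so test and production share one account. This assumes the DogStatsD-style datagram layout `name:value|type|#key:value,...`; the helper names and tag values are hypothetical, and per the comment the agent would normally attach host-level tags from its config rather than application code adding them.

```python
from typing import Dict, List


def tagged_line(name: str, value: float, metric_type: str,
                tags: Dict[str, str]) -> str:
    """Format a DogStatsD-style datagram: '<name>:<value>|<type>|#k:v,k:v'."""
    tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return f"{name}:{value}|{metric_type}|#{tag_str}"


def tags_of(line: str) -> Dict[str, str]:
    """Recover the tags from a line produced by tagged_line."""
    _, _, tag_part = line.partition("|#")
    return dict(t.split(":", 1) for t in tag_part.split(",") if t)


def only_env(lines: List[str], env: str) -> List[str]:
    """Keep metrics from one environment, as a dashboard or alert filter would."""
    return [ln for ln in lines if tags_of(ln).get("environment") == env]


# Hypothetical host-level tags, set once per host in the agent config.
prod_tags = {"environment": "prod", "role": "web"}
test_tags = {"environment": "test", "role": "web"}

lines = [
    tagged_line("checkout.requests", 1, "c", prod_tags),
    tagged_line("checkout.requests", 1, "c", test_tags),
]
print(lines[0])
print(only_env(lines, "test"))
```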

> - you won't be able to do some things immediately because there aren't enough licenses.

Not applicable. It's not a limited license by seats.

You pay the bill at the end of the month depending on the number of hosts in your package. There is an hourly price for ephemeral hosts and overruns.

I think the reason you're getting negative reactions is that you're talking very broadly, as if your personal experience were representative of everyone in the field. Rather than asserting that the real problems neves mentioned don't exist, try describing how the specific products you've used were designed to avoid them.

Also https://www.hostedgraphite.com: we host Graphite and StatsD with Grafana dashboards, as well as alerting and integrations with several other dev tools. It's a self-funded business that has been running for 5 years and is profitable, with 14+ staff.

You guys are awesome. Can't wait to see Grafana 4.0 on Hosted Graphite.

As someone who's configured and worked with almost all of the tools in this list, I can only disagree with you. The old saying "you get what you pay for" is somewhat relevant, but the integration of the newer OSS monitoring tools is becoming increasingly awesome. Take the Graphite/Prometheus/Elastic Stack integration with Grafana, for instance.

I think having one pane of glass to do all passive monitoring tasks is an incredible step forward.

I have yet to see whether Grafana's active monitoring is any good, but it does look very promising.

Is there any hosted monitoring solution that integrates with service discovery, so that it's actually useful for serious alerting in today's dynamic environments? Otherwise you can't even tell whether things that should be there are reporting in or missing.
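The check being asked for boils down to a set difference between what service discovery says should exist and what has recently reported in. A minimal sketch, assuming the registry's instance list and a last-seen timestamp per instance are already available; the function name, instance names and the 60-second staleness window are hypothetical.

```python
import time
from typing import Dict, Iterable, Optional, Set


def missing_instances(expected: Iterable[str], last_seen: Dict[str, float],
                      max_age: float, now: Optional[float] = None) -> Set[str]:
    """Instances the registry says should exist but that haven't reported recently.

    An instance that has never reported counts as missing.
    """
    now = time.time() if now is None else now
    return {inst for inst in expected
            if now - last_seen.get(inst, float("-inf")) > max_age}


# Hypothetical registry contents and last-report timestamps (seconds).
expected = ["web-1", "web-2", "web-3"]
last_seen = {"web-1": 1000.0, "web-2": 900.0}  # web-3 never reported
print(sorted(missing_instances(expected, last_seen, max_age=60, now=1010.0)))
# ['web-2', 'web-3']
```

Without the `expected` list from service discovery, a host that never reports simply produces no data, which is exactly the blind spot the question describes.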