| HN Mirror


	by the_evacuator 3190 days ago
	Prometheus is an escaped implementation of Google’s borgmon, which is seen inside Google as a kind of horror show, and alternatives have been developed. It is kind of frightening that it has got out in the wild and people like it.

8 comments

packetslave 3190 days ago

"Borgmon is the worst form of monitoring, except for all those other forms that have been tried from time to time."

-- Winston Churchill if he worked at Google.

alin-sinpalean 3188 days ago

Having worked both on borgmon and with borgmon, I can think of a few reasons why some (many?) don't like it all that much:

1. As already mentioned, the macro system (and the fact that its use is basically required to set up basic monitoring) has quite a steep learning curve. Prometheus doesn't have that and I personally would prefer it did.

2. It's not a service, so you have to set up your own instance, configure it, and maintain it. In many engineers' minds this is just another hurdle in front of them launching their service.

3. As a software engineer (particularly new to Google) you might not expect to have to do ops work and carry a pager yourself.

Out of all three, only (1) is a valid reason to hate on borgmon. That and the language itself, which is almost a 1:1 match with Prometheus, are very different from your regular programming language. But given the choice between flat, simple metrics (which is what most monitoring systems give you) and the ability to have arbitrary dimensions and be able to work with them to build useful alerts and dashboards and troubleshoot quickly, I (again personal opinion) will always go with the latter.
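
To make the "arbitrary dimensions" point concrete, here is a minimal Python sketch. It is not borgmon or Prometheus syntax; the metric name, the label keys (`job`, `zone`, `code`) and the helper functions are all hypothetical. It only shows why labelled samples make it easy to slice the same data several ways, for example an error ratio per zone.

```python
from collections import defaultdict

# Hypothetical samples: (metric name, labels, value). Names and labels are
# made up for illustration only.
SAMPLES = [
    ("http_requests", {"job": "web", "zone": "a", "code": "200"}, 120.0),
    ("http_requests", {"job": "web", "zone": "a", "code": "500"}, 3.0),
    ("http_requests", {"job": "web", "zone": "b", "code": "200"}, 80.0),
    ("http_requests", {"job": "web", "zone": "b", "code": "500"}, 20.0),
]


def sum_by(samples, name, keys):
    """Sum the values of metric `name`, grouped by the given label keys."""
    totals = defaultdict(float)
    for metric, labels, value in samples:
        if metric == name:
            group = tuple(labels.get(k, "") for k in keys)
            totals[group] += value
    return dict(totals)


def error_ratio_by_zone(samples):
    """Fraction of requests with code 500, computed separately per zone."""
    by_zone_code = sum_by(samples, "http_requests", ["zone", "code"])
    by_zone = sum_by(samples, "http_requests", ["zone"])
    return {
        zone: by_zone_code.get((zone, "500"), 0.0) / total
        for (zone,), total in by_zone.items()
        if total > 0
    }
```

With flat metrics you would have to decide up front which of these breakdowns to record; with labels, `sum_by` can regroup the same samples by any combination of keys after the fact.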

clhodapp 3190 days ago

Why is borgmon considered a horror show? Is something fundamentally flawed in the model? How do the alternatives differ from the original borgmon?

kyrra 3190 days ago

Google has an alternative that they gave a talk on back in December. Sadly there aren't any papers on it yet. It's called Monarch and it's what backs Stackdriver.

Its config language is less crazy (Python based) and it operates globally.

https://www.youtube.com/watch?v=LlvJdK1xsl4

Edit: Monarch config isn't sane, it's just different and at least not in the crazy languages that borgmon uses.

alin-sinpalean 3188 days ago

I will point out that borgmon's language (minus macros) is almost a 1:1 match with Prometheus. You can judge for yourself how crazy that is, but I feel that it's close to as simple as you can get for the power it gives you.

As for Monarch, it's a very different beast. For one, it stores all its rules in a protocol buffer format, so it's more structured. But then you have to write Python code that generates the protocol buffers and pushes them to storage. It looks similar to, but is not the same as, the ad-hoc query language. I wouldn't go as far as calling it sane.

It is also a service, and it's optimized for Google's network architecture, with datacenter-local and global nodes. The language itself is aware of this distinction: some computations are done locally, others globally, and so on.
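
A toy Python sketch of that local/global split, under heavy assumptions: this is not Monarch's design or API, and the function names are hypothetical. It only illustrates the general idea that each datacenter can reduce its own samples to a small partial result, and a global step then combines the partials.

```python
def local_aggregate(values):
    """Reduce one datacenter's samples to a (sum, count) partial result."""
    return (sum(values), len(values))


def global_mean(partials):
    """Combine per-datacenter (sum, count) partials into one global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no samples to aggregate")
    return total / count
```

Shipping (sum, count) rather than a local mean matters here: averaging the local means would give the wrong answer whenever the datacenters hold different numbers of samples.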

For your local monitoring needs (or even global ones, if you're willing to put in the effort), Prometheus is a solid choice.

kyrra 3188 days ago

I'd agree with you after thinking about it some. I haven't really written either; I've mainly copy/pasted or used tools to assist in creation. So I can't really judge either monitoring language on its ease of use.

packetslave 3190 days ago

"sane" is an interesting choice of words to describe Monarch configuration...

adrianratnapala 3190 days ago

Borgmon's language is weird and crusty. People can debate whether this is the main problem with Borgmon or whether more fundamental changes are necessary.

I won't weigh in on that debate. But you can think of Prometheus as an experiment to decide the issue: it is very similar to Borgmon, but has a cleaner language.

lima 3190 days ago

Some details: https://landing.google.com/sre/book/chapters/practical-alert...

mbertschler 3190 days ago

And what would be a better alternative available outside of Google?

user5994461 3190 days ago

The decade-old, open-source, self-hosted, debug-it-yourself standard for monitoring is collectd+graphite+grafana.
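
For a feel of what feeding that stack looks like, here is a minimal Python sketch of Graphite's plaintext protocol, where each sample is one line of the form `path value timestamp`. The host, the metric path and the helper names are hypothetical, and port 2003 is assumed as the usual default for the plaintext listener.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one sample as a Graphite plaintext line: 'path value timestamp'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(host, path, value, port=2003, timestamp=None):
    """Send a single sample over TCP to a Graphite plaintext listener."""
    line = graphite_line(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `graphite_line("servers.web01.load", 0.5, 1500000000)` produces `servers.web01.load 0.5 1500000000` followed by a newline. Note that any dimensions (host, metric kind) are folded into the dotted path.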

The equivalent easy to set up and use, with ALL the features working out of the box, SaaS standard is https://www.datadoghq.com/ or potentially Google Stackdriver if you are on Google Cloud.

toomuchtodo 3190 days ago

Plugging https://www.librato.com/ after reading an HN thread yesterday that DataDog pricing is insane [1].

[1] https://news.ycombinator.com/item?id=15315028

Disclaimer: No relation to either org.

user5994461 3184 days ago

The thread you are linking to is a complete joke. The guy didn't see that the pricing was per host, even though it's written in big letters. His whole series of rants is ridiculous.

kyrra 3190 days ago

Stackdriver works with AWS.

https://cloud.google.com/monitoring/quickstart-aws

user5994461 3184 days ago

Stackdriver is a Google Cloud service; you need a Google account with billing to use it. It can gather metrics from hundreds of vendors, including AWS and Azure.

caust1c 3187 days ago

It wasn't universally liked at Cloudflare. The federation component in particular is a PITA.

bbrazil 3188 days ago

> alternatives have been developed

Such as Prometheus :) Even some teams in Google use it.

orf 3190 days ago

Do you have any reason not to like it?

the_evacuator 3190 days ago

Not really. I just think it’s funny to see it described by one group as a reasonable or even state-of-the-art system, while another group describes it as brain damage from ten years ago.

delroth 3190 days ago

Borgmon being "brain damage" or some kind of "horror show" is more a meme than a serious opinion held by people who have used Borgmon.

Personally I'm very happy that the open source world is adopting something derived from Borgmon rather than something derived from its supposed "replacement".

SuperQue 3188 days ago

Yup, I've had more than one current Google SRE state "You can have my Borgmon when you pry it from my cold dead hands.".

Borgmon may be dead in the eyes of some people, but I know for a fact that it's still the only thing monitoring core and critical systems.

Most of the problem with Borgmon, IMO, is the cruft that has built up over the decade+, and neglect due to the Google pattern of "The new thing that doesn't work, and the old thing that is deprecated.".

The difficulty at Google is that developers are rewarded for writing new and shiny from scratch, rather than fixing the old but working systems.

This isn't always a problem, as some good things can come out of starting from scratch. But sometimes they throw out too many of the good ideas, in an attempt to be fancy and new.

bogomipz 3190 days ago

I have heard unflattering things about it from a few different ex-Google SREs, specifically about the macro system and it being cumbersome to use.

bbrazil 3188 days ago

You will note that Prometheus explicitly does not have a macro system.

bogomipz 3188 days ago

Oh sure, I was responding to the OP stating that criticism of Borgmon was more a meme than reality.

It wasn't meant to be commentary on Prometheus (which I quite like) at all :)

user5994461 3190 days ago

It's pretty normal when you consider that Google is 10 years ahead of almost every other company when it comes to infrastructure.

toomuchtodo 3190 days ago

> is 10 years ahead of almost every other company when it comes to infrastructure.

For Google-scale orgs or infrastructure needs. Most everyone else in the world does not need Google scale tools.

user5994461 3190 days ago

Google's needs are extremely common. Take a look at any Fortune 500 company and it could usually benefit greatly from a lot of the infrastructure that powers Google.

Most of them run their own datacenters, sometimes in numerous locations, and they have massive and extremely complex IT systems in place.
