Hacker News
by bbrazil 3701 days ago
> Most places have no outbound port restrictions, or when they do they usually have a proxy for traffic to go out (like HTTPS for updates etc.)

Depends which company. For companies that really care about security, letting arbitrary traffic out is a big no-no. Even via proxy, it requires tight controls.

We've had potential users that were quite excited that Prometheus works the other way, as their network security team were likely to permit it.

> There is a reason why distributed systems exist and that is to scale efficiently with some degree of redundancy baked in.

The efficiency here is in humans (in theory anyway), not in resources or reliability. Distributed systems are a very hard problem, and we avoid those approaches for Prometheus as it's a critical monitoring system. CP systems like Kafka and ZooKeeper are not things you want on your alerting path, as they'll fall apart when your network does. Prometheus will keep chugging along.

> Single node HA is probably the most inefficient method of scaling.

I'd disagree. The standard approach these days tends to be a cluster of three, which would use at least 50% more resources than a redundant pair.
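
As a rough illustration of that arithmetic (node counts only, assuming every node costs the same, which is a simplification):

```python
# Rough sketch: relative hardware cost of two redundancy schemes.
# Assumes every node costs the same; the node counts are illustrative.

def extra_resources(baseline_nodes: int, nodes: int) -> float:
    """Fractional extra resources of running `nodes` instead of `baseline_nodes`."""
    return (nodes - baseline_nodes) / baseline_nodes

pair_vs_single = extra_resources(1, 2)   # redundant pair vs. one server
cluster_vs_pair = extra_resources(2, 3)  # three-node cluster vs. redundant pair

print(f"Redundant pair over a single node: {pair_vs_single:.0%} more")
print(f"Cluster of three over a redundant pair: {cluster_vs_pair:.0%} more")
```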

> It sounds like Prometheus needs to run properly on a single box for simplicity but over time needs to be broken up and made scalable beyond the bounds of a single server.

That's correct. If you manage to have enough targets inside a single datacenter (many thousands of machines), then we recommend vertical sharding first, and horizontal sharding only if that doesn't work. Prometheus is really easy to run, so you'll likely end up choosing to shard vertically anyway for organisational reasons, so that each team can control their own decentralised monitoring.
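
To make the two terms concrete, here is a minimal sketch of how targets might be split. It is not Prometheus code: the team names, addresses, and helper functions are all hypothetical, and the hash-modulo scheme is just one common way to spread targets evenly.

```python
import hashlib
from collections import defaultdict

# Hypothetical scrape targets as (team, address) pairs; all values are made up.
TARGETS = [
    ("payments", "10.0.0.1:9100"),
    ("payments", "10.0.0.2:9100"),
    ("search", "10.0.1.1:9100"),
    ("search", "10.0.1.2:9100"),
]

def vertical_shards(targets):
    """Vertical sharding: one monitoring server per team or service."""
    shards = defaultdict(list)
    for team, address in targets:
        shards[team].append(address)
    return dict(shards)

def horizontal_shard(address: str, num_shards: int) -> int:
    """Horizontal sharding: stable hash of the address modulo the shard count."""
    digest = hashlib.md5(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

by_team = vertical_shards(TARGETS)
by_hash = {address: horizontal_shard(address, 2) for _, address in TARGETS}
print(by_team)
print(by_hash)
```

Vertical sharding keeps each team's targets together on a server that team owns, while the hash-based split spreads one large job across several servers at the cost of having to query all of them.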

> We didn't want lots of individual consoles.

With Grafana you can view things across many Prometheus servers.

> We wanted a central large system that could hold years of historical data,

Prometheus explicitly doesn't do long-term historical data. We're effectively talking about unbounded amounts of storage that can't fit on one machine, which implies a distributed storage system. As mentioned above, that's not something we want in our core, for reliability reasons.

The chosen approach is that we'll interface with something else such as OpenTSDB that'll do the long term storage, and we'll support seamlessly graphing across it. That's much easier to make reliable (just add a timeout), and if it does go down you'd still have the last few weeks of data sitting on the Prometheus box.
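
The "just add a timeout" idea can be sketched roughly as follows. This is not how Prometheus implements it: `query_with_fallback` and the two stand-in query functions are hypothetical, and the only point is that a slow or failing long-term store must never block access to the recent local data.

```python
import concurrent.futures
import time

def query_with_fallback(query_local, query_long_term, timeout_s=1.0):
    """Return (local, long_term); long_term is None if the remote store is slow or down."""
    local = query_local()  # recent data on the local box, always available
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(query_long_term)
    try:
        long_term = future.result(timeout=timeout_s)
    except Exception:
        # Timeout or remote error: degrade to local data only.
        long_term = None
    finally:
        pool.shutdown(wait=False)  # do not wait for a hung remote call
    return local, long_term

# Hypothetical stand-ins for the two storage layers.
def recent_weeks():
    return ["recent samples"]

def slow_long_term_store():
    time.sleep(0.3)  # simulates an unreachable or overloaded remote store
    return ["old samples"]

print(query_with_fallback(recent_weeks, slow_long_term_store, timeout_s=0.05))
```

With the slow stand-in above, the call returns the local samples and `None` for the long-term part, so a graph would simply be missing the older range.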

> as well as allowing people to query it in real time

That we do really well. PromQL is very powerful, and anything you can graph you can also alert on.

> and we liked the idea of not needing to double up on hardware costs.

I feel this is a bit of a red herring. The question isn't whether there's a 2X multiplier in the math, it's the overall cost as compared to the benefits.

Prometheus is astoundingly efficient: the latest numbers are 800k samples/s on one machine. I haven't heard of anything else that is even close, and I believe we're also holding the record on storage efficiency.

Even 10X that cost isn't likely to break the bank, so I'd suggest taking a look at the full range of features it offers and comparing that to the real-world cost. The operational aspects you mention are generally manageable, and if they aren't, your infrastructure likely has bigger problems.

2 comments

> It sounds like Prometheus needs to run properly on a single box for simplicity but over time needs to be broken up and made scalable beyond the bounds of a single server.

I can understand the reasoning behind them building for a single node. Building distributed systems is hard, not everyone is capable of building these systems. They also require languages and frameworks suited to working in clusters. Golang might not scale too well, but it's great for the simple things.

> The chosen approach is that we'll interface with something else such as OpenTSDB that'll do the long term storage, and we'll support seamlessly graphing across it. That's much easier to make reliable (just add a timeout), and if it does go down you'd still have the last few weeks of data sitting on the Prometheus box.

Doesn't OpenTSDB require ZooKeeper? So if your "network falls apart" you have no historical data? I guess that's fine if you're not alerting on data trends, and also have 10 minutes for your graphs to render.

> I can understand the reasoning behind them building for a single node. Building distributed systems is hard, not everyone is capable of building these systems. They also require languages and frameworks suited to working in clusters. Golang might not scale too well, but it's great for the simple things.

I'd say we're up to it, and Go is a great language for this class of system. However, we're wise enough to know that even with the best people and tools this would take years to get production-ready.

> So if your "network falls apart" you have no historical data? I guess that's fine if you're not alerting on data trends, and also have 10 minutes for your graphs to render.

Yes, you'd lose access to historical data (typically more than a few weeks ago). But everything else works, which should include the vast majority of your alerts.

The point on ingress vs egress is that most systems already have a route out. Creating a route in takes much more effort, especially when you have NATs etc. It's very nice to be able to spin up nodes and not have to care about opening firewall ports in security groups. By default AWS limits inbound traffic and has no limitations on outbound (in classic EC2; things can be changed in a VPC). Managing that security list centrally is far more auditable. I'm very surprised you've encountered anyone who thinks it's a great idea to open a port to every security group from a certain location (or many).

Like it or not, you are beginning to split out Prometheus into a set of services regardless of the underlying belief that it all needs to fit onto a single node. That will only become more obvious over time as devices and metrics increase (which they very obviously will with containers and application metrics). There is a limit to a node, 800k metrics per box is not that huge when you consider the things that could be measured just on a single host. We have several thousand metrics coming out of just a single MySQL instance.

With Grafana you can only choose a single data source per widget, so you can't overlay data from different Prometheus nodes on the same graph. You can of course put them onto the same dashboard in different widgets, but that is limiting. We chose a system where you could store all metrics across all systems at the same resolution and use them for analytics purposes on the same graphs. Having two layers of storage, one "robust" on a single node (which stretches the definition oddly :) and then separate long-term storage, isn't desirable.

No matter how efficient you make things they won't keep up with the rate of metrics coming out of systems. Limiting to a single box 'because otherwise it's a hard problem' really doesn't seem like a great philosophy. It might be practical for now but longer term that's a very limited vision.

> There is a limit to a node, 800k metrics per box is not that huge when you consider the things that could be measured just on a single host. We have several thousand metrics coming out of just a single MySQL instance.

To clarify, that's 800k metrics per second per box. That's an upper limit of around 50M metrics with a 60s scrape interval.
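
Worked through as a quick calculation (the 800k/s and 60s figures are from this thread; the 5,000 metrics per MySQL instance is only an assumed stand-in for "several thousand"):

```python
SAMPLES_PER_SECOND = 800_000  # ingestion rate quoted in this thread
SCRAPE_INTERVAL_S = 60        # each metric is sampled once per interval

# Each metric contributes one sample per scrape interval, so the ceiling is:
max_metrics = SAMPLES_PER_SECOND * SCRAPE_INTERVAL_S
print(f"{max_metrics:,} metrics per box")  # 48,000,000, i.e. around 50M

# Assumption for illustration: 5,000 metrics per MySQL instance.
METRICS_PER_MYSQL_INSTANCE = 5_000
instances_per_box = max_metrics // METRICS_PER_MYSQL_INSTANCE
print(f"{instances_per_box:,} such instances per box")  # 9,600
```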

It'd be quite difficult to hit this limit with a single service inside one datacenter.

If you do manage to hit it, we do have documented ways to horizontally scale beyond that.

> With Grafana you can only choose a single data source per widget.

This is not correct. Since 2.5 you can mix data sources. See http://docs.grafana.org/guides/whats-new-in-v2-5/