by bbrazil
3705 days ago
> Not only that, only scaling vertically on a single node doesn't seem like a good design.

For Prometheus at least, we're so efficient that it actually works out okay for the vast majority of users. You'd typically need thousands of instances doing the same thing inside a single datacenter before you get into our (admittedly more involved) horizontal sharding approach. http://www.robustperception.io/scaling-and-federating-promet... has more information.

> There are ways to poll things and push metrics without opening millions of firewall ports to every security group. Sensu does that quite well, and it scales.

I don't think that's quite a fair comparison. Sensu Just Works when there's no outbound firewall; Prometheus Just Works when there's no inbound firewall. If you add a firewall in the other direction for either, then things break down.
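
The linked post covers the actual sharding setup. To give a rough idea of how horizontal sharding of scrape targets can work: each Prometheus server hashes every target's address, takes the result modulo the number of shards, and keeps only the targets that land on its own shard index. The sketch below is similar in spirit to Prometheus's `hashmod` relabel action, but it is an illustrative reimplementation, not Prometheus code. `shard_for`, `targets_for_shard` and the example target names are hypothetical.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a scrape target (e.g. 'host:port') to a shard index in [0, num_shards)."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Illustrative choice: treat the last 8 bytes of the digest as a big-endian integer.
    value = int.from_bytes(digest[8:], "big")
    return value % num_shards


def targets_for_shard(targets, shard_index: int, num_shards: int):
    """Return the subset of targets that this shard is responsible for scraping."""
    return [t for t in targets if shard_for(t, num_shards) == shard_index]
```

Because the hash is deterministic, every server can be given the full target list and independently arrive at a disjoint subset. For example, with three servers, server `1` would scrape `targets_for_shard(all_targets, 1, 3)`.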
|
The outbound vs. inbound firewall comparison is totally fair. At almost every company I have worked for, perimeter security blocks all incoming ports. Most places have no outbound port restrictions, and when they do, they almost always provide a proxy for outgoing traffic (e.g. HTTPS for updates). This is what makes outbound-only connections a considerably better design (Sensu, Datadog, insert pretty much any SaaS vendor). I honestly don't know of any company that would open the extreme number of ports required to let an external monitoring tool scrape its servers. In a large enterprise you want to host the monitoring tool as a service for other groups, which means it may sit in a totally different data center or cloud, and allowing a small list of subnets 'in' is considerably better than exposing every single server to external access.
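
To make the outbound-only model concrete, here is a minimal sketch of a push agent. It assumes a hypothetical central collector that accepts an HTTP POST of simple `name value` lines. The agent never listens on a port, so the only inbound firewall rule needed is on the collector itself. `COLLECTOR_URL`, `render_metrics`, `push_metrics` and `run_agent` are all hypothetical names, not the API of Sensu, Datadog or any other product.

```python
import time
import urllib.request

# Hypothetical central collector; only this host needs an inbound firewall rule.
COLLECTOR_URL = "https://metrics-collector.example.com/ingest"


def render_metrics(metrics: dict) -> bytes:
    """Serialize metrics as sorted 'name value' lines (a hypothetical payload format)."""
    lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
    return ("\n".join(lines) + "\n").encode("utf-8")


def push_metrics(metrics: dict, url: str = COLLECTOR_URL, timeout: float = 5.0) -> int:
    """POST metrics to the collector over an outbound connection; return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=render_metrics(metrics),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_agent(collect, interval_seconds: float = 15.0) -> None:
    """Collect and push forever; the agent only ever opens outbound connections."""
    while True:
        try:
            push_metrics(collect())
        except OSError as exc:  # URLError subclasses OSError: log and retry next cycle
            print(f"push failed: {exc}")
        time.sleep(interval_seconds)
```

The default `urllib` opener picks up proxy settings such as `https_proxy` from the environment, which fits the outbound-proxy setups described above.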
On scaling, I'm not sure I totally agree. Distributed systems exist for a reason: to scale efficiently with some degree of redundancy baked in. Single-node HA is probably the least efficient way to scale. It sounds like Prometheus needs to run properly on a single box for simplicity, but over time it will need to be broken up and made to scale beyond the bounds of a single server. Being limited to a single server in 2016, as Nagios was 10 years ago before it too started to split some things out, isn't something I'd advertise as a feature.
I think right now Prometheus is probably filling the gap that an industrial historian fills in a factory: a console sitting next to the thing you're building that provides real-time measurements. We didn't want lots of individual consoles. We wanted a large central system that could hold years of historical data while also letting people query it in real time, and we liked the idea of not needing to double up on hardware costs. We're using a mix of Elasticsearch and Cassandra here to achieve that.