

	by cmckn 2143 days ago
	Prometheus is great. I first heard about it at KubeCon last fall, and kind of shrugged it off as one of those fledgling "cloud native" projects that I probably didn't need or didn't have time to learn. There's actually a lot of adoption: you can find great exporters and Grafana dashboards for almost any OSS you're running today. I started collecting metrics from ZooKeeper and HBase in about an hour, having never had access to that telemetry before. From the existence of Cortex[1], it seems that Prometheus doesn't scale incredibly well, but I don't think many users will hit these limits. [1] https://cortexmetrics.io/

3 comments

edoceo 2143 days ago

My Prometheus system is a $10/mo Linode. It collects from 27 other hosts, and at least 100 services distributed across those hosts - doesn't even break a sweat. All the exporters run through a wireguard VPN. Prometheus is great for a small/medium SaaS type environment.

abhishekjha 2143 days ago

What do you use as a frontend? As far as I could tell, the Grafana free tier doesn't allow monitoring a cluster of servers.

edoceo 2142 days ago

I use Grafana and some custom frontends. I have only one Prometheus box, so clustering is not a problem I'm having (and likely won't be; I can scale vertically a long way for my smallish operation).

dewey 2143 days ago

You could self host it.

abhishekjha 2143 days ago

Can I self-host it for monitoring a cluster of servers? Currently I have Grafana installed on each of my servers and I have to monitor them individually. I want a centralised dashboard over Telegraf + InfluxDB.

detaro 2143 days ago

Why would you install Grafana + Influx on each server instead of one central one?

abhishekjha 2142 days ago

I haven't spent much time on this, but most of the docs were for setting it up on each host. Is there a proper tutorial for clusters?

Also, I wanted to keep the monitoring unaffected for other servers if one of them goes down. If I set up a central server for monitoring, then that becomes a single point of failure.

sagichmal 2143 days ago

Prometheus "scales" really well, but it does so via segmentation and federation, rather than by increasing the size of, e.g., a cluster. Some use cases don't fit that model, so projects like Cortex and Thanos exist.

nielsole 2143 days ago

Not vertically, at least. The memory usage for indexing has room for improvement. If I read the pprofs correctly, every scrape interval and every remote write allocates huge amounts of memory, which is only cleaned up on garbage collection. You can easily need >64 GB of RAM for tens of thousands of time series; otherwise you OOM.

bjakubski 2143 days ago

The biggest single Prometheus server I have access to currently uses almost 64GiB of RAM and ingests about 80,000 samples per second. Most of the scrape intervals are 60s. It holds about 5,000,000 time series. Note that we do have more time series: the above server is just a horizontal shard, ingesting just one part of the total metrics volume there.
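As a rough sanity check on those figures, here is a minimal sketch that assumes every series is scraped exactly once per 60s interval (the comment only says most are) and that each scrape yields one sample per series. The function name is hypothetical and only illustrates the arithmetic:

```python
def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Approximate ingestion rate if every series yields one sample per scrape."""
    return active_series / scrape_interval_s


# Figures from the comment: about 5,000,000 series, mostly 60s scrape intervals.
rate = samples_per_second(5_000_000, 60)
print(f"{rate:,.0f} samples/s")
```

Under those assumptions the result lands a little above 83,000 samples per second, which is in line with the reported "about 80,000".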

dikei 2143 days ago

Tens of thousands seems rather low; we are running 3 million series with less than 32GB of RAM and still have room to spare.

base698 2143 days ago

I do 15 million on about 64GB average memory. Have you tried recently?
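To put the numbers reported in this subthread side by side, a quick sketch of memory per series follows. It assumes all RAM figures are in GiB and treats each as fully used by Prometheus, which overstates the "less than 32GB" and "almost 64GiB" reports; the helper name is hypothetical:

```python
GIB = 2**30

# (RAM in bytes, active time series) as reported by each commenter.
reports = {
    "bjakubski": (64 * GIB, 5_000_000),
    "dikei": (32 * GIB, 3_000_000),
    "base698": (64 * GIB, 15_000_000),
}


def bytes_per_series(ram_bytes: int, series: int) -> float:
    """Upper-bound estimate of memory consumed per active series."""
    return ram_bytes / series


for name, (ram, series) in reports.items():
    kib = bytes_per_series(ram, series) / 1024
    print(f"{name}: {kib:.1f} KiB per series")
```

All three reports work out to roughly 4 to 14 KiB per series. By contrast, needing more than 64 GB for tens of thousands of series would imply on the order of a megabyte per series.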

sagichmal 2143 days ago

This has not been my experience at all. I'd file that as a bug.

jeffbee 2143 days ago

Interesting to say that Prometheus is "fledgling". The project is almost 7 years old and the Google thing on which it is based is ~15 years old.

cmckn 2143 days ago

> I first heard about it at KubeCon last fall, and kind of shrugged it off as one of those fledgling...

I didn’t know the age of the project, because I hadn’t heard of it. That’s why I go on to say that in actuality it has a ton of adoption and I’ve had a great experience with it.