Hacker News
by syastrov 2918 days ago
Nice write-up. I love reading these kinds of postmortems.

Unlike a lot of those I read, it sounds like you actually set out with a good set of requirements and really understood the problem.

I had a good experience using Prometheus as well for a smaller project (server monitoring). It’s interesting to know that it can handle so many metrics and scale so well to more complex problem areas.

1 comment

One of the blog authors here -- thanks!

> I had a good experience using Prometheus as well for a smaller project (server monitoring). It’s interesting to know that it can handle so many metrics and scale so well to more complex problem areas.

Yep, we started out with a pretty simple Prometheus setup too (two instances scraping the same metrics, just for redundancy), but have since added federated instances and some pre-aggregation to scale. The nice part is that we've been able to do it gradually by updating the config, e.g. splitting out one bucket of metrics at a time onto a separate node for scraping.
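
For readers who haven't set up federation before, here is a rough sketch of what one such split could look like. This is not the setup described above: the job name, host names, and selector are made up for illustration. It only relies on Prometheus's `/federate` endpoint, its `match[]` parameter, and the `honor_labels` scrape option, and assumes the pre-aggregated series are recording rules named with a `job:` prefix.

```python
from urllib.parse import urlencode


def federate_scrape_config(job_name, targets, selectors):
    """Build one scrape job (as a dict mirroring the YAML config) that pulls
    the selected series from other Prometheus servers."""
    return {
        "job_name": job_name,
        "honor_labels": True,         # keep the labels set by the source server
        "metrics_path": "/federate",  # Prometheus's federation endpoint
        "params": {"match[]": list(selectors)},
        "static_configs": [{"targets": list(targets)}],
    }


def federate_url(target, selectors):
    """Return the URL a federating server would request from one target."""
    query = urlencode([("match[]", s) for s in selectors])
    return f"http://{target}/federate?{query}"


# Hypothetical example: one bucket of metrics ("api") has been moved onto its
# own redundant pair of servers, and a top-level server pulls only the
# pre-aggregated series from that pair.
api_selectors = ['{__name__=~"job:.*"}']  # assumed recording-rule prefix
api_targets = ["prom-api-a:9090", "prom-api-b:9090"]  # made-up host names

config = federate_scrape_config("federate-api", api_targets, api_selectors)
url = federate_url(api_targets[0], api_selectors)
```

Splitting out the next bucket would then mean adding one more job like this to the top-level server's `scrape_configs`, which is what makes the gradual approach workable.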

We took a similar journey with Prometheus @ Improbable. We found federation to have its limits & wanted a global query view as well as a few other nice features: https://improbable.io/games/blog/thanos-prometheus-at-scale