by sagichmal 2143 days ago
Prometheus "scales" really well, but it does so via segmentation and federation rather than by growing the size of, e.g., a cluster. Some use cases don't fit that model, which is why projects like Cortex and Thanos exist.

Not vertically, at least. The memory usage for indexing has room for improvement. If I read the pprofs correctly, every scrape interval and every remote write allocates huge amounts of memory, which is only freed on garbage collection. You can easily need >64 GB of RAM for tens of thousands of time series; otherwise you OOM.
The biggest single Prometheus server I currently have access to uses almost 64GiB of RAM and ingests about 80,000 samples per second. Most scrape intervals are 60s. It holds about 5,000,000 time series. Note that we have more time series in total: this server is just one horizontal shard, ingesting only part of the total metrics volume.
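As a rough sanity check on those figures, here is a back-of-the-envelope sketch. It assumes every series is scraped exactly once per 60s interval and that all of the RAM is attributable to active series; neither is strictly true, so treat the results as ballpark only. The function names are made up for illustration and are not part of any Prometheus API.

```python
# Back-of-the-envelope estimates; inputs are the figures quoted in the comment above.

def samples_per_second(active_series: int, scrape_interval_s: float) -> float:
    """Ingestion rate if every series yields one sample per scrape interval."""
    return active_series / scrape_interval_s


def bytes_per_series(ram_bytes: int, active_series: int) -> float:
    """Naive memory cost per series: total RAM divided evenly across series."""
    return ram_bytes / active_series


rate = samples_per_second(5_000_000, 60)              # assumes a uniform 60s interval
per_series = bytes_per_series(64 * 2**30, 5_000_000)  # assumes all 64GiB goes to series

print(f"{rate:.0f} samples/s")
print(f"{per_series / 1024:.1f} KiB per series")
```

The estimated rate lands close to the quoted 80,000 samples per second, so the numbers in the comment are at least consistent with each other.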
Tens of thousands seems rather low; we are running 3 million series with less than 32GB of RAM and still have room to spare.
I run 15 million series on about 64GB of memory on average. Have you tried recently?
This has not been my experience at all. I'd file that as a bug.