by pickledish 2017 days ago
14 cents per "query processing minute" sounds like it could add up very fast. Prom queries can get somewhat complex and it's not rare at all IME to have a dashboard making several multi-second queries per load (whether that falls into "you're using Prometheus wrong" being a separate discussion of course)

Edit: The example from their pricing page:

> We will assume you have 1 end user monitoring a dashboard for an average of 2 hours per day refreshing it every 60 seconds with 20 chart widgets per dashboard (assuming 1 PromQL query per widget)... assuming 18ms per query for this example.

Comes out to over $3 per month in query costs. Replace this 1 person with a TV showing the dashboard all day, and the cost jumps to $36, for just one dashboard and (again IME) overly fast query estimates... o.O
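For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes a 30-day month and that billing is simply total query time multiplied by the $0.14 per-minute rate quoted above; the function name and parameters are mine, not anything from the pricing page.

```python
PRICE_PER_QUERY_MINUTE = 0.14  # USD, the "14 cents per query processing minute" quoted above


def monthly_query_cost(hours_per_day, refresh_seconds, widgets, query_ms, days=30):
    """Estimate monthly query cost for one dashboard with one viewer.

    Assumes a 30-day month and one PromQL query per widget per refresh.
    """
    refreshes_per_month = hours_per_day * 3600 / refresh_seconds * days
    query_seconds = refreshes_per_month * widgets * query_ms / 1000
    return query_seconds / 60 * PRICE_PER_QUERY_MINUTE


one_person = monthly_query_cost(hours_per_day=2, refresh_seconds=60, widgets=20, query_ms=18)
tv_all_day = monthly_query_cost(hours_per_day=24, refresh_seconds=60, widgets=20, query_ms=18)
print(f"2 h/day: ${one_person:.2f}, 24 h/day: ${tv_all_day:.2f}")
```

Under those assumptions the 2 h/day case is 21.6 query minutes a month and the always-on TV is 12 times that.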


Does it put any limits on cardinality of metrics? Grafana Cloud's offering was absolutely awful for my use cases. They charge per series, so if you have metrics with a "pod=..." label your prices go through the roof.
Every managed metrics system will put a limit on cardinality, because every mainstream metrics system costs more to store and query as cardinality grows. If they don't limit it, you can assume that you or some other customer will eventually use up the cluster's resources and cause an outage.

Like most metrics systems, under the covers Prometheus treats each unique combination of dimensions (label values) as a separate time series.

Plenty has been written about not using the server/container/pod id as a label because it leads to high cardinality which leads to poor performance (cost aside). Time series databases have been purpose-built for certain workloads and you can consider this their weakness.
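
To put a number on why a per-pod label hurts, here is a minimal sketch in Python. The label names and counts are made up for illustration, and the product is only an upper bound, since not every combination of label values necessarily shows up in practice.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on time series for one metric name.

    Each unique combination of label values is its own series, so the
    bound is the product of the number of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical metric with two low-cardinality labels.
base = {"method": 4, "status": 5}
# The same metric after adding a label with one value per pod (500 pods assumed).
with_pod = {**base, "pod": 500}

print(max_series(base), max_series(with_pod))
```

With per-series pricing, the bill scales with the second number, not the first.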
Plenty has also been written about the bugs/issues that have cropped up that are only visible when inspecting what regions/nodes/cgroups an issue is coming from [0]. My use case wasn't exactly `pod=...` but it was very similar. It was more like `device=...`. Also, for a huge application, it's not uncommon to have 100s or even 1000s of metrics that are important to application health/performance. Constantly saying "do you really need X? It will cost us Y" will lead to an extremely under-monitored application.

[0] - https://cloud.google.com/blog/products/management-tools/sre-...

Plenty of companies run their own servers because cloud is too expensive at their scale. Same goes for metrics. It's a direct result of one-price-fits-all pricing models for software as well as pricing that is not correctly tied to value.
I like Weave Cloud’s Prometheus hosting model — it’s per host, which is predictable and forecastable.
Now do six dashboards, 10 widgets each, multiple viewers, 18h/day, and one slowish query on each dashboard. Seems like we get to a hundred+ pretty quick.
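
A back-of-the-envelope version of that scenario, as a Python sketch. Several numbers here are my guesses and not from the thread: a 60-second refresh, 18 ms for the fast queries, 200 ms for the "slowish" one, a 30-day month, and only a single viewer per dashboard. The rate is the $0.14 per query minute quoted upthread.

```python
PRICE_PER_QUERY_MINUTE = 0.14  # USD, as quoted upthread


def dashboard_monthly_cost(query_ms_per_widget, hours_per_day, refresh_seconds=60, days=30):
    """Monthly cost of one dashboard with one viewer.

    query_ms_per_widget holds one assumed query duration (ms) per widget.
    """
    refreshes_per_month = hours_per_day * 3600 / refresh_seconds * days
    seconds_per_refresh = sum(query_ms_per_widget) / 1000
    return refreshes_per_month * seconds_per_refresh / 60 * PRICE_PER_QUERY_MINUTE


# Nine fast widgets plus one slowish query (the 200 ms figure is a guess).
widgets = [18] * 9 + [200]
one_dashboard = dashboard_monthly_cost(widgets, hours_per_day=18)
six_dashboards = 6 * one_dashboard
print(f"one dashboard: ${one_dashboard:.2f}/month, six: ${six_dashboards:.2f}/month")
```

With these guesses, six dashboards land around $164 a month before counting extra viewers; a genuinely multi-second query would push that much higher.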
Caching means that multiple viewers cost very little extra.

(I am a Cortex maintainer)