Hacker News
by heliodor 2017 days ago
Plenty has been written about not using the server/container/pod ID as a label, because it leads to high cardinality, which in turn leads to poor performance (cost aside). Time series databases are purpose-built for certain workloads, and you can consider this their weakness.

Plenty has also been written about bugs and issues that are only visible when you inspect which regions/nodes/cgroups an issue is coming from [0]. My use case wasn't exactly `pod=...`, but it was very similar: more like `device=...`. Also, for a huge application, it's not uncommon to have hundreds or even thousands of metrics that are important to application health and performance. Constantly asking "do you really need X? It will cost us Y" leads to an extremely under-monitored application.
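
To put rough numbers on why a `device=...` label gets expensive, here is a minimal sketch of the arithmetic. The label names other than `device`, and every count used, are made-up assumptions for illustration, not figures from my setup. It also assumes every label combination actually occurs, so treat the result as an upper bound.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for ONE metric.

    Each distinct combination of label values is its own series, so the
    bound is the product of the number of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical numbers, chosen only to show the scale of the effect.
metrics = 500  # "hundreds or even thousands" of metrics
base_labels = {"region": 5, "status": 4}
with_device = {**base_labels, "device": 10_000}

total_without = metrics * series_count(base_labels)
total_with = metrics * series_count(with_device)

print(total_without)  # 10000
print(total_with)     # 100000000
```

Under these assumptions, one extra label takes the same 500 metrics from ten thousand series to a hundred million, which is where the "do you really need X?" conversation comes from.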

[0] - https://cloud.google.com/blog/products/management-tools/sre-...

Plenty of companies run their own servers because cloud is too expensive at their scale. Same goes for metrics. It's a direct result of one-price-fits-all pricing models for software as well as pricing that is not correctly tied to value.