| HN Mirror

This gets even worse if you're using a language that runs one process per CPU: processes on the same instance can clobber each other's values unless you add a field that uniquely identifies each one.
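To make the clobbering concrete, here is a minimal Python sketch. It assumes a last-write-wins gauge store keyed on metric name plus label set. The `record` and `simulate` helpers and the `instance_id` label are hypothetical illustrations, not any particular backend's API.

```python
def record(store, name, labels, value):
    """Hypothetical last-write-wins gauge store keyed by metric name + label set."""
    key = (name, frozenset(labels.items()))
    store[key] = value

def simulate(heaps_mb, per_process_label):
    """Each worker process on one box reports its heap usage as a gauge."""
    store = {}
    for index, heap_mb in enumerate(heaps_mb):
        labels = {"host": "web-1"}
        if per_process_label:
            # Hypothetical field that uniquely identifies the process on this box.
            labels["instance_id"] = str(index)
        record(store, "heap_used_mb", labels, heap_mb)
    return store

heaps = [120, 340, 95, 410]
clobbered = simulate(heaps, per_process_label=False)
distinct = simulate(heaps, per_process_label=True)
print(len(clobbered))  # 1: only the last worker's value (410) survives
print(len(distinct))   # 4: one series per process
```

Without the identifying field, every worker writes to the same key, so three of the four readings are silently lost.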

When we migrated our telemetry to AWS, we were initially told to just move it over, but we got a lot of pushback once they saw how much OTEL amplified data points and cardinality compared with our old StatsD data.
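A rough way to see the amplification: the worst-case number of unique series is the product of the distinct values of each label, and stored data points scale with series count times reporting frequency. The label sets and numbers below are hypothetical illustrations, not our actual data, and the product is an upper bound because not every label combination is necessarily emitted.

```python
from math import prod

def series_count(label_cardinalities):
    """Worst-case unique series: product of distinct values per label."""
    return prod(label_cardinalities.values())

def points_per_day(series, interval_s):
    """Data points stored per day for `series` series reported every `interval_s` seconds."""
    return series * (86_400 // interval_s)

# Hypothetical label sets for one metric.
statsd_like = {"endpoint": 50}
otel_like = {"endpoint": 50, "host": 20, "instance_id": 32}

for name, labels in [("statsd_like", statsd_like), ("otel_like", otel_like)]:
    s = series_count(labels)
    print(name, s, points_per_day(s, 10))
# statsd_like 50 432000
# otel_like 32000 276480000
```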

You probably need less cardinality than you think. Some stats work fine with less frequent polling, while others, like heap usage, are terrible at 20- or 30-second intervals. Our Pareto frontier was to reduce the sampling rate of most stats and push per-process stats like heap usage into histograms.
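One way to read "push per-process stats into histograms": instead of emitting one heap gauge per process, each interval emits a single histogram describing the distribution across all processes on the box. This is a minimal sketch; the bucket bounds and the `heap_histogram` name are hypothetical, not a specific library's format.

```python
import bisect

# Hypothetical bucket upper bounds in MB; one extra bucket catches everything above.
BOUNDS_MB = [64, 128, 256, 512, 1024]

def heap_histogram(samples_mb, bounds=BOUNDS_MB):
    """Fold per-process heap samples into one fixed-bucket histogram.

    counts[i] holds samples in (bounds[i-1], bounds[i]];
    counts[-1] holds samples above the largest bound.
    """
    counts = [0] * (len(bounds) + 1)
    for sample in samples_mb:
        # bisect_left finds the first bound >= sample, i.e. the bucket it falls in.
        counts[bisect.bisect_left(bounds, sample)] += 1
    return {"bounds": list(bounds), "counts": counts,
            "count": len(samples_mb), "sum": sum(samples_mb)}

samples = [120, 340, 95, 410, 60, 700, 130, 2000]  # one reading per process
h = heap_histogram(samples)
print(h["counts"])  # [1, 2, 1, 2, 1, 1]
```

The number of series stays fixed by the bucket count no matter how many processes run, while a runaway process still shows up in the overflow bucket.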

A per-box aggregator can drop a couple of tags before forwarding metrics upstream, which considerably reduces the number of unique series (e.g., instanceID=[0..31] isn't very useful outside the box).
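A minimal sketch of that per-box step, assuming the metrics are counters (so summing across processes is valid) and that `instanceID` is the only tag worth dropping. The `aggregate` function and `DROP_TAGS` set are hypothetical. Gauges would need a different merge (max, or a histogram as above).

```python
from collections import defaultdict

DROP_TAGS = {"instanceID"}  # hypothetical: tags only meaningful on the box

def aggregate(points, drop=DROP_TAGS):
    """Sum counter points whose labels match once the `drop` tags are removed."""
    out = defaultdict(float)
    for name, labels, value in points:
        kept = frozenset((k, v) for k, v in labels.items() if k not in drop)
        out[(name, kept)] += value
    return dict(out)

# 32 processes, each reporting 10 requests on one of two endpoints.
points = [
    ("requests_total",
     {"endpoint": "/a" if i % 2 == 0 else "/b", "instanceID": str(i)},
     10)
    for i in range(32)
]
before = len({(name, frozenset(labels.items())) for name, labels, _ in points})
after = aggregate(points)
print(before, len(after))  # 32 2
```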