by tignaj 1412 days ago
This. The statelessness of OTLP is by design. I did consider stateful designs, e.g. with shared-state dictionary compression, but eventually chose not to use them, so that intermediaries can remain stateless.

An extension to OTLP that uses shared state (and columnar encoding) to achieve a more compact representation, and that is suitable for the last network leg in the data delivery path, has been proposed and may become a reality in the future: https://github.com/open-telemetry/oteps/pull/171

1 comment

Windows has something like 15,000 performance counters and error metrics that can be collected. There isn’t a monitoring system on earth that can even approach collecting all of them. At scale, I have to pick and choose maybe 20-100 counters for fear of overloading a cluster(!) of servers collecting the data… once a minute.

That’s because protocol overheads cause “write multiplication” of a hundred-to-one or worse. Every byte of metric data ends up as nearly a kilobyte on the wire.
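To make the overhead concrete, here is a rough sketch comparing one self-describing sample against the raw value it carries. The payload shape, host name, and metric name are made up for illustration; this is not OTLP's actual wire format, and it does not count HTTP headers or TLS framing, which would add more.

```python
import json
import struct

# Hypothetical self-describing sample: one counter value from one host.
# Every field name and value here is an assumption for illustration.
sample = {
    "resource": {"host.name": "web-0042", "os.type": "windows"},
    "metric": {
        "name": "processor.percent_processor_time",
        "unit": "percent",
        "timestamp_unix_nano": 1_650_000_000_000_000_000,
        "value": 37,
    },
}

# What a stateless, self-describing request body might look like.
json_body = json.dumps(sample).encode("utf-8")

# The information that actually changed: a single small integer.
packed = struct.pack("<B", sample["metric"]["value"])

ratio = len(json_body) / len(packed)
print(f"{len(json_body)} bytes of body for {len(packed)} byte of value ({ratio:.0f}x)")
```

Batching many samples per request amortizes some of this, but the names, units, and resource attributes are still resent every time as long as the receiver keeps no state.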

Meanwhile I did some experiments that showed that even with a tiny bit of crude data-oriented design and delta compression a single box could collect 10K metrics across 10K endpoints every second without breaking a sweat.
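For anyone curious what "crude" can look like, here is a minimal sketch of one way to do it, not the code from those experiments. It assumes both ends agreed on a fixed metric order up front (that is the shared state), that values are integers, and that each frame carries only zigzag-varint deltas against the previous frame. All function names are hypothetical.

```python
from typing import List, Tuple


def zigzag(n: int) -> int:
    # Map signed to unsigned so small negative deltas stay small:
    # 0, -1, 1, -2 -> 0, 1, 2, 3
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z: int) -> int:
    return (z >> 1) if (z & 1) == 0 else -((z + 1) >> 1)


def put_varint(out: bytearray, value: int) -> None:
    # 7 payload bits per byte; high bit set means "more bytes follow".
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)


def get_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def encode_frame(prev: List[int], curr: List[int]) -> bytes:
    # No names, no timestamps per metric: position in the list is the identity.
    out = bytearray()
    for p, c in zip(prev, curr):
        put_varint(out, zigzag(c - p))
    return bytes(out)


def decode_frame(prev: List[int], frame: bytes) -> List[int]:
    curr = []
    pos = 0
    for p in prev:
        z, pos = get_varint(frame, pos)
        curr.append(p + unzigzag(z))
    return curr


prev = [1000, 52_000_000, 17, 0]
curr = [1003, 52_000_450, 17, 0]
frame = encode_frame(prev, curr)
print(len(frame), "bytes for", len(curr), "metrics")  # 5 bytes vs 32 as fixed 64-bit
```

The catch is exactly the one raised in the parent comment: the receiver has to remember the previous frame for every endpoint, so a stateless intermediary cannot decode or re-batch these frames.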

The modern REST / RPC approach is fine for business apps but is an unmitigated disaster for collecting tiny metrics.

Set your goals higher than collecting a selected subset of 1% of the available metrics 60x less frequently than admins would like…