by jauntywundrkind 795 days ago
We've turned off logging & tracing on a bunch of our high-volume routes. Ideally I'd prefer we still sample them, at like 0.1% or whatnot, to give us some indicator, some chance of seeing anomalies. It just seems easier to gather & use this information than it is to go develop a suite of metrics that can register all issues.
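
For a rough idea of what 0.1% sampling could look like, here's a minimal sketch in plain Python. It is not the OpenTelemetry API; `should_sample` and the 64-bit cutoff are made up for illustration, loosely in the spirit of a trace-ID ratio sampler. Deciding from the trace ID rather than a coin flip means every service that sees the same trace makes the same keep/drop call.

```python
import random

MASK_64 = (1 << 64) - 1  # look only at the low 64 bits of the trace id


def should_sample(trace_id: int, ratio: float = 0.001) -> bool:
    """Keep a trace when the low 64 bits of its id fall under ratio * 2**64.

    Hypothetical helper: deterministic per trace id, so the decision is
    consistent wherever the same trace shows up.
    """
    bound = int(ratio * (1 << 64))
    return (trace_id & MASK_64) < bound


# Simulate a high-volume route with random 128-bit trace ids.
rng = random.Random(42)
total = 200_000
kept = sum(should_sample(rng.getrandbits(128)) for _ in range(total))
print(f"kept {kept} of {total} traces")
```

On average that keeps about 1 trace in 1,000, which is still enough on a busy route to notice when something starts looking weird.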

OpenTelemetry recently-ish gained Open Agent Management Protocol (OpAMP), which allows some runtime control over things generating telemetry. The ability to stay fairly low but then scale up as needed sounds tempting, but gee, it also sends shivers down my spine thinking of having such elastic demands on one's telemetry infrastructure, as engineers turn telemetry up while problems are occurring. https://opentelemetry.io/docs/specs/opamp/

The idea of having a local circular buffer sounds excellent to me. Being able to run local queries & aggregate would be sweet. Are there any open otel issues discussing these ideas?
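
To make the circular buffer idea concrete, here's a small sketch of what I have in mind, assuming spans are just dicts kept in process memory. `SpanRingBuffer` and its methods are hypothetical names, not anything that exists in otel today. The buffer keeps only the most recent N spans, can answer a simple local aggregate, and can be drained for export when something looks off.

```python
from collections import Counter, deque


class SpanRingBuffer:
    """Fixed-size in-memory buffer of recent spans (hypothetical sketch)."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently drops the oldest item once full.
        self._spans = deque(maxlen=capacity)

    def record(self, route: str, status: int, duration_ms: float) -> None:
        self._spans.append(
            {"route": route, "status": status, "duration_ms": duration_ms}
        )

    def errors_by_route(self) -> Counter:
        """Local query: count buffered 5xx spans per route."""
        return Counter(s["route"] for s in self._spans if s["status"] >= 500)

    def drain(self) -> list:
        """Hand back everything buffered (e.g. to export) and reset."""
        spans = list(self._spans)
        self._spans.clear()
        return spans


buf = SpanRingBuffer(capacity=3)
buf.record("/a", 200, 1.0)
buf.record("/b", 500, 2.0)
buf.record("/a", 503, 3.0)
buf.record("/a", 200, 4.0)  # buffer is full, so the first "/a" span is evicted

print(buf.errors_by_route())
```

The nice property is that memory use is bounded by the capacity no matter how hot the route is, and nothing leaves the process unless you ask for it.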