Hacker News
by markl42 735 days ago
At the risk of hijacking the comments, I've been trying to use OTel recently to debug the performance of a complex webpage with lots of async sibling spans, and I'm finding it very, very difficult to identify the critical path and bottlenecks.

There are no causal relationships between sibling spans. I think in theory "span links" solve this, but afaict this is not a widely used feature in SDKs or UI viewers.

(I wrote about this here https://github.com/open-telemetry/opentelemetry-specificatio...)
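
For illustration, here is a minimal sketch of the kind of heuristic a viewer has to fall back on when siblings carry no causal edges: walk backward from the parent's end time and repeatedly pick the sibling that finished last before the cursor. The `Span` class and `critical_path` function are hypothetical names for this sketch, not part of any OTel SDK, and the result is only a guess based on timestamps, not real causality.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical, simplified span: a name, start/end times, and children."""
    name: str
    start: float  # e.g. milliseconds since the trace began
    end: float
    children: List["Span"] = field(default_factory=list)


def critical_path(span: Span) -> List[str]:
    """Return span names on an estimated critical path, in chronological order.

    Heuristic: starting at the parent's end time, take the child that ended
    latest at or before the cursor, then move the cursor to that child's
    start and repeat. Children that overlap a picked child are skipped.
    """
    cursor = span.end
    picked: List[Span] = []
    for child in sorted(span.children, key=lambda c: c.end, reverse=True):
        if child.end <= cursor:
            picked.append(child)
            cursor = child.start

    path = [span.name]
    # Children were picked latest-first; reverse to report them in time order.
    for child in reversed(picked):
        path.extend(critical_path(child))
    return path


if __name__ == "__main__":
    page = Span("page_load", 0, 100, [
        Span("fetch_user", 0, 40),
        Span("fetch_feed", 5, 90),
        Span("fetch_ads", 10, 30),
        Span("render", 90, 100),
    ])
    print(critical_path(page))
```

With explicit causal edges between siblings, the timestamp guesswork in the loop could be replaced by following those edges directly, which is the gap described above.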

2 comments

I don't believe this is a solved problem, and it's been around since the OpenTracing days[0]. I don't think Span links, as they are currently defined, are the best place to do this, but maybe they will be extended to support it in the future. Right now Span links are mostly used to correlate spans causally _across different traces_, whereas, as you point out, there are cases where you want correlation _within a trace_.

[0]: https://github.com/opentracing/specification/issues/142

I was underwhelmed by the max size for spans before they get rejected. Our app was about an order of magnitude too complex for OTEL to handle.

Reworking our code to support spans made our stack traces harder to read and in the end we turned the whole thing off anyway. Worse than doing nothing.

As per the spec there are no formal limits on size, although in practice limits can apply at several levels:

- Your SDK's exporter

- Collector processors and general memory limitations based on deployment

- Telemetry backend (this is usually the one that hits people)

Do you know where this rejection happened? My guess would be the backend, since some will (surprisingly) have rather small limits on spans and span attributes.
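
One rough way to narrow it down is to measure span attributes before export and see which ones are large. Below is a minimal sketch assuming attributes are available as a plain dict; `audit_span` and both limit constants are hypothetical and not taken from any SDK or backend, and JSON length is only a proxy for the real wire size.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical limits for illustration only; check your backend's documentation.
MAX_ATTR_VALUE_BYTES = 4096
MAX_SPAN_BYTES = 64 * 1024


def audit_span(attributes: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Return (approximate serialized size in bytes, oversized attribute keys).

    Size is estimated from the UTF-8 length of the JSON encoding, which is
    not the same as the exporter's actual encoding but is close enough to
    spot outliers.
    """
    oversized = [
        key
        for key, value in attributes.items()
        if len(json.dumps(value).encode("utf-8")) > MAX_ATTR_VALUE_BYTES
    ]
    total = len(json.dumps(attributes).encode("utf-8"))
    return total, oversized


if __name__ == "__main__":
    attrs = {"http.method": "GET", "db.statement": "x" * 5000}
    total, oversized = audit_span(attrs)
    print(total > MAX_SPAN_BYTES, oversized)
```

If a handful of attributes dominate, truncating or dropping them on the client side may be enough to get under a backend's limit.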

Sounds like a knob you can turn, in my experience at least.