by fishtoaster 682 days ago

For what it's worth, I found it almost trivial to set up OpenTelemetry and point it at Honeycomb. It took me an afternoon about a month ago for a medium-sized Python web app. I've found that it replaces a lot of the tooling and manual work I needed in the past. At previous startups, the process usually went like this:

1. Set up basic logging (now I just use otel events)

2. Make it structured logging (Get that for free with otel events)

3. Add request context that's sent along with each log (Also free with otel)

4. Manually set up tracing ids in my codebase and configure it in my tooling (all free with otel spans)

I was expecting to have to get deep into the new observability philosophy to get value out of it, but I found myself really loving this setup with minimal work and minimal Kool-Aid drinking. I'll probably do something like this instead of "logs, request context, metrics, and alarms" at future startups.

JoshTriplett 682 days ago

I've currently done this, and I'm seriously considering undoing it in favor of some other logging solution. My biggest reason: OpenTelemetry fundamentally doesn't handle events that aren't part of a span, and doesn't handle spans that don't close. So, if you crash, you don't get telemetry to help you debug the crash.

I wish "span start" and "span end" were just independent events, and OTel tools handled and presented unfinished spans or events that don't appear within a span.
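To make that wish concrete, here is a minimal sketch of what "span start" and "span end" as independent events could look like. It is not the OpenTelemetry API: `EventLog`, `unfinished_spans`, and the JSON field names are all hypothetical, and it uses only the Python standard library. Every record is written and flushed as soon as it happens, so a span that never closes is still visible afterwards.

```python
import io
import json
import time
import uuid


class EventLog:
    """Hypothetical logger: one JSON line per event, flushed immediately."""

    def __init__(self, stream):
        self.stream = stream

    def emit(self, kind, **fields):
        record = {"kind": kind, "ts": time.time(), **fields}
        self.stream.write(json.dumps(record) + "\n")
        self.stream.flush()  # nothing waits for a span to end

    def start_span(self, name, parent_id=None):
        span_id = uuid.uuid4().hex[:16]
        self.emit("span_start", span_id=span_id, parent_id=parent_id, name=name)
        return span_id

    def end_span(self, span_id):
        self.emit("span_end", span_id=span_id)

    def event(self, message, span_id=None):
        # span_id may be None: events outside any span are still recorded.
        self.emit("event", span_id=span_id, message=message)


def unfinished_spans(lines):
    """Return the names of spans that started but never ended."""
    open_spans = {}
    for line in lines:
        record = json.loads(line)
        if record["kind"] == "span_start":
            open_spans[record["span_id"]] = record["name"]
        elif record["kind"] == "span_end":
            open_spans.pop(record["span_id"], None)
    return list(open_spans.values())


buffer = io.StringIO()
log = EventLog(buffer)

ok = log.start_span("load_config")
log.end_span(ok)

stuck = log.start_span("call_backend")
log.event("sent request, waiting for reply", span_id=stuck)
log.event("signal received")  # not inside any span

# Pretend the process hangs here: call_backend never ends.
print(unfinished_spans(buffer.getvalue().splitlines()))  # ['call_backend']
```

A reader of this log can reconstruct complete spans by pairing start and end records, and can also list the ones that never finished, which is the case that matters when debugging a hang or crash.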

bamboozled 682 days ago

Isn’t the problem here that your code is crashing and you’re relying on the wrong tool to help you solve that?

JoshTriplett 682 days ago

Logging solves this problem. If OTel and observability are attempting to position themselves as a better alternative to logging, they need to solve the problems that logging already solves. I'm not going to use completely separate tools for logging and observability.

Also, "crash" here doesn't necessarily mean "segfault" or equivalent. It can also mean "hang and not finish (and thus not end the span)", or "have a network issue that breaks the ability to submit observability data" (but after an event occurred, which could have been submitted if OTel didn't wait for spans to end first). There are any number of reasons why a span might start but not finish, most of which are bugs, and OTel and tools built upon it provide zero help when debugging those.

phillipcarter 681 days ago

OTel logs are just your existing logs, though. If you have a way to say "whoopsie it hung" then this doesn't need to be tied to a trace at all. The only tying to a trace that occurs is when there's an active span/trace in context, at which point the SDK or agent you use will wrap the log body with that span/trace ID. Export of logs is independent of trace export and happens in separate batches.
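As a rough illustration of that correlation idea, the sketch below stamps a trace ID onto log records only when one is active, and lets logs through unchanged otherwise. It is not the OTel SDK: `current_trace_id`, `TraceIdFilter`, and the logger name are hypothetical stand-ins built on the standard library's `contextvars` and `logging`.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for an SDK's "current span" context.
current_trace_id = contextvars.ContextVar("current_trace_id", default=None)


class TraceIdFilter(logging.Filter):
    """Add a trace_id field to every record; '-' when no trace is active."""

    def filter(self, record):
        record.trace_id = current_trace_id.get() or "-"
        return True  # never drop the record


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace=%(trace_id)s %(message)s")
)
handler.addFilter(TraceIdFilter())

logger = logging.getLogger("otel_log_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# No active trace: the log line is still emitted, just uncorrelated.
logger.info("whoopsie it hung")

# With an active trace in context, the same logger call gets the ID attached.
token = current_trace_id.set("4bf92f3577b34da6")
logger.info("handling request")
current_trace_id.reset(token)

print(stream.getvalue(), end="")
```

The point of the design is that the logging path never depends on a span finishing: the record is written when the log call happens, with or without a trace ID.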

Edit: I see you're a major Rust user! That perhaps changes things. Most users of OTel are in Java, .NET, Node, Python, and Go. OTel is nowhere near as developed in Rust as it is for these languages. So I don't doubt you've run into issues with OTel for your purposes.

growse 682 days ago

Can you give an example of an event that's not part of a span / trace?

Spivak 682 days ago

Unhandled exceptions are a pretty normal one. You get kicked out to your app's topmost level and you've lost your span. My wishlist to solve this (and I actually wrote an implementation in Python which leans heavily on reflection) is to be able to attach arbitrary data to stack frames and exceptions, then, when an exception occurs, merge all the data top-down and send it up to your handler.
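A minimal sketch of that wishlist item follows. It is not the reflection-based implementation mentioned above: it swaps in explicit context managers instead of inspecting stack frames, and `frame_context`, `merged_context`, and the `context_layers` attribute are hypothetical names. Each enclosing block attaches its data to the exception as it propagates, and the top-level handler merges the layers from the outermost frame inward.

```python
import contextlib


@contextlib.contextmanager
def frame_context(**data):
    """Attach `data` to any exception that propagates through this block."""
    try:
        yield
    except BaseException as exc:
        layers = getattr(exc, "context_layers", [])
        layers.append(data)  # inner blocks run first, so they append first
        exc.context_layers = layers
        raise


def merged_context(exc):
    """Merge layers top-down: outermost first, inner frames win on conflicts."""
    merged = {}
    for layer in reversed(getattr(exc, "context_layers", [])):
        merged.update(layer)
    return merged


def charge(amount):
    with frame_context(step="charge", amount=amount):
        raise ValueError("card declined")


def handle_request(user_id):
    with frame_context(step="handle_request", user_id=user_id):
        charge(42)


try:
    handle_request("u-123")
except ValueError as exc:
    # At the top level the span may be gone, but the context travelled
    # with the exception itself.
    report = merged_context(exc)
    print(report)
```

Here the handler ends up with `user_id` from the outer frame plus `step` and `amount` from the frame that actually raised, which it could then send to whatever logging or telemetry backend is in use.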

Signal handlers are another one and are a whole other beast simply because they're completely devoid of context.

growse 681 days ago

Two good examples - thank you.

They're icky (as language design / practices) to me precisely because you end up executing context-free code. But I'd probably also just start a new trace in my signal handler / exception handler tagged with "shrug"...

jononor 682 days ago

Can't you close the span on an exception?

JoshTriplett 682 days ago

See https://news.ycombinator.com/item?id=41205665 for more details.

And even in the case of an actual crash, that doesn't necessarily mean the application is in a state to successfully submit additional OTel data.
