by buchanae 159 days ago
I share a lot of this sentiment, although I struggle more with the setup and maintenance than the diagnosis.

It's baffling to me that it can still take _so_much_work_ to set up a good baseline of observability (not to mention the time we spend on tweaking alerting). I recently spent an inordinate amount of time trying to make sense of our telemetry setup and fill in the gaps. It took weeks. We had data in many systems, many different instrumentation frameworks (all stepping on each other), noisy alerts, etc.

Part of my problem is that the ecosystem is big. There's too much to learn: OpenTelemetry, OpenTracing, Zipkin, Micrometer, eBPF, auto-instrumentation, OTel SDK vs Datadog Agent, and on and on. I don't know, maybe I'm biased by the JVM-heavy systems I've been working in.

I worked for New Relic for years, and even at an observability company, it was still a lot of work to maintain, and even then traces were not heavily used.

I can definitely imagine having Claude debug an issue faster than I can type and click around dashboards and query UIs. That sounds fun.


I completely agree w/ your points about why observability sucks:

- Too much setup
- Too much maintenance
- Too steep of a learning curve

This isn't the whole picture, but it's a huge part of it. IMO, observability shouldn't be so complex that it warrants specialized experience; it should be something that any junior product engineer can do on their own.

> I can definitely imagine having Claude debug an issue faster than I can type and click around dashboards and query UIs. That sounds fun.

Working on it :)

> Part of my problem is that the ecosystem is big. There's too much to learn: OpenTelemetry, OpenTracing, Zipkin, Micrometer, eBPF, auto-instrumentation, OTel SDK vs Datadog Agent, and on and on. I don't know, maybe I'm biased by the JVM-heavy systems I've been working in.

We've had success keeping things simple with the VictoriaMetrics stack, and avoiding what we perceive as unnecessary complexity in some of the fancier tools/standards.