by starkparker 892 days ago
> 1. The common failure of docs to explain to users why they might choose one thing or another. "If you want to do x.. If you want to do y.." what if I don't know?

Observability docs in general struggle with this. So many data sources can emit so many types of metrics in so many formats, and every tool makes this impossible promise of consolidating it all into one space seamlessly. But tools like Grafana pride themselves so much on visualizing _anything_ that they paint themselves into a corner where they can't be prescriptive about common uses or methods without excluding or confusing others.

So a lot of the prescriptive answers to "what if I don't know?" get chucked onto account and support teams of commercial vendors, because the docs can't anticipate every possible context in which an observability tool will get deployed. Each solution ends up being custom tailored and poorly portable to anyone else's, often not even to other customers with the same data sources and goals at the same scale, due to wacky labelling differences or legacy requirements or some internal stakeholder demand.

More narrowly focused tools don't have as many of these problems, but not many organizations want narrowly focused observability tools. (Lots of _people_ do, but orgs don't want to pay out deals to multiple vendors for what looks like different flavors of the same result. And hey look it's Grafana Cloud or Datadog or whatever, it can do _anything_, so you devs and also bizops and SRE and IT and hey sales wants a dashboard too and so does the company cafeteria, why not, you all can just use this one tool and we just deal with one bill with a volume discount, right? Right??)

Smarter tools don't have as many of these problems: they paper over the docs' limitations by being better able to anticipate or surface connections between data sources, metrics, logs, traces, events, etc., and they do so with better interfaces. But especially for high-cardinality data, either the usability of those tools seems to fall apart or their companies charge Datadog-sized invoices.
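
For anyone wondering why high-cardinality data in particular is where this gets painful, here's a rough sketch. It assumes a Prometheus-style data model, where each unique combination of label values on a metric is its own time series, so the worst-case series count is the product of the per-label value counts. The label names and numbers are made up for illustration, and `series_count` is just a helper I'm inventing here, not anything from a real tool.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct time series for a single metric.

    Assumes every combination of label values can occur, so the bound is
    the product of the number of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets (names and counts are illustrative only).
low = {"service": 20, "region": 5, "status_code": 6}
high = {**low, "user_id": 100_000}  # add one high-cardinality label

print(series_count(low))   # 600
print(series_count(high))  # 60000000
```

One extra label with 100,000 values turns 600 possible series into 60 million, which is roughly the point where both the query UI and the invoice start to hurt.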

1 comments

Are there narrowly focused tools in the observability space even?

I was shopping for one after being outside of this field for a while, and they all follow the 101-features-and-the-kitchen-sink model, which adds to the complexity. Datadog, Grafana, but also the open source ones like SigNoz itself.

Ages ago it was all about metrics, today it's metrics traces logs APM alerting exceptions and a dozen other acronyms, on top of the protocols (statsd, Prometheus, OpenTelemetry), paired with crazy complicated yet unwieldy graph building UIs. Let's not even talk about pricing models. The entire business model is based around having one more checkmark in the feature list than the competition. The wire format (OpenTelemetry) has never been the pain point in this space.

For a moment, I seriously considered just going back to the 2000s and using RRDtool.

Most new observability tools start narrow, but every economic incentive is to expand. Which makes sense, really, because most production systems people have are complicated as all hell and have tons of different needs. Some tools are better than others at containing the chaos -- I will humbly submit that the one I work for, Honeycomb, is one of the best at doing this -- but support for several telemetry signals, visualization tools, alerting systems, dashboarding systems, etc. is what people eventually ask for as they roll out observability to more of their production systems.

Put differently, when you have sufficient observability of your entire system, you now have a complete abstraction of that system represented in some other UI and data streams. There's just no way out of the fact that for larger systems, this will be complicated, and the tools that can represent this reality must also be complex.