by jerf 2707 days ago
"Also what does observability mean is this context?"

Something went wrong, and now your site is serving 500 server errors to everybody at the rate of 25,000 per minute. The ops team already tried "just reboot it" and it didn't help. How are you going to figure out what is going on and fix it?

It's (mostly) too late to add anything, so all you've got is the logs you already had, the metrics you already had, etc. That's the "observable" stuff in a system. There's an art to recording what you need to know while at the same time not recording so much that you can't find what you need in the mess.

(The "mostly" is that if you have a good enough setup, you might be able to bring up a new system and route some very small fraction of traffic to it to examine it more intensely in real-time with a debugger or something, though in my experience, on those occasions I've had the opportunity to try this, it's never been a problem that would manifest on a new system receiving a vanishing fraction of a percent of the scale of a production box. But maybe you'll get lucky.)
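For what it's worth, the "route a very small fraction of traffic" part can be sketched in a few lines. This is only an illustration, assuming each request carries some ID you can hash; `pick_backend`, the `canary_fraction` parameter, and the backend names are made up for the example and don't correspond to any particular load balancer.

```python
import hashlib

def pick_backend(request_id: str, canary_fraction: float = 0.001) -> str:
    """Send roughly canary_fraction of request IDs to the debug box (hypothetical names)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Requests whose bucket falls below the threshold go to the canary.
    threshold = int(canary_fraction * 2**64)
    return "canary" if bucket < threshold else "production"

if __name__ == "__main__":
    for rid in ("req-1", "req-2", "req-3"):
        print(rid, pick_backend(rid, canary_fraction=0.5))
```

Hashing the ID rather than rolling a random number means the same request ID always lands on the same box, which helps if you want to replay it.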

You certainly want to do everything you can to not be in that mess in the first place, but it won't be enough. You need a system sufficiently observable that you can find the problem and find some sort of solution.


Oh, thank you. I didn't know that was referred to as "observability"; I thought it was just logging. This article from Etsy's engineering blog [1] was part of the inspiration for this question. Funnily enough, when I googled "Etsy engineering logging", the 5th result was for a position on Etsy's observability team.

[1] https://codeascraft.com/2011/02/15/measure-anything-measure-...

I think of observability as a triad:

- logging (example tools: Splunk, Sumo Logic, LogDNA)

- metrics (example tools: Prometheus, Datadog, Grafana)

- tracing (example tools: LightStep, New Relic, Zipkin)
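
To make the three legs concrete, here is a rough sketch using only the Python standard library. It is not how any of the tools above work internally: `metrics`, `spans`, `span`, and `handle_request` are hypothetical stand-ins for a metrics client, a tracing backend, and a request handler.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("shop")  # hypothetical service name

metrics = Counter()  # stand-in for a metrics client
spans = []           # stand-in for a tracing backend

@contextmanager
def span(name, trace_id):
    """Tracing: record how long a named unit of work took, tagged with a trace ID."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({"trace_id": trace_id, "name": name,
                      "seconds": time.perf_counter() - start})

def handle_request(path):
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id):
        metrics["requests_total"] += 1  # metrics: cheap aggregate counters
        try:
            if path == "/broken":
                raise RuntimeError("simulated failure")
            status = 200
        except RuntimeError:
            status = 500
            metrics["errors_total"] += 1
            # logging: the detailed record, including the stack trace
            log.exception("request failed path=%s trace_id=%s", path, trace_id)
    log.info("path=%s status=%d trace_id=%s", path, status, trace_id)
    return status

if __name__ == "__main__":
    handle_request("/ok")
    handle_request("/broken")
    print(dict(metrics), len(spans))
```

Putting the same `trace_id` in the log line and the span is what lets you jump from "error rate went up" to the specific requests that failed.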

As mentioned above, observability is the data collected about a system.

When it comes to "measure everything", I've found services whose clients already grok popular frameworks to be a godsend. We use New Relic, and its ability to automatically instrument all REST APIs and DB transactions is delightful. I could not imagine going back to having to do it manually or guess what information might be useful later.
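
For anyone wondering what "automatically instrument" roughly means: the general idea is wrapping handlers so timing gets recorded without touching their bodies. The sketch below is hand-rolled and is not New Relic's agent or API; `instrumented`, `timings`, and `get_user` are hypothetical names for illustration.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: function name -> list of durations in seconds.
timings = defaultdict(list)

def instrumented(func):
    """Wrap func so every call records its duration, even when it raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append(time.perf_counter() - start)
    return wrapper

@instrumented
def get_user(user_id):  # hypothetical REST handler
    return {"id": user_id}

if __name__ == "__main__":
    get_user(7)
    print({name: len(calls) for name, calls in timings.items()})
```

A real agent hooks the framework for you, so you don't even write the decorator; the point is that the measurement lives outside the business logic.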