by layer8 998 days ago
> Log Levels are meaningless. Is a log line debug, info, warning, error, fatal, or some other shade in between?

I partly agree and disagree. In terms of severity, there are only three levels:

– info: not a problem

– warning: potential problem

– error: actual problem (operational failure)

Other levels like “debug” are not about severity, but about level of detail.

In addition, something that is an error in a subcomponent may only be a warning or even just an info on the level of the superordinate component. Thus the severity has to be interpreted relative to the source component.

This can be an issue if severity is only interpreted globally. Either it will be wrong for the global level, or subcomponents have to know the global context they are running in to use the severity appropriate for that context. That causes undesirable dependencies on a global context: the developer of a lower-level subcomponent would have to know the exact context in which that component is used in order to choose the appropriate log level. And what if the component is used in different contexts entailing different severities?

So one might conclude that the severity indication is useless after all, but IMO one should rather conclude that severity needs to be interpreted relative to the component. This also means that a lower-level error may have to be logged again in the higher-level context if it’s still an error there, so that it doesn’t get ignored if e.g. monitoring only looks at errors on the higher-level context.
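
To make the relative-severity idea concrete, here is a minimal Python sketch using the standard `logging` module. The component names (`app`, `app.cache`) and the helpers `CacheError`, `cache_get` and `load_profile` are made up for illustration; this is one possible shape, not a recommended design.

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(levelname)s %(name)s: %(message)s")

cache_log = logging.getLogger("app.cache")  # hypothetical lower-level component
app_log = logging.getLogger("app")          # hypothetical higher-level component


class CacheError(Exception):
    """Hypothetical failure raised by the lower-level component."""


def cache_get(key):
    # For the cache itself, failing to answer is an actual failure,
    # so it logs an error relative to its own context.
    cache_log.error("lookup failed for %r", key)
    raise CacheError(key)


def load_profile(user_id, fallback=None):
    try:
        return cache_get(user_id)
    except CacheError:
        if fallback is not None:
            # The higher level recovered, so here it is only a warning.
            app_log.warning("cache miss for %r, using fallback", user_id)
            return fallback
        # Still an error at this level: log it again in this context so
        # that monitoring which only watches "app" does not miss it.
        app_log.error("could not load profile for %r", user_id)
        raise
```

The logger name carries the component, so a reader (or a monitoring rule) can interpret each severity relative to where it came from.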

Differences between “fatal” and “error” are really nesting differences between components/contexts. An error is always fatal on the level where it originates.


The OP is wrong, log levels are very valuable if you leverage them.

Here's a classic problem as an illustration: the storage cost of your logs is prohibitive. You would like to drop some of your logs from storage but cannot lower retention below some threshold (say, 2 weeks). For this example, assume that tracing is also enabled and every log line has a traceId.

A good answer is to run a compaction job that inspects each trace. If a trace contains an error, preserve it. Remove X% of all other traces.

Log levels make the ergonomics for this excellent and it can save millions of dollars a year at sufficient scale.
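
A rough Python sketch of such a compaction pass, assuming each log entry is a dict with `traceId` and `level` keys (the record shape, the `compact` name and the `keep_fraction` parameter are assumptions for illustration; `keep_fraction` is 1 minus the X% above):

```python
import random
from collections import defaultdict


def compact(logs, keep_fraction, rng=random):
    """Keep every trace that contains an error; sample the other traces."""
    # Group log entries by trace so the decision is made per trace.
    traces = defaultdict(list)
    for entry in logs:
        traces[entry["traceId"]].append(entry)

    kept = []
    for entries in traces.values():
        has_error = any(e["level"] in ("error", "fatal") for e in entries)
        # Error traces always survive; the rest survive with
        # probability keep_fraction.
        if has_error or rng.random() < keep_fraction:
            kept.extend(entries)
    return kept
```

A real job would stream from storage rather than hold everything in memory, but the decision rule is the same: the level field is what makes "keep the interesting traces" a one-line check.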

> In addition, something that is an error in a subcomponent may only be a warning or even just an info on the level of the superordinate component.

Or, keep it simple.

- error means someone is alerted urgently to look at the problem

- warning means someone should be looking into it eventually, with a view to reclassifying as info/debug or resolving it.

IMO many people don't care much about their logs, until the shit hits the fan. Only then, in production, do they realise just how much harder their overly verbose (or inadequate) logging is making things.

The simple filter of "all errors send an alert" can go a long way to encouraging a bit of ownership and correctness on logging.
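
As a sketch of that filter with Python's standard `logging` module: a handler whose level is set to ERROR only ever sees errors and above. `AlertHandler` and the `orders` logger are hypothetical, and the handler just collects messages in a list where a real one would page someone.

```python
import logging


class AlertHandler(logging.Handler):
    """Hypothetical alerting handler: records alerts instead of paging."""

    def __init__(self):
        # The handler level is the whole filter: only ERROR and above pass.
        super().__init__(level=logging.ERROR)
        self.alerts = []

    def emit(self, record):
        self.alerts.append(self.format(record))


log = logging.getLogger("orders")
log.setLevel(logging.INFO)
alerts = AlertHandler()
log.addHandler(alerts)

log.info("processing order 17")            # not alerted
log.warning("retrying payment for order 17")  # not alerted
log.error("payment failed for order 17")   # alerted
```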

> - error means someone is alerted urgently to look at the problem

The issue is that the code that encounters the problem may not have the knowledge/context to decide whether it warrants alerting. The code higher up that does have the knowledge, on the other hand, often doesn’t have the lower-level information that is useful to have in the log for analyzing the failure. So how do you link the two? When you write modular code that minimizes assumptions about its context, that situation is a common occurrence.

> When you write modular code that minimizes assumptions about its context, that situation is a common occurrence.

So your code isn't modular after all, because the code is _doing_ logging as a side-effect of the actual functionality.

The modularity of your code should mean that the outcome of the functionality is packaged into a bundle of data, and this bundle includes information about errors (or warnings) - aka, a status result.

The caller of this module will inspect this data, and they themselves will decide to log (or, if they are a module of their own, pass the data up again). This goes on, until the data goes into a logging layer - solely responsible for logging perhaps.
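
A small Python sketch of that pattern, under the assumption that the "bundle" is a plain dataclass. `Result`, `parse_port` and `configure` are invented names for illustration only.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("app")


@dataclass
class Result:
    """Status result: a value plus whatever went wrong producing it."""
    value: object = None
    problems: list = field(default_factory=list)  # (severity, message) pairs


def parse_port(text):
    """Module code: does no logging, only reports what it observed."""
    if not text.isdecimal():
        return Result(problems=[("error", f"not a number: {text!r}")])
    port = int(text)
    if port < 1024:
        return Result(port, [("warning", f"privileged port {port}")])
    return Result(port)


def configure(text, default=8080):
    """Caller: decides what the problems mean in its own context."""
    result = parse_port(text)
    for severity, message in result.problems:
        # This caller can fall back to a default, so nothing the module
        # reports is worse than a warning here.
        log.warning("%s (module severity: %s)", message, severity)
    return default if result.value is None else result.value
```

Here the module's "error" is downgraded by a caller that knows it has a fallback; a different caller could just as well log it as an error or pass the `Result` further up.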

Yes, except the problem here is that if the app crashes, you'll lose all the messages in the bundle. That's why people tend to use side-effect logging that persists messages immediately. That, and because it keeps timestamps correct.

I suppose this approach would make most sense in event-driven apps where no particular processing takes any meaningful amount of time, so you're constantly revisiting the top-level loop, where the "logging layer" could live. However, most software isn't written this way.

An app segfaulting before having a chance to log is mostly a thing of the past, unless you are writing c++. Any other language will instead have a top-level exception handler.

If you were to take hard crashes into account, you would even have to log before each operation instead of after, basically reverting to printf-debugging.

> unless you are writing c++

Guilty as charged.

> If you were to take hard crashes into account, you would even have to log before each operation instead of after

Yes, that's exactly what I see done and do for large enough operations (substeps of those operations only log when they're done).
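
For illustration, a minimal Python sketch of the log-before-and-after pattern as a context manager. The `logged_step` name and the `job` logger are made up, and whether the "starting" line actually survives a hard crash depends on the handler flushing promptly, which this sketch does not guarantee.

```python
import logging
from contextlib import contextmanager

log = logging.getLogger("job")


@contextmanager
def logged_step(name):
    # Logged before the work starts, so a crash mid-step still leaves a trace.
    log.info("starting %s", name)
    yield
    # Only reached if the step completed without raising.
    log.info("finished %s", name)
```

Usage would be `with logged_step("reindex"): ...` around a large operation; a "starting" line without a matching "finished" line points at the step that died.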

> basically reverting to printf-debugging

That's what logging is, fundamentally. printf debugging, but with your own printf that has a few more knobs.

If the code detecting the error is a library/subordinate service then the same rule can be followed - should this be immediately brought to a human's attention?

The answer for a library will often be no, since the library doesn't "have the knowledge/context to decide whether it warrants alerting".

So in that case the library can log as info, and leave it to the caller to log as error if warranted (after learning about the error from return code/http status etc.).

When investigating the error, the human has access to the info details from the subordinate service.

I agree with your premise, but do consider debug to be a fourth level.

Info is things like “processing X”

Debug is things like “variable is Y” or “made it to this point”

I tend to think of "warning" as - "something unexpected happened, but it was handled safely"

And then "error" as - "things are not okay, a developer is going to need to intervene"

And errors then split roughly between "must be fixed sometime", and "must be fixed now/ASAP"

> I tend to think of "warning" as - "something unexpected happened, but it was handled safely"

It was handled safely at the level where it occurred, but because it was unusual/unexpected, the underlying cause may lead to issues later on or higher up.

If one were sure it would 100% not indicate any issue, one wouldn’t need to warn about it.

That would indicate an issue, i.e. something we don't want. It's just not something where an engineer needs to go and mop up, and in theory the system would continue to operate correctly indefinitely. I guess "correct" here means safe, but not necessarily the most desirable behavior.