| > Log Levels are meaningless. Is a log line debug, info, warning, error, fatal, or some other shade in between? I partly agree and partly disagree. In terms of severity, there are only three levels: info (not a problem), warning (potential problem), and error (actual problem, i.e. operational failure). Other levels like “debug” are not about severity, but about level of detail. In addition, something that is an error in a subcomponent may only be a warning or even just an info on the level of the superordinate component. Thus the severity has to be interpreted relative to the source component. This becomes an issue if the severity is only interpreted globally: either it will be wrong for the global level, or subcomponents have to know the global context they are running in to use the severity appropriate for that context. The second option causes undesirable dependencies on a global context, meaning the developer of a lower-level subcomponent would have to know the exact context in which that component is used in order to choose the appropriate log level. And what if the component is used in different contexts entailing different severities? So one might conclude that the severity indication is useless after all, but IMO one should rather conclude that severity needs to be interpreted relative to the component. This also means that a lower-level error may have to be logged again in the higher-level context if it’s still an error there, so that it doesn’t get ignored if e.g. monitoring only looks at errors on the higher-level context. Differences between “fatal” and “error” are really nesting differences between components/contexts. An error is always fatal on the level where it originates. |
Here's a classic problem as an illustration: the storage cost of your logs is prohibitive. You would like to cut some of your logs from storage but cannot lower retention below some threshold (say, 2 weeks). For this example, assume that tracing is also enabled and every log has a traceId.
A good answer is to run a compaction job that inspects each trace. If a trace contains an error, preserve it. Remove X% of all other traces.
Log levels make the ergonomics of this excellent, and it can save millions of dollars a year at sufficient scale.
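To make the compaction idea concrete, here is a minimal sketch in Python. It assumes each log record is a dict with a `traceId` and a `level` field; the field name `level`, the set of levels treated as errors, and the `compact` function with its `drop_fraction` parameter (the "X%" above, expressed as a fraction) are all hypothetical names chosen for illustration, not any particular logging system's API.

```python
import random
from collections import defaultdict

# Assumption: these level names mark a trace as worth keeping in full.
ERROR_LEVELS = {"error", "fatal"}


def compact(logs, drop_fraction, rng=None):
    """Keep every trace that contains an error; drop roughly
    `drop_fraction` of all other traces (whole traces, not single lines)."""
    rng = rng or random.Random()

    # Group log records by trace so the decision is made per trace.
    traces = defaultdict(list)
    for record in logs:
        traces[record["traceId"]].append(record)

    kept = []
    for records in traces.values():
        has_error = any(r["level"] in ERROR_LEVELS for r in records)
        # rng.random() is in [0.0, 1.0), so drop_fraction=1.0 drops every
        # error-free trace and drop_fraction=0.0 keeps all of them.
        if has_error or rng.random() >= drop_fraction:
            kept.extend(records)
    return kept


if __name__ == "__main__":
    sample = [
        {"traceId": "a", "level": "info", "message": "request started"},
        {"traceId": "a", "level": "error", "message": "upstream timeout"},
        {"traceId": "b", "level": "info", "message": "request started"},
        {"traceId": "b", "level": "info", "message": "request finished"},
    ]
    for record in compact(sample, drop_fraction=0.9):
        print(record)
```

The decision is made per trace rather than per line, so a preserved trace keeps its info and debug lines as context around the error. A real job would stream from storage instead of holding everything in memory, but the level check stays this simple.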