by Too 173 days ago
This is why it’s almost always wrong for library functions to log anything, even on “errors”. Pass the status up through return values or exceptions. As a library author you have no clue as to how an application might use it. Multithreading, retry loops and expected failures will turn what’s a significant event in one context into what’s not even worthy of a debug log in another. No rule without exceptions of course: one valid case could be, for example, truly slow operations where progress reports are expected. Modern tracing telemetry with sampling can be another solution for the paranoid.
10 comments

Depending on the language and logging framework, debug/trace logging can be acceptable in a library. But you have to be extra careful to make sure that it's ultimately a no-op.

A common problem in Java is someone will drop a log that looks something like this `log.trace("Doing " + foo + " to " + bar);`

The problem is that, especially in a hot loop, that throwaway string concatenation can ultimately become a performance problem, especially if `foo` or `bar` have particularly expensive `toString` methods.

The proper way to do something like this in java is either

    log.trace("Doing {} to {}", foo, bar);  // SLF4J/Log4j 2 style placeholders
or

    if (log.traceEnabled()) {
      log.trace("Doing " + foo + " to " + bar);
    }
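For comparison, here is a minimal sketch of the same trap and both fixes using Python's standard `logging` module, which only formats `%s`-style arguments when the record is actually emitted. `Expensive` is a hypothetical class that exists only to count how often its string conversion runs.

```python
import logging


class Expensive:
    """Hypothetical value whose string conversion we would like to avoid."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive"


log = logging.getLogger("lazy_demo")
log.addHandler(logging.NullHandler())
log.propagate = False
log.setLevel(logging.INFO)  # DEBUG is disabled

foo, bar = Expensive(), Expensive()

# Eager: the f-string is built before log.debug() is even called.
log.debug(f"Doing {foo} to {bar}")
eager_calls = Expensive.calls

# Lazy: the arguments are passed through and never formatted while DEBUG is off.
log.debug("Doing %s to %s", foo, bar)
lazy_calls = Expensive.calls - eager_calls

# Guarded: the whole expression is skipped while DEBUG is off.
if log.isEnabledFor(logging.DEBUG):
    log.debug("Doing " + str(foo) + " to " + str(bar))
guarded_calls = Expensive.calls - eager_calls - lazy_calls
```

Only the eager call pays for the two string conversions; the other two styles leave the counter untouched while the level is disabled.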
Ideally a logging library should at least not make it easy to make that kind of mistake.
This isn't really something the logging library can do. If the language provides a string interpolation mechanism then that mechanism is what the programmers will reach for first. And the library cannot know that interpolation happened because the language creates the final string before passing it in.

If you want the builtin interpolation to become a noop in the face of runtime log disabling then the logging library has to be a builtin too.

I feel like there's a parallel with SQL where you want to discourage manual interpolation. Taking inspiration from it may help: you may not fully solve it but there are some API ideas and patterns.

A logging framework may have the equivalent of prepared statements. You may also nudge usage where the raw string API is `log.traceRaw(String rawMessage)` while the parametrized one has the nicer naming `log.trace(Template t, param1, param2)`.

You can have 0 parameters and the template is a string...
The point of my message is that you should avoid the `log(string)` signature. Even if it's appealing, it's an easy perf trap.

There are many ideas if you look at SQL libs. In my example I used a different type but there are other solutions. Be creative.

    logger.log(new Template("foo"))
    logger.log("foo", [])
    logger.prepare("foo").log()
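One way the "nice name for the parametrized call, clunky name for the raw call" idea could look is sketched below. `Template`, `TinyLogger`, `trace` and `trace_raw` are all hypothetical names made up for illustration, not an existing library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """A message template; placeholders use str.format syntax."""
    text: str


class TinyLogger:
    def __init__(self, enabled=False):
        self.enabled = enabled
        self.lines = []

    def trace(self, template, *params):
        # Nice name: formatting is deferred until we know tracing is on.
        if not isinstance(template, Template):
            raise TypeError("trace() takes a Template; use trace_raw() for plain strings")
        if self.enabled:
            self.lines.append(template.text.format(*params))

    def trace_raw(self, raw_message):
        # Deliberately clunkier name: the caller has already built the string.
        if self.enabled:
            self.lines.append(raw_message)


DOING = Template("Doing {} to {}")

logger = TinyLogger(enabled=False)
logger.trace(DOING, "foo", "bar")  # disabled: nothing is formatted
logger.enabled = True
logger.trace(DOING, "foo", "bar")  # enabled: formatted once, here
```

The type check is only a nudge: nothing stops a caller from wrapping an already-interpolated string in `Template(...)`, which is the same limitation prepared statements have.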
Ideally, but realistically, I have never heard of any major programming language that allows you to express "this function only accepts static constant string literal".
Python has LiteralString for this exact purpose. It's only on the type checker level, but type checking should be part of most modern Python workflows anyway. I've seen DB libraries use this a lot for SQL parameters.

https://typing.python.org/en/latest/spec/literal.html#litera...
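A rough sketch of how that could be applied to a logging call follows. The `trace` function is hypothetical; `LiteralString` comes from `typing` in Python 3.11+ (PEP 675) and is imported only for the type checker here, so the snippet also runs on older interpreters.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed by the type checker; available in typing since Python 3.11.
    from typing import LiteralString

messages = []


def trace(template: "LiteralString", *params: object) -> None:
    """Hypothetical logging call whose template must be a literal string."""
    messages.append(template % params)


foo, bar = "foo", "bar"

# Accepted: the template is a literal, the values travel separately.
trace("Doing %s to %s", foo, bar)

# A checker that implements PEP 675 is expected to reject this call, because
# an f-string is an arbitrary str. At runtime nothing stops it.
trace(f"Doing {foo} to {bar}")
```

Both calls produce the same message at runtime; the difference only shows up when a type checker is part of the workflow.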

Beyond LiteralString there are now also t-strings, introduced in Python 3.14, that ease how one writes templated strings without losing out on security. Java had something similar with the StringTemplate class, a preview feature in Java 21 that has since been withdrawn.
We have this in C++ at Google. It's like securitytypes::StringLiteral. I don't know how it works under the hood, but it indeed only allows string literals.
Even PHP has that these days via static analysis https://phpstan.org/writing-php-code/phpdoc-types#other-adva...
In Rust, this can almost be expressed as `arg: &'static str` to accept a reference to a string whose lifetime never ends. I say “almost” because this allows both string literals and references to static (but dynamically generated) strings.

For Rust’s macros, a literal can be expressed as `$arg:lit`. This does allow other literals as well, such as int or float literals, but typically the generated code would only work for a string literal.

C++20 offers `consteval` to make this clear, but you can do some simple macro wizardry in C++11 to do this:

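    #include <type_traits>  // std::integral_constant

    // Compiles only if (x)[0] is a constant expression, e.g. a string literal.
    #define foo(x) ( \
        (void)std::integral_constant<char, (x)[0]>::value, \
        foo_impl(x) \
    )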
(the re-evaluation of x doesn't matter if it compiles). You can also use a user-defined literal which has a different ergonomic problem.
Not the language, but the linter can do it. IntelliJ inspections warn you if you do it: https://www.jetbrains.com/help/inspectopedia/StringConcatena...
It does seem like something a good static analysis tool should be able to catch, though.
> The problem is, especially in a hot loop ... The proper way to do something like this in java is either log.trace(..., ...) or if (log.traceEnabled()) log.trace(...)

The former still creates strings, for the garbage collector to mop up even when log.traceEnabled() is false, no?

Also, even if the former or latter is implemented as:

    fn trace(log, str, args...) {
        if (!log.tracing) return;
        // ...
    }
Most optimising JIT compilers will code hoist the if-condition when log.tracing is false, anyway.
This is not true. Any modern Java compiler will generate identical bytecode for both. Try it yourself and see! As a programmer you do not need to worry about such details, this is what the compiler is for. Choose whatever style feels best for you.
> Any modern Java compiler will generate identical bytecode for both. Try it yourself and see!

You may be misunderstanding something here.

If you follow the varargs-style recommendation, then concatenation occurs in the log class.

If you follow the guard-style recommendation, then the interpolated expressions will not be evaluated unless the log level matches.

In the naive approach, concatenation always occurs and all expressions which are part of the interpolation will be evaluated no matter the log level.

Could it be that you were thinking about StringBuffer vs. concatenation, an entirely unrelated problem?

I still quite like the Windows log approach, which (if logged) stores the template as just its ID along with the values, e.g. 123, foo, bar, saving lots of storage as well. You can concatenate in the reader.
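A toy sketch of that idea, to make it concrete: the writer stores only a template ID plus the raw values, and the reader does the formatting. The `TEMPLATES` table, the JSON encoding and the function names are assumptions for illustration; this is not how the Windows Event Log is actually laid out.

```python
import json

# Hypothetical template table shared by the writer and the reader.
TEMPLATES = {123: "Doing {} to {}"}


def write_event(sink, template_id, *values):
    """Writer side: store only the template ID and the raw values."""
    sink.append(json.dumps([template_id, *values]))


def render_event(line):
    """Reader side: look the template up and format it on demand."""
    template_id, *values = json.loads(line)
    return TEMPLATES[template_id].format(*values)


sink = []
write_event(sink, 123, "foo", "bar")
rendered = render_event(sink[0])
```

The stored line is `[123, "foo", "bar"]`; the full sentence only exists once somebody reads the log.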
So, it costs perf every time it’s read, instead of when it’s written (once). And of course has a lot of overhead to store metadata. Bad design. As usual.
Most logs are probably never read, but nevertheless should be written (fast) for unexpected situations when you will later need them. And logging has to be fast, with minimal performance overhead.
No, the size is a fraction of a text file, much faster to write and read. The only difference is you can't grep like text.
Except it's always written, but almost never read. Something that is fast/non-resource-intensive to write is definitionally a better design for logging.

What metadata? The raw template? That's data in this case, data for the later rendering of logs. Yes, the template plus the params is going to be slightly bigger than a rendered string, but that's the speed/size tradeoff inherent almost everywhere. It may even keep separate things like the subsystem, event type, log level, etc., which trades off size (again) for speed/ease of filtering. It's all trade-offs, and to blanket declare one method (the Windows method in this case) as just bad design is only displaying your own ignorance, or bias.

How about wrapping the log.trace param in a lambda and monkeypatching log.trace to take a function that returns a string, and of course pushing the conditional to the monkeypatched func?
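A minimal sketch of the callable-based variant, written as a small standalone class rather than an actual monkeypatch. `LazyLogger` and `describe` are hypothetical names; `describe` records each invocation so it is visible when the message really gets built.

```python
class LazyLogger:
    """Hypothetical logger whose trace() takes a zero-argument callable."""

    def __init__(self, enabled=False):
        self.enabled = enabled
        self.lines = []

    def trace(self, make_message):
        # The conditional lives here, so callers cannot forget the guard.
        if self.enabled:
            self.lines.append(make_message())


built = []


def describe(foo, bar):
    built.append((foo, bar))  # records that the message was actually built
    return f"Doing {foo} to {bar}"


log = LazyLogger(enabled=False)
log.trace(lambda: describe("foo", "bar"))
calls_while_disabled = len(built)

log.enabled = True
log.trace(lambda: describe("foo", "bar"))
```

The lambda itself is still created on every call, so this removes the formatting cost but not the cost of building the closure.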
That is why the popular `tracing` crate in Rust uses macros for logging instead of functions. If the log level is too low, it doesn't evaluate the body of the macro.
Does that mean the log level is a compilation parameter? Ideally, log levels shouldn't even be startup parameters; they should be changeable on the fly, at least for any server-side code. Having to restart is bad enough; having to recompile to get debug logs would be an extraordinary nightmare (not only do you need to get your customers to reproduce the issue with debug logs, you actually have to ship them new binaries, which likely implies export controls, security validations, etc.).
I don't know how Rust does it, but my internal C++ framework has a global static array so that we can look up the current log level quickly and change it at runtime as needed. It is very valuable to turn on specific debug logs at times, when someone has a problem and we want to know what some code is doing.
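The same runtime switch can be sketched with Python's standard `logging` module, where each named logger carries a level that can be changed while the process is running. The logger name `myapp.storage` and the `ListHandler` class are made up for this example.

```python
import logging

records = []


class ListHandler(logging.Handler):
    """Collects formatted messages in a list so the effect is easy to see."""

    def emit(self, record):
        records.append(record.getMessage())


log = logging.getLogger("myapp.storage")  # hypothetical subsystem name
log.addHandler(ListHandler())
log.propagate = False
log.setLevel(logging.WARNING)

log.debug("cache miss for %s", "key-1")  # dropped: DEBUG is below WARNING

log.setLevel(logging.DEBUG)  # flipped while the process keeps running
log.debug("cache miss for %s", "key-2")  # recorded
```

In a server, the `setLevel` call would typically be triggered by an admin endpoint or a signal handler rather than inline code.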
I know this is standard practice, but I personally think it's more professional to attach a gdb-like debugger to a process instead of depending on coded log statements.
A very common thing that will happen in professional environments is that you ship software to your customers, and they will occasionally complain that in certain situations (often ones they don't fully understand) the software misbehaves. You can't attach a debugger to your customer's setup that had a problem over the weekend and got restarted: the only solution to debug such issues is to have had programmed logs set up ahead of time.
In my professional life, somewhere over 99% of time, the code suffering the error has either been:

1. Production code running somewhere on a cluster.

2. Released code running somewhere on an end-user's machine.

3. Released production code running somewhere on an end-user's cluster.

And errors happen at weird times, like 3am on a Sunday morning on someone else's cluster. So I'd just as soon not have to wake up, figuring out all the paperwork to get access to some other company's cluster, and then figure out how to attach a debugger. Especially when the error is some non-reproducible corner case in a distributed algorithm that happens once every few months, and the failing process is long gone. Just no.

It is so much easier to ask the user to turn up logging and send me the logs. Nine times out of ten, this will fix the problems. The tenth time, I add more logs and ask the user to keep an eye open.

The idea in Java is to let the JIT optimise away the logging code.

This is more flexible as it still allows runtime configuration of the logging level.

The OP is simply pointing out that some programmers are incompetent and call the trace function incorrectly.

Then you still have the overhead of the log.trace function call and the lambda construction (which is not cheap because it has closure over the params being logged and is passed as a param to a function call, so probably gets allocated on the heap).
>Then you still have the overhead of the log.trace function call

That's not an overhead at all. Even if it were, it's not comparable to string concatenation.

Regarding overhead of lambda and copying params. Depends on the language, but usually strings are pass by ref and pass by values are just 1 word long, so we are talking one cycle per variable and 8 bytes of memory. Which were already paid anyways.

That said, logging functions that just take a list of vars are even better, like Python's print():

    def printtrace(*args):
        # Only print when the global trace flag is set.
        if trace:
            print(*args)

    printtrace("var x and y", x, y)

Python gets a lot of flak for being a slow language, but you get so much expressiveness that you can invest in optimization after paying a flat cycle cost.

That’s what most languages, including Java, do.

The problem the OP is pointing out is that some programmers are incompetent and do string concatenation anyway. A mistake which if anything is even easier in Python thanks to string interpolation.

What you are proposing sounds like a nightmare to debug. The high level perspective of the operation is of course valuable for determining if an investigation is necessary, but the low level perspective in the library code is almost always where the relevant details are hiding. Not logging these details means you are in the dark about anything your abstractions are hiding from higher level code (which is usually a lot)
Those details don't belong in the error log level, that's what info or trace is for.
They were replying to a person who says “it’s almost always wrong for library functions to log anything”. Not just errors.
If it’s not your code how is a log useful vs returning an error?

Even a relatively complex operation like, say, converting a document into a PDF basically only has two useful states: either it worked, or something specific failed, at which point just tell me that thing.

Now independent software like web servers or databases can have useful logs because they have completely independent interfaces with the outside world. But I call libraries; they don’t call me.

That’s a very simple operation. Try “take these 100 user generated pdfs and translate all of them”. Oh, “cannot parse unexpected character 0x001?” Cool beans, I wish I knew more.
That’s ok, I’ll just check the log. 50MB of ‘This is my happy place.’ followed by a one-liner: “cannot parse unexpected character 0x001?”

Any library can do a bad job here, that doesn’t come down to logging vs error messages.

Trace can become so voluminous that it is switched on only on an as-needed basis, which can be too late for rare events. Also, the trace level, being more of an as-needed debugging tool, tends to be less scrutinized for exposing sensitive data, making it unsuitable for continuous operation or use in live production.
Simple: include those relevant details in the exceptions instead of hiding them.
At the extreme end: If my Javascript frontend is being told about a database configuration error happening in the backend when a call with specific parameters is made - that is a SERIOUS security problem.

Errors are massaged for the reader - a database access library will know that a DNS error occurred and that is (the first step for debugging) why it cannot connect to the specified datastore. The service layer caller does not need to know that there is a DNS error, it just needs to know that the specified datastore is uncontactable (and then it can move on to the appropriate resilience strategy: retry that same datastore, fall back to a different datastore, or tell the API that it cannot complete the call at all).

The caller can then decide what to do (typically say "Well, I tried, but nothing's happening, have yourself a merry 500").

It makes no sense for the Service level to know the details of why the database access layer could not connect, no more than it makes any sense for the database access layer to know why there is a DNS configuration error - the database access just needs to log the reasons (for humans to investigate), and tell the caller (the service layer) that it could not do the task it was asked to do.

If the service layer is told that the database access layer encountered a DNS problem, what is it going to do?

Nothing, the best it can do is log (tell the humans monitoring it) that a DB access call (to a specific DB service layer) failed, and try something else, which is a generic strategy, one that applies to a host of errors that the database call could return.
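As a rough sketch of that split: the access layer logs the detailed cause for humans and raises a generic error, and the service layer applies one generic strategy without inspecting the cause. Every name here (`DatastoreUnavailable`, `resolve`, `connect`, `handle_request`, the host names) is hypothetical, and `resolve` is a stand-in that always fails.

```python
import logging

log = logging.getLogger("dbaccess")  # hypothetical logger name
log.addHandler(logging.NullHandler())


class DatastoreUnavailable(Exception):
    """What the service layer is told: which datastore, not why."""


def resolve(host):
    # Stand-in for a real DNS lookup that fails.
    raise OSError(f"DNS lookup failed for {host}")


def connect(datastore):
    try:
        resolve(datastore)
    except OSError as exc:
        log.error("cannot connect to %s: %s", datastore, exc)  # for humans
        raise DatastoreUnavailable(datastore) from exc  # for the caller


def handle_request(datastores):
    """Service layer: try each datastore, then give up with a 500."""
    for name in datastores:
        try:
            connect(name)
            return 200
        except DatastoreUnavailable:
            continue
    return 500


status = handle_request(["primary.db.example", "fallback.db.example"])
```

The `from exc` keeps the DNS detail attached to the exception for anyone who does want it, without making the service layer depend on it.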

That’s how we get errors like “file not found”, without a file name. A pain for mankind.
> At the extreme end: If my Javascript frontend is being told about a database configuration error happening in the backend when a call with specific parameters is made - that is a SERIOUS security problem.

I'll accept that it is a security problem; why would it be a serious security problem? Any error that the client knows about the configuration is unlikely to be one that is exploitable anyway, and if it is (for example, the client gets told "could not connect to 192.168.1.139:5432"), then you have bigger problems than sending error messages to clients.

What sort of example did you have in mind that makes this a serious security problem?

2. Verbose Error Messages: When Your Application Talks Too Much

Verbose error messages represent another common misconfiguration that gifts critical information to attackers. When applications encounter errors, they often generate detailed messages intended for developers. In production environments, these messages can reveal:

Technical infrastructure details: Database types, versions, server configurations

File paths and directory structures: Enabling directory traversal attacks

Programming logic: Including code snippets that expose application behavior

Sensitive credentials: Database connection strings, usernames, passwords

Software versions: Allowing attackers to identify known vulnerabilities

The impact of this vulnerability is significant. Error messages can expose not just that a system runs PHP, but that it runs a specific, unsupported version, providing attackers with a clear exploitation path.

Security researchers have documented numerous instances where verbose error messages enabled breaches:

Dating App Vulnerability (2016): Tinder’s login system displayed error messages indicating whether specific email addresses were registered, enabling brute-force attacks to identify valid accounts.

Password Manager Leak (2019): A popular password manager’s login form disclosed through error messages whether email addresses were registered with the service, facilitating targeted attacks.

Government Agency Breach (2020): A major US government agency’s website displayed error messages revealing whether specific usernames existed in the system, enabling attackers to enumerate valid accounts.

[1] https://medium.com/@instatunnel/security-misconfiguration-th...

First, I disagree that "user emails can be brute-forced" is a serious security issue.

I mean, sure, it's a security issue, but on a scale of 1-10, with 1 being "security issue, we'll fix in next point release" and 10 being "All-hands until this emergency patch goes out, and we keep the system offline while fixing it", this is definitely a 1.

Secondly, this barely counts as a security issue; some systems I worked on recently required error messages to tell the user how to fix the error they got. You don't simply say (for example) "attachment not found", you say "Field $FIELD is empty. This is a mandatory field" or similar.

There are still plenty of secure systems out there that will direct the user to create an account if an unregistered user attempts to log in.

It's a trade-off in usability: some places go the "Authentication failed (but we won't tell you why)" route, and others go the "Click here to sign up" route.

It’s not that simple. First, this results in exception messages that are a concatenation of multiple levels of error escalation. These become difficult to read and have to be broken up again in reverse order.

Second, it can lose information about at what exact time and in what exact order things happened. For example, cleanup operations during stack unwinding can also produce log messages, and then it’s not clear anymore that the original error happened before those.

Even when you include a timestamp at each level, that’s often not sufficient to establish a unique ordering, unless you add some sort of unique counter.
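One way to get such a counter, sketched with Python's standard `logging` module: a filter stamps every record with a process-wide sequence number before it reaches any handler. `SequenceFilter`, `ListHandler` and the `seq` attribute are names invented for this example.

```python
import itertools
import logging
import threading


class SequenceFilter(logging.Filter):
    """Stamps each record with a process-wide, strictly increasing number."""

    def __init__(self):
        super().__init__()
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def filter(self, record):
        with self._lock:
            record.seq = next(self._counter)
        return True  # never drops a record, only annotates it


seen = []


class ListHandler(logging.Handler):
    def emit(self, record):
        seen.append((record.seq, record.getMessage()))


log = logging.getLogger("ordered_demo")
log.addFilter(SequenceFilter())
log.addHandler(ListHandler())
log.propagate = False

log.error("original failure")
log.error("cleanup during unwinding also failed")
```

This orders records by when they were logged, which still is not the same as when the underlying event happened if an error is only logged after it has been escalated.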

It gets even more complicated when exceptions are escalated across thread boundaries.

> First, this results in exception messages that are a concatenation of multiple levels of error escalation. These become difficult to read and have to be broken up again in reverse order

Personally I don't mind it... the whole "$outer: $inner" convention naturally lends itself to messages that still parse in my brain and actually include the details in a pretty natural way. Something like:

"Error starting up: Could not connect to database: Could not read database configuration: Could not open config file: Permission denied"

Tells me the config file for the database has broken permissions. Because the permission denied error caused a failure opening the config file, which caused a failure reading the database configuration, which caused a failure connecting to the database, which caused an error starting up. It's deterministic in that for "$outer: $inner", $inner always caused $outer.
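A small sketch of how such a message can fall out of ordinary exception chaining in Python, by walking `__cause__` from the outermost exception inward. The functions `open_config`, `read_db_config`, `connect` and the helper `describe` are hypothetical.

```python
def describe(error):
    """Join an exception and its chain of causes as 'outer: inner: ...'."""
    parts = []
    while error is not None:
        parts.append(str(error))
        error = error.__cause__
    return ": ".join(parts)


def open_config():
    raise PermissionError("Permission denied")


def read_db_config():
    try:
        open_config()
    except OSError as exc:
        raise RuntimeError("Could not open config file") from exc


def connect():
    try:
        read_db_config()
    except RuntimeError as exc:
        raise RuntimeError("Could not read database configuration") from exc


try:
    connect()
except RuntimeError as exc:
    message = describe(exc)
```

Each layer only adds what it knows, and the "inner caused outer" ordering is preserved by construction.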

Maybe it's just experience though, in a sense that it takes a lot of time and familiarity for someone to actually prefer the above. Non-technical people probably hate such messages and I don't necessarily blame them.

Sometimes you don’t have all the relevant details in scope at the point of error. For instance some recoverable thing might have happened first which exercises a backup path with slightly different data. This is not exception worthy and execution continues. Then maybe some piece of data in this backup path interacts poorly with some other backend causing an error. The exception won’t tell you how you got there, only where you got stuck. Logging can tell you the steps that led up to that, which is useful. Of course you need a way to deal with verbose logs effectively, but such systems aren’t exactly rare these days.
> Then maybe some piece of data in this backup path interacts poorly with some other backend causing an error. The exception won’t tell you how you got there, only where you got stuck.

Then catch the exception on the backup path and wrap it in a custom exception that conveys to the handler the fact that you were on the backup path. Then throw the new exception.
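A sketch of that suggestion, with made-up names throughout (`BackupPathError`, `primary`, `backup`, `process`): the recoverable failure on the primary path is remembered, and if the backup path then fails, the wrapper exception carries both facts.

```python
class BackupPathError(Exception):
    """Hypothetical wrapper recording that the failure happened on the backup path."""

    def __init__(self, reason, payload):
        super().__init__(f"backup path failed ({reason}) for payload {payload!r}")
        self.reason = reason
        self.payload = payload


def primary(payload):
    raise TimeoutError("primary backend timed out")  # recoverable


def backup(payload):
    raise ValueError("unexpected character 0x01")  # the real failure


def process(payload):
    try:
        return primary(payload)
    except TimeoutError as primary_exc:
        reason = str(primary_exc)  # remember why we left the primary path
    try:
        return backup(payload)
    except ValueError as exc:
        raise BackupPathError(reason, payload) from exc


try:
    process("doc-42")
except BackupPathError as exc:
    caught = exc
```

This only works when the code taking the backup path knows it is doing so; context that was produced several calls earlier still has to be threaded through by hand.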

Not all problems cause exceptions.
That's a matter of good taste, but there's nothing preventing you from throwing exceptions on every issue and requiring consumers to handle them.
Imagine you have a caching library that handles DB fallback. A cache that should be there but goes missing is arguably an issue.

Should it throw an exception for that to let you know, or should it gracefully fall back so your service stays alive? The middle ground is leaving a log and chugging along; your proposition throws that out of the window.
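The "leave a log and chug along" middle ground might look roughly like this. The logger name `cachelib`, the `get` helper and `load_from_db` are hypothetical stand-ins, and a plain dict plays the cache.

```python
import logging

warnings_seen = []


class ListHandler(logging.Handler):
    def emit(self, record):
        warnings_seen.append(record.getMessage())


log = logging.getLogger("cachelib")  # hypothetical library logger
log.addHandler(ListHandler())
log.propagate = False


def load_from_db(key):
    return f"value-for-{key}"  # stand-in for the slow, authoritative source


def get(cache, key):
    """Return the cached value, falling back to the DB if the entry is missing."""
    try:
        return cache[key]
    except KeyError:
        log.warning("cache entry %r missing, falling back to DB", key)
        value = load_from_db(key)
        cache[key] = value
        return value


cache = {}
first = get(cache, "user:1")  # miss: warns and falls back
second = get(cache, "user:1")  # hit: silent
```

The caller gets a value either way; the warning is the only trace that the slow path was taken.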

You can log your IO and as long as your functions are idempotent that should be enough info to replicate.
Assuming everything is idempotent is a tall order.

There are a lot of libraries that have non-idempotent actions. There are a lot of inputs that can be problematic to log, too.

Say like opening a file?

I guess in those cases standard practice is for lib to return a detailed error yeah.

As far as traces go, trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed?

From my experience working on B2B applications, I am happy that everything is generally spammed to the logs because there would simply be no other reasonable way to diagnose many problems.

It is very, very common that the code that you have written isn't even the code that executes. It gets modified by enterprise anti virus or "endpoint security". All too often I see "File.Open" calls report that the caller has access, when what's actually happened is that AV has intercepted the call, blocked it improperly, and returned a 0-byte file that exists (even though there is actually a larger file there) instead of saying the file cannot be opened.

I will never, in a million years, be granted access to attach a debugger to such a client computer. In fact, they will not even initially disclose that they are using anti virus. They will just say the machine is set up per company policy and that your software doesn't work, fix it. The assumption is always that your software is to blame and they give you nearly nothing, except for the logs.

The only way I ever get this solved in a reasonable amount of time is by looking at verbose logs, determining that the scenario they have described is impossible, explaining which series of log messages is not able to occur, yet occurred on their system, and ask them to investigate further. Usually this ends up being closed with a resolution like "Checked SuperProtectPro360 logs and found it was writing infernal error logs at the same time as using the software. Adjusted the monitoring settings and problem is now resolved."

I don’t really understand what you mean about opening files. Is this just an example of an idempotent action or is there some specific significance here?

Either way logging the input (file name) is notably not sufficient for debugging if the file can change between invocations. The action can be idempotent and still be affected by other changes in the system.

> trying to solve issues that depend on external systems is indeed a tall order for your code. Isn't it beyond the scope of the thing being programmed.

If my program is broken I need it fixed regardless of why it’s broken. The specific example here of a file changing is likely to manifest as flakiness that’s impossible to diagnose without detailed logs from within the library.

I was just trying to think of an example of a non idempotent function. As in it depends on an external IO device.

I will say that error handling and logging in general is one of my weak points, but I made a comment about my approach so far being dbg/pdb based, attaching a debugger and creating breakpoints and prints ad-hoc rather than writing them in code. I'm sure there's reasons why it isn't used as much and logging in code is so much more common, but I have faith that it's a path worth specializing in.

Back to the file reading example, for a non-idempotent function. Considering we are using an encapsulating approach we have to split ourselves into 3 roles. We can be the IO library writer, we can be the calling code writer, and we can be an admin responsible for the whole product. I think a common trap engineers fall for is trying to keep all of the "global" context (or as much as they can handle) at all times.

In this case, of course, we wouldn't be writing the non-idempotent library, so that's not a hat we wear. We do not quite care about the innards of the function and its state; rather, we have a well-defined set of errors that are part of the interface of the function (EINVAL, EACCES, EEXIST).

In this sense we respect the encapsulation boundaries and are provided the information necessary by the library. If we ever need to dive into the actual library code, first the encapsulation is broken and we are dealing with a leaky abstraction, second we just dive into the library code, (or the filesystem admin logs themselves).

It's not precisely the type of responsibility that can be handled at design time and in code anyways, when we code we are wearing the calling-module programmer hat. We cannot think of everything that the sysadmin might need at the time of experiencing an error, we have to think that they will be sufficiently armed with enough tools to gather the information necessary with other tools. And thank god for that! checking /proc/fs and looking at crash dumps, and attaching processes with dbg will yield far better info than relying on whatever print statements you somehow added to your program.

Anyways at least that's my take on the specific example of glibc-like implementations of POSIX file operations like open(). I'm sure the implications may change for other non-idempotent functions, but at some point, talking about specifics is a bit more productive than talking in the abstract.

I very much appreciate libraries that provide optional logging. Tracing error causes in network protocol calls can be pretty near impossible without throwing a library/package/crate/whatever into TRACE mode.

Of course they shouldn't just be dumping text to stdout/stderr, but as long as the library logging is optional (or only logs when the library has reached some kind of unrecoverable state with instructions to file a bug report), logging is often the right call.

It's easier to have logs and turn them off at compile time/runtime than to not have logs and need them once deployed.

I think an example where libraries could sensibly log an error is if you have a condition which is recoverable but may cause a significant slowdown, including a potential DoS issue, and the application owner can remediate.

You don't want to throw because destroying someone's production isn't worth it. You don't want to silently continue in that state because realistically there's no way for the application owner to understand what is happening and why.

We call those warnings, and it's very common to downgrade errors to warnings by wrapping an exception and printing the trace as you would an exception.
Logging warnings is cowardly: you just push the decision to the log consumer to decide if the error should be acted on.

Warnings are just errors that no one wants to deal with.

Warnings are for where you expect someplace else to know/log if it really is an error, but it might also be normal. You might log why a file IO operation failed: if the caller recovers somehow it isn't an error, but if they can't, they log an error, and when investigating, the warning gives the detail you need to figure it out.
Who proactively investigates warnings?
Statistics are sometimes run and the most common ones investigated (normally to shut up the noise).

Mostly, though, when you are working on a known problem, warnings should be a useful filter to find where in the logs the problem might have started; then you use that timestamp to find info logs in the same area.

Warning logs are usually polluted with stuff nobody wants to fix but tries to wash their hands of with a log, like deprecated calls or error logs that got demoted because they didn't matter in practice.

Anything that has a measurable impact on production should be logged above that, except if your system ignores log levels in the first place, but that's another can of worms.

In such scenarios it makes sense to give clients an opportunity to react to such conditions programmatically, so just logging is the wrong choice, and if there’s a callback to the client, the client can decide whether to log it and how.
It's a nice idea but I've literally never seen it done, so I would be interested if you have examples of major libraries that do this. Abstractly it doesn't really seem to work to me in place of simple logs.

One test case here is that your library has existed for a decade and was fast, but Java removed a method that let you make it fast, but you can still run slow without that API. The Java runtime has a flag that the end user can enable to turn it back on as a stopgap. How do you expect this to work in your model? Do you expect to have an onUnnecessarilySlow() callback already set up that all of your users have hooked up, which is never invoked for a decade, and then once it actually happens you start calling it and expect it to do something at all sane in those systems?

The second example is all of the scenarios where you're a transitively used library for many users: any callback strategy immediately stops working if the person who needs to know about the situation and could take action is the application owner rather than the people writing the library code which called you. It would require every library to offer these same callbacks and transitively propagate things, which would only work if it were a firm idiomatic pattern in some language ecosystem, and I don't believe that it is in any language ecosystem.

> library has existed for a decade

>but Java removed a method that let you make it fast, but you can still run slow without that API

I’d like to see an example of that, because this is an extremely hypothetical scenario. I don’t think any library is so advanced as to anticipate such scenarios and write something to log. And of course Java specifically has a longer cycle of deprecation and removal. :)

As for your second example, let’s say library A is smart and can detect certain issues. Library B depending on it is at a higher abstraction level, so it has enough business context to react to them. I don’t think it’s necessary to propagate the problem and leak implementation details in this scenario.

Protobuf is the example I had in mind. It uses sun.misc.Unsafe which is being removed in upcoming Java releases, but it has a slow fallback path. It logs a warning when it runs if it can tell it's only using the fallback path but the fast path is still available if the application owner set a flag to turn it back on if they want to:

https://github.com/protocolbuffers/protobuf/issues/20760

Java Protobuf also now logs a warning if it can tell you are using gencode old enough to be covered by a DoS CVE. They actually did a release that broke compatibility with the CVE-covered gencode, but a newer release restored it and prints a warning instead.
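For illustration only, here is a rough Python sketch of the general "slow fallback plus one-time warning" pattern being described. It is not Protobuf's code: the `fastcodec` logger name, the `FASTCODEC_ENABLE_FAST_PATH` environment variable, and both encoder functions are hypothetical stand-ins, and the environment variable merely plays the role of a runtime flag that only the application owner can set.

```python
import logging
import os

# Library logger; "fastcodec" is a hypothetical package name.
log = logging.getLogger("fastcodec")

# Hypothetical opt-in switch, standing in for a runtime flag that only
# the application owner can set.
FAST_PATH_FLAG = "FASTCODEC_ENABLE_FAST_PATH"

_warned = False  # so the warning fires once per process, not once per call


def _encode_fast(data: bytes) -> bytes:
    # Stand-in for the fast implementation.
    return data.hex().encode("ascii")


def _encode_slow(data: bytes) -> bytes:
    # Stand-in for the slow fallback: same output, more work per byte.
    return b"".join(b"%02x" % byte for byte in data)


def encode(data: bytes) -> bytes:
    """Hex-encode data, preferring the fast path when it is enabled."""
    global _warned
    if os.environ.get(FAST_PATH_FLAG) == "1":
        return _encode_fast(data)
    if not _warned:
        log.warning(
            "fast path disabled; using slow fallback. Set %s=1 to re-enable it.",
            FAST_PATH_FLAG,
        )
        _warned = True
    return _encode_slow(data)
```

The point of the sketch is that the message reaches whoever owns the process, without any intermediate library having to forward a callback.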

What’s stopping you from using the replacements provided in VarHandle and MemorySegment? Just wanting to support the 10 year old JDK 8?
I’ve written code that followed this model, but it almost always just maps to logging anyway, and the rest of the time it’s narrow options presented in the callback. e.g. Retry vs wait vs abort.

It’s very rarely realistic that a client would code up meaningful paths for every possible failure mode in a library. These callbacks are usually reserved for expected conditions.

> almost always just maps to logging anyway

Yes, that’s the point. You log it until you encounter it for the first time, then you know more and can do something meaningful. E.g. let’s say you build an API client and the library offers a callback for HTTP 429. You don’t expect it to happen, so you just log the errors in a generic handler in client code, but then after some business logic change you hit 429 for the first time. If the library offers you control over what happens next, you can decide how exactly you will retry and what happens to your state between attempts. If the library just logs and starts a retry cycle, you may get a performance hit that will be harder to fix.
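A minimal Python sketch of what such a hook could look like, assuming a made-up `ApiClient` whose transport is an injected `send` function returning an HTTP status code. Every name here (`Action`, `RateLimited`, `on_rate_limited`, `RateLimitError`) is hypothetical and not taken from any real library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    RETRY = "retry"
    ABORT = "abort"


@dataclass
class RateLimited:
    path: str
    attempt: int  # 1-based number of the attempt that received the 429


class RateLimitError(Exception):
    pass


Policy = Callable[[RateLimited], Action]


def abort_on_rate_limit(event: RateLimited) -> Action:
    # Default policy: no hidden retry loop, the caller sees the failure.
    return Action.ABORT


class ApiClient:
    def __init__(self, send: Callable[[str], int],
                 on_rate_limited: Policy = abort_on_rate_limit) -> None:
        self._send = send  # injected transport, returns an HTTP status code
        self._on_rate_limited = on_rate_limited

    def get(self, path: str) -> int:
        attempt = 0
        while True:
            attempt += 1
            status = self._send(path)
            if status != 429:
                return status
            # The library neither logs nor sleeps here; the caller's policy
            # decides whether, and after doing what, to try again.
            event = RateLimited(path=path, attempt=attempt)
            if self._on_rate_limited(event) is Action.ABORT:
                raise RateLimitError(
                    f"rate limited on {path} after {attempt} attempt(s)"
                )
```

A client that has never seen a 429 can pass a policy that only logs and returns `Action.ABORT`, then swap in one that backs off, persists state, and returns `Action.RETRY` once the situation actually shows up.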

Defining a callback for every situation where a library might encounter an unexpected condition and pointing them all at the logs seems like a massive waste of time.

I would much prefer a library have sane defaults, reasonable logging, and a way for me to plug in callbacks where needed. Writing On429 and a hundred other functions that just point to Logger.Log is not a good use of time.

This sub-thread in my understanding is about a special case (a non-error mode that client may want to avoid, in which case explicit callback makes sense), not about all possible unexpected errors. I’m not suggesting hooks as the best approach. And of course “on429” is the last thing I would think about when designing this. There are better ways.
This seems like such an obvious answer to the problem, your program isn't truly modularized if logging is global. If an error is unexpected it should bubble all the way up, but if it's expected and dealt with, the error message should be suppressed or its type changed to a warning.
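As one possible way to do the "change its type to a warning" part, here is a small sketch using a filter from Python's standard `logging` module. The `DowngradeErrors` class, the `expect_failures_from` helper, and the `flakylib` logger name used in the note below are all made up for illustration.

```python
import logging


class DowngradeErrors(logging.Filter):
    """Relabel ERROR records as WARNING and let everything else through."""

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno == logging.ERROR:
            record.levelno = logging.WARNING
            record.levelname = logging.getLevelName(logging.WARNING)
        return True  # never drop the record, only relabel it


def expect_failures_from(logger_name: str) -> None:
    """Application side: treat this library's errors as expected and handled."""
    logging.getLogger(logger_name).addFilter(DowngradeErrors())
```

Calling `expect_failures_from("flakylib")` once at startup would be enough for that logger. One caveat: a filter attached to a logger only sees records logged directly on that logger, not on its children, so a library with many sub-loggers would need the filter on a handler instead.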
I’ve worked on systems with “modularized” logging. It’s never been pleasant because investigations involve stitching together a bunch of different log sources to understand what actually happened. A global log dump with attribution (module/component/file/line) is far easier to work with.
You need a tuple: (context, level)

The application owner should be able to adjust the contexts up or down. This is the point of ownership and where responsibility over which logs matter is handled.

A library author might have ideas and provide useful suggestions, but it's ultimately the application owner who decides. Some libraries have huge blast radius and their `error` might be your `error` too. In other contexts, it could just be a warning. Library authors should make a reasonable guess about who their customer is and try to provide semantic, granular, and controllable failure behavior.

As an example, Rust's logging ecosystem provides nice facilities for fine-grained tamping down of errors by crate (library) or module name. Other languages and logging libraries let you do this as well.

That capability just isn't adopted everywhere.

Python's built-in logging is the same if used correctly, where the library gets a logger based on its module name (this part isn't enforced) and the application can add a handler to that logger to route the logs differently if needed.
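A short sketch of that arrangement with the standard `logging` module. The `acme.http` logger name, the `LEVELS` table, and the `apply_levels` / `route` helpers are hypothetical; only the `logging` calls themselves are real.

```python
import logging
from typing import Mapping

# Library side: a logger named after the module. "acme.http" is a
# hypothetical name; in real code this is usually __name__.
log = logging.getLogger("acme.http")
log.addHandler(logging.NullHandler())  # the library installs no real output


def fetch() -> None:
    log.debug("opening connection")
    log.error("request failed")


# Application side: (context, level) pairs chosen by the application owner.
LEVELS = {
    "acme": logging.WARNING,        # default for everything under acme.*
    "acme.http": logging.CRITICAL,  # this library's errors are expected here
}


def apply_levels(levels: Mapping[str, int]) -> None:
    for name, level in levels.items():
        logging.getLogger(name).setLevel(level)


def route(name: str, handler: logging.Handler) -> None:
    """Send one library's records to a dedicated handler only."""
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.propagate = False  # keep them out of the root handlers
```

Because logger names are hierarchical, a level set on `acme` applies to every `acme.*` logger that has no level of its own, which is what makes the (context, level) tuple workable without touching library code.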
On paper, USDT probes are the best way for libraries (and binaries) to provide information for debugging, because they can be used programmatically and have no performance overhead until they are measured, but unfortunately they are not widely used.
Conflicting goals for the predominant libraries is what causes this. Log4J2 has a rewrite appender that solves the problem. But if you want zero-copy etc I don’t think there’s such a solution.
It may be unwise to log errors at low layers, but logging informational and debug messages is useful (at least when the caller enables them).
Wonder if someone used effect handlers for error logging. Sounds like a natural and modular way of handling this problem.