by kiitos 1035 days ago
Every error should be annotated at the call site. fmt.Errorf("...: %w", err) isn't litter; it should be a basic expectation of any code that passes code review.

> Errors are akin to exceptions in C++/Java: no happy path should rely on errors for control flow (except io.EOF, but that won't generate a call stack). They should be rare enough that any cost below about 1ms and 10k is negligible.

This may be true in C++ or Java, but in Go, it is absolutely not the case.

Errors are essential to, and actually the primary driver of, control flow!

Any method or function which is not guaranteed to succeed by the language specification should, generally, return an error. Code which calls such a method or function must always receive and evaluate the returned error.

Happy paths always involve the evaluation and processing of errors received from called methods/functions! Errors are normal, not exceptional.

(Understanding errors as normal rather than exceptional is one of the major things that distinguish junior vs. senior engineers.)

2 comments

I think there's less daylight between us than it seems.

> Errors are normal, not exceptional.

The _handling_ of errors is normal. Code that doesn't consider errors is not production code.

And granted, in Go, control flow is driven by errors more often than in C++ or Java. Sentinel error values are common. See for example all usage of errors.Is, checking for io.EOF, packages that define ErrSituationA and ErrSituationB, etc.

But my argument was about errors that can't be dealt with locally, where the origination and ultimate handling are very far apart. A given flow will encounter these errors relatively rarely compared to the happy path (and if it's not rare, you probably need to fix or change something). Having an intuition about this is important for predicting your code's performance. For example:

- The SQL call failed because the network connection dropped; client gets 500 or 502, or retry.

- A call to an external service failed because the network was bad; it gets retried.

- The SQL call succeeded, but the record the client asked for wasn't found, so the client gets a 404.

- Writing to a temporary file failed because the disk is full, so some batch job fails with an error.

Apart from potential concerns about DoS, worrying too much about the performance of error handling in these relatively rare cases is absolutely premature optimization.

DoS isn't even a concern. I just benchmarked capturing a call stack in Go, and it's on the order of a few microseconds. Unless you're in performance critical code (and you're benchmarking, right?), it's fine.

> But my argument was about errors that can't be dealt with locally, where the origination and ultimate handling are very far apart. A given flow will encounter these errors relatively rarely compared to the happy path (and if it's not rare, you probably need to fix or change something). Having an intuition about this is important for predicting your code's performance.

When code encounters an error, it can either deal with that error programmatically, or return that error to its caller. I don't think you can make any generalized assertions about whether one or the other of these cases is more common, and I'm confident that you can't assert that one or the other of these cases is better or worse than the other, or that one of them represents a problem worth fixing.

Errors potentially occur at every fallible expression. Where an error is handled is orthogonal, and generally unknowable, to the given bit of code that receives that error.

I agree with you that "the performance of error handling" should never be a first-order concern when writing code.

I don't agree with you that capturing a call stack is fast enough to ignore. Calling runtime.Callers (https://pkg.go.dev/runtime#Callers) takes time proportional to the size of the pc []uintptr slice, and can easily get to O(ms) or beyond. It's fine if a given bit of code opts in to this cost, but it's not something that you should do by default; the threshold for performance critical code is O(ns), not O(us).
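
For anyone who wants to check this on their own machine rather than take either number on faith, here is a crude timing sketch. It is not a rigorous benchmark: captureAtDepth and measure are hypothetical helpers, the depths and the 64-entry buffer are arbitrary choices, and results will vary with hardware, Go version, and real stack shape.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// captureAtDepth recurses depth times, then captures up to len(pcs) frames.
// It returns the number of entries written to pcs.
func captureAtDepth(depth int, pcs []uintptr) int {
	if depth == 0 {
		return runtime.Callers(0, pcs)
	}
	n := captureAtDepth(depth-1, pcs)
	return n
}

// measure returns the average wall-clock time of one capture at the given depth.
func measure(depth, iterations int) time.Duration {
	pcs := make([]uintptr, 64) // arbitrary buffer size for this sketch
	start := time.Now()
	for i := 0; i < iterations; i++ {
		captureAtDepth(depth, pcs)
	}
	return time.Since(start) / time.Duration(iterations)
}

func main() {
	for _, depth := range []int{1, 10, 50} {
		fmt.Printf("depth %d: ~%v per capture\n", depth, measure(depth, 10000))
	}
}
```

For numbers worth arguing over, a proper testing.B benchmark on the actual code path is the better tool.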

> worrying too much about the performance of error handling in these relatively rare cases is absolutely premature optimization.

It's not something to worry about, but it's also a premature optimization to include when there is no need. The Go team considered adding stack traces as described before 1.13, postulating that it would be useful, but measurement determined that they were rarely used in the real world.

If your measurements (you are measuring, right?) for your specific situation tell a different story, stack traces aren't something to be afraid of, but it would be silly to make them the default for everyone. The standard tools don't need to serve every single use case ever imagined.

The reality is, unless you forget how to program every time you see the word error (which seems to be a thing), in the real world you are never going to just `return err` up, up, up the stack anyway. Even ignoring traceability concerns, that is going to introduce horrible coupling. You wouldn't do that for any arbitrary type T, so why would you for type error? There is nothing special about errors.

> Any method or function which is not guaranteed to succeed by the language specification should, generally, return an error.

Most Go programmers are too scared to panic and abort when invariants are violated. I think most codebases contain at least 2x as much error handling as is really necessary.

Nope.

Panic isn't an ersatz error reporting mechanism, it's a tool of absolute last resort. Any function or method that can fail should return an error, and should signal failure via that error. Callers that invoke any fallible function or method should always receive, inspect, and respond to the returned error.

Who said panic should report errors? I specifically said abort…

Panic doesn't reliably abort the program.
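
A small sketch of why: any frame further up the stack can recover. The helper name safely is hypothetical; recover and defer are the language's own mechanisms, and net/http's server does something similar for panics in handlers.

```go
package main

import "fmt"

// safely runs fn and converts a panic into an ordinary error value.
func safely(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := safely(func() { panic("invariant violated") })
	// The process is still alive; the panic was intercepted by the caller.
	fmt.Println("still running:", err)
}
```

So code that panics cannot assume the process ends there; that decision belongs to whoever sits above it.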

And, in any case, arbitrary code doesn't have the right to abort the program in the first place! Only func main is allowed to terminate the process. Errors in any other context should always be reported to the caller via normal control flow, i.e. return.

This is exactly the broken view I mean.

If you allow arbitrary code to terminate the process, then the control flow of the program is effectively non-deterministic, and impossible to model, or even really understand. Software written in this way is fundamentally unreliable.