by physicles 1035 days ago
Indeed, our code base is littered with fmt.Errorf("...: %w", err), but that only works if enough places in the code add context. Currently only about 15% of return sites do this.

And I disagree that the cost of carrying around the callstack is something to worry about. Errors are akin to exceptions in C++/Java: no happy path should rely on errors for control flow (except io.EOF, but that won't generate a call stack). They should be rare enough that any cost below about 1ms and 10k is negligible.

3 comments

Every error should be annotated at the call site. fmt.Errorf("...: %w", err) isn't litter, it should be a basic expectation of any code which passes code review.

> Errors are akin to exceptions in C++/Java: no happy path should rely on errors for control flow (except io.EOF, but that won't generate a call stack). They should be rare enough that any cost below about 1ms and 10k is negligible.

This may be true in C++ or Java, but in Go, it is absolutely not the case.

Errors are essential to, and actually the primary driver of, control flow!

Any method or function which is not guaranteed to succeed by the language specification should, generally, return an error. Code which calls such a method or function must always receive and evaluate the returned error.

Happy paths always involve the evaluation and processing of errors received from called methods/functions! Errors are normal, not exceptional.

(Understanding errors as normal rather than exceptional is one of the major things that distinguish junior from senior engineers.)

I think there's less daylight between us than it seems.

> Errors are normal, not exceptional.

The _handling_ of errors is normal. Code that doesn't consider errors is not production code.

And granted, in Go, control flow is driven by errors more often than in C++ or Java. Sentinel error values are common. See for example all usage of errors.Is, checking for io.EOF, packages that define ErrSituationA and ErrSituationB, etc.

But my argument was about errors that can't be dealt with locally, where the origination and ultimate handling are very far apart. A given flow will encounter these errors relatively rarely compared to the happy path (and if it's not rare, you probably need to fix or change something). Having an intuition about this is important for predicting your code's performance. For example:

- The SQL call failed because the network connection dropped; the client gets a 500 or 502, or the call is retried.

- A call to an external service failed because the network was bad; it gets retried.

- The SQL call succeeded, but the record the client asked for wasn't found, so the client gets a 404.

- Writing to a temporary file failed because the disk is full, so some batch job fails with an error.

Apart from potential concerns about DoS, worrying too much about the performance of error handling in these relatively rare cases is absolutely premature optimization.

DoS isn't even a concern. I just benchmarked capturing a call stack in Go, and it's on the order of a few microseconds. Unless you're in performance critical code (and you're benchmarking, right?), it's fine.
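For anyone who wants to repeat the measurement, a rough sketch follows. The helper names, the 64-entry buffer, and the chosen depths are all assumptions, and the numbers depend heavily on the machine and on stack depth. A proper `testing.B` benchmark would be more rigorous.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// captureStack records up to len(pcs) return addresses, skipping
// runtime.Callers and captureStack itself.
func captureStack(pcs []uintptr) int {
	return runtime.Callers(2, pcs)
}

// nest calls captureStack from the given number of additional frames.
func nest(depth int, pcs []uintptr) int {
	if depth == 0 {
		return captureStack(pcs)
	}
	return nest(depth-1, pcs)
}

// avgCapture returns the mean wall-clock time per capture at a given depth.
func avgCapture(depth, iterations int) time.Duration {
	pcs := make([]uintptr, 64)
	start := time.Now()
	for i := 0; i < iterations; i++ {
		nest(depth, pcs)
	}
	return time.Since(start) / time.Duration(iterations)
}

func main() {
	for _, depth := range []int{0, 10, 50} {
		fmt.Printf("depth %2d: %v per capture\n", depth, avgCapture(depth, 10000))
	}
}
```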

> But my argument was about errors that can't be dealt with locally, where the origination and ultimate handling are very far apart. A given flow will encounter these errors relatively rarely compared to the happy path (and if it's not rare, you probably need to fix or change something). Having an intuition about this is important for predicting your code's performance.

When code encounters an error, it can either deal with that error programmatically, or return that error to its caller. I don't think you can make any generalized assertions about whether one or the other of these cases is more common, and I'm confident that you can't assert that one or the other of these cases is better or worse than the other, or that one of them represents a problem worth fixing.

Errors potentially occur at every fallible expression. Where an error is ultimately handled is orthogonal to, and generally unknowable by, the bit of code that receives that error.

I agree with you that "the performance of error handling" should never be a first-order concern when writing code.

I don't agree with you that capturing a call stack is fast enough to ignore. Calling runtime.Callers (https://pkg.go.dev/runtime#Callers) takes time proportional to the size of the pc []uintptr slice, and can easily get to O(ms) or beyond. It's fine if a given bit of code opts in to this cost, but it's not something that you should do by default; the threshold for performance critical code is O(ns), not O(us).

> worrying too much about the performance of error handling in these relatively rare cases is absolutely premature optimization.

It's not something to worry about, but it's also a premature optimization to include when there is no need. The Go team considered adding stack traces as described before 1.13, postulating that it would be useful, but measurement determined that they were rarely used in the real world.

If your measurements (you are measuring, right?) for your specific situation tell a different story, stack traces aren't something to be afraid of, but it would be silly to make them the default for everyone. The standard tools don't need to serve every single use case ever imagined.

The reality is, unless you forget how to program every time you see the word error (which seems to be a thing), in the real world you are never going to just `return err` up, up, up the stack anyway. Even ignoring traceability concerns, that is going to introduce horrible coupling. You wouldn't do that for any arbitrary type T, so why would you for type error? There is nothing special about errors.

> Any method or function which is not guaranteed to succeed by the language specification should, generally, return an error.

Most Go programmers are too scared to panic and abort when invariants are violated. I think most codebases contain at least 2x as much error handling as is really necessary.

Nope.

Panic isn't an ersatz error reporting mechanism, it's a tool of absolute last resort. Any function or method that can fail should return an error, and should signal failure via that error. Callers that invoke any fallible function or method should always receive, inspect, and respond to the returned error.

Who said panic should report errors? I specifically said abort…

Panic doesn't reliably abort the program.

And, in any case, arbitrary code doesn't have the right to abort the program in the first place! Only func main is allowed to terminate the process. Errors in any other context should always be reported to the caller via normal control flow, i.e. return.
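The point that a panic does not reliably abort can be shown in a few lines. `safeCall` is a hypothetical wrapper. Any caller higher up the stack can do the same thing, and net/http does recover panics raised in request handlers.

```go
package main

import (
	"fmt"
)

// safeCall is a hypothetical wrapper that recovers a panic raised by fn
// and turns it into an ordinary error value.
func safeCall(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	err := safeCall(func() { panic("invariant violated") })
	fmt.Println(err)
	fmt.Println("still running")
}
```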

This is exactly the broken view I mean.

> that only works if enough places in the code add context.

It would be a bit odd to not add context, wouldn't it? Same goes for any value. This is not exclusive to errors. If you consider a function which returns T, the T value could equally be hard to trace back if you find you need to determine its call site and someone blindly returned it up the stack. There is nothing special about errors.

While ideally you are returning more context than Errorf allows, indeed, it is a good last resort. If your codebase is littered with blind returns, the good news is that it shouldn't be too hard to create a static analyzer which finds blind returns of the error type and injects the Errorf pattern.
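A rough sketch of the detection half of such an analyzer, using only go/ast from the standard library. It is a syntactic heuristic, not a real linter: it assumes errors are conventionally named `err`, does no type checking, and does not rewrite anything. `findBlindReturns` and `sample` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findBlindReturns reports the line numbers of return statements whose last
// result is the bare identifier "err". It does not type-check the source.
func findBlindReturns(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, fmt.Errorf("parse source: %w", err)
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		ret, ok := n.(*ast.ReturnStmt)
		if !ok || len(ret.Results) == 0 {
			return true
		}
		last := ret.Results[len(ret.Results)-1]
		if id, ok := last.(*ast.Ident); ok && id.Name == "err" {
			lines = append(lines, fset.Position(ret.Pos()).Line)
		}
		return true
	})
	return lines, nil
}

// sample is a tiny input with one blind return.
const sample = `package p

import "os"

func read(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return data, nil
}
`

func main() {
	lines, err := findBlindReturns(sample)
	if err != nil {
		fmt.Println("analysis failed:", err)
		return
	}
	fmt.Println("blind returns on lines:", lines)
}
```

A production version would more likely be built on a type-aware analysis framework so it can match on the `error` type rather than on a variable name.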

Are you suggesting it's OK if ParseInt failures take 1ms? Or should ParseInt use a different "kind of error" that's not commensurate with the regular error kind?

Do you think most errors look more like ParseInt, or more like sql.Open where 1ms might be acceptable? (Do you think a call stack from the insides of sql.Open would be useful? My experience, mostly not...)
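For reference, this is what handling a ParseInt-style failure locally tends to look like. `classify` is a hypothetical helper. The errors returned by strconv wrap the sentinels `strconv.ErrRange` and `strconv.ErrSyntax`, so the caller can branch on them directly without any stack information.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// classify is a hypothetical helper that handles a parse failure locally.
func classify(s string) string {
	_, err := strconv.ParseInt(s, 10, 8) // 8-bit range: -128..127
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, strconv.ErrRange):
		return "out of range"
	case errors.Is(err, strconv.ErrSyntax):
		return "not a number"
	default:
		return "unexpected: " + err.Error()
	}
}

func main() {
	for _, s := range []string{"42", "300", "abc"} {
		fmt.Printf("%q: %s\n", s, classify(s))
	}
}
```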

So the stacks should probably only be for "complex errors", and only for frames that happen in code you (hand waving) "care about". Maybe your programs just have far too complex internal error handling?

See my response to a sibling. I wasn't clear; I was implicitly differentiating between these:

1. errors that can be handled locally (such as parsing; in other languages, these situations are often signaled with return values instead of exceptions)

2. errors that can't be handled locally (such as network errors; other languages use exceptions for these)

My argument was that worrying too much about error handling performance in #2 is premature optimization. 1ms is extreme, but the actual figure of capturing a call stack in Go -- several microseconds, by my benchmark -- puts it squarely in the "don't worry about it unless your code is performance-critical" category.

An error is an error. The immediate caller is always responsible for detecting and handling errors in whatever way is appropriate for their calling context.