Hacker News new | ask | show | jobs
by camdencheek 1252 days ago
> The WaitGroup looks suspiciously like errgroup

I heavily used errgroup before creating conc, so the design is likely strongly influenced by that of errgroup even if not consciously. Conc was partially built to address the shortcomings of errgroup (from my perspective). Probably worth adding a "prior art" section to the README, but many of the ideas in conc are not unique.

> In go land, this seems desirable.

I mostly agree, which is why `Wait()` propagates the panic rather than returning it or logging it. This keeps panics scoped to the spawning goroutine and enables getting stacktraces from both the spawning goroutine and the spawned goroutine, which is quite useful for debugging.

That said, crashing the whole webserver because of one misbehaving request is not necessarily a good tradeoff. Conc moves panics into the spawning goroutine, which makes it possible to do things like catch panics at the top of a request and return a useful error to the caller, even if that error is just "nil pointer dereference" with a stacktrace. It's up to the user to decide what to do with propagated panics.

3 comments

A "prior art" section is especially useful for people evaluating your library!
> That said, crashing the whole webserver because of one misbehaving request is not necessarily a good tradeoff. Conc moves panics into the spawning goroutine, which makes it possible to do things like catch panics at the top of a request and return a useful error to the caller, even if that error is just "nil pointer dereference" with a stacktrace. It's up to the user to decide what to do with propagated panics.

The problem is that panics aren't "goroutine scoped" in terms of their potential impact. So it really shouldn't be up to the user to decide how to handle a panic. Application code shouldn't handle panics at all! They're not just a different way to yield an error, they're critical bugs which shouldn't occur at all.

> panics aren't "goroutine scoped" in terms of their potential impact

I'm with ya there. However, there are also many classes of logic errors that are not goroutine-scoped. And there are many panics that do not have impact outside of the goroutine's scope. In my experience, this is true of most panics.

In practice, panics happen. They are (almost) always indicative of a bug, and almost always mean there is something that needs fixed. However, if a subsystem of my application is broken and panicking, there's a pretty good chance that reporting the panic without crashing the process will provide a better end user experience than just blowing up.

Yes, that means I'm accepting the risk that my application is left in an inconsistent state, but coupled with good observability/reporting, that's a tradeoff I'm willing to make.

(bonus: this is especially true when propagating panics allow me to capture more debugging information to fix the panics faster)

> In practice, panics happen.

I guess this is the crux of the issue. I don't think this is true, or needs to be true. It certainly hasn't been my experience. I think assuming panics are normal will take you down some paths that make it basically impossible to write reliable software. But, to each their own.

> I'm accepting the risk that my application is left in an inconsistent state,

Inconsistent state makes it impossible to reason about your program's execution or outcomes. An account value that previously had balance = 0 may now have balance = 1000. Is this acceptable risk?

Since defers run during panics for exactly this reason, no. You can in fact guarantee that is not the case.

Runtime-safety "panics" in Go, like concurrently modifying and iterating a map that can lead to other memory being corrupted, tend to abort the whole process immediately and not be suppress-able panics.

> Runtime-safety "panics" in Go, like concurrently modifying and iterating a map that can lead to other memory being corrupted, tend to abort the whole process immediately and not be suppress-able panics.

https://go.dev/doc/effective_go#panic

> The usual way to report an error to a caller is to return an error as an extra return value. . . . But what if the error is unrecoverable? Sometimes the program simply cannot continue. For this purpose, there is a built-in function panic that in effect creates a run-time error that will stop the program

Panics express unrecoverable failures. This is plainly stated in the language documentation. There are exceptions to this rule, but they are exceptional.

That's a style decision, not a correctness issue. You are claiming it is a correctness issue.
> An account value that previously had balance = 0 may now have balance = 1000. Is this acceptable risk?

Your entire web app process crashes due to a panic every time a request triggers an extremely rare edge case. A hacker discovers this and uses it to conduct a DoS attack. Is this acceptable risk?

Yes, definitely preferable! Denial of service is definitely better than invalid state, right?
Why the heck are you writing web apps that panic?
This is equivalent to asking "Why the heck are you writing code with bugs?"

Sure, if we could write code without bugs, we wouldn't need to suppress panics. But since we do tend to write code bugs and some of them are bugs that can be detected by the runtime - we get panics.

If you hate panics, you can do better than Go and go for a language with a stronger type system, where you won't get nil pointer panics or interface conversion panics, but even an almost onerously-tyepesafe language like Haskell still panics on some bogus operations such as division by zero or trying to read the head of an empty list. Perhaps Idris really have no runtime errors but they are quite niche.

It is pretty easy to have accidental panics in Go, for instance due to a runtime assertion that unexpectedly failed
Because people make mistakes?
> assuming panics are normal will take you down some paths that make it basically impossible to write reliable software

Na, citation needed. Assuming "panics are normal" is just extrapolating from "errors are normal". It makes reliable software more reliable.

it's pretty obvious that it could influence new developers into the wrong direction though. Saying things like "ha, let's not bother checking this, at worst it'll just panic and i'll simply abort the request".

Which would definitely impact the quality of the software overall in a bad way.

I'd not be so sure. Accepting that everything that can fail will fail shaped me as a young developer, and "Exceptional C++" had a huge influence on me. Now my approach for new code I review is this:

* Make sure you support properly unrolling the stack

* Keep a clean failure boundary, probably somewhere on top of your loop

* Fastidiously check your preconditions

* Fail brutally if they're not met

* Improve from there

Panics are categorically different than errors. Errors are normal, panics are not normal.
> I guess this is the crux of the issue. I don't think this is true, or needs to be true. It certainly hasn't been my experience. I think assuming panics are normal will take you down some paths that make it basically impossible to write reliable software. But, to each their own.

I'd rather have the control to log the panic on a service rather than it forcibly dying and taking down any other connections with it. Kube will just spin up a new one anyway, which just introduces a downtime gap that doesn't need to exist.

I don't think I'm effectively communicating the impact of handling a panic and continuing program execution. A panic that comes from a memory model violation (as one example) can change the value of anything in the memory space of the program. If the program continues, that change will go undetected, and can have results that make the program completely nondeterministic. This isn't a doom and gloom, sky-is-falling prognostication, it's literally what is defined by the spec and memory model of the language.
> A panic that comes from a memory model violation (as one example) can change the value of anything in the memory space of the program ... This isn't a doom and gloom, sky-is-falling prognostication, it's literally what is defined by the spec and memory model of the language.

I do not think you are correct. Go has a class of unrecoverable panics for this specific reason. Go also runs deferred functions after a recoverable panic, so the notion that it's unsafe to handle it, or continue executiona after doesn't hold at all - it is literally a first-class feature of the language.

I have not seen an instance of a recoverable panic that is raised _after_ such a fatal operation. If you have an example of such, I would love to see it.

I would agree if it weren’t super easy to cause a panic in go.

Index slice out of bounds? panic. Close a channel twice? Panic. Incorrect type assertion? Panic. Dereference nil pointer? Panic.

I would argue that all of these examples which are the most common in my experience are “goroutine scoped” because the goroutine was aborted before they potentially modified the application state in an unknown way.

It’s like not in C, or C++ where out of bounds access has now put the entire application into an unknown state.

> Index slice out of bounds? panic. Close a channel twice? Panic. Incorrect type assertion? Panic. Dereference nil pointer? Panic.

These are all really bad things which should never survive to production code. It is not difficult to detect and prevent them.

> I would argue that all of these examples which are the most common in my experience are “goroutine scoped” because the goroutine was aborted before they potentially modified the application state in an unknown way.

What makes you think that terminating the goroutine that triggered these panics prevents them from impacting the process state?

> It’s like not in C, or C++ where out of bounds access has now put the entire application into an unknown state.

What makes you think this is the case? Panics have unknowable impact, and many panics (e.g. data races) absolutely do put the program into an unknown state.

>These are all really bad things which should never survive to production code. It is not difficult to detect and prevent them.

This is equivalent to saying "out of bounds memory writes are not difficult to detect and prevent in C code". Like actually equivalent (possibly worse), not just "well if you squint they look similar".

Of course it's not hard most of the time. Being perfect is beyond hard though. And if you're not perfect, you might open the door to anything in C, or cluster-destroying rolling crashes in Go.

Sometimes shutting down every piece of your software if that happens is the correct choice, and sometimes it's so far beyond reasonable that it's ludicrous to argue in favor of "every panic is an abort".

> Sometimes shutting down every piece of your software if that happens is the correct choice, and sometimes it's so far beyond reasonable that it's ludicrous to argue in favor of "every panic is an abort".

Very much this. And even for the same project: in some cases, I'm a fan of employing a quite strict error handling policy in dev environments (crash and burn) and using a more lenient approach in prod (elevated log level). In my experience, this can result in a robust product. Most importantly, this means the decision is not even made by the application programmer, sometimes it's a config thing.

Go has much stronger memory safety guarantees than C does. They aren't really comparable.
x == y can panic if interface values contain incomparable fields in unexported nested structs, how would I check for that? Should we let it become a query of death and bet thousands of peers’ jobs on it never happening?
Is this really the case? Can you link to anything or an example on go.dev/play/ ?

I can find a mention of "cmp.Equal" having that behavior, but that's just a third-party package panic.

It's true, but you'd never really write code like this

https://go.dev/play/p/r9NkQb6bQTx

Link to an example? I don't think this is true, unless you're playing stupid games with your code, which wouldn't pass code review.
don't do that?

this kind of thing is why deepcompare exists to begin with

It seems unrealistic to assume that everyone on a reasonably sized team knows all of the subtle edge cases to avoid and never makes mistakes
I can nag everyone to use reflect.DeepEqual and live with some false negatives, but maps always use k1 == k2.
> (e.g. data races) absolutely do put the program into an unknown state. Data races do not necessarily result in panics.

Many (perhaps even most?) data races would not result in panic but just in garbled, missing, duplicate, out-of-order or otherwise incorrect data.

Data-race induced panics are generally the side-effect of a data race, not a direct protection against. They can often be inconsistent: e.g. a data race in a slice that contains a binary data format could garble a variable-length string prefix and produce an index-out-of-bounds panic. Or it could prematurely consume a shared pointer and overwrite it with nil, only to have the nil pointer dereferenced by another goroutine. These kind of panics are unpredictable.

If your application has shared global state (in-memory or even a database), it may become inconsistent due to data races. But whether data-race induced indicate irreversibly corrupted global state that requires (and can be fixed with) application restart - that is case-by-case thing.

Let's say your application has some shared state that got corrupted and the corruption triggered a panic down the line.

If your shared state is persisted in a database or some other distributed mechanism and that state got corrupted: restarting the application won't help you.

If your shared state is scoped at the HTTP request level (or whichever boundary you choose for suppressing your panics): you don't need to restart the application. The request is already terminated, along with its shared state.

Which leaves us with in-memory global state. This kind of state is generally minimized in the type of microservice and network infrastructure applications that Go is often used for.

A very small percentage of your panics will indicate corruption of such state. Will you be willing to risk service downtime in order to protect against the small possibility that the service has run into a state where its shared in-memory data became corrupted?

in memory or some other distributed mechanism and that state got corrupted: restarting the application won't help you.

I tend to agree with you that these are relatively easy things to detect. I see no reason for the downvotes.

Production systems should have relatively robust testing whose coverage can be increased over time. When something panics, the cause of the panic should be fixed so that the panic never happens again. Over time panics shouldn't be happening.

Then again the systems that I have relied on I have written on my own without other hands in the pot so maybe I just don't have to deal with the reality of other programmers phoning things in.

If your programming language handles very common errors by crashing the entire application, and if preventing these crashes is actively discouraged, then that suggests a flaw in the language itself.

This would be fine for a low-level language like C where you need to allow SEGFAULTs, but designing it into a high-level language makes no sense.

Go panics should not be used for very common errors.
A lot of very common operations can panic: division, dereferencing a pointer, invoking an interface method, indexing/slicing an array/slice/string, asserting the type of an interface, and converting a slice to pointer to array. It’s possible to check, but I’ve never seen a tool that verifies you never use any of these without checking. You also have to check for nil channels, though they block forever (maybe consuming a goroutine) rather than panicking.

And there are some operations where you cannot check in advance whether a panic will happen: comparing interfaces (underlying values might not be fully comparable), indexing a map (could blow up during any concurrent write), sending to a channel (might be closed), and closing a channel.

You can recover from a panic though, so if you are implementing something that may panic you should have some sensible defer/recover in there if you can't afford to have your process crash.
Division by zero, dereferencing a nil pointer, invoking methods on a nil interface, invalid indexing of an array, unchecked type assertions -- these are not common operations! These are always easily detectable programmer errors.
> This keeps panics scoped to the spawning goroutine

This is exactly what's undesirable. (IMO and from my reading the GP agrees.)