by gilgad13 1808 days ago
The rule that the writer should be responsible for closing a channel is a good one to keep in mind, but it is often the case that you want to launch an indeterminate number of generators and collect the results from all of them. For example, you may want to make one connection to each server in the config, or to process each file in a directory in parallel. Because the arms of a `select{}` are fixed at compile time, it cannot select over a variable number of generator-owned channels, and you are left with four more difficult options:

* Use `reflect.Select`[1]. Having to reach for reflect feels ugly for such a common case, and the performance of the reflect-select is much lower than the native select.

* Create a single channel owned by the reader, pass it to each writer, and arrange for this channel to be closed when the final writer exits, through a waitgroup. There is an example under "Parallel digestion" in the Go Concurrency Patterns blog post[2]. Note the little details to get right. We must launch a separate goroutine to monitor the waitgroup / channel closure. If we accidentally do it in-line at the wrong level, everything will work fine if the total number of items written to `c` is less than `c`'s capacity, but will hang once a worker becomes blocked on `c`. Additionally, the waitgroup is threaded directly into the writers, which may be more difficult if those are implemented in some other generic package.

* Wrap the above pattern up into a `merge` function, such as the one under "Fan-in, Fan-out" in the Go Concurrency Patterns post[2]. The lack of generics means we will have to copy-paste this function everywhere we want to use it. Additionally, this launches a goroutine for every channel being watched, which strikes people as "expensive" for such a simple operation.

* We can construct a function that takes two channels and launches a goroutine that selects between the two and writes to a merged output channel. By constructing a tree of these we can merge an arbitrary number of channels. This is really just an optimization of the above.

None of these options are particularly intuitive. Too often I've instead seen developers create a single channel owned by the reader and do one of the following:

* Assume it is never closed and the reader doesn't terminate until the application does

* Rely on some external mechanism to know when to stop reading. If the reader can stop reading without confirming that the writers have stopped writing, this can lead to the writers becoming blocked on sending into this channel, which may prevent them from performing necessary cleanup actions (signaling `.Done()` on a waitgroup, for instance) that cause hangs in other areas.

* Thread a cancellation ctx through every reader and writer. This ensures that nothing hangs, but can result in messages that are sitting in the channels being dropped. If other areas of code have an assumption like, "every accepted request will receive a response", this can break that.

In addition, many developers have a gut instinct to add some amount of buffering to their channels, which usually results in these backpressure / channel issues being papered over during low-load unit tests, only to rear their head during higher load integration tests or in production, when the debugging story is much more difficult.

[1]: https://pkg.go.dev/reflect#Select

[2]: https://blog.golang.org/pipelines

2 comments

> Use `reflect.Select`

Never really a good idea, and never necessary.

> create a single channel owned by the reader

Channels cannot be effectively owned by their reader(s); the contortions you have to bend the code into to make that work never really make sense. That's just a constraint of the type, but it's hardly a problem -- it makes the thing easier to model. So this isn't really an option on the table.

> a Merge function

Yes! The answer. And goroutine per channel is kind of the point of using them! Nothing inefficient about it.

> a function that takes two channels . . .

Now there's some inefficiency! ;) No reason to do this, given Merge.

--

> None of these options are particularly intuitive.

The merge option seems perfectly intuitive to me, assuming you understand channels have to be owned by a single writer.

This kind of trouble is exactly why Rust's channels feel much more intuitive to me than Go's.

With channels in Rust, the channel is closed when either all senders or all receivers are dropped. This means that doing the default obvious thing is also correct, for a much larger set of tasks than made easy by Go's API choices, and it stays correct under refactoring.