by cauchyk 956 days ago
Author of the blog here, curious what a better alternative would be in this context. The channel has to be passed around for the producer and consumer to interface with each other. Are there better patterns for this?
5 comments

Not the parent, but I personally dislike it when Go libraries use channels in their public APIs, as it forces a specific concurrency model on the consumer. In particular, channels are quite slow, being protected by an internal mutex, so you always pay for that overhead whether you need it or not.

You also have to be very careful about managing the channel lifecycle. If you're not pulling (selecting from) the channel, the library will be permanently stuck. So you must now have a way to tell the library to stop sending, and it must cancel any in-flight send operations if you call producer.Stop() or whatever. In my experience libraries often have bugs in their channel code. It's far too easy to get deadlocks with channels that have interdependencies, and you have to be very careful about buffered versus unbuffered channels, as they behave differently.
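To make the lifecycle point concrete, here is a minimal sketch of a send that can be cancelled. The names send and produce, and the done channel standing in for producer.Stop(), are hypothetical and not taken from any particular library.

```go
package main

import "fmt"

// send delivers v on out unless done is closed first.
// It reports whether the value was actually delivered.
func send(done <-chan struct{}, out chan<- int, v int) bool {
	select {
	case out <- v:
		return true
	case <-done:
		return false
	}
}

// produce is a hypothetical library goroutine. It emits 0, 1, 2, ... until
// done is closed, then closes out so that receivers can finish.
func produce(done <-chan struct{}, out chan<- int) {
	defer close(out)
	for i := 0; ; i++ {
		if !send(done, out, i) {
			return
		}
	}
}

func main() {
	done := make(chan struct{})
	out := make(chan int) // unbuffered: each send waits for a receiver

	go produce(done, out)

	first, second := <-out, <-out
	fmt.Println(first, second)

	close(done) // plays the role of producer.Stop()

	// Wait for the producer to exit. It may deliver a few more values
	// before its select happens to pick the done case.
	for range out {
	}
}
```

If produce used a bare `out <- i` instead of the select, it would block forever once the consumer stopped receiving, which is the stuck-library case described above.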

A better API, in my opinion, is to offer a callback or single-method interface. Then the implementer of that callback or interface can choose to use channels internally if they desire, or they can use something else. You get the same backpressure support since you can treat it as synchronous.

After all, a channel's send interface is essentially just:

    type Channel[T any] interface {
        Send(T)
    }

But a "chan T" doesn't offer this flexibility.
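As a rough sketch of that idea, the interface can be satisfied by a plain function or by a channel, and the caller picks. Sink, SinkFunc, ChanSink and Produce are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Sink is the single-method interface from the comment above, renamed so it
// is not confused with a real channel.
type Sink[T any] interface {
	Send(T)
}

// SinkFunc adapts a plain function to Sink, in the style of http.HandlerFunc.
type SinkFunc[T any] func(T)

func (f SinkFunc[T]) Send(v T) { f(v) }

// ChanSink forwards every value to a channel, for callers who want one.
type ChanSink[T any] chan<- T

func (c ChanSink[T]) Send(v T) { c <- v }

// Produce stands in for a library function. Each Send call returns only when
// the consumer has accepted the value, so backpressure is preserved.
func Produce(n int, sink Sink[int]) {
	for i := 0; i < n; i++ {
		sink.Send(i)
	}
}

func main() {
	// Synchronous consumer: no channel and no extra goroutine.
	sum := 0
	Produce(4, SinkFunc[int](func(v int) { sum += v }))
	fmt.Println(sum)

	// Channel-backed consumer: the caller opts into goroutines.
	ch := make(chan int)
	go func() {
		defer close(ch)
		Produce(4, ChanSink[int](ch))
	}()
	for v := range ch {
		fmt.Println(v)
	}
}
```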

My rule of thumb for channels is that they're goroutine glue, not an API primitive. Build APIs out of interfaces, not channels. The only thing that uses channels should be the one that's controlling the goroutines, because it's the thing that orchestrates them.

That said, it's not a hard rule. Channels may have their place in a public API, though I can't think of any examples off-hand.

This breaks using select on a send and is a terrible reduction in capability.

You can always wrap channels to make them worse and less capable, but your API should expose the more capable option.
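For readers unfamiliar with selecting on a send, this is roughly what a raw channel allows and a blocking Send(T) method does not. The helper names sendToEither and trySend are made up for this sketch.

```go
package main

import "fmt"

// sendToEither delivers v to whichever of a or b can accept it first and
// returns the name of the channel that took it.
func sendToEither(a, b chan<- int, v int) string {
	select {
	case a <- v:
		return "a"
	case b <- v:
		return "b"
	}
}

// trySend attempts a send without blocking and reports whether it happened.
func trySend(out chan<- int, v int) bool {
	select {
	case out <- v:
		return true
	default:
		return false
	}
}

func main() {
	slow := make(chan int)    // nobody is receiving on this one
	fast := make(chan int, 1) // has room for exactly one value

	fmt.Println(sendToEither(slow, fast, 42))
	fmt.Println(trySend(fast, 43)) // the buffer is already full
}
```

Neither helper can be written on top of an interface that only exposes a blocking Send, which is the capability loss this reply is pointing at.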

I think it is a matter of preference. Personally, I use raw channels and goroutines all day, every day, and I really like using them. Channels are a core primitive in golang, so I think it is worth getting familiar with them.

As you say, being able to select is really nice too.

> as it forces a specific concurrency model on the consumer

I find your excuse above to be nonsense. When your program is in Golang, you've already picked a side; the concurrency model in question has already been chosen by the user.

We are not talking about some random concurrency model; we are talking about channel-based synchronization and communication in Golang. If you don't want that and consider it an issue, you shouldn't be using Golang in the first place.

Looks like the channel field is private in CDCRecordStream, but exposed by GetRecords. The callers mostly loop over Record objects. [1]

If I wanted to encapsulate iterating over a channel of Records, maybe it would be something like Go's io.Pipe function [2], which returns a PipeReader and PipeWriter? Except that it would work on Records rather than byte streams.

I don't have enough context to know if the extra encapsulation is a good idea in this case, though.

[1] https://github.com/search?q=repo%3APeerDB-io%2Fpeerdb%20GetR...
[2] https://pkg.go.dev/io#Pipe
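A rough sketch of what an io.Pipe-style pair for records could look like. Everything here (Record, RecordPipe, RecordReader, RecordWriter, ErrClosed) is hypothetical and is not PeerDB's actual API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Record is a placeholder for whatever the real record type is.
type Record struct{ ID int }

// ErrClosed is returned by Write after the reader has been closed.
var ErrClosed = errors.New("record pipe: reader closed")

type pipe struct {
	ch   chan Record
	done chan struct{}
	once sync.Once
}

type RecordReader struct{ p *pipe }
type RecordWriter struct{ p *pipe }

// RecordPipe returns a connected reader and writer, like io.Pipe but for
// Record values. The channel never leaves this package.
func RecordPipe() (*RecordReader, *RecordWriter) {
	p := &pipe{ch: make(chan Record), done: make(chan struct{})}
	return &RecordReader{p}, &RecordWriter{p}
}

// Write blocks until the reader takes rec or the reader is closed.
func (w *RecordWriter) Write(rec Record) error {
	select {
	case w.p.ch <- rec:
		return nil
	case <-w.p.done:
		return ErrClosed
	}
}

// Close signals that no more records will be written. As with a raw
// channel, calling Write after Close is a programming error.
func (w *RecordWriter) Close() { close(w.p.ch) }

// Next returns the next record, or false once the writer has closed.
func (r *RecordReader) Next() (Record, bool) {
	rec, ok := <-r.p.ch
	return rec, ok
}

// Close tells the writer to stop; it is safe to call more than once.
func (r *RecordReader) Close() { r.p.once.Do(func() { close(r.p.done) }) }

func main() {
	r, w := RecordPipe()
	go func() {
		defer w.Close()
		for i := 1; i <= 3; i++ {
			if err := w.Write(Record{ID: i}); err != nil {
				return
			}
		}
	}()
	for {
		rec, ok := r.Next()
		if !ok {
			break
		}
		fmt.Println(rec.ID)
	}
}
```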

Please see this great talk by Bryan C. Mills touching on the subject: https://youtu.be/5zXAHh5tJqQ?t=421
Why have consumers and producers vs doing it all in one goroutine, utilizing some kind of connection pool?
Because then you are either consuming or producing; you can’t do both at the same time. You are either reading from a stream of data, or you are writing it. Using goroutines to separate these allows you to do both at the same time, as soon as data is available on the channel or you receive the signal to stop.
To get higher throughput we would need one goroutine to pull from the replication slot while the other is pushing to the target. The idea is to keep the Postgres connection useful and reading the slot while also pushing to the target asynchronously.
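A minimal sketch of that shape, with one goroutine pulling and another pushing. The pull and push functions are stand-ins for reading the replication slot and writing to the target; nothing here is PeerDB's real code.

```go
package main

import "fmt"

// pipeline runs pull in its own goroutine and push in the caller's, with a
// buffered channel in between so the puller can stay ahead of the pusher.
// pull returns false when the source is exhausted.
func pipeline(pull func() (string, bool), push func(string), buffer int) {
	records := make(chan string, buffer)

	// Puller: keeps reading from the source while the pusher is busy.
	go func() {
		defer close(records)
		for {
			rec, ok := pull()
			if !ok {
				return
			}
			records <- rec
		}
	}()

	// Pusher: drains the channel until the puller closes it.
	for rec := range records {
		push(rec)
	}
}

func main() {
	src := []string{"insert", "update", "delete"}
	i := 0
	pull := func() (string, bool) {
		if i == len(src) {
			return "", false
		}
		rec := src[i]
		i++
		return rec, true
	}
	push := func(rec string) { fmt.Println("pushed", rec) }

	pipeline(pull, push, 2)
}
```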
Use an iterator object that can use channels behind the scenes.
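One possible reading of that suggestion, sketched in the style of bufio.Scanner. Iterator and Count are hypothetical names, and the integer source is only a stand-in for a stream of records.

```go
package main

import (
	"fmt"
	"sync"
)

// Iterator hides the channel and the producing goroutine from the caller.
type Iterator struct {
	ch   <-chan int
	done chan struct{}
	once sync.Once
	cur  int
}

// Count returns an iterator over 0..n-1, produced by a background goroutine.
func Count(n int) *Iterator {
	ch := make(chan int)
	it := &Iterator{ch: ch, done: make(chan struct{})}
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			select {
			case ch <- i:
			case <-it.done: // the caller lost interest
				return
			}
		}
	}()
	return it
}

// Next advances to the next value and reports whether there was one.
func (it *Iterator) Next() bool {
	v, ok := <-it.ch
	if ok {
		it.cur = v
	}
	return ok
}

// Value returns the value most recently fetched by Next.
func (it *Iterator) Value() int { return it.cur }

// Close stops the producer; it is safe to call more than once.
func (it *Iterator) Close() { it.once.Do(func() { close(it.done) }) }

func main() {
	it := Count(3)
	defer it.Close()
	for it.Next() {
		fmt.Println(it.Value())
	}
}
```

Callers who stop early should call Close so the background goroutine can exit instead of blocking on a send.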