Hacker News new | ask | show | jobs
by swisniewski 398 days ago
There's a much simpler way to do this:

If you want your library to operate on bytes, then rather than taking in an io.Reader and trying to figure out how to get bytes out of it the most efficient way, why not just have the library taken in []byte rather than io.Reader?

If someone has a complex reader and needs to extract to a temporary buffer, they can do that. But if like in the author's case you already have []byte, then just pass that it rather than trying to wrap it.

I think the issue here is that the author is adding more complexity to the interface than needed.

If you need a []byte, take in a []byte. Your callers should be able to figure out how to get you that when they need to.

With go, the answer is usually "just do the simple thing and you will have a good time".

8 comments

The author is trying to integrate with the Go stdlib, which requires you produce images from an 'io.Reader". See https://pkg.go.dev/image#RegisterFormat

Isn't using the stdlib simpler than not for your callers?

I also often hear gophers say to take inspiration from the go stdlib. The 'net/http' package's 'http.Request.Body' also has this same UX. Should there be `Body` and `BodyBytes` for the case when your http request wants to refer to a reader, vs wants to refer to bytes you already have?

The BodyBytes hypothetical isn't particularly convincing because you usually don't actually have the bytes before reading them, they're queued up on a socket.

In most cases I'd argue it really is idiomatic Go to offer a []byte API if that can be done more efficiently. The Go stdlib does sometimes offer both a []byte and Reader API for input to encoding/json, for example. Internally, I don't think it actually streams incrementally.

That said I do see why this doesn't actually apply here. IMO the big problem here is that you can't just rip out the Bytes() method with an upcast and use that due to the wrapper in the way. If Go had a way to do somehow transparent wrapper types this would possilby not be an issue. Maybe it should have some way to do that.

> The BodyBytes hypothetical isn't particularly convincing because you usually don't actually have the bytes before reading them, they're queued up on a socket.

Ah, sorry, we were talking about two different 'http.Request.Body's. For some weird reason both the `http.Client.Do`'s request and `http.Server`'s request are the same type.

You're right that you usually don't have the bytes for the server, but for the client, like a huge fraction of client requests are `http.NewRequestWithContext(context.TODO(), "POST", "api.foo.com", bytes.NewReader(jsonBytesForAPI))`. You clearly have the bytes in that case.

Anyway, another example of the wisdom of the stdlib, you can save on structs by re-using one struct, and then having a bunch of comments like "For server requests, this field means X, for client requests, this is ignored or means Y".

Thinking about that more though, http.Client.Do is going to take that io.Reader and pipe it out to a socket. What would it do differently if you handed it a []byte? I suppose you could reduce some copying. Maybe worth it but I think Go already has other ways to avoid unnecessary copies when piping readers and writers together (e.g. using `WriterTo` instead of doing Read+Write.)
> If body is of type *bytes.Buffer, *bytes.Reader, or *strings.Reader, the returned request's ContentLength is set to its exact value

Servers like to know Content-Length, and the package already special-cases certain readers to effectively treat them like a `[]byte`.

Clearly it does something differently already.

Also, following redirects only works if you can send the body multiple times, so currently whether the client follows redirects or not depends on the type of the reader you pass in... if you add a logging intercepter to your reader to debug something, suddenly your code compiles but breaks because it stops following redirects, ask me how I know.

In this case, there is not any functionality you can't get through other means: You can set GetBody and the content length header manually, which is what you probably wound up doing if I had to guess (been there too, same hat.) I think Go does this mainly to make basic usage more convenient. Unfortunately though, it makes this unnecessarily subtle footgun in return.

Maybe Go 2 will finally do something about this. I would really like some (hopefully efficient) way to make "transparent" wrapper types that can magically forward methods.

The request body on the client do a lot of other things than reading the body once (an io.Reader can only be read once).

There's Content-Length, and there's also the need to read it multiple times in case a redirect happens (so the same body need to be sent again when being redirected).

As a result, the implementation in stdlib would check a few common io.Reader implementations (bytes.Buffer, bytes.Reader, strings.Reader) and make sure it stores something that can be read multiple times (if it's none of the 3, it's read fully into memory and stored instead).

This is the same basic reply as the other one but my thoughts are roughly the same. The only comment I have aside from what I replied on the sibling comment (this just being another case of wrappers not being transparent biting us in the ass) is that they could've done this in a more generic way than they did, at the downside of requiring more interfaces.
It is, but one of the virtues of the Go ecosystem is that it's also often very easy to fork the standard library; people do it with the whole TLS stack all the time.

The tension Ted is raising at the end of the article --- either this is an illustration of how useful casting is, or a showcase of design slipups in the standard library --- well, :why-not-both:. Go is very careful about both the stability of its standard library and the coherency of its interfaces (no popen, popen2, subprocess). Something has to be traded off to get that; this is one of the things. OK!

> people do it with the whole TLS stack all the time.

It's the only way to add custom TLS extensions.

Adding custom TLS extensions plays badly when the standard library implements them.
How does using the stdlib internally simplify things for callers? And what does that have to do with tanking inspiration from the stdlib?

On the second point, passing a []byte to something that really does not want a streaming interface is perfectly idiomatic per the stdlib.

I don’t think it complicates things for the caller if the author used a third party deciding function unless it produced a different type besides image.Image (and even then only a very minor inconvenience).

I also don’t think it’s the fault of the stdlib that it doesn’t provide high performance implementations of every function with every conceivable interface.

I do think there’s some reasonable critique to be made about the stdlib’s use of reflection to detect unofficial interfaces, but it’s also a perfectly pragmatic solution for maintaining compatibility while also not have the perfect future knowledge to build the best possible interface from day 0. :shrug:

Because it forces the reader to read data into a temporary buffer in its entirety. If the thing this function is trying to do doesn't actually require it to do its job, that introduces unnecessary overhead.
What? Where else would it be?

It’s either in the socket(and likely not fully arrived) or … in a buffer.

Peak is not some magic, it is well a temporary buffer.

Beyond that, I keep seeing people ask for a byte interface. Has anybody looked at the IO.reader interface ???

type Reader interface { Read(p []byte) (n int, err error) }

You can read as little or as much as you would like and you can do this at any stage of a chain if readers.

You are still doing a copy, and people want to avoid the needless memory copy.

If you are decoding a 4 megabyte jpeg, and that jpeg already exists in memory, then copying that buffer by using the Reader interface is painful overhead.

Getting an io.Reader over a byte slice is a useful tool, but the primary use case for io.Reader is streaming stuff from the network or file system.

In this context, you can either have the io.Reader do a copy without allocating anything (take in a slice managed by the caller), or allocate and return a slice. There isn't really a middle ground here.

And you are going to work on all 4mb at the time? Even if you were to want to plop it on a socket you would just use IO.copy which would be no overhead, as no matter what you are still always going to copy bits out to place it in the socket to be sent.
>And you are going to work on all 4mb at the time?

Yes? Assume you were going to decode the jpeg and display it on screen. I assume the user would want to see the total jpeg at once.

Consider you are working on processing a program that has a bunch of jpegs and is running some AI inference on them.

1. You would read the jpegs from disk into memory. 2. You decode those jpegs in into RGBA buffers 3. You run inference on the RGBA buffers.

The current ImageDecode interface forces you to do a memcopy in between steps 1 and 2.

1. You would read the jpegs from disk into memory. 2. You copy the data in memory into another buffer because you are using the Reader interface 3. You decode those jpegs in into RGBA buffers 4. You run inference on the RGBA buffers.

Step two isn't needed at all, and if the images are large, that can add latency. If you are coding on something like a Raspberry Pi, depending on the size of the jpegs, the delay would be noticable.

That's how leaky abstraction of many file std implementations starts.

Reading into a byte buffer, pass in a buffer to read values, pass in a buffer to write values. Then OS does the same thing, has its own buffer that accepts your buffer, then the underlying storage volume has its own buffer.

Buffers all the way down to inefficiency.

Seems pretty crazy to force a bunch of data to be saved into memory all the time just for programming language aesthetic reasons
When you are working with streaming data, you really should be passing around io.Readers if you want any sort of performance out of it.

A []byte require you to read ALL data in advance.

And if you still end up with []byte and need to use a interface taking io.Reader, then you wrap []byte in a bytes.Buffer which implements io.Reader.

100% this, that is the easiest and less error prone way to do it.

Even if the author still insisted on using a single interface, he could also do what he wants by relying on bytes.Buffer rather than bytes.Reader.

A good API should just accept either,e.g. the union of []byte and io.Reader.

Both have pros and cons and those should be for the user to decide.

Nah, a good API doesn’t push the conditionals down. You don’t need to pass a union to let the user decide, you just need to present an API for each (including a generic implementation that monomorphizes into multiple concrete implementations) https://matklad.github.io/2023/11/15/push-ifs-up-and-fors-do...
Ah, but go doesn't have union types.
You can just expose two different functions, one of which takes a byte slice and one of which takes an io.Reader.

Given how the code works (it starts by buffering the input stream), the second function will just be a few lines of code followed by a call to the first.

Perfect example of how complex type systems can lead people to have unnecessarily complex thoughts.

One option would be to accept an interface{} and then switch on the type.
It's frightening how quickly the answer in golang becomes "downcast to interface{} and force type problems to happen at runtime".
You don’t need to downcast to interface, io.Reader is already an interface, and a type assertion on an interface (“if this io.Reader is just a byteslice and cursor, then use the byteslice”) is strictly safer than an untagged union and equally safe with a tagged union.

I wish Go had Rust-like enums as well, but they don’t make anything safer in this case (except for the nil interface concern which isn’t the point you’re raising).

Presumably the questions that have simple and easy answers don't get long comment chains.
An io.Reader is already an interface, so you can already switch on its type.
My comment is explaining how

> A good API should just accept either,e.g. the union of []byte and io.Reader.

could be done. Can you elaborate on how the fact that io.Reader is an interface lets you accept a []byte in the API? To my knowledge, the answer is: you can't. You have to wrap the []byte in an io.Reader, and you are at the exact problem described in the article.

I see what you’re saying. You’re correct that you can’t pass a []byte as an io.Reader, but you can implement io.Reader on a []byte in a way that lets you get the []byte back out via type switch (the problem in the article was that the standard library type didn’t let you access the internal []byte).
Personally I rarely use or even implement interfaces except some other part needs them. My brain thinks in terms of plain data by default.

I appreciate how they compose, for example when I call io.Copy and how things are handled for me. But when I structure my code that way, it’s extra effort that doesn’t come naturally at all.

I use them for testing, where I can have a client that is called by the code under test and can either just run a test CB, send a REST call to a remote server, send a gRPC call to a remote server, or make a function call to an in-process gRPC server object.
Yea, mocking is generally the most use I get out of interfaces
Plain data is really convenient for testing though.

I think the reason that your example is so useful is not generally because of testing, but because the thing you're interacting with has operational semantics. It's a good use case for object orientation, so interfaces and mocking are the natural way of testing your logic.