Hacker News
by bit_flipper 1116 days ago
Your note about encoding/gob being inefficient is somewhat accurate for how you're using it, but I want to talk a bit about how you could improve your use.

encoding/gob is intended for streams, not stateless marshals/unmarshals. The first thing sent over the stream is the type information the receiver should expect, which is why your payload was so large. Once the type information has been sent, subsequent messages of that type are much smaller. You can see this by extending your example to do multiple writes; each write after the first is only 10 bytes: https://play.golang.com/p/Po_iaXrTUER
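For anyone who doesn't want to click through, here is a minimal sketch of the same idea (not the linked playground code). `Point` and `encodedSizes` are names I made up for illustration; the function records how many bytes each Encode() call adds to the buffer when a single encoder is reused.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical payload type used only for illustration.
type Point struct {
	X, Y int
}

// encodedSizes writes n values through a single encoder and returns
// how many bytes each Encode call added to the stream.
func encodedSizes(n int) ([]int, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf) // one encoder for the whole stream
	sizes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		before := buf.Len()
		if err := enc.Encode(Point{X: i + 1, Y: i + 2}); err != nil {
			return nil, err
		}
		sizes = append(sizes, buf.Len()-before)
	}
	return sizes, nil
}

func main() {
	sizes, err := encodedSizes(3)
	if err != nil {
		panic(err)
	}
	// The first entry includes the type description; later ones do not.
	fmt.Println(sizes)
}
```

The exact byte counts depend on the type being encoded, so treat the numbers as illustrative; the point is only that the first write is much larger than the rest.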

You have to plan differently, but you could get much smaller payloads by switching to append-only files and creating the gob encoder once per file. If you find you're creating a gob encoder/decoder very often, that's a telltale sign you're not using it as intended.
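A rough sketch of what "one encoder per file" could look like, assuming all records for a file are written during one open/close cycle. `Record`, `writeLog`, `readLog`, and `roundTrip` are hypothetical names, not anything from the original post. One caveat I'm aware of: if you reopen the file later and append with a fresh encoder, the type information gets written again, so the reader would need to handle that.

```go
package main

import (
	"encoding/gob"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// Record is a hypothetical log entry type.
type Record struct {
	Key   string
	Value int
}

// writeLog creates the file at path and writes every record through
// one encoder, so the type description is stored only once.
func writeLog(path string, recs []Record) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	enc := gob.NewEncoder(f) // created once per file
	for _, r := range recs {
		if err := enc.Encode(r); err != nil {
			f.Close()
			return err
		}
	}
	return f.Close()
}

// readLog decodes every record in the file with a single decoder.
func readLog(path string) ([]Record, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	dec := gob.NewDecoder(f)
	var out []Record
	for {
		var r Record // fresh value each time; gob skips zero-valued fields
		err := dec.Decode(&r)
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, r)
	}
}

// roundTrip writes recs to a temporary file and reads them back.
func roundTrip(recs []Record) ([]Record, error) {
	dir, err := os.MkdirTemp("", "goblog")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "records.gob")
	if err := writeLog(path, recs); err != nil {
		return nil, err
	}
	return readLog(path)
}

func main() {
	got, err := roundTrip([]Record{{Key: "a", Value: 1}, {Key: "b", Value: 2}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(got), "records read back")
}
```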


Another thing I glossed over (unintentionally) is that I only started limiting batch size after I switched to the custom encoder. So persist() being a function of length wouldn't be quite true anymore.

However, I still keep seeing encoding/gob high up in the profiler taking a lot of time doing reflection during RPC calls.

So it does still seem like it's not ideal. Though I may still just not be understanding how to use net/rpc correctly either.

> encoding/gob is intended for streams, not stateless marshals/unmarshals

I don't understand this within the context of the rest of your comment. I use gob for marshaling stuff to storage all the time, and I'm not aware of a better way to do that (serialize data to binary).

Sorry, I should have been more precise. encoding/gob is not optimized for situations where you create an encoder or decoder, read/write a single value, then discard that encoder/decoder. As the author noted, the payload for a single call to Encode() is quite large. Additionally, re-instantiating a gob encoder for each call to Encode() is very expensive allocation-wise, and benchmarks that do this will show gob performing poorly. You can certainly still use gob this way, and if the performance works for you then have at it! But it performs significantly better when you make multiple calls to Encode() with the same encoder.
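If you want to check the allocation claim yourself, here is a small sketch using testing.AllocsPerRun from the standard library. `Msg`, `allocsFresh`, and `allocsReused` are names invented for this example, and this is a quick comparison rather than a proper benchmark.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"io"
	"testing"
)

// Msg is a hypothetical message type used only for this comparison.
type Msg struct {
	ID   int
	Body string
}

// allocsFresh reports average allocations per Encode call when a new
// encoder is created for every value.
func allocsFresh() float64 {
	m := Msg{ID: 1, Body: "hello"}
	return testing.AllocsPerRun(100, func() {
		enc := gob.NewEncoder(io.Discard)
		_ = enc.Encode(m)
	})
}

// allocsReused reports average allocations per Encode call when one
// encoder is shared across all values.
func allocsReused() float64 {
	m := Msg{ID: 1, Body: "hello"}
	enc := gob.NewEncoder(io.Discard)
	_ = enc.Encode(m) // warm-up: the type description is sent here, once
	return testing.AllocsPerRun(100, func() {
		_ = enc.Encode(m)
	})
}

func main() {
	fmt.Printf("fresh encoder per call: %.0f allocs\n", allocsFresh())
	fmt.Printf("reused encoder:         %.0f allocs\n", allocsReused())
}
```

The absolute numbers will vary by Go version and by the type being encoded; what matters is the gap between the two.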
The parent is saying that gob includes the type information in the serialized payload. This is extraneous information in many cases, and the result is an unnecessarily large payload.