| HN Mirror


by kentonv 3573 days ago

Depends on the use case! In fact, the answer to "is X format faster than Y format" always depends on the use case. It's always easy to construct cases where one or the other looks better. People of course want to know "on average", but in reality there's no such thing as an "average" use case. You'll ultimately have to test the case you have in mind to find out.

With that said, here are some considerations:

- msgpack is usually used as a binary encoding of JSON, with no schemas. That means textual field names are included in every encoded message. Formats like Protobuf and Cap'n Proto, whose schemas are known in advance, can avoid this bloat, making messages smaller and faster to process.

- msgpack is not a zero-copy encoding. As with Protobuf, you have to parse the whole message upfront before you can use it. Cap'n Proto is zero-copy, the advantages of which are described extensively on the page. For example, if you have a multi-gigabyte file containing a massive Cap'n Proto message and you just want to read one field from one place in it, you can memory-map the file and read only that field. There's no need to read the whole thing in. That's not possible with Protobuf or msgpack.
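To make the field-name overhead from the first point concrete, here is a minimal Python sketch. It hand-encodes a tiny subset of msgpack (fixmap, fixstr keys, positive fixint values) rather than using the msgpack library, and compares it with a Protobuf-style encoding that follows the varint wire format for integer fields but is not generated Protobuf code. The record and its field names (`user_id`, `score`, `level`) are hypothetical.

```python
# Minimal sketch: a schemaless msgpack-style map vs. a schema-based,
# Protobuf-style encoding of the same three integers.

def msgpack_small_map(d: dict[str, int]) -> bytes:
    """Encode a tiny subset of msgpack: a fixmap (< 16 entries) with
    fixstr keys (< 32 bytes) and positive fixint values (0..127)."""
    if len(d) >= 16:
        raise ValueError("fixmap holds at most 15 entries")
    out = bytearray([0x80 | len(d)])  # fixmap header
    for key, value in d.items():
        k = key.encode("utf-8")
        if len(k) >= 32 or not 0 <= value <= 127:
            raise ValueError("outside the subset this sketch supports")
        out.append(0xA0 | len(k))  # fixstr header
        out += k                   # the field name travels in every message
        out.append(value)          # positive fixint
    return bytes(out)


def varint(n: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def protobuf_style(values: list[int]) -> bytes:
    """Fields are identified by number (1, 2, 3, ...) from a shared schema,
    each written as a varint tag followed by a varint value."""
    out = bytearray()
    for field_number, value in enumerate(values, start=1):
        out += varint((field_number << 3) | 0)  # wire type 0 = varint
        out += varint(value)
    return bytes(out)


record = {"user_id": 42, "score": 7, "level": 3}
mp = msgpack_small_map(record)
pb = protobuf_style(list(record.values()))
print(len(mp), len(pb))  # → 24 6
```

Here 17 of the 24 msgpack bytes are field names, while the schema-based encoding replaces each name with a one-byte tag. Real payloads vary, so this only illustrates where the difference comes from, not a typical ratio.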

I think it's best to focus on these kinds of paradigm shifts when trying to reason about performance. You can always micro-optimize the encoding path later on, but you can't suddenly switch to zero-copy later if your data format wasn't designed for it.
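What "designed for zero-copy" buys you can be sketched at small scale with Python's standard `mmap` and `struct` modules. This is not Cap'n Proto's actual format, which uses segments and pointers. It is a hypothetical fixed-layout record (`RECORD`, `QTY_OFFSET`, `write_records` and `read_qty` are illustrative names) where every field sits at a known byte offset, so one field can be read from a memory-mapped file without parsing anything else.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-layout record: id (u64), price (f64), qty (u32),
# little-endian with no padding, so every field sits at a known offset.
RECORD = struct.Struct("<QdI")  # 20 bytes per record
QTY_OFFSET = 16                 # byte offset of qty within a record


def write_records(path: str, count: int) -> None:
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i, i * 0.5, i % 1000))


def read_qty(path: str, index: int) -> int:
    """Read one field without parsing the rest of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = index * RECORD.size + QTY_OFFSET
            # Only the page backing this offset needs to be loaded; the OS
            # pages it in on demand when unpack_from touches it.
            (qty,) = struct.unpack_from("<I", mm, offset)
            return qty


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.bin")
        write_records(path, 100_000)
        print(read_qty(path, 12_345))  # → 345
```

The lookup cost stays roughly the same whether the file holds a hundred records or a few gigabytes of them, because the offset is computed rather than found by parsing. A schemaless or varint-based format has no fixed offsets to compute, which is why this can't be retrofitted later.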