Hi, I wrote Protobuf v2 (the version everyone uses) and Cap'n Proto. I don't know if I'd say Protobuf has "awful" performance. It's certainly much better than text-based formats like JSON. But the format is rather branch-y. You have to process it byte-by-byte, because e.g. integers are encoded in a variable-width encoding where each byte contains 7 bits of data plus 1 bit to indicate whether this is the last byte. This results in a compact encoding, but takes a lot of cycles to encode and decode. Moreover, since everything is variable-width, in order to find any one field of the message, you must scan through all previous fields, parsing them one by one.

Cap'n Proto, FlatBuffers, and SBE all use "zero-copy" encodings, meaning the data is laid out on the wire in a format that is easy for a CPU to use directly. This means, for example, that integers are fixed-width, and fields are located at fixed offsets. This is much faster to parse (or even use in-place without parsing at all), but does result in somewhat larger encodings. (But then, you can always layer on independent compression when bandwidth matters more than CPU.)

My understanding is that Thrift is closer to Protobuf and contemporaneous with it, so I don't know why GP included it in the list.
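To make the varint vs. fixed-width contrast concrete, here is a minimal C++ sketch. It is a deliberately simplified model, not the real Protobuf wire format (which also prefixes each field with a tag and wire type) and not Cap'n Proto's actual layout. A "message" here is just a sequence of bare integers. All function names (`EncodeVarint`, `DecodeVarint`, `NthVarint`, `EncodeFixed64`, `NthFixed64`) are hypothetical, and the fixed-width helpers assume host byte order rather than a specified wire endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Varint: 7 data bits per byte; the high bit is set on every byte
// except the last one (the "continuation" bit).
void EncodeVarint(uint64_t v, std::vector<uint8_t>& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<uint8_t>(v) | 0x80);
    v >>= 7;
  }
  out.push_back(static_cast<uint8_t>(v));
}

// Decodes one varint starting at data[pos] and advances pos past it.
// Note the data-dependent branch on every byte.
uint64_t DecodeVarint(const std::vector<uint8_t>& data, size_t& pos) {
  uint64_t result = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= data.size()) throw std::runtime_error("truncated varint");
    uint8_t byte = data[pos++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;
  }
  throw std::runtime_error("varint too long");
}

// To reach field n, every earlier field must be parsed: O(n) work.
uint64_t NthVarint(const std::vector<uint8_t>& data, size_t n) {
  size_t pos = 0;
  for (size_t i = 0; i < n; ++i) DecodeVarint(data, pos);  // skip earlier fields
  return DecodeVarint(data, pos);
}

// Fixed-width: always 8 bytes, copied in host byte order (an assumption).
void EncodeFixed64(uint64_t v, std::vector<uint8_t>& out) {
  uint8_t bytes[sizeof v];
  std::memcpy(bytes, &v, sizeof v);
  out.insert(out.end(), bytes, bytes + sizeof v);
}

// Field n sits at a known offset (n * 8), so it is one bounds check
// plus one load, with no scanning.
uint64_t NthFixed64(const std::vector<uint8_t>& data, size_t n) {
  if ((n + 1) * 8 > data.size()) throw std::out_of_range("no such field");
  uint64_t v;
  std::memcpy(&v, data.data() + n * 8, sizeof v);
  return v;
}
```

For example, `EncodeVarint(300, buf)` produces the two bytes `0xAC 0x02`, whereas `EncodeFixed64(300, buf)` always produces eight. That is the size-vs-CPU trade-off described above: the varint form is smaller, but locating a later field requires decoding every field before it.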
This is the hot path in C++[1]. A really large amount of work has gone into protobuf C++ performance in the last 3 years or so.
1: https://github.com/protocolbuffers/protobuf/blob/master/src/...