Hacker News new | ask | show | jobs
by JonathonW 3694 days ago
The protobuf varint format itself actually is pretty expensive to decode-- the way it's encoded pretty much mandates a branch per seven bit chunk (of the resulting integer); you can't tell in advance how long an integer is (as opposed to a length-prefixed varint, where you can).

That's in addition to the encoding-independent overhead inherent to any variable-length integer scheme, which is what mandates the encoding/decoding step in protobuf. Combined, this makes protobuf pretty expensive to decode (protobuf varints are everywhere; they're used for field keys and length delimiters for length-delimited field types).

That's unrelated to ZigZag encoding of negative numbers, though, which is fairly straightforward and computationally inexpensive (it's a couple bitshifts and an XOR).

1 comments

Funny thing is, despite all this, protobuf beats almost everything in benchmarks, in part due to excessive micro-optimization. Almost all varints in practice are 1-byte, so the larger ones don't really matter.

That said, the fact that you can't skip forward in a protobuf without parsing through all the previous fields is unfortunate in some use cases, hence Cap'n Proto.

Yeah, we've been pretty happy with protobuf in the projects where we've used it.

Cap'n Proto would've been a pretty strong contender if it had existed when we started using protobuf, though. Serialization/deserialization was on the critical path for our server-side code, so we most likely would have been able to get at least a bit of a performance boost out of Cap'n Proto.