| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by shereadsthenews 2551 days ago

For simple protocols protobuf decoding has no taken branches. I.e. if you only use the first 15 field numbers (all your tags are 1 byte) and if all the types are the expected types, and if all the variable-length items are < 128 bytes long then you can decode the message without taking any branches. In C++. Most of the other languages have simpler and slower codecs.

This is the hot path in C++[1]. A really large amount of work has gone into protobuf C++ performance in the last 3 years or so.

1: https://github.com/protocolbuffers/protobuf/blob/master/src/...

2 comments

kentonv 2551 days ago

And all your integer fields must be < 128, right?

Yes, I suppose the branches in Protobuf can be pretty predictable. Still, you do generally have to examine each byte individually.

link

shereadsthenews 2551 days ago

Sure. In this specific case of a kv store it's hard to imagine how to simplify it dramatically from protobuf. As a proto you might have: tag-length-key-tag-length-value. Instead you could store the key and value lengths in host format using 8-16 bytes: length-length-key-value. It's not _dramatically_ faster to decode this, and you traded away extensibility to get a marginal speedup.

link

kentonv 2551 days ago

Sure, I was speaking in general, not specifically about the key-value case.

I think most serialization frameworks are likely to be overkill for such a use case, spending more time on setup than actual parsing.

Also note that storing the value (and maybe the key) with proper alignment might make it easier to use the data in-place, saving a copy.

link

yencabulator 2546 days ago

Hit 'y' before copying the link; the line numbers have already shifted.

link