by prodigal_erik 5940 days ago
If you send a thousand key/value pairs, Protocol Buffers waste sixteen thousand bits (one tag byte each for the key and the value of every pair) redundantly expressing the type and tag of each key and value, when the recipient already knows they're all the same. (They revved their format to special-case this, but only for a sequence of numbers, not for any other type you can construct.)
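To make the sixteen-thousand-bit figure concrete, here is a rough sketch that hand-encodes the pairs in the Protocol Buffers wire format and counts only the tag bytes. The schema is an assumption for illustration: key is varint field 1 and value is varint field 2. It also ignores the extra tag and length prefix that a real repeated message field would add around each pair.

```python
VARINT = 0  # protobuf wire type for varint-encoded integers


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """A tag is the field number and wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_pair(key: int, value: int) -> bytes:
    # Hypothetical schema: key is field 1, value is field 2.
    return tag(1, VARINT) + encode_varint(key) + tag(2, VARINT) + encode_varint(value)


pairs = [(i, i * 2) for i in range(1000)]
payload = b"".join(encode_pair(k, v) for k, v in pairs)

# Every pair repeats the same two one-byte tags.
tag_bits = sum(8 * (len(tag(1, VARINT)) + len(tag(2, VARINT))) for _ in pairs)
print(tag_bits)  # 16000
```

The two tag bytes are identical in every pair (0x08 and 0x10 under this assumed schema), which is the redundancy being complained about.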

PER (ASN.1's Packed Encoding Rules) is the only format I've seen that uses the schema to decide how much information the recipient actually needs to decode the message and just sends that, rather than adding pure overhead just in case a recipient has no idea what's going on yet is somehow supposed to do something useful with the message.
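As a toy illustration of the schema-driven idea (this is not real PER, just a sketch in its spirit), suppose both sides agree in advance that a message is exactly 1000 pairs and that every key and value lies in 0..255. Both constraints and the names `PAIR_COUNT`, `encode`, and `decode` are assumptions made up for this example. The encoder can then send the bare values and nothing else.

```python
from typing import List, Tuple

PAIR_COUNT = 1000  # assumed to be fixed by the shared schema


def encode(pairs: List[Tuple[int, int]]) -> bytes:
    """Emit one byte per key and one per value: no tags, no lengths."""
    if len(pairs) != PAIR_COUNT:
        raise ValueError("schema requires exactly PAIR_COUNT pairs")
    out = bytearray()
    for key, value in pairs:
        # bytearray.append raises ValueError outside 0..255,
        # which enforces the assumed range constraint.
        out.append(key)
        out.append(value)
    return bytes(out)


def decode(data: bytes) -> List[Tuple[int, int]]:
    """The receiver relies on the schema alone to find each field."""
    if len(data) != 2 * PAIR_COUNT:
        raise ValueError("unexpected message length")
    return [(data[i], data[i + 1]) for i in range(0, len(data), 2)]


pairs = [(i % 256, (i * 3) % 256) for i in range(PAIR_COUNT)]
message = encode(pairs)
print(len(message))  # 2000 bytes, all of it payload
```

The trade-off is that a receiver without the schema cannot make any sense of the bytes, which is exactly the case the comment argues is not worth paying for.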

1 comments

It is true that Protocol Buffers are not optimized for low-memory systems. On the other hand, to save bandwidth and storage one can use a lightweight compression algorithm on top of a general-purpose serialization library. (This is the approach that Google ended up using, AFAIK.)
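A minimal sketch of that layering, assuming zlib as the compressor (the comment does not name a specific algorithm) and a hand-rolled protobuf-style encoding with key as varint field 1 and value as varint field 2:

```python
import zlib


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_pair(key: int, value: int) -> bytes:
    # 0x08 and 0x10 are the tags for varint fields 1 and 2 (assumed schema).
    return b"\x08" + encode_varint(key) + b"\x10" + encode_varint(value)


# Serialize first, then compress the resulting bytes as an opaque blob.
raw = b"".join(encode_pair(i, i % 4) for i in range(1000))
compressed = zlib.compress(raw, 6)

print(len(raw), len(compressed))

# The receiver reverses the layers: decompress, then parse as usual.
assert zlib.decompress(compressed) == raw
```

The repeated tag bytes are the kind of redundancy a general-purpose compressor picks up, though how much it saves depends on the data and on the compressor chosen.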

I think such an approach is more flexible than making explicit assumptions about how much data is actually carried in integers.