Hacker News
by pherl 3612 days ago
The main reason deterministic serialization isn't canonical is unknown fields. String and message fields share the same (length-delimited) wire type, so when the parser hits an unknown field of that wire type, it cannot tell whether the payload is a string or a nested message, and therefore has no idea whether to recursively canonicalize it.
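
To make the ambiguity concrete, here is a minimal sketch that hand-encodes the wire format in plain Python. It does not use the protobuf library; `encode_varint` and `encode_len_field` are hypothetical helpers written for this illustration, and the field numbers are arbitrary. It shows a string field and a nested-message field producing identical bytes, which is all a parser without the schema gets to see.

```python
LENGTH_DELIMITED = 2  # wire type shared by string, bytes, and embedded messages


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field: tag, length, then the raw payload."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Case A: field 2 is a nested message whose own field 1 is the string "hi".
inner_message = encode_len_field(1, b"hi")
as_message = encode_len_field(2, inner_message)

# Case B: field 2 is a plain string that happens to contain the same bytes.
as_string = encode_len_field(2, "\n\x02hi".encode("utf-8"))

# Without the schema, the two cases are indistinguishable on the wire.
same_bytes = as_message == as_string
```

If field 2 is unknown to the parser, it cannot decide between case A (recurse and canonicalize the inner fields) and case B (leave the bytes alone).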

The cross-language inconsistency comes mainly from keeping string field comparison fast: Java and Objective-C store strings as UTF-16, and because of surrogate pairs, UTF-16 strings sort in a different order than UTF-8 strings.
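
A small sketch of the ordering mismatch, in plain Python rather than any protobuf runtime. The two sample strings and the helper names `utf8_key` and `utf16_key` are picked for illustration only: one character sits high in the BMP, the other is outside the BMP and so needs a surrogate pair in UTF-16.

```python
def utf8_key(s: str) -> bytes:
    """Sort key: compare strings by their UTF-8 bytes."""
    return s.encode("utf-8")


def utf16_key(s: str) -> list[int]:
    """Sort key: compare strings by their UTF-16 code units."""
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


bmp_char = "\uff5e"         # U+FF5E, a single UTF-16 code unit (0xFF5E)
astral_char = "\U0001f600"  # U+1F600, a surrogate pair (0xD83D 0xDE00)

keys = [bmp_char, astral_char]
by_utf8 = sorted(keys, key=utf8_key)
by_utf16 = sorted(keys, key=utf16_key)

# UTF-8 puts U+FF5E first; UTF-16 code units put the surrogate pair first.
orders_differ = by_utf8 != by_utf16
```

Surrogate code units fall in 0xD800..0xDFFF, below BMP characters in 0xE000..0xFFFF, so a UTF-16 code-unit comparison ranks supplementary characters before those BMP characters, while UTF-8 byte comparison follows code point order.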

Feel free to open an issue on the GitHub site asking for canonical serialization and describing your use case. We may change deterministic serialization to give stronger guarantees (e.g. cross-language consistency) or add a separate API for canonical serialization.

1 comment

This was years ago; I'd feel bad asking you to do a lot of work to support one niche use case in a research project that never quite made it to market. And protobufs ended up saving us quite a bit of development work, even if keeping the blob around is Wrong in a moral sense.

(You can find the niche use case in a response to your sibling comment, BTW.)