by JoachimSchipper 3618 days ago
This looks like a nice evolution.

It's a pity that the "deterministic serialization" gives so few guarantees; I have worked on at least one project that really needed this.

(Basically, we wanted to parse a signed blob, do some work, and pass the original data on without breaking the signature; unfortunately, this requires keeping the serialized form around, since it cannot be reliably regenerated from the parsed message.)
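To make the problem concrete: the protobuf wire format allows the same logical message to be encoded as more than one byte sequence (parsers must accept fields in any order, for instance), so re-serializing a parsed message need not reproduce the signed bytes. Below is a minimal sketch under stated assumptions. The decoder is hand-rolled and handles only varint and length-delimited fields; it is not the protobuf library's API. HMAC-SHA256 stands in for whatever signature scheme a real system would use, and the message layout and key are hypothetical.

```python
import hashlib
import hmac


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def parse(buf: bytes) -> dict[int, list]:
    """Decode top-level fields to {field_number: [values]} (wire types 0 and 2 only)."""
    fields: dict[int, list] = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.setdefault(field_number, []).append(value)
    return fields


# Hypothetical message: field 1 = 150 (varint), field 2 = "hi" (string).
F1 = tag(1, 0) + encode_varint(150)
F2 = tag(2, 2) + encode_varint(2) + b"hi"
ENC_A = F1 + F2  # fields in field-number order
ENC_B = F2 + F1  # same fields, other order: still a valid encoding

KEY = b"demo-key"  # hypothetical stand-in for a real signing key
sig_a = hmac.new(KEY, ENC_A, hashlib.sha256).digest()
sig_b = hmac.new(KEY, ENC_B, hashlib.sha256).digest()

print(parse(ENC_A) == parse(ENC_B))      # True: same logical message
print(hmac.compare_digest(sig_a, sig_b))  # False: the signature covers bytes, not meaning
```

If the sender signed `ENC_A` and a later stage re-serialized its parsed copy as `ENC_B`, the signature would no longer verify, which is why the original bytes have to travel along.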


The main reason deterministic serialization isn't canonical is unknown fields. Because string and message fields share the same wire type (length-delimited), when the parser encounters an unknown field of that wire type, it has no way to tell whether it should recursively canonicalize the field's contents.
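A small sketch of why this is ambiguous: the same length-delimited payload can be valid as a UTF-8 string, as an embedded message, or as both. The checker below is a hypothetical heuristic written for illustration (not part of any protobuf API), and for simplicity it rejects the deprecated group wire types.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint at buf[pos]; raises IndexError if truncated."""
    result = shift = 0
    while True:
        byte = buf[pos]  # IndexError if the varint runs past the end
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def parses_as_message(payload: bytes) -> bool:
    """Heuristic: does the payload decode as a sequence of well-formed fields?"""
    pos = 0
    try:
        while pos < len(payload):
            key, pos = read_varint(payload, pos)
            field_number, wire_type = key >> 3, key & 0x7
            if field_number == 0:
                return False  # field number 0 is invalid
            if wire_type == 0:
                _, pos = read_varint(payload, pos)
            elif wire_type == 1:
                pos += 8  # fixed64
            elif wire_type == 2:
                length, pos = read_varint(payload, pos)
                pos += length
            elif wire_type == 5:
                pos += 4  # fixed32
            else:
                return False  # groups (3, 4) and invalid types rejected here
            if pos > len(payload):
                return False  # field claims more bytes than exist
    except IndexError:
        return False
    return True


def parses_as_utf8(payload: bytes) -> bool:
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for payload in (b"hi", b"hello", b"\x08\x96\x01"):
    print(payload, parses_as_utf8(payload), parses_as_message(payload))
```

`b"hi"` is both a plausible string and a valid message (0x68 is the key for field 13 with wire type 0, followed by the varint 105), so a canonicalizer that sees it as an unknown field cannot know whether to reorder its contents.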

The cross-language inconsistency comes mainly from how string fields are compared: for performance, Java and Objective-C compare strings in their native UTF-16 encoding, which orders some strings differently from UTF-8 because of surrogate pairs.
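The ordering difference is easy to reproduce. Assuming a serializer sorts string keys (map keys, for example) by comparing them in its native encoding, a character just above the surrogate range (U+E000) and one outside the Basic Multilingual Plane (U+1F600) come out in opposite orders. The two keys are chosen purely for illustration.

```python
def utf8_order(keys: list[str]) -> list[str]:
    # UTF-8 byte order matches Unicode code point order.
    return sorted(keys, key=lambda s: s.encode("utf-8"))


def utf16_order(keys: list[str]) -> list[str]:
    # Comparing big-endian UTF-16 bytes is the same as comparing UTF-16 code units.
    return sorted(keys, key=lambda s: s.encode("utf-16-be"))


KEYS = ["\uE000", "\U0001F600"]  # private-use BMP char vs. an emoji outside the BMP

print([hex(ord(k)) for k in utf8_order(KEYS)])
print([hex(ord(k)) for k in utf16_order(KEYS)])
```

In UTF-8, U+E000 starts with byte 0xEE and U+1F600 with 0xF0, so U+E000 sorts first; in UTF-16, U+1F600 becomes the surrogate pair 0xD83D 0xDE00, and 0xD83D is less than 0xE000, so the order flips.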

Feel free to open an issue on GitHub asking for canonical serialization and describing your use case. We may strengthen the guarantees of deterministic serialization (e.g. cross-language consistency) or add a separate API for canonical serialization.

This was years ago; I'd feel bad asking you to do a lot of work to support one niche use case in a research project that never quite made it to market. And protobufs ended up saving us quite a bit of development work, even if keeping the blob around is Wrong in a moral sense.

(You can find the niche use case in a response to your sibling comment, BTW.)

In a trusted system, if you don't trust the structure you are working with, why would you trust the signature?

I'd want to always work from the signed blob.

That said, I guess this is one reason to use FlatBuffers/Cap'n Proto: you don't have to worry about this, since you never unpack the blob.

Think of a data flow A->B->C, where A is, e.g., the server handling incoming messages, B is a spam/virus filter, and C holds the user's mailbox. Spam/virus filters are useful but also rather vulnerable, so C is willing to trust B's spam/non-spam judgement but wants to ensure that B can't alter or make up messages.

If protobufs had one canonical encoding, B could unpack the message and re-pack it when done; with the current protobuf implementation, B needs to keep the original blob around. In either case, C needs to check the signature on whatever blob it receives.
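A minimal sketch of the current-protobuf option, where B forwards the exact bytes it received and C verifies A's signature over those bytes. HMAC-SHA256 stands in for A's signature (a real system would presumably use public-key signatures so that only A can sign); the names, key, message bytes, and spam rule are all hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

A_KEY = b"key-shared-by-A-and-C"  # hypothetical; HMAC stand-in for A's signing key


def a_sign(blob: bytes) -> bytes:
    return hmac.new(A_KEY, blob, hashlib.sha256).digest()


@dataclass(frozen=True)
class Forwarded:
    blob: bytes       # A's original bytes, untouched
    signature: bytes  # A's signature over exactly those bytes
    is_spam: bool     # B's verdict: trusted by C, not covered by A's signature


def b_filter(blob: bytes, signature: bytes) -> Forwarded:
    # B may parse the blob to inspect it, but forwards the original bytes,
    # never a re-serialization of its parsed copy.
    is_spam = b"FREE MONEY" in blob  # toy rule, purely illustrative
    return Forwarded(blob, signature, is_spam)


def c_accept(msg: Forwarded) -> bool:
    # C recomputes the MAC (it shares A's key in this HMAC stand-in)
    # over the bytes it actually received.
    return hmac.compare_digest(a_sign(msg.blob), msg.signature)


original = b"\x0a\x05hello"  # hypothetical serialized message: field 1 = "hello"
fwd = b_filter(original, a_sign(original))
tampered = Forwarded(original + b"!", fwd.signature, fwd.is_spam)

print(c_accept(fwd))       # True: B passed A's bytes through unchanged
print(c_accept(tampered))  # False: any change to the bytes breaks the check
```

B's verdict travels alongside the blob rather than inside it, so A's signature never has to cover anything B produced.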

(Some details have been changed.)

So wouldn't you stick with the original message from A, and just have B sign that? You wouldn't want to have B repack it, because then B has the potential to muck with things.