| HN Mirror

by jonbronson 2411 days ago

"The solution is as follows:

Make all fields in a message required. This makes messages product types."

Except it also breaks backwards compatibility, one of the most powerful and sought-after features of protobufs.

naasking 2411 days ago

> Except it also breaks backwards compatibility, one of the most powerful and sought-after features of protobufs.

It doesn't have to. Just add row types to handle unknown content, i.e. if an intermediary knows only of fields foo and bar, it can still process any data with those fields when given a type like "type SomeRecord = { foo : int, bar : string | r }", where 'r' represents the remainder of the record.
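
To make the row-type idea concrete, here is a rough sketch in Python. Python has no true row polymorphism, so this approximates it with a structural `Protocol` and a bounded `TypeVar`; the names `HasFooBar`, `bump_foo`, and `NewerRecord` are made up for illustration and do not come from any protobuf library.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar


class HasFooBar(Protocol):
    """Structural type: anything with at least these two fields."""
    foo: int
    bar: str


# T stands for "foo and bar, plus whatever else the caller's record carries".
# It plays the role of the row variable r in the type above (an approximation).
T = TypeVar("T", bound=HasFooBar)


def bump_foo(record: T) -> T:
    """Touch only foo; the rest of the record comes back unchanged."""
    record.foo += 1
    return record


@dataclass
class NewerRecord:
    foo: int
    bar: str
    baz: float  # a field this intermediary has never heard of


out = bump_foo(NewerRecord(foo=1, bar="x", baz=2.5))
print(out)
```

A static checker such as mypy would infer the result as `NewerRecord`, so the extra field survives in the type as well as at runtime.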

The article's criticisms are valid and there are typed solutions to most of the objections that have been raised against it.

SpicyLemonZest 2411 days ago

I'm not sure that's simple enough to be a "just", but in any case the primary problem is the other direction. If I add `required baz: int` to my service's definition of a protobuf, all protobufs that have ever been generated before become invalid because they don't contain a value for baz.
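
A minimal sketch of that failure mode, using plain dicts as a stand-in for decoded messages. The schema tables and the `decode` helper are hypothetical, not the protobuf API.

```python
# Hypothetical schemas: field name -> expected Python type.
REQUIRED_V1 = {"foo": int, "bar": str}
REQUIRED_V2 = {**REQUIRED_V1, "baz": int}  # the newly added required field


def decode(payload: dict, required: dict) -> dict:
    """Reject any payload that lacks a required field or has the wrong type."""
    for name, typ in required.items():
        if name not in payload:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(payload[name], typ):
            raise TypeError(f"field {name} is not {typ.__name__}")
    return payload


old_message = {"foo": 1, "bar": "x"}  # written before baz existed

decode(old_message, REQUIRED_V1)  # accepted by the old schema

try:
    decode(old_message, REQUIRED_V2)
    accepted_by_v2 = True
except ValueError as err:
    accepted_by_v2 = False
    print(err)
```

Every message written under the old schema fails the same way, no matter how long ago it was stored.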

naasking 2411 days ago

That fact doesn't change if you eschew types. Backward-compatible schema evolution has rules.

SpicyLemonZest 2411 days ago

Right, that's the point. The article's suggestion to "make all fields in a message required" fundamentally misunderstands the issues at hand, because no matter how appealing it is from a type theory perspective, following that suggestion would make it impossible to ever add a field in a backwards compatible manner.

naasking 2411 days ago

> The article's suggestion to "make all fields in a message required" fundamentally misunderstands the issues at hand, because no matter how appealing it is from a type theory perspective, following that suggestion would make it impossible to ever add a field in a backwards compatible manner.

You absolutely could in multiple ways:

1. You make every accepted product type have a row type at your service interface if you expect schema evolution.

2. If you have to add a field unexpectedly, i.e. where you did not have a row type, then you must deprecate the old API. If this seems onerous to you, then your service infrastructure is probably insufficiently flexible.

SpicyLemonZest 2411 days ago

Option 1 seems like it defeats the point. If you're going to declare a field with a more permissive type than currently allowed, aren't you just hacking weak types back into your strong type system?

Option 2... look. I've seen a lot of API deprecations, across multiple teams in multiple companies, and every one of them was very onerous in ways that had little to do with the service infrastructure. If you've done easy API deprecations, more power to you, but I don't think your experience is representative.

humbledrone 2411 days ago

Protocol buffers already do that; serialized fields that are not recognized by an older message definition are parsed and can be accessed via the "unknown fields" API, exactly as "r" above. Intermediaries can pass these through trivially, or inspect them to see what they didn't understand.

The problem with making fields required is that older serialized protocol buffers parsed by newer message definitions may be missing newly added required fields, which will break things.
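
The pass-through behaviour for unrecognized fields can be sketched roughly as follows. This is an illustration with dicts standing in for the wire format; `SomeRecord`, `parse`, and `serialize` are hypothetical names, and the real unknown-fields API differs between protobuf runtimes.

```python
from dataclasses import dataclass, field

KNOWN_FIELDS = ("foo", "bar")  # all that this older definition knows about


@dataclass
class SomeRecord:
    foo: int
    bar: str
    unknown: dict = field(default_factory=dict)  # fields kept but not understood


def parse(wire: dict) -> SomeRecord:
    """Split incoming data into known fields and an opaque remainder."""
    unknown = {k: v for k, v in wire.items() if k not in KNOWN_FIELDS}
    return SomeRecord(foo=wire["foo"], bar=wire["bar"], unknown=unknown)


def serialize(rec: SomeRecord) -> dict:
    """Write the known fields back out along with the preserved remainder."""
    return {"foo": rec.foo, "bar": rec.bar, **rec.unknown}


incoming = {"foo": 1, "bar": "x", "baz": 42}  # baz comes from a newer sender
rec = parse(incoming)
rec.foo += 1  # the intermediary does its work on what it understands
forwarded = serialize(rec)
print(forwarded)
```

The intermediary can forward `baz` intact, or look inside `rec.unknown` to see what it did not understand.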

naasking 2411 days ago

Protobuf does not do this via a typed interface, but via runtime checking.

joshuamorton 2411 days ago

You can't statically typecheck deserialized data. You must validate that the deserialized value matches the schema, and you can only do so at runtime.

In other words, proto has a typed interface, but you must runtime check that a given bag of bytes conforms to that typed interface.

This is true for any I/O.
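
As a rough illustration, with JSON standing in for the wire format: the bytes are checked once at the boundary, and everything after that works against a typed value. The names `Parsed`, `parse_bytes`, and `describe` are invented for this sketch.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Parsed:
    foo: int
    bar: str


def parse_bytes(raw: bytes) -> Parsed:
    """The single runtime check: bytes in, typed value out (or an error)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected an object")
    foo, bar = data.get("foo"), data.get("bar")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(foo, int) or isinstance(foo, bool):
        raise ValueError("foo must be an int")
    if not isinstance(bar, str):
        raise ValueError("bar must be a string")
    return Parsed(foo=foo, bar=bar)


def describe(rec: Parsed) -> str:
    # No further checks here; the type already guarantees the shape.
    return f"{rec.bar}:{rec.foo}"


result = describe(parse_bytes(b'{"foo": 7, "bar": "x"}'))
print(result)
```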

naasking 2410 days ago

> You must validate that deserialized value matches the schema, and you can only do so at runtime

I assume you mean serialized data, not deserialized. And yes, deserializing includes type checking. The point is that this happens once, so a separate API for dynamic data shouldn't be needed.

joshuamorton 2410 days ago

What do you mean by a separate api for dynamic data?

The data under discussion isn't "dynamic", it's still static, it just isn't known to the schema in question at runtime (since it's only known to a different schema). That means you can't access it by name, since the field names aren't known.

nabla9 2411 days ago

The lesson is: when you start wrong, you stay wrong.
