Hacker News new | ask | show | jobs
by pavon 723 days ago
proto2 allowed both required fields and optional fields, and there were pros and cons to using both, but both were workable options.

Then proto3 went and implemented a hybrid that was the worst of both worlds. They made all fields optional, but eliminated the introspection that let a receiver know if a field had been populated by the sender. Instead they silently populated missing fields with a hardcoded default that could be a perfectly meaningful value for that field. This effectively made all fields required for the sender, but without the framework support to catch when fields were accidentally not populated. And it made it necessary to add custom signaling between the sender and receiver to indicate message versions or other mechanisms so the receiver could determine which fields the sender was expected to have actually populated.

2 comments

You can now detect field presence in several cases: https://protobuf.dev/programming-guides/field_presence/#pres...

This is very solid with message types, but for basic types you can add `optional field` if needed as well (essentially making the value nullable)

proto2 required fields were basically unusable for some technical reason I forget, to the point where they're banned where I work, so you had to make everything optional, when in many cases it was unnecessary. Leading to a lot of null vs 0 mistakes.

Proto3 made primitives all non-optional, default 0. But messages were all still optional, so you could always wrap primitives that really needed to be optional. Then they added optionals for primitives too recently, implemented internally as wrapper messages, which imo was long overdue.

The technical reason is that they break the end-to-end principle. In computer networking in general, it is a desirable property for an opaque message to remain opaque, and agnostic to every server and system that it passes through. This is part of what makes TCP/IP so powerful: you can send an IP packet through Ethernet or Token Ring or Wi-Fi or carrier pigeon, and it doesn't matter, it's just a bunch of bytes that get passed through by the network topology.

Protobufs generally respect this property, but required fields break it. If you have a field that is marked 'required' and then a message omitting it passes through a server with that schema, the whole message will fail to parse. Even if the schema definition on both the sender and the recipient omits the field entirely.

Consider what happens when you add a new required field to a protobuf message, which might be embedded deep in a hierarchy, and then send it through a network of heterogenous binaries. You make the change in your source repository and expect that everything in your distributed system will instantly reflect that reality. However, binaries get pushed at different times. The sender may not have picked up the new schema, and so doesn't know to populate it. The recipient may not have picked up the new schema, and so doesn't expect to read it. Some message-bus middleware did pick up the new schema, and so the containing message which embeds your new field fails to parse, and the middleware binary crashes with an assertion failure, bringing down a lot of unrelated services too.