Hacker News
by smallnamespace 3613 days ago
> I could trust that if parsing succeeded, then I had a guarantee of a populated data structure

Using required fields has actually bitten Google more than once, and they were increasingly being considered harmful.

A canonical example is that you add a required field, and then update binaryA in production (which receives messages from binaryB), which immediately crashes or errors out because the new field is missing.

So practically speaking, you can never add required fields to any message where you can't guarantee binary version syncing amongst all instances of the message-dependent services. At scale, this is essentially operationally impossible.
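
To make that failure mode concrete, here is a rough sketch of the rollout. It is a toy model in plain Python, not the real protobuf library: the `parse` helper, the `MissingRequiredField` exception, and the `user_id`/`region` fields are all made up for illustration.

```python
# Toy model of proto2-style "required" checking; NOT the real protobuf library.
# All names here (parse, MissingRequiredField, user_id, region) are hypothetical.

class MissingRequiredField(Exception):
    pass

def parse(wire, required):
    """Return the message as a dict, or raise if a required field is absent."""
    missing = [name for name in required if name not in wire]
    if missing:
        raise MissingRequiredField(", ".join(missing))
    return dict(wire)

# binaryB has not been updated yet, so it still sends the old message shape.
message_from_old_binary_b = {"user_id": 42}

# binaryA before the schema change: only user_id is required, parsing succeeds.
old_a = parse(message_from_old_binary_b, required=["user_id"])

# binaryA after the schema change: "region" was added as a required field.
try:
    parse(message_from_old_binary_b, required=["user_id", "region"])
    rollout_ok = True
except MissingRequiredField as exc:
    rollout_ok = False
    failure = str(exc)  # names the field the old sender never wrote
```

Nothing about the sender changed between the two calls; only the reader's schema did, which is why the breakage shows up the moment binaryA is deployed.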

And if you're not running an RPC-based service architecture, then why are you using protos anyway?


> A canonical example is that you add a required field ...

Yeah. Don't do that without versioning your protocol. It's even less difficult to handle than maintaining API/ABI compatibility in a library.

> So practically speaking, you can never add required fields to any message where you can't guarantee binary version syncing amongst all instances of the message-dependent services.

Sure you can. If you version things at the protocol or per-request level, you can negotiate protocol conformance just fine.

Having a message type defined as "Message_V1" OR "Message_V2" is still simpler than having "any or none of the fields from any iteration of the message definition, where consistency is solely defined in terms of the field/message validation code you write in every protocol consumer".
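
Roughly what I mean, as a minimal sketch. It assumes a hypothetical envelope that carries an explicit `version` tag next to the body; the `MessageV1`/`MessageV2` classes and the field names are illustrative, not anything protobuf generates.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical message shapes; every field within a given version is mandatory.
@dataclass(frozen=True)
class MessageV1:
    user_id: int

@dataclass(frozen=True)
class MessageV2:
    user_id: int
    region: str  # exists (and is mandatory) only from version 2 onward

def decode(envelope: dict) -> Union[MessageV1, MessageV2]:
    """Dispatch on the version tag; a missing field raises KeyError."""
    version = envelope.get("version")
    body = envelope["body"]
    if version == 1:
        return MessageV1(user_id=body["user_id"])
    if version == 2:
        return MessageV2(user_id=body["user_id"], region=body["region"])
    raise ValueError(f"unsupported version: {version!r}")
```

A consumer then handles exactly two fully populated shapes instead of checking each field for presence.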

> And if you're not running an RPC-based service architecture, then why are you using protos anyway?

It's a very serviceable compact serialization mechanism for at-rest data.

> Yeah. Don't do that without versioning your protocol. It's even less difficult to handle than maintaining API/ABI compatibility in a library.

Actually, the whole point of that was so you don't have to version your protocol. Protocol versioning actually tends to make code maintenance a pain in the posterior, and working through old data really annoying. Instead, you do optional fields.

If you don't want that, go ahead and just write raw bytes and don't bother with the serialization layer.

> Having a message type defined as "Message_V1" OR "Message_V2" is still simpler than having "any or none of the fields from any iteration of the message definition, where consistency is solely defined in terms of the field/message validation code you write in every protocol consumer".

But you don't have to do either. It seems like you aren't familiar with the use of protocol buffers. You just define optional fields with a reasonable default, and magically all the old protobufs get that default value.
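
A rough sketch of the idea, using a toy stand-in rather than the real protobuf API. The `Message` class, its `has`/`get` methods, and the `region` field with its `"unknown"` default are assumptions made up for this example.

```python
# Toy stand-in for a message with optional fields; the real protobuf API is not used.

class Message:
    # Hypothetical schema: field name -> default returned when the field is absent.
    DEFAULTS = {"user_id": 0, "region": "unknown"}

    def __init__(self, wire):
        # Keep only the fields this toy schema knows about.
        self._set = {k: v for k, v in wire.items() if k in self.DEFAULTS}

    def has(self, name):
        """True only if the sender actually wrote the field."""
        return name in self._set

    def get(self, name):
        """Written value if present, otherwise the declared default."""
        return self._set.get(name, self.DEFAULTS[name])

old = Message({"user_id": 42})                  # written before "region" existed
new = Message({"user_id": 42, "region": "eu"})  # written by an updated sender
```

Here `old.get("region")` yields the default while `old.has("region")` stays false, so a reader that cares can still tell "absent" apart from "explicitly set".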

> If you don't want that, go ahead and just write raw bytes and don't bother with the serialization layer.

Or just keep using protobuf2, 'cause it's been working great for us for ~6 years.

> But you don't have to do either. It seems like you aren't familiar with the use of protocol buffers.

I've written my own protobuf compiler. I'm familiar.

> You just define optional fields with a reasonable default, and magically all the old protobufs get that default value.

That only works up until there's no "reasonable default".

> Or just keep using protobuf2, 'cause it's been working great for us for ~6 years.

Oh sure, I wouldn't change practices, but I'd certainly question why that practice had been put in place.

> That only works up until there's no "reasonable default".

If there is no reasonable default, I'd be even more wary about making it a required field.

> It's a very serviceable compact serialization mechanism for at-rest data.

That's fair, but then you run into the same issue -- adding a required field requires updating your entire store.

Depending on your store, that can range from onerous to outright impossible.

> Don't do that without versioning your protocol

I think it depends on your needs, but for most users, explicit versioning of messages is overkill and is just a heavier way of encoding the same logic (e.g. "I saw an older message, so I will implicitly upgrade it by filling in these new fields" vs. just looking for the optional field that I just added).

For many applications, binary version sync is easy. In those cases, it does add unnecessary complexity.

Sure, if you can guarantee that your app will always be in that environment.

Otherwise, you run the risk of having to redo all your protos (+ downtime) if/when your app needs to scale up. I'm not sure whether that's worth avoiding some proto validation logic in the client code.

From https://developers.google.com/protocol-buffers/docs/proto#si...:

> Required Is Forever

> You should be very careful about marking fields as `required`. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using `required` does more harm than good; they prefer to use only `optional` and `repeated`. However, this view is not universal.
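
For what it's worth, the "application-specific custom validation routines" the docs mention could look something like the sketch below. It treats a parsed message as a plain dict; `validate_order` and the `user_id`/`region` fields are hypothetical names, and the rules are only an example of what one consumer might choose to enforce.

```python
def validate_order(msg: dict) -> list:
    """Return a list of problems; an empty list means the message is usable."""
    problems = []
    if "user_id" not in msg:
        problems.append("user_id is missing")
    elif msg["user_id"] <= 0:
        problems.append("user_id must be positive")
    # "region" is optional on the wire, but this consumer chooses to insist on it.
    if not msg.get("region"):
        problems.append("region is missing or empty")
    return problems
```

Because the check lives in application code, each consumer can decide whether to reject, log, or patch up an incomplete message, and the rule can be relaxed later without changing the schema.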