by teacup50 3612 days ago
> - removing optional values is actually quite nice. In practice, I end up checking for "missing or empty string" anyway.

I feel the opposite; this greatly reduces the utility of protobuf.

Previously, I could trust that if parsing succeeded, then I had a guarantee of a populated data structure.

Now, I have to check each field individually, in manually written code, to verify that no required fields are missing.

That's really lame, and a huge step backwards.


> I could trust that if parsing succeeded, then I had a guarantee of a populated data structure

Using required fields has actually bitten Google more than once, and they were increasingly being considered harmful.

A canonical example is that you add a required field, and then update binaryA in production (which receives messages from binaryB), which immediately crashes or errors out because the new field is missing.
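
A rough sketch of that failure mode, using a hand-rolled dict-based parser rather than the real protobuf library. The field names `user_id` and `region`, the `parse` helper, and `ParseError` are all made up for illustration:

```python
# Hypothetical schema model; this is NOT the protobuf library API.
REQUIRED_V1 = {"user_id"}
REQUIRED_V2 = {"user_id", "region"}  # binaryA was rebuilt with a new required field


class ParseError(Exception):
    """Raised when a message lacks a field the reader treats as required."""


def parse(message: dict, required: set) -> dict:
    # Reject the whole message if any required field is absent.
    missing = required - set(message)
    if missing:
        raise ParseError(f"missing required fields: {sorted(missing)}")
    return message


# binaryB is still built against the old schema, so it never sends "region".
message_from_binary_b = {"user_id": 42}

accepted_by_old_reader = parse(message_from_binary_b, REQUIRED_V1)

try:
    parse(message_from_binary_b, REQUIRED_V2)
    rejected_by_new_reader = False
except ParseError:
    rejected_by_new_reader = True
```

The sender did nothing wrong here; the message only became invalid because the reader was redeployed first.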

So practically speaking, you can never add required fields to any message where you can't guarantee binary version syncing amongst all instances of the message-dependent services. At scale, this is essentially operationally impossible.

And if you're not running an RPC-based service architecture, then why are you using protos anyway?

> A canonical example is that you add a required field ...

Yeah. Don't do that without versioning your protocol. It's even less difficult to handle than maintaining API/ABI compatibility in a library.

> So practically speaking, you can never add required fields to any message where you can't guarantee binary version syncing amongst all instances of the message-dependent services.

Sure you can. If you version things at the protocol or per-request level, you can negotiate protocol conformance just fine.

Having a message type defined as "Message_V1" OR "Message_V2" is still simpler than having "any or none of the fields from any iteration of the message definition, where consistency is solely defined in terms of the field/message validation code you write in every protocol consumer".
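
To make that concrete, here is a minimal sketch of per-message versioning, assuming a hypothetical envelope of the form `{"version": ..., "body": ...}`. The `MessageV1`/`MessageV2` fields and the `decode` helper are invented for the example:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class MessageV1:
    user_id: int


@dataclass(frozen=True)
class MessageV2:
    user_id: int
    region: str  # added in V2 and mandatory there


Message = Union[MessageV1, MessageV2]


def decode(envelope: dict) -> Message:
    """Dispatch on the version tag; every field of the chosen version must exist."""
    version = envelope["version"]
    body = envelope["body"]
    if version == 1:
        return MessageV1(user_id=body["user_id"])
    if version == 2:
        # A missing "region" raises KeyError, so a V2 value is always complete.
        return MessageV2(user_id=body["user_id"], region=body["region"])
    raise ValueError(f"unsupported protocol version: {version}")
```

A consumer then handles exactly two fully populated shapes instead of one shape where any field may be absent.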

> And if you're not running an RPC-based service architecture, then why are you using protos anyway?

It's a very serviceable compact serialization mechanism for at-rest data.

> Yeah. Don't do that without versioning your protocol. It's even less difficult to handle than maintaining API/ABI compatibility in a library.

Actually, the whole point of that was so you don't have to version your protocol. Protocol versioning actually tends to make code maintenance a pain in the posterior, and working through old data really annoying. Instead, you do optional fields.

If you don't want that, go ahead and just write raw bytes and don't bother with the serialization layer.

> Having a message type defined as "Message_V1" OR "Message_V2" is still simpler than having "any or none of the fields from any iteration of the message definition, where consistency is solely defined in terms of the field/message validation code you write in every protocol consumer".

But you don't have to do either. It seems like you aren't familiar with the use of protocol buffers. You just define optional fields with a reasonable default, and magically all the old protobufs get that default value.
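
Roughly what that looks like from the reader's side, sketched with plain dicts instead of generated protobuf classes. The `FIELD_DEFAULTS` table and `get_field` helper are hypothetical stand-ins for what generated code would do:

```python
# Hypothetical defaults table; in a proto2 .proto file these would be declared
# on optional fields with the [default = ...] option.
FIELD_DEFAULTS = {"region": "unknown", "retries": 3}


def get_field(message: dict, name: str):
    """Return the stored value, or the declared default if the field is absent."""
    if name in message:
        return message[name]
    return FIELD_DEFAULTS[name]


old_record = {"user_id": 42}                       # written before "region" existed
new_record = {"user_id": 42, "region": "eu-west"}  # written by a newer binary

old_region = get_field(old_record, "region")
new_region = get_field(new_record, "region")
```

Old data keeps parsing, and readers see `"unknown"` where the field was never written.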

> If you don't want that, go ahead and just write raw bytes and don't bother with the serialization layer.

Or just keep using protobuf2, 'cause it's been working great for us for ~6 years.

> But you don't have to do either. It seems like you aren't familiar with the use of protocol buffers.

I've written my own protobuf compiler. I'm familiar.

> You just define optional fields with a reasonable default, and magically all the old protobufs get that default value.

That only works up until there's no "reasonable default".

> Or just keep using protobuf2, 'cause it's been working great for us for ~6 years.

Oh sure, I wouldn't change practices, but I'd certainly question why that practice had been put in place.

> That only works up until there's no "reasonable default".

If there is no reasonable default, I'd be even more wary about making it a required field.

> It's a very serviceable compact serialization mechanism for at-rest data.

That's fair, but then you run into the same issue -- adding a required field requires updating your entire store.

Depending on your store, that can range from onerous to outright impossible.

> Don't do that without versioning your protocol

I think it depends on your needs, but for most users, explicit versioning of messages is overkill and is just a heavier way of encoding the same logic (e.g. "I saw an older message, I will implicitly upgrade it by filling in these new fields" vs. just looking for the optional field that I just added).

For many applications, binary version sync is easy. In those cases, it does add unnecessary complexity.

Sure, if you can guarantee that your app will always be in that environment.

Otherwise, you run the risk of having to redo all your protos (+ downtime) if/when your app needs to scale up. I'm not sure whether that's worth avoiding some proto validation logic in the client code.

From https://developers.google.com/protocol-buffers/docs/proto#si...:

> Required Is Forever

> You should be very careful about marking fields as `required`. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead. Some engineers at Google have come to the conclusion that using `required` does more harm than good; they prefer to use only `optional` and `repeated`. However, this view is not universal.
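
As an illustration of the "application-specific custom validation routines" the docs mention, here is a minimal sketch over a plain dict. The `order_id` and `quantity` fields and the `validate_order` function are made up; a real routine would inspect a generated message object instead:

```python
def validate_order(message: dict) -> list:
    """Collect every problem instead of failing on the first one."""
    errors = []
    if "order_id" not in message:
        errors.append("order_id is missing")
    # Presence alone is not enough; the value also has to make sense.
    if message.get("quantity", 0) <= 0:
        errors.append("quantity must be positive")
    return errors
```

Because the rules live in application code, they can be relaxed later without old readers rejecting messages at parse time.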

In practice, to have decent compatibility as revisions changed, you really had to minimize the use of "required" fields anyway. While I agree it was sometimes nice to be able to avoid having to worry about it, in practice protobuf parsing imposes a very minimal set of constraints on data types. A successful protobuf parse was not nearly enough to ensure you had data integrity. I've run into more than a few cases of developers using the wrong protobuf (v2) definition and not realizing their successful parse was still wrong.

I agree. In particular, in languages without null you will have a lot of Option types in the mapping. You can no longer generate useful type definitions from the proto spec.

> Now, I have to check each field individually, in manually written code, to verify that no required fields are missing.

You always had to check the individual fields for the zero value. A required field in a proto2 message can be set but also have the default value and pass initialization.

> You always had to check the individual fields for the zero value.

No, you didn't. A required field has a value, period. If it defaults to a particular value, then that's the value it has.

If you had a non-required field, then you marked it 'optional', and checked for the field's existence (or mapped optional fields to a Maybe/Option monad representation, forcing the issue).
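
A small sketch of that mapping, assuming a hypothetical `UserProfile` message with one required and one optional field. This is hand-written, not the output of any real protobuf code generator:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserProfile:
    user_id: int                    # was `required`: always has a value
    nickname: Optional[str] = None  # was `optional`: None means "not set"


def display_name(profile: UserProfile) -> str:
    # The Optional type makes the "not set" case explicit at the use site.
    if profile.nickname is None:
        return f"user-{profile.user_id}"
    return profile.nickname
```

Only the optional field needs a presence check; the required one can be used directly.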

I think your wording is a bit difficult to parse. Does this convey what you are trying to say:

In proto2, default values are for optional fields. The value of an optional field would be that value, but there's a separate concern as to whether that field had been set.
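
A toy model of that distinction, with value and presence tracked separately. The `OptionalField` class and its method names are invented for the sketch and do not mirror any generated protobuf API:

```python
class OptionalField:
    """Toy optional field: a default value plus a separate 'was it set' bit."""

    def __init__(self, default):
        self._default = default
        self._value = None
        self._is_set = False

    def set(self, value):
        self._value = value
        self._is_set = True

    def has(self) -> bool:
        # Presence is independent of what the value happens to be.
        return self._is_set

    def get(self):
        return self._value if self._is_set else self._default


unset = OptionalField(default=0)

explicit_zero = OptionalField(default=0)
explicit_zero.set(0)  # same value as the default, but explicitly set
```

Both fields read back as `0`, yet only `explicit_zero.has()` is true, which is the "separate concern" described above.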