by ghostwriter
1659 days ago
These are not mutually exclusive things, since the superiority of a system is a multi-dimensional metric. I cited cap'n proto as an alternative to protobuf that I would definitely choose over protobuf, because in my book it does at least a few things better: namely immutability and zero-copy reads, and random access. At the same time, I do not like and do not agree with your stance on field optionality, because I think it rests on the false premise that universal optionality is the only viable path towards compatibility. Let me quote the original article on this point:
> protobuffers achieve their promised time-traveling compatibility guarantees by silently doing the wrong thing by default. Of course, the cautious programmer can (and should) write code that performs sanity checks on received protobuffers. But if at every use-site you need to write defensive checks ensuring your data is sane, maybe that just means your deserialization step was too permissive. All you’ve managed to do is decentralize sanity-checking logic from a well-defined boundary and push the responsibility of doing it throughout your entire codebase.
This approach doesn't free you as a developer from having to maintain multiple code paths, which your older comments claim it avoids ("you end up needing to maintain parallel code paths, leading to ugly and hard-to-test code"). The code paths are still there; they are now intertwined with your business logic as conditional checks on field presence, at every call site that uses the schema. That's one of the reasons why I prefer flatbuffers over cap'n proto when I have a choice, and it is why I think you are not fully aware of the issues that stem from protobuf's design choices, issues that show up clearly in ecosystems that model network communication with advanced type systems.
In fact, this comment from your linked thread (https://news.ycombinator.com/item?id=18201601) suggests a similar idea: advanced type systems can provide strict schema negotiation in a semi-automated way, at a fraction of the effort required to maintain schemas with all-optional fields.
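A minimal Python sketch of the "validate at the boundary" approach the quoted article argues for, assuming a permissively decoded message represented as a plain dict. All names here (`User`, `parse_user`, `SchemaError`, `greeting`) are hypothetical: the decoder rejects incomplete messages once, and downstream code receives a type whose fields are guaranteed present.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class SchemaError(ValueError):
    """Raised when a wire message does not satisfy the strict schema."""


@dataclass(frozen=True)
class User:
    # Hypothetical domain type: every field is guaranteed present after parsing.
    user_id: int
    email: str


def parse_user(wire: Mapping[str, Any]) -> User:
    """Validate once, at the deserialization boundary.

    `wire` stands in for a permissively decoded message in which any
    field may be absent (as with all-optional fields).
    """
    missing = [name for name in ("user_id", "email") if name not in wire]
    if missing:
        raise SchemaError(f"missing required fields: {missing}")
    if not isinstance(wire["user_id"], int) or not isinstance(wire["email"], str):
        raise SchemaError("field has wrong type")
    return User(user_id=wire["user_id"], email=wire["email"])


def greeting(user: User) -> str:
    # Business logic needs no presence checks: the type guarantees the fields.
    return f"Hello, {user.email}"
```

The trade-off is the one debated in this thread: strictness lives in one place, but any sender on an older or newer schema that omits a field is rejected outright rather than handled at the site of use.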
|
More abstractly, the hard lesson was: In a large distributed system, the site of use is the only reasonable place to do data validation. If you do it anywhere else, you will create a more brittle system that can't handle changes. The reason is pretty straightforward, but is more of a human reason than a mathematical one: when someone decides to modify a protocol for some new feature, they know they obviously have to modify the code that produces and consumes the protocol in order to implement the feature. But if they have to update a bunch of other places too, that's at best more work, and at worst easily forgotten. It's really important that any part of the system that is just a middleman will be agnostic to the data and pass it through unmodified -- even if the data is based on a newer version of the schema than the middleman is aware of.
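A rough Python sketch of the pass-through property described above, using JSON purely as a stand-in wire format. The `route` and `lossy_route` functions are hypothetical: the middleman reads only the one field it needs and forwards the original bytes, so a field added by a newer schema survives the hop.

```python
import json
from typing import Any, Dict, Tuple


def route(raw: bytes) -> Tuple[str, bytes]:
    """Hypothetical middleman: read only the field it understands
    (`destination`) and forward the original bytes untouched, so fields
    from a newer schema pass through even though this code never saw them."""
    msg: Dict[str, Any] = json.loads(raw)
    destination = msg.get("destination", "default")
    return destination, raw


def lossy_route(raw: bytes) -> Tuple[str, bytes]:
    """Anti-pattern for contrast: decode into this middleman's (older)
    schema and re-encode, which silently drops any unknown fields."""
    msg = json.loads(raw)
    known = {"destination": msg.get("destination", "default")}
    return known["destination"], json.dumps(known).encode()
```

With `lossy_route`, a producer and consumer that both understand a new field can still lose it in transit, with no error raised anywhere; this is the "easily forgotten" extra update the comment warns about.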
So yes, you actually want the validation to be in your business logic. But you don't want it to complicate that business logic too much. Most of the time, optional fields (with default values) provide the right balance between making changes easy without making code ugly. Sometimes, a more drastic change -- like declaring a new version of the protocol and writing translation layers -- is a good idea, but this is an expensive step that you want to do rarely.
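A small Python sketch of optional fields with defaults plus site-of-use validation, assuming a hypothetical `SearchRequest` message in which `page_size` was added in a later schema version: older senders that omit the field get the default, and the handler checks only the invariants it actually depends on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchRequest:
    # Hypothetical message. `query` existed in v1; `page_size` was added
    # later as an optional field with a default, so v1 senders need no changes.
    query: str = ""
    page_size: int = 10  # default used when an older sender omits the field


def handle_search(req: SearchRequest) -> List[str]:
    # Validation happens at the site of use, and only for what this code needs.
    if not req.query:
        raise ValueError("query must be non-empty")
    size = min(max(req.page_size, 1), 100)  # clamp to a sane range
    return [f"{req.query}#{i}" for i in range(size)]
```

Here the only new "code path" for the added field is the clamp, which is the kind of small, local check the comment describes as the right balance.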
Now, obviously you don't agree with this. But your arguments sound like they are coming from a place of intuition, not experience. That's fine, intuition is critical to innovation. But you can't go around claiming your intuition is "superior" without proving it out in practice. Intuition is always based on a simplified model in your head, and the real world often doesn't work like you think it will. I assure you that you don't know anything I don't; in decades of working on this stuff, I've heard all the ideas. The only way to prove yourself right is to actually build systems your way and show success in the field. Of course, there will likely never be a definitive proof that one idea or the other is superior, only anecdotal experience. However, the fact that a large majority of successful distributed systems today are built on Protobuf or a similar model to Protobuf suggests that experience leans heavily in that model's favor.