| Disclaimer: I have written and designed many parts of the Protocol Buffer libraries.

https://tools.ietf.org/html/rfc1832.html#section-6 has a good encoding example of XDR. Contrasting that with Protocol Buffers is enlightening, as it clearly demonstrates some differences in design goals and where tradeoffs are made. Feel free to correct my interpretations, as I may have missed something!

1. XDR appears to have no equivalent of a Protocol Buffer field number
=> the format is not self-describing. That is, one must have a schema to properly interpret the data

2. XDR appears to encode lengths as fixed width based on the block size
=> faster to read/write but larger on the wire than using a varint encoding

3. XDR's string data type is defined as ASCII
=> it does not support modern Unicode outside of the variable-length opaque type

Point 1 would seem to present difficulty for modern distributed systems, as one cannot control the release process of every distinct binary in the ecosystem to ensure that they are all schema-equivalent at the same time. This can be remediated by propagating the schema as a header to the underlying data for consumption on the other side, but that bloats the wire format. Header compression could help with this but may be problematic for systems with severely constrained networks that constantly reestablish new connections (e.g. mobile).

Point 1 also has implications for data persistence. One can never remove or reorder an XDR struct member, or else readers will incorrectly parse data that was written in the older format. This is in contrast with Protocol Buffers, where one can remove or reorder message members whenever they'd like, as long as they take care not to reuse a tag number (and the newer `reserved` feature can help with that).

Point 2 is just a performance tradeoff: binary size or (de|en)code performance?

Point 3 has implications for memory-constrained systems. For example, on Android we eagerly parse string fields to avoid doubling the allocation overhead (first as raw bytes, then as a String object). If we required all string datatypes in Protocol Buffers to be defined as bytes fields (the variable-length opaque data type equivalent), we wouldn't be able to provide this optimization.

Overall, XDR looks like a good fit for inter-process communication in a homogeneous environment. Protocol Buffers looks like a good fit for cross-language communication across heterogeneous and unversioned environments. Directly comparing the two, XDR is much more verbose on the wire (particularly if we mitigate the versioning issues by serializing the schema in a header), whereas it's likely significantly faster to (en|de)code. That is, there's a tradeoff between networking/storage costs and CPU performance.
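
To make the first point (no field numbers in XDR) concrete, here is a minimal Python sketch of what happens when an old writer and a newer reader disagree about a removed member. It is only an illustration under stated assumptions: the field names `id`, `legacy`, and `age` are hypothetical, only unsigned integers are handled, and the helpers are hand-rolled stand-ins, not the real Protocol Buffers or XDR libraries.

```python
import struct

def varint(n):
    """Encode a non-negative int as a base-128 varint (Protocol Buffers style)."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)

def read_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def tagged_encode(fields):
    """Write {field_number: value} as tag/value pairs (wire type 0 = varint)."""
    return b"".join(varint(num << 3) + varint(val) for num, val in fields.items())

def tagged_decode(buf, known):
    """Read tag/value pairs, keeping known field numbers and skipping the rest."""
    out, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        num, wire_type = tag >> 3, tag & 7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        val, pos = read_varint(buf, pos)
        if num in known:
            out[known[num]] = val
    return out

def positional_encode(values):
    """XDR-style: big-endian 4-byte unsigned ints, identified only by position."""
    return struct.pack(f">{len(values)}I", *values)

def positional_decode(buf, names):
    """Read one 4-byte unsigned int per schema member, in declaration order."""
    return dict(zip(names, struct.unpack_from(f">{len(names)}I", buf)))

# Old writer schema: id=7, legacy=999, age=42 (hypothetical members).
old_tagged = tagged_encode({1: 7, 2: 999, 3: 42})
old_positional = positional_encode([7, 999, 42])

# New reader schema has dropped the middle member.
new_tagged = tagged_decode(old_tagged, {1: "id", 3: "age"})
new_positional = positional_decode(old_positional, ["id", "age"])

print(new_tagged)      # unknown field 2 is skipped
print(new_positional)  # "age" silently picks up the old "legacy" value
```

The tagged reader recovers `age` correctly because each value carries its field number, while the positional reader has no way to notice that the second slot no longer means what its schema says.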
Scott makes a bunch of provocative declarations in his post, but I think many of them betray a lack of the background needed to understand the tradeoffs involved. As illustrated above, Protocol Buffers makes a number of design affordances for compactness on the wire that XDR does not. He also believes Google's RPC system to be "unarguably shitty" even though it has never been open sourced due to dependency issues (what is open sourced as part of Protocol Buffers is a shim; gRPC is the future here). His impression of why Facebook built Thrift is similarly misinformed, as Protocol Buffers was not open source when Thrift was written. |
RFC1832 has a rationale for decision 2 towards the end of the document. Here is an excerpt.
Most RISC architectures of the time did not support unaligned memory accesses.

Decision 3 is a matter of timing. RFC1014 (https://tools.ietf.org/html/rfc1014) was released in 1987. Unicode was still a work in progress.
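
As a rough illustration of the alignment argument behind decision 2, the Python sketch below encodes three byte strings as XDR variable-length opaque data (4-byte length, data, zero padding to a multiple of 4) and, for comparison, with a varint length prefix in the style of Protocol Buffers. The sample payloads and helper names are made up for the example, and the varint variant omits the field tag a real Protocol Buffers message would carry.

```python
import struct

def xdr_opaque(data):
    """XDR variable-length opaque: 4-byte big-endian length, data, zero pad to 4."""
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad

def varint(n):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)

def varint_prefixed(data):
    """Varint length followed by the raw bytes, with no padding."""
    return varint(len(data)) + data

def start_offsets(chunks):
    """Byte offset at which each encoded chunk begins in the concatenated stream."""
    offsets, pos = [], 0
    for chunk in chunks:
        offsets.append(pos)
        pos += len(chunk)
    return offsets

items = [b"a", b"hello", b"12345678"]  # hypothetical payloads

xdr_chunks = [xdr_opaque(i) for i in items]
compact_chunks = [varint_prefixed(i) for i in items]

xdr_stream = b"".join(xdr_chunks)
compact_stream = b"".join(compact_chunks)

xdr_offsets = start_offsets(xdr_chunks)
compact_offsets = start_offsets(compact_chunks)

print(len(xdr_stream), xdr_offsets)          # every item starts on a 4-byte boundary
print(len(compact_stream), compact_offsets)  # smaller, but items start at arbitrary offsets
```

Every XDR item begins at an offset divisible by 4, so a reader on a machine that cannot do unaligned loads can read each length word directly; the varint form is roughly half the size here but gives up that guarantee.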
Edited to fix formatting of the excerpt and a typo.