What do you mean by "they both have self-describing schemas"? In order to read or write Avro data, an application needs to possess a schema for that data: the specific schema the data was written with and, when writing, the same schema that a later reader expects to find. This means the data is not self-describing.

Ion is designed to be self-describing, meaning that no schema is necessary to deserialize and interact with Ion structures. It's consequently possible to interact with Ion in a dynamic and reflective way, in the same way you can with JSON and XML. For example, it's possible to write a pretty-printer for a binary Ion structure coming off the wire without having any idea of, or schema for, what's inside. Ion's advantage over those formats is that it's strongly typed (or richly typed, if you prefer): Ion has types for timestamps and arbitrary-precision decimals (useful for currency), can embed binary data directly without base64 encoding, and so on.

I wouldn't try to say that one or the other is better across the board. Rather, they have tradeoffs and relative strengths in different circumstances. Ion is designed in part for scenarios where your data might live a really long time and needs to be comprehensible decades from now (whether or not you kept track of the schema, or remember which one it was), or where it needs to be comprehensible in a large distributed environment in which not every application has the latest schema, or where coordinating a single compile-time schema is a challenge (maybe each app only cares about part of the data). Ion is well suited to long-lived, document-type data that's stored at rest and interacted with in a variety of potentially complex ways over time. Data data.
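To make the distinction concrete, here is a minimal Python sketch contrasting the two approaches. It assumes a toy format invented for illustration: the one-byte tags and the functions `encode_tagged`, `decode_tagged`, `encode_positional` and `decode_positional` are hypothetical and do not reproduce Ion's or Avro's actual binary encodings. In the tagged encoding every value carries its own type and length, so a generic reader can walk the bytes with no schema; the positional encoding stores only values in field order, so the reader must be handed the writer's schema.

```python
import struct
from decimal import Decimal

# Toy one-byte type tags. These are hypothetical and are NOT Ion's real
# binary type codes; they only illustrate "every value carries its type".
TAG_INT, TAG_STR, TAG_DEC, TAG_BLOB, TAG_STRUCT = range(5)


def encode_tagged(value) -> bytes:
    """Self-describing encoding: each value is prefixed with a type tag."""
    if isinstance(value, int):
        return struct.pack(">Bq", TAG_INT, value)  # 64-bit signed int
    if isinstance(value, dict):
        out = struct.pack(">BI", TAG_STRUCT, len(value))
        for key, item in value.items():
            out += encode_tagged(key) + encode_tagged(item)
        return out
    if isinstance(value, str):
        tag, payload = TAG_STR, value.encode("utf-8")
    elif isinstance(value, Decimal):
        # Stored as exact decimal text, so no binary floating-point rounding.
        tag, payload = TAG_DEC, str(value).encode("ascii")
    elif isinstance(value, bytes):
        tag, payload = TAG_BLOB, value  # raw bytes, no base64 step
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    return struct.pack(">BI", tag, len(payload)) + payload


def decode_tagged(buf: bytes, pos: int = 0):
    """Decode one value starting at pos; no schema is needed."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_INT:
        return struct.unpack_from(">q", buf, pos)[0], pos + 8
    if tag == TAG_STRUCT:
        (count,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        result = {}
        for _ in range(count):
            key, pos = decode_tagged(buf, pos)
            result[key], pos = decode_tagged(buf, pos)
        return result, pos
    (length,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    payload = bytes(buf[pos:pos + length])
    pos += length
    if tag == TAG_STR:
        return payload.decode("utf-8"), pos
    if tag == TAG_DEC:
        return Decimal(payload.decode("ascii")), pos
    if tag == TAG_BLOB:
        return payload, pos
    raise ValueError(f"unknown tag {tag}")


def encode_positional(record: dict, schema: list) -> bytes:
    """Schema-based encoding: values only, in schema order, no type tags."""
    out = b""
    for name, kind in schema:
        if kind == "long":
            out += struct.pack(">q", record[name])
        elif kind == "string":
            data = record[name].encode("utf-8")
            out += struct.pack(">I", len(data)) + data
        else:
            raise ValueError(f"unsupported field type: {kind}")
    return out


def decode_positional(buf: bytes, schema: list) -> dict:
    """Without the writer's schema these bytes cannot be interpreted."""
    record, pos = {}, 0
    for name, kind in schema:
        if kind == "long":
            record[name] = struct.unpack_from(">q", buf, pos)[0]
            pos += 8
        elif kind == "string":
            (length,) = struct.unpack_from(">I", buf, pos)
            pos += 4
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported field type: {kind}")
    return record
```

For example, `decode_tagged(encode_tagged({"price": Decimal("19.99")}))` returns the original dict (with an exact `Decimal`) plus the end offset, while `decode_positional` cannot even tell where one field ends without the `schema` list. The tradeoff is size: the tagged form spends bytes on tags and field names in every record.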
For a simple RPC relationship between a single client and a single service, where the exchanged data is ephemeral and it's easy to definitively coordinate a schema across both applications, a typical serialization framework is a fine choice.
"Avro data is always serialized with its schema. Files that store Avro data should always also include the schema for that data in the same file. Avro-based remote procedure call (RPC) systems must also guarantee that remote recipients of data have a copy of the schema used to write that data."
https://avro.apache.org/docs/current/spec.html#Data+Serializ...
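The quoted passage describes how Avro avoids needing an out-of-band schema: the writer's schema travels with the data. Below is a minimal Python sketch of that idea, assuming a hypothetical toy container (the `TOY1` magic bytes, `write_container` and `read_container` are invented for illustration and are not Avro's actual file layout or API). The schema is stored once as JSON in the header, and the records after it carry no type tags. Only `"long"` and `"string"` fields are handled.

```python
import json
import struct

# Hypothetical magic bytes for this toy container. Real Avro object container
# files use their own header, a metadata map, sync markers and data blocks,
# none of which are reproduced here.
MAGIC = b"TOY1"


def _write_value(out: bytearray, kind: str, value) -> None:
    """Append one schema-less value; the type comes from the schema."""
    if kind == "long":
        out.extend(struct.pack(">q", value))
    elif kind == "string":
        data = value.encode("utf-8")
        out.extend(struct.pack(">I", len(data)))
        out.extend(data)
    else:
        raise ValueError(f"unsupported field type: {kind}")


def write_container(records: list, schema: dict) -> bytes:
    """Write the schema once in the header, then schema-less records."""
    schema_bytes = json.dumps(schema).encode("utf-8")
    out = bytearray(MAGIC)
    out.extend(struct.pack(">I", len(schema_bytes)))
    out.extend(schema_bytes)
    out.extend(struct.pack(">I", len(records)))
    for record in records:
        for field in schema["fields"]:
            _write_value(out, field["type"], record[field["name"]])
    return bytes(out)


def read_container(buf: bytes):
    """Recover the schema from the header, then use it to decode records."""
    if buf[:4] != MAGIC:
        raise ValueError("not a toy container file")
    pos = 4
    (schema_len,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    schema = json.loads(buf[pos:pos + schema_len].decode("utf-8"))
    pos += schema_len
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    records = []
    for _ in range(count):
        record = {}
        for field in schema["fields"]:
            if field["type"] == "long":
                record[field["name"]] = struct.unpack_from(">q", buf, pos)[0]
                pos += 8
            else:  # "string", the only other type the writer accepts
                (length,) = struct.unpack_from(">I", buf, pos)
                pos += 4
                record[field["name"]] = buf[pos:pos + length].decode("utf-8")
                pos += length
        records.append(record)
    return schema, records
```

In this sketch the file as a whole can be read without an external schema, while an individual record's bytes, cut out of the file on their own, still cannot be interpreted without the header.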