Hacker News
by Mordak 2514 days ago
At my work we recently went through a large exercise to decide on a common data storage format. The contenders were JSON, MessagePack, and Avro. MessagePack won because:

- Msgpack serialization and deserialization are very fast in many languages - often 100x faster than JSON

- Msgpack natively supports encoding binary data

- Msgpack has type extensions, making it trivial to represent common types in an efficient way (e.g. IPv4 addresses, timestamps)

- Msgpack has good libraries available in many languages

If you do not care about those things (no binary data, no need for extended types, not performance critical) then JSON is just fine.
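
To make the binary-data and extension-type points concrete, here is a rough sketch using only the Python standard library. It hand-encodes two MessagePack values ("bin 8" and "fixext 4") rather than calling a real msgpack library, and the extension type code `EXT_IPV4 = 1` is a made-up, application-defined value, not something from our actual setup.

```python
import base64
import ipaddress
import json

# Hypothetical application-defined extension type code for IPv4 addresses.
# MessagePack leaves codes 0..127 to the application; 1 is an assumption.
EXT_IPV4 = 1


def msgpack_bin8(payload: bytes) -> bytes:
    """Encode up to 255 bytes as a MessagePack 'bin 8' value: 0xc4, length, data."""
    if len(payload) > 0xFF:
        raise ValueError("bin 8 holds at most 255 bytes")
    return bytes([0xC4, len(payload)]) + payload


def msgpack_ipv4(addr: str) -> bytes:
    """Encode an IPv4 address as a 'fixext 4' value: 0xd6, type code, 4 data bytes."""
    return bytes([0xD6, EXT_IPV4]) + ipaddress.IPv4Address(addr).packed


blob = bytes(range(16))

# JSON has no binary type, so the bytes have to be wrapped, here as base64 text.
as_json = json.dumps(base64.b64encode(blob).decode("ascii")).encode("ascii")
as_msgpack = msgpack_bin8(blob)

# In JSON the address is a quoted string; with an extension type it is 4 raw bytes.
ip_json = json.dumps("192.168.100.200").encode("ascii")
ip_msgpack = msgpack_ipv4("192.168.100.200")

print(len(as_json), len(as_msgpack))  # 26 18
print(len(ip_json), len(ip_msgpack))  # 17 6
```

The size difference is only part of it: the reader also gets the bytes back as bytes, with no base64 convention to agree on out of band.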

1 comments

I'm curious why you didn't consider FlatBuffers as well.
FlatBuffers are not self-describing.

FlatBuffers, Protobuf, Cap'n Proto, etc. all require an external schema that you compile into generated code and include in your program. Without it, it is impossible to make sense of the data. In our case, the data is semi-structured and changes frequently. The prospect of maintaining a schema registry for all the data users, and keeping everyone up to date and backwards compatible, was enough of a burden that these formats were excluded.
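
A toy illustration of the difference, assuming nothing about how FlatBuffers or Protobuf actually lay out bytes: a `struct` format string stands in for the compiled schema, and JSON stands in for a self-describing format. The record and the `SCHEMA_V1` name are hypothetical.

```python
import json
import struct

record = {"sensor_id": 7, "temperature": 21.5}

# Schema-based: the layout lives outside the data. This format string is a
# stand-in for a compiled schema (little-endian uint32, then float32).
SCHEMA_V1 = "<If"
packed = struct.pack(SCHEMA_V1, record["sensor_id"], record["temperature"])

# A reader holding the right schema recovers the values.
sensor_id, temperature = struct.unpack(SCHEMA_V1, packed)

# A reader holding a stale or wrong schema gets no error, just wrong values.
wrong_a, wrong_b = struct.unpack("<fI", packed)

# Self-describing: field names and types travel with every record,
# so the reader needs no out-of-band schema.
encoded = json.dumps(record).encode("utf-8")
decoded = json.loads(encoded)

print(sensor_id, temperature)
print(wrong_a, wrong_b)
print(decoded)
```

The packed form is smaller, but every reader has to be shipped the matching schema, which is exactly the coordination cost we wanted to avoid.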

Avro also uses schemas, but since the schema is embedded in the file, it is self-describing, so the reader does not need to do anything special to interpret the data. However, Avro's C library is buggy and the Python deserialization performance was terrible, so Avro was not selected.