Hacker News new | ask | show | jobs
by emef 1504 days ago
parquet is great but it's not particularly easy to read or write. the libraries that do exist to work with it are few and far between, and those that do either have a hundred dependencies or depend on native code (e.g. libarrow). certainly an important dimension in an ideal file format should be the ease of parsing/writing it, and parquet gets an extremely low score on that front imo
2 comments

Parquet is also column-major which is great for many use cases, but bad for others, where row-major is more useful. For example, if you want to get just the first x rows.
Then you want avro
Sure, but any new format is going to have the same problems. I think you're right that implementation complexity needs to be considered, but it's not like Word or Excel files or something where you need to replicate bug for bug a format accreted over decades.

Parquet isn't trivial to parse / write but that's probably good imo. CSV is really easy to write, and... that just means everybody does it slightly differently. Being somewhat difficult to interact with encourages people to use a library to centralize a bit, but it's not so complex that someone motivated couldn't write a new implementation in a reasonable amount of time.