by largbae 2 hours ago
This could use a bit more "why".

Shortcomings of Parquet are mentioned as overcome by this, but which ones? Certainly not wide tool support...

Why should one leave Parquet or ORC for this structure?

4 comments

The ‘why’ is referenced in the bibliography at the end of the readme; this repo is not meant to be consumed standalone. Start with the paper instead:

https://doi.org/10.1145/3749163

I also had no idea what they were talking about, but there are good points about how Parquet is hardware-oblivious and somewhat global in how it handles metadata.

I found this post interesting:

- https://medium.com/@reliabledataengineering/f3-the-future-pr...

Yeah, it seems like most of this could be handled by putting some more dev hours into Parquet.
The paper mentions Parquet, ORC, Nimble, Lance, TSFile, Bullion, and BtrBlocks.