by didip 1968 days ago
Question, doesn't Parquet already do that?
2 comments

From https://arrow.apache.org/faq/: "Parquet files cannot be directly operated on but must be decoded in large chunks... Arrow is an in-memory format meant for direct and efficient use for computational purposes. Arrow data is... laid out in natural format for the CPU, so that data can be accessed at arbitrary places at full speed."
Yes. But Parquet is now based on Apache Arrow.
Parquet is not based on Arrow. The Parquet libraries are built into Arrow, but the two projects are separate and Arrow is not a dependency of Parquet.
Arrow has definitely influenced the design of Parquet; they're meant to complement each other.
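To make the FAQ quote above concrete, here is a rough, stdlib-only analogy of the two access patterns. It uses neither the real Parquet format nor the real Arrow layout: `zlib` plus `struct` stand in for an encoded on-disk column chunk, and `array.array` stands in for a contiguous in-memory column. All names (`encoded_chunk`, `read_from_encoded`, `read_from_buffer`) are made up for illustration.

```python
import struct
import zlib
from array import array

values = list(range(100_000))

# Stand-in for an encoded, on-disk column chunk (NOT the real Parquet format):
# the values are packed and compressed, so element i has no fixed byte offset.
encoded_chunk = zlib.compress(struct.pack(f"<{len(values)}q", *values))


def read_from_encoded(chunk: bytes, i: int) -> int:
    """Decode the whole chunk before any single value can be read."""
    raw = zlib.decompress(chunk)
    return struct.unpack_from("<q", raw, i * 8)[0]


# Stand-in for an in-memory columnar buffer (NOT the real Arrow layout):
# fixed-width values sit contiguously, so element i lives at offset i * 8.
column = array("q", values)


def read_from_buffer(col: array, i: int) -> int:
    """Index straight into the buffer; there is no decode step."""
    return col[i]


if __name__ == "__main__":
    print(read_from_encoded(encoded_chunk, 12_345))
    print(read_from_buffer(column, 12_345))
```

The encoded form is smaller but pays a full decode for one lookup; the flat buffer is larger but supports random access directly. That trade-off is roughly why the two are described as complementary: one suits storage, the other computation.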