by stdbrouw 749 days ago
"Good enough" makes it sound like barely a step up from a CSV file. I'd say its support for various encodings [1], including a great default (dictionary + run-length encoding on the indices), compression algorithms that can be set per column, columnar access, partitioning, a parallelized reader out of the box, and in-memory filtering and other ops that run concurrently with loading the data (thanks to Arrow) are all really wonderful when working with medium-sized data.

[1] https://parquet.apache.org/docs/file-format/data-pages/encod...
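To make the "dictionary + run-length encoding on the indices" idea concrete, here is a rough pure-Python sketch. It only illustrates the concept: the function names and the sample column are made up for this example, and it does not reproduce Parquet's actual on-disk layout (which, as far as I know, uses a hybrid of RLE and bit-packing for the indices).

```python
from itertools import groupby


def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []   # distinct values, in first-seen order
    positions = {}    # value -> its index in `dictionary`
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (index, run_length) pairs."""
    return [(index, sum(1 for _ in run)) for index, run in groupby(indices)]


def decode(dictionary, runs):
    """Invert both steps to recover the original column."""
    return [dictionary[index] for index, count in runs for _ in range(count)]


# Hypothetical low-cardinality column, the case where this encoding pays off.
column = ["DE", "DE", "DE", "FR", "FR", "DE", "US", "US", "US", "US"]

dictionary, indices = dictionary_encode(column)
runs = run_length_encode(indices)

print(dictionary)  # ['DE', 'FR', 'US']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 4)]
```

Ten strings shrink to a three-entry dictionary plus four (index, count) pairs, which is why the default works so well on columns with few distinct values or long runs of repeats.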

1 comment

Agreed. On a scale of 1 to 10 for current technology, CSV is a 1 while Parquet is a 7. ORC is maybe a 7.2, but Parquet is far more ubiquitous than ORC (I’ve never seen ORC in prod, though my sample size is limited).

I’m sure there are more advanced formats.