by pauldix
2053 days ago
Yeah, Parquet is awesome. One of the things we really want to do here is push DataFusion (the Rust-based SQL execution engine) to work directly on Parquet files, pushing down predicates and other operations so it can operate on the data while it's still compressed. You pay such a high overhead marshalling that data into an Arrow RecordBatch. The best case is to work with the Parquet file itself and not even decompress the chunks you don't need. Of course, this assumes you're writing summary statistics as part of the metadata, which we plan to do.
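To make the chunk-skipping idea concrete, here's a rough sketch in plain Rust of how min/max statistics let a scan avoid touching row groups that can't match a predicate. This is not DataFusion's or the parquet crate's API: `ColumnStats`, `RowGroup`, `Between`, `row_group`, and `scan` are made-up names for illustration, and a `Vec<i64>` stands in for a compressed column chunk.

```rust
/// Min/max summary for one column in one row group.
/// Hypothetical type, not the parquet crate's statistics struct.
struct ColumnStats {
    min: i64,
    max: i64,
}

/// A row group: its statistics plus a stand-in for the compressed chunk.
struct RowGroup {
    stats: ColumnStats,
    values: Vec<i64>, // assumption: pretend this is compressed on disk
}

/// A pushed-down predicate of the form `lo <= column <= hi`.
struct Between {
    lo: i64,
    hi: i64,
}

impl Between {
    /// Decide from statistics alone whether the row group could hold a match.
    fn may_match(&self, stats: &ColumnStats) -> bool {
        stats.max >= self.lo && stats.min <= self.hi
    }

    /// Evaluate the predicate on a single decoded value.
    fn matches(&self, value: i64) -> bool {
        value >= self.lo && value <= self.hi
    }
}

/// Build a row group and compute its statistics at "write" time.
fn row_group(values: Vec<i64>) -> RowGroup {
    let min = *values.iter().min().expect("row group must not be empty");
    let max = *values.iter().max().expect("row group must not be empty");
    RowGroup {
        stats: ColumnStats { min, max },
        values,
    }
}

/// Scan all row groups, returning the matching values and the number of
/// row groups that actually had to be decoded.
fn scan(groups: &[RowGroup], pred: &Between) -> (Vec<i64>, usize) {
    let mut out = Vec::new();
    let mut decoded = 0;
    for group in groups {
        if !pred.may_match(&group.stats) {
            continue; // pruned by statistics: never decompressed
        }
        decoded += 1;
        out.extend(group.values.iter().copied().filter(|v| pred.matches(*v)));
    }
    (out, decoded)
}
```

With three row groups holding `[1, 5, 9]`, `[20, 25, 30]`, and `[100, 150]`, a `Between { lo: 22, hi: 40 }` predicate only needs to decode the middle group. The pruning is only as good as the statistics written into the metadata, which is why writing them matters.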
|
Improving our stats writing could yield a lot of benefits. I'll open JIRAs for this in the next few days.