by MSM
2011 days ago
EDIT: After looking into it, it seems like Spark calls both things predicate pushdowns (eliminating unnecessary row group reads via the statistics AND pushing the predicates down to the lowest possible level). You're right, I'm wrong!

>Parquet files contain min/max metadata for all columns. When possible, entire files are skipped, but this is relatively rare. This is called predicate pushdown filtering.

A nitpick, but I wouldn't call this predicate pushdown; it's partition (or segment) elimination. A predicate being pushed down potentially allows files to be skipped through this process, though.
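To make the distinction concrete, here is a rough pure-Python sketch of the two mechanisms. It is not how Spark or any Parquet reader is actually implemented: `RowGroup` and `scan` are hypothetical names, and each "row group" is just a list of integers with its min/max kept alongside.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RowGroup:
    """Hypothetical stand-in for a row group holding one integer column."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once at "write" time and stored as metadata.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan(row_groups: List[RowGroup], lower_bound: int) -> Tuple[List[int], int]:
    """Return the values >= lower_bound and how many row groups were read."""
    matched: List[int] = []
    groups_read = 0
    for rg in row_groups:
        # Elimination: the max statistic proves no row can match,
        # so the group is skipped without touching its values.
        if rg.max_value < lower_bound:
            continue
        groups_read += 1
        # Pushdown in the narrow sense: the predicate is evaluated
        # inside the scan, so only matching rows flow upward.
        matched.extend(v for v in rg.values if v >= lower_bound)
    return matched, groups_read


row_groups = [RowGroup([1, 5, 9]), RowGroup([10, 15, 20]), RowGroup([18, 25, 30])]
matched, groups_read = scan(row_groups, lower_bound=18)
print(matched, groups_read)
```

The first group is never read because its max is below the bound. The second group has to be read even though only one of its rows matches, which is why the statistics check alone is elimination rather than filtering.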