|
|
|
|
|
by FridgeSeal
720 days ago
|
|
Parquet is designed with predicate push-down in mind. Partitions are laid out on disk, and then blocks within files are laid out so that consumers can very, very easily narrow in on which files they need to read, before doing anymore IO than a list, or a small metadata read. Once you know what you are reading, many parquet/arrow libraries will support streaming reads/aggregations, so the client doesn’t need to load the whole working set in memory. |
|
For all others you'll need to download all columns you are filtering or selecting.