by davesque
751 days ago
If the parquet file includes any row group stats, then I imagine DuckDB might be able to use those to avoid scanning the entire file. It's definitely possible to request specific sections of a blob stored in S3. But I'm not familiar enough with DuckDB to know whether or not it does this.
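To make the idea concrete, here is a rough sketch of how a reader could combine byte-range requests with row group min/max stats. It is not DuckDB's actual implementation. The only Parquet facts it relies on are that a file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`; the function names and the dict layout for row groups (`min`, `max`, `offset`) are made up for illustration.

```python
import struct

MAGIC = b"PAR1"  # Parquet files end with a 4-byte footer length + this magic


def footer_range(file_size, tail):
    """Given the last 8 bytes of a file, return the footer's (start, end) byte range.

    `end` is exclusive. Hypothetical helper, for illustration only.
    """
    if len(tail) != 8 or tail[4:] != MAGIC:
        raise ValueError("not a Parquet file")
    (footer_len,) = struct.unpack("<I", tail[:4])
    end = file_size - 8
    return end - footer_len, end


def range_header(start, end):
    """Build an HTTP Range header; HTTP byte ranges are inclusive on both ends."""
    return {"Range": f"bytes={start}-{end - 1}"}


def row_groups_to_read(row_groups, lo, hi):
    """Keep only row groups whose [min, max] stats could overlap [lo, hi]."""
    return [rg for rg in row_groups if rg["max"] >= lo and rg["min"] <= hi]


# Assumed stats, as they might look after decoding the footer metadata.
groups = [
    {"min": 0, "max": 9, "offset": 4},
    {"min": 10, "max": 19, "offset": 1000},
    {"min": 20, "max": 29, "offset": 2000},
]
needed = row_groups_to_read(groups, lo=15, hi=25)
```

With a filter like `value BETWEEN 15 AND 25`, only the last two row groups need to be fetched, each with its own range request, and the first one is never downloaded.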
|
Parquet pushdowns combined with Hive structuring is a pretty good combination.
There are some HTTP and metadata caching options in DuckDB, but I haven't really figured out how or when they actually make a difference.
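For anyone unfamiliar with why the Hive layout helps: the partition values live in the object keys themselves (`key=value` directories), so whole files can be ruled out before any bytes are fetched, and the row group stats only have to do the rest. A minimal sketch of that first step, with hypothetical helper names and example paths (this is just the idea, not how DuckDB does it internally):

```python
def parse_hive_path(path):
    """Extract key=value partition segments from a Hive-style path."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts


def prune_files(paths, **wanted):
    """Keep only paths whose partition values match every requested key."""
    return [
        p
        for p in paths
        if all(parse_hive_path(p).get(k) == str(v) for k, v in wanted.items())
    ]


# Example object keys; the bucket and layout are invented for illustration.
paths = [
    "s3://bucket/events/year=2022/month=12/part-0.parquet",
    "s3://bucket/events/year=2023/month=01/part-0.parquet",
    "s3://bucket/events/year=2023/month=02/part-0.parquet",
]
selected = prune_files(paths, year=2023, month="01")
```

Here only the `year=2023/month=01` file survives, so the other two objects are never touched at all.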