by LunaSea
836 days ago
It might be getting better, but the examples are currently so egregious that it's tough to keep giving DuckDB a chance. Example of a query that should never, ever run out of memory, but absolutely will in the latest DuckDB:

  COPY
  (
    SELECT
      rs.my_int,
      rs.my_bigint
    FROM
      READ_PARQUET('s3://some/folder/my-large-files-*.parquet')
      AS rs
  )
  TO
    '/my/home/folder/my-large-file.parquet'
  (
    FORMAT PARQUET,
    ROW_GROUP_SIZE 100000,
    COMPRESSION 'ZSTD'
  )
  ;

This query should simply read the two selected columns, located via the Parquet metadata, and then stream the data to disk. And yet it will try to load the data into memory before crashing.
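
For reference, a minimal sketch of the session settings usually tried before a large export like this one. `memory_limit`, `temp_directory`, and `preserve_insertion_order` are existing DuckDB options, but the values and the spill path below are placeholders, and whether they avoid the crash for this particular query is an assumption, not something confirmed here.

```sql
-- Placeholder values: tune to the machine. The spill path is hypothetical.
SET memory_limit = '4GB';
SET temp_directory = '/my/home/folder/duckdb_tmp';

-- Allow DuckDB to emit rows without keeping the input order,
-- which reduces how much data it has to buffer during the COPY.
SET preserve_insertion_order = false;
```

With insertion order not preserved, the rows in the output file may come out in a different order than in the input files, which is usually fine for a plain two-column export.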
There were some recent fixes: https://github.com/duckdb/duckdb/issues/10737