by LunaSea 836 days ago

It might be getting better, but the examples are currently so egregious that it's tough to keep giving DuckDB a chance.

Here is an example of a query that should never, ever run out of memory, but absolutely will in the latest DuckDB:

  COPY
    (
      SELECT
        rs.my_int,
        rs.my_bigint
      FROM
        READ_PARQUET('s3://some/folder/my-large-files-*.parquet')
        AS rs
    )
  TO
    '/my/home/folder/my-large-file.parquet'
    (
      FORMAT PARQUET,
      ROW_GROUP_SIZE 100000,
      COMPRESSION 'ZSTD'
    )
  ;

This query should simply use the Parquet metadata to read only the two selected columns, then stream the data to disk.

And yet it tries to load the data into memory and then crashes.
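
For anyone hitting the same thing, here is a rough sketch of the settings usually suggested for large exports. Nothing in this thread confirms that they fix this particular case, and the limit, thread count, and spill path are placeholder values, not recommendations. They would be run in the same session, before the COPY:

  -- Assumption: 4GB is a placeholder; pick a cap below the machine's RAM.
  SET memory_limit = '4GB';
  -- Allow the writer to emit rows without preserving input order,
  -- which reduces how much data has to be buffered.
  SET preserve_insertion_order = false;
  -- Assumption: hypothetical spill directory for larger-than-memory work.
  SET temp_directory = '/my/home/folder/duckdb_tmp';
  -- Assumption: placeholder value; fewer threads means fewer
  -- row groups in flight at once.
  SET threads = 4;

Disabling insertion-order preservation is the setting most directly related to streaming here, since the order of rows across the input files does not matter for this export.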

1 comment

cmdlineluser 836 days ago

Does it fail on nightly?

There were some recent fixes: https://github.com/duckdb/duckdb/issues/10737
