by elmolino89 1692 days ago
Not really a DuckDB-Wasm question, but a DuckDB one:

I have a data set that is probably not suitable for loading into an in-memory table (a CSV with close to 1000M rows). I split it into 20M-row chunks, read them one by one into a DuckDB temporary table, and exported each one as parquet.
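
A minimal sketch of that per-chunk conversion in DuckDB SQL. The file names and the table name chunk are placeholders, and read_csv_auto is an assumed choice of reader; the exact statements used are not given here.

```sql
-- Load one CSV chunk into a temporary table
-- (chunk_000.csv is a placeholder file name).
CREATE TEMPORARY TABLE chunk AS
    SELECT * FROM read_csv_auto('chunk_000.csv');

-- Export the temporary table as a single parquet file
-- (prefix.000.parquet is a placeholder matching the prefix.*.parquet glob).
COPY chunk TO 'prefix.000.parquet' (FORMAT PARQUET);

-- Drop the table so the next chunk starts from an empty state.
DROP TABLE chunk;
```

The same three statements would be repeated for each chunk, changing only the file names.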

A SELECT over the glob prefix.*.parquet with where mycolumn=foobar does work, but it could be a bit faster. Apart from sorting the CSVs that go into the parquet files, what can be done? The CSV chunks were already sorted.
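
Spelled out, the query would look roughly like the sketch below. It assumes the filter column is a string column and foobar is a literal value; both names are placeholders.

```sql
-- Scan every parquet file matching the glob and filter on one column.
SELECT *
FROM read_parquet('prefix.*.parquet')
WHERE mycolumn = 'foobar';
```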