Hacker News
by ekzhu 1658 days ago
TLDR: Arrow got an SQL interface provided by DuckDB.

So you have a new way to run SQL on Parquet and other formats through DuckDB -> Arrow -> Parquet. Of course, you still need to watch your query's memory usage if it contains JOINs or window functions. The integration is designed for streaming rows, but those operators typically still have to hold hash tables or whole partitions in memory.


You could already run SQL on Parquet with DuckDB (even from Java). I believe it already used Arrow under the hood to read the Parquet files (I could be wrong, though), but this integration is more memory-efficient, which is great.
DuckDB has its own Parquet reader
DuckDB can read Parquet directly; however, the interesting bit is that the results of SQL queries can be returned to Python as Arrow objects for further processing (by pyarrow/pandas).