by riku_iki 962 days ago
I had the same idea, but I'm struggling to understand how to implement it efficiently.

Say I have a large table A that I want to group by. I can split it into smaller tables, for example:

for i in range(64): execute(f'create table table_{i} as select * from A where hash(id) % 64 = {i}')
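To make the cost being asked about concrete, here is a toy version of that loop in plain Python. It assumes the table is just a list of dicts (`rows`), uses Python's built-in `hash()` as a stand-in for DuckDB's `hash()`, and `partition_by_rescan` is a hypothetical name. Each of the 64 passes looks at every row, just like 64 separate `create table ... where hash(id) % 64 = i` statements would if they each did a full scan.

```python
def partition_by_rescan(rows, n_parts=64):
    """Build n_parts buckets using one full pass over `rows` per bucket.

    Mirrors running one CREATE TABLE ... WHERE hash(id) % n_parts = i
    query per bucket. Python's hash() stands in for DuckDB's hash(),
    so bucket numbers will not match what DuckDB would produce."""
    parts = []
    rows_examined = 0
    for i in range(n_parts):
        bucket = []
        for row in rows:  # full pass over the input for every bucket
            rows_examined += 1
            if hash(row["id"]) % n_parts == i:
                bucket.append(row)
        parts.append(bucket)
    return parts, rows_examined


rows = [{"id": k, "value": k * 10} for k in range(1000)]
parts, examined = partition_by_rescan(rows)
print(len(parts), examined)  # 64 64000
```

With 1000 rows and 64 buckets, that's 64,000 row visits to produce 1000 rows of output.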

But this would mean scanning the source table 64 times, which is likely inefficient. Does DuckDB have some functionality for chunking more efficiently?
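For comparison, the general single-pass alternative is to compute each row's bucket once and append the row to that bucket as you go. This is a plain-Python sketch of the idea, not a DuckDB feature; `partition_single_pass` is a hypothetical name, and Python's `hash()` again stands in for DuckDB's. (DuckDB's documentation also describes partitioned writes via `COPY ... TO ... (FORMAT PARQUET, PARTITION_BY (...))`, which may be worth checking, but that isn't what this code shows.)

```python
from collections import defaultdict


def partition_single_pass(rows, n_parts=64):
    """Assign every row to its hash bucket in a single pass over `rows`.

    Python's hash() stands in for DuckDB's hash(); bucket numbers
    will not match DuckDB's."""
    buckets = defaultdict(list)
    rows_examined = 0
    for row in rows:  # each row is visited exactly once
        rows_examined += 1
        buckets[hash(row["id"]) % n_parts].append(row)
    # Return a dense list so empty buckets still show up.
    return [buckets[i] for i in range(n_parts)], rows_examined


rows = [{"id": k, "value": k * 10} for k in range(1000)]
parts, examined = partition_single_pass(rows)
print(len(parts), examined)  # 64 1000
```

The trade-off is that all 64 buckets are being filled at once, so in a real system they'd have to be buffered or streamed to 64 outputs concurrently rather than written one at a time.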


It's not doing a full scan of the entire source table 64 times; it's only scanning the id column (DuckDB + Parquet is columnar storage) and then retrieving the corresponding rows from the other columns. It's pretty fast.
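A toy illustration of that point, assuming a column-oriented table represented as a dict of equal-length Python lists. This is not DuckDB's actual engine, and `columns` and `select_partition` are hypothetical names. Selecting one partition scans the whole `id` column to find matching row positions, then reads the other columns only at those positions.

```python
def select_partition(columns, i, n_parts=64):
    """Return partition i of a column-oriented table.

    Only columns["id"] is scanned in full; every other column is
    read only at the positions whose id hashes to bucket i.
    Python's hash() stands in for DuckDB's hash()."""
    positions = [
        pos for pos, id_ in enumerate(columns["id"]) if hash(id_) % n_parts == i
    ]
    return {name: [col[pos] for pos in positions] for name, col in columns.items()}


columns = {
    "id": list(range(1000)),
    "payload": [f"row-{k}" for k in range(1000)],
}
part3 = select_partition(columns, 3)
print(part3["id"][:3], part3["payload"][:3])  # [3, 67, 131] ['row-3', 'row-67', 'row-131']
```

Running this for all 64 partitions still reads the `id` column 64 times, but each wide column is effectively read about once in total, which is why the repeated queries are cheaper than 64 full-table scans.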