|
|
|
|
|
by wenc
961 days ago
|
|
Breaking it up. Instead of running a query for the entire year, run it month by month and stitch the final results back together. Or if you have a unique string ID, calculate an integer hash using hash(ID) % 50 to get 50 chunks which you can process separately without OOMing. A basic assumption is that all the chunks are independent of each other. Chunking is essentially temporary partitioning to fit your processing limitations. |
|
Say, I have large table A, which I want to group by. I can chunk it into small tables, for example:
for i in range(64): execute('create table_%i as select * from A where hash(id) % 64 = %i')
but this would mean scanning source table 64 times which is likely not efficient, I am wondering if duckdb has some functionality which can do more efficient chunking?