by CraigJPerry
214 days ago
At 650 TB it's not a memory-bound problem. Working memory requirements:

1. Assume a date is 8 bytes
2. Assume 64-bit counters

So for each date in the dataset we need 16 bytes to accumulate the result. That's roughly 180,000 years' worth of daily post counts per GiB of RAM, but the dataset in the post was just 1 year.

This problem should be mostly network limited in the OP's context; decompressing Snappy-compressed Parquet should be circa 1 GB/sec. The "work" of parsing a string to a date and accumulating isn't expensive compared to Snappy decompression.

I don't have a handle on the 33% longer runtime difference between DuckDB and Polars here.
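
For anyone who wants to check the arithmetic, here is a rough Python sketch of the estimate. It takes the 8-byte date and 64-bit counter from the assumptions above and ignores hash-table overhead. The helper names are made up for illustration, and the scan-time function assumes that the full 650 TB has to be decompressed and that the ~1 GB/sec figure applies per parallel stream, neither of which is confirmed here.

```python
# Back-of-envelope check of the working-memory estimate.
# Assumptions taken from the comment: 8-byte date key, 8-byte (64-bit) counter.
DATE_KEY_BYTES = 8
COUNTER_BYTES = 8
ENTRY_BYTES = DATE_KEY_BYTES + COUNTER_BYTES  # 16 bytes per distinct date

GIB = 2**30
DAYS_PER_YEAR = 365.25
SECONDS_PER_DAY = 86_400


def dates_per_gib() -> int:
    """Number of (date, counter) entries that fit in one GiB."""
    return GIB // ENTRY_BYTES


def years_per_gib() -> float:
    """Years of daily counts that fit in one GiB."""
    return dates_per_gib() / DAYS_PER_YEAR


def bytes_for_days(days: int) -> int:
    """Accumulator size in bytes for a given number of distinct dates."""
    return days * ENTRY_BYTES


def scan_days(total_bytes: float, bytes_per_sec: float, streams: int = 1) -> float:
    """Days needed to decompress total_bytes at bytes_per_sec per stream.

    Hypothetical helper: assumes throughput scales linearly with streams.
    """
    return total_bytes / (bytes_per_sec * streams) / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"entries per GiB:      {dates_per_gib():,}")
    print(f"years per GiB:        {years_per_gib():,.0f}")
    print(f"bytes for one year:   {bytes_for_days(365):,}")
    print(f"650 TB, 1 stream:     {scan_days(650e12, 1e9):.1f} days")
    print(f"650 TB, 100 streams:  {scan_days(650e12, 1e9, 100) * 24:.1f} hours")
```

One year of daily counts comes to under 6 KB of accumulator state, so what matters is how fast the bytes can be moved and decompressed, not how much RAM is available.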