Hacker News
by LatexWriter 682 days ago
Your article does not mention how much runtime improvement you have observed. Can you share those numbers?

With the two-pass strategy, we can write row groups of arbitrary size using a fixed amount of memory, probably with 100-200 MiB of overhead for Parquet file processing, depending on how large the scratch file's metadata is. Without the two-pass strategy, memory use is proportional to the size of the row group.
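A minimal sketch of why a two-pass writer caps memory, under a deliberately simplified model and not the author's implementation. The assumptions are that values are raw int64s instead of encoded, compressed Parquet pages, that one temporary file per column stands in for the scratch Parquet file, and that the names `write_row_group_one_pass`, `write_row_group_two_pass` and `chunk_rows` are hypothetical. Pass 1 spills incoming rows in small batches to the scratch files. Pass 2 copies each column back out contiguously, the way a row group stores its column chunks. Peak memory then depends on the batch size rather than on the row group size.

```python
import io
import struct
import tempfile
from typing import BinaryIO, Iterable, Sequence

# Simplified model: every value is a little-endian int64, no encoding or compression.
VALUE = struct.Struct("<q")


def write_row_group_one_pass(rows: Iterable[Sequence[int]], num_columns: int,
                             out: BinaryIO) -> int:
    """Naive writer: buffer the entire row group, then emit it column by column.

    Returns the peak number of values held in memory at once.
    """
    columns = [[] for _ in range(num_columns)]
    for row in rows:
        for c, value in enumerate(row):
            columns[c].append(value)
    peak = sum(len(col) for col in columns)  # grows with the row group size
    for col in columns:
        for value in col:
            out.write(VALUE.pack(value))
    return peak


def write_row_group_two_pass(rows: Iterable[Sequence[int]], num_columns: int,
                             chunk_rows: int, out: BinaryIO) -> int:
    """Hypothetical two-pass writer with memory bounded by `chunk_rows`.

    Returns the peak number of values held in memory at once.
    """
    scratch = [tempfile.TemporaryFile() for _ in range(num_columns)]
    buffered = []
    peak = 0

    def flush() -> None:
        # Pass 1: spill the buffered rows, one scratch stream per column.
        for c in range(num_columns):
            scratch[c].write(b"".join(VALUE.pack(row[c]) for row in buffered))
        buffered.clear()

    try:
        for row in rows:
            buffered.append(tuple(row))
            peak = max(peak, len(buffered) * num_columns)
            if len(buffered) == chunk_rows:
                flush()
        flush()
        # Pass 2: stream each column back in fixed-size blocks so the output
        # holds every column contiguously, as a row group does.
        for f in scratch:
            f.seek(0)
            while block := f.read(chunk_rows * VALUE.size):
                peak = max(peak, len(block) // VALUE.size)
                out.write(block)
    finally:
        for f in scratch:
            f.close()
    return peak


rows = [(i, i * 2, -i) for i in range(10_000)]
naive, two_pass = io.BytesIO(), io.BytesIO()
print(write_row_group_one_pass(rows, 3, naive))           # 30000
print(write_row_group_two_pass(rows, 3, 256, two_pass))   # 768
print(naive.getvalue() == two_pass.getvalue())            # True
```

The naive writer holds all 30,000 values at once, while the two-pass version never holds more than 768 (256 rows × 3 columns), and both produce the same column-major bytes. The cost is extra I/O, because every value is written to scratch and read back once more.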