Hacker News
Memory Efficient Data Streaming to Parquet Files (estuary.dev)
28 points by danthelion 689 days ago
1 comments

Your article does not mention how much runtime improvement you observed. Can you share those numbers?
With the 2-pass strategy, we can write arbitrary row group sizes while using a fixed amount of memory, with probably 100-200 MiB of overhead for the Parquet file processing, depending on how large the metadata is for the scratch file. Without the 2-pass strategy, the amount of memory is proportional to the size of the row group.
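
For anyone who wants the shape of the idea, here is a rough toy sketch in plain Python. It is not real Parquet and not the code from the article. The function name `two_pass_write`, the JSON-lines scratch format, and the text output layout are all made up for illustration. It assumes the two passes work roughly like this: pass 1 spills rows to a scratch file in small batches, and pass 2 streams one column at a time from the scratch file into a single large column-major output, so memory is bounded by the batch size rather than by the final row group.

```python
import json
import os
import tempfile


def two_pass_write(rows, columns, out_path, batch_size=2):
    """Toy model of a 2-pass columnar write (hypothetical, not real Parquet)."""
    # Pass 1: spill incoming rows to a scratch file in small batches,
    # standing in for small row groups. Only one batch is held in memory.
    scratch = tempfile.NamedTemporaryFile("w+", suffix=".scratch", delete=False)
    try:
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) == batch_size:
                scratch.write(json.dumps(batch) + "\n")
                batch = []
        if batch:
            scratch.write(json.dumps(batch) + "\n")

        # Pass 2: for each column, re-read the scratch file batch by batch and
        # append that column's values to the output. The output ends up as one
        # large column-major group, but only one batch is ever decoded at once.
        with open(out_path, "w") as out:
            for col in columns:
                scratch.seek(0)
                out.write(col + ":")
                for line in scratch:
                    for row in json.loads(line):
                        out.write(" " + json.dumps(row[col]))
                out.write("\n")
    finally:
        scratch.close()
        os.unlink(scratch.name)
```

One difference worth keeping in mind: this toy re-reads the whole scratch file once per column, whereas a real columnar scratch file can be read one column chunk at a time using its metadata, which is presumably where the metadata-dependent overhead mentioned above comes from.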