by RobinL
1261 days ago
Yes - save to Parquet. From the OP:

"Why not just persist the data to disk in Arrow format, and thus have a single, cross-language data format that is the same on-disk and in-memory? One of the biggest reasons is that Parquet generally produces smaller data files, which is more desirable if you are IO-bound. This will especially be the case if you are loading data from cloud storage such as AWS S3. Julien Le Dem explains this further in a blog post discussing the two formats:

> The trade-offs for columnar data are different for in-memory. For data on disk, usually IO dominates latency, which can be addressed with aggressive compression, at the cost of CPU. In memory, access is much faster and we want to optimise for CPU throughput by paying attention to cache locality, pipelining, and SIMD instructions.

https://www.kdnuggets.com/2017/02/apache-arrow-parquet-colum..."