by RobinL
1262 days ago
I'm not an expert in the nuts and bolts of Arrow, but I think you have two options:

- Save to feather format. Feather format is essentially the same thing as the Arrow in-memory format. This is uncompressed, so if you have super fast IO, it'll read back to memory faster, or at least with minimal CPU usage.
- Save to compressed parquet format. Because you're often IO bound, not CPU bound, this may read back to memory faster, at the expense of the CPU usage of decompressing.

On a modern machine with a fast SSD, I'm not sure which would be faster. If you're saving to remote blob storage (e.g. S3), parquet will almost certainly be faster.

See also https://news.ycombinator.com/item?id=34324649