Hacker News new | ask | show | jobs
by mhall119 2053 days ago
Parquet is the persistent storage format for when the data is written to disk or object storage. Arrow is the in-memory format and a set of tools built around working with it.

So InfluxDB IOx will use both, Arrow in-memory for fast access, and Parquet on storage for persistence.

1 comments

Ok, I didn't realize that Arrow was in-memory and not also on-disk. As serializing between the two didn't make sense to me. I would have thought that Arrow would also be an on-disk format (mmap'd) so that there would be little to no conversion losses.

Why convert to an on-disk format (Parquet) and not save the in-memory representation to storage directly?