Hacker News
by TechBro8615 1875 days ago
> unless "disk" is super fast and thus more likely memory, and your data is ephemeral, you probably shouldn't

Can you elaborate on why Arrow is not a good format for storing data on disk? If you're already using it for in-memory querying, why wouldn't you serialize it directly to disk instead of converting to some intermediary format?

Stability: the format is still evolving.

Performance: Arrow does not do significant compression. Feather has started adding it, but that adds even more change risk. Parquet, ORC, and Arrow are all fairly similar columnar formats, so until Arrow catches up and stabilizes, I'd stick with Parquet/ORC. We do GPU work and already get in-GPU decompression, so that's been a win/win.
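To make the compression point concrete, here is a rough stdlib-only Python illustration. It is an assumption-laden sketch, not Arrow's or Parquet's real on-disk layout: the "raw" buffer packs int64 values contiguously, loosely like an uncompressed fixed-width Arrow column (validity bitmap omitted), and `zlib` stands in for the general-purpose codecs Parquet/ORC apply. Real Parquet also uses dictionary and run-length encodings, so it would typically do better than this. The status-code column and the helper names are hypothetical.

```python
import array
import random
import zlib


def raw_column_bytes(values):
    """Pack int64 values into one contiguous buffer, roughly like an
    uncompressed fixed-width Arrow column (no validity bitmap here)."""
    return array.array("q", values).tobytes()


def compressed_column_bytes(values, level=6):
    """The same buffer run through zlib, used here only as a stand-in for
    the codecs (e.g. Snappy, ZSTD) that Parquet/ORC apply on disk."""
    return zlib.compress(raw_column_bytes(values), level)


# Hypothetical low-cardinality column: HTTP status codes drawn from a small set.
rng = random.Random(42)
status_codes = [rng.choice([200, 200, 200, 404, 500]) for _ in range(100_000)]

raw = raw_column_bytes(status_codes)
packed = compressed_column_bytes(status_codes)

print(f"raw buffer: {len(raw):,} bytes")
print(f"compressed: {len(packed):,} bytes ({len(raw) / len(packed):.1f}x smaller)")
```

Repetitive, low-cardinality columns like this are common in analytics data, which is why the on-disk size gap between an uncompressed columnar dump and a compressed one can be large.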