by mempko 1261 days ago
I opted to store the data as Feather for one particular reason: you can open it with mmap and randomly index the data without having to load it all into memory. Also, the data I have isn't very compressible to begin with, so the CPU cost of Parquet isn't worth the space savings. This only makes sense in that narrow use case.
2 comments

I'm doing the same. It's also quite nice for de-duplication: a lot of operations on our data happen on a column basis, and we need to assemble tables that are basically the same except for one or two computed columns. I usually store each column in its own file and assemble tables on the fly, also memory-mapped. Quite happy with being able to do that. Not sure how easy that would be with Parquet.
As someone new to Arrow/columnar DBs, do you mind sharing what kind of data makes sense to use Arrow for, but isn't very compressible?