Hacker News
by andenacitelli 795 days ago
Can we all just converge on Parquet + Arrow and call it a day, please? Too much effort is going into 1..N ways of solving the same problem when it would be better spent on a single standard.

We work with Parquet + Arrow every day at $DAYJOB in an ML and Big Data context and it's been great. We don't even think we're using it to its fullest potential, but it's never been the bottleneck for us.


How is the data schema description language btw? I haven't used either yet.
Haven't used it directly myself. We mostly just use it for DataFrame crunching via pandas and/or polars (our usage is mixed), which tends to benefit nicely from columnar access.
Check out DFLib (https://dflib.org)