Hacker News
by tomkwong 2075 days ago
Storing larger data sets in CSV format is a recipe for disaster. As an industry, we should really come together on a standard binary format for data exchange. Maybe Arrow?
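One concrete failure mode behind this complaint is that CSV carries no types. The minimal sketch below uses Python's built-in `csv` module. After a round trip, every value comes back as a string and a missing value comes back as an empty string, so the reader has to guess or re-apply the schema. The sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample rows: an int, a float, a missing value (None), and text with a comma.
rows = [
    {"id": 1, "price": 9.99, "note": "ok"},
    {"id": 2, "price": None, "note": "contains, a comma"},
]

def csv_round_trip(records):
    """Write records to an in-memory CSV buffer and read them back."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)  # None is written as an empty field
    buf.seek(0)
    return list(csv.DictReader(buf))  # every field is read back as str

restored = csv_round_trip(rows)
print(restored[0])  # {'id': '1', 'price': '9.99', 'note': 'ok'}
print(restored[1])  # {'id': '2', 'price': '', 'note': 'contains, a comma'}
```

Quoting is handled correctly here. What is lost is the type information: `1` vs `"1"`, and `None` vs `""`.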

Arrow is designed for in-memory processing. It can be saved to disk so you can open it directly (memory-mapped), but it's not a great storage format. Parquet or ORC are better choices, but they don't have as much tooling for import/export. CSV is just the simplest way to transfer data.

You might be interested in DuckDB, though, which is trying to create a new standard for passing datasets: https://duckdb.org/

Why not just pass around SQLite databases?
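Here is a minimal sketch of that suggestion, using Python's built-in `sqlite3` module. The producer writes one typed table into a single file, and the consumer reads it back with integers, floats, and NULLs intact. The table and column names (`measurements`, `id`, `price`, `note`) are hypothetical. A real exchange would also need both sides to agree on the schema.

```python
import os
import sqlite3
import tempfile

def export_dataset(path, rows):
    """Write rows into a single-file SQLite database with a typed schema."""
    con = sqlite3.connect(path)
    with con:  # the connection context manager commits on success
        con.execute(
            "CREATE TABLE measurements (id INTEGER PRIMARY KEY, price REAL, note TEXT)"
        )
        con.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    con.close()

def import_dataset(path):
    """Read the rows back; SQLite returns int, float, str, and None as stored."""
    con = sqlite3.connect(path)
    try:
        return con.execute(
            "SELECT id, price, note FROM measurements ORDER BY id"
        ).fetchall()
    finally:
        con.close()

rows = [(1, 9.99, "ok"), (2, None, "contains, a comma")]
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "dataset.sqlite")  # hypothetical file name
    export_dataset(db_path, rows)
    restored = import_dataset(db_path)

print(restored)  # [(1, 9.99, 'ok'), (2, None, 'contains, a comma')]
```

Unlike a CSV round trip, `None` stays `None` and numbers stay numbers. The trade-off is a row-oriented file rather than a columnar one like Parquet.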