Storing larger data sets in CSV format is a recipe for disaster. As a tech industry, we should really come together with a standard binary format for data exchange. Maybe Arrow?
Arrow is designed for in-memory processing. It can be saved to disk so you can open it directly (memory map), but it's not a great storage format. Parquet or ORC are better choices, but they don't have as much tooling for import/export. CSV is just the simplest way to transfer data.
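To make the CSV trade-off concrete, here is a rough sketch using only the Python standard library. It shows the main thing a typed binary format would buy you: a CSV round trip carries no schema, so numbers come back as strings and a missing value becomes indistinguishable from an empty string. The sample rows and the `csv_round_trip` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: an int, a float, a missing value, and text containing a comma.
rows = [
    {"id": 1, "price": 9.5, "note": "a,b"},
    {"id": 2, "price": None, "note": ""},
]


def csv_round_trip(records):
    """Write records to an in-memory CSV and read them straight back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    buf.seek(0)
    return list(csv.DictReader(buf))


restored = csv_round_trip(rows)

for before, after in zip(rows, restored):
    print(before, "->", after)
```

Quoting keeps the embedded comma intact, so the text itself survives. What gets lost is the type information: every reader has to guess it again, which is where most of the pain with larger CSV data sets comes from.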
You might be interested in DuckDB though, which is trying to create a new standard for passing datasets: https://duckdb.org/