|
|
|
|
|
by rkaveland
822 days ago
|
|
Author here. I see now that the title is too controversial, I should have toned that down. As I mention in the conclusion, if you're giving parquet files to your user and all they want to know is how to turn it into Excel/CSV, you should just give them Excel/CSV. It is, after all, what end users often want. I'm going to edit the intro to make the same point there. If you're exporting files for machine consumption, please consider using something more robust than CSV. |
|
People are busy; instead of hinting "something more robust than CSV", mention the alternatives and show a comparison (load time/search time/compression ratio) summary graph. (Where is the knee of the curve?)
There's also an implicit assumption to each use-case about whether the data can/should fit in memory or not, and how much RAM a typical machine would have.
As you mention, it's pretty standard to store and access compressed CSV files as .csv.zip or .csv.gz, which mitigates at least trading off the space issue for a performance overhead when extracting or searching.
The historical reason a standard like CSV became so entrenched with business, financial and legal sectors is the same as other enterprise computing; it's not that users are ignorant; it's vendor and OS lock-in. Is there any tool/package that dynamically switches between formats internally? estimates comparative file sizes before writing? ("I see you're trying to write a 50Gb XLSX file...") estimates read time when opening a file? etc. Those sort of things seem worth mentioning.