Hacker News
by nuc1e0n 1039 days ago
Large CSV files do occur 'in the wild'. Whether they should or not is beside the point. Sometimes CSV is the only option for importing or exporting data from ancient 'Enterprise' horror systems, purely because it was easy for the original developers to implement. Excel's CSV support has been shown not to be fit for purpose, as one of the other commenters here points out.

I'd not heard of Parquet before today, but a cursory glance reveals it to be a stupid format. It's sold as 'smaller than CSV', but size isn't the problem CSV solves. The point of CSV is that it's trivial to output or read data. With Parquet it's not.
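
To illustrate the "trivial to output or read" point, here is a minimal sketch using only Python's standard library `csv` module. The helper names `rows_to_csv` and `csv_to_rows` and the sample rows are made up for the example; it only shows that a round trip needs no third-party dependency, and says nothing about how Parquet tooling compares.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize a list of rows (lists of strings) to CSV text."""
    buf = io.StringIO(newline="")  # let the csv module control line endings
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def csv_to_rows(text):
    """Parse CSV text back into a list of rows; every field comes back as a string."""
    return list(csv.reader(io.StringIO(text, newline="")))

# Hypothetical sample data, including a field with an embedded comma and quotes.
rows = [
    ["id", "name", "note"],
    ["1", "Ada", 'said "hi", then left'],
    ["2", "Bob", ""],
]

text = rows_to_csv(rows)
assert csv_to_rows(text) == rows  # quoting is handled by the csv module
```

Note that everything is read back as a string. CSV carries no type information, so any conversion to numbers or dates is up to whoever reads the file.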

I'd imagine if you were storing data on a server it would be better to import it into a proper database rather than storing it as a file on something like S3. Even compressing a CSV file with gzip would shrink it by a similar amount, and in a more standardized way, if that's what you really need to do.
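
Both suggestions can be sketched with the Python standard library alone. This is a rough illustration, not a benchmark: the sample data is invented and deliberately repetitive, `gzip_csv` and `load_into_sqlite` are hypothetical helper names, SQLite merely stands in for "a proper database", and no size comparison against Parquet is made or implied.

```python
import csv
import gzip
import io
import sqlite3

def gzip_csv(text):
    """Compress CSV text to gzip bytes."""
    return gzip.compress(text.encode("utf-8"))

def load_into_sqlite(gz_bytes, table="data"):
    """Decompress gzipped CSV and load it into an in-memory SQLite table.

    Assumes the first row is a header with simple column names and that
    the table name is trusted; all columns are left untyped.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return conn

# Hypothetical, highly repetitive sample: 1000 rows of id and status.
raw = "id,status\n" + "".join(f"{i},active\n" for i in range(1000))
gz = gzip_csv(raw)
conn = load_into_sqlite(gz)
row_count = conn.execute('SELECT COUNT(*) FROM "data"').fetchone()[0]

assert len(gz) < len(raw.encode("utf-8"))  # repetitive text compresses well
assert row_count == 1000
```

How much gzip saves depends entirely on the data, so real files may compress better or worse than this contrived sample.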