Hacker News
by jdthedisciple 1036 days ago
Looks great, definitely bookmarked for the next time I'm gonna work with large CSV files.

On a side note: how come MS failed so hard at solving this problem with Excel?

Considering its age and the budget behind it, it would seem like Excel should be THE solution to anything to do with tabular data by now.

2 comments

>Considering its age and the budget behind it, it would seem like Excel should be THE solution to anything to do with tabular data by now.

If they change anything about the (very broken) way Excel handles CSV files, it will break so many things. So they dare not change it. In fact, genes have been renamed because Excel is so broken and so unlikely to change:

https://www.popularmechanics.com/technology/design/a33549357...
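
For context, the damage comes from type guessing on import: gene symbols such as SEPT2 or MARCH1 look like dates to Excel. A minimal Python sketch, assuming a made-up two-column file (the column names and counts are illustrative only), suggests how a reader that leaves every field as text keeps such values intact:

```python
import csv
import io

# Hypothetical sample data; the gene symbols are the kind Excel turns into dates.
raw = "gene,count\nSEPT2,14\nMARCH1,3\n"

# csv.DictReader does no type inference: every field comes back as a str.
rows = list(csv.DictReader(io.StringIO(raw)))
genes = [row["gene"] for row in rows]
print(genes)
```

Any numeric conversion then has to be done explicitly, column by column, which is the opposite of Excel's guess-first behaviour.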

Maybe there is no real need for supporting large CSV files? Typically, large amounts of data will be stored in a database (in which case you can query with SQL), or you will be using large-data-oriented file formats like Parquet. Excel's CSV support is just good enough for 99% of real-world use cases.

Large CSV files do occur 'in the wild'. Whether they should or not is beside the point. Sometimes CSV is the only option to import or export data from ancient 'Enterprise' horror systems, purely because it was easy for the original developers to implement. Excel's CSV support has been demonstrated to be unfit for that purpose, as one of the other commenters here points out.
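
For files too big to open in a spreadsheet, one common workaround is to stream the rows instead of loading them all at once. A rough Python sketch, assuming a hypothetical export with `region` and `amount` columns (both names, and the helper `sum_by_key`, are invented for illustration):

```python
import csv
import io
from collections import defaultdict


def sum_by_key(lines, key, value):
    """Stream CSV rows and total `value` per `key`.

    Only the current row and the running totals are kept in memory.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key]] += float(row[value])
    return dict(totals)


# In practice `lines` would be something like open("export.csv", newline="");
# a small in-memory sample stands in for the large file here.
sample = io.StringIO("region,amount\nnorth,10.5\nsouth,4\nnorth,2.5\n")
totals = sum_by_key(sample, "region", "amount")
print(totals)
```

Because the reader pulls one line at a time from the file object, memory use depends on the number of distinct keys rather than the number of rows.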

I'd not heard of Parquet before today, but a cursory glance reveals it to be a stupid format. It's sold as 'smaller than CSV', but size isn't the problem CSV solves. The point of CSV is that it's trivial to write or read data. With Parquet it's not.

I'd imagine if you were storing data on a server, it would be better to import it into a proper database rather than storing it as a file on something like S3. Even compressing a CSV file with gzip would reduce the file size similarly, and in a more standardized way, if that's what you really need to do.
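
As a sketch of the gzip route: Python's standard library can write and read a gzip-compressed CSV directly. The file names and columns below are made up for illustration, and the sizes printed say nothing about how gzip compares with Parquet on real data.

```python
import csv
import gzip
import os
import tempfile

# Hypothetical data: repetitive rows, which tend to compress well.
rows = [["id", "status"]] + [[str(i), "ok"] for i in range(10_000)]

tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "data.csv")
gz_path = os.path.join(tmpdir, "data.csv.gz")

# Plain CSV, for a size comparison.
with open(plain_path, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# gzip.open in text mode ("wt"/"rt") lets the csv module work on the
# compressed stream without an intermediate uncompressed file.
with gzip.open(gz_path, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

with gzip.open(gz_path, "rt", newline="") as f:
    restored = list(csv.reader(f))

print(os.path.getsize(plain_path), os.path.getsize(gz_path))
```

The compressed file can still be read row by row, so it stays usable with the same streaming approach as an uncompressed CSV.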

You'd hope so, but the UK government used Excel to manage some COVID data, which it then lost because there were too many rows (65k+) for the format to handle.