Hacker News new | ask | show | jobs
by Pinus 1290 days ago
CSV looks deceptively simple. It is far too easy to just write(','.join(whatever)), which sort of works, until it doesn’t, and then someone, sometimes I, has to sort out the resulting mess. PLEASE use a proper CSV library (Python comes with a CSV module in the standard library), or at least implement the entire format according to the RFC from the outset, even if you think you won’t need it!
3 comments

Oh yes. CSVs are deceptively challenging especially if your use-case is from excel files to csv. Excel will happily convert a worksheet to csv, but it's a naive conversation. Headers that start on line 3, multi-line headers, inconsistent column counts, etc. It adds up really quickly!
I've also run into issues where I wrote some code that worked with csv input, and told users they could just export their data from excel. turns out excel doesn't export in utf-8 by default, we had some weird issues until we figured that out.
Another issue is passing those CSV files across international borders - a CSV file that works in the UK (commas as separators and a decimal point) may not be readable in Germany (semicolons as separators and decimal commas) without some configuration.
This is a matter of developer education. The correct way to create and parse CSV files is to use a third-party library. They can get complicated. A field in a CSV can contain commas and quotes. In some cases, a single field can contain a line-feed, and you'll need to ensure the parser you use supports that. This would allow an entire CSV file to be embedded inside the field of a CSV field. At a minimum, a parser must support Excel's default parser logic.

But, if you pick the right parser and generator, then you're ok with using it.

Right, but if you're picking CSV, you likely expect to interoperate with a provider that's not yourself. And then there's no way a parser can handle all CSV formats in the wild.

e.g. the example from my comment on the last CSV discussion (https://news.ycombinator.com/item?id=28223719)

What variant is this:

    1,5,Here is a string "" that does stuff,2021-1-1
What is the value of the third column?

Is this a CSV file without quoting? Then it's

    Here is a string "" that does stuff
Or is it a CSV file with double quote escaping? Then it's

    Here is a string " that does stuff
Yeah, but this is less of a problem if it's an internal API. You can stick to a stricter subset of the standard, and/or only handle the types of column values that you actually need

Still probably worth using a library, but it isn't a source of problems in my experience