It's not that CSV is hard to parse. It's that there's no guarantee that you'll get proper CSV. For example, it may literally be "CSV", without quotes. And that's fatal if values contain commas. I've even seen CSV with values that contain ``","``!
RFC4180 [1] standardizes CSV, but there are many implementations that don't read this 100% and unfortunately even more (including an extremely popular spreadsheet application) that don't write it.
If you are including CSV functionality in something you work on, please read and follow this (tiny) spec!
Whether Excel writes standard CSV or not depends on the user's locale settings. E.g. with a German locale you get a semicolon (;) as a separator which you can only change system-wide. However, apart from the changed seperator, it's still standard CSV, which still works with Python or SQLite (.separator ; .import foo.csv foo).
A bigger issue is that Excel tends to write large numbers in scientific notation, which is a common issue handling price lists. E.g. it'll turn EAN numbers into 6.2134e+11, losing most of the number. Then you have to go back to the XLS file and change the column type into text and exporting it again as CSV. As this is lossy you can't fix it when receiving such a file.
Something like the SQL Server Import/Export Wizard but being able to write SQLite files would be very handy.
The worst part about CSV is the locale-specific format used by some programs. For example when exporting to CSV a German Excel will use the comma as the decimal separator in numbers and a semicolon for the value separator. So you need to specify the exact format for each individual file you want to import.
Back when I worked at an NLP company, my boss used to say that he was "morally opposed to CSV".
When sharing tabular data in text format, he always preferred TSV because commas were everywhere in the material we were working with, but tab characters were really rare.
I've always been curious about the characters in ASCII for this, but I've never seen them used in the wild. Stuff like "Group Separator" (0x1D), "Record Separator" (0x1E) or "Unit Separator" (0x1F)
Is there a reason why nobody uses these? Did someone work out back in the 90s they were pure evil and we've just never used them since?
Those codes were originally for just such a purpose, but as others point out there was a bootstrapping problem. At this point, if Excel doesn’t support it, it’s not going to gain traction.
About 10 years ago I had to use those for a financial system integration. I was getting files that had been created on a mainframe, and whoever wrote it originally had the foresight to use those characters. Probably because they were based out of EMEA and understood that commas weren’t useful across national borders.
When I was working as a forensic data analyst, I felt pretty much the same. If you request CSV in discovery, there's no telling what you'll get. I mean, the data may come from custom COBOL, and then get reviewed by someone using Excel.
So yes, TSV. However, I've seen TSV with spurious tabs :(
Sometimes I ended up pushing for |-delimited data. Or even fixed-width format :)
But more likely is twisted creativity. It seems that some businesses are still using ancient systems, based on COBOL, AS/400, etc. There's resistance to changing legacy code. So when business changes require additional data fields, fields sometimes get subdivided. So a field that originally contained stuff like |foo| now contains stuff like |"foo","bar,baz"| or whatever. That works, because there's nothing like CSV in the data system. But when someone tries a CSV export, you get garbage.
Newlines in the middle of a quoted field will cause problems for a lot of tools. And Python's csv.DictReader gives an error when a delimited file has too few fields.