by zbanks 4700 days ago
Once your CSV (or TSV) files start having quoted fields, they become very tricky to parse using standard multi-purpose tools like sort, awk, & uniq.

It's hard enough when you have delimiters in quoted fields, but dealing with quoted newlines starts to become unreasonable, especially for line-based tools.
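To make the quoted-newline problem concrete, here is a minimal sketch in Python (not something from the original comment; the sample data is made up). It compares a naive line-and-comma split with the standard library's csv module, which tracks quote state across physical lines.

```python
import csv
import io

# Hypothetical sample: the second field of the first data record contains
# both a comma and a newline inside quotes.
data = 'id,note\n1,"hello, world\nsecond line"\n2,plain\n'

# Naive line-based view: the quoted record is cut in two, and the comma
# inside the quotes is treated as a delimiter.
naive = [line.split(",") for line in data.splitlines()]

# csv.reader keeps reading physical lines until the quoted field closes.
# newline="" leaves the embedded newline untouched.
rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive), "physical lines vs", len(rows), "logical records")
for row in rows:
    print(row)
```

Any line-based tool sees the same four physical lines that the naive split does, which is why quoted newlines are so much worse than quoted delimiters.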

CSV files, as you say, are absolutely wonderful to create. Problems come up when you try to parse files other people write. Not everyone follows RFC 4180.

2 comments

> Problems come up when you try to parse files other people write. Not everyone follows RFC 4180.

Plus you've got encodings. If you're accepting CSVs from users, they'll generally come from Excel, which will produce different encodings in different circumstances.
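A rough sketch of how you might cope with that in Python. The function name decode_csv_bytes is hypothetical, and the specific encodings it tries (UTF-8 with a BOM, UTF-16 with a BOM, plain UTF-8, then Windows-1252 as a fallback) are my assumption about what commonly shows up, not an exhaustive list of what Excel emits.

```python
import codecs


def decode_csv_bytes(raw: bytes, fallback: str = "cp1252") -> str:
    """Best-effort decode of uploaded CSV bytes (hypothetical helper)."""
    # A byte order mark, when present, is the most reliable hint.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # strips the BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # uses the BOM to pick byte order
    # No BOM: try strict UTF-8 first, since it rejects most non-UTF-8 input.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed legacy code page; this can still raise on bytes that
        # the fallback encoding leaves undefined.
        return raw.decode(fallback)
```

The fallback is a guess by design: without a BOM there is no reliable way to tell one single-byte code page from another, so the safest option is often to let the user pick the encoding.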

With tab-separated values this is not a big problem in practice. On the other hand, you can sort TSV, but you can't sort quoted CSV.
Why not? I ask in earnest.
sort -n gets confused by quotes: a field like "10" starts with a quote character rather than a digit, so it isn't read as a number.
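If you do need to sort quoted CSV numerically, one workaround is to parse first and sort on the parsed value. A minimal sketch in Python, with made-up sample data and the assumption that the second column always holds an integer:

```python
import csv
import io

# Hypothetical input: every field is quoted, and one name contains a comma.
data = 'name,count\n"b, inc","10"\n"a","9"\n"c","100"\n'

reader = csv.reader(io.StringIO(data, newline=""))
header = next(reader)

# The parser has already removed the quotes, so int() sees bare digits.
rows = sorted(reader, key=lambda r: int(r[1]))

# Write the result back out; csv.writer re-quotes only where needed.
out = io.StringIO(newline="")
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
result = out.getvalue()
print(result, end="")
```

This gives up the streaming behavior of sort, since every row is held in memory, but it keeps the comma inside "b, inc" from being mistaken for a field separator.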