Hacker News
by chasil 1739 days ago
Awk is not very good at reading complex CSVs (as defined in RFC 4180), where newlines (record separators) can appear within quoted strings. It can be done, but it's sometimes tricky.
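
To make the difficulty concrete, here is a minimal Python sketch (not awk, and not taken from any tool mentioned in this thread) of the small state machine an RFC 4180 reader needs. A newline only ends a record when it falls outside quotes, and a doubled quote inside a quoted field stands for one literal quote. A reader that splits on every newline first, as awk does with its default RS, gets both cases wrong. The function name parse_csv is hypothetical.

```python
def parse_csv(text):
    """Split RFC 4180-style CSV text into records (lists of field strings)."""
    records, record, field = [], [], []
    in_quotes = False
    i = 0
    while i < len(text):
        c = text[i]
        if in_quotes:
            if c == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field.append('"')   # "" inside quotes is one literal quote
                    i += 1              # skip the second quote of the pair
                else:
                    in_quotes = False   # closing quote of the field
            else:
                field.append(c)         # commas and newlines are plain data here
        elif c == '"':
            in_quotes = True            # opening quote of the field
        elif c == ',':
            record.append(''.join(field))
            field = []
        elif c == '\n':
            record.append(''.join(field))  # an unquoted newline ends the record
            records.append(record)
            record, field = [], []
        elif c == '\r':
            pass                        # outside quotes, assume CR is only part of CRLF
        else:
            field.append(c)
        i += 1
    # Flush the final record when the input does not end with a newline.
    if record or field or (text and not text.endswith('\n')):
        record.append(''.join(field))
        records.append(record)
    return records


sample = 'id,note\n1,"line one\nline two"\n2,"say ""hi"""\n'
print(sample.count('\n'))      # 4 physical lines...
print(len(parse_csv(sample)))  # ...but only 3 records
```

In awk, that in-quotes state has to survive from one input line to the next, which is where it gets tricky.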

PHP's fgetcsv function has been more convenient for the more exotic examples I've run into.
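
For comparison with a library routine like fgetcsv, here is a sketch using Python's standard csv module (swapped in for PHP; it is not fgetcsv). The reader consumes as many physical lines as a quoted field needs. Python's documentation asks that files be opened with newline='' so embedded newlines reach the parser unchanged; io.StringIO with newline='' is used here as an in-memory stand-in for a real file.

```python
import csv
import io

data = 'name,address\nAlice,"12 Main St\nSpringfield"\nBob,"1 ""Elm"" Rd"\n'

# For a real file use open(path, newline=''); StringIO with newline='' is the
# in-memory equivalent, so embedded newlines reach the parser untouched.
reader = csv.reader(io.StringIO(data, newline=''))
rows = list(reader)

print(len(rows))        # 3 records...
print(reader.line_num)  # ...spread over 4 physical lines
print(rows[1])          # the Alice record, with its address still on two lines
```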

If the CSV is simple, awk remains a very good tool.


CSVs with quoted fields and embedded newlines can be troublesome in awk. Years ago I found a script that worked for me; I'm not sure, but I think it was this one:

http://lorance.freeshell.org/csv/

There's also https://github.com/dbro/csvquote, which is more Unix-like in philosophy: it sits in a pipeline and only transforms the CSV data into something that awk (or other utilities) can deal with more easily. I haven't used it, but I'll probably try it the next time I need something like that.
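
The pipeline idea can be sketched like this in Python. This is a hypothetical stand-in, not csvquote's actual source: inside quoted fields, commas and newlines are swapped for nonprinting ASCII characters so a line-and-comma tool sees one record per line, and a second pass swaps them back. The specific placeholders (\x1f and \x1e) and the names sanitize and restore are assumptions for illustration.

```python
# Hypothetical placeholder characters; any bytes absent from the data would do.
FIELD_SUB = '\x1f'   # ASCII unit separator, stands in for "," inside quotes
RECORD_SUB = '\x1e'  # ASCII record separator, stands in for "\n" inside quotes


def sanitize(text, delim=',', quote='"'):
    """Hide delimiters and newlines that sit inside quoted fields."""
    out = []
    in_quotes = False
    for c in text:
        if c == quote:
            in_quotes = not in_quotes  # a doubled "" flips twice: no net change
        elif in_quotes and c == delim:
            c = FIELD_SUB
        elif in_quotes and c == '\n':
            c = RECORD_SUB
        out.append(c)
    return ''.join(out)


def restore(text, delim=','):
    """Undo sanitize() once the line-oriented tool has done its work."""
    return text.replace(FIELD_SUB, delim).replace(RECORD_SUB, '\n')


data = 'id,comment\n1,"hello, world"\n2,"two\nlines"\n'
safe = sanitize(data)

# Stand-in for `awk -F, '{print $2}'`: one record per line, one field per comma.
second = [line.split(',')[1] for line in safe.rstrip('\n').split('\n')]
print([restore(f) for f in second])  # quoted comma and newline come back intact
```

The split is on '\n' rather than splitlines() because Python's splitlines() also treats \x1e as a line break, which would undo the whole trick.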

If the CSV follows RFC 4180, then it can handle it [0]. The only caveat is that you can't disable FS="" correctly. But gawk -i ./csv.awk -e '{print $5}' has worked on most CSV files I've tried.

[0] https://raw.githubusercontent.com/Nomarian/Awk-Batteries/mas...