by exdsq 1743 days ago
I had to take large CSV files with rows like {question, right_ans, wrong_ans1, wrong_ans2, wrong_ans3} and convert them into SQL insert files. There were a few caveats: some rows could be duplicates, some characters were not allowed, and some rows had formatting issues. I avoided the first problem by upserting, and handled the other two with Awk and Sed, putting together a fairly robust script far quicker than if I had reached for Python. I probably would have reached for Python if I had realised how many edge cases there were, but I didn't know that at the start, so the script just sort of grew as I went along. Now they're my go-to tools for similar tasks.
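For comparison, here is a rough Python sketch of the same kind of conversion. It is not the Awk/Sed script described above. The table name `questions`, the unique constraint on `question`, and the PostgreSQL/SQLite-style `ON CONFLICT` upsert syntax are all assumptions made for illustration.

```python
import csv
import io

# Column names taken from the row layout in the post.
COLUMNS = ["question", "right_ans", "wrong_ans1", "wrong_ans2", "wrong_ans3"]


def sql_quote(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rows_to_upserts(csv_text: str, table: str = "questions") -> list[str]:
    """Turn CSV text into one upsert statement per well-formed row.

    Assumes a hypothetical table with a unique constraint on `question`,
    so a duplicate row updates the existing record instead of failing.
    """
    update_clause = ", ".join(f"{c} = EXCLUDED.{c}" for c in COLUMNS[1:])
    statements = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(COLUMNS):
            continue  # skip rows with the wrong number of fields
        values = ", ".join(sql_quote(field.strip()) for field in row)
        statements.append(
            f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES ({values}) "
            f"ON CONFLICT (question) DO UPDATE SET {update_clause};"
        )
    return statements
```

Silently skipping malformed rows is only one possible choice here; logging them for manual review would probably be safer on real data.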

Awk is not very good at reading complex CSVs (as defined in RFC 4180), where newlines (which awk treats as record separators by default) can appear within quoted strings. It can be done, but it is sometimes tricky.

The PHP fgetcsv function has been more convenient when I have had more exotic examples.

If the CSV is simple, awk remains a very good tool.
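To make the quoted-newline problem concrete, here is a small Python sketch (the sample data is made up). Splitting on newlines, which is roughly what a line-oriented tool does by default, sees one record too many, while a CSV-aware parser keeps the quoted field together.

```python
import csv
import io

# Hypothetical sample: the second record has a newline inside a quoted field.
SAMPLE = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'


def naive_split(text: str) -> list[str]:
    """Treat every newline as a record separator, ignoring quoting."""
    return text.splitlines()


def csv_parse(text: str) -> list[list[str]]:
    """Parse with the csv module, which honours quoted newlines."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(len(naive_split(SAMPLE)))  # 4 "records": the quoted field is split
    print(len(csv_parse(SAMPLE)))    # 3 records: header plus two rows
```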

CSVs with quoted fields and embedded newlines can be troublesome in awk. Years ago I found a script that worked for me. I'm not sure, but I think it was this:

http://lorance.freeshell.org/csv/

There's also https://github.com/dbro/csvquote which is more Unix-like in philosophy: it sits in a pipeline, and only handles transforming the CSV data into something that awk (or other utilities) can more easily deal with. I haven't used it but will probably try it next time I need something like that.

If the CSV is RFC 4180 compliant, then awk can handle it[0]. The only caveat is that you can't disable FS="" correctly. But a gawk -i ./csv.awk -e '{print $5}' would work on most CSV files I've tried.

[0] https://raw.githubusercontent.com/Nomarian/Awk-Batteries/mas...

"""I probably would have reached for Python if I realised how many edge cases there were"""

This is the counterpoint to all the "success" stories of awk users who walked away with an underspecified and underdeveloped 5-minute solution.

Most people reach for what they know best. I'm not sure it really proves anything about relative merits.