Hacker News
by ajanuary 2344 days ago
Garbage and poorly specified CSV files are a fact of life, and people have to deal with them all the time.

But if you want to live in a world where people only deal with well-specified files like RFC 4180 (for some definition of well-specified), your quick field pattern doesn’t conform: it doesn’t handle escaped double quotes or quoted line breaks. If you use your quick awk command to transform one RFC 4180 file into another, you’ve just puked out exactly the sort of garbage you were railing against.
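
To make that failure concrete, here is a rough sketch in Python (picked only for illustration; the sample record is made up). It compares a naive split on line breaks and commas, which is roughly what a quick field-separator pattern amounts to, with a parser that understands RFC 4180 quoting:

```python
import csv
import io

# Hypothetical RFC 4180 input: a header plus ONE record. One field contains
# an escaped double quote, another contains a line break inside quotes.
data = 'id,quote,note\r\n1,"She said ""hi""","line one\r\nline two"\r\n'

# Naive approach: split on line breaks, then on commas.
naive_rows = [line.split(",") for line in data.splitlines()]

# Quote-aware approach: newline="" passes line breaks through untranslated,
# so the csv module can tell quoted line breaks from record separators.
parsed_rows = list(csv.reader(io.StringIO(data, newline="")))

print(len(naive_rows))   # 3: the quoted line break cut the record in two
print(len(parsed_rows))  # 2: header plus one record
print(parsed_rows[1])    # ['1', 'She said "hi"', 'line one\r\nline two']
```

The naive version also leaves the doubled quotes in place, so writing its fields back out with commas does not round-trip the original record.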

Awk is a great tool if you’re dealing with a CSV format with a predictable specification, and with a little more knowledge it could probably be bent to the GP’s will. It gets trickier when you have to handle some of the garbage that comes up in the real world. What’s worse, the programming model leads you down the path of never validating your assumptions and failing silently.
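
As a sketch of what validating your assumptions can look like, here is a small Python helper. The name `read_strict` and the fixed field-count check are my own illustration, not anything from the thread. It refuses malformed quoting and unexpected row widths instead of passing them through:

```python
import csv
import io


def read_strict(text, expected_fields):
    """Parse CSV text, raising instead of silently passing bad rows through.

    `read_strict` and `expected_fields` are hypothetical names for this sketch.
    """
    # strict=True makes the csv module raise csv.Error on malformed quoting.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    rows = []
    for row in reader:
        if len(row) != expected_fields:
            raise ValueError(
                f"record ending at line {reader.line_num}: "
                f"expected {expected_fields} fields, got {len(row)}"
            )
        rows.append(row)
    return rows


rows = read_strict("a,b\r\n1,2\r\n", expected_fields=2)
```

A batch job built this way stops on the first record that breaks an assumption, rather than emitting shifted or truncated columns that nobody notices until later.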

I love awk for interactive sessions where I can manually sanity-check the output. But if I’m writing something mildly complex that has to work in a batch on input I’ve never seen, I too would reach for Ruby.