Hacker News
by justin_oaks 1748 days ago
I only recently learned Awk enough to be useful. But I still don't reach for it when I probably should.

What are the most common cases where you reach for Awk instead of some other tools?

I recently used it to parse and recombine data from the OpenVPN status file. That file has a few differently formatted tables in the same file. Using Awk, I was able to change a variable as each table was encountered, so I could change the Awk program's behavior based on which table it was operating on.
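
For illustration, here is a minimal sketch of that state-variable approach. The sample input is only loosely modeled on an OpenVPN status file: the section headers, the column layout, and the `vpn_summary` helper name are all assumptions, not taken from the original script.

```shell
# Hypothetical helper: print "name virtual-address bytes-received bytes-sent".
# Section and column headers are assumed; adjust them to match the real file.
vpn_summary() {
  awk -F, '
    # A table header line switches the state variable and is otherwise skipped.
    /^OpenVPN CLIENT LIST/ { table = "clients"; next }
    /^ROUTING TABLE/       { table = "routes";  next }
    /^GLOBAL STATS/        { table = "stats";   next }

    # Skip the timestamp line and the column-header lines.
    $1 == "Updated" || $1 == "Common Name" || $1 == "Virtual Address" { next }

    # The same program behaves differently depending on the current table.
    table == "clients" { rx[$1] = $3; tx[$1] = $4 }
    table == "routes"  { print $2, $1, rx[$2], tx[$2] }
  ' "$@"
}

vpn_summary <<'EOF'
OpenVPN CLIENT LIST
Updated,Thu Jan  1 00:00:00 2021
Common Name,Real Address,Bytes Received,Bytes Sent,Connected Since
alice,203.0.113.5:51234,1024,2048,Thu Jan  1 00:00:00 2021
ROUTING TABLE
Virtual Address,Common Name,Real Address,Last Ref
10.8.0.2,alice,203.0.113.5:51234,Thu Jan  1 00:00:00 2021
GLOBAL STATS
Max bcast/mcast queue length,0
END
EOF
# prints: alice 10.8.0.2 1024 2048
```

This relies on the client table appearing before the routing table, so the byte counts are already stored when the routes are printed.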

6 comments

Here is a script that I use to send SMTP mail, via the gawk networking extensions. I have a few different versions, but this is the most basic:

    #!/bin/gawk -f

    BEGIN { smtp="/inet/tcp/0/smtp.yourhost.com/25";
    ORS="\r\n"; r=ARGV[1]; s=ARGV[2]; sbj=ARGV[3]; # /usr/local/bin/awkmail to from subj < in

    print "helo " ENVIRON["HOSTNAME"]       |& smtp; smtp |& getline j; print j
    print "mail from: " s                   |& smtp; smtp |& getline j; print j
    if(match(r, ","))
    {
      split(r, z, ",")
      for(y in z) { print "rcpt to: " z[y]  |& smtp; smtp |& getline j; print j }
    }
    else { print "rcpt to: " r              |& smtp; smtp |& getline j; print j }
    print "data"                            |& smtp; smtp |& getline j; print j

    print "From: " s                        |& smtp; ARGV[2] = ""   # not a file
    print "To: " r                          |& smtp; ARGV[1] = ""   # not a file
    if(length(sbj)) { print "Subject: " sbj |& smtp; ARGV[3] = "" } # not a file
    print ""                                |& smtp

    while(getline > 0) print                |& smtp

    print "."                               |& smtp; smtp |& getline j; print j
    print "quit"                            |& smtp; smtp |& getline j; print j

    close(smtp) } # /inet/protocol/local-port/remote-host/remote-port
This allows me to bypass the local MTA (if present). The message ID is also returned, which can be useful to log.
I had to take large CSV files like {question, right_ans, wrong_ans1, wrong_ans2, wrong_ans3} and convert them into SQL insert files. A few caveats: some rows could be duplicates, some characters were not allowed, and some had formatting issues. The first issue was avoided by upserting, but for the other two I used Awk and Sed, and put together a fairly robust script far quicker than if I had reached for Python. I probably would have reached for Python if I realised how many edge cases there were, but I didn't know that at the start, so the script just sort of grew as I went along. Now they're my go-to tools for similar tasks.
Awk is not really very good at reading complex CSVs (as defined in RFC 4180), where newlines (record separators) can appear within quoted strings. It can be done, but sometimes it's tricky.
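
A rough sketch of what such a CSV-to-SQL conversion could look like, not the script described above. It assumes simple CSV (no commas, quotes, or newlines inside fields); the `questions` table, the `csv_to_sql` name, and the choice of carriage returns as the "not allowed" characters are all made up for the example. It emits plain INSERT statements, since upsert syntax varies by database.

```shell
# Hypothetical helper: turn 5-column CSV rows into SQL INSERT statements.
csv_to_sql() {
  awk -F, -v q="'" '
    function clean(s) {
      gsub(/\r/, "", s)                 # drop carriage returns (stand-in for disallowed characters)
      gsub(/^[ \t]+|[ \t]+$/, "", s)    # trim surrounding whitespace
      gsub(q, q q, s)                   # double single quotes for SQL string literals
      return s
    }

    # Formatting problems: report rows with the wrong field count and skip them.
    NF != 5 {
      print "skipping line " NR ": expected 5 fields, got " NF > "/dev/stderr"
      next
    }

    {
      values = ""
      for (i = 1; i <= 5; i++)
        values = values (i > 1 ? ", " : "") q clean($i) q
      print "INSERT INTO questions (question, right_ans, wrong_ans1, wrong_ans2, wrong_ans3) VALUES (" values ");"
    }
  ' "$@"
}

csv_to_sql <<'EOF'
What's 2+2?, 4 ,3,5,22
broken,row
EOF
```

The first row comes out with the apostrophe doubled and the padded field trimmed; the second row is reported on stderr and skipped.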

The PHP fgetcsv function has been more convenient when I have had more exotic examples.

If the CSV is simple, awk remains a very good tool.

CSVs with quoted fields and embedded newlines can be troublesome in awk. Years ago I found a script that worked for me; I'm not sure, but I think it was this one:

http://lorance.freeshell.org/csv/

There's also https://github.com/dbro/csvquote which is more Unix-like in philosophy: it sits in a pipeline, and only handles transforming the CSV data into something that awk (or other utilities) can more easily deal with. I haven't used it but will probably try it next time I need something like that.

If the CSV is RFC 4180, then awk can handle it[0]. The only caveat is that you can't disable FS="" correctly. But a gawk -i ./csv.awk -e '{print $5}' would work on most CSV files I've tried.

[0] https://raw.githubusercontent.com/Nomarian/Awk-Batteries/mas...

"""I probably would have reached for Python if I realised how many edge cases there were"""

This is the counter for all the "success" stories of awk users that walked away with an underspecced and underdeveloped 5 minute solution.

Most people reach for what they know best. I'm not sure it really proves anything about relative merits.
I have found static builds of awk useful in low-dependency work. I bundled it with a Windows installer to do some wrangling we needed at install time. Another time I was sending packages to a Unix cluster, but did not have access myself, so I used awk as part of the bootstrap for the package.

I used to write event-driven scripts with it: each line is a message, interpreted by awk. One thing I was not able to get working with any of the awks I tried was appending messages to the file while it is being consumed (this is kind of like code generation). I ended up doing this in Python (https://github.com/cratuki/interface_script_py).
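
To make the "each line is a message" idea concrete, here is a small sketch. The message format (`set`, `add`, `print` verbs followed by arguments) and the `run_messages` name are invented for the example and have nothing to do with the linked project.

```shell
# Hypothetical helper: treat each input line as "<verb> <args...>" and dispatch on the verb.
run_messages() {
  awk '
    $1 == "set"   { vars[$2] = $3;  next }         # set NAME VALUE
    $1 == "add"   { vars[$2] += $3; next }         # add NAME AMOUNT
    $1 == "print" { print $2 "=" vars[$2]; next }  # print NAME
    NF            { print "unknown message: " $0 > "/dev/stderr" }
  ' "$@"
}

run_messages <<'EOF'
set x 1
add x 41
print x
EOF
# prints: x=42
```

Each pattern acts as a handler for one message type, and awk's implicit read loop plays the role of the event loop.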

Anything that is command line based and needs small changes to text input can be done with awk. It is a very competent language for scripts.
I use it a lot to filter, slice, and dice CSV (or other delimited) or fixed-format files. Sometimes I'll use q[1] if my needs are more complex. Or awk piped to q. It can be used as a fairly decent report generator for plain-text or HTML reports.

Any time I want to process a bunch of lines in a text file, awk is my first consideration.

[1] http://harelba.github.io/q/

From what I can tell, Awk really shines in two places, transformation and collation, both of which require some form of structured file. You can transform one structure into another and you can process record by record to some form of collation or summary.
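
As a small illustration of the collation case, here is a sketch that processes record by record and prints a summary at the end. The "category,amount" input layout and the `sum_by_category` name are assumptions made for the example.

```shell
# Hypothetical helper: collate "category,amount" records into "category,count,total".
sum_by_category() {
  awk -F, '
    { total[$1] += $2; count[$1]++ }   # accumulate per category as each record arrives
    END {
      for (k in total)
        printf "%s,%d,%.2f\n", k, count[k], total[k]
    }
  ' "$@" | sort                        # for-in order is unspecified, so sort the summary
}

sum_by_category <<'EOF'
fruit,1.50
veg,2.00
fruit,2.25
EOF
# prints:
# fruit,2,3.75
# veg,1,2.00
```

The same shape covers transformation too: drop the arrays and the END block, and print a rearranged version of each record as it is read.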