by ts0000 1515 days ago
Interesting, for me it's the exact opposite.

I've tried a couple of times to get into awk, but still find the syntax arcane.


I don't know; I wouldn't presume to tell you what you do or don't find arcane, but once I understood the somewhat unusual flow of awk ("for every line, check if the line matches this condition, and if it does, run this block of code") I found it quite easy to work with. It's "arcane" in the sense that it has an implicit loop and that it's a specialized language for a very limited class of problems, but I found that for this limited class of problems it's surprisingly effective.

  > an implicit loop
As an occasional awk user, I'd love it if you could expand on this. Maybe it will help clear things up for me. You're not referring to the fact that awk operates on every line independently, are you?
My mental image of awk has always been something along these lines:

    for line in readfile()
        for block in script:
            if block.match(line)
                run_block(block)
            end
        endfor
    endfor
Where the "for line in readfile()" is the "implicit loop", and the blocks are the "condition { .. }" blocks.

The actual flow is a little more complex and has some exceptions (e.g. BEGIN/END), but this is the gist of it.
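
For what it's worth, here is a rough, runnable Python version of that sketch, with BEGIN/END included. The names (run_awk, rules, begin, end) are made up for illustration, and it ignores things like next, getline and range patterns, so treat it as a mental model rather than how any real awk works:

```python
import re

def run_awk(rules, lines, begin=None, end=None):
    """Toy model of awk's control flow (hypothetical helper, not a real awk)."""
    out = []
    if begin:
        begin(out)                      # BEGIN runs once, before any input is read
    for line in lines:                  # the implicit loop over input lines
        for pattern, action in rules:   # every rule is tried, in script order
            # A pattern of None stands in for awk's empty pattern (always matches).
            if pattern is None or re.search(pattern, line):
                action(line, out)
    if end:
        end(out)                        # END runs once, after the last line
    return out

# Roughly: awk 'BEGIN {print "start"} /err/ {print} END {print "done"}'
result = run_awk(
    rules=[(r"err", lambda line, out: out.append(line))],
    lines=["ok 1", "err 2", "err 3"],
    begin=lambda out: out.append("start"),
    end=lambda out: out.append("done"),
)
print(result)
```

Note that the loop over rules does not stop at the first match: every block whose condition matches gets run for the line.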

Thanks. Yes, my mental image is pretty much the same, but it's nice to see it expressed in Python, modulo the end keywords :)
To expand on the other reply, there are a couple more implicit loops. There's a loop over all of the command-line arguments/files, then a loop over every line in each of those files, and then a kind of loop over the whitespace-delimited fields of each of those lines. The main thing that helped me understand AWK was that every block in a script is just a pattern/action pair. When I saw snippets like

  ... | awk '{print $2}' 
I thought there was all this confusing syntax, but something like

  awk '/pattern/ {print}'
was much clearer to me. In the first case, the empty pattern matches every line of the input, and the action is simply to print the second field of each line. Patterns can vary in complexity from the empty pattern to long chains of logical operators and regular expressions, such as /pattern/ in the second example. The outer single quotes are just there to prevent the shell from eating your dollar signs or other special characters. In a standalone AWK script you can write it like

  /pattern/ {
    print
  }
which also makes it look more like another language.
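
If it helps, here is a rough Python sketch of those nested loops and of the two one-liners above. It rests on some loud assumptions: awk_like and field are made-up names, the "files" are just lists of strings, and the splitting only roughly mimics awk's default whitespace behavior (no FS, no -F):

```python
import re

def field(line, fields, n):
    """Hypothetical stand-in for awk's $n: $0 is the whole line, missing fields are empty."""
    if n == 0:
        return line
    return fields[n - 1] if n <= len(fields) else ""

def awk_like(files, pattern, action):
    """Toy model: loop over files, then over lines, then split each line into fields."""
    out = []
    for lines in files:                 # loop 1: each input file
        for line in lines:              # loop 2: each line of that file
            fields = line.split()       # "loop" 3: whitespace-delimited fields
            # None plays the role of the empty pattern, which matches every line.
            if pattern is None or re.search(pattern, line):
                out.append(action(line, fields))
    return out

# Two in-memory "files" (an assumption; real awk reads files or stdin).
files = [["alice 30 nyc", "bob 25 sf"], ["carol 41 pattern", "dave"]]

# Roughly: awk '{print $2}'
second_fields = awk_like(files, None, lambda line, f: field(line, f, 2))

# Roughly: awk '/pattern/ {print}'
matching = awk_like(files, r"pattern", lambda line, f: field(line, f, 0))

print(second_fields)
print(matching)
```

The only difference between the two calls is which half of the pattern/action pair is left at its default: the first has no pattern, the second has the simplest possible action.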

If you can get your hands on a copy of The AWK Programming Language, it's a pretty quick and pleasant read that helped everything make more sense to me. I do most of my data analysis for my research using AWK and really enjoy working with it.

  > The AWK Programming Language
I see it's available online and was discussed here on HN: https://news.ycombinator.com/item?id=13451454

I'll go over it, thank you very much for the suggestion.