by MaDeuce 2926 days ago
I was first exposed to Awk when I started work at Bell Labs in the late 80s. Until then, I'd been using either Lisp or C exclusively and was really blown away by how simple some things were in Awk. I used it with impunity to munge all sorts of data for input into fault prediction tools I was working on. Speed was never an issue for me, so I never explored the potential improvements offered by 'awkcc'. Although perl was becoming the new hotness at that time, Awk remained my go-to tool for many years.

If you are interested in learning Awk, I highly recommend "The AWK Programming Language" by Aho, Kernighan, and Weinberger. It's about the same size as the original "The C Programming Language" and is equally well-written. Previously on HN: https://news.ycombinator.com/item?id=13451454

6 comments

>If you are interested in learning Awk, I highly recommend "The AWK Programming Language" by Aho, Kernighan, and Weinberger. It's about the same size as the original "The C Programming Language" and is equally well-written.

I'm a big fan of small utilities :) - as I sometimes say in my email sig; but more importantly, I'm a big fan of Kernighan et al, where by "et al" I mean the others from the core early Unix days, such as Dennis Ritchie, Rob Pike, Ken Thompson and many unnamed others, from whom I (and tons of others) learned about the Unix command-line (tools), the shell (scripting), and the Unix philosophy [1].

Had written this just a few weeks ago on HN, in the thread titled "Technical Writing: Learning from Kernighan", but worth repeating here in the context of this thread:

https://news.ycombinator.com/item?id=17163276

It's a list of his books. I guess many may not know of some of them - I know I didn't.

[1]:

The Unix Philosophy in One Lesson:

http://www.catb.org/esr/writings/taoup/html/ch01s07.html

Attitude Matters Too:

http://www.catb.org/esr/writings/taoup/html/ch01s09.html

As to one sentence in particular in ch01s09:

"If someone has already solved a problem once, don't let pride or politics suck you into solving it a second time rather than re-using."

I'd strongly argue it's overzealous. I agree that "reinventing the wheel" is dangerous, tempting, and can quickly spiral into yak shaving, but Unix itself, and all the good it brought, is a prime example of "solving [a problem] a second time" after Multics.

In other words, I'd restate it in Sage Speak™: "Don't do this. Except when you need to." ;P Or, just want to have fun :P

>But Unix itself, and all the good it brought, is a prime example of "solving [a problem] a second time" after Multics.

Not sure about that. I mean, I know it came after Multics and was inspired by it (due to some of the early Unix people having worked on Multics), including that the name was originally Unics (I've heard, as a word play on Multics, because it was originally written by one person or was originally a single-user OS, maybe), but I am not so sure that all the good it brought was from Multics. Likely Unix brought some new stuff too. Others who know better may be able to say more.

>In other words, I'd restate it in Sage Speak™: "Don't do this. Except when you need to." ;P Or, just want to have fun :P

Good one. A bit Zennish :) Check out one of ESR's other compilations, the Unix Koans of Master Foo, if not seen already ...

http://www.catb.org/esr/writings/unixkoans/introduction.html

Wasn't meaning that the good was from Multics. Just that Unix was after Multics, "solving [a similar problem] a second time". In fact, that Unix brought extra good by doing this actually strengthens the thesis that reinventing the wheel may bear good fruit :)
Got you now, misunderstood earlier, sorry :)
I sort of agree. I didn't quote those sections in the sense of recommending that the advice in them be followed strictly, and to the letter. Also, ESR is known to talk a bit that way - sort of overzealous, as you put it. But that is part of the fun of reading his stuff. As long as one takes it with a pinch of salt, and common sense, it is okay, and one usually gets to learn something from his writings.
The awk book is a lot of fun. I just finished reading it after seeing it recommended here a few weeks ago. The highlights for me:

1. A simple interpreter for an awk-like language called qawk. qawk is like awk except that it allows for querying by field name rather than field number. For instance, it allows doing

  { print $country, $population, $capital }

instead of the more cryptic

  { print $1, $3, $5 }

2. An awk program that takes another awk program (in their example, a sorting algorithm) and outputs a version of that program modified to include profiling statements and an END section that outputs the results of those profiling statements to some file; then, another awk program that reads the data in that file and inserts that data back into the original awk program, thereby approximating where the hotspots are.

There's a lot more in the book besides these, but to me these are the coolest programs because they are the awk-iest, by which I mean that they loop over lines of input, split the fields of those lines, and then manipulate the fields. Some of the programs in the book don't do this; instead, they consist of a single large BEGIN block with typical for-loops, arrays, etc. Used in this way, awk is just yet another dynamic language.

Google doesn't know about qawk. A more general toolkit that does the same thing and (presumably) more:

http://github.com/dkogan/vnlog

Thank you, I remember wanting to follow up on these more abstract constructions in the book. They seemed to be leading me somewhere amazing and very computer science-y. Programs that take programs as input and generate new code to do that thing I wanted with some data files — I’m sure this will be useful if I put the time in.

Am I right that qawk was included as a program in the text? Did they ever follow up with further uses?

Yeah, the code is all in the book, and it works! Here's the main body of the qawk interpreter:

  BEGIN { readrel("relfile") }
  /./ { doquery($0) }

where

- relfile is a file containing the field attributes used in various database files,

- readrel is a function that parses the relfile and stores the fields in a dictionary, and

- doquery is a function that takes a qawk query, converts it to an awk query by replacing the field names with their corresponding field numbers, and then executes the awk command.

The whole thing runs about 60 lines.
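
To make the doquery step above concrete, here is a minimal Python sketch of the name-to-number translation. It is not the book's awk code: the `FIELDS` list, its column order, and the `translate` helper are all assumptions made for illustration, and the book's relfile format carries more than a flat attribute list.

```python
import re

# Hypothetical attribute list for one table (column order is assumed).
FIELDS = ["country", "area", "population", "continent", "capital"]

def translate(query, fields=FIELDS):
    """Replace each $name in a qawk-style query with its awk field number."""
    numbers = {name: i for i, name in enumerate(fields, start=1)}

    def swap(match):
        name = match.group(1)
        # Leave unknown names alone so ordinary awk text passes through.
        return "$" + str(numbers[name]) if name in numbers else match.group(0)

    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", swap, query)

print(translate("{ print $country, $population, $capital }"))
# prints: { print $1, $3, $5 }
```

The translated string is what would then be handed to awk to run against the data file.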

Perl was basically written because Larry Wall found Awk's syntax to be a little too cryptic. In the language design business this is what we refer to as baby steps.

Also, Awk isn't great for making reports, which is why Perl 5 to this day has an awkward report creation system[1] that looks like some COBOL refugee instead of idiomatic perl code.

[1] https://perldoc.perl.org/perlform.html

From the link: "The lone dot that ends a format can also prematurely end a mail message passing through a misconfigured Internet mailer (and based on experience, such misconfiguration is the rule, not the exception). So when sending format code through mail, you should indent it so that the format-ending dot is not on the left margin; this will prevent SMTP cutoff."

If you're of a certain age and read that and grimace as you immediately understand why this would happen, it does rather put into perspective the misery of dealing with, say, Webpack configuration.

> From the link: "The lone dot that ends a format can also prematurely end a mail message passing through a misconfigured Internet mailer (and based on experience, such misconfiguration is the rule, not the exception). So when sending format code through mail, you should indent it so that the format-ending dot is not on the left margin; this will prevent SMTP cutoff."

Did email clients not handle "dot stuffing" back then? That is, if a line begins with a single dot, the client would automatically insert another dot right before it. Then, at the receiving end, the client would remove the extra dot at the beginning of the line.
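
For anyone who hasn't met dot stuffing, here is a small Python sketch of the idea as the SMTP spec describes it: the sender doubles a leading dot, the receiver strips one. The function names and the sample Perl format body are made up for illustration; this is not taken from any particular mailer.

```python
def dot_stuff(lines):
    """Sender side: add a dot to any line that starts with one."""
    return ["." + line if line.startswith(".") else line for line in lines]

def dot_unstuff(lines):
    """Receiver side: drop the first dot from any line that starts with one."""
    return [line[1:] if line.startswith(".") else line for line in lines]

# Illustrative message body ending in a Perl format's lone dot.
body = ["format STDOUT =", "@<<<<< @>>>>>", "$name, $total", "."]
wire = dot_stuff(body)

print(wire[-1])                   # prints: ..
print(dot_unstuff(wire) == body)  # prints: True
```

A mailer that skips the stuffing step sends the bare "." line as-is, which the receiving side reads as end-of-message.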

> Perl was basically written because Larry Wall found Awk's syntax to be a little too cryptic.

That seems ridiculous; where is it substantiated?

When Wall posted Perl to comp.sources.unix for the first time, he wrote "If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then perl may be for you."

Or rather, not Larry Wall, but the apparent newsgroup moderator added that text, lifting it from the Perl manual page.

Thus he was pitching it as something that performs faster than awk and sed, with a greater range of capabilities.

On pp. 381-382, my copy of Programming Perl (1992 printing of first edition) says he was trying to build a configuration management system for 6 Vax and 6 Sun machines, and he needed to solve some problems like file replication across a 1200 baud link and approvals. So he installed B-news, the Usenet news software at the time. Then he was asked to generate some reports and:

> News was maintained in separate files on a master machine, with lots of cross references between files. Larry's first thought was "Let's use awk." Unfortunately, awk couldn't handle opening and closing of multiple files based on information in the files. Larry didn't want to have to code a special-purpose tool. As a result, a new language was born.

So that's why it's the Practical Extraction and Report Language. He wanted to extract data from files and generate reports.

That was a little tongue in cheek but there is a comment in the Camel book about how he wrote perl because he was scared of Awk's parser.
Could have been a joke. His writing (and talks, like the "State of" ones) is full of them, some of them a bit subtle too :)
Ugh I’d forgotten about Perl reports. Gave up on that circa 2001 when I discovered python!
I found awk after using perl (four was the new hotness). Before long I was trying to figure out just what in my daily work perl was supposed to be protecting me from.
Does anyone out there still use format()? I've never used it in my entire 18 years of experience programming Perl.
>Also, Awk isn't great for making reports

Why?

IIRC there's no method of putting headers or footers on pages, or page numbers, without manually counting the lines yourself.

I could be wrong though. Awk is one of those tools like vi where you can use it for years and still be discovering new features.

>IIRC there's no method of putting headers or footers on pages, nor page number without manually counting the lines yourself.

Good point. The BEGIN and END patterns do work for global headers and footers, totals, etc., but not for per-page stuff. You can do it yourself with some extra awk code, but yes, you have to write it. Not difficult, though. I guess it was not designed to be a Crystal Reports-like reporting tool, with report bands and what not.
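
The "count the lines yourself" approach is roughly the following, sketched here in Python rather than awk. The `paginate` name, the tiny page length, and the header text are assumptions for the example only.

```python
def paginate(lines, page_len=3, title="Report"):
    """Yield report lines with a header before every page_len body lines."""
    for i, line in enumerate(lines):
        if i % page_len == 0:
            page = i // page_len + 1
            if page > 1:
                yield ""  # blank line between pages
            yield f"{title} - page {page}"
        yield line

for out in paginate(["a", "b", "c", "d"], page_len=3):
    print(out)
```

In awk the same bookkeeping would hang off the record counter, with the header printed whenever the count hits a page boundary.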

>I could be wrong though. Awk is one of those tools like vi where you can use it for years and still be discovering new features.

Agreed :) Not only new features, even new uses for existing features, because, although it is a sort of DSL or little language (but a programmable one), the area it is applicable to, pattern matching and data processing of many kinds, is vast.

As many others pointed out in this thread, the fact that the reading of input is built into it (whether from standard input or files given as command-line arguments) saves you a bit of boilerplate code each time (cumulatively) you write an awk program using that feature. So does the pattern-action model, with those two defaults for missing pattern or action (match all lines, or print). And again as others have said, Perl, Ruby, etc. have that feature too (the first one).
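
To show what those two defaults buy you, here is a toy Python imitation of the pattern-action loop. It is only a sketch: the `run` function and the `(pattern, action)` tuples are invented for the example, and real awk does far more (field separators, BEGIN/END, variables).

```python
import re

def run(rules, lines):
    """Apply (pattern, action) rules to each line, awk-style.

    A pattern of None matches every line; an action of None prints the line.
    """
    out = []
    for line in lines:
        fields = line.split()
        for pattern, action in rules:
            if pattern is None or re.search(pattern, line):
                out.append(line if action is None else action(line, fields))
    return out

lines = ["alice 30", "bob 25"]
print(run([("^a", None)], lines))                  # prints: ['alice 30']
print(run([(None, lambda line, f: f[1])], lines))  # prints: ['30', '25']
```

The first call corresponds to an awk program with only a pattern, the second to one with only an action.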

Use the `pr' utility to paginate your output.
> Aho, Kernighan, and Weinberger

That must be where the name comes from, right?

Aho taught the programming languages course at my Uni, and he loved to tell stories about Bell Labs, and this Brian guy in particular. ‘Let me tell you, I’d walk into Brian’s office and say, Brian, you really messed that language up...’ The entire semester I was like who the f is this Brian guy? Years later I was like oh shit, Brian is Brian Kernighan, and the language is C! And I realized I missed a freaking awesome opportunity to ask Al Aho about some serious heads in CS.
Yes.
ha! TIL
Note that's the same Kernighan of "Kernighan & Ritchie" fame. Very influential feller.
Yeah, that I knew, also Aho from dragon book :)
and the same aho of dragon book ('compilers') fame.
It's fitting. Their names, and it's also awkward AF to write. The only better example is probably Brainfuck.
I've heard about _The Awk Programming Language_ but could get my hands only on _The GNU Awk User's Guide_ by Arnold Robbins, since it's available as Info pages on my Linux box. It's also free online:

https://www.gnu.org/software/gawk/manual/gawk.html

It's a pretty good book which teaches you both techniques and the nitty-gritty. I recommend this.

> I've heard about _The Awk Programming Language_ but could get my hands only...

it is available from internet archive, so i guess a legit copy.

Arnold Robbins' book is great, I really liked it and I really recommend it for those interested to learn awk.
I'm not a coder and awk was my first more serious or more complete effort to learn a programming language. What I did notice afterwards was that the syntax and semantics of C also became a lot clearer to me.

I think Kernighan also stated somewhere that awk was (also) designed as a helper-tool to learn C.

All in all, I think it's a great first language, even if you're initially more compelled by Lisp family languages. If you have time to learn only one language, then awk is not a bad choice; it'll open many doors in the Unix/KISS-world.

I can also state that "The Awk programming language" is, among other things, an excellent introduction to computer science or "the way programmers think" in general. A remarkably well written book for a general audience.