by phkahler 1744 days ago
I never used Awk until last year. I wanted to monitor an embedded device with little more than busybox and Python on it. There was quite a bit of information in the log files (I had already written a custom log file viewer with some highlighting), but I wanted to monitor it in real time. Somehow I decided to use Awk to watch the tail of the log file and draw real-time bar graphs by generating the appropriate cursor control sequences. In the end I had about 50 lines of Awk to upload to the board, plus one command to pipe the log into it: very minimally invasive and very informative.
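A rough sketch of that kind of setup, assuming a hypothetical log format where lines carry `name=value` pairs with values on a 0-100 scale (the device's real log format isn't given). It keeps the latest value per name and redraws the bars from the top-left corner of the terminal with ANSI escape sequences (`ESC[2J` clears the screen, `ESC[H` homes the cursor, `ESC[K` clears to the end of the line). Apart from `fflush()`, which the common awks (busybox, gawk, mawk, BWK awk) provide, it sticks to POSIX awk.

```sh
# Sketch only: the "name=value" log format and the 0-100 scale are assumptions.
# Usage (hypothetical path): tail -f /var/log/app.log | bar_monitor
bar_monitor() {
    awk '
    BEGIN { printf "\033[2J" }                      # clear the screen once
    {
        for (i = 1; i <= NF; i++) {
            # keep only fields shaped like name=number
            if (split($i, kv, "=") != 2 || kv[2] !~ /^[0-9]+$/) continue
            if (!(kv[1] in val)) order[n++] = kv[1]  # remember first-seen order
            val[kv[1]] = kv[2] + 0
        }
        if (n == 0) next                            # nothing to draw yet
        printf "\033[H"                             # cursor to top-left
        for (j = 0; j < n; j++) {
            k = order[j]
            v = (val[k] > 100) ? 100 : val[k]       # clamp to the bar width
            bar = ""
            for (c = 0; c < int(v / 2); c++) bar = bar "#"   # one "#" per 2 points
            printf "%-8s %3d%% %s\033[K\n", k, val[k], bar   # ESC[K wipes the old bar
        }
        fflush()                                    # push each frame out immediately
    }'
}
```

Redrawing in place from the top-left, rather than scrolling, is what makes the output read as a live bar graph.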

Would recommend learning Awk with some kind of real-world use of your own. BTW it reminded me of using XSLT, which I think is another often overlooked "good thing".


The biggest reason to learn AWK, IMO, is that it's on pretty much every single linux distribution.

You might not have perl or python. You WILL have AWK. Only the most minimal of minimal linux systems will exclude it. Even busybox includes awk. That's how essential it's viewed.

Something fun in that regard, speaking of minimal...the TRS-80 Color Computer community now has a version of awk that runs on NitrOS-9, a variant of OS-9/6809 originally written for the Hitachi 6309. (64K address space, no separate I and D space.)
I'm curious what linux distros don't have either some version of perl or python.

I like awk, mind, but this is not necessarily (IME) a good argument for it.

The POSIX specification includes awk, but not perl or python. The world of UNIX and UNIX-likes is larger than just Linux distributions. Depending on the utility you plan on building and the platforms you expect it to run on, it may be wiser to reach for awk than other PLs.
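For the portability argument, the trick is to stay inside what the POSIX awk specification defines. A small sketch: list accounts at or above a UID cut-off from `/etc/passwd`-style input, using only `-F`, `-v`, a pattern and `printf`, and none of the gawk-only extensions such as `gensub()` or `asort()`. The function name and the cut-off value are illustrative.

```sh
# POSIX-only sketch: list accounts whose UID ($3) is at least a cut-off.
# Usage (cut-off is illustrative): list_users_from 1000 < /etc/passwd
list_users_from() {
    awk -F: -v min="$1" '
    # "+ 0" forces a numeric comparison in every awk implementation
    $3 + 0 >= min + 0 { printf "%-16s uid=%s home=%s\n", $1, $3, $6 }   # $6 = home dir
    '
}
```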
Modern BSDs, macOS, and Solaris certainly have Perl and Python. (iOS and Android don't, but they don't have awk either.) What other Unixes are you thinking of? AIX, HP/UX, IRIX, UnixWare, etc. should be considered retrocomputing at this point and not relevant to modern compatibility discussions.

Linux distros based on busybox, as mentioned elsewhere in this thread, are a more compelling reason for considering awk than considerations involving other Unixes.

When it comes to Python on macOS, the only version that’s installed by default is the deprecated copy of Python 2.7 that’s slated to be removed in the future. For Python 3, you need to install the developer tools. (/usr/bin/python3 ships with the OS, but is just a stub that runs the developer tools version if installed or prompts you to install it otherwise).

It’s not hard to install, but it’s not guaranteed to already be installed on every system.

You can install Python and Perl on BSDs, but that's different from awk, which is part of the core OS and guaranteed to be there without installing anything extra.
Wasn't awk added to android in 9?
> The world of UNIX and UNIX-likes is larger than just Linux distributions

The post I was replying to specifically said Linux distros.

Anything busybox-based. I'm not sure busybox awk is very complete, either.
> I'm curious what linux distros don't have either some version of perl or python.

I imagine that DamnSmallLinux or TinyCoreLinux possibly don't have them by default. Their focus is to be as small as possible in order to download quickly and fit in a USB drive or CD. Their small size was more important back when speeds were slower and drives were smaller. They were also good for when you had a limited number of storage options and you wanted the running OS to fit completely in RAM (back when RAM was smaller).

I don't think I ever ran TinyCore without immediately connecting it to the Internet to grab a bunch of packages. Puppy Linux included Perl in its base install at one time (I don't know if it still does), and Damn Small Linux was supposed to have a cut-down version of Perl included as well.
Python definitely not, though.
Yeah, but if you are happy to program in Perl, that's basically every major Linux distro covered. Anything using DEB or RPM packaging, any machine with Git installed (which includes Windows), plus the ones I already mentioned, already have access to Perl. This is a formidable installed base with no effort needed to install a runtime.
If you’re using DamnSmallLinux etc., I’d imagine you can package your own awk quite easily! Perl would require a lot more packages. But all you need to do is copy a couple of binaries, right?
Haven't used these distros in a decade or so.

Not sure why I'd have to package awk. Busybox's is probably sufficient for most uses, if the need ever arose, which I don't think it normally does when using these distros.

Agreed. Not having enough space for awk would be daaaaamn small indeed.
The better question might be "which Linux distros don't have Perl or Python installed by default?", as a lot of people are working on systems where they can't just add additional packages.

Perl has been getting cut from minimal builds of distros for a while. The default installed version of Python is a bit of a crapshoot, never mind which modules you might happen to have available.

You'll find this a lot in the embedded space. As well, you'll see a bunch of docker images that don't have perl/python.
Building a Docker image gives basically full freedom over the choice of a runtime. If your Dockerized application is written in Java or Python or PHP or C#, why not just write the tooling and scripts in the same language too? Or at least install a suitable runtime just for the scripts? Or if starting from an empty container, why not build the script into a statically-linked binary to be placed next to the application?
Typically, you want docker images as slim as possible. Both to make it faster to distribute and to prevent attacks if something escapes your application. The less in the image, the less exploitable your image is.

Beyond keeping the images slim, the times I'd reach for awk when dealing with a docker container would be when I'm debugging problems within that container. I might need to do some quick text parsing or finagling in order to troubleshoot why the application is sucking.
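As an example of that kind of quick troubleshooting, here's a sketch that pulls slow requests out of a log, assuming a hypothetical format in which the last field of each line is the response time in milliseconds. It relies only on `$NF`, `-v` and an `END` block, so it should also work with the busybox awk found in many slim images.

```sh
# Hypothetical log format: last field = response time in ms, e.g.
#   2021-03-04T10:00:01 GET /api/items 200 812
# Usage: slow_requests 500 < access.log
slow_requests() {
    awk -v limit="$1" '
    $NF + 0 > limit + 0 { slow++; print }            # echo each slow request
    END { printf "%d of %d requests over %sms\n", slow, NR, limit }
    '
}
```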

I'd rather not have to upload a Java program into my Docker container just for quick troubleshooting.

I agree on the slimness of Docker images, but if you e.g. have some kind of video or photo CMS written in PHP, then any housekeeping or export scripts etc. are better off being written in PHP as well (or even integrated into the application), given how closely they're already bound to the rest of the application.

For anything beyond that, I would very greatly prefer to have "black box", extremely verbose log dumps and database dumps that I could analyse over at my actual dev machine, or a good debugger that lets me step through the code to figure out what's going wrong.

I do realise that not all languages have good tooling, or that some people prefer to use `printf` style debugging, so it may not apply to all.

A nice thing about awk vs. Perl/Python: there's a small focused set of things to learn. Once you learn them you're done.
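To give a sense of how small that set is, this sketch touches most of it in one program: `BEGIN`/`END` blocks, `pattern { action }` pairs, fields (`$1`, `$2`, ...), the built-in counters `NR` and `NF`, and `printf`. The CSV layout (`date,item,qty` with a header row) is an assumption for illustration, and a plain comma split doesn't handle quoted fields.

```sh
# Tour of awk's core pieces on a hypothetical "date,item,qty" CSV.
awk_tour() {
    awk '
    BEGIN    { FS = "," }                   # runs once, before any input is read
    NR == 1  { next }                       # NR = record number: skip the header
    NF != 3  { bad++; next }                # NF = field count: tally malformed rows
    $3 > 100 { printf "%s %s: %d\n", $1, $2, $3 }   # pattern { action }
    END      { printf "rows=%d malformed=%d\n", NR - 1, bad }   # runs once at the end
    '
}
```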

This suggests an opening for a Perl/Python intro focused on the exact same tasks, admittedly. That seems more realistic for Perl -- unless there's someone who writes Python one-liners at the shell?
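For comparison, here is one small task, summing the second whitespace-separated column, as an awk one-liner and as a `python3 -c` one-liner (assuming `python3` is on the PATH). Lines with fewer than two fields contribute nothing in either version. For input `x 1`, `y 2`, `z 3` the awk version prints `6` and the Python one prints `6.0`, because it sums floats.

```sh
# Same task both ways: sum column 2 of whitespace-separated input.
sum_col2_awk() {
    awk '{ s += $2 } END { print s + 0 }'    # missing $2 counts as 0
}
sum_col2_py() {
    python3 -c 'import sys; print(sum(float(l.split()[1]) for l in sys.stdin if len(l.split()) > 1))'
}
```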

I don't think true Python "one liners" are a thing, but the awkward thing about awk is that it sits in this place where what you are doing is complicated enough that you need awk, but simple enough that you want a one liner. Those cases have been so few and far between for me that every time I want to reach for awk I have to go look up how to do anything more complex than printing fields. That completely defeats the point of the quick one liner.

May as well open up vim, write my 7 lines of Python, and run it. Because I use it every day and don't have to look anything up, it ends up far faster. Then when I am done I either delete it, throw it in a scripts directory, or make it part of some existing infrastructure repo. And if I do keep it, because I used Python it is much more readable than the awk one liner would have been.

I have tried in earnest to memorize awk's idiosyncrasies multiple times now. By the time I go to use what I learned the last time, it is months later and I have forgotten enough that I need to go look stuff up.

So in a way, here I am: The guy that writes "one liners" in python.

I think that is a good point: often a short Python script is the best solution.

I use awk (and python) daily at work. I work with a lot of flat files, and I use awk when I am doing data quality checks. One of the "sweet spots" it hits for me is when I need to group data by value, or other relatively simple aggregations.
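A minimal group-by sketch of that kind, assuming a hypothetical tab-separated file whose first column is the key and second column a numeric amount. Associative arrays do the grouping; since `for (k in array)` visits keys in an unspecified order, the output is piped through `sort`.

```sh
# Group by column 1 and report count, sum and max of column 2 per key.
# Usage (hypothetical file): group_stats < data.tsv
group_stats() {
    awk '
    BEGIN { FS = "\t" }                       # tab-separated input
    {
        n[$1]++; sum[$1] += $2
        if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
    }
    END {
        for (k in n) printf "%s\tcount=%d\tsum=%g\tmax=%g\n", k, n[k], sum[k], max[k]
    }' | sort                                 # make the key order deterministic
}
```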

Yeah, it's a different world from when I learned Awk. You might enjoy the (very short) book by the creators just because it's a great focused expression of the Unix way. But nobody needs to learn it.
Perl is sometimes "better awk".
Perl actually has a one-liner invocation style that's modeled after awk.

For example, printing the first field of a line with the default delimiter can be done in Perl by running:

    perl -ane 'print $F[0], "\n"'
Where $F[0] is the equivalent of $1 in awk.
IMO, unless you're doing embedded work or building minimal containers, you'll pretty much always have access to a decent runtime (or several).

Python: almost every conventional server. Python dependencies are so ubiquitous that you aren't likely to find a Linux install without it.

Perl: every DEB and RPM machine, and anything with Git installed. You can't really escape it, unless you're embedded.

PowerShell (yeah, I know): every Windows machine from XP onwards (though usable only from 7 onwards), and some Linux computers if installed.

Java: lots and lots of places will have this available.

Dockerized runtime of your choice: not ubiquitous, but I expect more and more developer machines and servers to gain Docker or Docker-like container support.

There really isn't any reason to stick to AWK, unless you're working directly on embedded devices or just like using it.
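If you want to check a particular box rather than guess, a quick inventory of the runtimes listed above can be done with POSIX `command -v`. The command names are the usual defaults and an assumption (PowerShell Core installs as `pwsh`, some systems only have `python`). As noted upthread for macOS, finding `/usr/bin/python3` doesn't prove a working Python 3, since it may be a stub.

```sh
# Report which of the runtimes discussed in this thread are on PATH.
# Command names are assumptions; adjust for your platform.
list_runtimes() {
    for cmd in awk perl python3 pwsh java docker; do
        if command -v "$cmd" >/dev/null 2>&1; then
            printf '%-8s yes\n' "$cmd"
        else
            printf '%-8s no\n' "$cmd"
        fi
    done
}
```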