by binary_ninja 1091 days ago
Awk has always been a language that I loved, but I have struggled to use it for anything besides quick jobs parsing text files. I understand it is meant to be used for exactly that, but the fact that it is simple, fast and lightweight sometimes makes me want to do something more with it. When I start trying to do something besides parsing text, though, I find that it starts becoming awkward (pun intended?).
6 comments

> but the fact that it is simple, fast and lightweight

I see awk as a DSL to be honest. Yes, it can be used as a general purpose language, but that quickly becomes, as you say, awkward :D

Like many DSLs, it is simple, fast and lightweight as long as it is used for its intended purpose. Once you start using it for something else, these advantages evaporate pretty quickly, because then you essentially have to work around the DSL design to get it to do what you want.

DSL == Domain Specific Language?
Yes
One simple thing I do with awk is to create a command processor: read one line at a time and do things on my data as a response. This is very useful because you can make your command as powerful as needed and call other unix tools as a result.
Do you have an example of this that is available somewhere?
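Not the parent's script, but a minimal sketch of what such a command processor could look like, assuming a made-up three-command language (`push N`, `sum`, `sorted`). The command names and the `cmdproc` wrapper are hypothetical; the point is only the shape: one pattern per command, with `sorted` handing the data to an external tool (`sort -n`).

```shell
# Hypothetical command language: "push N", "sum", "sorted".
cmdproc() {
  awk '
    $1 == "push"   { val[++n] = $2; next }            # remember a value
    $1 == "sum"    { t = 0
                     for (i = 1; i <= n; i++) t += val[i]
                     print t; next }
    $1 == "sorted" { fflush()                          # keep our output ahead of sort
                     for (i = 1; i <= n; i++) print val[i] | "sort -n"
                     close("sort -n"); next }          # wait for sort to finish
                   { print "unknown command: " $1 }
  '
}
```

For instance, `printf 'push 3\npush 4\nsum\n' | cmdproc` prints `7`.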
I find it pretty nice for writing simple preprocessors. For example I have one which takes anything between two marker lines and pipes it through a command (one invocation per block). Awk has an amazing pipe operator which lets you do something like this:

    ... {
        print $0 | "command"
    }
"command" is executed once, and the pipe is kept open until closed explicitly by close("command"), at which point the next invocation will execute it again. The command string itself acts as a key for the pipe file descriptor.

And of course, no mention of awk is complete without the "uniq" implementation, which beats the coreutils uniq in every way possible (by supporting arbitrary expressions as keys and not requiring sorted input):

    !a[$0]++
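
To illustrate the "arbitrary expressions as keys" part, here is a small sketch that keeps the first line seen for each key, where the key is the lowercased first field instead of the whole line. The function name and the choice of key are just assumptions for the example.

```shell
# Keep only the first line for each distinct (case-insensitive) first field.
# seen[key]++ is 0 (false) the first time a key appears, so !seen[key]++ is true once.
uniq_by_first_field() {
  awk '!seen[tolower($1)]++'
}
```

Unlike `uniq`, this does not need sorted input, but it does hold every distinct key in memory.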
I had no idea about this "keep the pipe open" behaviour. I thought it would spawn the binary on every print statement and thus didn't consider it in the past. But now...
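A rough sketch of the block preprocessor idea described above, assuming hypothetical marker lines `@begin` and `@end` (the parent did not say which markers they use). Everything between the markers goes through the command, and `close()` at the end of each block means the next block gets a fresh invocation.

```shell
# Usage: filter_blocks 'command' < input
# "@begin" / "@end" are made-up markers for this sketch.
filter_blocks() {
  awk -v cmd="$1" '
    $0 == "@begin" { fflush(); inblock = 1; next }    # flush our own output so ordering is kept
    $0 == "@end"   { close(cmd); inblock = 0; next }  # closing the pipe waits for cmd to exit
    inblock        { print | cmd; next }              # one pipe stays open for the whole block
                   { print }                          # lines outside blocks pass through
  '
}
```

With `filter_blocks sort`, each marked block comes out sorted on its own while the surrounding lines are left alone.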
This is exactly why I moved from AWK to Perl for these quick jobs a couple of years ago. If you stick to an AWK-like subset, Perl is also simple, fast and lightweight. If you want to grow your scripts (and you have a lot of discipline) Perl – in contrast to AWK – gives you enough noose to hang^W^W^W^Wthe tools you need.
Perl? Wow. Is that better than bash, python or even nodejs? Why write in Perl over these? Serious question, I was propagandized to hate Perl.
I write bash, python and nodejs all day, and have no professional history with Perl.

One day while avoiding working on something important, I spent half a day learning Perl in order to implement something related to a build tool that was being used in the important thing I was avoiding.

I was blown away. It's a really delightful language. Its big downfall is that it makes it feel good to do something "clever."

Perl is a joy to write, and a devil to read. I liked it, and wish I had started my career earlier so I could have enjoyed Perl in its heyday.

I have similar feelings about Ruby.

You need to make sure that you write the clever bits clearly. Maybe add a comment. It takes some discipline, but isn't hard.

In fact, Perl remains remarkably robust if you stack clever tricks on top of each other.

The same shortcut syntax that people complain about does make perl really handy for one-time tasks where you're iterating on ideas. Lots of features there that make that easy. One example:

  #!/usr/bin/perl
  while (<>) {
      # various processing here
      # $ARGV is set to either "-" for piped input, or the current filename
      # $_ is the data of the current line
  } 
That (<>) construct accepts data from stdin, redirection or file(s) named as arguments and iterates over the data. There are lots of things like that throughout the language.
And you can avoid even that minor boilerplate with the -n or -p flag. It even supports BEGIN and END like awk.
> Perl? Wow. Is that better than bash, python or even nodejs? Why write in Perl over these?

It depends on scale.

If you have some quick parsing to do, then awk will get you started quickly, but as you expand your experimentation on what you want to extract/manipulate, it may not be easy to add onto the awk beginnings of your "one liner".

But if you start with awk-like† syntax but invoking it with Perl, then if you find you have to expand, Perl has more elbow room.

The intention is not to 'go big', which those other languages may be better at, but to more easily 'start small'.

† IIRC, Larry Wall wanted a utility that had awk/(s)ed-like syntax for text manipulation, just 'with more'.

Have you ever tried to dig a hole? What tool did you use?

- Want to cut through and move loam, compost, sandy, and compacted soil? You're gonna want a rounded shovel.

- Want to break up rocky, clay soil? A pick mattock will penetrate deep, breaking up soil, shattering smaller rocks, and is used as a lever to uproot. A tiller is a faster method but disturbs the soil more.

- Want to dig a narrow, deep hole? An auger will quickly break up rocks and soil in a shaft and move them upwards.

What do you use the Perl tool for?

- Quickly and efficiently open files, read line by line, analyze text, and perform any kind of operation you can think of, with complex data structures, objects and modular code, using very few lines of code.

- Executing external commands with a shell, returning their output, and making complex yet short programs easily with arguments to the interpreter from a command line.

Perl can do sh/awk/sed and a bunch more at once.
Absolutely. It is comparable to python in some ways, but makes it much easier to write quick one-liners using regexes and data manipulation, and to scale those up to real programs. It fills the gap between bash scripts using awk, grep and sed, and C/java/C#. Compared to bash scripting, perl is a real programming language. The documentation and library ecosystem are excellent, backwards compatibility is legendary, yet it supports modern Unicode. The syntax is weird, but try it for a bit, read the man pages, it's not that hard. The OO system is weirder, and I wouldn't make complex class hierarchies in it, but it is usable.
I like how Awk is just a single executable. A single-executable Perl that includes only the core library would be great. There is Microperl [0, 1], but no idea how well it compiles with more up-to-date Perl versions.

0: https://github.com/bentxt/microperl-standalone

1: Original article from 2000 by the author Simon Cozens: https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0003.html

Is Perl better? Maybe, or maybe not.

Perl scripts can be very useful and they are pretty robust. I have often found Perl scripts running for years and years without issues at different companies.

My main issue with Perl scripts is that they often are not "readable" by anybody but the original creator, who of course has left the company. (Not a fault of Perl itself, though.)

But your mileage may vary and any script can be made (un)readable.

I've always found it weird that people bash on Perl relentlessly for being hard to read and then turn around and praise Rust's syntax when it is full of stuff like this:

    fn print_d(t: &'static impl Display) {
>> My main issue with Perl-scripts is that they often are not "readable" by anybody but the original creator.

Anyone writing Perl scripts like this should not be trusted with any programming language.

Perl scripts are no less readable than bash scripts or Awk scripts. This is because so much of Perl was written to do the same work as bash, awk, sed, and the other related Unix text processing command line programs, but all under one roof.

Don't believe me? Take a look for yourself:

https://learn.perl.org/

http://blob.perl.org/books/impatient-perl/iperl.htm

Perl can also be hilariously unreadable: https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html
>> Perl can also be hilariously unreadable: https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html

Most programming languages can be obfuscated. That does not mean people write code in those programming languages like that:

C: https://www.ioccc.org/

Javascript: view-source:https://www.google.com/

The truth is that insulting Perl is considered stylish by some, so many people do despite knowing little to nothing about Perl and having never used it.

However, if you want Perl to be hilariously unreadable, why not write it in Latin:

https://metacpan.org/dist/Lingua-Romana-Perligata/view/lib/L...

Or Klingon:

https://metacpan.org/pod/Lingua::tlhInganHol::yIghun

There's a limited problem domain where it's unquestionably the best. Perl beats awk and bash at their own game on their home turf. That's the best way to put it. It's faster, has more shortcuts, fewer warts, more power, and more readability when well written, and while aged and not huge by modern standards, CPAN (like pypi or npm) is incredible for a hyper-powered awk and bash mash-up for those tasks at the edge of that limited problem domain. It's installed almost everywhere, so almost always available.

That stuff is just awkward and painful in Python by comparison.

I don't write Perl code, but its CLI has been a very good way to replace sed with something decent. sed not supporting Perl regex syntax, the most common kind of regex out there by far, is frankly disappointing. Even grep was able to put it together and add the -P switch. But sed is still stuck in the prehistoric syntax of ERE ("Extended Regular Expressions", as described in man pages) which e.g. instead of \d for a digit, uses [[:digit:]], a syntax present in... zero? other tools or programming environments.
Better than BASH? Mostly. Better than Python? That's subjective, as you would have to use them both yourself. I lean towards Perl as I like sigils to denote things. I have nothing against Python though. Both are typically installed as a default now. I have never used nodejs for sys admin work.
Perl is super-specialized at reporting (that's in fact the "r" in Perl). In particular there's a bunch of extremely useful implicitly defined variables that take their context from your place in a line-by-line loop through a text file.
Perl is a great language, but please listen to this old perl programmer's advice:

1. You can write totally unreadable perl. It is probably the single worst language in this regard most programmers will run into. Be careful to make your code readable.

2. Keep your amount of perl small. 200-300 lines is a good bit of it.

So for quick bang-it-out scripts that need to parse text and the like, perl is great. For writing a major application, not so much.

One other advantage is that Perl will be found in the base install of almost any unix-like system. Python, nodejs, even bash may not.
When discussing such languages, I would like to point out that Raku is also an option.
I have found a handful of unconventional applications for awk -- I once needed a tiny pcm pulsewave generator, and awk was surprisingly decent for the job [1].

Aside from that I've mostly been using it for quick statistics [2], but it quickly moves into perl territory...

1: https://github.com/9001/asm/blob/hovudstraum/etc/bin/beeps#L...

2: https://ocv.me/doc/unix/oneliners/#965bfcb8
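
For anyone wondering what "quick statistics" in awk tends to look like, a minimal sketch follows (not taken from the links above), assuming one number per line in the first column. The function name and output format are made up for the example.

```shell
# Count, min, max and mean of the first column.
col_stats() {
  awk '
    { v = $1 + 0; sum += v }               # force numeric interpretation
    NR == 1 || v < min { min = v }
    NR == 1 || v > max { max = v }
    END { if (NR) printf "n=%d min=%g max=%g mean=%.2f\n", NR, min, max, sum / NR }
  '
}
```

For instance, `printf '4\n10\n1\n' | col_stats` prints `n=3 min=1 max=10 mean=5.00`. Anything much beyond this (medians, percentiles) needs sorting or arrays, which is about where it starts to feel like perl territory.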

It's a language for creating quick alternative views from line- and column-oriented text streams. That means, take the output of another tool and represent it in a different way.
I use awk mostly for one-liners and resort to Python when I need more than a few lines of code.