by brutos 3440 days ago

Like the sibling post to this, I also have a bioinformatics background, and awk is one of the most important tools in my toolkit. It is just so extremely fast. A few years ago I was laughing at a friend who did a lot of simple data transformation in awk, and I told him to save his time and use Python instead. He challenged me to do the same task in Python. It took me longer to write and it was orders of magnitude slower. That was eye-opening.

It is useful to be aware of the different awk implementations, for example mawk. That awk version is stupidly fast. GNU awk is already extremely fast, but if I work on a > 200 GB file, I just prefix the m and the task will be done even faster. However, mawk does not have all the features of gawk. The differences in regex support are the most painful.
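To make the "prefix the m" point concrete, here is a rough sketch. The sample data, column layout, and variable names are made up for illustration. The program sticks to POSIX awk features, so it should behave the same under either interpreter. It says nothing about speed, which you would have to measure on your own files.

```shell
# Made-up sample data: chromosome, start, end (tab-separated).
data=$(printf 'chr1\t100\t250\nchr2\t300\t900\nchr1\t400\t450\n')

# Sum interval lengths per chromosome using only POSIX awk features.
prog='BEGIN { FS = OFS = "\t" } { len[$1] += $3 - $2 } END { for (c in len) print c, len[c] }'

# Run with whatever "awk" is on the PATH; sort because for-in order is unspecified.
out_awk=$(printf '%s\n' "$data" | awk "$prog" | sort)
printf '%s\n' "$out_awk"

# Same program, different interpreter: only the command name changes.
if command -v mawk >/dev/null 2>&1; then
    out_mawk=$(printf '%s\n' "$data" | mawk "$prog" | sort)
    [ "$out_awk" = "$out_mawk" ] && echo "mawk output matches"
fi
```

Anything gawk-specific in the program (certain regex extensions, for instance) is where the two would start to diverge.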

A very useful feature of the GNU version of awk is the debugger. It is super useful to be able to step through an awk script line by line.

If I ever have too much time, I would like to take a look at that experimental llvm-awk (lawk) someone started and get it production ready. Sadly, it seems abandoned currently.