Hacker News new | ask | show | jobs
by sbi 4174 days ago
Probably not, but in some cases sed and AWK might be faster. I am a big fan of AWK. It is limited in some ways that make it impractical to use for really serious programs, but it is very expressive. Look at [1, 2].

[1]: http://c2.com/doc/expense/

[2]: http://www.pement.org/awk/awk1line.txt

1 comments

I have tested converting sed/awk lines to perl in a few base scripts that worked on a fairly large amount of data. Oddly enough, in every case, perl 5.18 performed at LEAST 1.5x faster, sometimes as much as 3.5x faster. Obviously anecdotal evidence, but recent versions of Perl seemed to have gained some good speed.
I've had a similar experience, with Perl up to about 7x faster. I had sed in a few data-mangling pipelines because I assumed simpler=faster, but replacing it with Perl was either a wash or a speedup in every case. This with the versions of perl and sed in Debian (looks like it's GNU sed), so ymmv with other seds.

The case where I saw a 7x speedup was doing many-times-per-line, fixed-string search/replace on a file consisting of very long lines (an SQL dump where some lines had >1m characters). Perl was IO-bound (so presumably would've been even faster if I'd had better disks), while sed was CPU-bound at a pretty low fraction of the possible IO performance.

Could have to do with charset handling.

For the really complex stuff, there's rejit[0]. I wonder if LuaJIT would work; these tools also need IO tricks to be fast.

[0] https://lwn.net/Articles/589009/

Rafe Colburn from Etsy wrote about this performance oddity: http://rc3.org/2014/08/28/surprisingly-perl-outperforms-sed-...

EDIT to manage expectations: the article doesn't explain why, it just provides benchmarks and one commenter made a suggestion about character handling. More insight still welcome :)

There are many different versions of awk: gawk, (BSD) nawk, mawk, etc. I think OS X uses nawk, but mawk is reputedly faster. Gawk is definitely slower than both mawk. I'm not surprised that Perl is faster though.
> I used the default OS X versions of these tools. The versions were Perl 5.16.2, Awk 20070501, and some version of BSD sed from 2005