| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Zhyl 2739 days ago

Many of my issues will be specific to the environment. The thing that made me use the first perl one-liners is because I was used to using 'rename' in the command line and I was able to emulate this with perl more easily (using a regex substitution) than 'mv'.

The second and more pivotal one liner was when I needed to create a primary key to ensure I wasn't getting any dupes later down the pipeline (a requirement that came about as some of the other contractors on the project were messing with my exports and then blaming my files for their conflicting numbers). My first approach was to use the MD5 tool included in GIT bash, but this was prohibitively slow, taking 45 mins to loop through each line of a 4mb file. Not sure if there was a performance penalty on the box I was using, but the perl one liner did the job in a couple of seconds for all the files I had.

After that it mostly became a question of maintainability, with the performance being a boosting factor. I found my perl deduping one liner was faster than either 'sort -u' or 'sort | uniq'. I was able to add new features and account for weird issues more easily. Regex was better than sed, syntax was more flexible than awk, there were more filtering options than grep.

Plus it was good fun. Perl has so many things that just feel so nice and tidy compared to bash.

Having said that, I still use Unix pipelines extensively, they just tend to be for ad hoc queries rather than for my automated reports.

1 comments

gronne 2739 days ago

All right - thanks for the elaboration. The perfomance diff sounds interesting. I might check it out at some point :)

link