by madmax96 2705 days ago
I agree with the sentiment, but one critique applies so generally that it must be noted: if a command accepts a filename as an argument, you should absolutely pass the file as an argument rather than `cat` it over stdin.

For example, you can write this pipeline as:

    grep '^x' foo.txt \
        | sed 's/a/b/g' \
        | awk '{print $2}' \
        | wc -l > bar.txt

This is by no means scientific, but I've got a LaTeX document open right now. A quick `time` says:

    $ time grep 'what' AoC.tex
    real    0m0.045s
    user    0m0.000s
    sys     0m0.000s

    $ time cat AoC.tex | grep what
    real    0m0.092s
    user    0m0.000s
    sys     0m0.047s

Anecdotally, I've seen small, perfectly sensible pipelines thrash a system because of unnecessary uses of `cat`. When you `cat` a file into a pipe, the OS must (1) `fork` and `exec` an extra process, (2) copy the file into `cat`'s memory, (3) copy the contents of `cat`'s memory into the pipe, and (4) copy the contents of the pipe into `grep`'s memory. That's a whole lot of copying for large files -- especially when the first command in the sequence (`grep` here) usually performs some major reduction on the input data!
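To make the comparison concrete, here is a rough sketch of the three ways to hand a file to `grep`. The file `sample.txt` and its contents are made up purely for illustration. All three give the same answer; only the last one pays for an extra process and a trip through a pipe.

```shell
# Hypothetical sample file, created only for this illustration.
tmp=$(mktemp -d)
printf 'xa 1\nyb 2\nxc 3\n' > "$tmp/sample.txt"

# 1. File as an argument: grep opens and reads the file itself.
as_arg=$(grep -c '^x' "$tmp/sample.txt")

# 2. Redirection: the shell opens the file as grep's stdin.
#    Still no extra process and no pipe.
as_redirect=$(< "$tmp/sample.txt" grep -c '^x')

# 3. cat + pipe: one more process, and the data crosses a pipe.
as_cat=$(cat "$tmp/sample.txt" | grep -c '^x')

echo "$as_arg $as_redirect $as_cat"
rm -rf "$tmp"
```

If you like reading pipelines left to right with the input file first, the redirection form (2) keeps that shape without the `cat`.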
1 comment

In my opinion, it's perfectly fine either way unless you're worried about performance. I personally tend to try to use the more performant option when there's a choice, but a lot of times it just doesn't matter.

That said, I suspect the example would be much faster if you didn't use a pipeline at all, because a single tool could do it all (I'm leaving in the substitution and the column extraction, even though neither affects the result):

    awk '/^x/{gsub("a","b"); col=$2; count++} END{print count+0}' foo.txt > bar.txt
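A quick sanity check of the single-process idea, as a sketch rather than a benchmark. The input file and its contents are invented for illustration. Note that the count has to come from a counter incremented on matching lines: in an `END` block, `NR` holds the total number of input lines, not the number of matches.

```shell
# Hypothetical input, created only for this illustration.
tmp=$(mktemp -d)
printf 'xa 1\nyb 2\nxc 3\nzd 4\n' > "$tmp/foo.txt"

# The original four-process pipeline.
pipeline_count=$(grep '^x' "$tmp/foo.txt" | sed 's/a/b/g' | awk '{print $2}' | wc -l)

# Single awk process: count matching lines with a counter.
# "n + 0" forces a numeric 0 when nothing matches.
awk_count=$(awk '/^x/ { n++ } END { print n + 0 }' "$tmp/foo.txt")

# NR in END is the total line count, not the match count.
total_lines=$(awk 'END { print NR }' "$tmp/foo.txt")

# $((...)) strips the padding some wc implementations add.
echo "pipeline=$((pipeline_count)) awk=$awk_count total=$total_lines"
rm -rf "$tmp"
```

With this four-line input, the pipeline and the counter both report 2, while `NR` reports 4.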