by madmax96
2705 days ago
I agree with the sentiment, but my critique applies so generally that it must be noted: if a command accepts a filename as a parameter, you should absolutely pass it as a parameter rather than `cat` it over stdin. For example, you can write this pipeline as:

    grep '^x' foo.txt \
      | sed 's/a/b/g' \
      | awk '{print $2}' \
      | wc -l > bar.txt
This is by no means scientific, but I've got a LaTeX document open right now. A quick `time` says:

    $ time grep 'what' AoC.tex
    real 0m0.045s
    user 0m0.000s
    sys 0m0.000s

    $ time cat AoC.tex | grep what
    real 0m0.092s
    user 0m0.000s
    sys 0m0.047s
Anecdotally, I've seen small, perfectly sensible pipelines thrash a system because of inappropriate uses of `cat`. When you `cat` a file into a pipe, the OS must (1) `fork` and `exec` the extra `cat` process, (2) copy the file to `cat`'s memory, (3) copy the contents of `cat`'s memory to the pipe, and (4) copy the contents of the pipe to `grep`'s memory. That's a whole lot of copying for large files -- especially when the first command in the sequence (`grep` here) usually performs some kind of major reduction on the input data!
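The ways of handing the file to `grep` can be put side by side. This is a rough sketch rather than a benchmark, and the sample file contents and variable names are made up for illustration. The argument and redirection forms let `grep` read the file itself, while the `cat` form adds the extra process and the trip through the pipe described above. All three should give the same answer.

```shell
# Throwaway sample file (hypothetical contents, for illustration only).
sample=$(mktemp)
printf 'xa 1\nyb 2\nxc 3\n' > "$sample"

# 1. File name as an argument: grep opens and reads the file directly.
by_arg=$(grep -c '^x' "$sample")

# 2. Shell redirection: grep reads stdin, but stdin is the file itself,
#    so there is still no extra process and no pipe in between.
by_redirect=$(grep -c '^x' < "$sample")

# 3. cat over a pipe: one more process, and every byte passes through the pipe.
by_cat=$(cat "$sample" | grep -c '^x')

echo "$by_arg $by_redirect $by_cat"
rm -f "$sample"
```

The redirection form is handy for commands that only read stdin and do not accept a file name.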
|
That said, I suspect the example would be much faster if you didn't use the pipeline, because a single tool could do it all (I'm leaving in the substitution and column print that are actually unused in the result):
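One way that could look, as a sketch only, assuming `awk` is the single tool meant here. The sample data and the `src`/`dst` variable names are made up for illustration and stand in for `foo.txt` and `bar.txt`.

```shell
# Stand-ins for foo.txt and bar.txt (hypothetical sample data).
src=$(mktemp)
dst=$(mktemp)
printf 'xa 1\nyb 2\nxc 3\n' > "$src"

# Single awk process: filter on ^x, do the a->b substitution, pick column 2
# (both unused in the final count, kept only to mirror the pipeline),
# and count the matching lines. "n + 0" prints 0 when nothing matches.
awk '/^x/ { gsub(/a/, "b"); col = $2; n++ } END { print n + 0 }' "$src" > "$dst"
single=$(cat "$dst")

# The original four-stage pipeline, for comparison.
# tr strips the padding some wc implementations add.
piped=$(grep '^x' "$src" | sed 's/a/b/g' | awk '{print $2}' | wc -l | tr -d ' ')

echo "awk: $single  pipeline: $piped"
rm -f "$src" "$dst"
```

Both forms count the lines starting with `x`, so the two numbers should agree. The `awk` version just gets there with one process and no pipes.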