by aoe 5064 days ago
A little off-topic, why didn't you just do "xargs wc -l"?

Because of this:

     Any arguments specified on the command line are given
     to utility upon each invocation, followed by some
     number of the arguments read from the standard input
     of xargs.  The utility is repeatedly executed
     until standard input is exhausted.
In other words, if the argument list is too long, xargs will chunk the input and call the program repeatedly. This means that on very large projects, 'wc -l' will be called several times on subsets of the files, and each call prints its own partial total, so the total listed at the bottom covers only the last chunk and is incorrect.
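
To see the chunking without needing a huge project, here is a small sketch that forces it with 'xargs -n 2'. The temp directory and file names are made up for the demo, and LC_ALL=C is set only so the word "total" is not localized.

```shell
# Force xargs to batch two file names per wc call, imitating what
# happens on its own once the argument list gets too long.
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'a\nb\n'    > "$tmpdir/one.txt"
printf 'c\n'       > "$tmpdir/two.txt"
printf 'd\ne\nf\n' > "$tmpdir/three.txt"
printf 'g\n'       > "$tmpdir/four.txt"

# Four files, two per invocation: wc runs twice.
chunked=$(find "$tmpdir" -type f | xargs -n 2 wc -l)
printf '%s\n' "$chunked"

# Count the "total" rows: one per wc invocation, none covering all files.
total_rows=$(printf '%s\n' "$chunked" | grep -c 'total$')
echo "number of 'total' rows: $total_rows"

rm -rf "$tmpdir"
```

Each "total" row only sums the two files in its own batch, which is the problem described above.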

Putting a 'cat' in between fixes that. 'cat a b; cat c d' is equivalent to 'cat a b c d', so the chunking doesn't matter. And wc -l can just read from stdin without worrying about how many files there are.
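
A minimal sketch of that, again with made-up demo files and 'xargs -n 2' standing in for the chunking you would get on a big tree:

```shell
# Same forced chunking as a long argument list would cause, but with
# cat in the middle: the chunks are concatenated into a single stream.
tmpdir=$(mktemp -d)
printf 'a\nb\n'    > "$tmpdir/one.txt"
printf 'c\n'       > "$tmpdir/two.txt"
printf 'd\ne\nf\n' > "$tmpdir/three.txt"
printf 'g\n'       > "$tmpdir/four.txt"

# cat runs twice (two files each); wc -l runs once, on stdin.
# tr strips the padding some wc implementations put before the number.
count=$(find "$tmpdir" -type f | xargs -n 2 cat | wc -l | tr -d ' ')
echo "lines: $count"

rm -rf "$tmpdir"
```

Since wc only ever sees one stream, it prints a single number regardless of how many times cat was invoked.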

Continuing off-topic: this construct is not optimal and can break if one of the filenames contains funky characters (spaces, line breaks, carriage returns, unprintable characters, ...). This happens (unfortunately) more often than you think, and it's a very good idea to learn how to use these commands defensively.

Basically you have two choices, depending on the Unix you're using:

- find ... -print0 | xargs -0 wc -l

- find ... -exec wc -l {} +

The second one is defined by POSIX and, as far as I know, works on every Unix except older OpenBSD (which only implemented this feature starting with version 5.1, in 2012).

The first one is non-POSIX, so several Unices do not implement -print0.

To the best of my knowledge, GNU find should work fine with both.
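
Here is a rough sketch of both forms against a filename containing a space. The file names are invented for the demo, and the '-print0' / '-0' variant assumes a find and xargs that support those flags (GNU and the BSDs do, as far as I know). I pipe through cat in both cases because '-exec ... {} +' batches its arguments much like xargs does.

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/plain.txt"
printf 'c\n'    > "$tmpdir/with space.txt"

# Naive form: xargs splits "with space.txt" into two bogus names, so
# cat fails on them (errors hidden here) and lines go missing.
naive_count=$(find "$tmpdir" -type f | xargs cat 2>/dev/null | wc -l | tr -d ' ') || true

# NUL-delimited form: names are passed through intact (non-POSIX flags).
print0_count=$(find "$tmpdir" -type f -print0 | xargs -0 cat | wc -l | tr -d ' ')

# POSIX form: find hands the names to cat itself, no parsing involved.
exec_count=$(find "$tmpdir" -type f -exec cat {} + | wc -l | tr -d ' ')

echo "naive: $naive_count  print0: $print0_count  exec: $exec_count"

rm -rf "$tmpdir"
```

The two defensive forms should agree with each other; the naive one comes up short as soon as a name contains whitespace.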

More info: http://unix.stackexchange.com/a/41745/4098

Ah! Didn't realize wc provided a total. I had used the "cat | wc" bit for scripts to just grab the number. Thank you.