Any arguments specified on the command line are given
to utility upon each invocation, followed by some
number of the arguments read from the standard input
of xargs. The utility is repeatedly executed
until standard input is exhausted.
In other words, if the argument list is too long, xargs will chunk the input and call the program repeatedly. This means that on very large projects, 'wc -l' will be called several times on subsets of the files, and the total listed at the bottom will be wrong: it only covers the last chunk.
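A quick way to see the chunking without needing a huge project is to force it. This is only a sketch: the temp directory, the file names a/b/c/d, and the use of -n 2 to cap the arguments per invocation are my own choices for the demo, not something from the original command.

```shell
# Four files with three lines each, in a throwaway directory.
dir=$(mktemp -d)
for name in a b c d; do
    printf 'x\ny\nz\n' > "$dir/$name"
done

# -n 2 makes xargs pass at most two file names per call, so wc -l runs
# twice and prints a separate "total" line for each chunk.
chunked=$(find "$dir" -type f | xargs -n 2 wc -l)
printf '%s\n' "$chunked"

# Count how many "total" lines came out.
totals=$(printf '%s\n' "$chunked" | grep -c ' total$')
rm -rf "$dir"
```

With a real argument list that overflows the limit, the same thing happens, just with much bigger chunks.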
Putting a 'cat' in between fixes that. The output of 'cat a b; cat c d' is identical to the output of 'cat a b c d', so the chunking doesn't matter, and wc -l can just read from stdin without worrying about how many files there are.
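Here is the same kind of forced-chunking demo with cat in the middle. Again a sketch under my own assumptions (temp directory, made-up file names, -n 2 to force two invocations); the point is only that the line count no longer depends on how xargs splits the list.

```shell
# Four files with three lines each, in a throwaway directory.
dir=$(mktemp -d)
for name in a b c d; do
    printf 'x\ny\nz\n' > "$dir/$name"
done

# xargs still runs cat twice, but both runs write to the same pipe,
# so wc -l sees one continuous stream and counts every line once.
lines=$(find "$dir" -type f | xargs -n 2 cat | wc -l | tr -d '[:space:]')
echo "$lines"
rm -rf "$dir"
```

The tr is only there because some wc implementations pad the number with spaces.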
Continuing on this tangent: this construct is not optimal and can break if one of the filenames contains funky characters (spaces, line breaks, carriage returns, unprintable characters, ...). This happens (unfortunately) more often than you'd think, and it's a very good idea to learn how to use these commands defensively.
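To make the failure concrete, here is a small sketch with one well-behaved name and one containing a space. The directory and file names are invented for the demo, and stderr is thrown away so only the surviving output is shown.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/plain.txt"
printf 'one\ntwo\nthree\n' > "$dir/with space.txt"

# Plain xargs splits its input on whitespace, so wc is handed ".../with"
# and "space.txt" as two separate (nonexistent) files.
naive=$(find "$dir" -type f | xargs wc -l 2>/dev/null || true)
printf '%s\n' "$naive"

# Count the output lines that mention each real file name.
seen_space=$(printf '%s\n' "$naive" | grep -c 'with space.txt' || true)
seen_plain=$(printf '%s\n' "$naive" | grep -c 'plain.txt' || true)
rm -rf "$dir"
```

The file with the space silently drops out of the count, which is exactly the kind of bug you won't notice on a large tree.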
Basically you have two choices, depending on the Unix you're using:
- find ... -print0 | xargs -0 wc -l
- find ... -exec wc -l {} +
The second one is defined by POSIX and, as far as I know, works on every Unix except for OpenBSD (which only implemented this feature starting with version 5.1, in 2012).
The first one is non-POSIX, so several Unices do not implement -print0.
To the best of my knowledge, GNU find should work fine with both.
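Both defensive forms can be tried against a filename with a space in it. This is a sketch, assuming a find and xargs that support -print0 and -0 (GNU and the BSDs do, as far as I know); the directory and file names are made up for the demo. The last command combines the -exec form with the cat trick from earlier to get a grand total.

```shell
dir=$(mktemp -d)
printf 'one\ntwo\n' > "$dir/plain.txt"
printf 'one\ntwo\nthree\n' > "$dir/with space.txt"

# -exec ... {} + : find hands the names to wc itself, nothing is re-parsed.
exec_out=$(find "$dir" -type f -exec wc -l {} +)
printf '%s\n' "$exec_out"

# -print0 | xargs -0 : names are NUL-terminated, so whitespace is harmless.
nul_out=$(find "$dir" -type f -print0 | xargs -0 wc -l)
printf '%s\n' "$nul_out"

# Grand total that is safe against both odd names and chunking.
total=$(find "$dir" -type f -exec cat {} + | wc -l | tr -d '[:space:]')
echo "$total"

# Each form should report the file with the space exactly once.
exec_hits=$(printf '%s\n' "$exec_out" | grep -c 'with space.txt' || true)
nul_hits=$(printf '%s\n' "$nul_out" | grep -c 'with space.txt' || true)
rm -rf "$dir"
```

Note that 'find ... -exec wc -l {} +' can still split a very long list into several wc calls, so the cat variant is the one to use when you need a single total.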