by imglorp 4913 days ago
That whole "find -ls | awk" is wicked slow anyway; try wc and xargs...

  $ time find -ls | awk '{s += $7} END {print s}'
  15970582120

  real	0m27.721s
  user	0m1.256s
  sys	0m1.780s

  $ time find | xargs wc -c 2> /dev/null | tail -1
  604260969 total

  real	0m0.332s
  user	0m0.068s
  sys	0m0.204s

The standard disclaimer on find | xargs: use -print0 and -0 to avoid problems with file names that contain whitespace, e.g.

   $ find -print0 | xargs -0 wc -c 2> /dev/null | tail -1
(Also, many uses of find | xargs can be replaced with -exec cmd {} \; or -exec cmd {} +, e.g.

  $ find -exec wc -c {} + 2> /dev/null | tail -1
although this isn't much faster in this case.)
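To make the whitespace problem concrete, here is a minimal sketch. The scratch directory and file names are made up for the demo, and it assumes your find and xargs support -print0 and -0 (GNU and BSD versions do).

```shell
# Hypothetical scratch directory; the two file names are invented for the demo.
dir=$(mktemp -d)
printf 'hello'  > "$dir/plain.txt"             # 5 bytes
printf 'world!' > "$dir/name with spaces.txt"  # 6 bytes

# Newline-delimited: xargs splits "name with spaces.txt" into three arguments,
# wc cannot open any of them, and 2> /dev/null hides the errors.
unsafe=$(find "$dir" -type f | xargs wc -c 2> /dev/null | tail -1 | awk '{print $1}')

# NUL-delimited: every file name reaches wc intact.
safe=$(find "$dir" -type f -print0 | xargs -0 wc -c | tail -1 | awk '{print $1}')

echo "unsafe=$unsafe safe=$safe"   # the unsafe total misses the 6-byte file
rm -rf "$dir"
```

Note that the 2> /dev/null in the first pipeline is exactly what makes the undercount silent; without it wc would at least complain about the missing files.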
Are you sure that's not just a cold disk cache rather than the commands themselves? Once warmed up, the awk command is much faster for me.

Also, the results are different - though I'm too lazy to figure out why right now :)

You need to filter out directory entries.

    find -type f -ls | awk '{s += $7} END {print s}'
    find -type f -print0 | xargs -0 wc -c | tail -1
    find -type f -exec wc -c {} + | tail -1
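Directories may not be the whole story for the mismatch. On a big tree, xargs (and -exec ... +) can run wc several times, each run prints its own "total" line, and tail -1 only keeps the last one. The sketch below imitates that on a tiny, made-up directory by forcing small batches with -n 2; the file names and sizes are assumptions for the demo, not the original poster's data.

```shell
# Hypothetical scratch directory with five 10-byte files.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do printf '0123456789' > "$dir/f$i"; done

# -n 2 makes xargs run wc three times (2 + 2 + 1 files), standing in for the
# batching that happens when the argument list gets too long.
# tail -1 then only sees the output of the last batch.
last_only=$(find "$dir" -type f -print0 | xargs -0 -n 2 wc -c | tail -1 | awk '{print $1}')

# Summing the per-file lines and skipping every "total" line counts all batches.
# (A file literally named "total" in the current directory would be skipped too;
# here find prints full paths, so that cannot happen.)
all=$(find "$dir" -type f -print0 | xargs -0 -n 2 wc -c | awk '$2 != "total" {s += $1} END {print s}')

echo "last batch only: $last_only, all batches: $all"
rm -rf "$dir"
```

If the two numbers differ on your own tree without -n 2, batching is the likely cause.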
I am not familiar with awk; what does 's += $7' do? What is the '2' passed to wc? Why are the two outputs different? What am I missing here?
s += $7 means "add the value of the 7th column (the file size in find -ls output) to the running total s".
Makes sense, thanks!
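For anyone else new to this, a small sketch of both pieces. The sample lines are invented to look like find -ls output, and the missing path is hypothetical. The 2 is not an argument to wc at all: 2> /dev/null is a shell redirection that sends wc's error messages (file descriptor 2, stderr) to /dev/null.

```shell
# Three made-up lines shaped like `find -ls` output; field 7 is the size in bytes.
sample='  101  4 -rw-r--r--  1 alice users  10 Jan  1 12:00 ./a
  102  4 -rw-r--r--  1 alice users  20 Jan  1 12:00 ./b
  103  4 -rw-r--r--  1 alice users  30 Jan  1 12:00 ./c'

# For every input line add field 7 to s; after the last line, print s.
total=$(printf '%s\n' "$sample" | awk '{s += $7} END {print s}')
echo "$total"

# wc fails on a path that does not exist; its error message goes to stderr,
# which 2> /dev/null discards, so only the fallback text is captured.
msg=$(wc -c /no/such/demo/file 2> /dev/null || echo "failed quietly")
echo "$msg"
```

awk variables start out empty and are treated as 0 in arithmetic, which is why s needs no initialisation.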