So, when I do linux.git$ strace -fc fd -j1 --no-ignore --hidden --exclude .git --type file --type symlink>/dev/null
strace: Process 24100 attached
strace: Process 24101 attached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ------------------
61.34 0.382389 21243 18 2 futex
26.50 0.165192 2 74307 write
4.72 0.029454 3 9754 getdents64
2.22 0.013846 2 4884 openat
2.03 0.012677 2 4886 close
2.02 0.012566 2 4884 newfstatat
0.96 0.005960 5 1002 clock_gettime
...
------ ----------- ----------- --------- --------- ------------------
100.00 0.623428 6 99900 6 total
which suggests that the fd slow down (on Linux, anyway) may be `fd` doing unbuffered writes to stdout (probably to avoid interleave/mixing of multi-threaded output) and not be about the way the file tree recursion is written. The fix is probably for `fd -j1` to do buffered output (and ideally never clone/fork). (The MThread may be unproductive in a different way than my initial futex guess.)My guess (without looking at the code - honestly Rust kind of hurts my eyeballs) is that this unbuffered mode is not even really guaranteed/sound anyway. If you pipe the output, e.g. `fd -jN|cat>out` with `N>1` somewhere and any deep hierarchy pathname output is bigger than PIPE_BUF you hit trouble. Once those pipe writes lose atomicity, threads can be put to sleep and re-awakened in various orders and wind up interleaving their pipe writes with or without userspace-level buffering. (filename max)*(open fd default limit)=255*1024 =~ 4*PIPE_BUF. So the "full path print out" may still be capable of being > PIPE_BUF and so still interleave producing non-existent pathnames (which would be so long they would need special care like chdir + chdir... + stat to even verify non-existence). You would very likely have to synthesize a file tree to trigger this failure mode, of course. I say all the above in terms of a robust tool. In terms of benchmarking walks, it is much easier to just do like `dents dstats` and use the walk to produce a depth histogram instead of mega/gigabytes of path output. If you have some burning need to be multi-threaded it is even easy to histogram in parallel and then merge at the end and the histograms should almost always fit in private L2 CPU caches. Could always be a total count instead of depth histo. AFAIK, GNU find has no `-print`-like `-count` operator to compare, but I'd bet the C internals make such not so hard to add. (Yes, you could use `-exec..+`, but this might be so slow as to make the benchmark questionable.) Parenthetically, I wonder if any stdlib in any prog.lang does the "chdir loop + stat|other" type of work for users with such pathologically long pathnames for all the file system calls. Seems unlikely, but not impossible, and easy to know when it is needed once a "string knows its length". Food for thought. |