by barrkel 651 days ago
If you have a lot of files, consider find piped to xargs with -P for parallelism and -n to limit the number of files per parallel invocation.

Only a tiny bit more complex but often an order of magnitude faster with today's CPUs.

Use -print0 on find with -0 on xargs to handle spaces in filenames correctly.
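A minimal sketch of that pipeline, for anyone who wants to try it on throwaway data first. The temp directory, the file names, and the choice of gzip as the per-file job are all made up for the demo; -P is a GNU/BSD xargs extension rather than POSIX, so check your man page.

```shell
#!/bin/sh
# Hypothetical demo tree; the names contain a space on purpose.
demo=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  printf 'hello\n' > "$demo/page $i.html"
done

# -print0 / -0 : NUL-separated names, so spaces in names are safe.
# -P 4         : run up to 4 gzip processes at once.
# -n 2         : hand each gzip process at most 2 files.
find "$demo" -type f -name '*.html' -print0 |
  xargs -0 -P 4 -n 2 gzip

# Count the results before cleaning up (tr strips BSD wc padding).
compressed=$(find "$demo" -type f -name '*.gz' | wc -l | tr -d ' ')
echo "compressed: $compressed"
rm -rf "$demo"
```

Tune -P to roughly your core count and -n to how expensive each file is; a small -n spreads the work more evenly, a large one cuts process startup overhead.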

GNU parallel is another step up, but xargs is almost always to hand.


Thanks! Gippity did suggest the xargs approach as an alternative, but I found that

find [...] -exec [...] {} +

as opposed to

find [...] -exec [...] {} \;

worked fine and was performant enough for my use-case. An example command was

find . -type f -name "*.html" -exec sed -i '' -e 's/\.\.\/\.\.\/\.\.\//\.\.\/\.\.\/\.\.\/source\//g' {} +

which took about 20s to run.
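For anyone wondering why the two endings differ so much: `\;` starts the command once per file, while `+` packs as many file names as fit into each invocation. A quick way to see it, using a made-up temp directory and a stand-in command that prints one line per invocation:

```shell
#!/bin/sh
# Hypothetical demo tree with three small files.
demo=$(mktemp -d)
touch "$demo/a.html" "$demo/b.html" "$demo/c.html"

# {} \; : the inner sh runs once for every matching file.
per_file=$(find "$demo" -type f -name '*.html' \
  -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# {} + : the matching files are appended to a single invocation
# (find only splits into several when the argument list gets too long).
batched=$(find "$demo" -type f -name '*.html' \
  -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file invocations: $per_file"
echo "batched invocations:  $batched"
rm -rf "$demo"
```

With thousands of files, that per-process startup cost is usually where the time goes, which is why `+` alone is often fast enough.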

One can express your sed in less Leaning Toothpick Syndrome[1] via:

  find . -type f -name "*.html" -exec sed -i '' -e 's|\.\./\.\./\.\./|../../../source/|g' {} +

Using "/" as the delimiter for "s" patterns that include "/" drives me batshit - almost as much as scripts that use the doublequote for strings that contain no variables but also contain doublequotes (looking at you, json literals in awscli examples)

If you have perl to hand, one can also `perl -pi -e` and then use `s|\Q../../../\E|../../../source/|g`, getting rid of almost every escape character (sed itself has no `\Q`, even with `-E`; that is a Perl regex feature). I got you half way there because one need not escape the "." in the replacement pattern because "." isn't a meta character in the replacement space - what would that even mean?

1: https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome
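The delimiter swap is easy to sanity-check without editing any files. This sketch runs both forms of the substitution over a made-up sample line on stdin and compares the results; it leaves out -i on purpose, since GNU and BSD sed disagree on its syntax (the `-i ''` above is the BSD/macOS form).

```shell
#!/bin/sh
# Sample input line; the href value is invented for this demo.
input='<link href="../../../css/site.css">'

# Original form: "/" as delimiter, so every "/" has to be escaped.
slashes=$(printf '%s\n' "$input" |
  sed -e 's/\.\.\/\.\.\/\.\.\//\.\.\/\.\.\/\.\.\/source\//g')

# Same substitution with "|" as delimiter: only the dots in the
# pattern still need a backslash, and nothing in the replacement does.
pipes=$(printf '%s\n' "$input" |
  sed -e 's|\.\./\.\./\.\./|../../../source/|g')

echo "$slashes"
echo "$pipes"
```

Both lines should print the same rewritten href, which is a cheap check before letting the command loose on a whole tree.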

Parallel is nice when doing music conversion with ffmpeg.