by ssddanbrown 1030 days ago
Love finding a good use case for parallel as an easy way to gain massive time savings, especially on today's high-thread-count CPUs. Most recently I found it useful for batch-compressing large JPEG images into smaller WebP files, combining it with find and ImageMagick:

   find ./ -type f -iname '*.jpg' -size +1M -print0 | parallel -0 mogrify -format webp -quality 80 {}

xargs is a nearly drop-in replacement and is probably already installed by default on most distros. You may need -n 1 (one file per invocation) and -P (maximum number of parallel processes) to parallelize.

  xargs -n 1 -P 8
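
For anyone who wants the whole pipeline spelled out, here is a minimal sketch of the xargs version of the top post's command. `run_per_file` is a made-up helper name, and I left out the `-size +1M` filter to keep it short; treat it as an illustration, not a tuned script.

```shell
#!/usr/bin/env bash
# Sketch: run a command once per matching file, with up to $jobs running at a time.
# run_per_file is a hypothetical helper name, not part of xargs.
run_per_file() {
  local dir=$1 jobs=$2
  shift 2
  # -print0 / -0 keep file names with spaces or newlines intact.
  # -n 1 passes one file per invocation; -P caps the concurrent processes.
  find "$dir" -type f -iname '*.jpg' -print0 | xargs -0 -n 1 -P "$jobs" "$@"
}

# Usage mirroring the top post (needs ImageMagick):
#   run_per_file . 8 mogrify -format webp -quality 80
```

The per-file command is taken from the remaining arguments, so the file name is appended as the last argument, which is where mogrify expects it.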
find + xargs has become my go-to for "process files in parallel". Tho now I'm wondering if I should be using `-n` instead of `-L`:

    #!/usr/bin/env bash
    set -e

    main() {
      if [ "$1" = "handle-file" ]; then
        shift
        handle-file "$@"
      else
        find . \
          -type f \
          -not -path '*/optimized/*' \
          -print0 \
          | xargs \
            -0 \
            -L 1 \
            -P 8 \
            -I {} \
            bash -c "cd \"$PWD\" && \"$0\" handle-file \"{}\""
      fi
    }

    handle-file() {
      echo "handle-file $1 ..."
    }

    main "$@"
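
One caveat with the `bash -c "... \"{}\""` form above: the file name is pasted into a command string, so a name containing a double quote or `$` gets interpreted by the shell. A possible variant, sketched under the assumption that one file per invocation is what's wanted, passes the name as a positional parameter instead. `process_tree` is a hypothetical name standing in for the find branch of the script.

```shell
#!/usr/bin/env bash
# process_tree is a hypothetical stand-in for the find branch of the script above.
process_tree() {
  # Each file name reaches the child shell as the positional parameter $1
  # instead of being spliced into the command string.
  # The single quotes stop the parent shell from expanding $1 itself;
  # the trailing _ fills $0 of the child shell.
  find "${1:-.}" -type f -not -path '*/optimized/*' -print0 |
    xargs -0 -n 1 -P 8 sh -c 'printf "handle-file %s ...\n" "$1"' _
}
```

In the original script the same idea would be `xargs -0 -n 1 -P 8 "$0" handle-file`, since xargs appends each file name as the final argument; the `cd "$PWD"` is not needed because child processes inherit the working directory. On `-n` vs `-L`: `-n` counts arguments and `-L` counts input lines, and as far as I know GNU xargs treats `-L` and `-I` as mutually exclusive anyway, so `-n 1` is the more direct choice here.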
Huh, I hadn't seen the -L before. Looks pretty similar.
Actually, parallel is the drop-in for xargs, since xargs has been around longer. parallel has a few big improvements:

* Grouped output (prevents one process from writing output in the middle of another's output)
* In-order output (task a output first, task b output second even though they ran in parallel)
* Better handling of special characters
* Remote execution
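
A small sketch of the in-order output point, assuming GNU parallel (not the moreutils tool of the same name) is on PATH. `ordered_demo` is a made-up function name and the sleeps are there only to force the jobs to finish out of order.

```shell
#!/usr/bin/env bash
# Sketch only: needs GNU parallel on PATH.
ordered_demo() {
  # The first job sleeps longest, so the jobs tend to finish in reverse order,
  # but -k (--keep-order) prints their output in input order.
  parallel -k 'sleep {}; echo "job {}"' ::: 0.3 0.2 0.1
}
```

Dropping `-k` should let the output appear in completion order instead; each job's output is still kept together because grouping is the default.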

More here: https://www.gnu.org/software/parallel/parallel_alternatives....

You should batch compress to JPEG XL too with cjxl --lossless_jpeg=1 --quality=80 --effort=9 {} {/.}.jxl (or magick)
Any particular reason to use -print0 and pipe instead of -exec?
-exec would not be parallel: find waits for each -exec command to finish before starting the next. Piping to parallel is what makes it parallel.
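
To make the difference concrete, here is a rough sketch comparing the two shapes, using xargs for the parallel side so it runs without extra installs. The function names and the 0.2 second sleep are placeholders for real per-file work.

```shell
#!/usr/bin/env bash
# Sequential: find waits for each command to exit before starting the next.
run_sequential() {
  find "$1" -type f -exec sh -c 'sleep 0.2; basename "$1"' _ {} \;
}

# Parallel: find only lists the names; xargs keeps up to 4 jobs running at once.
run_parallel() {
  find "$1" -type f -print0 |
    xargs -0 -n 1 -P 4 sh -c 'sleep 0.2; basename "$1"' _
}
```

Timing both with `time` on a directory of a few files should show the sequential version scaling with the file count while the parallel one stays closer to a single job's duration. Both handle the same set of files; only the output order can differ.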