by ole_tange 3588 days ago
GNU Parallel is slow because it does all sorts of things behind your back: it buffers output on disk (so the output from different jobs is not mixed and you are not limited by the amount of RAM; it will even compress the buffered output if you are short on disk space), it checks whether the disk is full for every job (so you do not end up with missing output), it gives every process its own process group (so a process with children can be killed reliably with --timeout and --memfree), and a lot of other stuff.

It lets you code your own replacement string (using --rpl), and lets you make composed commands with shell syntax:

    myfunc() { echo joe $*; }
    export -f myfunc
    parallel 'if [ "{}" == "a" ] ; then myfunc {} > {}; fi' ::: a b c
It does not need a special compiler, but runs on most platforms that have Perl >=5.8. Input can be larger than memory, so this:

    yes `seq 10000` | parallel true
will not exhaust your memory.

You can read a lot more about the design in `man parallel_design` and see how the overhead time per job has evolved with each release at: https://www.gnu.org/software/parallel/process-time-j2-1700MH...

In other words: Treat GNU Parallel as the reliable Volvo that has a lot of flexibility and will get the job done with no nasty corner case surprises.

It is no doubt possible to make a better specialized tool for situations where an overhead of a few ms per job is an issue and where you need neither brakes, seatbelts, nor airbags. xargs is an example of such a tool, and you can have both GNU Parallel and xargs installed side by side.