I've never used GNU Parallel. But could someone explain to me the value add vs GNU xargs -P/--max-procs? From the examples at the top, it seems like those could be achieved with xargs.
The value add is that you don't need to pass `xargs --max-procs N` yourself. xargs defaults to N = 1; parallel defaults to N = the number of CPU cores.
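To make the difference in defaults concrete, here is a minimal sketch. The item names are made up, and the GNU Parallel half only runs if `parallel --version` reports GNU Parallel, since other tools also install a `parallel` binary.

```shell
# xargs runs one job at a time unless -P is given; here up to 4 run at once.
# Output order is not guaranteed with -P, so sort it for a stable listing.
xargs_out=$(printf '%s\n' a b c d | xargs -P 4 -I{} echo "item {}" | sort)
echo "$xargs_out"

# GNU Parallel picks the job count from the number of CPU cores by default,
# so no -j is needed. Add -k if output should keep the input order.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  printf '%s\n' a b c d | parallel echo "item {}"
fi
```

Both forms run `echo` once per input line; only the xargs one needs the job count spelled out.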
Additionally, you can run a series of unrelated commands that aren't from a list or piped in by passing them after `:::` (the `--` form of this belongs to the moreutils `parallel`, not GNU Parallel):
`parallel -j 3 ::: ls df "echo hi"`
You can limit system load using parallel, which as far as I know isn't possible with xargs: `parallel --load L` where L is the load average you want to stay beneath (`-l L` is the moreutils `parallel` spelling).
A couple of months ago, I parallelized execution of thousands of slow batch jobs on a fleet of remote servers. With parallel that was one command, including estimated time to completion and retries for failed jobs. It was awfully nice not to need to install or set up anything, or spend time coding features that were already built in. Now that it's done, I will almost certainly never run that exact operation again.
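A one-off like that might look roughly like the sketch below. This is not the command that was actually run: `servers.txt`, `jobs.txt`, `batch.log` and `./slow_job.sh` are hypothetical names, and the script is assumed to already exist on every remote host. It is wrapped in a function so nothing runs until you call it.

```shell
# Sketch only: all file names here are placeholders.
run_batch() {
  # --sshloginfile: one ssh login (e.g. user@host) per line in servers.txt
  # --retries 3:    try a failing job up to 3 times
  # --eta:          print an estimated time to completion
  # --joblog:       record exit status and runtime of every job
  # ::::            read job arguments from jobs.txt, one per line
  parallel --sshloginfile servers.txt --retries 3 --eta \
           --joblog batch.log ./slow_job.sh {} :::: jobs.txt
}
```

The job log also makes it easy to see afterwards which jobs failed.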
I normally use xargs for simple things, and if it's a regular business operation I'd set up a task queue, but there's a fair amount of work in the middle where it's nice to have a solid tool with most of the features you could want built in and tested.
It has more granular control over "pasting" in values. For example, you can use {} for the arg value itself, {.} for the value without its extension, {/} for the basename, {/.} for the basename without extension, etc. You can also get a progress bar, ETA, etc.
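As a quick illustration of those replacement strings, the sketch below first spells out the equivalent plain-shell parameter expansions and then lets GNU Parallel do the same thing. The path is hypothetical and the file does not need to exist.

```shell
path=/tmp/photos/cat.jpg   # hypothetical path, used only as a string

# Plain-shell equivalents of the replacement strings, for comparison:
no_ext=${path%.*}          # like {.}   path without the extension
base=${path##*/}           # like {/}   basename
base_no_ext=${base%.*}     # like {/.}  basename without the extension
echo "$path $no_ext $base $base_no_ext"

# The same four values using GNU Parallel's built-in replacement strings.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  parallel echo {} {.} {/} {/.} ::: "$path"
fi
```

Both `echo` lines should print the same four fields, which is why parallel is handy for filename manipulation even with a single job.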
I use both. xargs does a simple job reasonably well; if I'm just typing on the command line, it's often the tool I use. parallel has many, many more options and ways to turn output from script A into parallel invocations of script B on multiple machines. Parallel is also handy just for parsing filenames; it's become my default tool for manipulating a stream of filenames and then running commands on them. The parallel part is just a nice plus.
I'll say that field separation / null termination is a bit annoying for xargs/find etc., though perhaps more so for novice shell users. I do like shell pipelines, but quoting can be nearly.
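For anyone who hasn't hit the null-termination issue yet, here is a small sketch of the usual workaround. The temporary directory and file names are invented for the demo.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with space.txt"

# With default whitespace splitting, "with space.txt" would be cut into two
# arguments. NUL-delimited output (-print0) read with xargs -0 keeps every
# file name intact, so each file produces exactly one echo line.
count=$(find "$tmpdir" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')
echo "files seen: $count"

# GNU Parallel reads the same NUL-delimited stream with -0.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  find "$tmpdir" -type f -print0 | parallel -0 echo
fi

rm -rf "$tmpdir"
```

GNU Parallel treats each input line as one argument by default, so a space in a file name is less of a problem there; `-0` still matters for names containing newlines.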
To get an exhaustive answer you really need to go through the command line options as described in its man pages. xargs is good for simple stuff as long as you avoid some gotchas, but this really does a whole lot more, a lot more than anyone can do justice to in a comment. AWK with GNU Parallel is a surprisingly potent combination.