by SPBS 1035 days ago
xargs is more useful because it's part of POSIX, so you can count on it being there (whereas with GNU Parallel you probably have to reach for a package manager to install it first). The ergonomics are worse though, as usual.
5 comments

The entirety of GNU Parallel is just one Perl program. It could be copied over and used in a pinch. The installation itself is very simple and no special dependencies or privileges are needed.
Except Perl isn't always present by default either (e.g. in Arch Linux or FreeBSD).
There are also many Linux distributions that do not install all the POSIX utilities by default, only the minimal set needed to bootstrap the system.

On all such systems, it is very easy for the user to install any missing POSIX utility, but it is also easy to install any non-POSIX GNU utility.

So not even xargs is certain to exist by default on all systems.

Moreover, POSIX xargs is restricted to executing all processes sequentially.

Any use of xargs for parallel execution is non-POSIX, so in that case there is no reason not to use "parallel" instead.
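
For anyone who hasn't seen it, here is a minimal sketch of the non-POSIX parallel mode being discussed. It assumes an xargs that supports the -P extension (GNU and BSD xargs do); the run_squares function and the squaring job are hypothetical stand-ins for a real per-item command.

```shell
# Sketch only: relies on the non-POSIX -P extension of xargs.
# run_squares is a hypothetical helper; squaring stands in for real work.
run_squares() {
    # -P 4 runs up to four jobs at once, -n 1 passes one argument per job.
    # Jobs may finish in any order, so sort to get a stable result.
    printf '%s\n' "$@" \
        | xargs -P 4 -n 1 sh -c 'echo $(( $1 * $1 ))' sh \
        | sort -n
}

result=$(run_squares 1 2 3 4)
printf '%s\n' "$result"
```

Drop the -P 4 and the same pipeline runs one job at a time, which is all that the POSIX text described above covers.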

    parallel --embed > parallel.sh
 
Then store that in your source repo and use it wherever shells are used!
On Debian 11.7:

   $ parallel --embed > parallel.sh
   Unknown option: embed
[edit] Ran it on Ubuntu 22.04; it does output a bash script ... which still depends on Perl.
Isn't Perl always installed?
It is not, at least on FreeBSD and NetBSD.
Would this taint the other code in your repo with the GPL? I'd guess it would depend on how it is distributed.
If you're running on your private build infra, it's fine. If you're pushing that repo to somewhere public, it's now GPL.
To install parallel, first run parallel --embed?
See my comment above: there's a shell version you can store in your project repository and use wherever you want with zero installation!

https://news.ycombinator.com/item?id=37208250

Indeed, xargs can be a better option, but it has trouble doing some tasks efficiently.

For example, translating a large list of IPv4 ranges into a standard format for a firewall rule-set parser:

    cat ~/blacklist.p2p | parallel --ungroup --eta --jobs 20 "ipcalc {} | sed '2!d'" | grep -Ev '^(0\.|255\.|127\.)' >> ~/blacklist_p2p_converted

This makes an annoyingly slow task tolerable, as parallel doesn't block to preserve output order. We should probably rewrite this to be more efficient, but the task runs infrequently.
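
To illustrate the ordering point, here is a rough sketch that uses xargs -P instead of parallel, so it runs without installing anything. It assumes an xargs with the non-POSIX -P extension; the sleep-then-print job is a hypothetical stand-in for a slow command like the ipcalc call above.

```shell
# Sketch only: each job sleeps N seconds, then prints N.
# With three workers, a line is emitted when its job finishes,
# so the output need not follow the input order.
unordered=$(printf '%s\n' 2 0 1 \
    | xargs -P 3 -n 1 sh -c 'sleep "$1"; echo "$1"' sh)

# If the consumer needs a stable order, sort afterwards.
ordered=$(printf '%s\n' "$unordered" | sort -n)
printf '%s\n' "$ordered"
```

Not waiting on the slowest job before printing is what keeps the pipeline moving; the cost is that anything order-sensitive has to sort or tag its results.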

Happy computing =)

Last time I checked (which was a few years ago, admittedly), some popular systems' xargs were too old to support parallelism -- Mac in particular.
I don't think this is the case: xargs on Mac supports parallel execution, and has done so back to 10.9 or older.
GNU Parallel was created precisely to solve some deficiencies of xargs.

While there are cases when it makes sense to stick to what is specified by POSIX, there are also cases when the POSIX specification is so obsolete that using POSIX instead of some free ubiquitous programs is a big mistake.

Among these latter cases are writing scripts for a POSIX shell instead of for bash, and using xargs instead of parallel.