by caro11ne 1527 days ago
> You also seem to have missed (twice!) the main thrust that [GNU Parallel is slow]

I did not miss that. I did not comment on it because I agree and so does GNU Parallel.

man parallel:

    BUGS
    [...]
       Speed
           Startup

           GNU parallel is slow at starting up - around 250 ms the
           first time and 150 ms after that.

           Job startup

           Starting a job on the local machine takes around 10 ms.
           This can be a big overhead if the job takes very few ms
           to run. Often you can group small jobs together using -X
           which will make the overhead less significant. Or you
           can run multiple GNU parallels as described in EXAMPLE:
           Speeding up fast jobs.

And man parallel_alternatives:

   DIFFERENCES BETWEEN parallel-bash AND GNU Parallel
       [...]
       parallel-bash is written in pure bash. It is really fast
       (overhead of ~0.05 ms/job compared to GNU parallel's ~3
       ms/job). So if your jobs are extremely short lived, and
       you can live with the quite limited command, this may be
       useful.

And https://www.gnu.org/software/parallel/

    Over the years GNU parallel has gotten more safety features (e.g. no silent data loss if the disk runs full in the middle of a job). These features cost performance. This graph shows the relative performance between each version.

I really do not care how fast you can produce wrong output. I care how fast you can produce correct output, and I do not care about a specialized solution that only works for one single specialized task.

I can make a specialized solution that is faster than your specialized solution:

    $ $tm true

It gives the same output as your example, and it is way faster. But do you really feel that is a fair comparison? If you say no, then by your own arguments, I can claim you are "moving the goal posts".

> For the curious, perf ratios are actually even worse for the high volume "grep t" example made safe (21.8X slower rather than 19.4X slower - on my test machine). xargs --process-var-slot (around since 2010) is enough of a hint for anyone actually curious and there is real value to having someone solve that little puzzle themselves for their own use cases. Doing all your homework for you can take something away.

Or it might just be that your solution is not safe at all, or only works on very specialized input on your system.

I have already shown that I can do a specialized version faster than your specialized version.

As long as you do not show your work, your speed claim is just that: a claim with no evidence.

What can be asserted without evidence can also be dismissed without evidence.

> I have already highlighted 4 risks (no Perl, invisible /tmp filling default, non-drop-in xargs, slower than serial)

"No Perl": I have only once used a system without Perl: It was an embedded system, where space was at a premium. If you use a package manager to install parallel, Perl will be installed for you automatically.

"Invisible filling /tmp": I really like that behaviour, because no matter how GNU Parallel is killed, there are no files to clean up. But each to his own.

"non-drop-in xargs": Your evidence here is good and I concur, though I never hit those incompatibilities myself (apart from -n1 which is what I normally want anyway).

"slower than serial": For short-lived jobs, yes (and if your jobs are short-lived and you can live with the limitations then parallel-bash seems to be faster than xargs). In general, no. Try "seq 0.1 0.1 10 | time parallel -j 50 sleep"
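
For anyone who does not want to wait for the serial run, here is a rough back-of-the-envelope model of that sleep example. It is only a sketch under stated assumptions: zero startup and per-job overhead, jobs dispatched in input order to whichever slot frees up first, and durations generated from integers rather than with `seq 0.1 0.1 10`. The function and variable names (`durations`, `serial_total`, `parallel_total`) are made up for illustration and are not part of GNU Parallel.

```shell
#!/bin/sh
# The same 100 durations as `seq 0.1 0.1 10` (0.1, 0.2, ..., 10.0 seconds),
# built from integers so the list does not depend on seq's float handling.
durations() { seq 1 100 | awk '{ printf "%.1f\n", $1 / 10 }'; }

# Serial wall-clock time: the sum of all sleep durations.
serial_total=$(durations | awk '{ s += $1 } END { printf "%.1f", s }')

# Idealised wall-clock time with 50 job slots and no overhead (assumption):
# each job, in input order, goes to the slot that becomes free first.
parallel_total=$(durations | awk -v slots=50 '
  {
    best = 1
    for (i = 2; i <= slots; i++)
      if (free[i] < free[best]) best = i   # earliest-free slot so far
    free[best] += $1                       # that slot is busy until this job ends
    if (free[best] > makespan) makespan = free[best]
  }
  END { printf "%.1f", makespan }')

echo "serial: ${serial_total}s  idealised -j 50: ${parallel_total}s"
```

Under those assumptions the model gives 505.0 s serial against 15.0 s with 50 slots. A real run adds the startup and per-job overhead quoted from the man page above, which is small next to jobs that take seconds.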

I had hoped your critique would show there is a better way of running jobs in parallel. So far I can only say I am disappointed.