
by caro11ne 1529 days ago

I find this scary:

    $ export LC_ALL=C
    $ $tm xargs -0P1 grep $lb t < ../f      |sort |md5sum
    7.05user 9.88system 0:24.53elapsed 69%CPU (0avgtext+0avgdata 2344maxresident)k
    0inputs+0outputs (0major+4960minor)pagefaults 0swaps
    8ef2c658a70bb38438e59421231246b9  -
    $ $tm xargs -0P16 grep $lb t < ../f      |sort |md5sum
    10.16user 36.62system 0:18.30elapsed 255%CPU (0avgtext+0avgdata 2332maxresident)k
    0inputs+0outputs (0major+4980minor)pagefaults 0swaps
    c8ebf840e54ec8b5a49e159eda09e63f  -
    $ $tm parallel -X0P16 grep $lb t < ../f      |sort |md5sum
    16.97user 33.94system 0:16.36elapsed 311%CPU (0avgtext+0avgdata 51624maxresident)k
    0inputs+2069296outputs (0major+169409minor)pagefaults 0swaps
    8ef2c658a70bb38438e59421231246b9  -

It greps for lines containing t, sorts the lines and computes a hash.

Note how "xargs -P16 grep" gives the wrong answer. The output from parallel matches exactly the lines from "xargs -P1". With "-k" the lines are even in the same order (sorting removed):

    $ $tm xargs -0P1 grep $lb t < ../f      |md5sum
    7.03user 9.30system 0:16.32elapsed 100%CPU (0avgtext+0avgdata 2332maxresident)k
    0inputs+0outputs (0major+5023minor)pagefaults 0swaps
    d89b45188602c9bb08026dc2892cfa75  -
    $ $tm parallel -kX0P16 grep $lb t < ../f      |md5sum
    18.21user 36.03system 0:10.26elapsed 528%CPU (0avgtext+0avgdata 65396maxresident)k
    0inputs+2069344outputs (0major+154929minor)pagefaults 0swaps
    d89b45188602c9bb08026dc2892cfa75  -

I have not analyzed the output but I think the error is caused by the issue described here: https://mywiki.wooledge.org/BashPitfalls#Non-atomic_writes_w...
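
One way to test that hypothesis is to check whether any output line is longer than PIPE_BUF, the largest write that POSIX guarantees to be atomic on a pipe (4096 bytes on Linux). The sketch below is only an illustration: `count_long_lines` is a made-up helper name, and it assumes the serial output has been saved to a file first.

```shell
# Largest write that POSIX guarantees to be atomic on a pipe.
# Falls back to 512, the POSIX minimum, if getconf is unavailable.
pipe_buf=$(getconf PIPE_BUF / 2>/dev/null || echo 512)
echo "PIPE_BUF: $pipe_buf bytes"

# Hypothetical helper: count lines in FILE that, including their newline,
# are longer than MAX bytes and so cannot be written to a pipe atomically.
# Usage: count_long_lines MAX FILE
count_long_lines() {
    LC_ALL=C awk -v max="$1" 'length($0) + 1 > max + 0 { n++ } END { print n + 0 }' "$2"
}
```

For example, after saving the `xargs -0P1` output to a file (say `serial.out`, a name chosen here for illustration), `count_long_lines "$pipe_buf" serial.out` reports how many lines are at risk. A non-zero count would be consistent with the non-atomic write explanation, but it does not prove it.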

How anyone would ever use "xargs -P16 grep" is beyond me. I honestly do not care how fast I can get an answer, if I cannot trust the answer is correct.

I can see someone claimed they could build a safe parallel grep, but seemed not to do so: https://news.ycombinator.com/item?id=30890780#30913304 It would have been interesting to see.


cb321 1529 days ago

You are just moving goalposts from "grep -l" to "grep t". The "grep -l" should be reliable by virtue of line buffering and Linux kernel source path names being shorter than PIPE_BUF (which yes, you do have to know|check - much less to know than a >5000 line man page). While I could address the moved goalposts, I already mentioned xargs --process-slot-var [1] elsethread and, in my experience, goalpost movers are never satisfied.

[1] https://unix.stackexchange.com/questions/449224/how-can-i-ge...
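
For readers unfamiliar with the flag: GNU xargs `--process-slot-var=NAME` sets an environment variable to a value that is unique among the currently running children, and reuses a value only after that child has exited. A minimal sketch of how that could keep parallel grep output from interleaving follows. It is not the benchmark from this thread, `slot_grep` is a hypothetical helper, and it assumes GNU xargs and a grep that supports `-H`.

```shell
# Hypothetical helper: run grep in parallel with one output file per xargs slot.
# At most one process appends to a given slot file at any time, so lines
# cannot interleave regardless of their length.
# Usage: slot_grep PATTERN JOBS < nul-separated-file-list
slot_grep() {
    pattern=$1 jobs=$2
    dir=$(mktemp -d) || return 1
    # SLOT is set by xargs for each child; SLOT_DIR and PATTERN are passed via the environment.
    SLOT_DIR=$dir PATTERN=$pattern xargs -0 -r -P "$jobs" --process-slot-var=SLOT \
        sh -c 'grep -H -e "$PATTERN" -- "$@" >> "$SLOT_DIR/$SLOT"' sh
    # Emit the collected output, then clean up the temporary files.
    cat "$dir"/* 2>/dev/null
    rm -rf "$dir"
}
```

The output order differs from a serial run, so compare results after sorting, e.g. `slot_grep t 16 < ../f | sort | md5sum`. Note that this sketch buffers all output in temporary files until every job has finished, which is a trade-off of its own.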


caro11ne 1528 days ago

So you knew of this limitation, but failed to mention it?!

Wow. Just wow.

I thought you were trying to show a general way to run jobs in parallel in a safe, reliable way that was faster than GNU Parallel.

You failed to do that.

Instead you showed that it is possible to run jobs faster than GNU Parallel, but in a way that is neither safe nor reliable.

(Or more correctly: If you know the exact limitations of the kernel of the OS you are currently using, it will be reliable in certain situations - but not in general).

I will pick reliable results over speed any day, thank you.

I hope the person, who mentioned safe parallel grep, will show how it is done: https://news.ycombinator.com/item?id=30891634 because I will definitely not be using your solution.


cb321 1527 days ago

I made no "in general" applicability claim and have, on the contrary, explicitly acknowledged assumptions & limitations various times in this thread which you write as if you have read. Mentioning every limitation is impractical. GNU parallel also doesn't work "in general" (e.g. no Perl interpreter). My concrete benchmark was safe/reliable in context - until you added bugs. Adding bugs to GNU parallel examples is also easy - I already did one by accident.

Your link to atomic writes and your bug addition seemed pretty targeted/informed. The GNU grep --line-buffered was also in plain sight in my benchmark. Presumably one must know pipes to know when GNU parallel's own --line-buffer is helpful. So, your "wow" outrage seems fake/off point and this GNU parallel sales pitch of "background knowledge free lunch" seems more false.

You also seem to have missed (twice!) the main thrust that, unless I am missing some other --go-fast flag, GNU parallel is so slow on this common task that the easy serial method is much faster even with 16 cores given to GNU parallel. GNU parallel would have to be over 3X faster for your correctness concerns to even matter relative to serial xargs. People don't usually use parallelism to slow things down - unless maybe they blindly use GNU parallel.

For the curious, perf ratios are actually even worse for the high volume "grep t" example made safe (21.8X slower rather than 19.4X slower - on my test machine). xargs --process-slot-var (around since 2010) is enough of a hint for anyone actually curious and there is real value to having someone solve that little puzzle themselves for their own use cases. Doing all your homework for you can take something away. If you are too confused and a paying customer of Ole's, have him update his docs to be less unfair to xargs. (Also no need to link to my own posts and refer to them so generically. I only have one account, as per HN guidelines which, as a brand new account, you should maybe familiarize yourself with. [1])

As to why GNU parallel is so slow - Dunno. Took just as long with that -u flag to allow mixing output. Perl/Python programs are often 20..500X slower than compiled to native code. Python at least has like 10 ways to compile it. Curious if it was only code search, I tried that x10000000 example in the xargs comparison doc and got only 1.2X scale-up over serial xargs with 8 whole cores which also seems really slow/bad. So, GNU parallel slowness seems like probably a common problem.

This isn't just "complaining". I have already highlighted 4 risks (no Perl, invisible /tmp filling default, non-drop-in xargs, slower than serial). The oddball nagware license creates at least a 5th/6th legal/financial risk. Not sounding so safe to me. Being giant with many features generically creates more "accidental attack surface". So, there may be many more buried in GNU parallel. Different tools, different risks.

[1] https://news.ycombinator.com/newsguidelines.html


caro11ne 1527 days ago

> You also seem to have missed (twice!) the main thrust that [GNU Parallel is slow]

I did not miss that. I did not comment on it because I agree and so does GNU Parallel.

man parallel:

    BUGS
    [...]
       Speed
           Startup

           GNU parallel is slow at starting up - around 250 ms the
           first time and 150 ms after that.

           Job startup

           Starting a job on the local machine takes around 10 ms.
           This can be a big overhead if the job takes very few ms
           to run. Often you can group small jobs together using -X
           which will make the overhead less significant. Or you
           can run multiple GNU parallels as described in EXAMPLE:
           Speeding up fast jobs.

And man parallel_alternatives:

    DIFFERENCES BETWEEN parallel-bash AND GNU Parallel
        [...]
        parallel-bash is written in pure bash. It is really fast
        (overhead of ~0.05 ms/job compared to GNU parallel's ~3
        ms/job). So if your jobs are extremely short lived, and
        you can live with the quite limited command, this may be
        useful.

And https://www.gnu.org/software/parallel/

    Over the years GNU parallel has gotten more safety features (e.g. no silent data loss if the disk runs full in the middle of a job). These features cost performance. This graph shows the relative performance between each version.
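
The job startup figure quoted above can be turned into a rough estimate of why grouping with -X matters. This is a back-of-the-envelope sketch: the 10 ms per job comes from the man page excerpt, while the input count of 100000 and the batch size of 1000 are assumptions picked for illustration, and `overhead_ms` is a made-up helper.

```shell
# Hypothetical helper: estimated job start-up overhead in milliseconds
# for FILES inputs grouped BATCH per job at PER_JOB_MS ms per job start.
# Usage: overhead_ms FILES BATCH PER_JOB_MS
overhead_ms() {
    files=$1 batch=$2 per_job_ms=$3
    jobs=$(( (files + batch - 1) / batch ))   # ceiling division
    echo $(( jobs * per_job_ms ))
}

overhead_ms 100000 1 10      # one job per input:   prints 1000000
overhead_ms 100000 1000 10   # 1000 inputs per job: prints 1000
```

Under these assumptions, starting one job per input costs about 1000 seconds of startup overhead in total, while batches of 1000 inputs cost about 1 second.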

I really do not care how fast you can produce wrong output. I care how fast you can produce correct output, and I do not care about a specialized solution that only works for one single specialized task.

I can make a specialized solution that is faster than your specialized solution:

    $ $tm true

It gives the same output as your example, and it is way faster. But do you really feel that is a fair comparison? If you say no, then by your own arguments, I can claim you are "moving the goal posts".

> For the curious, perf ratios are actually even worse for the high volume "grep t" example made safe (21.8X slower rather than 19.4X slower - on my test machine). xargs --process-slot-var (around since 2010) is enough of a hint for anyone actually curious and there is real value to having someone solve that little puzzle themselves for their own use cases. Doing all your homework for you can take something away.

Or it might just be that your solution is not safe at all, or only works on very specialized input on your system.

I have already shown that I can do a specialized version faster than your specialized version.

As long as you do not show your work, your speed claim is just that: a claim with no evidence.

What can be asserted without evidence can also be dismissed without evidence.

> I have already highlighted 4 risks (no Perl, invisible /tmp filling default, non-drop-in xargs, slower than serial)

"No Perl": I have only once used a system without Perl: It was on an embedded system, where space was a premium. If you use a package manager to install parallel, Perl will be installed for you automatically.

"Invisible filling /tmp": I really like that behaviour, because no matter how GNU Parallel is killed, there are no files to clean up. But each to his own.

"non-drop-in xargs": Your evidence here is good and I concur, though I never hit those incompatibilities myself (apart from -n1 which is what I normally want anyway).

"slower than serial": For short-lived jobs, yes (and if your jobs are short-lived and you can live with the limitations then parallel-bash seems to be faster than xargs). In general, no. Try "seq 0.1 0.1 10 | time parallel -j 50 sleep"
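
The expected outcome of that test can be estimated before running it. The sketch below only does the arithmetic for the 100 sleep durations and ignores all startup overhead. It gives a lower bound, not a prediction of the measured time.

```shell
# seq 0.1 0.1 10 yields 100 durations: 0.1, 0.2, ..., 10.0 seconds.
# Work in tenths of a second so that shell integer arithmetic suffices.
slots=50
total_tenths=0
longest_tenths=0
i=1
while [ "$i" -le 100 ]; do
    total_tenths=$(( total_tenths + i ))
    longest_tenths=$i
    i=$(( i + 1 ))
done

# No schedule can beat the longest job or the total work split evenly over all slots.
bound_tenths=$(( (total_tenths + slots - 1) / slots ))
if [ "$longest_tenths" -gt "$bound_tenths" ]; then
    bound_tenths=$longest_tenths
fi

echo "serial: $(( total_tenths / 10 )).$(( total_tenths % 10 )) s"
echo "lower bound with $slots slots: $(( bound_tenths / 10 )).$(( bound_tenths % 10 )) s"
```

This prints 505.0 s for the serial run and a lower bound of 10.1 s with 50 slots. If jobs start in input order and overhead is ignored, the last job (10 s) starts once the 5 s job frees a slot, so the run would finish at around 15 s.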

I had hoped your critique would show there is a better way of running jobs in parallel. So far I can only say I am disappointed.
