by caro11ne
1528 days ago
So you knew of this limitation, but failed to mention it?! Wow. Just wow. I thought you were trying to show a general way to run jobs in parallel in a safe, reliable way that was faster than GNU Parallel. You failed to do that. Instead you showed that it is possible to run jobs faster than GNU Parallel, but in a way that is neither safe nor reliable. (Or more correctly: if you know the exact limitations of the kernel of the OS you are currently using, it will be reliable in certain situations, but not in general.) I will pick reliable results over speed any day, thank you. I hope the person who mentioned safe parallel grep will show how it is done (https://news.ycombinator.com/item?id=30891634), because I will definitely not be using your solution.
|
Your link to atomic writes and your bug addition seemed pretty targeted/informed. The GNU grep --line-buffer was also in plain sight in my benchmark. Presumably one must know pipes to know when GNU parallel's own --line-buffer is helpful. So, your "wow" outrage seems fake/off point and this GNU parallel sales pitch of "background knowledge free lunch" seems more false.
You also seem to have missed (twice!) the main thrust that, unless I am missing some other --go-fast flag, GNU parallel is so slow on this common task that the easy serial method is much faster even with 16 cores given to GNU parallel. GNU parallel would have to be over 3X faster for your correctness concerns to even matter relative to serial xargs. People don't usually use parallelism to slow things down - unless maybe they blindly use GNU parallel.
For the curious, perf ratios are actually even worse for the high volume "grep t" example made safe (21.8X slower rather than 19.4X slower - on my test machine). xargs --process-slot-var (around since 2010) is enough of a hint for anyone actually curious and there is real value to having someone solve that little puzzle themselves for their own use cases. Doing all your homework for you can take something away. If you are too confused and a paying customer of Ole's, have him update his docs to be less unfair to xargs. (Also no need to link to my own posts and refer to them so generically. I only have one account, as per HN guidelines which, as a brand new account, you should maybe familiarize yourself with. [1])
As to why GNU parallel is so slow - Dunno. Took just as long with that -u flag to allow mixing output. Perl/Python programs are often 20..500X slower than code compiled to native. Python at least has like 10 ways to compile it. Curious if it was only code search, I tried that x10000000 example in the xargs comparison doc and got only 1.2X scale-up over serial xargs with 8 whole cores which also seems really slow/bad. So, GNU parallel slowness seems like probably a common problem.
This isn't just "complaining". I have already highlighted 4 risks (no Perl, invisible /tmp filling default, non-drop-in xargs, slower than serial). The oddball nagware license creates at least a 5th/6th legal/financial risk. Not sounding so safe to me. Being giant with many features generically creates more "accidental attack surface". So, there may be many more buried in GNU parallel. Different tools, different risks.
[1] https://news.ycombinator.com/newsguidelines.html
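For readers who want to see what the xargs slot-variable hint above might look like in practice, here is a minimal sketch, not the commenter's actual solution. It assumes GNU findutils xargs (where the option is spelled --process-slot-var) and GNU grep. The function name slot_grep, the variable names SLOT and OUTDIR, the batch size of 64 and the 4 workers are all hypothetical choices for illustration. The idea is that each concurrently running child gets a distinct slot value, so each slot appends to its own file and two writers never share an output stream.

```shell
# Hypothetical helper: slot_grep PATTERN DIR
# Sketch only; assumes GNU xargs with --process-slot-var and GNU grep.
slot_grep() {
  pattern=$1
  dir=$2
  outdir=$(mktemp -d) || return 1

  # xargs sets SLOT to a value that is unique among the children running
  # at the same time, so only one grep at a time appends to out.$SLOT.
  # -r skips running the command when find produces no files.
  find "$dir" -type f -print0 |
    OUTDIR=$outdir xargs -0 -r -n 64 -P 4 --process-slot-var=SLOT \
      sh -c 'grep -H -e "$0" -- "$@" >> "$OUTDIR/out.$SLOT"' "$pattern"

  # Concatenate the per-slot files once all workers have finished.
  cat "$outdir"/out.* 2>/dev/null
  rm -rf "$outdir"
}
```

Called as slot_grep t /some/tree, this prints matching lines prefixed with their file names. Output order follows slot order rather than input order, which is one of the trade-offs against a tool that groups or orders output for you.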