|
|
|
|
|
by joehl
2517 days ago
|
|
Maybe I misunderstand, but: In R allocating and copying huge return vectors for small changes on input vectors ruins performance. My understanding of Julia was that one advantage over R is the possiblity to mutate input vectors in place. Referring to the psort! example for Julia's multithreading I wonder why "We simply need to allocate one initially-empty array per thread" which to me looks like allocating huge vectors for small changes down the call recursion (and wastes memory by the number of threads). If Julia can mutate psort!s first argument global vector 'v' from multiple threads, why can't Julia also mutate a single allocated 'temp' argument global vector? Bonus: then psort! could just merge from one argument to the other and save copying back on each recursion level (along the lines of Segewick's nocopy mergesort)? |
|
I found out myself: Julia can mutate a single temp buffer across all threads, and implementing Sedgewicks nocopy sort (with ping-pong merge) is faster, particularly when passing down preallocated buffer memory to the single threaded sorting Routine once N <= BIG_THRESHOLD. The following are measurements run with JULIA_NUM_THREADS=4 on a Intel(R) Core(TM) i5-4300M CPU @ 2.60GHz, 2594 MHz, 2 physical cores, 4 logical cores, replication code follows. Note that not only runtime of nocopy mergesort ist better, it also causes less garbage collection time. Conclusion: it would make sense for Julia's sorting interface to allow passing in preallocated buffer memory and to use nocopy mergesort behind the curtain. Of course true parallel sorting also requires a parallel merge.
Best