| HN Mirror



	by longemen3000 2309 days ago
	In Julia, where the parallelization options are explicit (SIMD, AVX, threads or multiprocessing), it always depends on the load. For small operations (around 10000 elements), a single thread is faster simply because of the thread-spawning overhead (around 1 microsecond). And there is the issue of the independent BLAS threading model, where the BLAS threads sometimes interfere with Julia threads... In a nutshell, parallelization is not a magic bullet, but it is a good bullet to have at your disposal anyway.
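
The size-dependent trade-off described above can be sketched as a simple dispatch: stay on one thread below a cutoff, split the work above it. This is a minimal illustration in Python rather than Julia; the names `sum_of_squares` and `PARALLEL_THRESHOLD` are hypothetical, and the cutoff simply reuses the "around 10000 elements" figure from the comment rather than a measured value. In CPython the GIL prevents this pure-Python loop from actually speeding up across threads, so the sketch shows only the dispatch pattern, not a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cutoff, borrowed from the "around 10000 elements" remark above.
# A real threshold should be measured for the workload and machine at hand.
PARALLEL_THRESHOLD = 10_000


def sum_of_squares_serial(values):
    """Plain single-threaded reduction."""
    return sum(v * v for v in values)


def sum_of_squares(values, workers=4, threshold=PARALLEL_THRESHOLD):
    """Use one thread for small inputs and split across workers for large ones."""
    n = len(values)
    if n < threshold or workers < 2:
        # Small input: thread start-up cost would outweigh any gain.
        return sum_of_squares_serial(values)
    chunk = -(-n // workers)  # ceiling division so every element is covered
    chunks = [values[i:i + chunk] for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker reduces its own chunk; partial sums are combined here.
        return sum(pool.map(sum_of_squares_serial, chunks))
```

Both paths return the same result, so a caller does not need to know which one ran; only the cost differs.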

2 comments

ChrisRackauckas 2309 days ago

> And there is the issue of the independent BLAS threading model, where the BLAS threads sometimes interfere with Julia threads

Julia has composable multithreading, and using that model fixed composing FFTW threads with Julia's. This can be done for OpenBLAS as well, and IIRC there is a PR open for it.


longemen3000 2309 days ago

Yeah, I'm waiting for that PR hahaha


The_rationalist 2309 days ago

Do you know if Julia will add OpenMP support? It's clearly the way to go for offloading to hardware in a productive way.


dagw 2309 days ago

Julia actually had OpenMP-backed parallelism initially (ParallelAccelerator.jl), but they're moving away from OpenMP towards a novel, native task-parallelism framework more inspired by things like Cilk[0].

[0] https://julialang.org/blog/2019/07/multithreading/


eigenspace 2309 days ago

I don't know about "clearly the way to go". I think Julia's parallelism models have proven themselves to be very robust, performant and composable, more so than OpenMP as far as I'm aware.


The_rationalist 2309 days ago

How can I annotate an existing loop to offload it to the GPU, and/or to AVX, and/or to CPU cores? Without this ability, in practice I use far less parallelism.


dagw 2309 days ago

There is currently no official solution in Julia that I'm aware of. However, several people are working on it, and a few experimental solutions are under active development:

https://github.com/JuliaDiffEq/AutoOffload.jl

https://juliagpu.gitlab.io/GPUifyLoops.jl/


ChrisRackauckas 2309 days ago

AutoOffload is something different: it tries to do linear algebra in a way that auto-offloads to GPUs or other heterogeneous hardware. GPUifyLoops is the right answer to this question, and its next incarnation is KernelAbstractions.jl. These auto-construct GPU kernels and such from loops.
