by ChrisRackauckas 3050 days ago
You weren't doing the same thing. Julia's `pmap` and `@parallel` are multiprocessing. These will parallelize across multiple computers, like multiple nodes of a cluster. This approach has much larger scaling potential (it's more like MPI) but at the cost of a larger overhead (like MPI). For example, it was used to achieve >1 petaflops in the Celeste.jl application on the Cori supercomputer.

Cython's parallelism is via OpenMP, which is shared-memory multithreading. Of course multithreading is faster, but it's restricted to a single computer. Julia does have multithreading as well, via `Threads.@threads`. This is shared memory and restricted to a single computer just like Cython, and will have much lower overhead than `pmap` and `@parallel`. If you want to directly compare something to Cython's parallelism, this is what you should be looking at.
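
To make the distinction concrete, here is a minimal sketch that runs the same computation three ways. The `work` function and `threaded_map` helper are hypothetical stand-ins, not anything from the original benchmark, and the sketch assumes Julia 1.0 or later, where `@parallel` was renamed to `@distributed` and both it and `pmap` live in the `Distributed` standard library.

```julia
using Distributed      # stdlib: addprocs, @everywhere, pmap, @distributed
using Base.Threads     # @threads

# Multiprocessing needs worker processes; two local ones are assumed here.
addprocs(2)

# Hypothetical stand-in for an expensive per-element computation.
# @everywhere defines it on the master and on every worker process.
@everywhere work(x) = sum(sin(x + k) for k in 1:1_000)

# Shared-memory multithreading: all threads write into the same array.
# Start Julia with several threads (e.g. JULIA_NUM_THREADS=4) to see an effect.
function threaded_map(f, xs)
    out = Vector{Float64}(undef, length(xs))
    @threads for i in eachindex(xs)
        out[i] = f(xs[i])
    end
    return out
end

xs = collect(1.0:100.0)

serial      = map(work, xs)            # baseline, one thread, one process
threaded    = threaded_map(work, xs)   # threads inside a single process
distributed = pmap(work, xs)           # inputs and results are sent between processes

# Distributed reduction, the Julia 1.x spelling of the old `@parallel (+) for`.
total = @distributed (+) for x in xs
    work(x)
end
```

With `pmap`, every element travels to a worker and its result travels back, which is where the extra overhead comes from; the threaded version never leaves the process.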

On a side note, it looks like Cython doesn't have any native multiprocessing or multinode parallelism that would be the direct comparison to `pmap` or `@parallel`.


Thanks! From the documentation and all the threads I searched on Discourse, it was never clear to me that `@parallel` and `pmap` were aimed in the direction you just described.

I did try Threads.@threads, but the overhead was way too high. I might look into it again soon.

Threads.@threads is still considered a bit experimental, and there are a few performance pitfalls that one can stumble into. I talked about this for a bit in http://slides.com/valentinchuravy/julia-parallelism, but if you still have issues after reading that, feel free to reach out on https://discourse.julialang.org and we will figure out what is going on.
Very technical presentation, but it contains nuggets that I can't wait to try once I get home! Do you have a blog that lays this out in a way that is aimed at the general Julia programmer?
Working on it :) I will probably announce it on Twitter (@vchuravy) once I manage to find time to finish it.
Unfortunately, slides.com is blocked at work...
The problem isn't overhead but that there's a performance bug that one can easily hit with multithreading right now, which is why it's labelled experimental. When that's fixed, hopefully you'll be happy :). A function barrier fixes it, but it's a little nasty. This is probably the bug I want fixed most, but since it's not syntax-breaking, it's slated for v1.x and not v1.0.
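
For anyone unsure what a function barrier looks like here, this is a rough sketch. It assumes the bug in question is the one where variables captured by the closure that `@threads` generates lose their inferred types; `kernel!` and `threaded_apply!` are hypothetical names used only for illustration.

```julia
using Base.Threads

# All per-element work lives in its own function. When it is called, Julia
# specializes it on the concrete types of `out` and `xs`, so the body runs
# fast even if the caller's variables were poorly inferred.
function kernel!(out, xs, i)
    out[i] = sqrt(xs[i]) + 1.0
    return nothing
end

function threaded_apply!(out, xs)
    @threads for i in eachindex(xs)
        kernel!(out, xs, i)   # the function barrier
    end
    return out
end

xs  = collect(1.0:1_000.0)
out = similar(xs)
threaded_apply!(out, xs)
```

The loop body is reduced to a single call, so whatever the closure does to the captured variables costs one dynamic dispatch per iteration rather than slowing down every operation inside the loop.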