Hacker News
by hatmatrix 105 days ago
Do you have an idea whether specific types of problems are giving Julia poorer performance? From what I recall, people were reporting better speeds with Julia than with Numba (e.g., [1]). My impression was that Julia lets you bring more of your code down to LLVM than Numba does, so that would make sense.

[1] https://gerritnowald.wordpress.com/2022/10/03/simulating-rot...


Thank you for the article! We're mainly interested in floating-point performance and energy consumption when solving differential equations and tridiagonal systems of equations on a 128-core compute node. Our current results will likely only be presented in May, but here are last year's results: https://www.cs.uni-potsdam.de/bs/research/docs/papers/2025/l...
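For readers unfamiliar with the tridiagonal workload: the usual direct solver is the Thomas algorithm, a forward sweep followed by back substitution, which runs in O(n) and is stable for diagonally dominant matrices. The sketch below is a plain NumPy illustration of that algorithm, not code from the benchmark suite; the function name `thomas` and the argument layout (`a` = sub-diagonal, `b` = diagonal, `c` = super-diagonal, `d` = right-hand side, all of length n) are assumptions made for this example.

```python
import numpy as np


def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
    a[0] and c[-1] do not affect the result.
    """
    n = len(d)
    cp = np.empty(n)  # modified super-diagonal
    dp = np.empty(n)  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal row by row.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution from the last unknown upwards.
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Usage: -x[i-1] + 4*x[i] - x[i+1] = 1 on 5 unknowns (diagonally dominant).
n = 5
x = thomas(-np.ones(n), 4.0 * np.ones(n), -np.ones(n), np.ones(n))
```

The loop-carried dependency in both sweeps means a single system is inherently sequential; parallelism in this kind of benchmark has to come from solving many systems at once or from a different algorithm.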

Our Julia code is parallelised with FLoops.jl, but so far Numba has shown surprising performance benefits when executing code in parallel, despite being slower when executed sequentially. I can therefore imagine Julia yielding better results in a regular desktop environment.
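On the Numba side, parallel loops are typically written with `prange` inside a function compiled with `@njit(parallel=True)`. The following is a hedged sketch of that pattern applied to many independent tridiagonal systems; the batched workload shape and the name `solve_batch` are assumptions for illustration, not the benchmark's code. The `try`/`except` fallback exists only so the sketch also runs (sequentially) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback: without Numba, run the same code as plain, sequential Python.
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def solve_batch(a, b, c, d):
    """Solve k independent tridiagonal systems; row j of each array is one system."""
    k, n = d.shape
    x = np.empty((k, n))
    # The systems are independent, so Numba may distribute rows across threads.
    for j in prange(k):
        # Scratch arrays are allocated per row for simplicity.
        cp = np.empty(n)
        dp = np.empty(n)
        cp[0] = c[j, 0] / b[j, 0]
        dp[0] = d[j, 0] / b[j, 0]
        for i in range(1, n):
            m = b[j, i] - a[j, i] * cp[i - 1]
            cp[i] = c[j, i] / m
            dp[i] = (d[j, i] - a[j, i] * dp[i - 1]) / m
        x[j, n - 1] = dp[n - 1]
        for i in range(n - 2, -1, -1):
            x[j, i] = dp[i] - cp[i] * x[j, i + 1]
    return x
```

Allocating `cp` and `dp` inside the parallel loop keeps the sketch short, but it is exactly the kind of per-iteration allocation that can matter at high thread counts, the same question this thread raises about the threaded Julia code.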

Are you using this code for Julia?

https://github.com/JuliaParallel/rodinia/tree/master/julia_m...

It was last touched 9 years ago, but maybe you have ported it to current standards. I don't think we had multithreading back then, only multiprocessing.

Are your Julia implementations available somewhere? (Sorry if they're in your paper and I missed them.) I vaguely remember that, in the past, working with threads led to some additional allocations compared to the serial code. Maybe this is also biting us here?

The source code is available here: https://gitup.uni-potsdam.de/bsvs/public/hpc-benchmark-game

As far as I know, the code was ported to use @floops, with some minor optimisations on top of that.

I think it's quite possible that it's an allocation issue; that's something we're looking into, although I don't have any specific results for Julia yet.

Are you using Polyester.jl? With large numbers of threads, Base threading is not well optimized because of GC interactions, and its hierarchical threading adds overhead compared with "unsafe" threading techniques that don't support work-sharing. Polyester is therefore required to get very low-overhead threading that matches the performance of non-work-sharing approaches.

I have a small benchmark program doing tight-binding calculations of carbon nanostructures that I have implemented in C++ with Eigen, C++ with Armadillo, Fortran, Python/NumPy, and Julia. It's been a while since I tested it, but IIRC all the other implementations were roughly on par, except for Python, which was about half the speed of the others. I haven't tried it with Numba.

To bring Julia's performance on par with the compiled languages, I had to do a bit of profiling and some tweaking with @views.

https://gitlab.com/jabl/tb

The JuliaParallel/rodinia repo says that the focus of those benchmarks is the CUDA versions. I suspect that the CPU versions have not had much optimization effort spent on them. Julia isn't a magic wand, but you can usually get within a factor of 2 of C++ with similar effort.

A cluster environment with virtualized cores may slow down Julia's parallel code. People recommend ThreadPinning.jl to solve these issues.

This really seems very unlike what everyone else is seeing. There's really no reason why Julia should be slower than Numba...