| HN Mirror

	by hatmatrix 152 days ago
	Do you have an idea whether specific types of problems are giving Julia poorer performance? From what I recall, people were reporting better speeds with Julia than with Numba (e.g., [1]). My impression was that you are basically able to bring more of your code to LLVM with Julia than with Numba, so that would make sense. [1] https://gerritnowald.wordpress.com/2022/10/03/simulating-rot...

galdauts 152 days ago

Thank you for the article! We're mainly interested in floating-point performance and energy consumption w/r/t solving differential equations and tridiagonal systems of equations, while running on a 128-core compute node. Our current results will likely only be presented in May, but here are last year's results: https://www.cs.uni-potsdam.de/bs/research/docs/papers/2025/l...

Our Julia code is parallelised with FLoops.jl, but so far Numba has shown surprising performance benefits when executing code in parallel, despite being slower when executed sequentially. Therefore I can imagine that Julia might yield better results when run in a regular desktop environment.

Alexander-Barth 151 days ago

Are you using this code for Julia?

https://github.com/JuliaParallel/rodinia/tree/master/julia_m...

It was last touched 9 years ago, but maybe you have ported it to current standards. I don't think we had multithreading at that time, only multiprocessing.

Are your Julia implementations available somewhere? (Sorry if it is in your paper and I missed it.) I vaguely remember that working with threads used to lead to some additional allocations (compared to the serial code). Maybe this is also biting us here?

galdauts 147 days ago

The source code is available here: https://gitup.uni-potsdam.de/bsvs/public/hpc-benchmark-game

As far as I know the code was ported to use @floops, with minor optimisations in addition to that.

I think it's quite possible that it's an allocation issue. That's something we're looking into, although I don't have any specific results for Julia yet.

ChrisRackauckas 151 days ago

Are you using Polyester.jl? Base threading is not optimized for large numbers of threads because of GC interactions, and its hierarchical threading adds overhead compared with "unsafe" thread techniques that don't support worksharing. Polyester is thus required to get very low-overhead threading that matches the performance of non-worksharing scenarios.

jabl 151 days ago

I have a small benchmark program doing tight-binding calculations of carbon nanostructures that I have implemented in C++ with Eigen, C++ with Armadillo, Fortran, Python/NumPy, and Julia. It's been a while since I've tested it, but IIRC all the other implementations were about on par, except for Python, which was about half the speed of the others. Haven't tried with Numba.

To bring Julia performance on par with the compiled languages I had to do a little bit of profiling and tweaking using @views.

https://gitlab.com/jabl/tb
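
A minimal sketch of why the @views tweak can matter, assuming the cost was slice copies (the comment above does not say which allocations were involved). Julia array slices copy by default, and @views turns them into views that share memory with the parent array. The same copy-versus-view distinction is illustrated here with Python's standard library only, with bytearray and memoryview standing in for Julia arrays; the function name slice_demo is hypothetical.

```python
def slice_demo():
    """Contrast a copying slice with a view slice (illustration only)."""
    data = bytearray(range(10))

    # Plain slicing allocates a new buffer, like a default Julia slice.
    copied = data[2:5]
    copied[0] = 99  # changes only the copy, not `data`

    # A memoryview slice shares memory with `data`, like a Julia view.
    view = memoryview(data)[2:5]
    view[1] = 77  # writes through to data[3]

    return data, copied


data, copied = slice_demo()
print(data[2], data[3], copied[0])
```

In a hot loop the copying form allocates on every iteration, which is the kind of overhead that profiling tends to surface.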

jondea 152 days ago

The JuliaParallel/rodinia repo says that the focus of those benchmarks is the CUDA versions. I suspect that the CPU versions have not had much optimization effort spent on them. Julia isn't a magic wand, but you can usually get within a factor of 2 of C++ with similar effort.

dandanua 152 days ago

A cluster environment with virtualized cores may slow down Julia's parallel code. People recommend ThreadPinning.jl to solve such issues.

Certhas 151 days ago

That really seems very unlike what everyone else is seeing. There really is no reason why Julia should be slower than numba...
