Hacker News new | ask | show | jobs
by etik 2192 days ago
There is a recent effort [1] to provide low-level support for faster operations by transforming user code to take advantage of a compiler's instruction set, memory packing, etc. This is being expanded upon to essentially provide a Julia-native BLAS. Some of the benchmarks are even competitive with or beat Intel MKL (calibrate that statement appropriately to your level of trust in benchmarks). I wouldn't count out a Julia ARPACK implementation just yet.

[1] LoopVectorization: https://github.com/chriselrod/LoopVectorization.jl Announcement post and discussion: https://discourse.julialang.org/t/ann-loopvectorization/3284...

2 comments

You're probably confusing ARPACK with BLAS / LAPACK.

Pure julia ARPACK already exists, e.g. https://github.com/haampie/ArnoldiMethod.jl/.

A competive BLAS-gemm is implemented here https://github.com/YingboMa/MaBLAS.jl/blob/master/src/gemm.j... (single-threaded).

A LAPACK-like library could be https://github.com/JuliaLinearAlgebra/GenericLinearAlgebra.j...

I was mostly referring to the parent comment's suggestion that low level numerical libraries wouldn't benefit from a pure Julia implementation, specifically to the statement that it was still better to write optimized C/FORTRAN and call from Julia. Indeed, MaBLAS, which you linked, is built on top of LoopVectorization.jl.

I don't know how well ArnoldiMethod.jl compares with ARPACK, but if there is a gap my suggestion is simply that these recent developments might help bridge it :)

FWIW, MaBLAS currently does not depend on LoopVectorization.jl, the code to generate kernels is all handwritten.
This is great, a lot of the performance in BLAS comes from memory management. This does not solve my problem though. Sometimes you want to manually control memory, and Julia does not make it easy to do so. In particular when interfacing with BLAS/LAPACK though Julia, memory management is quite ugly, mainly due to the need for allocating work arrays, and a pure Julia implementation of BLAS is unlikely to fix this. I don't think writing eg ARPACK performantly in Julia is imposible, just painful to the point that writing it in FORTRAN starts to make sense (to me).
> I don't think writing eg ARPACK performantly in Julia is impossible

Not sure what you mean, ARPACK just wraps a bunch of calls to LAPACK and BLAS, that's all. It does not have any low-level linalg kernels of its own. Also, its main bottleneck is typically outside of the algorithm, namely matrix-vector products. ArnoldiMethod.jl is pretty much a pure Julia implementation of ARPACK without any dependency on LAPACK (only BLAS when used with Float32/Float64/ComplexF32/ComplexF64). Finally note that you can easily beat ARPACK by adding GPU support, since they don't provide it.

The person I was responding to seemed to have read my comment and thought I had an issue with raw performance. My point above is that writing iterative code that makes many LAPACK calls becomes difficult to write in Julia because there is no way to manage the memory in this situation other than ccall-ing everything yourself, at which point I would rather write FORTRAN. I work on eigenvalue solvers, so it is all more or less just wrapping a bunch of calls to BLAS/LAPACK. As you say, the main bottleneck is in the BLAS calls anyways, but the excess allocation in Julia can really slow you down. ARPACK is maybe a bad example because its all just mat-vec, when you need to do something like compute an SVD every iteration, thats where you run into issues.