|
Though the classic dense methods have been implemented, optimised, and ported to death with LAPACK/BLAS, haven't they? What you're talking about is parallelisation, moving to GPU, and modern (combinatorial) methods for sparse systems. That's fairly cutting edge and not trivial to implement or port: you'll need a pretty good understanding of the language and its idiomatic use, of linear algebra, and of modern computer architecture. Still, I assume you're right that there might be some low-hanging fruit. BTW, Julia is an awesome language with excellent linear algebra support, and it's nice that most algorithms are coded in Julia itself (unlike the two-language situation in Python, Clojure (AFAIK), etc.) |
Honestly, it might be tricky, but implementing matrix operations is not rocket science either. I find it incredible that so many projects rely on NVIDIA's proprietary libraries for doing this on GPU. Maybe there is some secret sauce that I just don't know about, but it seems to me there can't be that many optimisation shortcuts for matrix multiplication and the like that require intimate, secret knowledge of the hardware.