Hacker News
by dragandj 3187 days ago
Can you give an example of a popular Julia library (or libraries) that does that and, very importantly, point to speed comparisons? Are those operations close to the speed of MKL, for example?

Well, if you define a new type and the * and + operations for it, you get matrix multiplication for free in native Julia. I think you can do similar things for a few other algorithms in native Julia. I don't think it will be MKL speed, though, because MKL uses cache size information etc. and is very heavily optimized. Of course, Julia uses BLAS/LAPACK etc. for the types it can.
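To make the "define * and + and get matrix multiplication for free" idea concrete, here is a minimal sketch in Python rather than Julia. The `Mod7` type and the `matmul` function are hypothetical names made up for illustration, and the triple loop is a naive stand-in, not what Julia's generic fallback actually runs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mod7:
    """Hypothetical element type: integers modulo 7."""
    v: int

    def __add__(self, other):
        return Mod7((self.v + other.v) % 7)

    def __mul__(self, other):
        return Mod7((self.v * other.v) % 7)


def matmul(a, b):
    """Naive matrix product that only uses + and * on the elements."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Start from the first product so no "zero" element is needed.
            acc = a[i][0] * b[0][j]
            for p in range(1, inner):
                acc = acc + a[i][p] * b[p][j]
            row.append(acc)
        out.append(row)
    return out


# The same function works for the custom type and for plain floats.
A = [[Mod7(3), Mod7(5)], [Mod7(6), Mod7(2)]]
B = [[Mod7(4), Mod7(1)], [Mod7(2), Mod7(6)]]
C = matmul(A, B)
F = matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

Nothing in `matmul` knows about `Mod7`, which is the point. It is also why this kind of generic code won't match a tuned BLAS: there is no blocking for cache and no vectorization.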

GenericSVD.jl is an interesting example of how you can factorize a matrix of quaternions by just defining the appropriate functions for quaternions and running the generic SVD algorithm (it's just an example of what I talked about above; I don't think the library is really used by anyone).

Knet.jl is a fairly mature deep learning library that really demonstrates the power of the language. You simply define your forward and loss functions, and then you are set. The loss function gets automatically differentiated and you go ahead and do SGD. There really isn't anything else you need to mess with. It also works seamlessly with GPU arrays without you having to touch the forward and loss functions.
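To show the shape of that workflow (write `forward` and `loss`, let the machinery differentiate, then descend), here is a rough Python sketch. It is not Knet's API. The `Dual` class and the `grad` helper are hypothetical, they use forward-mode dual numbers as a simple swapped-in technique, and the loop is plain full-batch gradient descent rather than real SGD:

```python
class Dual:
    """A value together with its derivative (forward-mode differentiation)."""

    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def forward(w, b, x):
    """The model: a single linear unit."""
    return w * x + b


def loss(w, b, data):
    """Mean squared error over the data set."""
    total = 0.0
    for x, y in data:
        err = forward(w, b, x) - y
        total = total + err * err
    return total * (1.0 / len(data))


def grad(f, params, i):
    """Derivative of f with respect to params[i], via one dual-number pass."""
    args = [Dual(p, 1.0 if j == i else 0.0) for j, p in enumerate(params)]
    return f(*args).der


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1
w, b = 0.0, 0.0
for _ in range(2000):
    gw = grad(lambda w_, b_: loss(w_, b_, data), [w, b], 0)
    gb = grad(lambda w_, b_: loss(w_, b_, data), [w, b], 1)
    w, b = w - 0.05 * gw, b - 0.05 * gb
```

`forward` and `loss` are ordinary numeric code and never mention `Dual`; the derivative comes entirely from which number type is passed in.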

When I hear "you just do X", I become sceptical of that instantly :)
As a note, this feature isn't exclusive to Julia. For example, Eigen implements something similar and the requirements for doing so are specified here:

https://eigen.tuxfamily.org/dox-devel/TopicCustomizing_Custo...

The reason to do so is to use a more exotic type that gives additional information. In the simplest case, we may want to use quad precision, which BLAS and LAPACK do not support by default (though it is possible to hack in). Alternatively, it's a good way to run automatic differentiation through the solve, to use arbitrary-precision libraries like GMP, or to use interval arithmetic.
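As a small illustration of running a solve over an exotic number type, here is a Python sketch (not Eigen or Julia code). It uses the standard library's `fractions.Fraction` as a stand-in for an arbitrary-precision type. The `solve` and `hilbert` helpers are hypothetical names, and `solve` assumes a nonsingular matrix:

```python
from fractions import Fraction


def solve(a, b):
    """Gaussian elimination with partial pivoting.

    Uses only +, -, *, / and abs on the elements, so it runs on floats,
    Fractions, or any other type that supports those operations.
    """
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] = m[r][c] - factor * m[col][c]
    x = [None] * n
    for i in reversed(range(n)):  # back substitution
        s = m[i][n]
        for j in range(i + 1, n):
            s = s - m[i][j] * x[j]
        x[i] = s / m[i][i]
    return x


def hilbert(n, one):
    """Ill-conditioned n-by-n Hilbert matrix built from the given 'one'."""
    return [[one / (i + j + 1) for j in range(n)] for i in range(n)]


def row_sums(a):
    """Right-hand side whose exact solution is a vector of ones."""
    return [sum(row[1:], row[0]) for row in a]


H_exact = hilbert(6, Fraction(1))
H_float = hilbert(6, 1.0)
x_exact = solve(H_exact, row_sums(H_exact))  # exact rational arithmetic
x_float = solve(H_float, row_sums(H_float))  # ordinary double precision
```

With `Fraction` the answer comes back exactly as ones, while the float run only gets close because of rounding. The cost is speed: the exact version does big-integer arithmetic at every step.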

So, yes, it's right to be skeptical, but it is possible, some libraries support it, and it's a useful trick to have available.

It's a useful trick, but I was talking about a comparison with what MKL, cuBLAS, and other Nvidia libraries do. For the float/double/half implementations of the "standard" operations that 95% of use cases fall into, and there are hundreds if not thousands of those, I haven't seen even a close match.

Of course, if you need something special, there are various techniques. My preference for those things is to skip the middleman and code them in CUDA/OpenCL kernels directly...

OTOH, I'm interested in machine learning applications rather than physics/engineering. Double is overkill here, float is the sweet spot, and some techniques use half or even lower precision. Quad precision may just not be something I need, but I suppose people who do need it have their own ways of solving that.