Hacker News
by geezerjay 3187 days ago
> Honestly, it might be tricky, but implementing matrix operations is not rocket science either.

Actually, it's very hard to implement efficient matrix-matrix and matrix-vector operations, although naive implementations are very easy to pull off. You're fooling yourself if you believe you can whip up an implementation of basic BLAS-(1|2|3) kernels that matches the performance of properly tuned implementations. Implementing a kernel whose performance doesn't stray too far from the hardware's capacity takes a lot of knowledge and work on low-level details such as cache hierarchies and their impact on memory access performance. Floating point operations actually take a back seat to memory access, as they represent a small fraction of the operations performed by the kernel (IIRC, in sparse matrix operations the proportion of fp operations is only about 1-in-7), and the bulk of the implementation effort goes into arranging memory accesses so that cache misses are minimized. Therefore, to even be in a position to implement an acceptable matrix-vector or matrix-matrix kernel you need a solid understanding of how particular architectures handle memory. This isn't trivial, and it's one of the reasons why articles about implementing X on a GPU, even if X is a classic algorithm, are accepted and published by specialized publications.
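To put rough numbers on the "memory access dominates" point, here is a back-of-the-envelope sketch of arithmetic intensity (flops per byte moved) for three kernels. Everything in it is an assumption for illustration: the function names are made up, values are 8-byte doubles, CSR column indices and row pointers are 4 bytes, the sparse case assumes no cache reuse of the input vector, and the matrix-matrix case assumes ideal reuse (each matrix touched once). It measures flops per byte, which is not the same thing as the 1-in-7 operation ratio quoted above, so don't expect it to reproduce that figure.

```python
def dense_matvec_intensity(n, word_bytes=8):
    """Flops per byte for y = A @ x with a dense n x n matrix A."""
    flops = 2 * n * n  # one multiply and one add per matrix entry
    # Read A and x once, write y once.
    bytes_moved = word_bytes * (n * n + 2 * n)
    return flops / bytes_moved


def csr_matvec_intensity(nnz, n_rows, word_bytes=8, index_bytes=4):
    """Flops per byte for y = A @ x with A in CSR format.

    Assumes the worst case for x: every gathered entry comes from memory.
    """
    flops = 2 * nnz  # one multiply and one add per stored nonzero
    bytes_moved = (
        nnz * (word_bytes + index_bytes)  # stored value and its column index
        + nnz * word_bytes                # gathered entry of x
        + (n_rows + 1) * index_bytes      # row pointer array
        + n_rows * word_bytes             # write y
    )
    return flops / bytes_moved


def dense_matmul_intensity(n, word_bytes=8):
    """Flops per byte for C = A @ B with dense n x n matrices.

    Assumes ideal cache reuse: A and B are read once, C is written once.
    """
    flops = 2 * n ** 3
    bytes_moved = word_bytes * 3 * n * n
    return flops / bytes_moved


if __name__ == "__main__":
    n = 1000
    print("dense matvec :", dense_matvec_intensity(n))
    print("CSR matvec   :", csr_matvec_intensity(nnz=5 * n * n, n_rows=n * n))
    print("dense matmul :", dense_matmul_intensity(n))
```

Under these assumptions both matrix-vector kernels come out well below one flop per byte, so they are limited by memory bandwidth no matter how fast the FPU is. The matrix-matrix figure grows linearly with n, but only if the implementation actually achieves that reuse through blocking for the cache hierarchy, which is exactly the hard part.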