by silentvoice 4309 days ago
Matrix multiplication is one of the most abused computational kernels for showing off cache locality and vectorizing, optimizing compilers. Unfortunately, very few scientific codes consist of massive matrix-matrix multiplies. Even more unfortunately, quite a few of them require many vector additions and dot products: operations that are memory bound and that drag down the performance of even those scientific codes that make the cleverest use of BLAS. Your CPU may be able to churn out a bajillion gigaflops on a matrix-matrix multiply, but once you get to the vector adds and dot products, you just can't feed that FLOPS-hungry beast fast enough to keep up the gains.
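To put rough numbers on that, here is a back-of-the-envelope sketch of arithmetic intensity (floating-point operations per byte of memory traffic) for the three kernels. It assumes 8-byte doubles and an idealized cache where every operand crosses the memory bus exactly once. The function names are made up for illustration and are not part of BLAS or any other library.

```python
# Rough arithmetic-intensity model: FLOPs per byte moved to or from main memory.
# Assumptions (not from the original comment): 8-byte doubles, and an ideal
# cache so that each operand element is transferred exactly once.

BYTES_PER_DOUBLE = 8


def matmul_intensity(n: int) -> float:
    """C = A @ B with n x n matrices: 2*n^3 flops, 3*n^2 elements moved."""
    flops = 2 * n**3
    bytes_moved = 3 * n * n * BYTES_PER_DOUBLE  # read A, read B, write C
    return flops / bytes_moved


def dot_intensity(n: int) -> float:
    """s = x . y with length-n vectors: 2*n flops, 2*n elements read."""
    flops = 2 * n  # one multiply and one add per element
    bytes_moved = 2 * n * BYTES_PER_DOUBLE
    return flops / bytes_moved


def vector_add_intensity(n: int) -> float:
    """z = x + y with length-n vectors: n flops, 3*n elements moved."""
    flops = n
    bytes_moved = 3 * n * BYTES_PER_DOUBLE  # read x, read y, write z
    return flops / bytes_moved


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(
            f"n={n:>6}  matmul={matmul_intensity(n):10.2f}  "
            f"dot={dot_intensity(n):.4f}  add={vector_add_intensity(n):.4f}"
        )
```

Under this model the matrix multiply's intensity grows linearly with n (it works out to n/12 flops per byte), while the dot product and vector add stay fixed at 1/8 and 1/24 flops per byte no matter how long the vectors are. That constant, tiny ratio is why those kernels end up limited by memory bandwidth rather than by how fast the CPU can do arithmetic.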