|
|
|
|
|
by dsharlet
894 days ago
|
|
This is really overstating how hard it is to compete with matrix multiply libraries. The main reason those libraries are so big and have had so much work invested in them is their generality: they're reasonably fast for almost any kind of inputs. If you have a specific problem with constraints you can exploit (e.g. known fixed dimensions, sparsity patterns, data layouts, type conversions, etc.), it's not hard at all to beat MKL, etc... if you are using a language like C++. If you are using python, you have no chance. It isn't even necessarily that different from a few nested loops. Clang is pretty damn good at autovectorizing, you just have to be a little careful about how you write the code. |
|