by Schneekatze
4795 days ago
This is true, even though we would rather switch to Armadillo due to its easier handling and better high-level behaviour. Right now the linear algebra library we use, uBLAS, behaves the same as Eigen for BLAS1-type expressions, i.e. it tries to generate optimal (though non-SSE) code. Only for BLAS2 and BLAS3 operations do we fall back to the ATLAS routines, which perform on par with Eigen at the problem sizes we care about.
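Both uBLAS and Eigen handle BLAS1-type expressions with expression templates: operators return lightweight expression objects, and the actual loop runs only when the result is assigned, so something like `z = a*x + y` is computed in one pass with no temporary vectors. A minimal sketch of the idea, assuming hypothetical `Vec`, `Sum` and `Scaled` types (not the API of either library, and with no SIMD):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: lets the operators accept any expression type.
template <typename E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

class Vec : public Expr<Vec> {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }

    // Assigning from an expression evaluates it element by element in ONE loop;
    // no intermediate vector is created for sub-expressions such as a * x.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& ex = e.self();
        assert(ex.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = ex[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Node for l + r: holds references and computes one element on demand.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R>> {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) { assert(l.size() == r.size()); }
    std::size_t size() const { return l_.size(); }
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }

private:
    const L& l_;
    const R& r_;
};

// Node for a * e (scalar times expression).
template <typename E>
class Scaled : public Expr<Scaled<E>> {
public:
    Scaled(double a, const E& e) : a_(a), e_(e) {}
    std::size_t size() const { return e_.size(); }
    double operator[](std::size_t i) const { return a_ * e_[i]; }

private:
    double a_;
    const E& e_;
};

template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

template <typename E>
Scaled<E> operator*(double a, const Expr<E>& e) {
    return Scaled<E>(a, e.self());
}
```

With these definitions, `z = 2.0 * x + y;` builds a `Sum<Scaled<Vec>, Vec>` and runs a single loop on assignment. Because the nodes hold references to their operands, an expression should be assigned in the same statement that builds it rather than stored in an `auto` variable, where the temporaries it refers to would no longer exist.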
In the end, it matters little whether the BLAS1-type expressions are fast, as they account for less than 1% of the run time. The bulk of the time goes into data processing inside the matrix-matrix multiplications of the neural networks and similar components.
Another thing about code generation: in a project I'm working on, I am also using a hacked version of Eigen that can compute tanh and the derivative of tanh, so the NN activations run quite a bit faster, because you can generate vectorized code for the whole calculation that visits each memory location exactly once. While it is true that calculating the weight updates takes most of the time, I saw a 3-4x speedup in the activation code by doing it in a single operation, thanks to better memory access patterns and fewer loop iterations. Better memory access patterns can also have synergistic effects on other code because there is less cache pollution. In my case, when I was fast and loose and introduced a few extra copies of the matrix data, performance fell off a cliff as soon as the data no longer fit nicely in the CPU cache: a 10x difference in the particular case I remember.
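The fusion described above can be illustrated without Eigen. A minimal sketch, assuming plain `std::vector<double>` buffers and hypothetical `tanh_unfused` / `tanh_fused` helpers (this is not the commenter's modified Eigen and is not explicitly vectorized). The unfused version makes three passes and allocates a temporary, while the fused version reads each input once and writes each output once, using the identity tanh'(x) = 1 - tanh(x)^2 to reuse the value it just computed:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Unfused: three separate passes over memory plus an extra temporary buffer.
// Outputs must already have x.size() elements.
void tanh_unfused(const std::vector<double>& x,
                  std::vector<double>& act, std::vector<double>& deriv) {
    assert(act.size() == x.size() && deriv.size() == x.size());
    std::vector<double> sq(x.size());  // temporary: extra cache traffic
    for (std::size_t i = 0; i < x.size(); ++i) act[i] = std::tanh(x[i]);
    for (std::size_t i = 0; i < x.size(); ++i) sq[i] = act[i] * act[i];
    for (std::size_t i = 0; i < x.size(); ++i) deriv[i] = 1.0 - sq[i];
}

// Fused: one pass; each x[i] is read once and act[i], deriv[i] are written once.
// The derivative reuses t = tanh(x[i]) via tanh'(x) = 1 - tanh(x)^2.
void tanh_fused(const std::vector<double>& x,
                std::vector<double>& act, std::vector<double>& deriv) {
    assert(act.size() == x.size() && deriv.size() == x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double t = std::tanh(x[i]);
        act[i] = t;
        deriv[i] = 1.0 - t * t;
    }
}
```

Both functions produce the same values; the difference is only in how many times the buffers are traversed and how many temporaries are created, which is what starts to dominate once the data no longer fits in cache.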
As always, performance is part art and part science. Perhaps it won't matter as much in the general case, but for my specific implementation and matrix sizes, Eigen has made a measurable difference compared to other solutions.