by stephencanon 1577 days ago
That said, it _is_ optimizable, it's just that:

- the gains are the typical 2-8x speedups you get from vectorization, not the multiple-orders-of-magnitude gains that you can get on dense GEMM.

- the absolute number of flops performed is O(n^2), rather than the O(n^3) of GEMM, so even if you could make tridiagonal operations infinitely fast, that optimization effort would be better spent on even small speedups to the O(n^3) work that probably comprises other parts of your algorithm.
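
To put rough numbers on the second point, here is a minimal sketch of the flop-count argument. The cost model, the constants `c_tri` and `c_dense`, and the function names are all hypothetical, chosen only for illustration, and it assumes run time is proportional to flops, which real hardware only approximates.

```python
def total_flops(n, c_tri=1.0, c_dense=1.0):
    """Hypothetical cost model: c_tri * n^2 (tridiagonal stage) + c_dense * n^3 (dense stage)."""
    return c_tri * n**2 + c_dense * n**3


def speedup_if_tridiagonal_free(n, c_tri=1.0, c_dense=1.0):
    """Overall speedup if the O(n^2) tridiagonal stage cost nothing at all."""
    return total_flops(n, c_tri, c_dense) / (c_dense * n**3)


def speedup_if_dense_faster(n, dense_gain, c_tri=1.0, c_dense=1.0):
    """Overall speedup if only the O(n^3) stage gets `dense_gain` times faster."""
    new_cost = c_tri * n**2 + c_dense * n**3 / dense_gain
    return total_flops(n, c_tri, c_dense) / new_cost


if __name__ == "__main__":
    n = 1000
    # Infinitely fast tridiagonal stage vs. a modest 5% gain on the dense stage.
    print(speedup_if_tridiagonal_free(n))
    print(speedup_if_dense_faster(n, dense_gain=1.05))
```

Under these assumed constants, at n = 1000 the infinitely fast tridiagonal stage buys about 0.1% overall, while a 5% improvement to the O(n^3) stage buys nearly 5%. The gap widens as n grows, since the tridiagonal share of the total shrinks like 1/n.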