| HN Mirror

That's not what GP is talking about. Those algorithms are the sub-cubic ones, the ones that are O(x^2.something). GP is talking about an algorithm that is still O(n^3), but is not just a straight translation of the matrix multiplication formula into code, but rather does things in a way that is more friendly to the CPU cache, resulting in significant speed gains.