Hacker News
by adgjlsfhk1 262 days ago
The tricky parts with Strassen are that it requires some fairly large changes to your looping strategy and that it decreases numerical accuracy. It also only helps once you are compute-bound rather than bandwidth-bound, and GPUs have lots of compute.
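For reference, here's a minimal sketch of the recursion in Python/NumPy. It shows why the loop structure changes: you get seven half-size products plus a pile of matrix additions instead of a plain triple loop. It assumes square matrices whose size is a power of two, and the `leaf` cutoff (where it falls back to ordinary `A @ B`) is an arbitrary illustrative value. Real implementations pad or peel odd sizes and tune the cutoff per machine.

```python
import numpy as np

def strassen(A: np.ndarray, B: np.ndarray, leaf: int = 64) -> np.ndarray:
    """Strassen product of square A, B whose size is a power of two (assumed).

    `leaf` is a hypothetical cutoff: at or below it we use the classical product.
    """
    n = A.shape[0]
    if n <= leaf:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven half-size products instead of the classical eight
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Recombine; these extra additions/subtractions are where rounding error accumulates
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

# Rough accuracy comparison in float32 against a float64 reference
rng = np.random.default_rng(0)
n = 512
A = rng.standard_normal((n, n)).astype(np.float32)
B = rng.standard_normal((n, n)).astype(np.float32)
ref = A.astype(np.float64) @ B.astype(np.float64)
print("classical max error:", np.abs(A @ B - ref).max())
print("strassen  max error:", np.abs(strassen(A, B, leaf=32) - ref).max())
```

You'd expect the Strassen error to come out somewhat larger, since each level adds extra sums and differences before and after the recursive products; the exact numbers depend on your BLAS, the seed, and the cutoff.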

> only helps once you are compute rather than bandwidth bound

Asymptotically, I don't think that holds: Strassen runs in sub-n^3 time, so it can't be performing Theta(n^3) memory operations. Its memory traffic shrinks along with its arithmetic.
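A crude way to make the counting argument concrete: assume each scalar add or multiply touches a bounded number of words (say two reads and one write), so total memory operations are at most a constant times the arithmetic count. The sketch below counts arithmetic for n a power of two; `strassen_ops` and `classical_ops` are hypothetical helper names, and this is a naive upper bound, not a cache-aware I/O bound.

```python
def classical_ops(n: int) -> int:
    # n^3 multiplications plus n^2 * (n - 1) additions
    return n ** 3 + n * n * (n - 1)

def strassen_ops(n: int, leaf: int = 1) -> int:
    """Scalar ops for Strassen on n x n (n a power of two, assumed),
    switching to the classical algorithm at size <= leaf."""
    if n <= leaf:
        return classical_ops(n)
    h = n // 2
    # 7 recursive products, plus 10 additions to form operands and
    # 8 to recombine: 18 half-size matrix additions/subtractions in total
    return 7 * strassen_ops(h, leaf) + 18 * h * h

# ops / n^3 stays near 2 for the classical algorithm but keeps falling for
# Strassen, so a bound of "constant * ops" on memory traffic is o(n^3) too.
for k in (6, 10, 14, 18):
    n = 2 ** k
    print(n, classical_ops(n) / n ** 3, strassen_ops(n) / n ** 3)
```

With `leaf=1` the count works out to exactly 7 * n^(log2 7) - 6 * n^2, i.e. Theta(n^2.807), and any bounded-words-per-op memory count inherits that bound.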