by inglor 590 days ago
Can you explain why you used the naive algorithm here rather than one of the fast matrix multiplication algorithms that trade multiplications for extra additions? Was it just for educational purposes, or is there a performance benefit to this technique?

Because those algorithms are generally not worth implementing in practice, even though their asymptotic complexity is lower.
At least on my M2, the compiled kernel ends up using fast math anyway, so using WGSL's fma didn't change the kernel that actually gets run.
inglor is probably referring to Strassen or Coppersmith–Winograd.
Last I checked, the extra memory traffic really hurts in a lot of cases, especially for the more complex algorithms, but I'm no expert.
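For anyone who hasn't seen it, here is a rough sketch of the trade-off being discussed, using Strassen's scheme on a single 2x2 product. The function names are made up for illustration and this is plain Python, not the WGSL kernel from the post. The naive version uses 8 multiplications; Strassen's uses 7 at the cost of 18 additions/subtractions and seven temporaries, which is where the extra memory pressure comes from once the entries are large blocks rather than scalars.

```python
def naive_2x2(a, b):
    """Textbook 2x2 product: 8 multiplications, 4 additions."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    return [
        [a11 * b11 + a12 * b21, a11 * b12 + a12 * b22],
        [a21 * b11 + a22 * b21, a21 * b12 + a22 * b22],
    ]


def strassen_2x2(a, b):
    """Strassen's scheme: 7 multiplications, 18 additions/subtractions."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # Seven intermediate products; each is a temporary that must be stored.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the products into the four output entries.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
```

Applied recursively to matrix blocks, this gives roughly O(n^2.807) instead of O(n^3), but every level of recursion adds more temporaries and irregular memory access, so the naive tiled kernel often wins at practical sizes.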
Oh, in that case it was because I didn't know about them :) Something to try next!