Hacker News
by camel-cdr 885 days ago
I was thinking about that, but matmul is too hardware-specific (cache sizes and the like), and I'm not confident I can write an implementation that gets close to peak performance.
1 comment

Getting big matmuls fast is hard, but up to about 100x100 you don't need to worry about cache sizes; it's all about the microkernel.
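
To make "microkernel" concrete, here is a rough sketch in plain C of what that inner piece might look like. The 4x4 tile size and the names `microkernel_4x4` and `matmul_small` are my own picks for illustration, not anything from the thread, and a tuned kernel would use SIMD intrinsics and a tile shape matched to the target's register file. The idea is only that a small block of C stays in accumulators for the whole k loop, with no cache blocking around it.

```c
#include <stddef.h>

/* Hypothetical tile size: a 4x4 block of C is held in local accumulators. */
enum { MR = 4, NR = 4 };

/* Microkernel: C[0..MR)[0..NR) += A[0..MR)[0..k) * B[0..k)[0..NR).
 * All matrices are row-major with leading dimensions lda, ldb, ldc. */
static void microkernel_4x4(size_t k,
                            const float *a, size_t lda,
                            const float *b, size_t ldb,
                            float *c, size_t ldc)
{
    float acc[MR][NR] = {{0}};

    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < MR; ++i) {
            float aip = a[i * lda + p];          /* one element of A, reused NR times */
            for (size_t j = 0; j < NR; ++j)
                acc[i][j] += aip * b[p * ldb + j];
        }
    }

    /* Write the finished tile back to C once, after the k loop. */
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j)
            c[i * ldc + j] += acc[i][j];
}

/* C += A * B, where A is m x k, B is k x n, C is m x n, all row-major
 * and densely packed. No cache blocking: this assumes the matrices are
 * small enough that it does not matter. */
void matmul_small(size_t m, size_t n, size_t k,
                  const float *a, const float *b, float *c)
{
    size_t mb = m - m % MR;   /* rows covered by full tiles */
    size_t nb = n - n % NR;   /* columns covered by full tiles */

    for (size_t i = 0; i < mb; i += MR)
        for (size_t j = 0; j < nb; j += NR)
            microkernel_4x4(k, a + i * k, k, b + j, n, c + i * n + j, n);

    /* Scalar cleanup for the leftover right-hand columns and bottom rows. */
    for (size_t i = 0; i < m; ++i) {
        size_t j0 = (i < mb) ? nb : 0;
        for (size_t j = j0; j < n; ++j) {
            float s = 0.0f;
            for (size_t p = 0; p < k; ++p)
                s += a[i * k + p] * b[p * n + j];
            c[i * n + j] += s;
        }
    }
}
```

Calling `matmul_small(m, n, k, a, b, c)` on a zeroed `c` gives the plain product. Whether the compiler keeps `acc` in registers and vectorizes the `j` loop depends on the compiler and flags, which is why real microkernels are usually written with intrinsics or assembly.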