Hacker News
by deepnotderp 3037 days ago
You use the loop-based GEMM kernel and pass the input size in as its loop bounds.
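A minimal sketch of the idea, assuming "loop-based GEMM" means a plain triple-loop kernel whose trip counts (M, N, K) are taken from the input shape at runtime instead of being fixed at compile time. The function name `gemm` and the flat row-major list layout are illustrative assumptions, and Python is used only for readability.

```python
def gemm(a, b, m, n, k):
    """Compute C = A @ B for flat row-major lists.

    A is m x k, B is k x n, C is m x n. The loop bounds m, n, k are
    ordinary runtime arguments, so one kernel serves every input size.
    """
    c = [0.0] * (m * n)
    for i in range(m):          # rows of A / C
        for p in range(k):      # shared inner dimension
            a_ip = a[i * k + p]
            for j in range(n):  # columns of B / C
                c[i * n + j] += a_ip * b[p * n + j]
    return c


# 2x2 example: [[1, 2], [3, 4]] @ [[5, 6], [7, 8]]
print(gemm([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2))
```

Because the same code path runs for every shape, it is correct for any size, but nothing in it is specialized for a particular size.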

L can be as small as 1 or larger than 512. Small L calls for different optimizations than large L, and a loop-based GEMM doesn't help with that.
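One way to act on this is to dispatch on L to separately written kernels. This is a sketch under stated assumptions: L is taken to be the row dimension of A, `SMALL_L_THRESHOLD` is a hypothetical cutoff (a real value would come from benchmarking), and the helpers `small_l_gemm`, `gemm_tiled` and `gemm_dispatch` are illustrative names. In pure Python the two paths differ only in structure, not speed; the point is that each size range gets its own code path.

```python
SMALL_L_THRESHOLD = 16  # hypothetical cutoff; tune by benchmarking on real hardware


def small_l_gemm(a, b, l, n, k):
    # Small-L path: treat each of the l rows of A as a vector-matrix
    # product (GEMV-style), with no tiling bookkeeping.
    c = []
    for i in range(l):
        row = a[i * k:(i + 1) * k]
        c.extend(sum(row[p] * b[p * n + j] for p in range(k)) for j in range(n))
    return c


def gemm_tiled(a, b, l, n, k, tile=64):
    # Large-L path: blocked loops so that tiles of A, B and C are reused
    # while they would still be in cache (in a compiled implementation).
    c = [0.0] * (l * n)
    for i0 in range(0, l, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, l)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i * k + p]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i * n + j] += a_ip * b[p * n + j]
    return c


def gemm_dispatch(a, b, l, n, k):
    # Pick a kernel based on the runtime size L.
    if l < SMALL_L_THRESHOLD:
        return small_l_gemm(a, b, l, n, k)
    return gemm_tiled(a, b, l, n, k)
```

A production library would typically go further, for example with fully unrolled kernels for L = 1 or a few fixed small sizes, but the dispatch structure is the same.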