Hacker News
by vlejd 203 days ago
Interestingly enough, we found that cuBLAS is not that well optimized for some less common consumer GPUs, specifically the 3090. For a lot of different matrix shapes it didn't really reach its full potential, probably because of poor tuning. Our kernel, on the other hand, has no tuning parameters at all, and it was still able to outperform cuBLAS even in settings where it had no right to do so.

Regarding patterns, we mainly tested random matrices and matrices produced by Wanda pruning. 2:4 sparsity (a commonly used structure) will give the same results as a random matrix (probably even better). Block sparsity, on the other hand, could come very close to the worst case for our format, because it produces disproportionately long runs of zeros.
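The comment doesn't describe the storage format itself, so the sketch below makes no attempt to reproduce it. It only illustrates the point about zero runs: assuming a format whose cost grows with the length of consecutive zero runs, it builds random, 2:4, and block-sparse masks and reports the longest run of zeros in any row. All function names here are hypothetical, and pure Python is used instead of a GPU library to keep it self-contained.

```python
import random

def random_mask(rows, cols, sparsity, rng):
    """Unstructured mask: each entry is zero independently with probability `sparsity`."""
    return [[0 if rng.random() < sparsity else 1 for _ in range(cols)]
            for _ in range(rows)]

def two_four_mask(rows, cols, rng):
    """2:4 structured mask: every group of 4 consecutive entries keeps exactly 2."""
    assert cols % 4 == 0
    mask = []
    for _ in range(rows):
        row = []
        for _ in range(cols // 4):
            keep = set(rng.sample(range(4), 2))  # pick the 2 surviving positions
            row.extend(1 if k in keep else 0 for k in range(4))
        mask.append(row)
    return mask

def block_mask(rows, cols, block, sparsity, rng):
    """Block-sparse mask: whole block x block tiles are zeroed together."""
    assert rows % block == 0 and cols % block == 0
    tiles = [[0 if rng.random() < sparsity else 1 for _ in range(cols // block)]
             for _ in range(rows // block)]
    return [[tiles[r // block][c // block] for c in range(cols)]
            for r in range(rows)]

def longest_zero_run(mask):
    """Length of the longest run of consecutive zeros along any row."""
    best = 0
    for row in mask:
        run = 0
        for v in row:
            run = run + 1 if v == 0 else 0
            best = max(best, run)
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    for name, m in [("random", random_mask(n, n, 0.5, rng)),
                    ("2:4", two_four_mask(n, n, rng)),
                    ("block 16x16", block_mask(n, n, 16, 0.5, rng))]:
        print(f"{name:12s} longest zero run: {longest_zero_run(m)}")
```

With 2:4 sparsity a row can never hold more than four consecutive zeros (two at the end of one group plus two at the start of the next), while a single zeroed 16x16 tile already yields a run of 16, and adjacent zeroed tiles stretch it further. That is the "disproportionately long runs of zeros" effect, at the same overall 50% sparsity.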

Regarding other use cases, we are looking into it, but the most common ones we found usually involve much lower density (<1% nonzeros). If you know of a use case in the 30-90% sparsity range, let us know.
