by walrus 2227 days ago
I'm just speculating (and haven't read the paper yet), but it may be possible to achieve similar speedups on GPUs by pruning the 20% of blocks (of size ≥K×K) with the smallest magnitude to produce block-sparse weights[0], rather than pruning the smallest 20% of individual weights.
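
To make the idea concrete, here is a rough NumPy sketch of block-level magnitude pruning. Everything in it is an assumption for illustration: the name `prune_blocks`, the fixed square block size, and scoring each block by its L1 norm are hypothetical choices, not taken from the paper or the linked post. It only zeroes the blocks and returns the mask; any real speedup would still need block-sparse GPU kernels that skip the zeroed blocks.

```python
import numpy as np


def prune_blocks(weights, block=4, fraction=0.2):
    """Zero out the `fraction` of block x block tiles with the smallest L1 norm.

    Hypothetical helper for illustration; returns (pruned_weights, block_mask).
    """
    rows, cols = weights.shape
    if rows % block or cols % block:
        raise ValueError("weight dimensions must be multiples of the block size")

    # View the matrix as a grid of tiles: (row_tiles, block, col_tiles, block).
    tiles = weights.reshape(rows // block, block, cols // block, block)

    # One score per tile: the sum of absolute values inside it (assumed criterion).
    scores = np.abs(tiles).sum(axis=(1, 3))

    # Pick the lowest-scoring tiles to drop.
    n_prune = int(fraction * scores.size)
    keep = np.ones(scores.size, dtype=bool)
    if n_prune > 0:
        lowest = np.argsort(scores, axis=None, kind="stable")[:n_prune]
        keep[lowest] = False
    block_mask = keep.reshape(scores.shape)

    # Broadcast the per-tile mask over every element of each tile.
    pruned = tiles * block_mask[:, None, :, None]
    return pruned.reshape(rows, cols), block_mask
```

Compared with per-weight pruning, the mask here has one entry per block, so the sparsity pattern is coarse enough for a kernel to skip whole tiles.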

[0] https://openai.com/blog/block-sparse-gpu-kernels/