by walrus
2227 days ago
I'm just speculating (and haven't read the paper yet), but it may be possible to achieve similar speedups on GPUs by pruning the smallest 20% of weight blocks (each at least K×K) to produce block-sparse weights[0], rather than pruning the smallest 20% of individual weights.

[0] https://openai.com/blog/block-sparse-gpu-kernels/
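A minimal sketch of the block-pruning step described above, in plain Python. It assumes blocks are ranked by their Frobenius norm (the comment doesn't say how "smallest" is measured) and that both matrix dimensions are multiples of K. `block_prune` is a hypothetical helper, not part of the linked block-sparse kernels library. It only produces the pruned weights and a block mask; any GPU speedup would come from feeding that mask to a block-sparse kernel.

```python
import math


def block_prune(weights, k, fraction=0.2):
    """Zero out the `fraction` of k x k blocks with the smallest Frobenius norm.

    weights: dense 2-D matrix as a list of equal-length rows.
    k: block size; both dimensions must be multiples of k (assumption).
    Returns (pruned copy of weights, mask) where mask[bi][bj] is True
    if block (bi, bj) is kept.
    """
    rows, cols = len(weights), len(weights[0])
    if rows % k or cols % k:
        raise ValueError("matrix dimensions must be multiples of k")
    br, bc = rows // k, cols // k

    # Score each block by its Frobenius norm (one possible notion of "smallest").
    scores = []
    for bi in range(br):
        for bj in range(bc):
            sq = sum(
                weights[bi * k + i][bj * k + j] ** 2
                for i in range(k)
                for j in range(k)
            )
            scores.append((math.sqrt(sq), bi, bj))

    # Drop the lowest-scoring blocks (count rounded to the nearest integer;
    # ties are broken by block position).
    n_prune = round(fraction * len(scores))
    scores.sort()
    mask = [[True] * bc for _ in range(br)]
    for _, bi, bj in scores[:n_prune]:
        mask[bi][bj] = False

    # Copy the matrix, zeroing every entry that falls inside a pruned block.
    pruned = [
        [w if mask[r // k][c // k] else 0.0 for c, w in enumerate(row)]
        for r, row in enumerate(weights)
    ]
    return pruned, mask
```

For example, on a 4×4 matrix with K=2 there are four blocks, so `fraction=0.25` zeroes exactly the one block with the smallest norm. Pruning whole blocks gives up some accuracy compared with per-weight pruning, but the surviving nonzeros stay in dense K×K tiles, which is the layout block-sparse GPU kernels are built to exploit.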