HN Mirror


by madlag 1244 days ago

May I add another method: block fine-pruning of transformers (pruning while fine-tuning)?

https://arxiv.org/abs/2109.04838

Using blocks keeps good performance on GPUs while giving some flexibility in the pruning pattern. And once entirely empty rows and columns are removed, the pruned matrices are actually pretty dense, so the approach is competitive with structured pruning for speedup, but less "aggressive" on the network during the pruning process. Disclaimer: I am the main co-author.
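
To make the "empty rows and columns" point concrete, here is a rough NumPy sketch, not the code from the paper. It assumes a plain magnitude criterion (sum of absolute values per block) as a stand-in for scores that would be learned during fine-tuning, and the names `block_prune` and `compact` are hypothetical helpers invented for this illustration.

```python
import numpy as np


def block_prune(weight, block=2, keep_ratio=0.5):
    """Zero out the lowest-scoring (block x block) tiles of a 2D matrix.

    Assumption: tiles are scored by their L1 norm, a simple stand-in
    for scores learned while fine-tuning.
    """
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "shape must divide by block"
    # View the matrix as a grid of tiles: tiles[i, a, j, b] == weight[i*block + a, j*block + b]
    tiles = weight.reshape(rows // block, block, cols // block, block)
    scores = np.abs(tiles).sum(axis=(1, 3))  # one score per tile
    n_keep = max(1, int(round(scores.size * keep_ratio)))
    threshold = np.sort(scores, axis=None)[-n_keep]
    tile_mask = scores >= threshold  # ties may keep slightly more than n_keep tiles
    # Expand the per-tile mask back to a per-element mask.
    full_mask = np.repeat(np.repeat(tile_mask, block, axis=0), block, axis=1)
    return weight * full_mask


def compact(weight):
    """Drop rows and columns that are entirely zero; return the dense remainder."""
    keep_rows = np.any(weight != 0, axis=1)
    keep_cols = np.any(weight != 0, axis=0)
    return weight[keep_rows][:, keep_cols], keep_rows, keep_cols


if __name__ == "__main__":
    W = np.array([
        [5.0, 5.0, 0.1, 0.1],
        [5.0, 5.0, 0.1, 0.1],
        [0.1, 0.1, 0.2, 0.2],
        [0.1, 0.1, 0.2, 0.2],
    ])
    pruned = block_prune(W, block=2, keep_ratio=0.25)
    dense, kept_rows, kept_cols = compact(pruned)
    density = np.count_nonzero(dense) / dense.size
    print(dense.shape, density)
```

In this toy case the pruned 4x4 matrix is 75% zeros, but after dropping the empty rows and columns what remains is a small, fully dense matrix that an ordinary dense matmul can handle. The kept row and column masks would also be needed to slice the neighbouring layers so that shapes still line up.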

1 comment

binarymax 1244 days ago

This looks super interesting! Thanks for the weekend reading :)
