by madlag
1244 days ago
May I add another method: block fine-pruning of transformers (pruning while fine-tuning)? https://arxiv.org/abs/2109.04838 Using blocks keeps performance good on GPUs while still allowing some flexibility in the pruning pattern. And once entirely empty rows and columns are removed, the pruned matrices are actually pretty dense, so the approach is competitive with structured pruning for speedup while being less "aggressive" on the network during the pruning process.
Disclaimer: I am the main co-author.
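
To make the two steps concrete (zeroing whole blocks, then dropping rows and columns that end up entirely empty), here is a rough NumPy sketch. It is not the paper's code: it scores blocks by magnitude as a stand-in for scores learned during fine-tuning, and the names `block_prune` and `compact`, the 2x2 block size, and the `keep_ratio` parameter are all made up for illustration.

```python
import numpy as np

def block_prune(weight, block=(2, 2), keep_ratio=0.5):
    """Zero out the lowest-scoring blocks of a 2-D weight matrix.

    Assumption: blocks are scored by their L1 norm (magnitude pruning),
    used here only as a simple stand-in for learned importance scores.
    """
    rows, cols = weight.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "shape must be divisible by block"
    # View the matrix as a grid of blocks: (rows/br, br, cols/bc, bc).
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    scores = np.abs(blocks).sum(axis=(1, 3))  # one score per block
    n_keep = max(1, int(round(keep_ratio * scores.size)))
    threshold = np.sort(scores, axis=None)[-n_keep]
    # Ties at the threshold are all kept, so slightly more than n_keep
    # blocks may survive.
    keep_blocks = scores >= threshold
    # Expand the per-block mask back to one entry per weight.
    mask = np.repeat(np.repeat(keep_blocks, br, axis=0), bc, axis=1)
    return weight * mask, mask

def compact(weight):
    """Drop rows and columns that are entirely zero."""
    keep_rows = np.any(weight != 0, axis=1)
    keep_cols = np.any(weight != 0, axis=0)
    return weight[keep_rows][:, keep_cols], keep_rows, keep_cols

W = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.1],
])
pruned, mask = block_prune(W, block=(2, 2), keep_ratio=0.25)
dense, kept_rows, kept_cols = compact(pruned)
```

In this toy case only the top-left block survives, so the compacted matrix is a fully dense 2x2 matrix. In a real layer, `kept_rows` and `kept_cols` would also have to be applied to the neighbouring layers so the shapes still line up.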