Hacker News
by nwoli 685 days ago
One elegant approach I've found for this is https://github.com/mit-han-lab/gan-compression. They basically train an "all-in-one" network from which you can extract small or large models afterwards (with optional additional fine-tuning to improve the selected channel-size combinations).
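To make the "extract small or large models" idea concrete, here is a minimal pure-Python sketch of weight sharing by channel slicing. It is not the repository's code: the helper names (`make_layer`, `extract_sublayer`, `run`), the toy 4 -> 8 -> 3 layer sizes, and the convention of keeping the first k channels of each shared weight matrix are all assumptions for illustration, and no training is shown.

```python
import random

def make_layer(n_in, n_out, seed=0):
    """Weights of the largest ("all-in-one") layer: n_out rows of n_in values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

def extract_sublayer(weights, n_in, n_out):
    """Assumed convention: keep the first n_out output and first n_in input channels."""
    return [row[:n_in] for row in weights[:n_out]]

def forward(weights, x):
    """Linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

# Hypothetical shared network: 4 inputs -> up to 8 hidden channels -> 3 outputs.
W1 = make_layer(4, 8, seed=1)
W2 = make_layer(8, 3, seed=2)

def run(x, hidden):
    """Run the sub-network that uses only `hidden` of the 8 hidden channels."""
    w1 = extract_sublayer(W1, 4, hidden)
    w2 = extract_sublayer(W2, hidden, 3)
    return forward(w2, forward(w1, x))

def param_count(hidden):
    """Number of weights in the extracted sub-network."""
    return 4 * hidden + hidden * 3

x = [1.0, 2.0, 3.0, 4.0]
small = run(x, hidden=2)   # 14 weights
large = run(x, hidden=8)   # 56 weights, i.e. the full shared network
```

Both `small` and `large` come from the same stored weights, so the choice of size can be made after training. In a real setup the shared weights would be trained while sampling different channel counts, and the chosen sub-network could then be fine-tuned.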
1 comment

Ahh, that's an interesting paper, I must have missed that one. Thanks for the link. Another paper that recently got a lot of hype is the Matryoshka Representation Learning paper (https://arxiv.org/abs/2205.13147): essentially training a single embedding so that its nested prefixes of different sizes all work as representations at the same time, basically distillation during training rather than post-training.
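A rough sketch of what "all sizes at the same time" can look like as a training objective: one embedding is computed, and a classification loss is summed over several nested prefixes of it. The function names, the equal weighting of the sizes, the separate classifier head per size, and the toy dimensions are assumptions of this sketch rather than details confirmed by the paper.

```python
import math

def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def linear(weights, x):
    """Apply a weight matrix (one row per class) to a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def nested_loss(embedding, heads, label, sizes):
    """Sum the classification loss over nested prefixes of one embedding."""
    total = 0.0
    for size in sizes:
        prefix = embedding[:size]             # first `size` dimensions only
        logits = linear(heads[size], prefix)  # head assumed specific to this size
        total += softmax_cross_entropy(logits, label)
    return total

# Toy setup: 8-dim embedding, 3 classes, prefixes of size 2, 4 and 8.
sizes = (2, 4, 8)
num_classes = 3
embedding = [0.5, -1.0, 0.25, 2.0, 0.0, 1.5, -0.75, 0.1]
# Untrained heads with all-zero weights, so every prefix predicts uniformly.
heads = {s: [[0.0] * s for _ in range(num_classes)] for s in sizes}

loss = nested_loss(embedding, heads, label=1, sizes=sizes)
```

With the zero-initialized heads above, each prefix contributes log(3), so `loss` is 3 * log(3). Minimizing a sum like this pushes the leading dimensions to be useful on their own, which is what lets you truncate the embedding later.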