	by nwoli 685 days ago
	One elegant approach to this that I’ve found is https://github.com/mit-han-lab/gan-compression. They basically train an “all in one” network from which you can extract small or large models afterwards (with optional additional fine-tuning to improve the selected channel-size combinations).
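A rough illustration of the “all in one” idea: every sub-network reuses a slice of one shared set of weights, so a smaller model is obtained by keeping only the first few channels of each layer. This is a generic weight-sharing sketch in plain Python, assuming a toy two-layer network with a narrowable hidden layer; the helper names (`linear`, `forward`, `hidden_width`) are hypothetical and not taken from the gan-compression repository, and how widths are sampled during training is not shown.

```python
from typing import List

Matrix = List[List[float]]


def linear(x: List[float], weight: Matrix, out_width: int) -> List[float]:
    """Shared linear layer that uses only its first `out_width` output channels.

    The input may itself come from a narrowed layer, so only the first
    len(x) input columns of `weight` are read (hypothetical helper).
    """
    in_width = len(x)
    return [sum(weight[o][i] * x[i] for i in range(in_width)) for o in range(out_width)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def forward(x: List[float], w1: Matrix, w2: Matrix, hidden_width: int) -> List[float]:
    """Toy two-layer network whose hidden layer can be narrowed after training.

    w1 has shape (max_hidden x inputs), w2 has shape (outputs x max_hidden);
    every hidden_width <= max_hidden selects a sub-network sharing these weights.
    """
    h = relu(linear(x, w1, hidden_width))
    return linear(h, w2, len(w2))  # output width stays fixed
```

For example, with a 4-channel hidden layer, `forward(x, w1, w2, 4)` runs the full model and `forward(x, w1, w2, 2)` runs the extracted 2-channel model, with no separate weights stored for the smaller one.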

idontknowmuch 684 days ago

Ahh, that's an interesting paper; I must have missed that one. Thanks for the link. Another paper that recently got a lot of hype is the Matryoshka Representation Learning paper: essentially training a single model whose output embedding works at several nested sizes at the same time, basically distillation during training rather than post-training (https://arxiv.org/abs/2205.13147).
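To make “several nested sizes at the same time” concrete: the training loss is a weighted sum of losses computed on prefixes `z[:m]` of the same embedding, so each prefix is pushed to be useful on its own. The sketch below is a minimal plain-Python version, assuming a single classification example and one classifier shared across granularities (each granularity reads only the first `m` columns, similar in spirit to the paper's weight-tied variant). The function names, the `dims`, and the `weights` values are illustrative assumptions, not the paper's code.

```python
import math
from typing import List, Optional, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_z - logits[label]


def matryoshka_loss(
    z: List[float],
    classifier: List[List[float]],
    label: int,
    dims: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of losses on nested prefixes z[:m] of one embedding.

    Assumption: `classifier` is shared (classes x len(z)) and granularity m
    uses only its first m columns, so one set of weights serves every size.
    """
    if weights is None:
        weights = [1.0] * len(dims)
    total = 0.0
    for m, c in zip(dims, weights):
        prefix = z[:m]
        logits = [sum(row[i] * prefix[i] for i in range(m)) for row in classifier]
        total += c * cross_entropy(logits, label)
    return total
```

At inference time one would simply truncate the embedding to whichever prefix size fits the compute or storage budget; no separate model is needed per size.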