| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by idontknowmuch 684 days ago
	Ahh that's an interesting paper I must of missed that one - thanks for the link. I think another paper that recently got a lot of hype has been the Matroyshka representation learning paper -- essentially training models with different parameters and output embedding sizes at the same time, basically distillation during training rather than post-training (https://arxiv.org/abs/2205.13147).