by idontknowmuch
684 days ago
Ahh, that's an interesting paper, I must have missed that one. Thanks for the link. Another paper that got a lot of hype recently is the Matryoshka Representation Learning paper (https://arxiv.org/abs/2205.13147): essentially training for several output embedding sizes at the same time, so the smaller embeddings are nested inside the larger one. Basically distillation during training rather than post-training.
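To make the "several sizes at the same time" part concrete, here's a rough sketch of the idea as I understand it: take one embedding, slice off nested prefixes, give each prefix its own classifier head, and sum the losses. Everything here is a toy assumption on my part (the names `matryoshka_loss`, `heads` and `nested_dims`, the linear heads, the random data, the equal weighting of each size); it is not the paper's code.

```python
import math
import random


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def matryoshka_loss(embedding, label, heads):
    """Sum the classification loss over nested prefixes of one embedding.

    `heads` (hypothetical) maps a prefix size m to a linear classifier,
    stored as a list of per-class weight vectors of length m.
    """
    total = 0.0
    for m, class_weights in heads.items():
        prefix = embedding[:m]  # only the first m coordinates
        logits = [sum(w * x for w, x in zip(ws, prefix)) for ws in class_weights]
        total += cross_entropy(logits, label)  # equal weight per size (assumption)
    return total


# Toy data: one random 16-dim embedding, 3 classes, four nested sizes.
random.seed(0)
full_dim, num_classes = 16, 3
nested_dims = [2, 4, 8, 16]
embedding = [random.gauss(0, 1) for _ in range(full_dim)]
heads = {
    m: [[random.gauss(0, 0.1) for _ in range(m)] for _ in range(num_classes)]
    for m in nested_dims
}
loss = matryoshka_loss(embedding, label=1, heads=heads)
```

In a real setup the embedding would come from the model being trained, and minimizing this summed loss pushes every prefix to be useful on its own, which is why you can truncate the embedding afterwards.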