Hacker News
by
cooljoseph
839 days ago
You can also have an existing model "mentor" the new model you are training, which speeds up training. You don't have to start from scratch with zero knowledge. This is done a lot, and it's called distillation.
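To make the "mentor" idea concrete, here is a minimal sketch of the usual distillation loss, assuming a classification setup where both models output logits. The function names and the temperature value are illustrative assumptions, not taken from any particular library: the student is trained to match the teacher's temperature-softened output distribution.

```python
import math

def softmax(logits, temperature=1.0):
    # Dividing by the temperature softens the distribution (higher T = flatter).
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Hypothetical helper: KL(teacher || student) on softened distributions.
    # The T^2 factor is the common convention for keeping gradient scale
    # comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

In practice this term is usually mixed with the ordinary loss on the true labels, so the student learns from both the data and the teacher.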
2 comments
eru
839 days ago
You can also re-use a lot of the infrastructure. E.g., you can re-use your training data.
fnordpiglet
839 days ago
This came out a little while ago. My open question is whether this approach can be used to port weights between architectures like this:
https://arxiv.org/abs/2402.13144