Hacker News
by
eoerl
1299 days ago
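It is not typically possible to blend models like that, since the training process is insensitive to (lateral) ordering as far as the model goes: the units within a layer can end up in any order, so two independently trained models generally won't line up weight for weight.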
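To make the ordering point concrete, here is a minimal sketch with a toy one-hidden-layer ReLU network. The function names (`mlp`, `permute_hidden`, `average`) and the hand-picked weights are assumptions chosen for illustration, not anything from a real model. Reordering the hidden units leaves the function unchanged, yet naively averaging the original weights with the reordered copy gives a different function.

```python
def mlp(x, W1, W2):
    """Tiny one-hidden-layer network: y = W2 . relu(W1 . x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))

def permute_hidden(W1, W2, perm):
    """Reorder hidden units; rows of W1 and entries of W2 move together."""
    return [W1[i] for i in perm], [W2[i] for i in perm]

def average(A, B):
    """Element-wise mean of two weight lists (flat or nested one level)."""
    if isinstance(A[0], list):
        return [average(a, b) for a, b in zip(A, B)]
    return [(a + b) / 2 for a, b in zip(A, B)]

# Hand-picked toy weights (illustrative only): the network computes
# relu(x) - relu(-x), which equals x.
W1 = [[1.0], [-1.0]]
W2 = [1.0, -1.0]
x = [2.0]

# Swap the two hidden units: same function, different weight layout.
P1, P2 = permute_hidden(W1, W2, [1, 0])

y_orig = mlp(x, W1, W2)                           # 2.0
y_perm = mlp(x, P1, P2)                           # 2.0, identical function
y_avg = mlp(x, average(W1, P1), average(W2, P2))  # 0.0, the weights cancel

print(y_orig, y_perm, y_avg)
```

In this deliberately extreme case the averaged weights cancel to zero. Two independently trained networks would not be exact permutations of each other, but the same misalignment is the concern.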
2 comments
liuliu
1299 days ago
I thought so too, until I found that there is quite a bit of literature nowadays about "merging" weights, for example this one:
https://arxiv.org/pdf/1811.10515.pdf
and also the OpenCLIP paper.
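As a rough illustration of what "merging" weights can mean in its simplest form, the sketch below linearly interpolates two checkpoints parameter by parameter. This is a generic technique, not necessarily the method of the linked paper; the `interpolate` helper, the parameter names, and the flattened-list checkpoint format are all hypothetical.

```python
def interpolate(state_a, state_b, alpha=0.5):
    """Return (1 - alpha) * state_a + alpha * state_b, parameter by parameter."""
    if state_a.keys() != state_b.keys():
        raise ValueError("checkpoints must have the same parameter names")
    merged = {}
    for name, a in state_a.items():
        b = state_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return merged

# Hypothetical checkpoints: each parameter is stored as a flat list of floats.
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
ckpt_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}

merged = interpolate(ckpt_a, ckpt_b, alpha=0.5)
print(merged)
```

Setting `alpha=0` returns the first checkpoint unchanged and `alpha=1` returns the second. Whether any point in between yields a useful model depends on how the two checkpoints were trained.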
ShamelessC
1299 days ago
Is that still the case when all models have a common ancestor (i.e. finetuned) and haven’t yet overfit on new data?