Hacker News
by
DonsDiscountGas
1 day ago
I didn't know model merging like that was possible. (Obviously it's possible from a pure software standpoint, but I'm surprised it's effective.)
3 comments
bwhitty
1 day ago
As another poster above linked, it’s been shown to be effective since 2022:
https://arxiv.org/abs/2203.05482
nightpool
1 day ago
It works because Nex N2 is also a derivative of the original base Qwen model. If they were two completely unrelated models, it wouldn't work.
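For anyone wondering what "merging" can mean mechanically, here is a minimal sketch of plain linear weight interpolation between two checkpoints. It is an illustration only, not necessarily how the models discussed here were merged: `StateDict`, `merge_weights`, `alpha`, and the toy parameter names are all hypothetical, and real checkpoints would hold tensors rather than Python lists. It does show why a shared base matters: the merge is only defined when parameter names and shapes line up.

```python
from typing import Dict, List

# Hypothetical stand-in for a real checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def merge_weights(a: StateDict, b: StateDict, alpha: float = 0.5) -> StateDict:
    """Linearly interpolate two checkpoints: alpha * a + (1 - alpha) * b."""
    if a.keys() != b.keys():
        raise ValueError("checkpoints must have identical parameter names")
    merged: StateDict = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name!r}")
        # Element-wise weighted average of the two parameter vectors.
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a[name], b[name])]
    return merged


# Two toy "fine-tunes" of the same base, so names and shapes match.
finetune_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
finetune_b = {"layer.weight": [3.0, 4.0], "layer.bias": [1.0]}
merged = merge_weights(finetune_a, finetune_b)
```

Matching names and shapes are only the minimum requirement. The averaged weights are likely to be useful only when both models started from the same base, which is the point made above.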
baobabKoodaa
15 hours ago
A few years back these used to be called "Frankenstein models"
hypercube33
1 day ago
Even merging a model with itself can work, as shown in the post about how they got to the top of Hugging Face with two GPUs.