Hacker News
by kristianp 939 days ago
How does Goliath-120b improve on llama2-70b by just combining two of them?

https://huggingface.co/alpindale/goliath-120b?text=Hi.

> An auto-regressive causal LM created by combining 2x finetuned Llama-2 70B into one.

1 comment

I... don't know. Even the creator of the model doesn't know why it worked out so well.

In my own use it really is better (at reasoning) than the 70b models, though some people have reported that it makes spelling mistakes.

P.S. This doesn't always work out well: people have tried swapping different layers at random, and it makes the models incoherent.
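
For anyone wondering what "combining" two models can mean mechanically, here is a toy sketch of one way to do it: stacking alternating, overlapping slices of layers from two parents of the same depth, so the result is deeper than either one. This is only an illustration, not Goliath's actual recipe. The function name `interleave_layers`, the slice length, and the overlap are all made up for the example, and the "layers" are just string labels rather than real weights.

```python
def interleave_layers(model_a, model_b, slice_len, overlap):
    """Stack alternating, overlapping slices of two equal-depth layer lists.

    Hypothetical helper for illustration only; real merges operate on
    weight tensors, and the slice sizes here are arbitrary assumptions.
    """
    if len(model_a) != len(model_b):
        raise ValueError("both parents must have the same number of layers")
    if not 0 <= overlap < slice_len:
        raise ValueError("overlap must be in [0, slice_len)")

    depth = len(model_a)
    step = slice_len - overlap  # how far each new slice advances
    merged = []
    start = 0
    use_a = True

    while start < depth:
        end = min(start + slice_len, depth)
        source = model_a if use_a else model_b
        merged.extend(source[start:end])  # copy this slice of layers as-is
        if end == depth:
            break
        start += step          # next slice begins inside the previous one
        use_a = not use_a      # alternate between the two parents

    return merged


# Two toy 8-layer "models"; labels stand in for transformer blocks.
parent_a = [f"A{i}" for i in range(8)]
parent_b = [f"B{i}" for i in range(8)]

merged = interleave_layers(parent_a, parent_b, slice_len=4, overlap=2)
print(len(merged), merged)
```

With these toy settings the merged stack has 12 layers built from two 8-layer parents, and each slice keeps its layers in their original order and rough position. That is a different operation from shuffling layers at random, which is the case reported to produce incoherent models.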