Hacker News
by idonotknowwhy 933 days ago
I... don't know. Even the model's creator doesn't know why it worked out so well.

In my own use it really is better at reasoning than the 70B models, though some people have reported that it makes spelling mistakes.

P.S. This doesn't always work out well: people have tried swapping different layers randomly, and it makes the models incoherent.