by filterfiber 911 days ago
I know the Hugging Face leaderboard isn't wildly accurate.

But the top models right now are almost all under 70B. Most are 7B, and the top is 10B. If the benchmarks are even remotely accurate then this is rather wild.

Apparently multiple groups found different "secret sauces", namely Upstage and whatever UNA is?

I mean, it isn’t too surprising that smaller models do better. I imagine transformers are as prone to overfitting as any statistical model. There is also probably some selection bias: bigger models are more expensive, and there are just fewer people training and iterating on them.

There are orders of magnitude fewer people playing with large (>40B) parameter models than the small ones, which means even fewer people finetuning those models.

I can’t imagine this is anything but selection bias.
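
To make the selection-bias argument concrete, here is a rough sketch. Everything in it is a made-up assumption for illustration: the score distributions, the means (62 vs. 65), the spread, and the attempt counts (1000 vs. 10) are not real leaderboard data. It only shows that if many more people finetune small models, the best small finetune can outscore the best large one even when large models are better on average.

```python
import random
import statistics

def expected_best(n_attempts, mean, sd, trials=500, seed=0):
    """Average, over many simulated leaderboards, of the top score
    among n_attempts independent finetunes drawn from N(mean, sd)."""
    rng = random.Random(seed)
    return statistics.mean(
        max(rng.gauss(mean, sd) for _ in range(n_attempts))
        for _ in range(trials)
    )

# Hypothetical numbers: small models are worse on average but get 100x the attempts.
small = expected_best(n_attempts=1000, mean=62.0, sd=3.0)
large = expected_best(n_attempts=10, mean=65.0, sd=3.0)

print(f"expected best small finetune: {small:.1f}")
print(f"expected best large finetune: {large:.1f}")
```

Under these assumed numbers the top of the simulated leaderboard is a small model. Whether that matches what actually happened depends on how noisy the benchmarks really are and how many finetunes exist at each size.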

> which means even fewer people finetuning those models.

Finetunes rarely led to "Top 5 performance" for the small ones. Previously the top 10+ were all 70B, with maybe a few 30B in there. There were almost no 13Bs, let alone 7Bs.

Zephyr-7b-β was one of the best 7B Mistral 0.1 finetunes of the past month and a half, and it didn't beat most 70B models.

Even at 7B there are few foundation models, since even those take a relatively large amount of money to train. The only decent one for months has been Mistral 7B, which again didn't come that close to 70B performance.