There are orders of magnitude fewer people playing with large (>40B-parameter) models than with small ones, which means even fewer people are finetuning those models.
I can’t imagine this is anything but selection bias.
> which means even fewer people finetuning those models.
Finetunes of the small models rarely reached "Top 5 performance."
Previously the top 10+ were all 70B, with maybe a few 30B models in there. There were hardly any 13Bs, let alone 7Bs.
Zephyr-7b-β was one of the best Mistral 7B v0.1 finetunes of the past month and a half, and even it didn't beat most 70Bs.
Even at 7B there are few foundation models, since even those take a relatively large amount of money to train. For months the only decent one has been Mistral 7B, which again didn't come that close to 70B performance.