by christkv
7 hours ago
For now, however, I suspect that the gigantic models are not needed, and that you will be able to do pretty much what you need in a specific domain with 120B or lower. There is so much trash in the frontier models. I don't need all the world's slam poetry for my coding tasks, for example.
Model capability is a function of model size. Scaling a model up raises its performance in every domain.
An "idiot savant" model that's overtrained for a specific domain would beat a generalist model of the same size. But scale the generalist up enough, and it'll trounce the specialist. Removing poetry data from a model's training mix doesn't gain you much, and might even cost you some performance, and the "idiot savant" approach of overtraining for a domain has a hard ceiling.
So far, it seems like there's some equivalent of "g factor" in LLMs - a broad "intelligence" value that performance across many diverse domains correlates with. And, as a rule, larger models have more of it.