by Vachyas
68 days ago
What you described sounds plausible (expected, even). But:

>Raw parameter counts stopped increasing almost 5 years ago

Really? 5 years ago? Until just about 3 years ago, OpenAI's latest offering was only ChatGPT 3.5. Most of the models people talk about now didn't even exist 3 years ago, let alone 5.

Even now, I don't know if parameter count stopped mattering or just matters less. For example, I have no idea if the new Mythos is MoE, but I'm pretty sure it has more parameters.

>Even now, I don't know if parameter count stopped mattering or just matters less
Models in the 20b-100b range are already very capable when it comes to basic knowledge, reasoning, etc. Architectural improvements and better training recipes have considerably decreased the required parameter count (current 8b models can easily beat the 175b-strong GPT-3 from 3 years ago in many domains). What increasing the parameter count currently gives you is better memorization, i.e. better world knowledge without having to consult an external knowledge base, say via RAG. For example, Qwen3.5 can one-shot compilable code, reason, etc., but can't remember the exact API calls for many libraries, while Sonnet 4.6 can.

I think what we need is to split models into 2 parts: a "reasoner" and a "knowledge base". The reasoner could be fairly static, with infrequent updates, while the knowledge base is the part that needs continuous updates (and trillions of parameters). Maybe we could have a system where a reasoner chooses different knowledge bases on demand.
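To make the "reasoner picks a knowledge base on demand" idea a bit more concrete, here is a toy Python sketch. Everything in it is hypothetical: `KnowledgeBase`, `Reasoner`, and the keyword-overlap routing are illustrative stand-ins, not any existing model or library. In a real system the knowledge bases would presumably be large parameter blocks or retrieval indexes, and the routing would be learned rather than keyword-based.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical swappable knowledge store: a name, topic keywords, and facts."""
    name: str
    topics: set[str]
    facts: dict[str, str] = field(default_factory=dict)

    def lookup(self, key: str) -> str | None:
        # Exact-key lookup stands in for whatever retrieval a real system would use.
        return self.facts.get(key)


class Reasoner:
    """Hypothetical fixed 'reasoner' that selects one knowledge base per query."""

    def __init__(self, knowledge_bases: list[KnowledgeBase]):
        self.knowledge_bases = knowledge_bases

    def choose(self, query: str) -> KnowledgeBase | None:
        # Normalize query words, then score each knowledge base by topic overlap.
        words = {w.strip("?,.!").lower() for w in query.split()}
        best, best_score = None, 0
        for kb in self.knowledge_bases:
            score = len(words & kb.topics)
            if score > best_score:
                best, best_score = kb, score
        return best  # None if no knowledge base shares any topic word

    def answer(self, query: str, key: str) -> str:
        kb = self.choose(query)
        if kb is None:
            return "no relevant knowledge base"
        fact = kb.lookup(key)
        return f"[{kb.name}] {fact}" if fact is not None else f"[{kb.name}] unknown"


if __name__ == "__main__":
    python_kb = KnowledgeBase(
        "python-libs",
        {"python", "json", "pathlib"},
        {"json.dumps": "serializes a Python object to a JSON string"},
    )
    rust_kb = KnowledgeBase(
        "rust-crates",
        {"rust", "serde", "cargo"},
        {"serde_json::to_string": "serializes a value to a JSON string"},
    )
    reasoner = Reasoner([python_kb, rust_kb])
    print(reasoner.answer("how do I serialize json in python?", "json.dumps"))
    # prints "[python-libs] serializes a Python object to a JSON string"
```

The point of the split is that `Reasoner` never changes when a knowledge base is updated or swapped; only the list passed to it does.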