by redox99
10 days ago
Qwen 35B isn't even remotely close to the big models. It's just people overhyping small models. Ignore the benchmarks; they're almost meaningless. If you want something comparable, you need the trillion-parameter open models like DeepSeek.
|
At some point there are diminishing returns, and your coding LLM performs worse because you've encoded useless stuff like Pokemon combinations or languages you don't speak into its parameter space.
The "smartness" of the model comes from RLHF post-training, which is orthogonal to model size.
Also, if you're using an agentic harness, a much better approach is to let the model control its own context. If you ever reach a point where your coding LLM needs to know about Pokemon, just give it a web search tool and let it look the Pokemon up.
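To make the "give it a search tool" idea concrete, here's a rough sketch of what such a harness loop might look like. Everything in it is made up for illustration: `stub_model` stands in for a real LLM call, `web_search` is a canned lookup rather than a real search API, and the action format is an assumption, not any particular framework's protocol.

```python
from typing import Callable


def web_search(query: str) -> str:
    """Hypothetical stub: a real harness would call a search API here."""
    fake_index = {"pikachu type": "Pikachu is an Electric-type Pokemon."}
    return fake_index.get(query.lower(), "no results")


# Registry of tools the model is allowed to call (names are assumptions).
TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search}


def stub_model(context: list[str]) -> dict:
    """Stand-in for an LLM: requests one search, then answers from the result."""
    tool_results = [m for m in context if m.startswith("tool:")]
    if not tool_results:
        # Nothing in context yet, so the "model" decides to fetch it.
        return {"tool": "web_search", "arg": "pikachu type"}
    return {"answer": tool_results[-1].removeprefix("tool:").strip()}


def run_agent(question: str, max_steps: int = 5) -> str:
    """Loop until the model answers, feeding tool output back into context."""
    context = [f"user: {question}"]
    for _ in range(max_steps):
        action = stub_model(context)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["arg"])
        context.append(f"tool: {result}")
    return "gave up"


if __name__ == "__main__":
    print(run_agent("What type is Pikachu?"))
```

The point is that the fact lives in the context only when the model asks for it, instead of being baked into the weights.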