| HN Mirror



	by nullbio 10 days ago
	I know the big labs like to pretend that their models are trillion-parameter. But how likely is that really to be the case when Qwen 3.6 35B A3B gets so close to their performance? It seems that with the best research and the best training data applied, they'd be able to top the charts with a 60B model quite easily.

2 comments

MisterKent 10 days ago

They want people to believe they have massive models; that is effectively their moat at this point.

Because if they don't imply that size is needed for every task, they'll end up tanking their valuations.

https://blog.nilesh.io/post/ai-profit-race


redox99 10 days ago

Qwen 35B isn't even remotely close to the big models. It's just people overhyping small models. Ignore the benchmarks; they are almost meaningless.

If you want something comparable, you need the trillion-parameter open models like DeepSeek.


otabdeveloper4 10 days ago

The number of parameters doesn't make the model smarter; it just makes it know more stuff out of the box.

At some point there are diminishing returns, and your coding LLM performs worse because you encoded useless stuff like Pokemon combinations or languages you don't speak into its parameter space.

The "smartness" of the model comes from RLHF post-training, which is orthogonal to model size.

Also, if you're using an agentic harness, a much better approach is to let the model control its own context. If you ever reach a point where your coding LLM needs to know about Pokemon, just give it a web search tool and let it google the Pokemon.


redox99 10 days ago

That's just... not true. Just compare any open model which is trained with the same recipe but multiple sizes.


oneshtein 9 days ago

You can compare models on the OpenRouter site. Qwen 3.6 dense is in the top 24% for coding.


otabdeveloper4 9 days ago

> Just compare any open model which is trained with the same recipe but multiple sizes.

That's exactly what I did.
