by noodletheworld
42 days ago
Having tried it: Qwen is really good. It also makes sense in general. 8B models are generally not very good^. That this 8B model is decent is impressive, but the idea that it could perform on par with a good model 4 times as large is a daydream.
^ - To be polite. Small models + tool use for coding agents are almost universally ass. Proof: my personal experience. I've tried many of them.
The geometric mean rule of thumb for MoE models is that the intelligence level of an MoE model with T total parameters and A active parameters is roughly equivalent to that of a dense model with sqrt(A*T) parameters. For qwen3.6-35B-A3B (T = 35B, A = 3B), that equivalent size is sqrt(105) ≈ 10.25B, within spitting distance of an 8B model. Good training can make up the 28% difference in size.
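For anyone who wants to check the arithmetic, here is a minimal sketch of that rule of thumb. The function name `effective_dense_params` is made up for illustration, and the rule itself is a heuristic, not a measured result; the only inputs are the 35B total / 3B active figures from the model name and the 8B comparison point.

```python
import math


def effective_dense_params(total_b: float, active_b: float) -> float:
    """Geometric-mean heuristic: dense-equivalent size in billions of parameters."""
    return math.sqrt(active_b * total_b)


# 35B total, 3B active, as in the model name above.
equiv = effective_dense_params(total_b=35.0, active_b=3.0)

# Relative size gap versus an 8B dense model.
gap = equiv / 8.0 - 1.0

print(f"{equiv:.2f}B dense-equivalent, {gap:.0%} larger than 8B")
# prints: 10.25B dense-equivalent, 28% larger than 8B
```

Plugging in other MoE configurations is just a matter of changing the two arguments; whether the heuristic holds for a given model is a separate question.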