|
|
|
|
|
by omneity
310 days ago
|
|
Qwen3 32B is a dense model, it uses all its parameters all the time. GPT OSS 20B is a sparse MoE model. This means it only uses a fraction (3.6B) at a time. It’s a tradeoff that makes it faster to run than a dense 20B model and much smarter than a 3.6B one. In practice the fairest comparison would be to a dense ~8B model. Qwen Coder 30B A3B is a good sparse comparison point as well. |
|
When people talk about sparse or dense models, are they spare or dense matrices in the conventional numerical linear algebra sense? (Something like a csr matrix?)