by selcuka
310 days ago
> GPT OSS 20B is a sparse MoE model, meaning it only activates a fraction of its parameters (3.6B) per token. They compared it to GPT OSS 120B, which activates 5.1B parameters per token. Given the size of the model, it's more than fair to compare it to Qwen3 32B.