by selcuka
309 days ago
> In the case of gpt-oss 120B that would mean sqrt(5*120)=24B. That's actually in line with what I had (unscientifically) expected. Claude Sonnet 4 seems to agree:

> > The most accurate approach for your specific 120B MoE (5.1B active) would be to test it empirically against dense models in the 10-30B range.
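
For reference, the arithmetic in the quote is the geometric mean of active and total parameter counts. A minimal Python sketch, assuming that rule of thumb (the function name `dense_equivalent_b` is made up for illustration, and the heuristic is a rough estimate rather than an established law):

```python
import math


def dense_equivalent_b(active_b: float, total_b: float) -> float:
    """Geometric mean of active and total parameter counts, in billions.

    Hypothetical helper: it only encodes the sqrt(active * total)
    heuristic from the quoted comment.
    """
    return math.sqrt(active_b * total_b)


# Rounded figures used in the quote: 5B active, 120B total.
rounded = dense_equivalent_b(5, 120)
# Same estimate with the 5.1B active count mentioned in the nested quote.
precise = dense_equivalent_b(5.1, 120)

print(f"rounded inputs: ~{rounded:.1f}B, 5.1B active: ~{precise:.1f}B")
```

Either way the estimate falls inside the 10-30B range suggested for empirical comparison.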