by martinald
6 hours ago
Yes, 32B dense is a weird one to choose. But in practice, 32B dense is very similar* to 32B activated on an MoE in terms of inference cost. And I strongly suspect that e.g. Opus is around that level of active params. A 284B-A13B model (284B total, 13B active) at scale is almost certainly cheaper to serve than a 32B dense model.

*Because at scale you can shard the model across multiple GPUs. In reality, though, you lose some efficiency to GPU coordination and expert routing.
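A rough back-of-envelope sketch of the argument. It assumes the common rule of thumb of ~2 FLOPs per *active* parameter per generated token and 16-bit weights; both are my assumptions, not measured numbers. Compute per token scales with active params, while memory scales with total params. That is why the MoE only comes out ahead once the weights are sharded across enough GPUs to hold them.

```python
# Back-of-envelope comparison of a dense model vs an MoE.
# Assumptions: ~2 FLOPs per active parameter per token (forward pass),
# 16-bit (2-byte) weights. Ignores attention/KV-cache cost, expert
# routing overhead, and inter-GPU communication.

def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2.0 * active_params


def weight_bytes(total_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed just to hold the weights (default: 16-bit weights)."""
    return total_params * bytes_per_param


B = 1e9
models = {
    "32B dense": {"total": 32 * B, "active": 32 * B},
    "284B-A13B MoE": {"total": 284 * B, "active": 13 * B},
}

for name, m in models.items():
    print(
        f"{name}: {flops_per_token(m['active']) / 1e9:.0f} GFLOPs/token, "
        f"{weight_bytes(m['total']) / 1e9:.0f} GB of weights"
    )
```

This prints roughly 64 GFLOPs/token and 64 GB of weights for the 32B dense model, versus 26 GFLOPs/token but 568 GB of weights for the 284B-A13B MoE: less than half the compute per token, but nearly 9x the memory footprint you have to spread across GPUs.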