HN Mirror


by martinald 6 hours ago

Yes, 32B dense is a weird one to choose.

But in reality, 32B dense is very similar* to an MoE with 32B active parameters in terms of inference cost. And I strongly suspect that e.g. Opus is around that level of active params.

A 284B-A13B model (284B total, 13B active parameters) is, at scale, almost certainly cheaper to serve than a 32B dense model.

*Because you can shard the model across multiple GPUs at scale. In reality, though, you lose some efficiency to GPU coordination and expert routing.
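
To put rough numbers on the comparison above, here is a minimal napkin-math sketch. It rests on assumptions not stated in the comment: roughly 2 FLOPs per active parameter per generated token (attention and routing overhead ignored) and 2 bytes per parameter for weights (bf16-style storage). The function names and the `MODELS` table are made up for illustration.

```python
# Napkin math: per-token compute scales with ACTIVE params,
# while weight memory scales with TOTAL params.

def flops_per_token(active_params: float) -> float:
    """Assumed rule of thumb: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params


def weight_bytes(total_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory to hold the weights; 2 bytes/param is an assumption (bf16)."""
    return total_params * bytes_per_param


# Parameter counts taken from the comment; everything else is illustrative.
MODELS = {
    "32B dense": {"total": 32e9, "active": 32e9},
    "284B-A13B MoE": {"total": 284e9, "active": 13e9},
}


def summarize(models: dict) -> dict:
    """Return GFLOPs per token and GB of weights for each model."""
    return {
        name: {
            "gflops_per_token": flops_per_token(p["active"]) / 1e9,
            "weights_gb": weight_bytes(p["total"]) / 1e9,
        }
        for name, p in models.items()
    }


if __name__ == "__main__":
    for name, row in summarize(MODELS).items():
        print(
            f"{name}: {row['gflops_per_token']:.0f} GFLOPs/token, "
            f"{row['weights_gb']:.0f} GB of weights"
        )
```

Under these assumptions the MoE needs well under half the compute per token but several times the weight memory, which is why the comparison only favors it once the weights are sharded across many GPUs serving many requests.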

1 comment

breput 5 hours ago

That's good information. I couldn't even begin to run DeepSeek Flash on my system, but if you're assuming multiple GPUs, that is going to affect the napkin math.
