| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bick_nyers 423 days ago
	The general rule of thumb when assessing MoE <-> Dense model intelligence is SQRT(Total_Params*Active_Params). For Deepseek, you end up with ~158B params. The economics of batch inferencing a ~158B model at scale are different when compared to something like Deepseek (it is ~4x more FLOPS per inference after all), particularly if users care about latency.