HN Mirror

Hacker News


	by phkahler 632 days ago
	> MoE model with 52 billion activated parameters means it's more comparable to a (dense) 70b model and not a dense 405b model

	Only when you're talking about how fast it can produce output, i.e. compute per token. From a capability point of view, it makes sense to compare the larger, total parameter count. I suppose there's also a "total storage" comparison, since didn't they say this model uses 8-bit weights, whereas Llama uses 16-bit?
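	A rough back-of-envelope sketch of the two comparisons. It assumes per-token compute scales as roughly 2 FLOPs per *active* parameter, and weight storage as *total* parameters times bytes per weight. The quote only gives the activated count (52B), so MOE_TOTAL_PARAMS below is a hypothetical placeholder, and the function names are illustrative, not from any library.

```python
# Back-of-envelope comparison of a sparse MoE model and a dense model.
# Assumptions (illustrative only):
#   - forward-pass compute per token ~ 2 FLOPs per *active* parameter
#   - weight storage = *total* parameters x bytes per weight
#   - MOE_TOTAL_PARAMS is a HYPOTHETICAL placeholder; the quoted comment
#     only states the activated parameter count (52B)


def flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2.0 * active_params


def weight_storage_gb(total_params: float, bits_per_weight: int) -> float:
    """Storage for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


MOE_ACTIVE_PARAMS = 52e9   # from the quoted comment
MOE_TOTAL_PARAMS = 400e9   # hypothetical placeholder, not a real figure
DENSE_PARAMS = 405e9       # dense model: every parameter is active

# Speed: the MoE touches ~52B params per token, the dense model all 405B.
speed_ratio = flops_per_token(DENSE_PARAMS) / flops_per_token(MOE_ACTIVE_PARAMS)

# Storage: 8-bit weights take half the bytes per parameter of 16-bit weights.
moe_gb = weight_storage_gb(MOE_TOTAL_PARAMS, 8)
dense_gb = weight_storage_gb(DENSE_PARAMS, 16)

print(f"dense/MoE compute per token: {speed_ratio:.1f}x")
print(f"MoE weights @ 8-bit:    {moe_gb:.0f} GB")
print(f"dense weights @ 16-bit: {dense_gb:.0f} GB")
```

	With the placeholder total, this prints about 7.8x for compute and 400 GB vs 810 GB for weights. Plug in the real total parameter count to get the actual storage comparison; the compute ratio only depends on the active count.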