

	by naasking 65 days ago
	MoE isn't inherently better, but I do think it's still an underexplored space. When your sparse model can do 5 runs on the same prompt in the time a dense model takes to generate one, all sorts of interesting possibilities open up.
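
	One such possibility might be majority voting over several cheap runs. A minimal sketch, assuming a hypothetical `sample` callable that stands in for one model run (the function names and the noisy stand-in sampler are illustrative, not any real library's API):

```python
import random
from collections import Counter
from typing import Callable


def majority_vote(sample: Callable[[str], str], prompt: str, runs: int = 5) -> str:
    """Call the sampler `runs` times on the same prompt and return the most common answer."""
    answers = [sample(prompt) for _ in range(runs)]
    return Counter(answers).most_common(1)[0][0]


def make_noisy_sampler(correct: str, wrong: str, p_correct: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: answers `correct` with probability `p_correct`."""
    rng = random.Random(seed)

    def sample(prompt: str) -> str:
        return correct if rng.random() < p_correct else wrong

    return sample


if __name__ == "__main__":
    sampler = make_noisy_sampler("4", "5", p_correct=0.7, seed=0)
    print(majority_vote(sampler, "What is 2 + 2?", runs=5))
```

	Whether voting actually beats a single dense-model run depends on per-run accuracy, which this sketch does not measure.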