| HN Mirror



	by Alifatisk 300 days ago
	I think it's because of a combination of the MoE model architecture and inference being done in large batches that run in parallel.
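	A rough back-of-envelope sketch of why those two factors could matter, not a description of any real deployment. The function names and all numbers are hypothetical, and the model is deliberately simplified: it assumes each token is routed to `top_k` of `num_experts` equally sized experts, and that the weights are read from memory once per step and shared by every request in the batch.

```python
def active_expert_fraction(num_experts: int, top_k: int) -> float:
    """Fraction of expert parameters a single token touches in an MoE layer.

    Assumption: all experts are the same size and each token is routed
    to exactly top_k of them.
    """
    if not 0 < top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    return top_k / num_experts


def weight_bytes_read_per_token(total_weight_bytes: float, batch_size: int) -> float:
    """Weight traffic attributed to each token in one batched step.

    Assumption: the weights are read once per step and reused by every
    token in the batch, so the cost is split evenly across the batch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return total_weight_bytes / batch_size


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(active_expert_fraction(num_experts=8, top_k=2))       # 0.25
    print(weight_bytes_read_per_token(100.0, batch_size=4))     # 25.0
```

	Under these assumptions the two effects multiply: sparse routing cuts the compute per token, and batching spreads the cost of reading the weights over many requests.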