| HN Mirror



	by bufo 1149 days ago
	There is a difference. Training these days uses large batch sizes. The ANE takes up only a tiny area of silicon and can't handle the large matrix multiplications that big LLMs need, whether the batch size is 1 or higher. That means it cannot saturate the RAM bandwidth, and you're better off using the much bigger GPU on the Apple die.
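
To make the bandwidth argument concrete, here is a rough roofline-style sketch in Python. It is only a back-of-the-envelope model, not a measurement: the function name `matmul_roofline` is made up for illustration, and the throughput and bandwidth figures are hypothetical placeholders, not specs for the ANE or any Apple GPU. The idea is that a compute unit saturates memory bandwidth on a layer only if it can finish the arithmetic faster than the weights can be streamed in.

```python
def matmul_roofline(batch, d_in, d_out, peak_flops, mem_bandwidth, bytes_per_param=2):
    """Roofline estimate for one (batch x d_in) @ (d_in x d_out) layer.

    Assumes weights and activations each cross the memory bus exactly once
    and that compute and memory traffic overlap perfectly.
    """
    flops = 2 * batch * d_in * d_out  # one multiply and one add per weight per row
    bytes_moved = bytes_per_param * (d_in * d_out + batch * d_in + batch * d_out)
    compute_time = flops / peak_flops
    memory_time = bytes_moved / mem_bandwidth
    total_time = max(compute_time, memory_time)
    return {
        "intensity": flops / bytes_moved,  # FLOPs per byte moved
        "bound": "compute" if compute_time > memory_time else "memory",
        "bandwidth_utilization": memory_time / total_time,
    }


# Hypothetical placeholder numbers, chosen only to show the two regimes.
BANDWIDTH = 1e11  # bytes/s, shared by both units in this sketch
small = matmul_roofline(1, 4096, 4096, peak_flops=5e10, mem_bandwidth=BANDWIDTH)
big = matmul_roofline(1, 4096, 4096, peak_flops=1e13, mem_bandwidth=BANDWIDTH)
big_batched = matmul_roofline(64, 4096, 4096, peak_flops=1e13, mem_bandwidth=BANDWIDTH)

for name, result in [("small", small), ("big", big), ("big, batch 64", big_batched)]:
    print(name, result)
```

In this toy model the unit with low sustained throughput ends up compute-bound even at batch size 1 and leaves memory bandwidth idle, while the larger unit stays memory-bound and uses all of it. Whether real hardware falls on one side or the other depends on its actual sustained matmul throughput, which this sketch does not claim to know.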