| HN Mirror

	by lostdog 482 days ago
	LLM training depends on centralization. You want to do a global update of all your weights as quickly as possible. Distributing the weight updates and synchronizing only occasionally lets the weights drift around aimlessly, and is very inefficient. To optimize LLM training, you want to put as many GPUs as possible as close together as you can, with the fastest interconnect you can build.
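
To make the drift point concrete, here is a toy sketch in plain Python. It is not an LLM and proves nothing about real training runs. Everything in it is an assumption made for illustration: two hypothetical workers, each minimizing its own one-dimensional loss `0.5 * a * (w - t)**2`, and a hypothetical `train` helper that averages the replicas every `sync_every` steps. With `sync_every=1` this is the same as averaging gradients on every step. With a large `sync_every`, each replica wanders toward its own shard's optimum between averaging steps.

```python
def grad(w, a, t):
    # Gradient of the toy per-worker loss 0.5 * a * (w - t)**2.
    return a * (w - t)


def train(curvatures, targets, steps, sync_every, lr=0.1):
    """Run local gradient steps on each replica, averaging every sync_every steps.

    Returns the final averaged weight and the largest gap observed between
    replicas just before an averaging step.
    """
    n = len(targets)
    weights = [0.0] * n
    max_drift = 0.0
    for step in range(1, steps + 1):
        # Each replica takes one step on its own shard only.
        weights = [
            w - lr * grad(w, a, t)
            for w, a, t in zip(weights, curvatures, targets)
        ]
        if step % sync_every == 0:
            # Record how far apart the replicas got, then average them.
            max_drift = max(max_drift, max(weights) - min(weights))
            mean = sum(weights) / n
            weights = [mean] * n
    return sum(weights) / n, max_drift


# Two workers whose shards disagree (made-up numbers).
curvatures = [1.0, 4.0]
targets = [0.0, 1.0]

# Minimizer of the summed loss: the curvature-weighted mean of the targets.
optimum = sum(a * t for a, t in zip(curvatures, targets)) / sum(curvatures)

sync_w, sync_drift = train(curvatures, targets, steps=200, sync_every=1)
local_w, local_drift = train(curvatures, targets, steps=200, sync_every=50)

print("optimum:", optimum)
print("sync every step:     w =", sync_w, " max drift =", sync_drift)
print("sync every 50 steps: w =", local_w, " max drift =", local_drift)
```

In this toy setup the every-step run settles at the joint optimum, while the every-50-steps run ends up near the plain average of the two per-worker optima, because each replica has almost fully converged to its own target by the time the weights are averaged. How much of this carries over to real models depends on the data, the optimizer, and the sync schedule, none of which this sketch models.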