| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by necroforest 1003 days ago
	LLMs are trained in parallel. The model weights and optimizer state are split over a number (possibly thousands) of accelerators. The main bottleneck to doing distributed training like this is the communication between nodes.