| HN Mirror



	by aspenmartin 12 days ago
	It's more than the raw hardware; it's the interconnect and communication between the hardware at scale. These models are trained on hundreds of thousands of GPUs today. You _will_ start to see cross-datacenter training runs, but those runs need to decide efficiently when and how to communicate across datacenters, because that communication carries a very high cost compared to intra-datacenter communication.
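
	To put rough numbers on that gap, here is a back-of-envelope sketch using the textbook ring all-reduce cost model. Every figure in it (gradient size, link bandwidths, latencies, worker counts) is an assumption picked for illustration, not a measurement of any real cluster, and the helper names are hypothetical.

```python
def ring_allreduce_seconds(payload_bytes, workers, bandwidth_bytes_per_s, latency_s):
    """Ring all-reduce cost model: 2*(n-1) steps, each moving payload/n bytes."""
    if workers < 2:
        return 0.0
    steps = 2 * (workers - 1)
    per_step = latency_s + (payload_bytes / workers) / bandwidth_bytes_per_s
    return steps * per_step


def amortized_sync_seconds(sync_seconds, steps_between_syncs):
    """Average sync cost per training step when syncing only every k steps."""
    return sync_seconds / steps_between_syncs


# All values below are illustrative assumptions.
GRADIENT_BYTES = 20e9  # hypothetical: 10B parameters at 2 bytes each
INTRA_DC = dict(bandwidth_bytes_per_s=50e9, latency_s=5e-6)     # assumed link inside one datacenter
INTER_DC = dict(bandwidth_bytes_per_s=1.25e9, latency_s=20e-3)  # assumed link between two sites

intra = ring_allreduce_seconds(GRADIENT_BYTES, workers=8, **INTRA_DC)
inter = ring_allreduce_seconds(GRADIENT_BYTES, workers=2, **INTER_DC)

print(f"intra-datacenter sync: {intra:.2f} s")
print(f"cross-datacenter sync: {inter:.2f} s")
print(f"cross-datacenter, synced every 100 steps: {amortized_sync_seconds(inter, 100):.3f} s/step")
```

	Under these assumed numbers the cross-site sync is far slower than the local one, which is why the "when" matters: syncing across sites less often spreads that cost over many steps, at the price of the sites drifting apart between syncs.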