


	by semitones 1180 days ago
	The main reason an arbitrarily distributed set of compute nodes cannot give you good performance for training a model (even if you have an immodest number of nodes) is that the latency of inter-node communication will be a massive bottleneck. GPU cloud providers shell out big bucks for ultra-fast intra-DC networking via InfiniBand and the like, and the networking gets as much attention as the capabilities of the nodes themselves (sometimes more).
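
	To put rough numbers on that, here is a back-of-envelope sketch using the textbook ring all-reduce cost model for syncing gradients once per training step. Every figure in it is an assumption picked for illustration (1B parameters, 2-byte gradients, 8 nodes, a 25 GB/s link with 5 µs latency inside a datacenter, a 12.5 MB/s link with 50 ms latency over the public internet), and real systems overlap communication with compute, so treat it as an order-of-magnitude estimate only.

```python
def ring_allreduce_seconds(num_nodes, payload_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimate one ring all-reduce: 2*(N-1) steps, each sending payload/N bytes."""
    steps = 2 * (num_nodes - 1)
    chunk_bytes = payload_bytes / num_nodes
    return steps * (latency_s + chunk_bytes / bandwidth_bytes_per_s)


# All values below are illustrative assumptions, not measurements.
NODES = 8
GRAD_BYTES = 1_000_000_000 * 2  # 1B parameters, 2 bytes per gradient value

# Assumed datacenter fabric: 25 GB/s per link, 5 microseconds latency.
datacenter = ring_allreduce_seconds(NODES, GRAD_BYTES, 5e-6, 25e9)

# Assumed internet link between scattered nodes: 12.5 MB/s, 50 ms latency.
internet = ring_allreduce_seconds(NODES, GRAD_BYTES, 50e-3, 12.5e6)

print(f"datacenter sync per step: {datacenter:.3f} s")
print(f"internet sync per step:   {internet:.1f} s")
print(f"slowdown factor:          {internet / datacenter:.0f}x")
```

	Under these assumed numbers the gradient sync goes from a fraction of a second to minutes per step, so the GPUs would sit idle almost the whole time no matter how many nodes you add.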