| HN Mirror



	by FuckButtons 653 days ago
	AFAIK, the main bottleneck in training is memory bandwidth. Distributed GPU compute has multiple orders of magnitude less bandwidth than an equivalent number of colocated GPUs, because the GPUs don't share a physical bus and have to communicate over a network connection instead. This work improves on that, but the fundamental limitations remain.
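
	To put rough numbers on the gap, here is a back-of-envelope sketch. Every figure in it is an assumption picked for illustration, not a measurement: a 1B-parameter model in fp16, a 100 GB/s link between colocated GPUs, and a 1 Gbit/s network link between distributed ones. It only compares the time to move one full copy of the weights or gradients over each link, and ignores latency, compression, and overlap with compute.

```python
import math


def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move num_bytes over a link, ignoring latency and protocol overhead."""
    return num_bytes / bandwidth_bytes_per_s


# All values below are illustrative assumptions, not measurements.
PARAMS = 1e9            # assumed model size: 1 billion parameters
BYTES_PER_PARAM = 2     # assumed fp16 storage
LOCAL_BW = 100e9        # assumed colocated GPU interconnect: 100 GB/s
NETWORK_BW = 125e6      # assumed network link: 1 Gbit/s = 125 MB/s

payload = PARAMS * BYTES_PER_PARAM
local_s = transfer_seconds(payload, LOCAL_BW)
network_s = transfer_seconds(payload, NETWORK_BW)

# Gap between the two links, expressed in orders of magnitude.
gap = math.log10(network_s / local_s)

print(f"colocated: {local_s:.3f} s per full sync")
print(f"networked: {network_s:.3f} s per full sync")
print(f"gap: about {gap:.1f} orders of magnitude")
```

	Swap in your own link speeds and model size. The point is just that the ratio of the two bandwidths sets the gap, and smarter algorithms can only reduce how many bytes have to cross the slow link.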