| HN Mirror



	by rerx 2655 days ago
	State-of-the-art deep learning models keep getting larger, and at some point it makes sense to distribute them over multiple GPUs because they no longer fit into a single GPU's memory. At the same time, training can be sped up dramatically by blowing up the mini-batch size in a synchronized training regime, which again requires multiple GPUs. So the trend is towards "model parallelism" and "data parallelism" at the same time. Once you need more GPUs than you can put on a single PCI Express bus, you need a fast interconnect between servers. InfiniBand seems to be the best solution at this time. Nvidia GPUs can already communicate ridiculously fast with remote GPUs via RDMA if there is an InfiniBand connection. It makes a lot of sense for Nvidia to push in this direction to provide integrated solutions.
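
To make the "data parallelism" part concrete, here is a rough sketch of one synchronized training step, simulated on the CPU in plain Python. Everything in it is made up for illustration: the toy model (fit y = w * x), the helper names `grad`, `shard`, `all_reduce_mean` and `sync_step`, and the learning rate. In a real setup each shard would live on its own GPU, and the averaging step would be an all-reduce over the interconnect, which is where the bandwidth between servers matters.

```python
# Toy synchronous data-parallel SGD step (illustrative only, no real GPUs).
# Model: y = w * x with mean squared error loss.

def grad(w, batch):
    """Gradient of the mean squared error w.r.t. w over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def shard(batch, num_workers):
    """Split the batch into equal contiguous shards, one per simulated GPU.
    Assumes len(batch) is divisible by num_workers."""
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]

def all_reduce_mean(values):
    """Stand-in for the all-reduce that would run over the interconnect."""
    return sum(values) / len(values)

def sync_step(w, batch, num_workers, lr=0.01):
    """Each worker computes a local gradient, then all workers apply the
    same averaged gradient, so their weights stay identical."""
    local_grads = [grad(w, s) for s in shard(batch, num_workers)]
    return w - lr * all_reduce_mean(local_grads)

batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples, true w = 3
w_single = sync_step(0.0, batch, num_workers=1)  # one big batch on one device
w_multi = sync_step(0.0, batch, num_workers=4)   # same batch over 4 workers
```

With equal-sized shards, the mean of the per-worker gradients is the gradient of the whole mini-batch, so `w_single` and `w_multi` agree up to floating-point rounding. Adding workers lets the mini-batch grow without changing the update rule, at the cost of one gradient exchange per step.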