| HN Mirror



	by ImprobableTruth 849 days ago
	IME NVLink would be overkill for this. Model parallelism means you only need bandwidth to transfer the intermediate activations (/gradients + optimizer state) at the seams, and inference is generally slow enough that even PCIe x8 won't be a bottleneck.
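
	A rough back-of-envelope sketch of that claim, for anyone who wants numbers. Everything here is an assumption for illustration, not something from the comment: a layer-wise split with one seam between two GPUs, single-stream decoding, a hidden size of 8192, fp16 activations, 50 tokens/s, and a PCIe 3.0 x8 link at roughly 0.985 GB/s of usable bandwidth per lane. The function name `seam_traffic_bytes_per_s` is made up for this example.

```python
def seam_traffic_bytes_per_s(hidden_size, bytes_per_value, tokens_per_s, num_seams=1):
    """Bytes per second crossing GPU boundaries when only the
    per-token activation vector is sent across each seam."""
    return hidden_size * bytes_per_value * tokens_per_s * num_seams


# All values below are illustrative assumptions, not measurements.
HIDDEN_SIZE = 8192                   # width of the activation vector at the seam
BYTES_PER_VALUE = 2                  # fp16
TOKENS_PER_S = 50                    # single-stream decoding speed
PCIE3_X8_BYTES_PER_S = 8 * 0.985e9   # approx. PCIe 3.0, 8 lanes, one direction

traffic = seam_traffic_bytes_per_s(HIDDEN_SIZE, BYTES_PER_VALUE, TOKENS_PER_S)
utilization = traffic / PCIE3_X8_BYTES_PER_S

print(f"{traffic / 1e6:.2f} MB/s across the seam")
print(f"{utilization:.4%} of the assumed PCIe 3.0 x8 bandwidth")
```

	With these assumptions the seam carries under a megabyte per second, a tiny fraction of the link. Prompt processing, large batches, and tensor-parallel splits (which exchange data inside every layer) move far more data, so treat this as a lower bound for the simple case rather than a general result.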