Hacker News
by ImprobableTruth 849 days ago
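IME NVLink would be overkill for this. With model parallelism (splitting the layers across GPUs), you only need bandwidth to pass the intermediate activations (and, when training, their gradients on the backward pass) across the seams between GPUs. Each GPU keeps the weights and optimizer state for its own layers. Inference is also generally slow enough that even a PCIe x8 link won't be the bottleneck.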
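A rough back-of-envelope sketch of why the seams are cheap. It assumes a hidden size of 8192, fp16 activations, about 16 GB/s for PCIe 4.0 x8 in one direction, and a generation speed of 20 tokens/s. All of these are illustrative assumptions, not measurements, and the helper names are made up for this example:

```python
# Back-of-envelope: how long does crossing one layer "seam" take over PCIe?
# All constants below are illustrative assumptions, not measurements.

HIDDEN_SIZE = 8192                # assumed model hidden dimension
BYTES_PER_ELEM = 2                # fp16/bf16 activations
PCIE4_X8_BYTES_PER_S = 16e9       # ~theoretical PCIe 4.0 x8 bandwidth, one direction
DECODE_SECONDS_PER_TOKEN = 0.05   # assumed 20 tokens/s generation speed


def seam_payload_bytes(tokens: int, hidden_size: int = HIDDEN_SIZE,
                       bytes_per_elem: int = BYTES_PER_ELEM) -> int:
    """Bytes of activations handed from one GPU to the next at a layer boundary."""
    return tokens * hidden_size * bytes_per_elem


def seam_transfer_seconds(tokens: int,
                          link_bytes_per_s: float = PCIE4_X8_BYTES_PER_S) -> float:
    """Bandwidth-only transfer time (ignores per-transfer latency and overlap)."""
    return seam_payload_bytes(tokens) / link_bytes_per_s


if __name__ == "__main__":
    decode = seam_transfer_seconds(tokens=1)       # one new token, batch size 1
    prefill = seam_transfer_seconds(tokens=2048)   # processing a 2048-token prompt
    print(f"decode step: {seam_payload_bytes(1)} bytes, {decode * 1e6:.2f} us")
    print(f"prefill:     {seam_payload_bytes(2048) / 2**20:.0f} MiB, {prefill * 1e3:.2f} ms")
    print(f"decode transfer share of per-token time: {decode / DECODE_SECONDS_PER_TOKEN:.4%}")
```

Under these assumptions, a decode step moves about 16 KiB per seam (about 1 µs of link time), and a 2048-token prefill moves 32 MiB (about 2 ms). Both are small next to the compute. For batched inference, multiply the token count by the batch size. Training sends a same-sized gradient back through each seam. The sketch ignores per-transfer latency, which for payloads this small may matter more than raw bandwidth.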