by ImprobableTruth
849 days ago
IME NVLink would be overkill for this. Model parallelism means you only need bandwidth to transfer the intermediate activations (/gradients + optimizer state) at the seams, and inference is generally slow enough that even PCIe x8 won't be a bottleneck.
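
A rough back-of-envelope sketch of that claim, for single-stream token-by-token decoding. The hidden size, fp16 activations, 50 tokens/s, and the helper name `seam_bytes_per_second` are all illustrative assumptions, not measurements, and the PCIe 3.0 x8 figure is the approximate usable rate (~0.985 GB/s per lane):

```python
# Back-of-envelope: activation traffic across one model-parallel seam while decoding.
# All numbers are illustrative assumptions, not measurements.

def seam_bytes_per_second(hidden_size: int, bytes_per_value: int,
                          tokens_per_second: float) -> float:
    """Bytes/s crossing one seam: one hidden-state vector per generated token."""
    return hidden_size * bytes_per_value * tokens_per_second

# Approximate usable PCIe 3.0 bandwidth: ~0.985 GB/s per lane, 8 lanes.
PCIE3_X8_BYTES_PER_SECOND = 8 * 0.985e9

# Assumed: hidden size 8192, fp16 (2 bytes per value), 50 tokens/s, batch size 1.
needed = seam_bytes_per_second(hidden_size=8192, bytes_per_value=2,
                               tokens_per_second=50)
fraction = needed / PCIE3_X8_BYTES_PER_SECOND

print(f"{needed / 1e6:.2f} MB/s needed, {fraction:.4%} of PCIe 3.0 x8")
```

Under these assumptions the seam traffic is a tiny fraction of the link. Prompt processing and large batches push many tokens across at once, so this only covers the steady decoding case.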