You can actually split a model [0] across multiple GPUs even without NVLink, using just PCIe for the transfers.
Depending on the model, the performance is sometimes not all that different. I believe that for pure inference on some models the speed difference may be barely noticeable, whereas for some training workloads it may make a 10+% difference [1].
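For what it's worth, here is a minimal sketch of what that kind of split can look like in PyTorch, roughly in the spirit of [0]. The toy model, its layer sizes, and the names `TwoDeviceNet` and `pick_devices` are made up for illustration, and it falls back to CPU when two GPUs aren't visible so it still runs anywhere.

```python
import torch
import torch.nn as nn


def pick_devices():
    """Use two GPUs when present; otherwise fall back to CPU so the sketch still runs."""
    if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    return torch.device("cpu"), torch.device("cpu")


class TwoDeviceNet(nn.Module):
    """Hypothetical toy model: first half lives on dev0, second half on dev1."""

    def __init__(self, dev0, dev1):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.part1 = nn.Sequential(nn.Linear(10, 32), nn.ReLU()).to(dev0)
        self.part2 = nn.Linear(32, 5).to(dev1)

    def forward(self, x):
        h = self.part1(x.to(self.dev0))
        # This copy moves the activations between devices; on a machine
        # without NVLink, this is the traffic that travels over PCIe.
        return self.part2(h.to(self.dev1))


dev0, dev1 = pick_devices()
model = TwoDeviceNet(dev0, dev1)
out = model(torch.randn(8, 10))
print(out.shape, out.device)
```

The only cross-device traffic here is the `h.to(self.dev1)` copy in each forward pass (plus the matching gradient copy in backward when training), which is presumably why the interconnect matters less for some models than for others.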
[0] https://pytorch.org/tutorials/intermediate/model_parallel_tu...
[1] https://huggingface.co/transformers/v4.9.2/performance.html