You can actually split a model [0] across multiple GPUs even without NVLink, using PCIe alone for the transfers.
Depending on the model, the performance is sometimes not all that different. I believe that for pure inference on some models the speed difference may be barely noticeable, whereas for training workloads it may make a 10+% difference [1].
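
To make the idea of a split concrete, here is a rough sketch of one common way to do it: assign contiguous blocks of layers to each GPU. This is only an illustration, not what [0] or any particular library does. The helper names `split_layers` and `boundary_crossings` are made up for this example, and the `cuda:N` strings just follow the usual PyTorch device naming. In a layer-wise split like this, activations only move between GPUs at the block boundaries, which is one plausible reason the interconnect matters less for inference on some models.

```python
def split_layers(num_layers: int, num_gpus: int) -> list[str]:
    """Assign each layer index to a device name, in contiguous blocks.

    Hypothetical helper: earlier GPUs get one extra layer when the
    layer count does not divide evenly.
    """
    if num_layers < 1 or num_gpus < 1:
        raise ValueError("num_layers and num_gpus must be positive")
    base, extra = divmod(num_layers, num_gpus)
    devices: list[str] = []
    for gpu in range(num_gpus):
        count = base + (1 if gpu < extra else 0)
        devices.extend([f"cuda:{gpu}"] * count)
    return devices


def boundary_crossings(devices: list[str]) -> int:
    """Count places where consecutive layers sit on different devices.

    Each crossing is one activation transfer per forward pass
    (over PCIe when there is no NVLink).
    """
    return sum(1 for a, b in zip(devices, devices[1:]) if a != b)


if __name__ == "__main__":
    # Assumed example: a 32-layer model on 2 GPUs.
    device_map = split_layers(32, 2)
    print(device_map[15], device_map[16])
    print(boundary_crossings(device_map))
```

With a framework such as PyTorch you would then move each layer to its assigned device with `.to(device)` and move the activations along at each boundary.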