NVLink is what makes multi-GPU work. It lets the GPUs talk to each other over a high-bandwidth (600 GB/s on an A100), low-latency link. TensorFlow and PyTorch both support it, among others. It's not some weird side note: the interconnect between nodes is what makes a supercomputer super. You don't hear about it much because mainstream media doesn't cover many of the details of supercomputing.
Thank you, but this doesn't really answer OP's question or mine. Is NVLink required if you want to run an LLM that exceeds the memory of a single GPU? How do benchmarks compare with and without it?
I've heard that NVLink helps with training, but not so much with inference.
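For a rough sense of why that might be, here is a back-of-envelope sketch rather than a benchmark. It assumes the model is split across two GPUs by layers, so that inference only sends one token's activations across the link at a time, while training has to move something on the order of a full set of gradients per step. The model size, hidden size, and bandwidth figures are all illustrative assumptions, and the calculation ignores latency, protocol overhead, and any overlap with compute.

```python
# Back-of-envelope only: pure bandwidth arithmetic, no latency or overhead.
GB = 1e9

# Assumed peak per-direction bandwidths (not measured values).
LINKS = {
    "NVLink, A100 (assumed 300 GB/s per direction)": 300 * GB,
    "PCIe 4.0 x16 (assumed ~32 GB/s per direction)": 32 * GB,
}


def transfer_seconds(num_bytes, bytes_per_second):
    """Ideal time to move num_bytes over a link at the given bandwidth."""
    return num_bytes / bytes_per_second


# Hypothetical model: 7B parameters, hidden size 4096, fp16 values.
HIDDEN_SIZE = 4096
PARAMS = 7e9
BYTES_PER_VALUE = 2

# Inference with a layer split: one token's hidden state crosses the link.
activation_bytes = HIDDEN_SIZE * BYTES_PER_VALUE
# Training: roughly one full copy of the gradients per step (a simplification;
# real all-reduce traffic depends on the algorithm and the number of GPUs).
gradient_bytes = PARAMS * BYTES_PER_VALUE

for name, bandwidth in LINKS.items():
    act = transfer_seconds(activation_bytes, bandwidth)
    grad = transfer_seconds(gradient_bytes, bandwidth)
    print(f"{name}: activations {act * 1e6:.3f} us/token, "
          f"gradients {grad * 1e3:.1f} ms/step")
```

Under these assumptions the per-token transfer is tiny on either link, while the per-step gradient traffic is where the faster link makes a visible difference. Other ways of splitting a model (for example, splitting each layer across GPUs) communicate far more during inference, so this should not be read as a general result.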