by zcbenz 340 days ago
In the absence of hardware unified memory, CUDA will automatically migrate data between the CPU and GPU whenever a page fault occurs.
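To make the page-fault behavior concrete, here is a toy Python model, not CUDA itself. It simulates a buffer whose pages live on one side at a time and migrate when the other side touches them. In real CUDA this corresponds roughly to memory allocated with `cudaMallocManaged`. The page size, the `ManagedBuffer` class, and the assumption that every page starts on the host are hypothetical simplifications.

```python
# Toy model (not the CUDA driver): a "managed" buffer whose pages live on
# exactly one side at a time and migrate when the other side touches them.
PAGE_SIZE = 4096  # hypothetical page size in bytes, for illustration only


class ManagedBuffer:
    def __init__(self, nbytes: int) -> None:
        self.npages = -(-nbytes // PAGE_SIZE)  # ceiling division
        self.location = ["cpu"] * self.npages  # simplification: pages start on the host
        self.faults = {"cpu": 0, "gpu": 0}

    def touch(self, device: str, offset: int) -> None:
        """Access one byte; migrate its page if it is resident on the other side."""
        page = offset // PAGE_SIZE
        if self.location[page] != device:
            self.faults[device] += 1  # page fault on the accessing side
            self.location[page] = device  # the page migrates to the accessor


def run_kernel(buf: ManagedBuffer, nbytes: int) -> None:
    # Stand-in for a GPU kernel that reads every byte of the buffer.
    for offset in range(nbytes):
        buf.touch("gpu", offset)


nbytes = 10 * PAGE_SIZE
buf = ManagedBuffer(nbytes)
for offset in range(nbytes):  # CPU initializes the data: no faults
    buf.touch("cpu", offset)
run_kernel(buf, nbytes)  # the first GPU touch of each page faults
for offset in range(0, nbytes, PAGE_SIZE):  # CPU reads the results back
    buf.touch("cpu", offset)
print(buf.faults)  # {'cpu': 10, 'gpu': 10}
```

In this model, each page crosses the bus twice. Every crossing is triggered by a fault at the moment of access rather than being scheduled ahead of time.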

There is also NVLink-C2C support between Nvidia's CPUs and GPUs that doesn't require any copies: the CPUs and GPUs directly access each other's memory over a coherent bus. IIRC, they already have 4 CPU + 4 GPU servers available.
Yeah, NCCL is a whole world, and it's not even the only thing involved, but IIRC that's the difference between 8xH100 PCIe and 8xH100 SXM5.
This seems like it would be slow…
Matches my experience. It’s memory stalls all over the place, aggravated by the fact that (on 12.3 at least) there wasn't even a prefetcher.
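Here is a minimal toy model of why the lack of prefetching hurts. It is plain Python, not the CUDA driver. With demand paging, the kernel faults once per page and waits on each migration. Migrating the whole range up front replaces those faults with one bulk transfer, which is what an explicit hint such as CUDA's `cudaMemPrefetchAsync` is meant for. The `gpu_pass` function and its one-fault-per-page cost model are assumptions for illustration, and it counts events rather than measuring time.

```python
def gpu_pass(npages: int, prefetch: bool) -> dict:
    """Toy model: a 'kernel' touches every page of a host-resident range.

    If prefetch is True, all pages are moved to the GPU in a single bulk
    transfer before the kernel runs. Otherwise each first touch faults.
    """
    on_gpu = [prefetch] * npages
    stats = {"bulk_migrations": int(prefetch), "faults": 0}
    for page in range(npages):  # the "kernel" touches pages in order
        if not on_gpu[page]:
            stats["faults"] += 1  # kernel stalls until this page migrates
            on_gpu[page] = True
    return stats


print(gpu_pass(1024, prefetch=False))  # {'bulk_migrations': 0, 'faults': 1024}
print(gpu_pass(1024, prefetch=True))   # {'bulk_migrations': 1, 'faults': 0}
```

In this simplified model, a fault-driven pass serializes one stall per page. The prefetched pass moves the same data in a single transfer issued before any access.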