by aseipp 1032 days ago
You could already do that with Unified Memory, which has existed for a while and IIRC supports paging and swapping, assuming you `cudaMallocManaged` and `cudaFree` appropriately for your allocations.
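
For contrast with what follows, here is a minimal sketch of that older Unified Memory path. The kernel `scale` and the helper `scale_managed` are hypothetical names made up for illustration; only `cudaMallocManaged`, `cudaFree`, `cudaGetLastError` and `cudaDeviceSynchronize` are real runtime calls.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Doubles every element in place.
__global__ void scale(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: returns 0 on success, -1 on any CUDA error.
// Assumes n > 0.
int scale_managed(size_t n) {
    float *data = nullptr;
    // The allocation has to come from the CUDA-specific managed allocator.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1;

    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes

    unsigned blocks = (unsigned)((n + 255) / 256);
    scale<<<blocks, 256>>>(data, n);                // GPU reads and writes
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // CPU reads the same pointer back after the kernel has finished.
    int ok = (err == cudaSuccess && data[0] == 2.0f && data[n - 1] == 2.0f) ? 0 : -1;
    cudaFree(data);
    return ok;
}
```

The point is that the same pointer works on both sides, but only because the memory was obtained through `cudaMallocManaged`.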

This is not a change to "features" but a change to the programming model. You no longer need to write cudaMalloc or cudaFree at all; you can just use any allocator or tool. This means more off-the-shelf code will just work when used with CUDA. So now your io_uring buffers can be shared with the GPU trivially, for example, or mmap'd pages that a library gave you, or whatever.
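
A rough sketch of what "any allocator" looks like, assuming a system where the driver and kernel have HMM enabled. The names `add_one` and `add_one_malloc` are hypothetical; the sketch checks the real `cudaDevAttrPageableMemoryAccess` attribute first and bails out if the device does not report support, rather than assuming it.

```cuda
#include <cstddef>
#include <cstdlib>
#include <cuda_runtime.h>

// Adds 1 to every element in place.
__global__ void add_one(int *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns 0 on success, 1 if the device does not
// report pageable memory access (or the query fails), -1 on other errors.
// Assumes n > 0.
int add_one_malloc(size_t n) {
    int pageable = 0;
    if (cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0) != cudaSuccess
        || !pageable)
        return 1;

    // Plain malloc: no cudaMalloc, no cudaMallocManaged.
    int *data = (int *)malloc(n * sizeof(int));
    if (!data) return -1;
    for (size_t i = 0; i < n; ++i) data[i] = (int)i;

    unsigned blocks = (unsigned)((n + 255) / 256);
    add_one<<<blocks, 256>>>(data, n);  // kernel dereferences the malloc'd pointer
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int ok = (err == cudaSuccess && data[0] == 1 && data[n - 1] == (int)n) ? 0 : -1;
    free(data);
    return ok;
}
```

On a system without that support, the kernel would fault on the host pointer, which is why the attribute check comes first.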

The programming model is one of the things Nvidia does significantly better than any competitor. Single source model + HMM is a big step up from something like OpenCL in productivity and correctness.

On Grace Hopper chips, HMM is granular down to the cache line (64 bytes); on x86 systems I believe they said it's (of course) a 4k page granularity.


Being able to mmap weights directly from a file seems to be new (I think). I'd need to check my notes to remember whether you could already do that with some cuda* API.

Yeah, I think a good simple litmus test for this is "can I directly call mmap(2) on a file, then launch a kernel on that mmap'd memory with no extra steps, and have it work as I expect". With these newer features in CUDA, the answer to that is "yes, you can."
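
The litmus test could be sketched roughly like this, again assuming an HMM-enabled system. `sum_bytes` and `sum_file_on_gpu` are hypothetical names, and the single managed counter used for the result is a convenience of this sketch, not something the test requires; the file contents themselves reach the kernel through nothing but `mmap`.

```cuda
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cuda_runtime.h>

// Each thread sums a strided subset of the bytes, then adds its partial
// sum into a single global total.
__global__ void sum_bytes(const unsigned char *p, size_t n, unsigned long long *total) {
    unsigned long long local = 0;
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x; i < n; i += stride)
        local += p[i];
    atomicAdd(total, local);
}

// Hypothetical helper: returns 0 on success, 1 if the device does not
// report pageable memory access (or the query fails), -1 on other errors.
int sum_file_on_gpu(const char *path, unsigned long long *out) {
    int pageable = 0;
    if (cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0) != cudaSuccess
        || !pageable)
        return 1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return -1; }
    size_t n = (size_t)st.st_size;

    // Ordinary file mapping, nothing CUDA-specific about it.
    void *p = mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return -1;

    unsigned long long *total = nullptr;
    if (cudaMallocManaged(&total, sizeof(*total)) != cudaSuccess) { munmap(p, n); return -1; }
    *total = 0;

    // The kernel reads the mmap'd pages directly.
    sum_bytes<<<64, 256>>>((const unsigned char *)p, n, total);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err == cudaSuccess) *out = *total;

    cudaFree(total);
    munmap(p, n);
    return err == cudaSuccess ? 0 : -1;
}
```

There is no staging copy and no registration call between `mmap` and the kernel launch, which is the "no extra steps" part.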