by aseipp
1032 days ago
You could already do that with Unified Memory, which has existed for a while and IIRC supported paging and swapping, assuming you allocated with `cudaMallocManaged` and freed with `cudaFree`. This is not a change to "features" but a change to the programming model: you no longer need to write cudaMalloc or cudaFree at all, and you can just use any allocator or tool. That means more off-the-shelf code will just work when used with CUDA. Your io_uring buffers can now be shared with the GPU trivially, for example, as can mmap'd pages that a library gave you, or whatever.

The programming model is one of the things Nvidia does significantly better than any competitor. The single-source model plus HMM is a big step up from something like OpenCL in both productivity and correctness. On Grace Hopper chips, HMM is granular down to the cache line (64 bytes); on x86 systems I believe they said it's (of course) 4K-page granularity.
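A minimal sketch of what "no cudaMalloc/cudaFree" looks like in practice, assuming a system where HMM is actually enabled (recent driver with the open GPU kernel modules and a recent Linux kernel; check NVIDIA's docs for the exact requirements). The kernel and the helpers `scale`, `hmm_supported`, and `scale_malloc_buffer` are hypothetical names for illustration. The buffer comes from plain `std::malloc`, its raw pointer goes straight into the kernel, and the CPU reads the result in place. The code first queries `cudaDevAttrPageableMemoryAccess` because on a system without HMM the same kernel would fault on the host pointer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical kernel: multiply every element in place.
__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: true if the device can access pageable (e.g. malloc'd)
// host memory directly, which is what HMM provides.
bool hmm_supported(int device) {
    int pageable = 0;
    if (cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, device) != cudaSuccess)
        return false;
    return pageable != 0;
}

// Hypothetical helper: fill an ordinary malloc'd buffer with 1.0f, scale it on
// the GPU, and report the first element. Returns 0 on success.
int scale_malloc_buffer(size_t n, float factor, float* first_out) {
    // Plain allocator, no cudaMalloc / cudaMallocManaged.
    float* data = static_cast<float*>(std::malloc(n * sizeof(float)));
    if (data == nullptr) return 1;
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, factor);  // raw host pointer passed to the kernel

    cudaError_t err = cudaGetLastError();          // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors

    if (err == cudaSuccess) *first_out = data[0];  // CPU reads the GPU's result in place
    std::free(data);                               // plain free, no cudaFree
    return err == cudaSuccess ? 0 : 2;
}

int main() {
    if (!hmm_supported(0)) {
        std::printf("pageable memory access not supported; HMM unavailable\n");
        return 0;
    }
    float first = 0.0f;
    if (scale_malloc_buffer(1 << 20, 2.0f, &first) != 0) {
        std::printf("kernel failed\n");
        return 1;
    }
    std::printf("data[0] = %f\n", first);
    return 0;
}
```

The same pattern would apply to memory you did not allocate yourself, such as an io_uring buffer or an mmap'd region from a library, since the kernel only ever sees an ordinary pointer.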