Could be, but I'd like to hear more information about what it actually entails.
My gut feeling is that it's kind of like Z compression, but using the large amount of privileged software (basically a whole RTOS) they run on the GPU to dynamically allocate pages, so that "VRAM" allocations don't require giant arenas.
If that's the case, I'm not sure that ML will benefit. Most ML models are pretty good about actually touching everything they allocate, in which case lazy allocation won't help you much and may actually hurt startup latency.
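To make the touch-everything point concrete, here's a toy model of first-touch page commitment. It is only a sketch of my guess at the mechanism, not how any real GPU driver or firmware works: `LazyArena`, `PAGE_SIZE`, and both workloads are hypothetical names made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, purely illustrative


class LazyArena:
    """Toy model: an allocation whose pages get backing only on first touch."""

    def __init__(self, size_bytes):
        self.num_pages = (size_bytes + PAGE_SIZE - 1) // PAGE_SIZE
        self.committed = set()  # page indices that have physical backing
        self.faults = 0         # first-touch events, each one costs some latency

    def touch(self, offset):
        page = offset // PAGE_SIZE
        if page not in self.committed:
            self.committed.add(page)
            self.faults += 1

    def committed_fraction(self):
        return len(self.committed) / self.num_pages


def sparse_workload(arena, stride_pages=16):
    # Hypothetical workload that touches one page in every `stride_pages`.
    for page in range(0, arena.num_pages, stride_pages):
        arena.touch(page * PAGE_SIZE)


def dense_workload(arena):
    # Hypothetical ML-style workload that touches every page it allocated.
    for page in range(arena.num_pages):
        arena.touch(page * PAGE_SIZE)


sparse = LazyArena(1024 * PAGE_SIZE)
sparse_workload(sparse)

dense = LazyArena(1024 * PAGE_SIZE)
dense_workload(dense)

print(sparse.committed_fraction(), sparse.faults)  # 0.0625 64
print(dense.committed_fraction(), dense.faults)    # 1.0 1024
```

In this model the sparse case only ever backs about 6% of what it asked for, which is where lazy allocation pays off. The dense case ends up fully committed anyway and pays a first-touch cost on every page, which is the startup latency worry.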