Hacker News
by AaronFriel 3821 days ago
Everything has some performance overhead, but one way to simulate zeroed virtual memory is to map the pages to a shared zero page. When a full-page write occurs, simply write the page; when a partial write occurs, zero the rest of the page.

I am not familiar enough with GPU internals, but my understanding is that the GPU should be smart enough to know that a given texture or framebuffer will occupy n full pages, so when either is written in its entirety, the zeroing only has to occur at the edges. (I would assume that the write would start on a page boundary, but I don't know anything about GPU internals.)
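
To make the zero-page idea concrete, here is a minimal sketch as a toy Python model. It is a CPU-side simulation only and does not reflect how any real GPU or driver works. The names `LazyZeroMemory`, `PAGE_SIZE`, and `pages_zeroed` are made up for illustration, and the 4096-byte page size is an assumption.

```python
PAGE_SIZE = 4096                # assumed page size, for illustration only
ZERO_PAGE = bytes(PAGE_SIZE)    # single shared, read-only page of zeros


class LazyZeroMemory:
    """Toy model of lazily zeroed memory (hypothetical, not a real driver API)."""

    def __init__(self, num_pages):
        # None means "still mapped to the shared zero page".
        self.pages = [None] * num_pages
        # Counts pages that had to be explicitly zero-filled.
        self.pages_zeroed = 0

    def read(self, offset, length):
        out = bytearray()
        end = offset + length
        while offset < end:
            page, start = divmod(offset, PAGE_SIZE)
            n = min(PAGE_SIZE - start, end - offset)
            backing = self.pages[page] if self.pages[page] is not None else ZERO_PAGE
            out += backing[start:start + n]
            offset += n
        return bytes(out)

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            page, start = divmod(offset + pos, PAGE_SIZE)
            n = min(PAGE_SIZE - start, len(data) - pos)
            chunk = data[pos:pos + n]
            if self.pages[page] is None:
                if n == PAGE_SIZE:
                    # Full-page write: the data replaces the page, no zeroing needed.
                    self.pages[page] = bytearray(chunk)
                else:
                    # Partial write: zero-fill a private page, then apply the write.
                    self.pages[page] = bytearray(PAGE_SIZE)
                    self.pages_zeroed += 1
                    self.pages[page][start:start + n] = chunk
            else:
                # Page already has private backing: plain write.
                self.pages[page][start:start + n] = chunk
            pos += n
```

In this model, a write that spans several pages but starts and ends mid-page only zero-fills the first and last page, which is the "zeroing only at the edges" case.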

Caveat emptor: I will reiterate I know very little about memory internals. It seems like a bigger issue is that GPU memory is not virtualized and all users get access to the same memory. It's as if three decades of understanding the utility of virtual memory were forgotten.

2 comments

> It seems like a bigger issue is that GPU memory is not virtualized and all users get access to the same memory. It's as if three decades of understanding the utility of virtual memory were forgotten.

I think you're forgetting the most important thing here - GPUs are meant to be fast. Virtualization will add like what, an order of magnitude to the access times?

GPUs have had MMUs for a while (though I don't believe they recover from page faults the same way CPUs do).

Last I checked, they pretty much don't recover from page faults; they just abort whatever "program" you're trying to run, so you can't really use the MMU for clever things like demand paging. But that's not the point of the GPU's MMU in the first place.

> Last I checked, they pretty much don't recover from page faults, they just abort whatever "program" you're trying to run

Correct. When the GPU page faults, it raises a CPU interrupt, which the driver handles. Because it's not possible to resume execution on a GPU in a timely manner, the only option is to terminate the process that caused the page fault.

> so you can't really use the MMU for clever things like demand paging

Recent GPU generations support "sparse" or "tiled" memory, where the GPU can detect whether a load or a store would access non-resident memory and then act accordingly. This requires a specialized shader and some CPU-side logic to actually stream in the memory. It can be used to implement on-demand paging for textures and buffers, as well as workarounds that reduce visual artifacts from streaming.
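
A rough sketch of that split between shader-side residency checks and CPU-side streaming, as a toy Python model. Nothing here corresponds to a real graphics API: `SparseTexture`, `sample`, `stream_in`, and `FALLBACK_TEXEL` are hypothetical names, and the fallback value of 0 is an assumption.

```python
from collections import deque

FALLBACK_TEXEL = 0  # assumed value returned when a tile is not resident


class SparseTexture:
    """Toy model of a sparse texture (hypothetical, not a real graphics API)."""

    def __init__(self, source_tiles):
        self.source = source_tiles   # full data on the "CPU side": tile_id -> texels
        self.resident = {}           # tiles currently in "GPU memory"
        self.feedback = deque()      # tiles the "shader" wanted but could not read

    def sample(self, tile_id, texel):
        """'Shader side': check residency instead of faulting."""
        tile = self.resident.get(tile_id)
        if tile is None:
            # Record the miss so the CPU side can stream the tile in later.
            if tile_id not in self.feedback:
                self.feedback.append(tile_id)
            return FALLBACK_TEXEL, False
        return tile[texel], True

    def stream_in(self):
        """'CPU side': make every requested tile resident, e.g. between frames."""
        loaded = []
        while self.feedback:
            tile_id = self.feedback.popleft()
            self.resident[tile_id] = list(self.source[tile_id])
            loaded.append(tile_id)
        return loaded
```

The first `sample` of a non-resident tile returns the fallback plus a "not resident" flag rather than aborting; after `stream_in` runs, the same call returns the real texel.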

GPUs don't have a page fault handler; when there's a page fault, it's an unrecoverable crash. Accordingly, zero-on-allocate (or potentially zero-on-free, but that makes assumptions about startup and teardown that may not be true) is the only way to do it.
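
For illustration, zero-on-allocate can be sketched with a toy first-fit pool in Python. This is an assumption-laden model, not how any actual GPU driver allocates memory; `ZeroingPool`, `alloc`, and `free` are made-up names.

```python
class ZeroingPool:
    """Toy first-fit allocator that zeroes memory on allocation (hypothetical)."""

    def __init__(self, size):
        self.memory = bytearray(size)
        self.free_blocks = [(0, size)]  # list of (offset, length)

    def alloc(self, length):
        for i, (off, blk_len) in enumerate(self.free_blocks):
            if blk_len >= length:
                if blk_len == length:
                    del self.free_blocks[i]
                else:
                    # Carve the allocation off the front of the free block.
                    self.free_blocks[i] = (off + length, blk_len - length)
                # Zero on allocate: the new owner never sees stale data.
                self.memory[off:off + length] = bytes(length)
                return off
        raise MemoryError("pool exhausted")

    def free(self, off, length):
        # Deliberately no zeroing here: stale data stays until the next alloc.
        self.free_blocks.insert(0, (off, length))
```

Because `free` leaves the old contents in place, the zeroing in `alloc` is the only thing keeping one user's data from leaking to the next.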