by revelation 4200 days ago
VRAM is not mapped to the same memory space as your normal RAM and it is not directly accessible via regular CPU instructions. It's wholly owned by the GPU, and that's who the CPU has to talk to to use it.

This is in fact a (if not the) major limiting factor to expanded use of GPUs for general purpose calculations: you always have to copy input and results between video RAM and normal RAM.


Yes, it's owned by the GPU, but you can map it into the regular address space. In fact, this is exactly how textures and other data get loaded onto the video card. See http://en.wikipedia.org/wiki/Memory-mapped_I/O
True -- you can use (in OpenCL) clEnqueueMapBuffer to get something that looks like memory-mapped I/O, but the consistency guarantees are different from regular host-based MMIO. Specifically, if you map a GPU buffer for writes, there's no guarantee on what you'll get when you read that buffer until you unmap the region. (You can think of it as buffering up writes in host memory until you unmap the region, at which point the data is DMAed over to the GPU.)

See the "Notes" section in https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/c...
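To make the "buffer up writes until unmap" mental model concrete, here is a toy sketch in plain Python. It is not OpenCL and does not touch a GPU: ToyDeviceBuffer, map_for_write, unmap and device_view are hypothetical names made up for illustration. It also picks one possible behaviour (the device keeps seeing the old contents until unmap), whereas the real spec only says the contents are not guaranteed in that window.

```python
class ToyDeviceBuffer:
    """Toy stand-in for a GPU buffer (hypothetical, not a real OpenCL object)."""

    def __init__(self, size):
        self._device = bytearray(size)  # pretend VRAM, zero-initialised
        self._staging = None            # host-side copy that exists only while mapped

    def map_for_write(self):
        """Hand out a host-side staging copy; writes land here, not in 'VRAM'."""
        if self._staging is not None:
            raise RuntimeError("buffer is already mapped")
        self._staging = bytearray(self._device)
        return self._staging

    def unmap(self):
        """Copy the staging area into 'VRAM' in one go (the pretend DMA transfer)."""
        if self._staging is None:
            raise RuntimeError("buffer is not mapped")
        self._device[:] = self._staging
        self._staging = None

    def device_view(self):
        """What the 'GPU' would see right now."""
        return bytes(self._device)


buf = ToyDeviceBuffer(4)
host = buf.map_for_write()
host[0:4] = b"\x01\x02\x03\x04"   # write through the mapping
before = buf.device_view()        # device side has not seen the write yet
buf.unmap()
after = buf.device_view()         # now the write is visible
```

In this model `before` is still all zeros and `after` holds the four bytes written through the mapping. A real driver is free to do something else entirely while the region is mapped, which is exactly why you shouldn't rely on reads before the unmap.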

In other words, OpenCL exposes this very limited buffer interface for compatibility reasons: this kind of MMIO is the lowest common denominator that any GPU claiming OpenCL compatibility has to implement. That does not preclude most discrete desktop GPUs from mapping their whole internal VRAM into the host's address space over the PCI bus. From what I understood after skimming several technical documents, this seems to be the common mechanism for host access to VRAM on modern ATI and NVidia GPUs. As far as I can tell, it is also the main reason behind the infamous 'memory hole' in 32-bit Windows (the inability to use more than 2.5-3 GB of RAM).

So I guess the correct answer to my initial question, why it's not possible to use tmpfs with VRAM, is that it would require tmpfs to allocate its memory in VRAM specifically. A patch to the tmpfs code that properly allocates memory in a VRAM buffer would then suffice, if we are willing to limit compatibility to 64-bit x86 with AMD/NVidia GPUs.
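On the 'memory hole' point, a back-of-the-envelope sketch of the arithmetic. The 1 GiB reservation is an assumed figure for illustration (say, a GPU aperture plus other device windows), not a measurement, and the function name is made up. It also assumes plain 32-bit physical addressing with no PAE or memory remapping.

```python
GIB = 1024 ** 3
ADDRESS_SPACE = 2 ** 32  # 4 GiB addressable with 32-bit physical addresses


def usable_ram(installed_bytes, mmio_reserved_bytes):
    """RAM still addressable once device MMIO windows take their share.

    Hypothetical helper: both RAM and device windows must fit in the
    same 4 GiB address space, so whatever the devices claim is lost to RAM.
    """
    return min(installed_bytes, ADDRESS_SPACE - mmio_reserved_bytes)


installed = 4 * GIB
mmio_reserved = 1 * GIB  # assumed: GPU aperture plus other PCI devices
usable = usable_ram(installed, mmio_reserved)
usable_gib = usable / GIB
```

With these assumed numbers `usable_gib` comes out at 3.0, which is in the range people typically report. A larger VRAM aperture pushes the figure down further.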