Isn't there a way around this? When writing graphics code that touches GPU-mapped memory, people usually take pains to turn off compiler optimizations that zero memory by XORing it against itself or ANDing it with 0, because those instructions read the memory first. https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-...

> Even the following C++ code can read from memory and trigger the performance penalty because the code can expand to the following x86 assembly code.
C++ code: *((int*)MappedResource.pData) = 0;
x86 assembly code: AND DWORD PTR [EAX],0
> Use the appropriate optimization settings and language constructs to help avoid this performance penalty. For example, you can avoid the xor optimization by using a volatile pointer or by optimizing for code speed instead of code size.

I guess mmapped files may still need a read to decide whether to do copy-on-write, whereas CPU-mapped memory in the GPU case is specifically marked upload-only and flagged so that writes go out regardless of whether anything changed. Does mmap have something similar? (Edit: this seems to say nothing similar is possible with mmap on x86: https://stackoverflow.com/questions/31014515/write-only-mapp... So how does it work for GPUs? Something to do with fixed PCIe support on the CPU, i.e. the base address register (https://en.wikipedia.org/wiki/PCI_configuration_space)?)
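A minimal C++ sketch of the volatile-pointer workaround the quoted docs mention. A plain buffer stands in for `MappedResource.pData`, and `zero_mapped_ints` / `upload_ints` are hypothetical names. The stage-then-memcpy variant is a common pattern, not something the quoted docs prescribe. Check the generated assembly for your own compiler rather than trusting this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: zero `count` ints in a CPU-mapped upload buffer.
// Stores through a volatile pointer should not be folded into a
// read-modify-write like `AND DWORD PTR [EAX],0`; the compiler is
// expected to emit one plain store per element instead.
void zero_mapped_ints(void* pData, std::size_t count) {
    assert(pData != nullptr);
    volatile int* dst = static_cast<volatile int*>(pData);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = 0;
    }
}

// Hypothetical alternative: build the data in ordinary CPU memory, then
// copy it into the mapped pointer in one sequential pass. memcpy reads
// only from `src`, never from the mapped destination.
void upload_ints(void* pData, const std::vector<int>& src) {
    assert(pData != nullptr);
    if (src.empty()) {
        return;  // avoid passing a possibly-null src.data() to memcpy
    }
    std::memcpy(pData, src.data(), src.size() * sizeof(int));
}
```

The volatile route costs you vectorization of the zeroing loop; the staging route keeps the fast paths but needs a second buffer.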
https://insujang.github.io/2017-04-27/gpu-architecture-overv... has a pretty good diagram of which component does what in GPU DMA. For mmap'd files, much of what happens there is handled almost entirely in software.
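For contrast, a minimal POSIX sketch (Linux or macOS assumed; `write_only_mapping_demo` is a hypothetical name) of asking mmap for a write-only mapping. The kernel accepts `PROT_WRITE` on its own, but the mmap(2) man page notes that on some architectures (e.g. i386) `PROT_WRITE` implies `PROT_READ`, and the first write to each page is serviced by a page fault in the kernel rather than by a hardware write path.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Request an anonymous private mapping with PROT_WRITE only, fill it, and
// unmap it. Returns true if every step succeeded.
bool write_only_mapping_demo(std::size_t len) {
    assert(len > 0 && "mmap rejects zero-length mappings");
    void* p = mmap(nullptr, len, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return false;
    }
    // First touch of each page faults into the kernel, which supplies a
    // zero-filled page: this "upload" path is kernel software.
    std::memset(p, 0xAB, len);
    // Deliberately no read-back: on x86 PROT_WRITE implies PROT_READ, so a
    // read would work there, but relying on that is not portable.
    return munmap(p, len) == 0;
}
```

For a file-backed `MAP_PRIVATE` mapping the first write would instead trigger copy-on-write, which needs the original page contents from the file, matching the guess above.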