Hacker News
by cma 1979 days ago
Isn't there a way around this? When writing graphics code that writes to GPU-mapped memory, people usually take pains to turn off compiler optimizations that might XOR memory against itself to zero it out, or AND it against 0, and so cause a read, and other things like that.

https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-...

> Even the following C++ code can read from memory and trigger the performance penalty because the code can expand to the following x86 assembly code. C++ code:

    *((int*)MappedResource.pData) = 0;

> x86 assembly code:

    AND DWORD PTR [EAX],0

> Use the appropriate optimization settings and language constructs to help avoid this performance penalty. For example, you can avoid the xor optimization by using a volatile pointer or by optimizing for code speed instead of code size.
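
A minimal sketch of the "volatile pointer" suggestion in the quote above. The helper name `zero_mapped_words` and its parameters are made up for illustration and are not part of D3D12. It only assumes `mapped` points at writable memory of at least `count` 32-bit words. Whether a given compiler then emits a plain store is worth checking in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of D3D12: zero `count` 32-bit words in a
// mapped upload buffer. Each store goes through a volatile lvalue, which is
// the construct the quoted documentation suggests to discourage the compiler
// from turning `*p = 0` into a read-modify-write such as `AND [mem], 0`.
inline void zero_mapped_words(void* mapped, std::size_t count) {
    assert(mapped != nullptr);
    volatile std::uint32_t* dst = static_cast<volatile std::uint32_t*>(mapped);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = 0u;  // write only; the value is never read back here
    }
}
```

With the quoted example this would be called as `zero_mapped_words(MappedResource.pData, 1)`.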

I guess mmapped files may still need a read to know whether to do copy-on-write, whereas the memory mapped for the CPU in the GPU case is specifically marked upload-only and flagged so that writes go through regardless of whether anything changed. But maybe mmap has something similar?

(edit: this seems to say nothing similar is possible with mmap on x86 https://stackoverflow.com/questions/31014515/write-only-mapp...

but how does it work for GPUs? Is it something to do with fixed PCIe support on the CPU (base address registers, https://en.wikipedia.org/wiki/PCI_configuration_space)?

5 comments

The answer is that it works pretty similarly, but GPUs usually do this in specialized hardware whereas mmap'ing of files for DMA-style access is implemented mostly in software.

https://insujang.github.io/2017-04-27/gpu-architecture-overv... has a pretty good visual of what's doing what for GPU DMA. You can imagine much of what happens here is almost pure software for mmap'd files.

You'd need a way to indicate when you start and end overwriting the page. You need to avoid the page being swapped out mid-overwrite and not read back in. You'd also pay a penalty for zeroing it when it gets mapped pre-overwrite. The map primitives are just not meant for this.
I think on Linux there's the madvise syscall with the MADV_REMOVE flag, which you can issue on memory pages you intend to completely overwrite. I have no idea about performance or other practical issues.
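
A rough sketch of that madvise idea, assuming Linux and a page-aligned range inside a shared, writable mapping. `discard_then_fill` is a hypothetical name used only for illustration. `MADV_REMOVE` frees the pages and their backing store, and not every filesystem supports it. Nothing in this thread establishes that it actually avoids the read-in cost.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the thread): discard the current contents of
// a page-aligned range in a shared, writable mapping, then overwrite it.
// Returns 0 on success. Returns -1 (errno set by madvise) if the kernel or
// filesystem refuses MADV_REMOVE, in which case nothing is written.
inline int discard_then_fill(void* addr, std::size_t len, unsigned char value) {
    assert(addr != nullptr);
    if (madvise(addr, len, MADV_REMOVE) != 0) {
        return -1;  // caller can fall back to a plain overwrite
    }
    std::memset(addr, value, len);  // pages fault back in zero-filled
    return 0;
}
```

On failure the caller can simply fall back to overwriting the range as usual.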
Oracle's JVM allocates your maximum heap size at startup, but these pages aren't actually assigned to either swap space or RAM pages until the first time they're written to (or read, but unless there's a bug in the JVM, it's not reading uninitialized memory), which triggers a page fault.

If the heap usage was high, and drops enough (maybe also needs to stay low for some time period), then Oracle's JVM will release some of the pages back to the OS using madvise, so they go back to using neither RAM nor swap space. On the one hand, the JVM should avoid repeatedly releasing pages back to the OS and then page faulting them back in moments later, but on the other, it shouldn't hold on to pages forever just because it needed them for a short time.

This has no relevance to parent's issue: how to avoid reads that cause expensive page faults when writing to a file-backed page.
Yea, I misread "issues" as "uses". Sorry.
As others have said, you need hardware support to do this similarly to how GPUs do it.

That being said, that hardware support exists with NVDIMMs.

I believe GPUs solve this by having read-only and write-only buffers in the rendering pipeline.