by pstrateman 3821 days ago
There's performance overhead to clearing memory.

If you were them would you take the performance hit?

6 comments

GPUs have loads of memory bandwidth. I can't imagine a framebuffer taking more than a few microseconds to clear.

For example, Nvidia claims that the GTX 980 has a memory bandwidth of 223 GB/s. (1920 * 1080 * 3)/223e9 ≈ 28us. Clearing all 4GB of VRAM would take 4/223 ≈ 18ms. This would have a negligible impact on user experience in most cases.
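
To make the estimate above easy to check, here is a minimal sketch of the same arithmetic. It assumes the 223 GB/s figure quoted in the comment, a 3-byte-per-pixel 1080p framebuffer, and 4 GB taken as 4e9 bytes; it ignores real-world effects such as command submission latency.

```python
def clear_time_seconds(num_bytes, bandwidth_bytes_per_s):
    """Idealized time to write num_bytes at a given sustained bandwidth."""
    return num_bytes / bandwidth_bytes_per_s

BANDWIDTH = 223e9                    # bytes/s, the figure quoted in the comment
framebuffer_bytes = 1920 * 1080 * 3  # 1080p at 3 bytes per pixel
vram_bytes = 4e9                     # 4 GB, decimal, matching the 4/223 shorthand

fb_us = clear_time_seconds(framebuffer_bytes, BANDWIDTH) * 1e6
vram_ms = clear_time_seconds(vram_bytes, BANDWIDTH) * 1e3

print(f"framebuffer clear: {fb_us:.1f} us")  # 27.9 us
print(f"full VRAM clear:   {vram_ms:.1f} ms")  # 17.9 ms
```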

I guess the driver could also erase memory in the background as soon as it is deallocated, with zero user impact.

Performance is important because people use benchmarks to choose between vendors.

And even if it's <0.1%, there is a strong "optimization mentality" in those companies (because perf matters) so it's unlikely to happen in the current climate.

It's high bandwidth but also high latency. If a page was cleared every time it was allocated, it would cause very unpredictable performance because the CPU would have to tell the GPU to clear memory and then wait for the GPU to finish before any other operations on the buffer could be done. This definitely isn't something that should happen for every allocated page. It might be acceptable if this happened only to pages previously used by other processes. But it would still be unpredictable and could cause unwanted stalls in the middle of a game session, for example.

Also note that memory bandwidth is typically the bottleneck in modern games.

The best place to do this would be in the browser, clearing out any textures and buffers before deallocating them if the contents are deemed private.

>because the CPU would have to tell the GPU to clear memory and then wait for the GPU to finish before any other operations on the buffer could be done

If you're doing write-only operations, the CPU can queue them behind the clear. If you do a read, then the CPU has to wait whether you clear or not.
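
A toy model of that ordering argument, assuming an in-order command queue (the `CommandQueue` class and its methods are made up for illustration and are not any real driver API): the CPU submits the clear and the write without waiting, and only a read forces it to drain the queue.

```python
from collections import deque

class CommandQueue:
    """Hypothetical in-order queue: submit() returns at once, drain() executes."""
    def __init__(self):
        self.pending = deque()

    def submit(self, command):
        # The CPU only enqueues the command; it does not wait for it to run.
        self.pending.append(command)

    def drain(self):
        # The "GPU" runs commands strictly in submission order.
        while self.pending:
            self.pending.popleft()()

buffer = bytearray(b"\xAA" * 16)  # stale contents from a previous owner

def clear_buffer():
    buffer[:] = bytes(len(buffer))

def write_data():
    buffer[0:4] = b"DATA"

queue = CommandQueue()
queue.submit(clear_buffer)  # the clear is queued first
queue.submit(write_data)    # the write lands behind it, with no CPU stall
queue.drain()               # a read would have to wait here either way
print(bytes(buffer))
```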

Latency doesn't matter. Clearing can be slotted in with other operations, such as first use.

As the article points out, every modern OS clears (main) memory before handing it over to a new process. The cost is often mitigated a bit by using spare CPU cycles to zero out free pages, and by keeping a buffer of such pages. You only need to pause to zero pages if you have sustained 100% CPU usage for a long time - and that's pretty rare on most machines. GPUs probably even more so.
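
A rough sketch of that scheme, not how any particular OS or driver implements it: freed pages go on a dirty list, idle time moves them to a pre-zeroed list, and an allocation only stalls when the pre-zeroed list is empty. The `ZeroedPagePool` name and its methods are hypothetical.

```python
from collections import deque

PAGE_SIZE = 4096

class ZeroedPagePool:
    """Hypothetical pool that zeroes freed pages in the background."""
    def __init__(self):
        self.dirty = deque()  # freed pages that still hold stale contents
        self.clean = deque()  # pages that have already been zeroed

    def free(self, page):
        self.dirty.append(page)

    def idle_tick(self, budget=1):
        """Zero up to `budget` dirty pages using spare cycles; return the count."""
        done = 0
        while self.dirty and done < budget:
            page = self.dirty.popleft()
            page[:] = bytes(len(page))
            self.clean.append(page)
            done += 1
        return done

    def allocate(self):
        """Return (page, stalled); stalled is True if we had to zero synchronously."""
        if self.clean:
            return self.clean.popleft(), False
        if self.dirty:
            page = self.dirty.popleft()
            page[:] = bytes(len(page))
            return page, True
        return bytearray(PAGE_SIZE), False  # a fresh bytearray is already zeroed

pool = ZeroedPagePool()
pool.free(bytearray(b"\x42" * PAGE_SIZE))
pool.idle_tick()                    # zeroing happens while the system is idle
page, stalled = pool.allocate()     # served from the pre-zeroed list

pool.free(bytearray(b"\x42" * PAGE_SIZE))
page2, stalled2 = pool.allocate()   # no idle time in between, so this one stalls
print(stalled, stalled2)
```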

GPUs generally have less memory, many fewer (but larger) allocations, and way higher memory bandwidth than CPUs, so it shouldn't be a problem for them to do this.

That's a loaded question.

If I (with training in how to design an OS and the risks of handing nonzeroed pages to another process) were them? It'd be part of my standard process for designing a memory repurposing library. But I can 100% understand how this mistake gets made; I wouldn't be surprised if it wasn't an explicit performance decision.

There's performance overhead to doing everything, but one way to simulate zeroed virtual memory is to map each page to a shared zero page. Then, when a full-page write occurs, simply write the page, and when a partial write occurs, write the data and zero the rest of the page.

I am not familiar enough with GPU internals, but my understanding is that the GPU should be smart enough to know that a given texture or framebuffer will occupy n full pages, and so when either is written in its entirety, the zeroing only has to occur at the edges. (I would assume that the write would start on a page boundary, but I don't know anything about GPU internals.)
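
Here is a small software model of the zero-page idea, offered as a sketch rather than a description of real GPU hardware. Untouched pages read as zeros, a full-page first write overwrites the stale data directly, and a partial first write zeroes only the bytes it does not cover. The `LazyZeroBuffer` class is invented for illustration.

```python
PAGE_SIZE = 4096
ZERO_PAGE = bytes(PAGE_SIZE)  # shared read-only page of zeros

class LazyZeroBuffer:
    """Hypothetical buffer whose pages appear zeroed until first written."""
    def __init__(self, stale_pages):
        # stale_pages: physical pages that may still hold a previous owner's data
        self.physical = stale_pages
        self.touched = [False] * len(stale_pages)

    def read(self, index):
        # Untouched pages behave as if mapped to the shared zero page.
        return bytes(self.physical[index]) if self.touched[index] else ZERO_PAGE

    def write(self, index, offset, data):
        end = offset + len(data)
        if offset < 0 or end > PAGE_SIZE:
            raise ValueError("write must fit inside one page")
        page = self.physical[index]
        if not self.touched[index]:
            # First write: zero only the bytes this write does not cover.
            # For a full-page write both slices are empty, so nothing is zeroed.
            page[:offset] = bytes(offset)
            page[end:] = bytes(PAGE_SIZE - end)
            self.touched[index] = True
        page[offset:end] = data

stale = [bytearray(b"\xAA" * PAGE_SIZE) for _ in range(2)]
buf = LazyZeroBuffer(stale)
before = buf.read(0)                  # reads zeros, not the stale 0xAA bytes
buf.write(0, 8, b"hi")                # partial write: the rest gets zeroed
buf.write(1, 0, b"\x01" * PAGE_SIZE)  # full-page write: no zeroing needed
partial = buf.read(0)
full = buf.read(1)
```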

Caveat emptor: I will reiterate I know very little about memory internals. It seems like a bigger issue is that GPU memory is not virtualized and all users get access to the same memory. It's as if three decades of understanding the utility of virtual memory were forgotten.

> It seems like a bigger issue is that GPU memory is not virtualized and all users get access to the same memory. It's as if three decades of understanding the utility of virtual memory were forgotten.

I think you're forgetting the most important thing here - GPUs are meant to be fast. Virtualization will add like what, an order of magnitude to the access times?

GPUs have had MMUs for a while (though they don't recover from page faults the same way CPUs do, I don't believe).
Last I checked, they pretty much don't recover from page faults, they just abort whatever "program" you're trying to run, so you can't really use the MMU for clever things like demand paging. But that's not the point of the GPU's MMU in the first place.
> Last I checked, they pretty much don't recover from page faults, they just abort whatever "program" you're trying to run

Correct. When the GPU page faults, it causes a CPU interrupt and the driver will handle the interrupt. It's not possible to resume execution on a GPU in a timely manner so the only option is to terminate the process that caused the page fault.

> so you can't really use the MMU for clever things like demand paging

Recent GPU generations support "sparse" or "tiled" memory where the GPU can detect if a load or a store would access non-resident memory and then act accordingly. This requires a specialized shader and some CPU-side logic to actually stream in the memory. This can be used to implement on-demand paging for textures and buffers, as well as workarounds that reduce visual artifacts from streaming.

GPUs don't have a page fault handler; when there's a page fault, it's an unrecoverable crash. Accordingly, zero-on-allocate (or potentially zero-on-free, but that makes assumptions about startup and teardown that may not be true) is the only way to do it.

Yes. Security and correctness should always come before performance. Performance first is the thinking that got us security vulnerabilities everywhere.

It's also the thinking that got us Doom. ;) There are use cases where performance trumps security; the only issue here is that "multi-app semi-trusted computing environment" isn't one of them.

What security compromises do you claim Doom made to achieve greater performance?

I don't think the parent is claiming there are, the point is that Doom wouldn't have been possible without coding with performance as a priority, but that's "ok" because in a lot of applications there isn't a permissions differential to worry about.

I'm curious about this as well.

Interested in details about this; where were Doom's major security issues? I love reading about Carmack/Doom development in general, maybe I've missed the part where caution was thrown to the wind and security was ignored.

Seems like there never were security issues. At least, none that were talked about widely. I think that should be expected for a single player game. Almost all games are hack-able in some way of course but hacking a single player game is mostly an exercise in replay-ability.

On a related note: Doom apparently does contribute to security proof of concepts though. http://www.techtimes.com/articles/15606/20140916/security-ex...

Which makes me wonder if the non-clearing memory issue exists for the printer's video driver and whether that could be used to retrieve something like a saved password or ssh key.

[citation needed]
It only needs to do it when first handing out a particular bit of memory to a particular process. The vast majority of the time, a process will be receiving memory it's had before (and no clearing is required). When not, it will be initializing the memory as part of the creation step (and no clearing is required), or it will be doing something it's not going to be doing all that often, such as creating a whole new frame buffer (and the clearing isn't a problem). I'm not convinced this would be a huge performance hit. Modern GPUs are not exactly slow at clearing memory either.

Maybe the system doesn't pass enough information through to the driver to let it determine this, though...
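
To illustrate the policy described in this comment, here is a minimal sketch of an allocator that remembers each block's last owner and clears a block only when it changes hands. It assumes the allocator can see a process identifier, which is exactly the information the comment is unsure the driver receives; `OwnerTrackingAllocator` and its fields are hypothetical.

```python
class OwnerTrackingAllocator:
    """Hypothetical allocator that clears a block only on a change of owner."""
    def __init__(self, num_blocks, block_size=4096):
        self.blocks = [bytearray(block_size) for _ in range(num_blocks)]
        self.last_owner = [None] * num_blocks  # None means never handed out
        self.free_list = list(range(num_blocks))
        self.clears = 0  # how many times we actually had to zero a block

    def allocate(self, pid):
        if not self.free_list:
            raise MemoryError("no free blocks")
        # Prefer a block this process held before: nothing to hide from itself.
        for pos, idx in enumerate(self.free_list):
            if self.last_owner[idx] == pid:
                self.free_list.pop(pos)
                return idx
        idx = self.free_list.pop(0)
        if self.last_owner[idx] is not None:
            # The block belonged to a different process: zero it before reuse.
            self.blocks[idx][:] = bytes(len(self.blocks[idx]))
            self.clears += 1
        self.last_owner[idx] = pid
        return idx

    def free(self, idx):
        self.free_list.append(idx)

alloc = OwnerTrackingAllocator(num_blocks=1)
a = alloc.allocate("game")
alloc.blocks[a][:4] = b"key!"
alloc.free(a)

b = alloc.allocate("game")        # same owner again: no clear
clears_same_owner = alloc.clears
alloc.free(b)

c = alloc.allocate("browser")     # new owner: the block is zeroed first
leaked = bytes(alloc.blocks[c][:4])
print(clears_same_owner, alloc.clears, leaked)
```

In this toy version the common case (a process getting back memory it already had) costs nothing extra, and the clear is paid only on the rarer cross-process handoff.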