by SturgeonsLaw 3812 days ago
> there is no workable solution

Forgive my ignorance (not a graphics programmer), but why can't the drivers simply clear the buffer before handing it off to another application?


Pretty sure it has to do with benchmarks and the cut-throat competitive environment GPU manufacturers exist in, where you cut every corner you can to proclaim "We're the fastest!"

Not zeroing a buffer cuts a big constant out of overhead. If you know which of the benchmarks will fail if you don't zero the buffer, you code in an "exception" so the benchmark doesn't fail and other applications act wonky. This isn't the "first time" nVidia has been caught doing this, see:

http://www.geek.com/games/is-nvidia-cheating-on-benchmarks-5...

Apparently, AMD also partakes in benchmark-specific "optimizations" [1]. Transparency is why many of us push for open source drivers.

[1] http://www.cdrinfo.com/Sections/News/Details.aspx?NewsId=288...

Both AMD and NVidia drivers have special code paths for different applications. I don't think it's anything sinister since these are mostly fixes for the game bugs and the rest are resolutions for API ambiguities.

To give an example, consider the difference between memcpy() and memmove(). On most systems memcpy() behaves the same as memmove() in the sense that it works even when the source and destination overlap. Then you decide to optimize memcpy(), and to prevent bugs like this https://bugzilla.redhat.com/show_bug.cgi?id=638477 you need to set a flag USE_MEMMOVE_INSTEAD_MEMCPY for every app that you know calls memcpy() on overlapping regions. You could call this "cheating", or you could be a reasonable person and say something like this https://bugzilla.redhat.com/show_bug.cgi?id=638477#c129 instead.
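
As a rough illustration of the memcpy()/memmove() point, here is a minimal C sketch. The helper names (copy_forward, app_needs_memmove, compat_copy) and the application name "legacy_player" are made up for illustration; only memmove() and the idea of a per-app USE_MEMMOVE_INSTEAD_MEMCPY flag come from the comment. The sketch never calls memcpy() on overlapping memory, since that is undefined behavior; a plain forward loop stands in for a copy routine that assumes no overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "fast" copy: walks from low to high addresses and assumes
   the regions do not overlap, as memcpy() is allowed to assume. */
static void copy_forward(char *dst, const char *src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical per-application quirk list, standing in for the
   USE_MEMMOVE_INSTEAD_MEMCPY flag mentioned in the comment. */
static const char *const overlap_quirk_apps[] = { "legacy_player" };

/* Returns 1 if the named application is known to copy between
   overlapping regions, 0 otherwise. */
static int app_needs_memmove(const char *app_name) {
    size_t count = sizeof overlap_quirk_apps / sizeof overlap_quirk_apps[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(app_name, overlap_quirk_apps[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Copy on behalf of an application: use the overlap-safe memmove() only
   for applications on the quirk list, the fast path for everyone else. */
static void compat_copy(const char *app_name, char *dst,
                        const char *src, size_t n) {
    if (app_needs_memmove(app_name)) {
        memmove(dst, src, n);
    } else {
        copy_forward(dst, src, n);
    }
}
```

With a buffer holding "abcdef", copying 5 bytes from offset 0 to offset 1 gives "aaaaaa" through the forward loop and "aabcde" through memmove(), which is the kind of silent behavior change the quirk flag papers over.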

As for the original question: I am not an expert on the Windows driver model, but I have written some GPU drivers and can tell you that a) memory release is asynchronous, i.e. you cannot reuse the memory until the GPU finishes using it, and b) clearing graphics memory from the CPU over PCIe is slow, and drivers, in general, do not program the GPU on their own. Taking these into account, it seems the driver is not well positioned to do this, and it is a task for the OS instead.

> I don't think it's anything sinister since these are mostly fixes for the game bugs and the rest are resolutions for API ambiguities.

I think it is a big problem. It is the same as forcing Intel to change their CPU to work around bugs in your application.

This analogy is pretty spot on, actually. It's the result of a long process of software evolution that went awry, and it is a big reason why we need new APIs like Mantle, Vulkan and DirectX 12.

See this fascinating post:

http://www.gamedev.net/topic/666419-what-are-your-opinions-o...

You mean something like this https://en.wikipedia.org/wiki/A20_line ?

One could argue that address overflow above 1MB was not a bug but a feature of the early real-mode CPUs, and hence (ab)using it wasn't really a bug either.

Probably even Intel didn't anticipate protected mode with its 24-bit address bus when designing the 8086. 1MB was enough for everyone at the time.

This probably won't happen, but it seems that games programmers are a large cause of problems for driver writers. Having to work around bugs in games is bad for everyone.

Game studios, IMO, should be made to fix their bugs themselves. They all have patching mechanisms these days, so it's not like it's impossible, or even infeasible.

Getting this many problems out of the driver is currently being worked on at the API level with DX12 and Vulkan. The point is to remove a huge chunk of the abstraction provided by DX/OpenGL, thus forcing the dev to write more sensible code.

Currently the engine developer in graphics programming writes something and in reality has no way of knowing what actually happens on the hardware (the API is just too high level to be able to really know much). From there it is the hardware provider's job to take out their own debugging tools and make sure the correct things happen by having a custom code path in the driver.

It's a bit of the opposite, actually. There was a great article posted here (titled "Why I'm excited for Vulkan") where they explain how proprietary "tricks" GPU vendors use account for much of the necessity for game specific driver updates and optimizations. Game patches are to game bugs what driver updates (or "game profiles") are to what?

Lower level APIs like DX12 and Vulkan remove the competitive advantage vendor dependent performance creates, so well-coded games can perform consistently with lower overhead across ranges of hardware without having to rely on vendors to patch in the shortcuts through their drivers.

Currently, it's like filming a movie with IMAX specifications, then finding out that at different cinema chains it played with quality aberrations because their projectors didn't truly follow IMAX spec. The chains can fix it, but you're already getting blamed for the movie's issues. However, for a little money, on your next film they offer to work closely with you to ensure it shows the way you intended in their theaters. And no, they can't just tell you how to fix it-- their projection technology is a trade secret.

Eh, given the constraints as you spell them out, it seems a clear at release time, driven by a shader, could work.

This is probably because my explanation is very brief. I don't see how a shader (a program running on the GPU) can detect that the OS has killed a process and initiate a clear.

A shader is a program executed by the GPU and can manipulate memory; the driver can create a fake surface out of the freed memory and run the shader on it (which would avoid the need to zero the memory from the CPU through PCIe).

I don't get it, though. How can the reason be due to benchmarks / performance seeking?

The driver simply has to zero the buffer when the new OpenGL / graphics context is established. It's once per application establishing a context, not per-frame (the application is responsible for per-frame buffer clearing and the associated costs). At worst this would lengthen the amount of time a GPU-using application takes to start up and open new viewports, but that hardly seems like it would matter or even register on any benchmarks.
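
One way to picture the clear-on-handoff idea being debated here is the toy C pool below. It is a CPU-side sketch only: Pool, Slot, pool_init, pool_acquire and pool_release are hypothetical names, the integer owner stands in for a process or graphics context, and the sizes are arbitrary. A real driver would also have to wait for the GPU to finish with the memory before reusing it, which this ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SLOTS 4
#define SLOT_BYTES 64
#define NO_OWNER (-1)

/* Hypothetical stand-in for a chunk of driver-managed video memory. */
typedef struct {
    unsigned char bytes[SLOT_BYTES];
    int in_use;
    int last_owner; /* id of the previous owner, or NO_OWNER */
} Slot;

typedef struct {
    Slot slots[POOL_SLOTS];
} Pool;

static void pool_init(Pool *pool) {
    assert(pool != NULL);
    memset(pool, 0, sizeof *pool);
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        pool->slots[i].last_owner = NO_OWNER;
    }
}

/* Hand out the first free slot. The contents are zeroed only when the
   slot crosses an ownership boundary, so an owner recycling its own
   buffers does not pay for the clear. Returns NULL if the pool is full. */
static unsigned char *pool_acquire(Pool *pool, int owner) {
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        Slot *slot = &pool->slots[i];
        if (slot->in_use) {
            continue;
        }
        if (slot->last_owner != owner) {
            memset(slot->bytes, 0, SLOT_BYTES);
        }
        slot->in_use = 1;
        slot->last_owner = owner;
        return slot->bytes;
    }
    return NULL;
}

/* Mark a slot as free again; its old contents stay in place until the
   next acquire decides whether a clear is needed. */
static void pool_release(Pool *pool, unsigned char *buf) {
    assert(buf != NULL);
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (pool->slots[i].bytes == buf) {
            pool->slots[i].in_use = 0;
            return;
        }
    }
}
```

In this sketch the clear happens once per change of owner rather than on every allocation, which is the cheaper end of the trade-off the replies argue about.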

The thing is, it's probably not once per application. I'd imagine using multiple frame buffers in an application is actually quite common, and the set could change quite often while an application is running, especially in complex applications like games. It's probably not enough of a hit to really justify not clearing the buffer, but it's enough to be noticeable in the benchmark race.

Across an entire computer system? For all applications? Even games? Sharing data across process boundaries is undesirable, but is something most computer users would accept if the alternative was reduced performance.

Why not just fix this in the browser? The real issue here is that this data isn't just being shared across processes but potentially with websites through malicious WebGL.

If you can waste the time on allocating a buffer, you can waste the time on zeroing it. If you're in a hot loop you shouldn't be allocating giant chunks of memory.

It would be interesting to know how the cost of allocating a new FBO would compare to the cost of zeroing it out. My guess is that the cost of getting into the kernel to do the allocation in the first place would dominate, but by how much would be something neat to measure.

If it's costing a lot of time to clear a buffer, doesn't that tend to indicate that's something the video card manufacturer should design an enhancement or fix for?