Hacker News new | ask | show | jobs
by pandaman 3811 days ago
Both AMD and NVidia drivers have special code paths for different applications. I don't think it's anything sinister since these are mostly fixes for the game bugs and the rest are resolutions for API ambiguities.

To give an example, consider the difference between memcpy() and memmove(). On most systems memcpy() is as same as memmove() in the sense it works even when the source and destination overlap. Then you decide to optimize memcpy and to prevent bugs like this https://bugzilla.redhat.com/show_bug.cgi?id=638477 you will need to set a flag USE_MEMMOVE_INSTEAD_MEMCPY for every app that you know to memcpy between overlapped regions. You could call this "cheating" or could be a reasonable person and say something like this https://bugzilla.redhat.com/show_bug.cgi?id=638477#c129 instead.

As for the original question. I am not an expert on the windows driver model but have written some GPU drivers and can tell that a) memory release is asynchronous i.e. you cannot reuse the memory until the GPU finishes using it and b) clearing graphics memory from CPU over the PCIe is slow and drivers, in general, do not program GPU on their own. Taking these into account, it seems the driver is not well positioned to do this and this is a task for the OS instead.

3 comments

"I don't think it's anything sinister since these are mostly fixes for the game bugs and the rest are resolutions for API ambiguities." I think it is big problem. It is the same as forcing intel to change their CPU to workaround bugs in your application.
This analogy is pretty spot on, actually. It's a result of a long process of software evolution that went awry and this is a big reason why we need new APIs like Mantle, Vulkan and DirectX 12.

See this fascinating post:

http://www.gamedev.net/topic/666419-what-are-your-opinions-o...

You mean something like this https://en.wikipedia.org/wiki/A20_line ?
One could argue that address overflow above 1MB was not a bug, but a feature of the early real-mode CPUs and hence (ab)using it wasn't really a bug either.

Probably even Intel didn't anticipate protected mode with its 24 bit address bus when designing the 8086. 1MB was enough for everyone at this time.

Exactly my point. Intel was "forced" to fix a bug in software by changing its hardware. The A20 gate was not to prevent programs from accessing "HMA" it was to fix programs, which generated addresses above 0xfffff and expected it to wrap around.
This probably won't happen, but it seems that games programmers are a large cause of problems for driver writers. Having to workaround bugs in games is bad for everyone.

Games Studios, IMO, should be made to fix their bugs themselves. They all have patching mechanisms these days, so it's not like it isn't impossible, or even unfeasible.

Not having this much problems fixed in the API is being currently worked on with DX12 and Vulcan. The point being removing a huge bunch of the abstraction provided by dx/opengl and thus forcing the dev to write more sensible code.

Currently the engine developer in graphics programming writes something and in reality he has no way of knowing what actually happens on the hardware (the API is just too high level to able to really know much). From there it is the hardware providers job to take out their own debugging tools and make sure correct things happen by having a custom code path in the driver.

It's a bit of the opposite, actually. There was a great article posted here (titled "Why I'm excited for Vulkan") where they explain how proprietary "tricks" GPU vendors use account for much of the necessity for game specific driver updates and optimizations. Game patches are to game bugs what driver updates (or "game profiles") are to what?

Lower level APIs like DX12 and Vulkan remove the competitive advantage vendor dependent performance creates, so well-coded games can perform consistently with lower overhead across ranges of hardware without having to rely on vendors to patch in the shortcuts through their drivers.

Currently, it's like filming a movie with IMAX specifications, then finding out that at different cinema chains it played with quality aberrations because their projectors didn't truly follow IMAX spec. The chains can fix it, but you're already getting blamed for the movie's issues. However, for a little money, on your next film they offer to work closely with you to ensure it shows the way you intended in their theaters. And no, they can't just tell you how to fix it-- their projection technology is a trade secret.

Eh given the constraints as you spell them out it seems a clear at the release moment driven by a shader could work.
This is probably because my explanation is very brief. I don't see how a shader (a program running on the GPU) can detect that the OS has killed a process and initiate a clear.
shader is a program executed by the gpu and can manipulate the memory, driver can create a fake surface out the freed memory and run the shader on it (which would avoid the need of zeroing the memory from the cpu trough the pcie)
Well, this is the whole point - how driver knows which memory is freed and how driver runs a shader by itself?
when you do a release on a texture object, when the context is destroyed, when the glDeleteTextures is called.. you just have to enumerate it all, but eventually all functions are passed to the graphic drivers to be translated into gpu operations.
It's as same as saying that a HDD driver can zero deleted files and delete temporary files when a process is killed because it translates API calls into HDD controller commands.