> and the display server will wait on the relevant fences (on the CPU side, which is responsible for most GPU scheduling tasks) before starting any render task that would texture that buffer or before attempting scanout from the buffer.

Given the fetch and compute performance per watt of modern GPUs, I'm still surprised that the power saved by reducing overdraw is anything but negligible. And if you're talking about pipeline stalls, wouldn't pixel data shuttling over the bus between the GPU and CPU be a much bigger deal?

> ^2: One could implement this instead by creating a different render list for just the area the blur needs to sample, in the hopes that this will render much faster and avoid waiting on completion of the primary buffer, but that would be an app specific optimization with a lot of limitations that may end up being much slower in many scenarios.

It looks like Apple Silicon avoids the overdraw problem with tile-based deferred rendering (TBDR), and the tile system would efficiently manage the dependency chain right back to the desktop background if needed. So if a browser is maximised over a bunch of other windows, only small portions of the underlying render targets are ever sampled, with no intermediate CPU rendering.

To me, Apple's flex here is that they can do this efficiently, because their rendering pipeline is likely fully GPU-driven and resource-efficient in a way that typical display servers and GPUs elsewhere can't match. For this to work on Linux or Windows, the display servers would need a complete refactoring, and it would only benefit GPUs with TBDR, which seem to be nonexistent outside Apple Silicon (and the PowerVR GPUs in Apple's older chips).
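A minimal sketch of the arithmetic behind the point that only part of each underlying render target needs to be sampled: a behind-window blur of radius r reads from the layers below over the blurred rectangle grown by r on every side, so each underlying buffer contributes only its intersection with that footprint. The `Rect` type, the `sampled_regions` helper, the window geometry and the 20 px radius are all hypothetical, and the sketch ignores occlusion by opaque windows in between, so the fractions it reports are upper bounds. It says nothing about how macOS actually schedules this work.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned screen rectangle in pixels; x1/y1 are exclusive (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)

    def expand(self, r: int) -> "Rect":
        # A blur kernel of radius r reads r pixels past each edge of the blurred area.
        return Rect(self.x0 - r, self.y0 - r, self.x1 + r, self.y1 + r)

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        x0, y0 = max(self.x0, other.x0), max(self.y0, other.y0)
        x1, y1 = min(self.x1, other.x1), min(self.y1, other.y1)
        if x0 >= x1 or y0 >= y1:
            return None  # no overlap: this layer is never sampled
        return Rect(x0, y0, x1, y1)


def sampled_regions(
    blur_rect: Rect, radius: int, layers_below: Dict[str, Rect]
) -> Dict[str, Tuple[Rect, float]]:
    """Hypothetical helper: for each layer under a behind-window blur, return the
    region the blur must read and the fraction of that layer it represents.
    Ignores occlusion by opaque layers in between, so fractions are upper bounds."""
    footprint = blur_rect.expand(radius)
    result: Dict[str, Tuple[Rect, float]] = {}
    for name, rect in layers_below.items():
        hit = footprint.intersect(rect)
        if hit is not None:
            result[name] = (hit, hit.area() / rect.area())
    return result


if __name__ == "__main__":
    # Hypothetical scene: maximised browser on a 2560x1440 screen with a
    # 300 px wide blurred sidebar, blur radius 20 px.
    sidebar = Rect(0, 0, 300, 1440)
    below = {
        "desktop": Rect(0, 0, 2560, 1440),
        "terminal": Rect(100, 100, 900, 700),
        "mail": Rect(1200, 200, 2000, 900),
    }
    for name, (region, frac) in sampled_regions(sidebar, 20, below).items():
        print(f"{name}: reads {region}, {frac:.1%} of the layer")
```

In this made-up scene the desktop background contributes 12.5% of its pixels, the terminal 27.5%, and the mail window nothing at all, which is the kind of partial dependency a tiled renderer could in principle resolve without waiting on whole buffers.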