Hacker News
by artemisart 294 days ago
No, you never compute individual pixels because you never need to, and it's always faster to do it in bulk (vectorization, memory access...). So over an area you take the same number of pixels as input (or a few more with padding), and the blur only significantly increases the compute.

You misunderstood: this is not about computing individual pixels, but about selectively rerendering only the graphical elements that have changed and, in turn, figuring out the total area of change. This propagates through the entire stack to let the GPU scanout hardware know which tiles have changed, and allows partial panel self refresh updates (depending on hardware).

Rendering is still done in bulk for the changed areas, avoiding rendering expensive elements (e.g., transformed video buffers, deeply layered effects, expensive shaders). It's a fundamental part of most UI frameworks.

Are windowed GUIs still doing diffed screen updates? I would have assumed that GPUs make this kind of thing very unrewarding to implement as an optimisation. I'd imagine every window is being redrawn every frame as a 2D billboard with textures and shaders.

The Gaussian blur and lensing effects would still slow things down by needing to fetch pixels from the render target to compute the fragment, vs painting opaque pixels.
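To put rough numbers on the extra fetches, here is a small sketch counting texture reads per output pixel for a blur of a given radius. The function name `blur_taps` is hypothetical, and real implementations often cut the count further (downsampling, bilinear tap tricks), so treat this as an upper-bound illustration only.

```python
def blur_taps(radius: int) -> dict:
    """Texture reads per output pixel for a square blur kernel of this radius."""
    width = 2 * radius + 1          # kernel spans the centre pixel plus radius on each side
    return {
        "opaque": 0,                # painting an opaque pixel reads nothing back
        "naive_2d": width * width,  # one pass, full 2D kernel
        "separable": 2 * width,     # horizontal pass + vertical pass
    }


if __name__ == "__main__":
    print(blur_taps(8))
```

Even the separable form reads dozens of pixels that must already have been rendered, which is the dependency being described.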

The usual mechanism is to mark widgets that changed dirty, accumulate the bounding boxes of such dirty areas, take the next swapchain buffer and get its invalid regions, iterate through the widget tree and render anything that intersects with the bounding box or invalid regions, and submit the buffer + the dirty areas to the display server/driver.
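A minimal sketch of that loop, assuming axis-aligned rectangles and a flat widget list instead of a real tree. `Rect`, `Widget`, and `collect_repaint` are hypothetical names for illustration, not any toolkit's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def intersects(self, other: "Rect") -> bool:
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

    def union(self, other: "Rect") -> "Rect":
        x0, y0 = min(self.x, other.x), min(self.y, other.y)
        x1 = max(self.x + self.w, other.x + other.w)
        y1 = max(self.y + self.h, other.y + other.h)
        return Rect(x0, y0, x1 - x0, y1 - y0)


@dataclass
class Widget:
    name: str
    bounds: Rect
    dirty: bool = False


def collect_repaint(widgets: List[Widget],
                    buffer_invalid: List[Rect]) -> Tuple[Optional[Rect], List[str]]:
    """Return (damage, widgets_to_render) for one frame.

    damage is the bounding box of all dirty widgets (None if nothing changed);
    buffer_invalid lists the stale regions of the swapchain buffer we were handed.
    """
    damage: Optional[Rect] = None
    for w in widgets:
        if w.dirty:
            damage = w.bounds if damage is None else damage.union(w.bounds)

    regions = list(buffer_invalid)
    if damage is not None:
        regions.append(damage)

    # Render only what touches the damage or the buffer's invalid regions.
    to_render = [w.name for w in widgets
                 if any(w.bounds.intersects(r) for r in regions)]
    return damage, to_render
```

The returned damage rectangle is what would be handed to the display server along with the buffer; an expensive widget such as a video surface is skipped entirely unless it overlaps one of the regions.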

And yeah, having a render step depend on the output of a previous non-trivial render step is Bad™.

I was under the impression that for GPU accelerated GUIs, all windows are rendered to a render target. It might be that windows underneath have gone to sleep and aren't updating, but they would have their last state rendered to a texture. This permits things like roll-over previews and layered effects to have a more trivial overhead.

Software renderers typically do the optimisation you're suggesting to reduce memory and CPU consumption, and this was a bigger deal back in the day when they were the only option. I think some VNC-like protocols benefit from this kind of lazy rendering, but the actual VNC protocol just diffs the entire frame.

On the GPU, the penalty for uploading textures via the bus negates the benefit, and the memory and processing burden is minimal relative to AAA games which are pushing trillions of pixel computations, and using GBs of compressed textures. GPUs are built more like signal processors and have quite large bus sizes, with memory arranged to make adjacent pixels more local to each other. Their very nature makes the kinds of graphics demands of a 2D GUI very negligible.

> I was under the impression that for GPU accelerated GUIs, all windows are rendered to a render target.

Each window renders to one or more buffers that it submits to the display server, which will then be either software or hardware composited ("software" here referring to using the GPU to render a single output buffer vs. having the GPU scanout hardware stitch the final image together from all the source buffers directly).

Note that in the iPhone cases, the glass blur is mostly an internal widget rendered by the app, what is emitted to the display server/hardware is opaque.

> It might be that windows underneath have gone to sleep and aren't updating,

The problem with blur is when content underneath does update, it requires the blur to also update, and rendering of it cannot start until the content underneath completed rendering.
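One way to model that dependency is to grow the damage region whenever a change lands under (or within sampling distance of) a blurred panel. This is a sketch under my own assumptions: rectangles are `(x0, y0, x1, y1)` tuples, the helper names are hypothetical, and the whole panel is conservatively repainted instead of just the affected strip.

```python
def inflate(r, pad):
    """Grow a rectangle by pad pixels on every side."""
    x0, y0, x1, y1 = r
    return (x0 - pad, y0 - pad, x1 + pad, y1 + pad)


def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def damage_with_blur(damage, blur_bounds, radius):
    """Return the damage to repaint once a blurred panel is taken into account.

    The blur samples up to `radius` pixels outside its own bounds, so a change
    anywhere inside the inflated bounds invalidates the blurred output too.
    """
    if overlaps(damage, inflate(blur_bounds, radius)):
        return union(damage, blur_bounds)
    return damage
```

The ordering constraint comes on top of this: the blur pass can only run after the content inside the enlarged region has finished rendering.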

> Software renderers typically do the optimisation you're suggesting to reduce memory and CPU consumption,

I am solely speaking about GPU-accelerated rendering, where this optimization is critical for power efficiency. It's also required to propagate all the way down to the actual scanout hardware.

It also applies to CPU rendering (and GPU-accelerated rendering still renders many assets on the CPU), but that's not what we're talking about here.

> I think some VNC-like protocols benefit from this kind of lazy rendering, but the actual VNC protocol just diffs the entire frame.

Most modern, non-VNC remote desktop protocols use H.264 video encoding. Damage is still propagated all the way through so that the client knows which areas changed.

The frames are not "diffed" except by the H.264 encoder on the server side, which may or may not be using damage as input. The client has priority for optimization here.

> Their very nature makes the kinds of graphics demands of a 2D GUI very negligible.

An iPhone 16 Pro Max at 120 fps is sending 13.5 Gb/s to the display, and the internal memory requirements are much higher. This is expensive.
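A back-of-the-envelope check of that figure. The 2868x1320 panel resolution and the 30 bits per pixel (10 bits per channel) are my assumptions, not something stated above; with 24 bits per pixel the result would be lower.

```python
def scanout_gbps(width: int, height: int, fps: int, bits_per_pixel: int) -> float:
    """Raw uncompressed pixel bandwidth to the panel, in gigabits per second."""
    return width * height * fps * bits_per_pixel / 1e9


if __name__ == "__main__":
    # Assumed panel parameters (see the note above).
    print(f"{scanout_gbps(2868, 1320, 120, 30):.2f} Gb/s")
```

Under those assumptions the result lands in the same ballpark as the number quoted, and it only covers the final scanout, not the reads and writes needed to produce each frame.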

Not rendering a texture and being able to pass it off to scanout hardware so that the render units can stay off is the difference between a laptop giving you a ~5 hour battery life and a 15-20+ hour battery life.

The GPU could texture your socks off, but you're paying a tax every microsecond your GPU's render units are active, which matters when you're battery powered or thermally constrained. This is why display servers and GUI toolkits go to great lengths to not render anything.

> Note that in the iPhone cases, the glass blur is mostly an internal widget rendered by the app, what is emitted to the display server/hardware is opaque.

This sounds wild to me, so I'm just going to ask. Do you work on these kinds of optimisations for a modern OS? If so, just ignore my ponderings and I'll just accept what you're saying here.

I honestly couldn't imagine this kind of compositing not happening completely on the GPU or requiring any back and forth between the CPU and GPU. That is, the windowing system creates a display list, and that display list is dispatched to the GPU along with any assets it requires (icons, font etc.). I'd also imagine this is the same as how the browser renders.

As for optimisations, if the display list is the same for a particular render target (e.g., window, widget, subsection, or entire screen), there's no reason to rerender it. There's no reason to even rebuild the display list for an application that is asleep or backgrounded. Tile-based culling and selective update of the screen buffer^ can also happen at the GPU level. Though hierarchical culling at the CPU level would be trivial and low-cost.
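The "same display list, no rerender" idea could be sketched like this. `RenderTargetCache` and its methods are hypothetical, the display list is assumed to be a plain tuple of draw commands, and the counter stands in for actual GPU work.

```python
import hashlib


class RenderTargetCache:
    """Skip rerendering a target whose display list has not changed."""

    def __init__(self):
        self._last_digest = {}   # target id -> digest of the last rendered list
        self.renders = 0         # stand-in for real rendering work

    def submit(self, target_id, display_list) -> bool:
        """Return True if the target was rerendered, False if it was skipped."""
        digest = hashlib.sha256(repr(display_list).encode("utf-8")).hexdigest()
        if self._last_digest.get(target_id) == digest:
            return False         # identical commands: reuse the existing texture
        self._last_digest[target_id] = digest
        self.renders += 1
        return True


if __name__ == "__main__":
    cache = RenderTargetCache()
    frame = (("rect", 0, 0, 100, 20, "#fff"), ("text", 4, 14, "12:00"))
    print(cache.submit("clock", frame))  # first submission renders
    print(cache.submit("clock", frame))  # unchanged list is skipped
```

Comparing whole lists only answers "did anything change"; it does not say which area changed, which is what the damage-tracking approach discussed earlier in the thread adds.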

This is not my wheelhouse, so perhaps I'm missing something crucial here.

^ Edit: It does look like the Apple silicon GPUs do use tile-based deferred rendering.

https://developer.apple.com/documentation/metal/tailor-your-...

>you never compute individual pixels because you never need to

Pixel shaders are looking at this laughing at you. PS_OUTPUT is a single pixel whether you want it or not. PS wavefronts are usually very small, so you're still going to be doing a lot of sampling.