by gmueckl 5 days ago
That sounds like a lot, but modern renderers do between 20 and 40 passes, many of them in screen space. Each screen space pass typically reads from at least two input images, sometimes three or four even with optimally packed inputs. At 60 fps that can quickly add up to well over 2000 full screen buffer reads per second, and more for algorithms with less than optimal access patterns. That also doesn't account for texture access during shading passes, which are somewhat random memory accesses.
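For anyone who wants to check the arithmetic, here is a rough sketch of that estimate. It assumes every pass is a screen space pass (the comment only says "many"), so treat the result as a ballpark; the function name is made up for illustration.

```python
def full_screen_reads_per_second(passes: int, inputs_per_pass: int, fps: int) -> int:
    """Estimate full screen buffer reads per second.

    Simplifying assumption: every pass is a screen space pass that
    reads each of its input images exactly once per frame.
    """
    return passes * inputs_per_pass * fps


# Low end of the ranges quoted above: 20 passes, 2 inputs each, 60 fps.
low = full_screen_reads_per_second(20, 2, 60)
# High end: 40 passes, 4 inputs each, 60 fps.
high = full_screen_reads_per_second(40, 4, 60)

print(low, high)  # prints: 2400 9600
```

So "well over 2000" corresponds to the low end of the quoted ranges under that assumption.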

Very true, but I'll point out that even those 2000 full screen reads per second at 4K are only about 4% of the 5090's bandwidth. Sacrificing some of that speed for a unified memory architecture seems like a good trade.
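A quick sketch of where a figure like 4% can come from. The assumptions are mine, not the commenter's: a 3840x2160 target, 4 bytes per pixel (e.g. an RGBA8 buffer), and roughly 1.8 TB/s of peak memory bandwidth for the 5090. Wider formats such as RGBA16F would double the share.

```python
WIDTH, HEIGHT = 3840, 2160      # assumption: 4K UHD render target
PEAK_BANDWIDTH = 1.8e12         # assumption: ~1.8 TB/s peak, in bytes per second


def bytes_per_full_screen_read(bytes_per_pixel: int = 4) -> int:
    """Bytes moved by one read of a full screen buffer (default: 4 B/pixel)."""
    return WIDTH * HEIGHT * bytes_per_pixel


def bandwidth_share(reads_per_second: int, bytes_per_pixel: int = 4) -> float:
    """Fraction of peak bandwidth consumed by full screen reads alone."""
    traffic = reads_per_second * bytes_per_full_screen_read(bytes_per_pixel)
    return traffic / PEAK_BANDWIDTH


print(f"{bandwidth_share(2000):.1%}")  # prints: 3.7%
print(f"{bandwidth_share(8000):.1%}")  # prints: 14.7%
```

This only counts full screen buffer reads at peak theoretical bandwidth; it ignores writes, texture and geometry fetches, and cache effects.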

Plus, DLSS can greatly reduce the bandwidth requirements for 4K gaming.

I'm being very, very conservative with my estimates here. Based on the renderers I know, I could easily have tweaked the numbers to go up to 8000 full screen texture reads per second. That doesn't include texture, geometry, or BVH reads, or any memory writes. Those all come on top of the full screen reads.

But do you think you'll reach 1.8 TB/s?

Quite likely, but the transfer throughput is required in bursts, not necessarily continuously.

Let me put it this way: what I care about is how quickly data arrives after a bunch of shader threads request it. Throughput is one way for hardware to reduce that time. The other way is to hide the latency (GPUs do a lot to keep themselves busy while waiting for memory), but those strategies can only do so much.

Lower memory throughput almost always leads to a longer runtime of GPU calls in practice, and thus lower update rates.

Empirically, these benchmarks are showing it doesn't make much difference once you reach this level of bandwidth: https://www.tomshardware.com/pc-components/gpus/early-rtx-50...