by TomVDB 1363 days ago
Thanks for that.

The weird part is that this latency difference has to be due to a terrible memory controller (MC) design by AMD, because there's no huge difference in latency between any of the current DRAM technologies. The interface differs between HBM, GDDR, and regular DDR, but the underlying method of accessing the data is similar enough that the access latency should be very similar as well.


Or... supercomputer users don't care about latency in GCN/CDNA applications.

500 ns to access main memory, and lol, 120 ns to access L1 cache is pretty awful. CPUs can access RAM with lower latency than Vega/GCN can access its L1 cache. Indeed, RDNA's main-memory latency is approaching Vega/GCN's L2 latency.

----------

This has to be an explicit design decision on the part of AMD's team to push GFLOPS higher and higher. But as I stated earlier: video game programmers want lower latency in their shaders. "More like NVidia", as you put it.

Seemingly, the supercomputer market is willing to put up with these bad latency scores.
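
A rough way to see why a throughput-oriented workload can tolerate this: by Little's law, the number of memory requests that must be in flight to keep the memory bus saturated is bandwidth times latency. Below is a minimal Python sketch using the 500 ns main-memory figure quoted above. The 1000 GB/s bandwidth and 64-byte request size are illustrative assumptions, not measured values for any particular GPU.

```python
def requests_in_flight(latency_ns, bandwidth_gb_per_s, request_bytes):
    """Little's law: concurrency = throughput * latency.

    GB/s multiplied by ns gives bytes (1e9 B/s * 1e-9 s), so no extra
    unit conversion is needed.
    """
    bytes_in_flight = bandwidth_gb_per_s * latency_ns
    return bytes_in_flight / request_bytes


# Assumed, illustrative inputs: 500 ns latency, 1000 GB/s, 64-byte requests.
print(requests_in_flight(500, 1000, 64))
```

If the workload can supply that many independent requests (lots of wavefronts, lots of independent loads), the latency is hidden and only the bandwidth matters. If it can't, the latency shows up directly in run time.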

But why would game programmers care about shader core latency??? I seriously don't understand.

We're not talking here about the latency that gamers care about, the one that's measured in milliseconds.

I've never seen any literature that complained about load/store access latency in the shader core. It's just so low level...

> But why would game programmers care about shader core latency??? I seriously don't understand.

Well, I don't know per se. What I can say is that the various improvements AMD made to RDNA did the following:

1. Barely increased TFLOPs -- especially compared to CDNA, it is clear that RDNA has fewer FLOPs

2. Despite #1, improved gaming performance dramatically

--------

When we look at RDNA, we can see that many, many latency numbers improved (though throughput numbers, like TFLOPs, aren't that much better than Vega 7). It's clear that the RDNA team did some kind of analysis of the kinds of shaders that video game programmers write, and tailored RDNA to match them better.

> I've never seen any literature that complained about load/store access latency in the shader core. It's just so low level...

Those are just things I've noticed about the RDNA architecture. Maybe I'm latching onto the wrong things here, but... it's clear that RDNA was aimed at the gaming workload.

Perhaps modern shaders are no longer just brute-force vertex/pixel style shaders, but are instead doing far more complex things. These more complicated shaders could be latency-bound rather than TFLOPs-bound.
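
To make "latency-bound rather than TFLOPs-bound" concrete, here is a toy Python model, purely a sketch under simplifying assumptions: one SIMD, a fixed number of resident waves, memory stalls of different waves overlapping perfectly, and ALU work serializing. The function names and all the numbers in the example are hypothetical, chosen for illustration rather than taken from any real shader or GPU.

```python
def _components(dependent_loads, load_latency_ns, alu_ns_per_wave, waves_in_flight):
    # A single wave cannot overlap its own dependent loads, so its chain is serial.
    single_wave_ns = dependent_loads * load_latency_ns + alu_ns_per_wave
    # The ALU is shared, so ALU work across all resident waves serializes.
    alu_total_ns = alu_ns_per_wave * waves_in_flight
    return single_wave_ns, alu_total_ns


def shader_time_ns(dependent_loads, load_latency_ns, alu_ns_per_wave, waves_in_flight):
    """Lower bound on the time to retire all resident waves in this toy model."""
    return max(_components(dependent_loads, load_latency_ns,
                           alu_ns_per_wave, waves_in_flight))


def limiting_factor(dependent_loads, load_latency_ns, alu_ns_per_wave, waves_in_flight):
    """Return 'latency' if the dependent-load chain dominates, else 'throughput'."""
    single_wave_ns, alu_total_ns = _components(
        dependent_loads, load_latency_ns, alu_ns_per_wave, waves_in_flight)
    return "latency" if single_wave_ns > alu_total_ns else "throughput"


# Hypothetical shader: 4 dependent loads, 50 ns of ALU work per wave, 8 waves.
print(limiting_factor(4, 120, 50, 8))  # long load latency
print(limiting_factor(4, 30, 50, 8))   # same shader, shorter load latency
```

In this model, a shader with a long chain of dependent loads and few waves in flight gains nothing from extra FLOPs; only shorter load latency (or more waves) helps.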