Hacker News new | ask | show | jobs
by dragontamer 1373 days ago
Or... supercomputer users don't care about latency in GCN/CDNA applications.

500ns to access main memory, and lol 120 nanoseconds to access L1 cache is pretty awful. CPUs can access RAM in less latency than Vega/GCN can access L1 cache. Indeed, RDNA's main-memory access is approaching Vega/GCN's L2 latency.

----------

This has to be an explicit design decision on behalf of AMD's team to push GFLOPS higher and higher. But as I stated earlier: video game programmers want faster latency on their shaders. "More like NVidia", as you put it.

Seemingly, the supercomputer market is willing to put up with these bad latency scores.

1 comments

But why would game programmers care about shader core latency??? I seriously don't understand.

We're not talking here about the latency that gamers care about, the one that's measured in milliseconds.

I've never seen any literature that complained about load/store access latency in the shader core. It's just so low level...

> But why would game programmers care about shader core latency??? I seriously don't understand.

Well, I don't know per se. What I can say is that the various improvements AMD made to RDNA did the following:

1. Barely increased TFLOPs -- Especially compared to CDNA, it is clear that RDNA has fewer FLOPs

2. Despite #1, improved gaming performance dramatically

--------

When we look at RDNA, we can see that many, many latency numbers improved (though throughput numbers, like TFLOPs, aren't that much better than Vega 7). Its clear that the RDNA team did some kind of analysis into the kinds of shaders that are used by video game programmers, and tailored RDNA to match them better.

> I've never seen any literature that complained about load/store access latency in the shader core. It's just so low level...

Those are just things I've noticed about the RDNA architecture. Maybe I'm latching onto the wrong things here, but... its clear that RDNA was aimed at the gaming workload.

Perhaps modern shaders are no longer just brute-force vertex/pixel style shaders, but are instead doing far more complex things. These more complicated shaders could be more latency bound rather than TFLOPs bound.