by TomVDB
1366 days ago
Density, I can accept. But what kind of latency are we talking about here? CDNA has 16-wide SIMD units that retire 1 64-wide warp instruction every 4 clock cycles. RDNA has a 32-wide SIMD unit that retires 1 32-wide warp instruction every clock cycle. (It's uncanny how similar it is to Nvidia's Maxwell and Pascal architectures.) Your 1/4 number makes me think that you're talking about a latency that has nothing to do with reads from memory, but with the rate at which instructions are retired? Or does it have to do with the depth of the instruction pipeline? As long as there's sufficient occupancy, a latency difference of a few clock cycles shouldn't mean anything in the context of a thousand-clock-cycle latency for accessing DRAM?
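To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted above (64-wide on 16 lanes, 32-wide on 32 lanes, and the round "thousand clock cycles" for DRAM). The function name `issue_cycles` and the constant `DRAM_LATENCY_CYCLES` are made up for illustration, and the model deliberately ignores pipeline depth, caches, and scheduling, so treat it as a back-of-the-envelope check rather than a description of either architecture.

```python
def issue_cycles(wave_width: int, simd_width: int) -> int:
    """Cycles one SIMD unit needs to push a full wave through its lanes.

    Hypothetical helper: ceiling division of wave width by lane count.
    """
    return -(-wave_width // simd_width)


# Round figure taken from the comment above, not a measured value.
DRAM_LATENCY_CYCLES = 1000

cdna_cycles = issue_cycles(64, 16)  # 64-wide wave on a 16-wide SIMD
rdna_cycles = issue_cycles(32, 32)  # 32-wide wave on a 32-wide SIMD

# How large is the issue-rate gap next to a single DRAM access?
gap_cycles = cdna_cycles - rdna_cycles
gap_share = gap_cycles / DRAM_LATENCY_CYCLES

print(f"CDNA: {cdna_cycles} cycles per wave instruction")
print(f"RDNA: {rdna_cycles} cycle per wave instruction")
print(f"Gap: {gap_cycles} cycles, {gap_share:.1%} of one DRAM access")
```

Under these assumptions the gap is a handful of cycles against a DRAM access that is hundreds of times longer, which is why the issue rate alone seems unlikely to be the latency in question.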
That's what's faster.
Vega64 accesses HBM in like 500 nanoseconds. (https://www.reddit.com/r/ROCm/comments/iy2rfw/752_clock_tick...)
RDNA2 accesses GDDR6 in like 200 nanoseconds. (https://www.techpowerup.com/281178/gpu-memory-latency-tested...)
EDIT: So it looks like my memory was bad. I could have sworn RDNA2 was faster. (Maybe I was thinking of the faster L1/L2 caches of RDNA?) Either way, it's clear that Vega/GCN has much, much worse memory latency. I've updated the numbers above and also edited this post a few times as I looked stuff up.