| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dragontamer 1657 days ago

> Seems this is an HPC product, but many HPC apps are memory bound.

The point of a supercomputer is to throw so much compute at a problem, that everything else is the bottleneck.

If an HPC app is memory-bound, then the GPU / Supercomputer was successful at its job. So many HPC apps are memory bound because... well... turns out our machines are actually quite good.

In any case, MI200 has 1.6x the bandwidth as the A100. So if you have a massively-parallel use-case that is memory bound, the MI200 line should have an advantage.

-------

The main issue IMO, is that the MI200's 1.6x bandwidth is really 80% bandwidth applied over two die, connected with a incredible amount of "infinity fabric" links to share the data. I have to imagine that the A100's larger design wins in some cases over the MI200's chiplet design.

1 comments

volta83 1653 days ago

I agree with you, which is why I don't really understand what the point of improving FP64 perf by 4x is, if that is not the bottleneck for many apps.

Per node, a 4x MI250X node has more or less the same BW as a DGX-A100 (8x A100). It has 2x more FP64 compute, but for most science and engineering apps, which are memory bound, 2x more FP64 compute does not make these apps any faster.

link