by paulmd
1508 days ago
CFD is highly memory-bandwidth-bottlenecked; it is in fact pretty much the prototypical memory-bandwidth-bottlenecked task. The performance scaling you see between systems pretty much corresponds to the memory bandwidth in those configurations. Note that on the M1, the CPU can only access a fraction (about 25% iirc) of the total memory bandwidth; you have to use the GPU to really get the full performance of the M1 here.
|
Also keep in mind that normal x86-64 systems, even without an IGP, only get about 60-65% of peak memory bandwidth, even with nothing else sharing the memory bus. I often see this quantified with McCalpin's STREAM benchmark.
So the M1 Ultra likely has a pretty impressive memory bandwidth of around 440GB/sec, which isn't a large fraction of 800GB/sec, but it's still more than any other desktop or server chip I know of. The AMD Epyc maxes out at 8 channels of DDR4-3200, which is about 205GB/sec peak, with an observed bandwidth of 110-120GB/sec.
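
For anyone who wants to check the arithmetic, here is a rough sketch of the peak-bandwidth calculation. It assumes a standard 64-bit (8-byte) DDR channel and 3200 MT/s per channel; the function names are made up for illustration, and the observed figures are just the ones quoted above, not new measurements.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_width_bytes=8):
    """Theoretical peak in GB/s: channels x transfers per second x bytes per transfer.

    Assumes a 64-bit (8-byte) data bus per channel, which is the usual DDR layout.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_width_bytes / 1e9


def fraction_of_peak(observed_gb_s, peak_gb_s):
    """Share of the theoretical peak that a benchmark actually reached."""
    return observed_gb_s / peak_gb_s


# 8-channel Epyc at 3200 MT/s (figures from the comment above)
epyc_peak = peak_bandwidth_gb_s(channels=8, mega_transfers_per_s=3200)
print(f"Epyc theoretical peak: {epyc_peak:.1f} GB/s")

for observed in (110, 120):
    share = fraction_of_peak(observed, epyc_peak)
    print(f"  observed {observed} GB/s is {share:.0%} of peak")

# M1 Ultra: estimated 440 GB/s against the advertised 800 GB/s
m1_share = fraction_of_peak(440, 800)
print(f"M1 Ultra estimate: {m1_share:.0%} of advertised peak")
```

The same two functions work for any other configuration: plug in the channel count and transfer rate, then compare against whatever STREAM reports on that machine.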