by jedbrown 2139 days ago
A100 has a 20% edge on energy efficiency for HPL, along with higher intrinsic latencies. It's also 6-12 months behind A64FX in deployment. https://www.top500.org/lists/green500/2020/06/

HPCG mostly tests memory bandwidth rather than interconnect, but Fugaku does have a great network.

Adding DRAM to a GPU-heavy machine has limited benefit due to the relatively low bandwidth to the device. They're effectively both HBM machines if you need the ~TB bandwidth per device (or per socket).
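To put rough numbers on that bandwidth gap, here's a back-of-the-envelope sketch. It assumes ~1555 GB/s for A100 (40 GB) HBM2 and ~32 GB/s per direction for a PCIe 4.0 x16 host link. The 32 GB working set is hypothetical, and the model ignores overlap, caching and compression.

```python
# Back-of-the-envelope: time to stream a working set once from on-package HBM
# vs. from host DRAM across the host-device link. The bandwidths are
# approximate published peaks used as assumptions, not measurements.

HBM_GBPS = 1555.0       # assumption: A100 40GB HBM2 peak, GB/s
HOST_LINK_GBPS = 32.0   # assumption: PCIe 4.0 x16, roughly 32 GB/s per direction


def sweep_time_s(working_set_gb: float, bandwidth_gbps: float) -> float:
    """Seconds to read `working_set_gb` once at `bandwidth_gbps` (ideal streaming)."""
    return working_set_gb / bandwidth_gbps


if __name__ == "__main__":
    ws = 32.0  # hypothetical working set, GB
    t_hbm = sweep_time_s(ws, HBM_GBPS)
    t_host = sweep_time_s(ws, HOST_LINK_GBPS)
    print(f"HBM sweep:       {t_hbm * 1e3:.1f} ms")
    print(f"host-link sweep: {t_host * 1e3:.1f} ms")
    print(f"slowdown:        {t_host / t_hbm:.0f}x")
```

Under these assumptions a full sweep from host DRAM is roughly 50x slower than from HBM. That is why extra host memory mostly buys capacity, not bandwidth, for bandwidth-bound kernels.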

Normalizing per node (versus per energy or cost) isn't particularly useful unless your software doesn't work well with distributed memory.
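A toy illustration of why per-node numbers can mislead. The two node types below are entirely hypothetical, and the sketch assumes perfect scaling across nodes (i.e. software that handles distributed memory well).

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class NodeType:
    name: str
    tflops_per_node: float  # sustained TFLOP/s per node (hypothetical)
    kw_per_node: float      # power draw per node in kW (hypothetical)

    @property
    def gflops_per_watt(self) -> float:
        # TFLOP/s per kW equals GFLOP/s per W (both sides scale by 1000).
        return self.tflops_per_node / self.kw_per_node


def machine_for_target(node: NodeType, target_pflops: float) -> Tuple[int, float]:
    """Nodes and total MW needed to reach `target_pflops`, assuming perfect scaling."""
    nodes = math.ceil(target_pflops * 1000 / node.tflops_per_node)
    return nodes, nodes * node.kw_per_node / 1000


if __name__ == "__main__":
    fat = NodeType("fat", tflops_per_node=40.0, kw_per_node=4.0)   # hypothetical
    thin = NodeType("thin", tflops_per_node=3.0, kw_per_node=0.2)  # hypothetical
    for n in (fat, thin):
        nodes, mw = machine_for_target(n, target_pflops=100.0)
        print(f"{n.name}: {n.gflops_per_watt:.1f} GFLOP/s/W, {nodes} nodes, {mw:.2f} MW")
```

Here the "fat" node wins per node (40 vs 3 TFLOP/s), but the "thin" machine reaches the same 100 PFLOP/s target at about 6.7 MW instead of 10 MW because it is more efficient per watt.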


> Adding DRAM to a GPU-heavy machine has limited benefit due to the relatively low bandwidth to the device. They're effectively both HBM machines if you need the ~TB bandwidth per device (or per socket).

This POWER10 chip under discussion has 1 TB/s of bandwidth to devices with expandable RAM.

Yeah, I didn't think it was possible. But... congrats to IBM for getting this done. Within the context of this hypothetical POWER10, a 1 TB/s interconnect to expandable RAM is on the table.

It's 410 GB/s peak for DDR5. The "up to 800 GB/s sustained" figure is for GDDR6, and POWER10 isn't slated to ship until Q4 2021, so it isn't really a direct comparison with hardware that was deployed in 2019.
IIRC, the new memory interface they introduced with the latest POWER9 chips gained bandwidth and physical distance at the cost of a bit of latency.
Slides posted earlier in this thread say the OMI (the memory interface) adds about 10ns compared to straight DDR5.
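Extra latency matters for bandwidth through Little's law: sustaining a given bandwidth requires bandwidth × latency bytes in flight. The sketch below uses the 410 GB/s DDR5 peak and the ~10 ns OMI adder quoted in the thread, plus a hypothetical 100 ns base load latency and an assumed 128-byte cache line.

```python
import math

LINE_BYTES = 128  # assumption: 128-byte cache lines, as on POWER


def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Little's law: concurrency = throughput x latency.

    GB/s x ns = (1e9 B/s) x (1e-9 s) = bytes, so no unit conversion is needed.
    """
    return bandwidth_gb_s * latency_ns


def lines_in_flight(bandwidth_gb_s: float, latency_ns: float,
                    line_bytes: int = LINE_BYTES) -> int:
    """Outstanding cache-line requests needed to sustain `bandwidth_gb_s`."""
    return math.ceil(bytes_in_flight(bandwidth_gb_s, latency_ns) / line_bytes)


if __name__ == "__main__":
    peak_gb_s = 410.0        # DDR5 peak quoted in the thread
    base_latency_ns = 100.0  # hypothetical load-to-use latency without OMI
    omi_adder_ns = 10.0      # OMI overhead quoted in the thread
    before = lines_in_flight(peak_gb_s, base_latency_ns)
    after = lines_in_flight(peak_gb_s, base_latency_ns + omi_adder_ns)
    print(f"lines in flight: {before} -> {after} ({after / before - 1:.0%} more)")
```

Under those assumptions the extra 10 ns raises the requirement from 321 to 353 outstanding lines (about 10% more). The memory subsystem has to track that many to keep the same sustained bandwidth.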