| HN Mirror


by dragontamer 2139 days ago

Fugaku has 158,976 nodes x 1 chip each, or 158,976 A64FX chips.

Summit has 4,608 nodes x 6 GPUs each, or 27,648 V100 GPUs. It also was built back in 2018.
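As a quick sanity check on the scale gap, here is a minimal sketch that recomputes Summit's GPU total and the node-count ratio from the figures quoted above. The helper name `total_devices` is made up for illustration, and nothing here goes beyond the numbers in this comment.

```python
# Figures as quoted in the comment above.
FUGAKU_NODES = 158_976
SUMMIT_NODES = 4_608
SUMMIT_GPUS_PER_NODE = 6


def total_devices(nodes: int, devices_per_node: int) -> int:
    """Total accelerator/chip count for a machine (hypothetical helper)."""
    return nodes * devices_per_node


summit_gpus = total_devices(SUMMIT_NODES, SUMMIT_GPUS_PER_NODE)
node_ratio = FUGAKU_NODES / SUMMIT_NODES

print(f"Summit V100 GPUs: {summit_gpus:,}")
print(f"Fugaku nodes per Summit node: {node_ratio:.1f}")
```

So Fugaku fields roughly 34.5 nodes for every Summit node, which is why per-node and per-machine comparisons point in different directions.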

---------

While Fugaku is certainly an interesting design, it seems inevitable that a modern GPU (say, Ampere A100s) would crush it in FLOPS. Really, Fugaku's most interesting point is its high HPCG score, which shows that its interconnect is hugely efficient.

Per-node, Fugaku is weaker. They built an amazing interconnect to compensate for that weakness. Fugaku is also an HBM-based computer, meaning you cannot easily add or remove RAM (whereas a CPU / GPU system can be configured with more or less RAM by adding or removing sticks).

These are the little differences that matter in practice. A64FX is certainly an accomplishment, but I wouldn't go so far as to say it's proven that CPUs can keep up with GPUs in terms of raw FLOPS.

1 comments

jedbrown 2139 days ago

A100 has a 20% edge on energy efficiency for HPL, along with higher intrinsic latencies. It's also 6-12 months behind A64FX in deployment. https://www.top500.org/lists/green500/2020/06/

HPCG mostly tests memory bandwidth rather than interconnect, but Fugaku does have a great network.

Adding DRAM to a GPU-heavy machine has limited benefit due to the relatively low bandwidth to the device. They're effectively both HBM machines if you need the ~TB bandwidth per device (or per socket).

Normalizing per node (versus per energy or cost) isn't particularly useful unless your software doesn't work well with distributed memory.


dragontamer 2139 days ago

> Adding DRAM to a GPU-heavy machine has limited benefit due to the relatively low bandwidth to the device. They're effectively both HBM machines if you need the ~TB bandwidth per device (or per socket).

This POWER10 chip under discussion has 1 TB/s of bandwidth to devices with expandable RAM.

Yeah, I didn't think it was possible. But... congrats to IBM for getting this done. Within the context of this hypothetical POWER10, a 1 TB/s interconnect to expandable RAM is on the table.


jedbrown 2139 days ago

It's 410 GB/s peak for DDR5. The "up to 800 GB/s sustained" figure is for GDDR6, and POWER10 isn't slated to ship until Q4 2021, so it isn't really a direct comparison with hardware that was deployed in 2019.
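To put those numbers next to the ~1 TB/s figure from the parent comment, here is a rough sketch that expresses each quoted bandwidth as a fraction of 1 TB/s. It assumes decimal units (1 TB/s = 1000 GB/s), and the dictionary labels and the `fraction_of_target` helper are illustrative names, not anything from IBM's material.

```python
GB_PER_TB = 1000  # assumption: decimal units, 1 TB/s = 1000 GB/s

# Bandwidth figures as quoted in this thread, in GB/s.
quoted_bandwidth_gb_s = {
    "DDR5 peak": 410,
    "GDDR6 'up to' sustained": 800,
}

TARGET_GB_S = 1 * GB_PER_TB  # the ~1 TB/s figure from the parent comment


def fraction_of_target(gb_s: float, target_gb_s: float = TARGET_GB_S) -> float:
    """Return a bandwidth as a fraction of the target (hypothetical helper)."""
    return gb_s / target_gb_s


for label, gb_s in quoted_bandwidth_gb_s.items():
    print(f"{label}: {gb_s} GB/s = {fraction_of_target(gb_s):.0%} of 1 TB/s")
```

On these figures the DDR5 configuration reaches about 41% of 1 TB/s and the GDDR6 one about 80%.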


rbanffy 2139 days ago

IIRC, the new memory connection they introduced with the latest POWER9 chips traded bandwidth and physical distance for a bit of latency.


jabl 2138 days ago

Slides posted earlier in this thread say the OMI (the memory interface) adds about 10 ns of latency compared to straight DDR5.
