by moggi 866 days ago
A really good presentation showing why the Xeon Max can't reach full HBM bandwidth: https://www.ixpug.org/images/docs/ISC23/McCalpin_SPR_BW_limi...
2 comments

That's a great paper. I've tangled with the line fill buffer issue before, along with other hidden limits such as Infinity Fabric limits on AMD systems. It's one of the first big disappointments when you start doing high-performance, bandwidth-limited compute work. It's also one that's almost entirely undocumented, which is frustrating.
Interesting paper. I focus more on latency than bandwidth. The paper gets a few things wrong: DDR5 is not a single 64-bit channel, but 2 x 32-bit channels. So the normal Xeon has 16 x 32-bit channels, not 8 x 64-bit.

He talks about cache misses as 60ns, which glosses over the fact that approximately half of that is spent missing through L1/L2/L3; only then do you enter the queue for the memory controller and for the memory channel you need. As a result, you get only half the bandwidth if you have just a single request pending per channel.
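
To put rough numbers on the latency/concurrency point, here is a minimal sketch using Little's law (sustained bandwidth = bytes in flight / latency). The 64-byte line size and the function names are my own assumptions for illustration, and the request counts you plug in are hypothetical, not measured figures from the presentation.

```python
import math

def sustained_bandwidth_gbs(outstanding_requests, latency_ns, line_bytes=64):
    """Little's law: bytes in flight divided by latency.

    Bytes per nanosecond is numerically equal to GB/s (10^9 bytes/s).
    line_bytes=64 is an assumed cache line size.
    """
    return outstanding_requests * line_bytes / latency_ns

def requests_needed(target_gbs, latency_ns, line_bytes=64):
    """Smallest number of concurrent line requests that can sustain target_gbs."""
    return math.ceil(target_gbs * latency_ns / line_bytes)

if __name__ == "__main__":
    latency_ns = 60  # the cache-miss latency quoted in the thread
    for n in (1, 8, 16):  # hypothetical numbers of requests in flight
        print(n, "in flight:", round(sustained_bandwidth_gbs(n, latency_ns), 2), "GB/s")
    print("requests for 100 GB/s:", requests_needed(100, latency_ns))
```

The takeaway is that at a fixed latency, bandwidth scales only with the number of requests you can keep in flight, so any cap on outstanding misses (line fill buffers, queue depth) becomes a bandwidth cap.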