by eigenform 480 days ago
AFAIK you have to think about how many different 512b paths are being driven when this happens. In the steady-state case (assuming you can do two vfmadd132ps per cycle), each cycle is simultaneously:

- Capturing 2x512b from the L1D cache

- Sending 2x512b to the vector register file

- Capturing 4x512b values from the vector register file

- Actually multiplying 4x512b values

- Sending 2x512b results to the vector register file

... and probably more? That's already something like 14*512 wires [switching constantly at 5 GHz!!], and there are probably even more intermediate stages.
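For what it's worth, here is a quick sketch of the arithmetic behind that 14*512 figure. It just tallies the 512b paths from the list above. The dictionary labels are my own shorthand, and the last line assumes every wire can toggle once per cycle, which is an upper bound rather than a measurement.

```python
# 512-bit paths active per cycle, taken from the list in the comment.
# The labels are shorthand for this sketch, not official unit names.
paths = {
    "capture from L1D": 2,
    "send to vector register file": 2,
    "read from vector register file": 4,
    "multiplier inputs": 4,
    "results back to vector register file": 2,
}

WIDTH_BITS = 512   # width of each path
CLOCK_HZ = 5e9     # 5 GHz, the clock quoted in the comment

total_paths = sum(paths.values())
total_wires = total_paths * WIDTH_BITS

# Upper bound only: assumes every wire could switch once per cycle.
max_transitions_per_s = total_wires * CLOCK_HZ

print(f"{total_paths} paths x {WIDTH_BITS} bits = {total_wires} wires")
print(f"up to {max_transitions_per_s:.3e} wire transitions per second")
```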

1 comment

… per core. There are eight per compute tile!

I like to ask IT people a trick question: how many numbers can a modern CPU multiply in the time it takes light to cross a room?
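A rough back-of-the-envelope sketch for that question, under assumptions that are mine rather than the thread's: a 5 m room, single-precision lanes (vfmadd132ps works on packed fp32, so 16 lanes per 512b register), and "eight per compute tile" read as eight cores. The 5 GHz clock and two FMAs per cycle come from the parent comment.

```python
C_M_PER_S = 299_792_458   # speed of light in vacuum, exact by definition
ROOM_M = 5.0              # assumption: room width, not given in the thread
CLOCK_HZ = 5e9            # 5 GHz, from the parent comment
FMAS_PER_CYCLE = 2        # two vfmadd132ps per cycle, from the parent comment
LANES = 512 // 32         # fp32 lanes in one 512-bit register
CORES = 8                 # assumption: "eight per compute tile" means eight cores

crossing_s = ROOM_M / C_M_PER_S            # time for light to cross the room
cycles = crossing_s * CLOCK_HZ             # clock cycles elapsed in that time
mults_per_core = cycles * FMAS_PER_CYCLE * LANES
mults_per_tile = mults_per_core * CORES

print(f"light crosses {ROOM_M} m in {crossing_s * 1e9:.1f} ns ({cycles:.0f} cycles)")
print(f"~{mults_per_core:.0f} multiplies per core, ~{mults_per_tile:.0f} per tile")
```

Change ROOM_M or CLOCK_HZ to taste; the result scales linearly with both.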