by eigenform
480 days ago
AFAIK you have to think about how many different 512b paths are being driven when this happens. In the steady-state case where you can do two vfmadd132ps per cycle, each cycle is simultaneously:
- Capturing 2x512b from the L1D cache
- Sending 2x512b to the vector register file
- Capturing 4x512b values from the vector register file
- Actually multiplying 4x512b values
- Sending 2x512b results to the vector register file

... and probably more? That's already something like 14*512 wires [switching constantly at 5GHz!!], and there are probably even more intermediate stages.
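To make the tally above concrete (2 + 2 + 4 + 4 + 2 paths), here is a rough back-of-the-envelope sketch in Python. The stage names and helper functions are made up for illustration, the path counts and the 5GHz clock are just the figures from the comment, and the bandwidth number assumes every wire carries a fresh bit every cycle, which is an upper bound and not a measured toggle rate.

```python
# Back-of-the-envelope tally of the 512-bit paths listed in the comment.
# Stage names are illustrative labels, not real microarchitectural names.
PATH_BITS = 512      # width of one path, from the comment
CLOCK_HZ = 5e9       # the "5GHz" figure from the comment

# Number of 512-bit paths driven per cycle at each stage (from the comment).
PATHS_PER_STAGE = {
    "capture loads from L1D": 2,
    "send loads to vector register file": 2,
    "read operands from vector register file": 4,
    "multiplier inputs": 4,
    "write results to vector register file": 2,
}


def total_paths(stages):
    """Sum the number of 512-bit paths across all stages."""
    return sum(stages.values())


def total_wires(stages, path_bits=PATH_BITS):
    """Total single-bit wires, assuming one wire per bit per path."""
    return total_paths(stages) * path_bits


def peak_terabytes_per_second(stages, clock_hz=CLOCK_HZ, path_bits=PATH_BITS):
    """Upper bound on data moved per second if every wire is used every cycle."""
    bits_per_second = total_wires(stages, path_bits) * clock_hz
    return bits_per_second / 8 / 1e12


if __name__ == "__main__":
    print("paths:", total_paths(PATHS_PER_STAGE))
    print("wires:", total_wires(PATHS_PER_STAGE))
    print("peak TB/s (upper bound): %.2f" % peak_terabytes_per_second(PATHS_PER_STAGE))
```

With the numbers from the comment this gives 14 paths and 7168 wires per core, before counting any intermediate pipeline stages.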
I like to ask IT people a trick question: how many numbers can a modern CPU multiply in the time it takes light to cross a room?