|
|
|
|
|
by dzaima
742 days ago
|
|
Yeah, it's certainly possible that it's not double-pumping. Should be roughly possible to test via comparing latency upon inserting a vandpd between two vpermd's (though then there are questions about bypass networks; and of course if we can't measure which method is used it doesn't matter for us anyway); don't have a Zen 4 to test on though. But of note is that, at least in uops.info's data[0], there's one perf counter increment per instruction, and all four pipes get non-zero equally-distributed totals, which seems to me much simpler to achieve with double-pumping (though not impossible with splitting across ports; something like incrementing a random one. I'd expect biased results though). Then again, Agner says "512-bit vector instructions are executed with a single μop using two 256-bit pipes simultaneously". [0]: https://uops.info/html-tp/ZEN4/VPADDB_ZMM_ZMM_ZMM-Measuremen... |
|