Hacker News
by dzaima 742 days ago
Yeah, it's certainly possible that it's not double-pumping. It should be roughly possible to test by comparing latency when a vandpd is inserted between two vpermd's (though then there are questions about bypass networks; and of course, if we can't measure which method is used, it doesn't matter for us anyway). I don't have a Zen 4 to test on, though.

But of note is that, at least in uops.info's data[0], there's one perf counter increment per instruction, and all four pipes get non-zero, equally distributed totals. That seems to me much simpler to achieve with double-pumping, though not impossible with splitting across ports (something like incrementing a random one; I'd expect biased results, though).

Then again, Agner says "512-bit vector instructions are executed with a single μop using two 256-bit pipes simultaneously".

[0]: https://uops.info/html-tp/ZEN4/VPADDB_ZMM_ZMM_ZMM-Measuremen...

It seems plausible that they could be using the power of two random choices to keep the counts even.
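A minimal balls-into-bins sketch of why that would help, contrasting "increment one random pipe" with "sample two pipes and increment the less-loaded one". It assumes 4 pipes and that each uop bumps exactly one pipe's counter; `assign_counts` and `spread` are hypothetical helpers, and nothing here claims to describe how Zen 4 actually assigns or counts uops.

```python
import random


def assign_counts(n_uops, n_pipes=4, choices=1, seed=0):
    """Simulate per-pipe counter totals (hypothetical model, not Zen 4's logic).

    choices=1: increment one pipe chosen uniformly at random.
    choices=2: sample two pipes independently at random (with replacement)
               and increment whichever currently has the lower count,
               i.e. the "power of two choices" heuristic.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    counts = [0] * n_pipes
    for _ in range(n_uops):
        candidates = [rng.randrange(n_pipes) for _ in range(choices)]
        # Least-loaded candidate wins; min() keeps the first sampled on ties.
        best = min(candidates, key=lambda p: counts[p])
        counts[best] += 1
    return counts


def spread(counts):
    """Gap between the busiest and the least busy pipe."""
    return max(counts) - min(counts)


if __name__ == "__main__":
    n = 100_000
    one = assign_counts(n, choices=1)
    two = assign_counts(n, choices=2)
    print("1 random choice :", one, "spread", spread(one))
    print("2 random choices:", two, "spread", spread(two))
```

With a single random choice the totals drift apart in proportion to roughly the square root of the uop count, so over long runs you'd see visible (if small relative) imbalance; with two choices the gap stays at a handful no matter how many uops are issued, which would look just like evenly distributed per-pipe totals.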