| "So AMD runs at 2/3 the IPC of an old Intel processor. That is quite poor!" That is most certainly an overreach. An extraordinary overreach. Worse, it's absurdly using an AVX2 codebase, optimized for Westmere, as the baseline for "IPC" testing? The premise itself borders on gross negligence. IPC as a generalized concept is a broad, general purpose set of instructions, not an absurdly narrow test. Saying "Intel is faster at AVX512" is going to surprise exactly no one, and also happens to be irrelevant for the overwhelming majority of users and uses. The microbenchmarking thing has gone on for years, and at this point anyone who has paid any attention is rightly cautious when stomping their feet and making declarations, because usually they're just pouring noise into the mix. Lazily running a couple of tiny tests is not the rigour to avoid deserved criticism. |
I agree using Westmere isn't necessarily the best approach, but there is no difference in this case with either -march=native or -march=znver1.
The loop is small and simple, with only 9 instructions, and it compiles more or less the same regardless of the -march setting (the only changes I observed were basically no-ops, such as a mov and a blsr swapping places). Here's the assembly (for the second test, with the bigger IPC gap):