| From what I've read so far [0], it looks like AMD still uses 2 x 128-bit AVX units to execute AVX2 instructions. Also, AMD is always a generation behind Intel in terms of FP instruction sets, so Zen doesn't support AVX512. According to WikiChip [4], Zen 2 actually has 256-bit FPU paths. I was unable to find a credible benchmark for Zen 2, so I can't talk about its performance. However, when analyzed from the perspective I give below, it's not hard to assume that Zen 2 is a heavy hitter in terms of floating point performance.
The interesting part is that, when you look at SpecCPU 2017 FP Rate [1], the AMD Epyc 7601 [2] system has per-core performance similar to a much bigger Intel Xeon Platinum 8180 [3] system. Why is that interesting?
* AMD's per-core base (lowest) rate is 4.1875.
* Intel's per core base (lowest) rate is 4.3482.
* AMD is running GCC compiled code.
* Intel is running Intel compiled code.
* Intel has higher clock speed.
Intel has some CPUs (like the Gold 5118 and Gold 6148) which have a per-core base rate of ~5.125. These are the CPUs that are considered HPC processors, and they are used by a lot of people.
As I said before, it looks like Zen 2 is going to be a better HPC processor than Zen. Zen looks like a very good enterprise processor now. So, with my hat on, I can conclude that not having 512-bit hardware is not a crippling omission.
Addenda: I forgot to say that Intel has something called "AVX frequency". Since AVX, AVX2 and AVX512 have tremendous power requirements compared to other operations, Intel lowers the CPU clock to an undisclosed frequency while they run. When I last checked, the AVX frequencies of the Intel CPUs that we use weren't in the technical guides and were not public in any way. So, the peak SpecFP Rate is not very different from the base ones. Also, since the CPU's thermal budget is very constrained during AVXx operations, the speed of the other ports is also reduced. So at the end of the day, AVX512 is not a free turbo boost in HPC environments and under heavy/continuous loads.
[0]: https://en.wikichip.org/wiki/amd/microarchitectures/zen#Floa...
[1]: http://spec.org/cpu2017/results/rfp2017.html
[2]: http://spec.org/cpu2017/results/res2018q4/cpu2017-20180917-0...
[3]: http://spec.org/cpu2017/results/res2017q4/cpu2017-20171017-0...
[4]: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Ke... |
So even though the Intel CPU can in theory do 4x the computation AMD can in the vector units, in reality even the tightest real vector code does all kinds of things other than vector computation in the middle of that vector work, like computing addresses for loads and stores and managing loop variables. On AMD, those intermixed scalar instructions go into separate scalar ports; on Intel CPUs, they take space in the same issue slots that the vector code uses.
Then on top of that, memory bandwidth is a great equalizer. It doesn't matter how many multiplies you can compute if you cannot load the operands, and the AMD systems are much closer to Intel there than they are in pure computation, especially as they have a lot more L3 cache per core.
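To put rough shape on the bandwidth point, here is a minimal roofline-style sketch in Python. It assumes the usual simplification that a kernel is capped either by peak compute or by memory bandwidth times its arithmetic intensity (FLOPs per byte moved), whichever is lower. The two "machines" and all their numbers are made up for illustration; they are not measurements of any Epyc or Xeon part.

```python
def attainable_gflops(peak_gflops: float,
                      bandwidth_gbs: float,
                      intensity_flops_per_byte: float) -> float:
    """Roofline bound: the lower of the compute limit and the memory limit."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


# Hypothetical machines (illustrative numbers only, not real CPUs):
# "wide" has 4x the vector peak of "narrow" but the same memory bandwidth.
wide = {"peak_gflops": 4000.0, "bandwidth_gbs": 200.0}
narrow = {"peak_gflops": 1000.0, "bandwidth_gbs": 200.0}


def speedup(intensity_flops_per_byte: float) -> float:
    """How much faster 'wide' is than 'narrow' at a given arithmetic intensity."""
    fast = attainable_gflops(**wide, intensity_flops_per_byte=intensity_flops_per_byte)
    slow = attainable_gflops(**narrow, intensity_flops_per_byte=intensity_flops_per_byte)
    return fast / slow


if __name__ == "__main__":
    # Sweep from bandwidth-bound to compute-bound kernels.
    for intensity in (0.25, 1.0, 10.0, 100.0):
        print(f"{intensity:>6} FLOP/byte -> wide is {speedup(intensity):.2f}x faster")
```

In this toy model the 4x vector advantage only shows up once the kernel does enough arithmetic per byte to be compute-bound on both machines; for low-intensity kernels both sit on the same bandwidth ceiling and the ratio collapses to 1x. Larger caches help by raising the effective bandwidth a kernel sees, which this sketch does not model.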
On Zen 2, AMD does two big things that are going to really help them in HPC loads. They are doubling vector unit width, and they are doubling the amount of L3 per core. I honestly think the second change will help more than the first.