| It depends. AMD Zen has 4 128-bit SIMD units, while Skylake-S (and earlier) Intel chips have 3 256-bit SIMD units, and Skylake-X/SP (hereafter SKX) chips additionally have 1 or 2 512-bit SIMD units (which overlap with the 256-bit units). Now, not all units can run all instructions. E.g., Intel chips run FP instructions on only 2 of those units, but AMD can run FP instructions on all 4, so in that sense they are even. However, not all FP instructions can run on all units on AMD: FP multiplications run on only 2 units, and FP additions run on 2 units - so if you are doing all multiplies, both AMD and Intel can do 2 per cycle, and since Intel's units are twice as wide (four times as wide on SKX with 2 FMA units), Intel is twice as fast (four times on SKX). If you do a 1:1 mix of mul and add, however, then AMD and Intel may be tied - but then AMD is further hamstrung by having only 2 128-bit load units, vs Intel's 2 256-bit load units (512-bit on SKX) - so it is entirely possible that many kernels are limited by load/store throughput on AMD. For things like in-lane shuffles, AMD and Intel (pre-SKX) have the same throughput: AMD has 2 128-bit shuffle units, and Intel has 1 256-bit one. For cross-lane shuffles the 128-bit AMD units really struggle, since multiple ops are required, and Intel wins big. So AMD's AVX-256 perf being half of Intel's is more or less the worst case, and many cases will see closer performance. If Zen 2 doubles everything to 256 bits, everything will change dramatically. |
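To put rough numbers on the quoted comment, here is a back-of-the-envelope sketch in Python. It uses only the unit counts and widths given above; the `Chip` class, its field names, and the "bits per cycle" model are my own simplification for illustration, not anything from AMD or Intel documentation, and it ignores latency, FMA fusion, and front-end limits entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical summary of the figures quoted in the comment above."""
    name: str
    fp_width: int      # width of each FP SIMD unit, in bits
    mul_units: int     # units that can run FP multiplies
    add_units: int     # units that can run FP adds
    shared: bool       # True if the same units serve both muls and adds
    load_units: int
    load_width: int    # width of each load unit, in bits


# Values taken from the quoted comment; SKX is the 2-FMA-unit variant.
ZEN = Chip("Zen", 128, mul_units=2, add_units=2, shared=False,
           load_units=2, load_width=128)
SKL = Chip("Skylake-S", 256, mul_units=2, add_units=2, shared=True,
           load_units=2, load_width=256)
SKX = Chip("Skylake-X (2 FMA)", 512, mul_units=2, add_units=2, shared=True,
           load_units=2, load_width=512)


def fp_bits_per_cycle(chip: Chip, mul_fraction: float) -> float:
    """Peak FP bits processed per cycle for a given multiply/add mix."""
    mul_cap = chip.mul_units * chip.fp_width
    add_cap = chip.add_units * chip.fp_width
    if chip.shared:
        # The same units handle both op types, so the mix does not matter.
        return float(mul_cap)
    # Separate mul and add units: whichever class is busier sets the pace.
    cost = max(mul_fraction / mul_cap, (1.0 - mul_fraction) / add_cap)
    return 1.0 / cost


def load_bits_per_cycle(chip: Chip) -> int:
    """Peak bits loaded per cycle."""
    return chip.load_units * chip.load_width


if __name__ == "__main__":
    for chip in (ZEN, SKL, SKX):
        print(chip.name,
              "all-mul:", fp_bits_per_cycle(chip, 1.0),
              "1:1 mix:", fp_bits_per_cycle(chip, 0.5),
              "loads:", load_bits_per_cycle(chip))
```

Under this toy model the all-multiply case comes out 2x in Intel's favor (4x for SKX), the 1:1 mix ties Zen with Skylake-S, and the load side is still 2x apart, which is the shape of the argument in the quote.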
No one has announced 256-bit AVX units in Zen 2; while I'm sure it's possible, I'd think AMD would be advertising that at some point if it were happening. On the other hand, they've doubled the per-socket chiplet count, which may compensate somewhat for the narrower AVX width per core.
For the record, I'm firmly in AMD's camp here; I appreciate both the underdog aspect and the renewed competition in the x86 space. I own a first-gen Threadripper myself. But it's still important to acknowledge where Zen falls short compared to Skylake-X.