by martinpw
3401 days ago
> I bet if you're using Intel MLK for example, it will probably "magically" get faster on these Skylake machines by using AVX512 automatically.

(Aside, I think you mean MKL.)

There is a 200MHz clock reduction when running AVX512 instructions. If your code makes heavy use of AVX512 there is of course still a big net win, but I'm curious about the impact on more heterogeneous workloads. We have an app that is a mixture of scalar and vector code. Some, but not all, of the vector code would benefit from 512-bit vectors. But how much does the clock slowdown from running this code bleed over into the other, non-AVX512 code? I guess I'm asking how quickly it clocks down, and how quickly the full clock speed is restored.

Worst case, it seems you could be running full time at a 200MHz slowdown due to blocks of AVX512 instructions scattered throughout the application. Is that a valid concern?
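To put rough numbers on that worst case, here is a back-of-the-envelope sketch. It is only a toy model, not a measurement: the function name `net_speedup`, the 2000 MHz base clock, the 2x per-clock vector speedup and the 10% AVX512 share are all assumptions made up for illustration. Only the 200MHz reduction comes from the discussion above.

```python
def net_speedup(base_mhz, penalty_mhz, avx_fraction, vector_speedup,
                downclocked_fraction):
    """Overall speedup versus running everything at the base clock without AVX512.

    avx_fraction:         share of the baseline runtime that AVX512 can accelerate
    vector_speedup:       per-clock speedup of that share from the wider vectors
    downclocked_fraction: share of the remaining (non-AVX512) work that still runs
                          at the reduced clock because the core has not ramped back up
    """
    slow = (base_mhz - penalty_mhz) / base_mhz  # relative clock while reduced
    # AVX512 part: fewer cycles, but each cycle takes longer.
    t_avx = avx_fraction / (vector_speedup * slow)
    # Everything else: part of it pays the clock penalty, part of it does not.
    other = 1.0 - avx_fraction
    t_other = other * downclocked_fraction / slow + other * (1.0 - downclocked_fraction)
    return 1.0 / (t_avx + t_other)


# Hypothetical inputs: 2000 MHz base clock, 10% of runtime vectorizable at 2x.
best = net_speedup(2000, 200, 0.10, 2.0, downclocked_fraction=0.0)
worst = net_speedup(2000, 200, 0.10, 2.0, downclocked_fraction=1.0)
print(f"clock recovers instantly: {best:.3f}x")
print(f"clock never recovers:     {worst:.3f}x")
```

Under these made-up inputs the "never recovers" case comes out below 1.0x, i.e. a net loss, while the "recovers instantly" case is a small win. Where a real application lands between the two depends on exactly the ramp-down and ramp-up times being asked about.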
|
I'd have to look up the specifics, but does AVX512 simply slow the clock, or does it also have some kind of limited number of hardware ports? I wonder whether a modest clock slowdown would be much of an issue, since clock-for-clock you should see better performance on Skylake anyway.
Just curious, what kind of workloads are you looking at here?