Hacker News
by PixelOfDeath 2119 days ago
Isn't AVX512 basically cacheline-instructions?
1 comments

That's the way normal SIMD loads work, yeah.

But the scatter/gather instructions do random-access memory operations. You have one SIMD register holding 8 (or whatever the width is) indices to be applied to a base address in a scalar register, and the hardware then goes and does 8 separate memory operations on your behalf, packing the results into a SIMD register at the end.

That has to hit the cache 8 times in the general case, since each index can land on a different cache line. It's extremely expensive for a single instruction, though faster than running scalar code to do the same thing.
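
To make that concrete, here's a rough scalar model in C of what a gather does per lane, plus a helper that counts how many cache lines the indices touch. This is only a sketch, not the real instruction: `gather8` and `lines_touched` are names I made up, and it assumes 8 lanes of 32-bit elements, 64-byte cache lines, non-negative indices, and a base pointer aligned to a cache line.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8        /* assumed vector width: 8 x 32-bit lanes */
#define CACHE_LINE 64  /* assumed cache-line size in bytes */

/* Scalar model of a gather: each lane loads base[idx[lane]]. */
static void gather8(int32_t dst[LANES], const int32_t *base,
                    const int32_t idx[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = base[idx[i]];
}

/* Count the distinct cache lines the 8 lane accesses would touch.
 * Assumes base is cache-line aligned and all indices are >= 0. */
static int lines_touched(const int32_t idx[LANES])
{
    size_t seen[LANES];
    int n = 0;
    for (int i = 0; i < LANES; i++) {
        size_t line = ((size_t)idx[i] * sizeof(int32_t)) / CACHE_LINE;
        int dup = 0;
        for (int j = 0; j < n; j++) {
            if (seen[j] == line) {
                dup = 1;
                break;
            }
        }
        if (!dup)
            seen[n++] = line;
    }
    return n;
}
```

With indices 0..7 everything sits in one line, which is the "normal SIMD load" case. With a stride of 16 elements (64 bytes) every lane lands on its own line, which is the 8-cache-hits case.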