by BeeOnRope
744 days ago
> of the high-performance programs that is possible in 512-bit AVX-512 due to the equality between register size and cache line size, so the consumer Intel CPUs will remain a worse target for the implementation of high-performance algorithms.

Can you elaborate here? I love full-width AVX-512 as much as the next SIMD nerd, but I have rarely considered the match between cache line size and vector width to be one of its particularly useful features. If anything, it was a sign that AVX-512 was probably the end of the road for full-throughput loads and stores at the full vector register width, since double-cache-line memory operations are likely to be half-throughput at best and a doubling of the cache line size seems unlikely.
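To put rough numbers on the double-cache-line point, here is a small sketch that counts how many cache lines a single load or store touches. It assumes 64-byte cache lines; the function name and the 128-byte (1024-bit) vector are hypothetical and only there for illustration. It says nothing about actual throughput on any particular core, only about how many lines an access spans.

```python
def cache_lines_touched(addr: int, width_bytes: int, line_bytes: int = 64) -> int:
    """Number of cache lines spanned by an access of width_bytes starting at addr.

    line_bytes=64 is an assumption matching current x86 cores.
    """
    first_line = addr // line_bytes
    last_line = (addr + width_bytes - 1) // line_bytes
    return last_line - first_line + 1


# 64-byte (512-bit) access: one line when aligned, two when misaligned.
print(cache_lines_touched(0, 64))    # aligned
print(cache_lines_touched(16, 64))   # misaligned

# Hypothetical 128-byte (1024-bit) access: at least two lines even when aligned.
print(cache_lines_touched(0, 128))
print(cache_lines_touched(16, 128))
```

With 64-byte lines, an aligned 512-bit access is the widest one that still fits in a single line; anything wider spans at least two lines no matter how it is aligned.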
But an increase in cacheline size would be nice if it can get us larger vectors, or otherwise significantly improve memory bandwidth.