Hacker News
by BeeOnRope 744 days ago
> of the high-performance programs that is possible in 512-bit AVX-512 due to the equality between register size and cache line size, so the consumer Intel CPUs will remain a worse target for the implementation of high-performance algorithms.

Can you elaborate here? I love full-width AVX-512 as much as the next SIMD nerd, but I have rarely considered the match between cache line size and vector width to be one of its particularly useful features. If anything, it was a sign that AVX-512 was probably the end of the road for full-throughput loads and stores at full register width, since memory operations spanning two cache lines are likely to be half-throughput at best, and a doubling of the cache line width seems unlikely.
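
To make the arithmetic concrete, here is a rough sketch that counts how many cache lines a single vector access touches. It assumes a 64-byte line (the usual x86 size); `lines_touched` is just an illustrative helper, not anything from a real library, and it says nothing about actual throughput on any particular core.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def lines_touched(addr: int, width_bytes: int, line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Number of cache lines covered by an access of width_bytes starting at addr."""
    first = addr // line_bytes
    last = (addr + width_bytes - 1) // line_bytes
    return last - first + 1


if __name__ == "__main__":
    for bits in (256, 512, 1024):
        width = bits // 8
        aligned = lines_touched(0, width)
        # Worst case over every possible starting offset within a line.
        worst = max(lines_touched(off, width) for off in range(CACHE_LINE_BYTES))
        print(f"{bits}-bit vector: {aligned} line(s) aligned, up to {worst} misaligned")
```

With 64-byte lines, an aligned 512-bit access covers exactly one line, while a hypothetical 1024-bit access covers two even when perfectly aligned. Passing `line_bytes=128` shows that a 128-byte line would bring the aligned 1024-bit case back to one.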

2 comments

I read that comment as "the wider, the sweeter" (which I agree with), but that we're now (as you say) at the end of the road, and thus the sweetest point.

But an increase in cacheline size would be nice if it can get us larger vectors, or otherwise significantly improve memory bandwidth.

It does seem plausible. Apple has gone to a 128-byte cache line.