
Yes, as I wrote, it is only true for one float value.

SIMD/MIMD will benefit from working on narrower types. This is true not only because they do more work per clock, but also because memory is slow. Super slow compared to the CPU. A lot of optimization is about reducing cache misses.

(But remember that a cache line is typically 64 bytes, so reading a single value smaller than that costs the same memory access either way. So in theory it does not matter when comparing one f32 against one f64.)
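
To put rough numbers on this, here is a small Rust sketch. The 64-byte cache line and the 256-bit (AVX2-style) register width are assumptions for illustration, not something every CPU guarantees, and the function names are made up for this example. It only does the arithmetic; it does not measure anything.

```rust
use std::mem::size_of;

// Assumed hardware parameters (illustrative, not universal):
// a 64-byte cache line and a 256-bit (32-byte) SIMD register.
const CACHE_LINE_BYTES: usize = 64;
const SIMD_REGISTER_BYTES: usize = 32;

/// Number of values of type T that fit in one cache line.
fn values_per_cache_line<T>() -> usize {
    CACHE_LINE_BYTES / size_of::<T>()
}

/// Number of values of type T processed per SIMD register (lanes).
fn simd_lanes<T>() -> usize {
    SIMD_REGISTER_BYTES / size_of::<T>()
}

/// Cache lines touched when reading `n` contiguous values of type T,
/// assuming the array starts on a cache-line boundary.
fn cache_lines_touched<T>(n: usize) -> usize {
    (n * size_of::<T>() + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES
}

fn main() {
    println!("f32 per cache line: {}", values_per_cache_line::<f32>());
    println!("f64 per cache line: {}", values_per_cache_line::<f64>());
    println!("f32 SIMD lanes:     {}", simd_lanes::<f32>());
    println!("f64 SIMD lanes:     {}", simd_lanes::<f64>());

    // A single value touches one cache line regardless of width.
    println!("lines for 1 f32:    {}", cache_lines_touched::<f32>(1));
    println!("lines for 1 f64:    {}", cache_lines_touched::<f64>(1));

    // A large array is where the narrower type pays off.
    println!("lines for 1024 f32: {}", cache_lines_touched::<f32>(1024));
    println!("lines for 1024 f64: {}", cache_lines_touched::<f64>(1024));
}
```

Under those assumptions, a single f32 and a single f64 each touch one cache line, while an array of 1024 values touches half as many lines as f32 as it does as f64. That is the whole point: the width only starts to matter once you process many values.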