by sgerenser
789 days ago
I think this is only true when using x87 floating point, which anything computationally intensive generally avoids these days in favor of SSE/AVX. With SSE/AVX, for a given vector width, the CPU can process twice as many 32-bit floats as 64-bit floats per clock cycle.
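As a rough sketch of that arithmetic: the register widths below are the standard 128/256/512 bits for SSE/AVX/AVX-512, while the helper name `lanes` is made up for illustration. It only counts elements per register and says nothing about real instruction throughput.

```python
# Lanes per SIMD register = register width / element width.
# `lanes` is a hypothetical helper for illustration, not a library function.
REGISTER_BITS = {"SSE": 128, "AVX": 256, "AVX-512": 512}

def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of the given width that fit in one register."""
    return register_bits // element_bits

for name, bits in REGISTER_BITS.items():
    print(f"{name}: {lanes(bits, 32)} x f32 vs {lanes(bits, 64)} x f64")
```

For every width the f32 count is exactly double the f64 count, which is where the "twice as many per cycle" figure comes from.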
|
SIMD/MIMD benefits from working on narrower types. That is true not only because it does more work per clock, but also because memory is slow, super slow compared to the CPU. Optimization is largely about reducing cache misses.
(But remember that a cache line is typically 64 bytes, so reading a single value smaller than that takes the same time. So in theory it does not matter when comparing one f32 against one f64.)
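To put numbers on both points, here is a minimal sketch assuming a 64-byte cache line and contiguous, line-aligned data. `CACHE_LINE_BYTES` and `lines_touched` are names invented for this example, and the model ignores prefetching and everything else a real memory system does.

```python
import struct

CACHE_LINE_BYTES = 64  # assumption: typical line size on current x86-64 CPUs

def lines_touched(n_values: int, fmt: str) -> int:
    """Cache lines needed to read n contiguous values (assumes aligned start)."""
    total_bytes = n_values * struct.calcsize(fmt)
    return -(-total_bytes // CACHE_LINE_BYTES)  # ceiling division

F32, F64 = "<f", "<d"  # 4-byte and 8-byte IEEE 754 floats

# A single value costs one line either way.
print(lines_touched(1, F32), lines_touched(1, F64))

# Streaming a large array is where the narrower type pays off.
print(lines_touched(1_000_000, F32), lines_touched(1_000_000, F64))
```

One f32 and one f64 each touch a single line, but a million-element array needs half as many lines as f32, so half the memory traffic.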