Hacker News
by unsigner 633 days ago
There might be a chicken-and-egg situation here - one often hears that there’s no point in having wider SIMD vectors or more ALUs, as they would spend all their time waiting on memory anyway.
1 comment

The width and count of the SIMD execution units are matched to the load throughput from the L1 cache memory, which is not shared between cores.
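To make the matching concrete, here is a rough sketch of the balance check. The port counts, widths and function names are made up for illustration and do not describe any particular CPU.

```python
# All figures here are hypothetical, not measurements of a real core.

def l1_supply_bytes_per_cycle(load_ports: int, load_width_bytes: int) -> int:
    """Peak bytes per cycle one core can load from its private L1 cache."""
    return load_ports * load_width_bytes


def simd_demand_bytes_per_cycle(units: int, vector_bytes: int, loads_per_op: int) -> int:
    """Bytes per cycle the SIMD units consume when every operation needs
    `loads_per_op` fresh vector operands from L1."""
    return units * vector_bytes * loads_per_op


# Assumed core: two 64-byte load ports feeding two 512-bit (64-byte) SIMD units,
# each operation taking one operand from memory and the rest from registers.
supply = l1_supply_bytes_per_cycle(load_ports=2, load_width_bytes=64)
demand = simd_demand_bytes_per_cycle(units=2, vector_bytes=64, loads_per_op=1)

# The units can be kept busy only if L1 can supply at least what they consume.
balanced = supply >= demand
print(f"supply={supply} B/cycle, demand={demand} B/cycle, balanced={balanced}")
```

With these assumed numbers the two sides are equal. Doubling the SIMD width without widening the load ports would leave the units starved whenever each operation needs an operand from L1.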

Any number of cores, with any count and any width of SIMD functional units, can reach their maximum throughput, as long as the data is already in the L1 cache memories by the time it is needed.

So the limits on the number of cores and on the SIMD width and count are completely determined by whether, in the applications of interest, the data can be brought from the main memory into the L1 cache memories at the right times.

This is what must be analyzed in discussions about such limits.
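A back-of-the-envelope version of that analysis, assuming the only variable that matters is how many times each byte fetched from main memory is reused out of L1 before it is evicted. The core count, clock, bandwidth figure and the `required_dram_gb_per_s` helper are all hypothetical.

```python
# Hypothetical machine parameters, chosen only to illustrate the arithmetic.
CORES = 16
L1_BYTES_PER_CYCLE = 128      # per-core L1 load throughput the SIMD units consume
CLOCK_GHZ = 3.0
DRAM_GB_PER_S = 100.0         # assumed sustained main-memory bandwidth


def required_dram_gb_per_s(cores: int, l1_bytes_per_cycle: int,
                           clock_ghz: float, reuse: float) -> float:
    """Main-memory bandwidth needed to keep every core's L1 fed, when each
    byte brought in from main memory is loaded from L1 `reuse` times."""
    # bytes/cycle * Gcycles/s = GB/s of L1 traffic; divide by reuse for DRAM traffic.
    return cores * l1_bytes_per_cycle * clock_ghz / reuse


# Streaming kernel: each byte is used once, so DRAM must match total L1 traffic.
streaming = required_dram_gb_per_s(CORES, L1_BYTES_PER_CYCLE, CLOCK_GHZ, reuse=1)

# Blocked kernel: each byte is reused 64 times from L1 before being evicted.
blocked = required_dram_gb_per_s(CORES, L1_BYTES_PER_CYCLE, CLOCK_GHZ, reuse=64)

for name, need in (("streaming", streaming), ("blocked", blocked)):
    verdict = "SIMD-bound" if need <= DRAM_GB_PER_S else "memory-bound"
    print(f"{name}: needs {need:.0f} GB/s of {DRAM_GB_PER_S:.0f} GB/s -> {verdict}")
```

Under these assumptions the streaming case needs far more bandwidth than the memory can deliver, so extra SIMD width is wasted on it, while the blocked case fits and can use whatever width the L1 can feed.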