by jules 1261 days ago
It's not clear that even a super smart compiler can do this. The best schedule depends on the latency of each instruction, and that is a problem because we can't know statically whether a particular load will be served from L1, L2, L3, or DRAM. The answer can differ between executions of the same load instruction.
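To make that concrete, here is a toy sketch (not a model of any real core). It assumes a single-issue, in-order machine that stalls on first use, 1-cycle ALU ops, and made-up latencies of 3 cycles for a cache hit and 100 for a miss. The names `run_in_order`, `chain`, `a_first`, and `b_first` are hypothetical, invented for this illustration.

```python
def run_in_order(schedule, load_latency):
    """Cycle count on a toy single-issue, in-order, stall-on-use machine."""
    ready = {}  # register -> cycle at which its value becomes available
    cycle = 0   # earliest cycle at which the next instruction may issue
    for dest, srcs, kind in schedule:
        # In-order issue: wait for the previous instruction and for all operands.
        issue = max([cycle] + [ready[s] for s in srcs])
        latency = load_latency[dest] if kind == "load" else 1  # ALU ops: 1 cycle (assumed)
        ready[dest] = issue + latency
        cycle = issue + 1
    return max(ready.values())


def chain(reg, length):
    """`length` dependent ALU ops, each consuming the previous result."""
    ops, prev = [], reg
    for i in range(length):
        dest = f"{reg}{i}"
        ops.append((dest, [prev], "alu"))
        prev = dest
    return ops


loads = [("a", [], "load"), ("b", [], "load")]
a_first = loads + chain("a", 10) + chain("b", 10)  # static schedule 1
b_first = loads + chain("b", 10) + chain("a", 10)  # static schedule 2

# Hypothetical latencies: 3 cycles on a hit, 100 on a miss.
for name, lat in [("a misses", {"a": 100, "b": 3}), ("b misses", {"a": 3, "b": 100})]:
    print(name, "a_first:", run_in_order(a_first, lat), "b_first:", run_in_order(b_first, lat))
```

In this toy model each schedule wins in one case and loses in the other, so neither fixed order is best for every execution. Which load misses is exactly what the compiler can't see.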

According to [1], 88% of the speedup given by OOO processors is due to speculation, and reordering around cache misses accounts for only around 10% of the speedup. If OOO in general gives around a 50% speedup compared to in-order designs, then reordering in the face of cache misses contributes only around 5% (10% of 50%). If you use a good static schedule with speculation, you'll get the bulk of the speedup. The rest can be recovered by increasing the clock rate by 5%, since you'll have gotten rid of so much silicon.

[1] https://doi.org/10.1145/2451116.2451143
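Spelled out, the back-of-the-envelope arithmetic looks like this. The 50% figure is the comment's own rough assumption, and treating the percentages as shares that simply multiply through is a simplification. The variable names are made up for illustration.

```python
# Figures quoted in the comment; the 50% total is a rough assumption, not a measurement.
ooo_speedup = 0.50        # assumed total OOO speedup over an in-order design
speculation_share = 0.88  # share of that speedup attributed to speculation, per [1]
miss_reorder_share = 0.10 # share attributed to reordering around cache misses, per [1]

speculation_gain = ooo_speedup * speculation_share    # part a static schedule with speculation could keep
miss_reorder_gain = ooo_speedup * miss_reorder_share  # part lost without dynamic reordering

print(f"speculation: {speculation_gain:.0%}, miss reordering: {miss_reorder_gain:.0%}")
```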

Nice paper!