
Thanks for those very interesting links. But I still don't think that is what is going on here.

In designs which rename using the ROB, the register file holds values produced by instructions which are completed and retired, the ROB holds values from instructions that are completed but not retired, and the bypass network supplies values from instructions currently completing.

What Agner is doing in his example with the seemingly useless instruction is transferring a value from the register file to the ROB, so that instructions which try to read logical register ECX will now source it from the ROB instead of the register file. But when I look at the code in the Stack Overflow question, nothing actually reads from s1. So these are even "more useless" instructions than Agner's example.
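To make the three operand sources concrete, here is a toy Python model of a ROB-renaming design. It is only a sketch of the idea described above, not a model of any real processor: the class and method names (ToyCore, issue, complete, retire, source_of) and the value 7 are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class RobEntry:
    """One in-flight instruction (hypothetical, heavily simplified)."""
    dest: str                    # logical destination register
    value: Optional[int] = None  # None until the instruction completes


class ToyCore:
    """Toy model: retired values live in the register file, completed but
    unretired values live in the ROB, and a value being produced right now
    comes off the bypass network."""

    def __init__(self, regs):
        self.regfile = dict(regs)  # retired (architectural) values
        self.rob = []              # in-flight instructions, oldest first
        self.rename = {}           # logical reg -> newest in-flight producer

    def issue(self, dest):
        entry = RobEntry(dest)
        self.rob.append(entry)
        self.rename[dest] = entry
        return entry

    def complete(self, entry, value):
        entry.value = value

    def retire(self):
        entry = self.rob.pop(0)
        assert entry.value is not None, "cannot retire before completion"
        self.regfile[entry.dest] = entry.value
        # Once the newest producer retires, readers go back to the register file.
        if self.rename.get(entry.dest) is entry:
            del self.rename[entry.dest]

    def source_of(self, reg, completing_now=None):
        """Where would a reader of `reg` get its operand from?"""
        entry = self.rename.get(reg)
        if entry is None:
            return "register file"
        if entry is completing_now:
            return "bypass"
        if entry.value is not None:
            return "ROB"
        return "not ready"


core = ToyCore({"ecx": 7})
before = core.source_of("ecx")                    # no in-flight writer yet

mov = core.issue("ecx")                           # a "useless" mov ecx, ecx-style instruction
during = core.source_of("ecx", completing_now=mov)

core.complete(mov, 7)
after = core.source_of("ecx")                     # completed, not yet retired

core.retire()
retired = core.source_of("ecx")                   # back in the register file

print(before, during, after, retired, sep=" -> ")
```

The point of the sketch is the middle state: between completion and retirement of the extra instruction, a later reader of ECX is steered to the ROB rather than the register file. If nothing reads the register afterwards, as with s1 in the question, that redirection has nobody to benefit.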

Some people have already mentioned instruction alignment issues, so that is one likely explanation. There are a whole bunch of other possible issues involving the scheduler and dispatch restrictions. For example, I've seen processors where there were two pipelines with slightly different instruction schedulers. So adding a useless instruction like this might push your bottleneck instruction into a pipe with a scheduler that is slightly better for your code. Sometimes bypassing across different pipes is more expensive than within the same pipe, so again the useless instruction might push some instructions into pipes that have more of their sources. It could be any one of a number of reasons, and it's going to be very hard to tell from the outside without knowing the details of the microarchitecture.