|
|
|
|
|
by dzaima
531 days ago
|
|
You don't quite need a full 64-bit adder to materialize the proper value, as one argument is only a 10-bit int. So a 10-bit adder, a 54-bit increment, and muxing it with the original depending on the add carry. And, presumably, the OP shift case here is in fact a case of there not being a built-in immediate adder and thus a need for fixup uops being inserted to materialize it? |
|
In general I think people are overstating the delay of an additional 64-bit add on register file reads (though I'm not a hardware guy so maybe someone can correct me). There are carry-lookahead adders with log_2(n) == 6 gate delays. Carry-save adders might also be relevant to how they can do multiple dependent adds with 0c latency.
> And, presumably, the OP shift case here is in fact a case of there not being a built-in immediate adder and thus a need for fixup uops being inserted to materialize it?
No, the perf counters show 1 uop dispatched/retired in both the slow and fast cases.