Hacker News new | ask | show | jobs
by dzaima 531 days ago
You don't quite need a full 64-bit adder to materialize the proper value, as one argument is only a 10-bit int. So a 10-bit adder, a 54-bit increment, and muxing it with the original depending on the add carry.

And, presumably, the OP shift case here is in fact a case of there not being a built-in immediate adder and thus a need for fixup uops being inserted to materialize it?

1 comments

Right. Actually it turns out it's 11 bits, since [-1024, 1023] are all supported by the immediate add renamer.

In general I think people are overstating the delay of an additional 64-bit add on register file reads (though I'm not a hardware guy so maybe someone can correct me). There are carry-lookahead adders with log_2(n) == 6 gate delays. Carry-save adders might also be relevant to how they can do multiple dependent adds with 0c latency.

> And, presumably, the OP shift case here is in fact a case of there not being a built-in immediate adder and thus a need for fixup uops being inserted to materialize it?

No, the perf counters show 1 uop dispatched/retired in both the slow and fast cases.

Ah, good to know on the uop count. Still could be (or, well, has to be to some extent) the same concept, just pipelined within one uop.