by dzaima
494 days ago
So your argument isn't that it's irrelevant, but rather that it might be irrelevant, if you happen to have a core where the extra latency of a 64-bit adder on the load/store AGU pushes it just over into the next cycle. Though I'd imagine that just taking the extra cycle conditionally for indexed load/store instructions would still be better than having a whole extra instruction take up decode/ROB/ALU resources (and the respective power cost), or the mess that comes with instruction fusion.

And with RISC-V already requiring a 12-bit adder for loads/stores, and thus an increment/decrement for the top 52 bits, the extra latency of going to a full 64-bit adder is presumably quite a bit less than that of a full separate 64-bit add. (And if the mandatory 64+12-bit adder already pushed the latency up by a cycle, a separate shNadd will result in two cycles of latency over the hypothetical adderless case, despite 1 cycle clearly being feasible!)

Even if the RISC-V way might be fine for tight loops, most code isn't tight loops, and ideally most tight loops doing consecutive loads would vectorize anyway. We're in a world where the latest Intel cores can do small immediate adds at rename, usually materializing them in the consuming instructions, which I'd imagine is quite a bit of overhead for not that much benefit.
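To make the "12-bit adder plus increment/decrement for the top 52 bits" point concrete, here's a minimal sketch in Python. It's a hypothetical arithmetic model, not any real core's AGU: it computes base + sign-extended imm12 using only a 12-bit add on the low bits, then bumps the upper 52 bits by at most one in either direction. `agu_imm12` and `full_add` are made-up names for illustration; the loop at the end checks the model against a plain 64-bit add.

```python
MASK64 = (1 << 64) - 1
MASK52 = (1 << 52) - 1


def agu_imm12(base: int, imm12: int) -> int:
    """Hypothetical AGU model: base + sign-extended 12-bit immediate, using a
    12-bit adder on the low bits and only an inc/dec on the upper 52 bits."""
    assert 0 <= base <= MASK64
    assert -2048 <= imm12 <= 2047, "RISC-V load/store offsets are signed 12-bit"
    field = imm12 & 0xFFF           # raw 12-bit immediate field from the encoding
    sign = field >> 11              # 1 if the immediate is negative
    low = (base & 0xFFF) + field    # the only real adder: 12 + 12 bits -> 13 bits
    carry = low >> 12
    hi = base >> 12                 # upper 52 bits of the base register
    # carry - sign is always -1, 0 or +1, so the upper bits never need a full adder
    if carry and not sign:
        hi += 1                     # increment
    elif sign and not carry:
        hi -= 1                     # decrement
    hi &= MASK52                    # wrap within 52 bits (i.e. mod 2**64 overall)
    return (hi << 12) | (low & 0xFFF)


def full_add(base: int, imm12: int) -> int:
    """Reference: the same address from a full 64-bit add, wrapping mod 2**64."""
    return (base + imm12) & MASK64


# Exhaustive over all immediates for a few interesting bases.
for b in (0, 0x7FF, 0x800, 0xFFF, 0x1000, 0x12345678ABCDEF00, MASK64):
    for imm in range(-2048, 2048):
        assert agu_imm12(b, imm) == full_add(b, imm)
```

The question in the thread is then how much more latency it costs to turn that incrementer into a real 64-bit adder fed by a shifted index register, versus spending a whole separate shNadd on it.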
I'll also note that only x86 can do base + scaled index + constant offset in one instruction. Arm needs two instructions, just like RISC-V.
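As a rough illustration of that split, here's a plain-arithmetic Python sketch (hypothetical helper names `ea_x86` and `ea_two_step`, not real hardware) for something like an `int` array access `a[i + 4]`. The assembly in the comments is just one illustrative sequence per ISA. For base + scaled index with no offset, AArch64 does have a single-instruction form (`ldr w0, [x0, x1, lsl #2]`), while RISC-V still needs a separate shNadd.

```python
MASK64 = (1 << 64) - 1


def ea_x86(base: int, index: int, scale: int, disp: int) -> int:
    """One instruction on x86: base + index*scale + disp,
    e.g.  mov eax, dword ptr [rdi + rsi*4 + 16]"""
    assert scale in (1, 2, 4, 8)
    assert -(1 << 31) <= disp < (1 << 31)   # x86 displacement is signed 32-bit
    return (base + index * scale + disp) & MASK64


def ea_two_step(base: int, index: int, shamt: int, disp: int) -> int:
    """Two instructions, e.g.
         RISC-V (Zba): sh2add a5, a1, a0        ; lw  a0, 16(a5)
         AArch64:      add x8, x0, x1, lsl #2   ; ldr w0, [x8, #16]
    Step 1 materializes base + (index << shamt) in a register; step 2 is a
    load whose AGU only adds the small immediate."""
    assert shamt in (0, 1, 2, 3)            # add / sh1add / sh2add / sh3add
    assert -2048 <= disp <= 2047            # RISC-V signed 12-bit offset range
    tmp = (base + (index << shamt)) & MASK64  # the extra ALU instruction
    return (tmp + disp) & MASK64              # the load's own base + offset


# Same address either way; the difference is only where the middle add happens.
assert ea_two_step(0x1000, 5, 2, 16) == ea_x86(0x1000, 5, 4, 16)
```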