by crest
1230 days ago
It's neat when writing assembler, e.g. adding a scaled byte value to the PC to implement a jump table, or performing a scaled and indexed load to the PC. In ARM it also produced a neat, short and fast function prologue/epilogue. In my opinion the worst problem it causes is the 1001 special cases it adds in an optimised out-of-order implementation. The Thumb interworking makes it worse, but is useful to increase code density in ARMv6-M and can even increase performance (per clock) of ARMv7-M cores. I don't expect it to cause too many problems in single-issue in-order implementations like the Cortex-M3 and M4. I would like to know how much design time and core area is spent on this in the M7 and M85 cores.
|
RISC-V unfortunately didn't quite do this well: call, return, and indirect branch all use the same opcode, so you have to fully decode the instruction to determine whether you should use the RAS (return address stack) or your other predictors. This isn't a problem that can't be overcome (next-line predictors help a lot for these early predictions), but it makes something very performance-critical just that much harder.
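
To make the decode problem concrete, here is a rough Python sketch of the RAS hint convention as I understand it from the RISC-V unprivileged spec: the opcode alone tells you nothing, and the predictor has to look at the `rd` and `rs1` fields (x1 and x5 being the conventional link registers) to decide between push, pop, or neither. The names `ras_action`, `encode_jalr` and `RasAction` are made up for illustration, and the sketch only checks the major opcode, not `funct3`.

```python
from enum import Enum

JALR_OPCODE = 0b1100111  # major opcode shared by indirect call, return, and jump
LINK_REGS = (1, 5)       # x1 (ra) and x5 (t0), the conventional link registers


class RasAction(Enum):
    NONE = "none"
    PUSH = "push"
    POP = "pop"
    POP_THEN_PUSH = "pop, then push"


def encode_jalr(rd: int, rs1: int, imm: int = 0) -> int:
    """Hypothetical helper: build a 32-bit I-type JALR encoding."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | JALR_OPCODE


def ras_action(insn: int) -> RasAction:
    """Pick the RAS action implied by the register fields of a JALR."""
    if insn & 0x7F != JALR_OPCODE:
        raise ValueError("not a JALR instruction")
    rd = (insn >> 7) & 0x1F
    rs1 = (insn >> 15) & 0x1F
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link:
        # No link register written: a return if we jump through one, else a plain jump.
        return RasAction.POP if rs1_link else RasAction.NONE
    if not rs1_link or rd == rs1:
        # Link register written: an indirect call.
        return RasAction.PUSH
    # Both are link registers but differ: coroutine-style swap.
    return RasAction.POP_THEN_PUSH
```

For example, `ras_action(encode_jalr(0, 1))` classifies `ret` (`jalr x0, 0(x1)`) as a pop, while `ras_action(encode_jalr(1, 10))` classifies `jalr ra, 0(a0)` as a push. The point is that all of these share the same seven opcode bits, so none of this can be decided before the register fields are extracted.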