|
|
|
|
|
by eigenform
536 days ago
|
|
I dunno, you could imagine it happens [speculatively?] in parallel, at the cost of a read port and adder for each op that can be renamed in a single cycle: 1. Start PRF reads at dispatch/rename 2. In the next cycle, you have the result and compute the 64-bit result. At the same time, the scheduler is sending other operands [not subject to the optimization] to read ports. 3. In the next cycle, the results from both sets of operands are available |
|
It also adds an extra cycle of scheduling latency, which may or may not be an issue (I really have no idea how far ahead these schedulers can schedule).
If you think about the "1024 constant registers" approach: It allows you to have a small 10bit adder in the renamer which handles long chains of mov/add/sub/inc/dec ops, as long as the chain stays in the range of 1024. This frees the main adders for bigger sums, or maybe you can power gate them.
And in the case where one of the inputs is larger than 1024, or it's value is unknown during renaming, the renamer can still merge two two-input adds into a single three input add uop.