| HN Mirror

Seems rather pointless to implement it like that. You would save space in the scheduler because the uops have been fused, but execution time and execution unit usage is the same as just executing the original ops before this optimisation was implemented.

It also adds an extra cycle of scheduling latency, which may or may not be an issue (I really have no idea how far ahead these schedulers can schedule).

If you think about the "1024 constant registers" approach: It allows you to have a small 10bit adder in the renamer which handles long chains of mov/add/sub/inc/dec ops, as long as the chain stays in the range of 1024. This frees the main adders for bigger sums, or maybe you can power gate them.

And in the case where one of the inputs is larger than 1024, or it's value is unknown during renaming, the renamer can still merge two two-input adds into a single three input add uop.