by timmclean
3060 days ago
That all makes sense, but it doesn't seem to apply to the example code in the article, right? `inc` doesn't decode to a single fused uop on Ivy Bridge. AFAIK, the example code decodes to the same number of fused-domain uops in both cases...
If you use the three-instruction sequence, the load and ALU op can't micro-fuse, which could make it slower (though not in this case, since the bottleneck is elsewhere).