by userbinator
1481 days ago
You probably added an extra instruction or two to load the value into a register first? The CPU already splits instructions with memory operands into uops, and cached accesses are fast, so there's no point in doing that: it just wastes an additional architectural register (vs. using one of the many physical registers the renamer provides) and adds instructions to decode. x86 is fundamentally a CISC; if you treat it like a RISC, it will definitely disappoint.
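To make the point above concrete, here is a minimal sketch in C with the two instruction forms shown as comments. The function name `sum_ints` and the register choices are hypothetical, picked only for illustration, and what a compiler actually emits depends on the compiler, flags, and target (it may well vectorize the loop instead).

```c
#include <stddef.h>

/*
 * Two ways to add a value from memory on x86-64 (Intel syntax, illustrative only):
 *
 *   RISC-style, separate load:      CISC-style, memory operand:
 *     mov  edx, [rdi + rcx*4]         add  eax, [rdi + rcx*4]
 *     add  eax, edx
 *
 * The left form ties up an extra architectural register (edx) and adds one
 * more instruction to decode. The right form leaves it to the CPU to split
 * the instruction into a load uop and an add uop internally.
 */

/* Hypothetical helper: sums n ints. The C source does not pick either form;
 * the compiler decides how the load and the add are encoded. */
int sum_ints(const int *values, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];  /* candidate for an add with a memory operand */
    }
    return total;
}
```

Compiling with something like `-O2 -S` and reading the output is the easiest way to see which form a given compiler picks for this loop.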
|
I asked around at the time, and someone mentioned that I might have overtaxed certain execution ports or something like that. But yeah, that just cemented my belief that x86 optimization is not my cup of tea anymore. Better to spend the time learning how to write code the compiler can optimize well.