by userbinator
4557 days ago
Originally this instruction was the fastest way to do a block copy, and that generally held until MMX appeared. It then fell into the set of "microcoded CISC instructions no one really uses", so Intel didn't bother to optimise it much (the RISC fad was also really starting to take off in the PC world at the time) and it started falling behind.

But in the post-P4 era, when CPU designers realised that high clock speeds weren't everything and that it was better to make instructions do more per clock, it got a lot more attention. A lot of detailed information about that can be found in this thread: http://software.intel.com/en-us/forums/topic/275765

Even more recently (Nehalem and beyond), they really started paying attention to optimising this instruction, so that even the byte/word variants will copy entire cache lines at once if possible: http://stackoverflow.com/questions/8858778/why-are-complicat...

(IMHO the 2nd answer to that question should really have been chosen, since the 1st answer would be closer to reality a decade or two ago.)