by userbinator
1737 days ago
That was really only true in the 286-486 era. On the 8086 it was the fastest option, and since the Pentium II, which introduced cacheline-sized moves, it has performed nearly the same as the huge unrolled SIMD implementations, which are only marginally faster in microbenchmarks. Linus Torvalds has some good comments on that here: https://www.realworldtech.com/forum/?threadid=196054&curpost...
https://www.realworldtech.com/forum/?threadid=196054&curpost...
It seems to me that rep movs is bad enough that you want to avoid it, but trying to write a fast generic memcpy results in so much bloat to handle edge cases that rep movs remains competitive in the generic case.
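
For anyone who hasn't seen it written out, here is a minimal sketch of what a copy built on rep movsb looks like, mostly to show how little code the "no edge cases" version needs. The helper name copy_rep_movsb is made up for illustration, the inline asm assumes GCC or Clang on x86, and it assumes the buffers don't overlap. It says nothing about how fast this is on any particular CPU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copies n bytes from src to
 * dst using a single rep movsb. Assumes non-overlapping buffers. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* rep movsb copies the count in (R|E)CX bytes from [(R|E)SI] to
     * [(R|E)DI]. The x86 calling conventions keep the direction flag
     * clear at function entry, so the copy runs forward. */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    /* Assumption: on other compilers or targets, defer to the library. */
    memcpy(dst, src, n);
#endif
    return ret;
}
```

There is no alignment check, no size dispatch and no tail handling here, which is exactly the bloat a hand-written generic memcpy has to carry.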