Hacker News
by gatronicus 1736 days ago
Except that for decades REP MOVS/STOS were avoided on x86 because they were much slower than hand-written assembly. This only changed recently.

That was really only in the 286-486 era. On the 8086 it was the fastest option, and since the Pentium II, which introduced cacheline-sized moves, it has been nearly as fast as the huge unrolled SIMD implementations, which are only marginally faster in microbenchmarks.

Linus Torvalds has some good comments on that here: https://www.realworldtech.com/forum/?threadid=196054&curpost...

Linus seems to consider rep movs still too slow for small copies:

https://www.realworldtech.com/forum/?threadid=196054&curpost...

https://www.realworldtech.com/forum/?threadid=196054&curpost...

It seems to me that rep movs is slow enough that you'd want to avoid it, but a fast generic memcpy needs so much extra code to handle edge cases that rep movs remains competitive in the generic case.
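
For anyone who wants to poke at this, here is a rough sketch of the trade-off in C. It is not how glibc or the kernel does it. The names copy_rep_movsb and copy_dispatch are made up for illustration, the 64-byte cutoff is an arbitrary guess rather than a measured value, and the inline asm assumes GCC or Clang on x86-64 (other targets fall back to memcpy).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with REP MOVSB on x86-64
   (GCC/Clang inline asm). Other targets fall back to memcpy. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* RDI = destination, RSI = source, RCX = byte count. The instruction
       updates all three, hence the "+" (read-write) constraints. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
    return ret;
}

/* Assumption: an illustrative cutoff, not a tuned or recommended value. */
#define SMALL_COPY_MAX 64

/* Hypothetical dispatcher: a plain byte loop for small copies, where the
   startup cost of rep movs is the complaint, and rep movsb otherwise. */
static void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n <= SMALL_COPY_MAX) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)
            *d++ = *s++;
        return dst;
    }
    return copy_rep_movsb(dst, src, n);
}
```

A real memcpy adds alignment handling, overlapping head/tail moves, and several size classes on top of this, which is exactly the bloat being discussed. Where the cutoff belongs depends on the microarchitecture, so benchmark before trusting any number.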