It is likely that on recent CPUs they are always faster than C loop versions.
On my Zen 3 CPU, for lengths of 2 kB or smaller, it is possible to copy faster than with "rep movsb", but only by using SIMD instructions (or, equivalently, the builtin "memcpy" provided by most C compilers), not with a plain C loop. The exception is when the compiler recognizes the C loop and replaces it with the builtin memcpy, which some compilers do at high optimization levels.
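For anyone who wants to try this on their own machine, here is a minimal sketch of the three variants being compared. It assumes GCC or Clang on x86-64 for the inline asm; the helper names (copy_rep_movsb, copy_byte_loop, copy_is_correct) are made up for illustration and are not from any library. It only checks that the copies are correct. Timing is left to whatever benchmark harness you prefer, and the results will depend on the CPU and on compiler flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes with "rep movsb" (GCC/Clang inline asm, x86-64 only).
   On any other target this hypothetical helper falls back to memcpy. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    void *d = dst;
    const void *s = src;
    /* RDI = destination, RSI = source, RCX = byte count. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

/* Plain byte-by-byte C loop. At high optimization levels some compilers
   may recognize this pattern and emit a call to memcpy instead. */
static void *copy_byte_loop(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

typedef void *(*copy_fn)(void *, const void *, size_t);

enum { MAX_LEN = 4096 };

/* Returns 1 if f copies n bytes (n <= MAX_LEN) correctly, 0 otherwise. */
static int copy_is_correct(copy_fn f, size_t n)
{
    static unsigned char src[MAX_LEN], dst[MAX_LEN];
    assert(n <= MAX_LEN);
    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)(i * 31u + 7u);
    memset(dst, 0, n);
    f(dst, src, n);
    return memcmp(dst, src, n) == 0;
}
```

To compare speeds, call each of copy_rep_movsb, copy_byte_loop and memcpy many times on buffers of the length you care about (2 kB, for example) and time the loops. Check the generated assembly for copy_byte_loop: if the compiler has turned it into a memcpy call, you are no longer measuring a C loop.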