memcpy is extremely slow. On any high-load Linux webserver, you can run "perf top" and see roughly 20% of the CPU time consumed by memcpy, syscalls, and virtual memory management.
However, the performance gap can get even larger! (The kernel has historically not been great at this.) For NVMe and packet processing, you can easily see a 10,000%+ increase in performance from a zero-copy implementation.
See: https://www.dpdk.org and https://spdk.io
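For a feel of what "zero-copy" looks like without going all the way to kernel bypass, here is a minimal sketch using Linux's sendfile(2). To be clear, this is not how DPDK or SPDK work (they bypass the kernel entirely); it is just a simpler in-kernel way to skip the userspace buffer and the copies in and out of it. The helper name send_all is made up for illustration, and error handling is kept to the bare minimum.

```c
#include <sys/types.h>
#include <sys/sendfile.h>
#include <errno.h>

/* Hypothetical helper: move up to `count` bytes from in_fd (starting at
 * offset 0) to out_fd without ever copying them into a userspace buffer.
 * Returns the number of bytes transferred, or -1 on error. */
static ssize_t send_all(int out_fd, int in_fd, size_t count)
{
    off_t offset = 0;      /* sendfile advances this as it goes */
    size_t left = count;

    while (left > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal, retry */
            return -1;
        }
        if (n == 0)
            break;         /* hit end of input early */
        left -= (size_t)n;
    }
    return (ssize_t)(count - left);
}
```

The usual read()/write() loop copies every byte twice (kernel to user, then user back to kernel); here the data never enters userspace. In a server, out_fd would typically be a socket and in_fd a file opened for reading.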
memcpy gets weird with pointer aliasing as well. There's a slower path if the pointers can end up overlapping, and you either have to prove programmatically that they don't, like Java does, do the defensive copy, or YOLO it and hope.
memcpy is only defined for non-overlapping memory regions (otherwise you should use memmove), but many platforms use memmove for memcpy anyway to avoid breaking user programs in unpredictable ways. Apparently this has also led to some arguments and glibc version incompatibilities (https://www.win.tue.nl/~aeb/linux/misc/gcc-semibug.html).
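To make the memcpy/memmove distinction concrete, here is a rough sketch of the "check, then pick" approach. The helper names regions_overlap and copy_bytes are invented for this example, and the overlap test assumes a flat address space where casting pointers to uintptr_t gives comparable integers (true on mainstream platforms, not guaranteed by the C standard).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if [a, a+n) and [b, b+n) share at least one byte.
 * Assumes a flat address space so the integer comparison is meaningful. */
static bool regions_overlap(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    return pa < pb ? (pb - pa < n) : (pa - pb < n);
}

/* Hypothetical helper: use memcpy only when the regions are disjoint,
 * and fall back to memmove (which is defined for overlap) otherwise. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (regions_overlap(dst, src, n))
        return memmove(dst, src, n);
    return memcpy(dst, src, n);
}
```

For example, with char buf[] = "abcdef", calling copy_bytes(buf + 2, buf, 4) takes the memmove path and leaves "ababcd" in buf. Calling memcpy directly there would be undefined behavior. In practice, if you can't rule out overlap you may as well just call memmove, since it does a similar check internally.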