by iscoelho 635 days ago
memcpy is extremely slow. On any high-load Linux webserver, you can run "perf top" and see roughly 20% of CPU usage consumed by memcpy, syscalls, and virtual memory.

This article is a good demonstration of the performance improvements via mmap zero-copy: https://medium.com/@kaixin667689/zero-copy-principle-and-imp...
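
For anyone who wants to see the shape of the mmap approach, here is a minimal sketch, assuming Linux/POSIX and a regular file as input. The helper name send_file_mmap is made up for illustration and is not from the linked article. It maps the input file and hands the mapped pages straight to write(), so the data is never read() into a separate user-space buffer. Note that this only removes one copy; the kernel still copies from the mapping into the destination.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write the whole file behind in_fd to out_fd without
 * first read()ing it into a user-space buffer.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_file_mmap(int in_fd, int out_fd)
{
    struct stat st;
    if (fstat(in_fd, &st) < 0)
        return -1;
    if (st.st_size == 0)
        return 0;                 /* mmap rejects zero-length mappings */

    size_t len = (size_t)st.st_size;

    /* Map the file's page-cache pages read-only into our address space. */
    char *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    if (src == MAP_FAILED)
        return -1;

    size_t done = 0;
    while (done < len) {
        /* write() may be partial, so loop until everything is out. */
        ssize_t n = write(out_fd, src + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            munmap(src, len);
            return -1;
        }
        done += (size_t)n;
    }

    munmap(src, len);
    return (ssize_t)done;
}
```

The usual baseline to compare against is a read()/write() loop through a heap buffer, which copies every byte twice (kernel to user, then user to kernel).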

Netflix also relies on zero-copy via kTLS & zero-copy TLS to serve 400Gbps: https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-...

However, the performance gap can get even larger! (The kernel is historically not great at this.) For NVMe & packet processors, you can easily see a performance increase of 10,000%+ via a zero-copy implementation. See: https://www.dpdk.org https://spdk.io

3 comments

memcpy gets weird with pointer aliasing as well. There's a slower path if the pointers can end up overlapping, and you either have to prove it programmatically like Java does, do the defensive copy, or YOLO it and hope.
memcpy is only defined for non-overlapping memory regions (otherwise you should use memmove), but many platforms use memmove for memcpy anyway to avoid breaking user programs in unpredictable ways. Apparently this has also led to some arguments and glibc version incompatibilities (https://www.win.tue.nl/~aeb/linux/misc/gcc-semibug.html).
I don’t know why I said “path”, I meant instruction.
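
To make the memcpy/memmove distinction above concrete, here is a small sketch in standard C. The helper names shift_right and copy_disjoint are made up for illustration. The first case has overlapping source and destination, so memmove is required; the second uses restrict to state the no-overlap promise that memcpy relies on.

```c
#include <string.h>

/* Hypothetical helper: slide the first `len` bytes of buf right by `gap`
 * bytes. buf must have room for len + gap bytes. */
void shift_right(char *buf, size_t len, size_t gap)
{
    /* Source [buf, buf+len) and destination [buf+gap, buf+gap+len) overlap
     * whenever gap < len, so memcpy would be undefined behavior here.
     * memmove is defined to behave as if it copied through a temporary. */
    memmove(buf + gap, buf, len);
}

/* Hypothetical helper: when the caller can promise the regions are
 * disjoint, `restrict` documents that promise and memcpy is correct. */
void copy_disjoint(char *restrict dst, const char *restrict src, size_t len)
{
    memcpy(dst, src, len);
}
```

Calling copy_disjoint with overlapping pointers breaks the promise and is undefined, which is the "YOLO it and hope" case.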
Any implementation of an algorithm is slow when your baseline is not performing the computation at all.
The fastest line of code is no line at all.‡

[‡]: Unless it's some weird architectural fluke with pipelining.

Haha, it's zero-copy! I never said it was "faster-copy" (-:
Apples and oranges. They're very different things, even if there's some overlap in use cases.