Hacker News

by iscoelho 635 days ago

memcpy is extremely slow. On any high-load Linux webserver, you can run "perf top" and see roughly 20% of CPU usage consumed by memcpy, syscalls, and virtual memory management.

This article is a good demonstration of the performance improvements via mmap zero-copy: https://medium.com/@kaixin667689/zero-copy-principle-and-imp...
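To make the mmap idea concrete, here is a minimal sketch (not taken from the linked article). It assumes a throwaway scratch file, and the helper names `make_scratch_file`, `checksum_with_read`, and `checksum_with_mmap` are made up for illustration. The read() version copies the whole file into a new user-space buffer, while the mmap version walks the mapped page-cache pages directly through a memoryview.

```python
import mmap
import os
import tempfile


def make_scratch_file(size=1 << 20):
    # Hypothetical helper: writes a 1 MiB file of repeating byte values.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * (size // 256))
    return path


def checksum_with_read(path):
    # read() copies the file contents from the kernel into a new bytes object.
    with open(path, "rb") as f:
        data = f.read()
    return sum(data) & 0xFFFFFFFF


def checksum_with_mmap(path):
    # mmap maps the file's pages into the process; the memoryview
    # references that mapping instead of copying it into a new buffer.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            view = memoryview(m)
            try:
                return sum(view) & 0xFFFFFFFF
            finally:
                # The view must be released before the mapping can close.
                view.release()
```

Both functions return the same checksum; the difference is only in how many times the bytes are copied on the way there. Whether that shows up in a profile depends on file size and access pattern.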

Netflix also relies on zero-copy via kTLS & zero-copy TLS to serve 400Gbps: https://papers.freebsd.org/2021/eurobsdcon/gallatin-netflix-...

However, the performance gap can get even larger! (The kernel has historically not been great at this.) For NVMe and packet processing, you can easily see a performance increase of 10,000%+ from a zero-copy implementation. See: https://www.dpdk.org https://spdk.io

3 comments

hinkley 635 days ago

memcpy gets weird with pointer aliasing as well. There's a slower path if the pointers can end up overlapping, and you either have to prove programmatically that they don't, like Java does, do the defensive copy, or YOLO it and hope.

nyanpasu64 635 days ago

memcpy is only defined for non-overlapping memory regions (otherwise you should use memmove), but many platforms use memmove for memcpy anyway to avoid breaking user programs in unpredictable ways. Apparently this has also led to some arguments and glibc version incompatibilities (https://www.win.tue.nl/~aeb/linux/misc/gcc-semibug.html).
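A small sketch of why overlap matters, assuming a naive front-to-back byte loop as a stand-in for what an unlucky memcpy might do (real implementations vary, and overlapping memcpy is undefined behavior in C, so this is only an illustration). `forward_copy` and `overlap_safe_copy` are hypothetical names; `ctypes.memmove` is the standard-library wrapper around the C library's memmove.

```python
import ctypes


def forward_copy(buf, dst, src, n):
    # Naive front-to-back copy: once dst overlaps src from above,
    # later reads see bytes that this same loop already overwrote.
    for i in range(n):
        buf[dst + i] = buf[src + i]


def overlap_safe_copy(buf, dst, src, n):
    # memmove is specified to behave as if the source were first copied
    # to a temporary buffer, so overlapping regions are handled correctly.
    raw = (ctypes.c_char * len(buf)).from_buffer(buf)
    base = ctypes.addressof(raw)
    ctypes.memmove(base + dst, base + src, n)


a = bytearray(b"abcdefgh")
forward_copy(a, 2, 0, 6)       # a is now b"abababab" (source clobbered)

b = bytearray(b"abcdefgh")
overlap_safe_copy(b, 2, 0, 6)  # b is now b"ababcdef" (intended result)
```

Shifting six bytes two positions to the right inside the same buffer gives the corrupted `abababab` with the naive loop and the intended `ababcdef` with memmove.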

hinkley 635 days ago

I don’t know why I said “path”; I meant instruction.

formerly_proven 635 days ago

Any implementation of an algorithm is slow when your baseline is not performing the computation at all.

hinkley 635 days ago

The fastest line of code is no line at all.‡

[‡]: Unless it's some weird architectural fluke with pipelining.

iscoelho 635 days ago

Haha, it's zero-copy! I never said it was "faster-copy" (-:

AlotOfReading 635 days ago

Apples and oranges. They're very different things, even if there's some overlap in use cases.
