by lossolo
297 days ago
> seems marginal at best

Depends on the workload. Normally you would go read() -> write(), so:

1. Disk -> page cache (DMA)
2. Kernel -> user copy (read)
3. User -> kernel copy (write)
4. Kernel -> NIC (DMA)

sendfile():

1. Disk -> page cache (DMA)
2. Kernel -> NIC (DMA)

No user space copies, kernel wires those pages straight to the socket.

So basically, it eliminates 1-2 memory copies along with the associated cache pollution and memory bandwidth overhead. If you are running high QPS web services where syscall and copy overheads dominate, for example CDNs/static file serving, the gains can be really big. Based on my observations this can mean double digit reductions in CPU usage and up to ~2x higher throughput.
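For anyone who wants to see the two paths side by side, here is a rough Python sketch, not a benchmark. The helper names `copy_read_write` and `copy_sendfile` and the `CHUNK` size are made up for illustration; `os.sendfile` is the standard library wrapper around the syscall and is assumed to be available (Linux or another Unix that supports it).

```python
import os
import socket
import tempfile

CHUNK = 64 * 1024  # arbitrary buffer size chosen for this sketch


def copy_read_write(in_fd, sock, count):
    """read() -> write() path: every chunk passes through a user space buffer."""
    sent = 0
    while sent < count:
        buf = os.read(in_fd, min(CHUNK, count - sent))  # kernel -> user copy
        if not buf:
            break  # hit end of file early
        sock.sendall(buf)  # user -> kernel copy
        sent += len(buf)
    return sent


def copy_sendfile(in_fd, sock, count):
    """sendfile() path: the kernel moves the data, no user space buffer."""
    sent = 0
    while sent < count:
        # The offset is passed explicitly; sendfile may send fewer bytes than asked.
        n = os.sendfile(sock.fileno(), in_fd, sent, count - sent)
        if n == 0:
            break  # hit end of file early
        sent += n
    return sent
```

Both helpers take a file descriptor opened for reading and a connected blocking socket, and return the number of bytes sent. The receiver sees the same bytes either way; the difference is only in how many times the data is copied on the sending side.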
|
Which makes me sceptical of the argument for kTLS stated in the article: what benefit does offloading your crypto to the kernel provide (while possibly making it more brittle)? I've seen the author of HAProxy say that the performance gain he's seen has been only marginal, but he did point out that it is useful in that you can strace your process and see plaintext instead of ciphertext, which is nice.
[1]: https://blog.tjll.net/reverse-proxy-hot-dog-eating-contest-c...