by drewg123 1959 days ago
GSO is better than nothing, but it's still far more expensive than actual hardware TSO, which swallows a 64KB send in hardware. At least it was when I did a study of it at Google. This is because it allocates an skb for each MTU-sized packet to hold the headers. I think it costs something like 1.5-2.0x what real hardware-backed TSO does, mostly in memory allocation, freeing, and cache misses around header replication.
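
For a rough sense of scale, here is a back-of-the-envelope sketch of how many MTU-sized packets (and so how many per-packet header allocations) a single 64KB send turns into. The 1500-byte MTU and the 52 bytes of IPv4 + TCP + timestamp-option headers are assumptions for illustration, not figures from the study mentioned above, and `gso_segment_count` is a hypothetical helper name.

```python
def gso_segment_count(send_bytes: int, mtu: int = 1500, header_bytes: int = 52) -> int:
    """Number of MTU-sized packets one large send is split into.

    Assumed defaults: 1500-byte MTU, 20 bytes IPv4 + 20 bytes TCP
    + 12 bytes TCP timestamp option = 52 bytes of headers per packet.
    """
    payload_per_packet = mtu - header_bytes
    # Integer ceiling division: a partial final segment still needs a packet.
    return -(-send_bytes // payload_per_packet)


segments = gso_segment_count(64 * 1024)
print(segments)  # 46
```

Under those assumptions, software segmentation does its per-packet header work 46 times for a send that hardware TSO accepts as a single unit.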

Also, sendmmsg still touches data, and this has a huge cost. With inline kTLS and sendfile, the CPU never touches the data we serve. If NVMe drives with big enough controller memory buffers existed, we would not even have to DMA NVMe data to host RAM; it could all just be served directly from NVMe -> NIC with peer-to-peer DMA.
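
To make the "never touches data" point concrete, here is a minimal sketch of the sendfile side of that path, using Python's `socket.sendfile()`, which uses `os.sendfile` where the platform supports it and otherwise falls back to ordinary `send()`. It does not show kTLS, which is configured separately on the socket. The helper names `serve_file_zero_copy` and `demo`, the local socket pair, and the 4096-byte payload are all assumptions for illustration, not the setup described above.

```python
import socket
import tempfile


def serve_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket via sendfile.

    With os.sendfile available, the file bytes go from the page cache
    to the socket inside the kernel; user space never reads them.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def demo() -> bytes:
    payload = b"x" * 4096  # small enough to fit in the socket buffer
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        sender, receiver = socket.socketpair()  # stands in for a client connection
        with sender, receiver:
            sent = serve_file_zero_copy(sender, tmp.name)
            received = b""
            while len(received) < sent:
                chunk = receiver.recv(65536)
                if not chunk:
                    break
                received += chunk
    return received


if __name__ == "__main__":
    print(len(demo()))
```

By contrast, a sendmmsg-style path has to have the payload in user-space buffers first, which is the "touching data" cost described above.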

Granted, we serve almost entirely static media. I imagine a lot of what YouTube serves is long-tail, and transcoded on demand, and is thus hot in cache. So touching data is not as painful for YouTube as it is for us, since our hot path is already more highly optimized. (i.e., our job is easier)

I tried to look at the Networking@Scale link, but I just get a blank page. I wonder if Firefox is blocking something to do with Facebook.