by majke 1163 days ago
Okay, here's how I understand this writeup.

There are two sides: a userspace UDP socket that receives wg packets, and the TUN file descriptor that receives unencrypted packets from the host OS.

To speed up the userspace UDP socket, it's desirable to use the UDP_GRO option on RX and UDP_SEGMENT on TX. `tx-udp-segmentation` is the NIC feature that does the latter in hardware. There's no need to deal with checksums or anything like that here; this is just a speedup for a "classic" userspace UDP socket.
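To make the TX half concrete, here is a minimal sketch of what a UDP_SEGMENT send looks like from userspace and how the kernel cuts the buffer up. It assumes Linux; the numeric constants are copied from `<linux/udp.h>` because Python's `socket` module doesn't export them (double-check against your headers), and `gso_cmsg` and `gso_split` are hypothetical helper names.

```python
import socket
import struct

# Assumed values from Linux <linux/udp.h>; not exported by Python's socket module.
SOL_UDP = socket.IPPROTO_UDP  # 17
UDP_SEGMENT = 103
UDP_GRO = 104


def gso_cmsg(gso_size: int) -> list:
    """Ancillary data for sendmsg(): ask the kernel to cut the buffer into
    datagrams of gso_size bytes. The kernel reads the value as a native u16."""
    return [(SOL_UDP, UDP_SEGMENT, struct.pack("=H", gso_size))]


def gso_split(payload_len: int, gso_size: int) -> list:
    """Datagram sizes one UDP_SEGMENT send produces: full segments, then a
    possibly shorter tail."""
    full, rest = divmod(payload_len, gso_size)
    return [gso_size] * full + ([rest] if rest else [])
```

On a Linux socket, `sock.sendmsg([buf], gso_cmsg(1420), 0, peer)` would send `buf` as a train of 1420-byte datagrams (so `gso_split(4000, 1420)` gives `[1420, 1420, 1160]`), and `sock.setsockopt(SOL_UDP, UDP_GRO, 1)` turns on GRO for the RX side; `1420` and `peer` are placeholders.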

However, buffering with UDP_GRO is interesting: since you don't know how large the next GRO packet will be, you need to hand the kernel a potentially large 64 KiB buffer on every read. (This is a digression.)
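A rough sketch of the RX side under the same Linux assumptions: read into a near-64 KiB buffer, look for the UDP_GRO control message (the kernel reports the segment size as a native `int`), and cut the coalesced payload back into datagrams. `gro_segment_size`, `split_gro` and `read_datagrams` are hypothetical names, and the constants are again assumed from `<linux/udp.h>`.

```python
import socket
import struct

SOL_UDP = socket.IPPROTO_UDP  # 17
UDP_GRO = 104                 # assumed value from <linux/udp.h>
RECV_BUF = 65535              # the next coalesced read's size isn't known up front


def gro_segment_size(ancdata, default: int) -> int:
    """Return the segment size from a UDP_GRO cmsg, or default if absent
    (absent means the kernel delivered a single ordinary datagram)."""
    for level, ctype, data in ancdata:
        if level == SOL_UDP and ctype == UDP_GRO:
            return struct.unpack("=i", data[:4])[0]
    return default


def split_gro(buf: bytes, seg_size: int) -> list:
    """GRO merges equal-sized datagrams; only the last piece may be shorter."""
    return [buf[i:i + seg_size] for i in range(0, len(buf), seg_size)]


def read_datagrams(sock: socket.socket) -> list:
    data, ancdata, _flags, _addr = sock.recvmsg(RECV_BUF, socket.CMSG_SPACE(4))
    return split_gro(data, gro_segment_size(ancdata, len(data) or 1))
```

The cost the comment points at is visible in `RECV_BUF`: every read has to offer the worst-case buffer even when the kernel ends up returning a single small datagram.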

On the TUN side, the article implies they enabled TUN_F_TSO4, which is a magical offload flag on the tun interface. With it, it is possible to get large packets from the host OS. This is where it gets interesting. If you get a very large block from the host, say 14 KiB or larger, how do you push it to the WireGuard socket? I guess it's necessary to packetize it back into small, MSS-sized packets before encrypting. That means recreating TCP headers (with sequence numbers) and filling in the checksums. This sounds like "fun".
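A minimal sketch of that re-segmentation for IPv4/TCP, assuming the large packet has already been read from the TUN device with any offload metadata (with IFF_VNET_HDR, a `virtio_net_hdr` prefix) stripped, and that the caller knows the MSS. `csum16`, `tcp4_checksum` and `segment_tcp4` are hypothetical helpers, not the article's code; IPv6 and finer flag handling (e.g. CWR) are left out.

```python
import struct


def csum16(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the ones'-complement sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp4_checksum(src: bytes, dst: bytes, tcp: bytes) -> int:
    """Checksum over the IPv4 pseudo-header (src, dst, zero, proto 6, length)
    plus TCP header and payload. The checksum field in `tcp` must be zero."""
    pseudo = src + dst + struct.pack("!BBH", 0, 6, len(tcp))
    return csum16(pseudo + tcp)


def segment_tcp4(packet: bytes, mss: int) -> list:
    """Split one large IPv4/TCP packet into packets with at most `mss` payload
    bytes, rewriting lengths, IP IDs, sequence numbers and both checksums."""
    ihl = (packet[0] & 0x0F) * 4
    thl = (packet[ihl + 12] >> 4) * 4
    ip_hdr = bytearray(packet[:ihl])
    tcp_hdr = bytearray(packet[ihl:ihl + thl])
    payload = packet[ihl + thl:]
    ip_id = struct.unpack_from("!H", ip_hdr, 4)[0]
    seq = struct.unpack_from("!I", tcp_hdr, 4)[0]
    flags = tcp_hdr[13]
    src, dst = bytes(ip_hdr[12:16]), bytes(ip_hdr[16:20])

    out = []
    for n, off in enumerate(range(0, max(len(payload), 1), mss)):
        chunk = payload[off:off + mss]
        last = off + mss >= len(payload)

        th = bytearray(tcp_hdr)
        struct.pack_into("!I", th, 4, (seq + off) & 0xFFFFFFFF)  # advance seq by offset
        th[13] = flags if last else flags & ~0x09               # FIN/PSH only on the last one
        struct.pack_into("!H", th, 16, 0)
        struct.pack_into("!H", th, 16, tcp4_checksum(src, dst, bytes(th) + chunk))

        ih = bytearray(ip_hdr)
        struct.pack_into("!H", ih, 2, ihl + thl + len(chunk))   # total length
        struct.pack_into("!H", ih, 4, (ip_id + n) & 0xFFFF)     # identification, per segment
        struct.pack_into("!H", ih, 10, 0)
        struct.pack_into("!H", ih, 10, csum16(bytes(ih)))       # IPv4 header checksum
        out.append(bytes(ih) + bytes(th) + chunk)
    return out
```

Each output packet gets its own IP length and header checksum, a sequence number advanced by its payload offset, and a TCP checksum over the pseudo-header: roughly the per-segment work that GSO/TSO would otherwise do further down the stack.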

The same question applies on the TX side towards the host: if you get a number of TCP segments from the wg tunnel and decrypt them, do you push them to tun as one large TUN_F_TSO segment, or push them one by one and rely on the kernel to GRO them? I didn't quite get that from the article. Or maybe it's possible to send large packets over wg without segmentation at all?
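For the first option, one conceivable approach (not confirmed by the article) is to coalesce in userspace, GRO-style, before writing to tun. Below is a minimal sketch of just the merge decision over a hypothetical `Segment` type holding already-parsed header fields; building the resulting super-packet and its offload header is out of scope. The rules are deliberately conservative: same flow, contiguous sequence numbers, equal ACK numbers, pure-ACK segments, with a trailing PSH ending the run.

```python
from dataclasses import dataclass

ACK, PSH = 0x10, 0x08


@dataclass
class Segment:
    flow: tuple      # (src_ip, src_port, dst_ip, dst_port)
    seq: int
    ack: int
    flags: int
    payload: bytes


def coalesce(segments, max_payload=65535 - 40):
    """Merge in-order segments of one flow into as few large segments as
    possible, staying under the IPv4 total-length limit (assuming 40 header bytes)."""
    out = []
    for seg in segments:
        prev = out[-1] if out else None
        if (prev is not None
                and prev.flow == seg.flow
                and prev.ack == seg.ack
                and prev.flags == ACK                    # nothing has forced a flush yet
                and seg.flags in (ACK, ACK | PSH)        # PSH may close the run
                and seg.payload
                and (prev.seq + len(prev.payload)) & 0xFFFFFFFF == seg.seq
                and len(prev.payload) + len(seg.payload) <= max_payload):
            out[-1] = Segment(prev.flow, prev.seq, prev.ack, seg.flags,
                              prev.payload + seg.payload)
        else:
            out.append(seg)
    return out
```

Anything that breaks contiguity (a gap, a different ACK, SYN/FIN/RST) just starts a new output segment, which is also what the "push one by one" alternative degenerates to.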

The same discussion applies to UDP. For UDP you can use TUN_F_USO; however, it is only available from kernel 6.2. That might be why there aren't many UDP numbers in the article.
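If userspace wants to request USO only where it can exist, a crude gate is the running kernel's version. This is a hedged sketch with hypothetical helpers `kernel_version` and `tun_uso_expected`; since distributions backport features, the more robust approach is to request the offload flags and fall back when the kernel rejects them.

```python
import platform
import re


def kernel_version(release=None) -> tuple:
    """(major, minor) from a release string like '6.1.0-13-amd64'; (0, 0) if unparsable."""
    m = re.match(r"(\d+)\.(\d+)", release or platform.release())
    return (int(m.group(1)), int(m.group(2))) if m else (0, 0)


def tun_uso_expected(release=None) -> bool:
    # Compare as integers, so 6.10 correctly sorts after 6.2.
    return kernel_version(release) >= (6, 2)
```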