by 5e92cb50239222b
1174 days ago
Since tailscaled uses the tun/tap driver and thus copies all traffic to userspace (and back), it is extremely inefficient. On my Haswell i5 (plus multiple servers with comparable hardware) the process consumes 40% of CPU time at just 4 MiB/s, and close to 100% at 10-11 MiB/s (with the recent sendmmsg/recvmmsg patches¹). This is about 2-3x worse than similar applications written in highly optimized C, so don't expect any miracles from further optimizations unless they switch to kernel WireGuard (which doesn't seem likely in the near future). They claim it's very difficult if not impossible, but this sounds like an issue with their architecture: a similar application from their competitors² has had kernel WireGuard support from the start (no relation; I don't even use it and cannot recommend for or against it).

1: https://tailscale.com/blog/throughput-improvements
2: https://github.com/netbirdio/netbird
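
To put the figures above in per-packet terms, here is a rough back-of-envelope sketch. It assumes the quoted CPU percentages refer to a single core and that every packet is 1280 bytes; both are assumptions not stated in the comment, and the helper name `cpu_cost` is purely illustrative.

```python
MIB = 1024 * 1024


def cpu_cost(cpu_fraction, throughput_mib_s, packet_bytes=1280):
    """Return (CPU-seconds per MiB, CPU-microseconds per packet).

    cpu_fraction is the share of one core in use (assumed, e.g. 0.40 for 40%).
    packet_bytes is an assumed uniform packet size, not a measured value.
    """
    packets_per_s = throughput_mib_s * MIB / packet_bytes
    per_mib = cpu_fraction / throughput_mib_s
    per_packet_us = cpu_fraction / packets_per_s * 1e6
    return per_mib, per_packet_us


# The two data points quoted in the comment (10.5 is the midpoint of 10-11 MiB/s).
for label, frac, rate in [("40% at 4 MiB/s", 0.40, 4.0),
                          ("~100% at 10.5 MiB/s", 1.00, 10.5)]:
    per_mib, per_pkt = cpu_cost(frac, rate)
    print(f"{label}: {per_mib:.3f} CPU-s/MiB, {per_pkt:.0f} us/packet")
```

Under these assumptions both data points work out to roughly 0.1 CPU-seconds per MiB, which would suggest the cost scales about linearly with throughput rather than being dominated by a fixed overhead.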
|
Kernel WireGuard for Tailscale is hard because of DERP (the HTTPS/TCP fallback relay; all connections start over DERP so that they can Just Work if hole punching fails), but I'm sure it could happen with the right combination of eBPF and Rust in the kernel. It'd be a bit easier if there were a high-level abstraction for using the kernel TLS stack to make outgoing TLS connections.