Hacker News
by 0xQSL 1158 days ago
Nice improvements! I'd be interested to see how much overhead Tailscale's magicsock adds and what a flamegraph after the change looks like. Is it mostly crypto, or still a lot of networking syscall time?

magicsock definitely does a bunch more work, and we do look at both profiles. The magicsock profile is harder to read because the path is more complex: it adds packet filters, the indirection for DERP, other NAT-busting details, etc. Jordan did do some optimizations in the magicsock path alongside this wireguard-go work to get us over the 10 Gbps line.

Overall the summary of time spent is still a similar story at the coarse scale - our recent optimizations mean that we're getting ever closer to the point where we need to start working on the next layer, such as optimizing the queues (visible here in the chanrecv and scheduler times - Go runtime stuff), and once we get that out of the way, things like crypto and copying will become targets. The work goes on; we have lots of plans and ideas!

Super neat.

Have these optimizations (TCP GRO/GSO) been applied to non-root Tailscale? I imagine the changes needed are wildly different, as the TUN device itself is gVisor/netstack. I believe the UDP GRO/GSO part (discussed in today's blog post) may work as-is.

Good question, it's bits and pieces. I know there's more we can do with the userspace stack - netstack has some support for GRO/GSO, but unless I'm forgetting a detail we haven't fully plumbed that yet. It would definitely be interesting to do so - avoiding TUN turnaround while still utilizing mmsg and so on should provide excellent performance for something like a tsnet/libtailscale based server. We did recently improve performance in that configuration by enabling SACK, which is very significant.