by protoman3000 828 days ago
> The problem you quickly run into to build this design is that Linux kernel WireGuard doesn’t have a feature for installing peers on demand.

I don't seem to understand. You can add peers at runtime, e.g.

https://serverfault.com/questions/1101002/wireguard-client-a...

Can somebody clarify?

EDIT: If I understand correctly, that step is already too late. They want to authenticate a peer before adding it to the interface in order to prevent stale entries on the interface.

They thus put an eBPF filter in front of the interface and do the cryptokey-routing-based association to an authorized counterpart themselves. If it checks out, they add the peer to the interface and remove it after a timeout.
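To make the "filter in front of the interface" part concrete, here is a minimal sketch in Python of recognizing a WireGuard handshake initiation in a captured UDP payload. It is not the article's code. The field layout (148 bytes, message type 1) follows the WireGuard protocol description; the function name and the returned dictionary keys are my own. It deliberately stops short of recovering the initiator's static public key, since that needs the Noise handshake decryption with the server's private key.

```python
import struct

# Handshake initiation layout, in bytes:
# type+reserved (4) | sender index (4) | ephemeral (32) |
# encrypted static (48) | encrypted timestamp (28) | mac1 (16) | mac2 (16)
MSG_TYPE_INITIATION = 1
INITIATION_LEN = 148


def parse_initiation(payload: bytes):
    """Return the cleartext-visible fields of an initiation, or None.

    Hypothetical helper for illustration; it only splits the message
    into fields and does not decrypt or authenticate anything.
    """
    if len(payload) != INITIATION_LEN:
        return None
    # First byte is the message type, followed by three reserved zero bytes.
    if payload[0] != MSG_TYPE_INITIATION or payload[1:4] != b"\x00\x00\x00":
        return None
    (sender_index,) = struct.unpack_from("<I", payload, 4)
    return {
        "sender_index": sender_index,
        "ephemeral": payload[8:40],
        "encrypted_static": payload[40:88],
        "encrypted_timestamp": payload[88:116],
        "mac1": payload[116:132],
        "mac2": payload[132:148],
    }
```

Anything that is not exactly an initiation message returns `None`, so the caller can ignore data packets and handshake responses cheaply before doing any expensive crypto.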


What you want, and what in the medium term I think Jason plans to provide, is a Netlink API from kernel WireGuard that just gives you a feed of all the public keys the kernel has seen from initiator messages. With that feed, you wouldn't have to install a single WireGuard peer in advance. They could all just live in a SQLite database (or something), and get installed on-demand as clients try to connect with them.
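A rough sketch of what that on-demand flow could look like, assuming the initiator's public key has already been recovered by whatever feeds it (packet capture today, a Netlink feed later). The table schema, the `JitPeers` class, the interface name, and the TTL are all hypothetical. The only real interface used is the `wg set <iface> peer <key> allowed-ips ...` / `peer <key> remove` command form, and the sketch returns the argument lists rather than running them.

```python
import sqlite3
import time


def open_peer_db():
    """Create an in-memory peer table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE peers (pubkey TEXT PRIMARY KEY, allowed_ips TEXT NOT NULL)"
    )
    return db


class JitPeers:
    """Tracks which peers are currently installed and when they expire."""

    def __init__(self, db, iface="wg0", ttl=180.0):
        self.db = db
        self.iface = iface
        self.ttl = ttl
        self.installed = {}  # pubkey -> expiry time in seconds

    def on_initiation(self, pubkey, now=None):
        """Return the `wg set` argv that installs the peer, or None if unknown."""
        now = time.monotonic() if now is None else now
        row = self.db.execute(
            "SELECT allowed_ips FROM peers WHERE pubkey = ?", (pubkey,)
        ).fetchone()
        if row is None:
            return None  # not one of ours; leave the kernel untouched
        self.installed[pubkey] = now + self.ttl
        return ["wg", "set", self.iface, "peer", pubkey, "allowed-ips", row[0]]

    def expire(self, now=None):
        """Return `wg set ... remove` argvs for peers whose TTL has passed."""
        now = time.monotonic() if now is None else now
        stale = [k for k, expiry in self.installed.items() if expiry <= now]
        for key in stale:
            del self.installed[key]
        return [["wg", "set", self.iface, "peer", k, "remove"] for k in stale]
```

A caller would pass each returned list to something like `subprocess.run`. A real system would more likely expire peers based on their last handshake time than on a fixed TTL from installation; the fixed TTL just keeps the sketch short.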

If you're a VPN provider (for instance), the current API is a little clunky. It's not just that at any given time only a small fraction of your peers are actually in use, though that is probably true. It's also that as the number of peers you handle scales, from hundreds of thousands to tens of millions, you lose the ability to store them all in a single instance of the kernel at all; there are just too many. If peers have to be pre-installed, a consequence of that is that they get locked to specific server machines.

But, as the post points out, you can get a facsimile of the interface you need today with simple packet capture. And Jason set the API up so that you can --- very easily --- flip the initiation from server to client so the connection experience is seamless, even though the kernel dropped the first initiation message (because the peer hadn't been installed).

So that's the idea here.

Jann Horn pointed out that we could have taken this a step farther: we could have held on to the initiation packet that we captured, and, once the peer was installed, replayed the packet into the kernel. Which is also a neat idea.

I don't think there's much in this post that is going to change your life. It's just a couple of neat tricks that we thought people would like to know about.

(Though: the next step for us is to build on this to get "floating peers", de-regionalizing them completely, so users no longer have to think about what region their peers are configured in, which I think will actually have a product benefit for users, unlike this, which has primarily nerd benefit.)

We do a very similar thing in Tailscale; it was necessary to stay within the old iPhone memory budget.
Whatever we did with WireGuard, Tailscale did it better first. :)
I thought this was brilliant and made a little POC here as I couldn't find the code anywhere: https://github.com/realrasengan/wg-jit

This doesn't implement the completion step (replay or reverse initiation), both of which I think are also novel approaches to this.

So exciting!

Seems to me they did this to avoid the alternative of running WG in user space. They wanted a feature the Linux kernel didn’t have, routing by cryptographic address first, but without leaving the kernel, so they hacked it in?

JIT WireGuard is a weird way to frame this. My mind went to “why? The performance bottleneck is the crypto, and per-client JIT won’t help with that.”

I would have just gone user space. Use something like tokio-uring or glommio to get the performance. If you keep going in the kernel you are going to keep hitting limitations because Linux is not built to serve millions and millions of active tunnels. Even doing millions of TCP connections per kernel gets hairy sometimes.

Every limitation will require a hack. Every hack will be some system config that has to be applied and managed. The tool chains for provisioning Linux metal boxes are vastly inferior to the tooling for developing apps and services and managing their config.

Or am I stupid and misunderstanding?

It does not seem like they need huge numbers of active tunnels per gateway.

And JIT is just as in "just in time" configuration of WireGuard. Once the configuration has been done, their stack stays out of it.

Ahh. In that case they are using the term JIT weirdly. Usually that means just in time compilation of script or byte code to machine code.
The phrase "just-in-time" can be used for other things besides compilation (it's often used for manufacturing, for instance). I think it's a helpful way to describe lots of things, and that we shouldn't try to limit its usage in tech to just compilation.
Exactly. The very first time I heard about "JIT" was in the context of manufacturing. The Toyota Production System [0].

I think JIT compilation wasn't popular in ancient times, so I never associated JIT with compilation by default.

0 - https://en.wikipedia.org/wiki/Toyota_Production_System

JIT compiling is the context you're most used to seeing it in, but JIT has been around in other fields for longer, and it just means what it says... Just In Time :)