Vxlan over WireGuard (On OpenBSD) (rob-turner.net)
86 points by hucste 1222 days ago
12 comments

I love how this starts off "What is a VXLAN..." that was going to be my first question! So often posts like this seem to assume everyone knows what the topics are at the start. I know what WireGuard is, and I've at least heard of VXLAN before, but I couldn't remember what it was.
This can cause massive packet fragmentation. I'd be most interested in the performance degradation due to the L2 encapsulation. Are there any benchmarks available for this kind of project?
I've tunneled VXLAN over Wireguard on Linux. In my setup, my WAN's MTU was 1500 bytes, and my Wireguard tunnel's MTU was 1550, with the VXLAN's MTU being 1500. Surprisingly, traffic and iperf3 tests going over the VXLAN had much better throughput than traffic going directly over the Wireguard connection. IIRC, I was pulling ~800Mbps over the VXLAN/WG setup with iperf3.

Where this would fall apart is if there are firewalls in between that silently drop UDP fragments. In a case like that, it may be necessary to do VXLAN/Wireguard/Wireguard to conceal the fragmented packets with MTUs of 1500/1550/1440 respectively, assuming IPv4 and WAN MTU of 1500. I bet this would come with a significant performance hit though.
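
For anyone checking the MTU numbers above, here is a rough sketch of the arithmetic. It assumes IPv4 outer headers with no IP options, an untagged inner Ethernet frame, and it ignores WireGuard's padding of the plaintext to 16-byte multiples. The function names are made up for illustration.

```python
# Per-packet overhead in bytes (assumptions: IPv4 outer headers, no IP options,
# untagged inner Ethernet frame, WireGuard padding ignored).
ETHERNET_HEADER = 14   # inner Ethernet header carried inside VXLAN
VXLAN_HEADER = 8
UDP_HEADER = 8
IPV4_HEADER = 20
WIREGUARD_DATA = 32    # 16-byte data message header + 16-byte auth tag

VXLAN_OVERHEAD = ETHERNET_HEADER + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER  # 50
WIREGUARD_OVERHEAD = WIREGUARD_DATA + UDP_HEADER + IPV4_HEADER              # 60


def unfragmented_mtus(wan_mtu: int) -> dict:
    """Largest interface MTUs that avoid fragmentation on the WAN."""
    wireguard_mtu = wan_mtu - WIREGUARD_OVERHEAD
    vxlan_mtu = wireguard_mtu - VXLAN_OVERHEAD
    return {"wan": wan_mtu, "wireguard": wireguard_mtu, "vxlan": vxlan_mtu}


def outer_packet_size(vxlan_mtu: int) -> int:
    """Size of the outer IPv4 packet for a full-size packet on the VXLAN interface."""
    return vxlan_mtu + VXLAN_OVERHEAD + WIREGUARD_OVERHEAD


print(unfragmented_mtus(1500))   # WAN 1500 -> WireGuard 1440 -> VXLAN 1390
print(outer_packet_size(1500))   # 1610, which exceeds a 1500-byte WAN MTU
```

Under these assumptions, a 1500-byte VXLAN MTU only fits if the tunnel below it carries 1550 bytes, and the resulting outer packet has to be fragmented on a 1500-byte WAN. With IPv6 as the outer protocol the WireGuard overhead would be 80 bytes instead of 60.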

That's what I was thinking, unless you have jumbo frames you're going to have a hard time stuffing ethernet frames into IP payloads. Does Vxlan mitigate this somehow?
I could not get into the site, so from archive:

https://web.archive.org/web/20230214134248/https://rob-turne...

A little bit over my head, but an interesting read.

I took this to extremes last year: I used it to run MAAS from Australia to sweden (which requires layer 2). Granted I used tailscale to make the WireGuard part even easier, but it was a lot of fun.

https://medium.com/@antongslismith/bare-metal-cloud-provisio...

MAAS doesn't require L2, it requires DHCP and DNS to be configured correctly.
The difficulty isn’t setting it up, it’s in determining how much traffic you can put through it before it breaks.

How does it handle segmenting jumbo frames? Etc.

The feature I would be interested in is whether this can do link state toggling on the vxlan interface if the wg handshake timer goes stale. If that works, then it becomes practical to do things like run ospf routing over the vxlan interface.
What exactly would you be intending to accomplish? OSPF already has state timers, and furthermore runs just fine on a wireguard interface without having to introduce a vxlan tunnel.
This is fun, but applications requiring L2 adjacency do it to limit latency/distance. Creating a L2 domain between here and the moon, what are you gonna use it for? Certainly not anything other than fun.
There are a number of specific scenarios where this could be useful. For example, some SANs can only replicate to L2 adjacent units. Say you wanted a replica off-site, and your gear is older/proprietary: you used to have to buy enterprise network gear to encap L2 and ship VLANs to remote sites. I wouldn't be dismissive of using VXLAN over wireguard to accomplish that.
You just explained how to increase technical debt in as few steps as possible.

The solution is getting a wavelength or dark fiber to the off-site, or throwing out the piece of junk SAN that only works on L2, it's too old by now.

I didn't say "this solution is easier", I simply said someone might find a use case for it. Y'all are dismissive of something neat with a number of use-cases (I only tossed out one use case that popped into my head, based on actual experience with a million dollar SAN that is still supported (and sold!) to this day).
If you have a million dollar SAN you have capital to get a real connection where you need it.

The reason I'm very against even discussing this is because people who don't understand the downsides would be open to doing this, shooting themselves in the foot along the way. People who understand the pitfalls just won't and are telling you DON'T, YOUR FOOT WILL HURT.

You could use VXLAN over WireGuard with a lower MTU, attach the VXLAN interfaces to different VRFs and route traffic, it's a somewhat valid usecase.

Switching over the internet is pain, I've got experience. Used to work at an MSP that did this as common procedure, worked fine until it didn't, and no one could explain why. And we're not even talking loops yet, you'll have to build a pretty sick RSTP.

The use-case is at best an SMB migration strategy.

You don't have to explain the perils of extending L2 over any type of WAN to me. There used to be a hard requirement from some SANs to have dark fiber for their replication - not just L2 adjacency (needed actual FC zones extended to another site if I recall). But all is not happy times with L3 links and BGP between things (and even then, BGP configs need BFD or more to achieve anything decent in terms of failover). But sure, poop all over this fun thought experiment if you want, I doubt anyone's going to deploy this and put a billion dollar company at risk.
Can you use this to get Apple bonjour / mDNS working over a remote network (connected via VPN)? Or similarly, could you use it for a cloud seedbox to cast to a chromecast on your local network (via the VPN obviously)?
For mDNS, you should run avahi to relay between subnets. For chromecast, that is SSDP/DLNA which is multicast, so it is a matter of establishing multicast routing between sites.

Bridging L2 is not the optimal solution for either of your scenarios.

Indeed check out my other post here - and it definitely was fun!
Wow, I can’t believe the HN audience is so accepting of stretched layer 2 as a solution. It’s almost as though we’ve been invaded by middle management.

Stretched layer 2 is almost always a mistake.

This is great! One small suggestion: try it with the newer veb bridge device. It should be a bit faster.
IP over Ethernet over VXLAN over UDP over IP over WireGuard over UDP over IP over Ethernet… sigh

OpenBSD does support both routing domains and multiple routing tables and includes multiple routing daemons in the base system. I would recommend to the author to stop hacking at the keyboard, grab whatever not too structured visualisation tool works for them (e.g. a whiteboard, a block of paper, a random drawing app, Visio) and (re-)phrase the problem. Are you solving a problem or showing off how many acronyms you can expand without looking them up? This n layer encapsulation can work and can even be required to reproduce some (problematic) organisational structure, but it's far from elegant. Given the chance I would vastly prefer to just use multiple routing domains for the WireGuard tunnel interfaces and the underlay. It would result in far less complexity to manage as well as less overhead.

Why do so many people insist on tunneling Ethernet over IP? What's keeping operators from using IP routing (and just one layer of encapsulation) instead? Is IP routing so scary, or does everyone have indispensable applications that only work over Ethernet?

Just be grateful that nobody has tried to wrap the entire thing in JSON over HTTP yet! I wouldn't be surprised if we get Wireguard over websockets for "enterprise" applications soon.

Sometimes you just need an L2 tunnel. Most of the time you don't, but when you do, you do. For example, if you use IPv6 over SLAAC in a private network, you'll need to route NDP.

In the rare cases that you do need an L2 tunnel between two different locations, you probably want some kind of authorisation and authentication of the traffic to prevent injection/spoofing attacks and to make life just a bit harder for the NSA (Google's use of HTTP was one way the NSA managed to tap connections that were otherwise encrypted by HTTPS). After all, this isn't just any traffic, these are internal subnets.

In terms of authorised traffic, Wireguard is quite lightweight and foolproof. Perhaps IPSec is even more lightweight but it's a pain to set up. The alternative would be to wrap all internal network traffic in an encrypted protocol and set up the necessary whitelists in the upstream ISPs.

The impact of such layering depends on the network connection between the data centers. If you can get jumbo packets across, fragmentation won't be a problem at all. If you run your own fiber between data centers, there's basically no downside until you're reaching very high network saturation.
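
To put a rough number on the fragmentation point, here is a small sketch that counts IPv4 fragments for a given outer packet size and link MTU. It assumes a 20-byte IPv4 header with no options. The 1610-byte example is an assumption too: a full 1500-byte packet on the VXLAN interface plus about 50 bytes of VXLAN encapsulation and 60 bytes of WireGuard-over-IPv4 encapsulation.

```python
import math

IPV4_HEADER = 20  # assumption: no IP options


def ipv4_fragment_count(packet_len: int, link_mtu: int) -> int:
    """Number of IPv4 fragments needed to carry packet_len bytes over link_mtu."""
    if packet_len <= link_mtu:
        return 1
    payload = packet_len - IPV4_HEADER
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (link_mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(payload / per_fragment)


print(ipv4_fragment_count(1610, 1500))  # standard WAN MTU
print(ipv4_fragment_count(1610, 9000))  # jumbo frames end to end
```

On a 1500-byte path every full-size packet becomes two fragments, while a jumbo-capable path carries it in one piece.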

Because stuff that requires this circus of encapsulation is usually so brain-dead that it can't be gotten to work in any other less horrible way.

(also I think you lost one 'over UDP')

You're right. I forgot that WireGuard sits on top of UDP.
I end up having to run basically this very setup (on OpenBSD, too) because I have a customer who has a Novell NetWare 5 setup and runs IPX only. Bad times.
NetWare 5 can do IP fine.

I used to run a cluster of DNS/DHCP servers that were the first on site to run 5. The rest were 4.11 until we binned them for 6. Three cream coloured Compaq 3U lumps.

Right, but they weren't running IP and refused to do it. I did set up a /30 so the poor old thing could synchronize its clock via NTP, but that was the only IP it talked.

This one's running (present tense) on a 1 GHz Socket 370 Pentium 3. It's got some weirdness about Pentium 4 and newer CPUs. I think it can be patched up but the client doesn't want to pay for the work because "it's fine like it is." Not worth the headache to virtualize as the first attempt didn't work.

People still use Novell NetWare?? Wow
Unfortunately.
There are a ton of protocols that don't work using cross-subnet IP routing, e.g. anything that uses multicast.
Multicast works across subnets with PIM
Using vxlan you can also connect L3 networks, not just L2 networks. i.e., virtualize an L3 network
Wireguard virtualizes L3 out of the box.
With Wireguard being a point-to-point protocol (as I understand), it will be challenging to get good performance for L3VPN BUM traffic?
A Wireguard interface is point-to-multipoint non broadcast which if a single peer is configured on it can in general be treated as point-to-point.
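
As a loose illustration of the point-to-multipoint, non-broadcast behaviour: WireGuard picks the outgoing peer by matching the destination address against each peer's allowed IPs, so any given packet goes to at most one peer. The sketch below mimics that lookup with a longest-prefix match. The peer names and prefixes are made up, and this is not WireGuard's actual implementation.

```python
import ipaddress


def select_peer(allowed_ips: dict, dst: str):
    """Return the single peer whose allowed IPs best match dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None  # (peer name, prefix length)
    for peer, prefixes in allowed_ips.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr.version == net.version and addr in net:
                if best is None or net.prefixlen > best[1]:
                    best = (peer, net.prefixlen)
    return best[0] if best else None


# Hypothetical peers and prefixes, for illustration only.
peers = {
    "peer_a": ["10.0.1.0/24"],
    "peer_b": ["10.0.0.0/16"],
}

print(select_peer(peers, "10.0.1.5"))     # peer_a (longest match)
print(select_peer(peers, "10.0.1.255"))   # peer_a only, no flooding to peer_b
print(select_peer(peers, "224.0.0.5"))    # None: multicast matches no peer here
```

With a single peer configured, every matching destination maps to that one peer, which is why the interface can be treated as point-to-point.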
You wanna do PIM? There's no BUM on a p2p link.
Right, that's what I was trying to understand.

"Wireguard can virtualize an L3 network out-of-the-box" How can this be true then?

vMotion needs L2 adjacency to make live migrating VMs easy. Some software relies heavily on broadcast discovery messages and is thus designed for LAN usage, not Internet connectivity, but businesses try to stuff a square peg into a round hole.
I have a vHost which is sitting on a public IP in another country, while the rest of vSphere is here in RFC1918.

vMotion, Provisioning and backups work just fine.

vMotion doesn't need L2 at all, this is a flat out lie.
Just do static routes or BGP over Wireguard. Simpler, scalable, less error prone.
But doesn't provide layer 2 between networks. Think of devices that are hardcoded to communicate with broadcast or multicast with TTL of 1: you either need some active reflector, and have to cope with any peculiarities of the device, or you simply extend a single vlan between two routers (using vxlan or another solution).

I sometimes need to extend a system like this from one site to another. One is a calrec system (an audio mixer, I think it's the control traffic that needs to be sent), and I don't have enough access or time to see if I could build some kind of transparent proxy -- it won't work with multicast routing.

I do however have enough time to create a layer2 network between two nics. I tend to use mikrotiks for that, create an eoip tunnel (GRE with proprietary addons to cope with fragmentation) between the two endpoints and pop the interface in a bridge with a physical port, and move on.

"Here's how I did this thing."

"You don't need to do that thing."

How do you know?

I've been in or very close to networking for a long time. Here's a certainty: there's lots of ways to do anything. Here's a corollary to that: someone will be sure to tell you how bad any choice you make is.
Does this get you Layer 2 connectivity?
No, because most applications don't need layer 2 adjacency.
You may consider VoIP phones. When a phone boots up on a network segment the DHCP process comes into play. This process is a Layer 2 process. The DHCP packet could contain a boot server field (this is typical) so the VoIP phone can grab configuration. One may want the boot server info "isolated" from other network segments. Utilizing VLANs is one way to do this. Additionally, it is typical that QoS is applied at Layer 2 (something necessary for real-time protocols like VoIP).
I hope you're not trying to run VoIP phones via vxlan over wireguard because you don't want to setup a local DHCP server
Over VXLAN yeah... You won't believe how many small-medium companies are moving from on-prem to small managed private clouds. VXLAN allows them to maintain their existing network configuration. "You hope" ha! I suppose by this you mean that the wireguard + vxlan may not be a "mature" combination for something as mission-critical as voice traffic? VXLAN implementations by mature companies (Netgate, Fortinet, etc...) those work. Maybe this wireguard+vxlan would work too.. who knows!?

Edit: I see you changed your comment to consider remote DHCP which causes my comment above to be irrelevant. Oh well. The fact still holds that VXLAN is meant to handle Layer 2 so saying BGP is an alternative is like putting a square peg in a round hole.

Or dhcp relay to a central dhcp server
Maybe he'll write a blog post about it :)
As someone who has fought with L2 in the WAN for years, I would not advise that at all. Just install a DHCP server and transport the traffic over L3. You have a lot more control over both routing and QoS.
I've found that they work just fine. In my case all that was required was setting a DHCP option pointing them at their controller. The QoS to make it work well under load would've been the same for a L2 tunnel.
Then it's not actually an equivalent solution.

L2 connectivity is still quite useful, even if you don't have a need for it.

I'm doing OSPF over Wireguard (running BIRD on Debian, not OpenBSD though.) It works pretty well.
There's a weird font-rendering bug on this site that causes the text in the code blocks to be unreadable unless you highlight it with your mouse. If you enable Javascript, it seems to fix it.

Not sure if the author is reading this thread, but it's something you may find worth investigating and fixing.

Works for me with JS disabled, unless the author fixed it in the two minutes since you posted your comment. Code blocks have:

    style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4"
... in the HTML itself.
The bug still appears present for me:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:109.0) Gecko/20100101 Firefox/109.0

Code blocks without JS: https://i.imgur.com/Tcq34IK.png

Code blocks without JS under highlight: https://i.imgur.com/Lbvk2LS.png

Code blocks with JS enabled: https://i.imgur.com/kDCW4Q0.png