by afeiszli
1629 days ago
The biggest differences with Tailscale are that you can use the kernel version of WireGuard (for speed) and that we are self-hostable. We noticed that a major contributor to the speed difference between us and them is that they often route traffic through their relays, which can add a good amount of latency. Nebula is a lot closer to us on speed, but we also have a management GUI, which makes things a lot easier. We were very close to using Nebula in our early days. The main thing that stopped us was that we decided WireGuard was going to be the standard in the future, and we wanted to be based on it.

That leads to a more fundamental difference, which is harder to quantify. Our aim is to really be a "WireGuard controller." You should be able to shut down our server and agents and your network should still run fine, and you should be able to manually modify all of your WireGuard interfaces if necessary. We're getting close to that vision but aren't quite there yet. That last point leads back to the kernel thing. We use the kernel implementation by default, but Netmaker can really use any WireGuard implementation. If users are worried about the security implications of running in the kernel, they can use the userland version, and Netmaker should pick that up just fine. They can even run it in a Docker container on their machine.

We briefly considered building something atop WireGuard in the early days of Nebula, but decided not to because of scaling. WireGuard's protocol requires that every node already knows the public key of each peer it talks to ahead of time. At Slack's scale, that means every time a fresh node is launched, you would have to tell 50,000 other nodes that it exists.
Obviously you can smarten this up and tell only the hosts it might talk to, but that adds complexity. Using PKI eliminates this key distribution problem and means you don't have the same scaling limitations as something built on WG.
WireGuard is a very, very good VPN, but I cannot imagine trying to run something at the scale of tens of thousands of nodes when you need such complex coordination systems to exchange keys and trust, especially in a dynamic environment where nodes are coming and going all the time.
I totally get that it's solvable overall, but Slack has had 4 years of nearly perfect uptime on Nebula, whilst using it to pass >95% of all backend traffic. These considerations may seem simple to address, but there are fundamentals that mattered and led us to write Nebula. We didn't want to create something new, but to do what Slack needed, we had to.