by crudbug 1364 days ago
ZeroTier alternative? Has the team done any performance measurements?

Yes, it is a ZeroTier alternative, but there are key differences in how we do some things. In fact, creating some of these comparisons is on my list of things to do. In the meantime, here are some comments on Ziti vs. others - https://www.reddit.com/r/selfhosted/comments/v1ymn5/when_pub....

We did do performance testing too. OpenZiti is built for high performance - https://netfoundry.io/benchmark/benchmarking%20open%20source...

Thanks! From the benchmark report [1], it is not clear how much of the baseline wire performance is observed; the numbers in the table range anywhere from 30% to 90% of plain wire bandwidth.

As there are multiple overlay projects popping up (Tailscale, NetMaker, OpenZiti, NetBird, Nebula, ZeroTier, EVPN, etc.), we should consider a baseline benchmark index, like [2].

P.S. We have been testing ZeroTier for VPN access and observe ~70% of baseline wire bandwidth.

[1] https://netfoundry.io/benchmark/benchmarking%20open%20source...

[2] https://techoverflow.net/2022/08/19/iperf-benchmark-of-zerot...
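
A baseline benchmark index like the one suggested above could be as simple as dividing each overlay's measured throughput by the raw wire throughput on the same link. The sketch below is only an illustration: the function name, the overlay labels, and all of the Mbit/s figures are made up, not taken from either report.

```python
def baseline_index(overlay_mbps: float, wire_mbps: float) -> float:
    """Return overlay throughput as a fraction of raw wire throughput."""
    if wire_mbps <= 0:
        raise ValueError("wire_mbps must be positive")
    return overlay_mbps / wire_mbps


# Hypothetical iperf-style results in Mbit/s, for illustration only.
wire_mbps = 940.0
overlay_results = {"overlay-a": 660.0, "overlay-b": 310.0}

# Normalize every overlay against the same wire baseline.
index = {
    name: baseline_index(mbps, wire_mbps)
    for name, mbps in overlay_results.items()
}

# Print the overlays from closest-to-wire to furthest.
for name, frac in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {frac:.0%} of wire")
```

Normalizing against a baseline measured on the same hardware and link is what would make numbers from different reports comparable at all.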

Nice! Sounds like OpenZiti is your next one to put on that list? We would __LOVE__ to have a third party do any kind of performance testing like that and report the results. Good -- or bad! It's important to be transparent about things like this, and we believe that wholeheartedly.

If you want any help, we'd be happy to help you set up a network (if you need it), just ask over on the Discourse!

Have you tested large numbers of endpoints? Sometimes the performance bottleneck isn't the data throughput but the connection establishment phase.

E.g., if a smartphone app is rolled out to 10k+ users, or an IoT service has 100k+ devices, will the service be able to handle it?
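
To make the connection-establishment concern concrete, here is a back-of-the-envelope sketch. It assumes a fixed, purely hypothetical rate of connection setups per second and ignores retries and backoff; the function name and the 500/s figure are illustrative and are not measurements of any real system.

```python
def seconds_to_connect(clients: int, setups_per_second: float) -> float:
    """Lower bound on the time for all clients to finish connection setup."""
    if setups_per_second <= 0:
        raise ValueError("setups_per_second must be positive")
    return clients / setups_per_second


# Hypothetical capacity: 500 completed connection setups per second.
for clients in (10_000, 100_000):
    print(clients, "clients:", seconds_to_connect(clients, 500.0), "seconds")
```

Even with plenty of bandwidth to spare, the time for a whole fleet to connect at once grows linearly with the number of clients under this simple model.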

A bunch of us (me included) used to work in the IoT world. You can *TRY* to simulate 100k assets using "20 or so" nodes but truthfully it's just never *really* the same. So, to be transparent, no. It's really, really, really hard to test 100k devices effectively in my experience. (happy to be told otherwise/taught what others have done). We ran hundreds of __actual__ machines simulating "thousands" of devices, but it's still NOT the same... We do perform scale testing though it's not me that does it...

So no, we haven't gotten to 100k actual deployed devices - yet. You can be the first! :)

As every developer will say, we "built it for scale". Many of us are also a bunch of ex-IoT devs, and we _have_ built systems like this in the past so we're really familiar with the types of issues that can crop up.

As for verified user counts, we're in the 5,000 to 10,000 range that I know of (as in, I am pretty sure we have networks of that size deployed out there in the wild). I'll ask what our "fabric" people have tested and how, and get back to you if it's significantly different than what I know about.

I'll add some more detail. There are a lot of different ways your application can break as it scales up. For something which handles data flow, the three I tend to think about the most are: the model, throughput, and number of connected clients.

We've tested the model with 100k identities with the operations that clients use (auth, listing services, creating sessions, etc.) to make sure that the model scales reasonably well. We had to add additional denormalization and make some other tweaks, but now the controller scales relatively well for those operations. I'm sure there are still edge cases where it may break down, but we'll have to fix them if someone hits them.

We've done throughput testing to make sure we can handle high-throughput use cases. This also resulted in lots of changes, including reworking the end-to-end flow control. This is an area where we're happy with the progress we've made, and performance is in a reasonably good place, but we have lots of ideas for how to continue to improve and will be continuing to test and iterate.

Having tons of connected clients (even without much traffic flowing) is its own scaling challenge. We've done some amount of testing here. As part of the testing mentioned above we had to make some changes to make sure that slow clients didn't hold up fast clients. More generally, this is where things start getting complicated and very specific. You can have very different types of traffic flow, so it's hard to model anything generic. We have not done as much in this area because we've not seen any cases where we're memory constrained, which is the usual sign of a concurrent connections scaling bottleneck.
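
One generic way to keep slow clients from holding up fast ones is to give every client its own bounded queue and never block the sender on a full one. The sketch below shows that general idea only; it is not OpenZiti's flow control, and the `Fanout` class, its methods, and the drop-on-full policy are all assumptions made for illustration (a real system might apply backpressure or disconnect the slow client instead).

```python
from collections import deque


class Fanout:
    """Fan messages out to clients through per-client bounded queues."""

    def __init__(self, max_pending=4):
        self.max_pending = max_pending
        self.queues = {}   # client name -> deque of pending messages
        self.dropped = {}  # client name -> count of dropped messages

    def add_client(self, name):
        self.queues[name] = deque()
        self.dropped[name] = 0

    def publish(self, msg):
        # Never wait on a client: if its queue is full, count a drop and move on.
        for name, queue in self.queues.items():
            if len(queue) >= self.max_pending:
                self.dropped[name] += 1
            else:
                queue.append(msg)

    def drain(self, name, n):
        # Simulate a client reading up to n of its pending messages.
        queue = self.queues[name]
        return [queue.popleft() for _ in range(min(n, len(queue)))]


fan = Fanout(max_pending=4)
fan.add_client("fast")
fan.add_client("slow")

fast_received = []
for i in range(10):
    fan.publish(f"msg-{i}")
    fast_received += fan.drain("fast", 10)  # the fast client keeps up
    # the slow client reads nothing during the burst

print(len(fast_received), fan.dropped["slow"], len(fan.queues["slow"]))
```

The bounded queue also caps per-client memory, which ties back to memory pressure being the usual sign of a concurrent-connections bottleneck.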

Hope that's helpful.

Great response, thank you.
ZeroTier is layer 2; OpenZiti is layer 3, 4, or 7 depending on what you're doing. It's similar to ZeroTier in some ways, but very, very different in others.

OpenZiti's main goal is to bring application-embedded zero trust into applications, but getting there is a long journey. That's why we provide "agents", like other "better VPNs" such as ZeroTier, WireGuard, etc.