Beyond stunnel: High-speed, secure connections across the public Internet (blog.vcider.com)
11 points by jbrendel 5225 days ago
The standard ways to secure connections across public links (even for applications that don't support encryption themselves) have been to use stunnel or OpenVPN. But those solutions come with a significant performance hit. This article presents measurements and comparisons to illustrate this and introduces a more modern solution with much better performance characteristics.
6 comments

As you can see, after publishing this article I received some feedback and questions about the comparison, including from the author of stunnel himself, Michal Trojnara. Even though the stunnel website itself states that the default is ‘no compression’, this apparently is not so. iperf’s default data appears to be highly compressible, which heavily skewed the performance numbers: stunnel was performing very different work than native networking or vCider. To arrive at more realistic numbers, I used a large image transfer (a JPEG) instead, which by its nature cannot be compressed much further. I transferred this file with iperf (which can use file input) as well as wget. The results? stunnel is much more comparable with both the native and the vCider networking speeds.
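The effect of payload compressibility can be illustrated with a small sketch. It uses Python's zlib module (DEFLATE, the compression family mentioned elsewhere in this thread) and does not reproduce the original test: the repeating digit pattern is only an assumed stand-in for highly compressible synthetic benchmark data, and random bytes are an assumed stand-in for already-compressed content such as a JPEG.

```python
import os
import zlib


def deflate_ratio(payload: bytes) -> float:
    """Return the original size divided by the DEFLATE-compressed size."""
    return len(payload) / len(zlib.compress(payload))


SIZE = 1_000_000  # bytes per sample payload (arbitrary choice)

# Assumption: a repeating pattern stands in for synthetic benchmark data.
synthetic = b"0123456789" * (SIZE // 10)

# Assumption: random bytes stand in for already-compressed data (e.g. a JPEG).
incompressible = os.urandom(SIZE)

synthetic_ratio = deflate_ratio(synthetic)
random_ratio = deflate_ratio(incompressible)

print(f"synthetic pattern: {synthetic_ratio:.1f}x smaller after compression")
print(f"random bytes:      {random_ratio:.3f}x (essentially no gain)")
```

A tunnel that compresses before encrypting puts far fewer bytes on the wire for the first payload than for the second, so a benchmark that counts application bytes will report very different throughput for the two.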

Interrupts and context switches are now roughly the same for all three solutions. stunnel still exhibits a significantly higher CPU load (20%), but certainly does not max out the CPU anymore. I suspect that the higher numbers of context switches and interrupts result from iperf’s default behavior of sending as much data as it can in a given time interval. And since stunnel can easily compress iperf’s default data, iperf was able to send a lot of this, which also explains the results reported by iperf.

While I maintain that a setup consisting of multiple nodes is much easier to maintain with vCider, which also provides a number of other interesting features, it must be noted that stunnel does indeed perform very well for point-to-point connections. Note to self: Be sure not to use synthetic data for performance tests like this.

One question that isn't in the FAQ that I would instantly be concerned about is exactly what encryption mechanism is used, and where the private keys are stored. Specifically, are they stored in vCider's database?

If my goal was to set up a secure tunnel, I'd be incredibly wary of using a closed source solution like this. Even if you're not worried about the government secretly demanding access to your keys, you might reasonably be concerned about vCider getting hacked.

Indeed, the keys are centrally created and (briefly) stored. However, they are also changed frequently and historic (older) keys are not kept on record. So, if someone intercepts a bunch of your traffic and then wants to hack or demand our database, they would have to hurry since the older keys are not kept around.

However, I understand your concern. We are thinking of ways to address this in an even more comprehensive way.

Sorry, I forgot to answer this: vCider uses AES 256.
Testing results indicated that stunnel was much faster than the line speed (and thus faster than his own product), but the author simply ignored it. In fact his version of stunnel had DEFLATE/ZLIB compression enabled by default. In order to prove that his own product is better than stunnel, the author of this test decided to compare the bandwidth of an uncompressed stream encrypted with his product with the bandwidth of a compressed stream encrypted with stunnel.
Well, I contacted you privately to discuss this so that we may both enlighten each other on what's going on, but you did not respond to my email and instead went out publicly. Oh well.

But let's look at it. The physical link was capable of carrying something like 18 Mbit/s. When running iperf through stunnel, it reported more than 400 Mbit/s. But if you look at the actual bandwidth that was used during the transfer, it was indeed just 4.8 Mbit/s.

So, the 400 Mbit/s is clearly an illusion caused by the compression. Compressing a stream is a worthwhile goal, for sure. And it can be helpful in many cases. But I guess iperf's data is highly compressible. Considering that much of what's transferred these days is already compressed (multi-media files), I doubt that in the real world you will see any sort of speedup even remotely like this.
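As a rough worked example of the numbers quoted above (a sketch only; the function name is made up for illustration), the compression ratio implied by the reported and on-the-wire rates can be computed directly, along with what the same link could report for data that does not compress at all:

```python
def apparent_mbit(wire_mbit: float, compression_ratio: float) -> float:
    """Application-level rate seen when the wire carries compressed data."""
    return wire_mbit * compression_ratio


reported_mbit = 400.0  # rate iperf reported through stunnel ("more than 400")
wire_mbit = 4.8        # bandwidth actually used on the link
link_mbit = 18.0       # approximate capacity of the physical link

# Ratio needed to turn 4.8 Mbit/s on the wire into 400 Mbit/s at the application.
implied_ratio = reported_mbit / wire_mbit

# With incompressible data (ratio 1.0) the link capacity is the ceiling.
incompressible_ceiling = apparent_mbit(link_mbit, 1.0)

print(f"implied compression ratio: about {implied_ratio:.0f}:1")
print(f"ceiling for incompressible data: {incompressible_ceiling:.0f} Mbit/s")
```

Since the reported figure was "more than 400 Mbit/s", the ratio of roughly 83:1 is a lower bound for that particular test data.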

The fact is this: If it comes to actually transferring data over the wire, stunnel is very slow and there are just no two ways around it. It attempts compression at very high cost in CPU cycles and in the end is still going to be bound by context switches and interrupts.

I'm going to repeat the tests with compression disabled and update the blog post accordingly.

1. It's you who started the flame by publishing your obviously unfair comparison.

2. I didn't receive your email (if you really sent it).

3. Since version 4.51, released over a month ago, stunnel overrides OpenSSL's default of enabling compression: http://www.stunnel.org/?page=sdf_ChangeLog

4. Whether compression is useful or not depends on many factors, including not only type of data, but also available bandwidth and CPU power. And data compression is not an illusion. I'd be afraid to use your products if you don't understand it.

5. Compression is indeed much slower than encryption. This is a fact. Do you really mean that your product is better just because it doesn't support compression?

6. Stunnel is indeed a performance bottleneck, but only if your internet connection is over 0.5Gbps, and your server is as slow as my desktop: http://www.stunnel.org/?page=perf

1. Sorry you thought a comparison in which your program doesn't come out on top is a flame. It wasn't. Don't know what's unfair about comparing out of the box, default install performance of two systems.

2. Yes, I sent it, sorry you didn't receive it.

3. I did a standard apt-get install for stunnel on the Ubuntu Oneiric system.

4. Sigh. I'm not going to comment on that one.

5. Really don't know where you would get this idea. Also don't understand why you get so upset. As I said, I was more than happy to discuss this with you ahead of time. Let me try to explain this again: I suggested that most real-world data is already compressed, thus the benefits you could derive by doing compression on the wire are lessened to the point where compression won't get you anything. For any compressible data, compressing it before sending is great and beneficial and I never stated otherwise.

Did you really want to discuss your results "ahead of time"? You could easily do it before publishing them. My email address is in the manual of stunnel.

I could argue whether an old Ubuntu package is "out of the box" stunnel, or whether sending a 20 Mbit stream of compressed video is really the most common use of stunnel...

As usual, someone's pretending that IPSec doesn't exist.
I'm not pretending this. But consider that IPSec doesn't exactly have a reputation for being easy to set up, or even for being particularly well supported across IaaS provider networks.

People resort to stunnel (and OpenVPN) exactly because they may not have the knowledge or inclination to get a fully featured IPSec setup going.

Are the two doing the same amount of encryption or is vCider using a much less complex cipher? Is the CPU difference really just in the kernel vs userspace implementation?

tinc (http://tinc-vpn.org/) would seem like a more interesting comparable than stunnel since it sets up a p2p VPN that routes all IP traffic instead of just a point-to-point link.

vCider uses AES 256 encryption.

Tinc looks interesting, I will test that as well.

Considering the huge difference in context switches and interrupts, I don't think that encryption induced CPU load is the only issue here.

Kernel stuff on its own doesn't just magically run faster, of course. But in this case, I think it's the constant interaction between user-space and kernel-space that causes the problem. That's an issue that will impact any user-space solution to a networking problem.

And what encryption is stunnel using? AES was picked to be fast so it may very well be outperforming whatever stunnel uses.

Can 1200 extra context switches and ~6400 extra interrupts per second use up 100% cpu? In fact you mention that for stunnel most of the CPU was in userspace and not the kernel which would indicate time spent actually using the CPU instead of doing context-switches and interrupts, which I assume top would show as "sys".
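The question can be put into a back-of-envelope sketch. The event rates are the ones quoted above; the per-event costs are purely hypothetical placeholders (nothing in the thread or the article gives them) and would have to be measured on the actual machine:

```python
def overhead_fraction(ctx_per_s: float, irq_per_s: float,
                      ctx_cost_s: float, irq_cost_s: float) -> float:
    """Fraction of one CPU second spent on the extra switches and interrupts."""
    return ctx_per_s * ctx_cost_s + irq_per_s * irq_cost_s


extra_context_switches = 1200  # per second, from the comment above
extra_interrupts = 6400        # per second, from the comment above

# Hypothetical per-event costs, chosen only to illustrate the arithmetic.
ASSUMED_CTX_COST_S = 5e-6  # 5 microseconds per context switch (assumption)
ASSUMED_IRQ_COST_S = 2e-6  # 2 microseconds per interrupt (assumption)

fraction = overhead_fraction(extra_context_switches, extra_interrupts,
                             ASSUMED_CTX_COST_S, ASSUMED_IRQ_COST_S)

print(f"estimated overhead: {fraction * 100:.2f}% of one CPU")
```

Under these assumed costs the extra events account for a couple of percent of one CPU, so the per-event cost would have to be far higher than assumed here for them alone to explain a saturated core.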

I also find the extra interrupts strange. I wonder if vCider is sending bigger packets and what caching/latency implications that might have.

Also, I had a bitch of a time getting performance out of stunnel, it seemed to ignore hardware acceleration units that openssl was happy to use.
vCider does not use larger packets. It can't, since they still have to be routed over the same public Internet.

stunnel tries to compress traffic as well, which works if you send a lot of stuff that isn't already compressed. But most of what's sent these days (multi-media?) already is compressed, so this will be a wasted effort.

What version of OpenSSL was used on the test machine? That would greatly affect stunnel's performance. See here: http://vincent.bernat.im/en/blog/2011-ssl-benchmark-round2.h...
Version 1.0.0e