by aarmenaa 601 days ago
I've just spent the last month learning exactly why I definitely do want a TCP over TCP VPN. The short answer is almost every cloud vendor assumes you're doing TCP, and they've taken the "unreliable" part of UDP to heart. It is practically impossible to run any modern VPN on most cloud providers anymore.

Over the last month, I've been attempting to set up a fast Wireguard VPN tunnel between AWS and OVH. AWS killed all internet access on the instance with zero warning and sent us an email indicating that they suspected the instance was compromised and being used as part of a DDOS attack. OVH randomly performs "DDOS mitigation" anytime the tunnel is under any load. In both cases we were able to talk to someone and have the issue addressed, but I wanna stress: this is one stream between two IPs -- there's nothing that makes this anything close to looking like a DDOS. Even after getting everything properly blessed, OVH drops all UDP traffic over 1 Gbps. It took them a month of back-and-forth troubleshooting to tell us this.

The really terrible part is "TCP over TCP is bad" is now so prevalent there are basically no good VPN options for it if you need it. Wireguard won't do it directly, but there are hacks involving udp2raw. I tried it, and wasn't able to achieve more than 100 Mbps. OpenVPN can do it, but is single-threaded and won't reasonably do more than 1 Gbps without hardware acceleration, which didn't appear to work on EC2 instances. strongSwan cannot be configured to do unencapsulated ESP anymore -- they removed the option -- so it's UDP encapsulated only. Their reasoning is UDP is necessary for NAT traversal, and of course everybody needs that. It's also thread-per-SA so also not fast. The only solution I've found that can do something not UDP is Libreswan, which can still do unencapsulated ESP (IP Protocol 50) if you ask nicely. It's also thread-per-SA, but I've managed to wring 2 - 3 Gbps out of a single core after tinkering with the configuration.

For the love of all that's good in the world, just add performant TCP support to Wireguard. I do not care about what happens in non-optimal conditions.

/rant

9 comments

The whole point of this article is that performant Wireguard-over-TCP support in Wireguard simply does not work. You're not fighting the prevalence of an idea, you're fighting an inherent behavior of the system as currently constituted.

In more detail, let's imagine we make a Wireguard-over-TCP tunnel. The "outer" TCP connection carrying the Wireguard tunnel is, well, a TCP connection. So Wireguard can't stop the connection from retransmitting. Likewise, any "inner" TCP connections routed through the Wireguard tunnel are plain-vanilla TCP connections; Wireguard cannot stop them from retransmitting, either. The retransmit-in-retransmit behavior is precisely the issue.

So, what could we possibly do about this? Well, Wireguard certainly cannot modify the inner TCP connections (because then it wouldn't be providing a tunnel).

Could it work with a modified outer TCP connection? Maybe---perhaps Wireguard could implement a user-space "TCP" stack that sends syntactically valid TCP segments but never retransmits, then run that on both ends of the connection. In essence, UDP masquerading as TCP. But there's no guarantee that this faux-TCP connection wouldn't break in weird ways because the network (especially, as you've discovered, any cloud provider's network!) isn't just a dumb pipe: middleboxes, for example, expect TCP to behave like TCP.

Good news (and oops), it looks like I've just accidentally described phantun (and maybe other solutions): https://github.com/dndx/phantun I'd be curious if this manages to sidestep the issues you're seeing with AWS and OVH.
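
As a rough illustration of the "UDP masquerading as TCP" idea above, here is a minimal sketch that wraps a datagram in a syntactically valid TCP header with a correct checksum. It is not how phantun or udp2raw are implemented; the function names, addresses, and ports are made up for illustration, and actually sending such segments would need a raw socket or TUN device plus a faked handshake.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Standard 16-bit one's-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header that the TCP checksum covers (protocol 6 = TCP)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 6, tcp_len))


def fake_tcp_segment(src_ip, dst_ip, sport, dport, seq, ack, payload: bytes) -> bytes:
    """Wrap one datagram in a 20-byte TCP header (hypothetical helper).

    The sender would advance seq by len(payload) for each datagram and
    never retransmit, so no second control loop is added.
    """
    offset_flags = (5 << 12) | 0x18  # header length 5 words, flags PSH|ACK
    header = struct.pack("!HHIIHHHH", sport, dport, seq, ack,
                         offset_flags, 65535, 0, 0)
    csum = checksum(pseudo_header(src_ip, dst_ip, len(header) + len(payload))
                    + header + payload)
    # The checksum field occupies bytes 16-17 of the TCP header.
    return header[:16] + struct.pack("!H", csum) + header[18:] + payload


segment = fake_tcp_segment("10.0.0.1", "10.0.0.2", 40000, 443,
                           seq=1000, ack=0, payload=b"wireguard datagram")
print(len(segment), "bytes on the wire after the IP header")
```

A middlebox that tracks TCP state more closely (window, ACK progression, retransmissions) could still tell this apart from a real connection, which is the caveat raised above.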

> The retransmit-in-retransmit behavior is precisely the issue.

But you're concerned about an issue I do not have. In practice retransmits are rare between my endpoints, and if they did occur poor performance is acceptable for some period of time. I just need it to be fast most of the time. To reiterate: I do not care about what happens in non-optimal conditions.

> it looks like I've just accidentally described phantun (and maybe other solutions): https://github.com/dndx/phantun

I'll definitely look into that. They specifically mention being more performant than udp2raw, so that's nice.

> In practice retransmits are rare between my endpoints

You seem to be mistaken about how (most) TCP implementations work. They regularly trigger packet loss and retransmissions as part of their mechanism to determine the optimal transmission rate over an entire path (made up of potentially multiple point-to-point connections with dynamically varying capacity).

That mechanism breaks down horribly when using TCP-over-TCP.
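
A toy model may help show why loss is routine even on a clean path. This is a minimal sketch of a Reno-style additive-increase/multiplicative-decrease loop, with no slow start and a made-up path capacity; real stacks (CUBIC, BBR, and so on) differ in detail, and the function name is hypothetical.

```python
def aimd_losses(capacity_pkts: int, rounds: int) -> tuple[int, list[int]]:
    """Simulate a simplified AIMD sender against a fixed-capacity path.

    Returns the number of loss events and the window size in each round.
    """
    cwnd = 1
    losses = 0
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd > capacity_pkts:
            # The sender overshot the path, so a packet is dropped. That drop
            # is the signal the sender relies on to learn the path's capacity.
            losses += 1
            cwnd = max(1, cwnd // 2)  # multiplicative decrease
        else:
            cwnd += 1  # additive increase: keep probing for more bandwidth
    return losses, history


losses, history = aimd_losses(capacity_pkts=100, rounds=1000)
print(f"loss events with no random loss at all: {losses}")
```

The path in this model never drops a packet on its own; every loss is caused by the sender probing past capacity. Inside a TCP tunnel the inner sender never sees those drops, only delay, so this feedback loop is what gets disturbed.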

Can't the tunneling software detect when the upper TCP is retransmitting segments and drop them?

That would give the lower TCP enough time to transmit the original segment.

Maybe, but packet loss isn't the only problem. You'll also want to preserve latency (TCP has a pretty sophisticated latency estimation mechanism), for example.

Some middleboxes will also do terrible things to your TCP streams (restrictive firewalls only allowing TCP are good candidates for that), and then all bets are off.

If you're really required to use TCP, the "fake TCP" approach that others in sibling threads have mentioned seems more promising (but again, beware of middleboxes).

But, my connection speed is usually greater and my loss is much less to my VPN endpoint than to whatever services I am accessing through that endpoint. As a result it doesn't affect things much. Further, accessing it with UDP is not always possible.
> [...] my loss is much less [...]

Unless it's actually zero, any loss on the "outer" TCP stream will cause a retransmission, visible to the inner one as a sharp jump in latency of all data following the loss. Most TCP stacks don't handle that very well either.
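
The latency jump described above can be sketched with a small model of in-order delivery. All timings here (send interval, one-way delay, retransmission delay) are illustrative assumptions, and the retransmission is simplified to "the lost packet arrives one timeout late"; real stacks often recover faster via fast retransmit.

```python
def inner_latencies(n_packets, lost, send_interval_ms=1.0,
                    one_way_ms=10.0, rto_ms=200.0):
    """Latency the inner flow sees per tunnelled packet when the outer
    TCP stream must deliver bytes strictly in order."""
    # When each packet's bytes physically reach the far end.
    arrival = []
    for i in range(n_packets):
        t = i * send_interval_ms + one_way_ms
        if i in lost:
            t += rto_ms  # outer TCP retransmits; the copy arrives late
        arrival.append(t)

    latencies = []
    released = 0.0
    for i, t in enumerate(arrival):
        # In-order delivery: a packet cannot be handed to the inner flow
        # before everything sent ahead of it has been delivered.
        released = max(released, t)
        latencies.append(released - i * send_interval_ms)
    return latencies


lat = inner_latencies(20, lost={5})
for i in (4, 5, 6, 19):
    print(f"packet {i}: {lat[i]:.0f} ms")
```

With these illustrative numbers, packets before the loss see 10 ms, the lost packet sees 210 ms, and every packet queued behind it inherits most of that delay even though it was never lost itself.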

Sure, even when outer loss is pretty close to zero, it's conceptually not great.

On the other hand, I get 400 Mbps over TCP-over-TCP connections, and can't connect in any other reasonable way. 400 Mbps > 0.

Even tunneling in UDP is not great due to MTU effects.
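
For a sense of the MTU effect mentioned above, here is the overhead arithmetic for a Wireguard data packet as a minimal sketch. It assumes a 1500-byte link MTU and no extra encapsulation (no PPPoE, VLAN tags, or cloud overlay overhead); the helper names are made up.

```python
OUTER_IP_HEADER = {"ipv4": 20, "ipv6": 40}
UDP_HEADER = 8
WG_DATA_HEADER = 16  # type + reserved (4), receiver index (4), counter (8)
WG_AUTH_TAG = 16     # Poly1305 authentication tag
TCP_HEADER = 20      # without options


def tunnel_mtu(link_mtu: int, outer: str = "ipv4") -> int:
    """Largest inner IP packet that fits in one outer packet unfragmented."""
    return (link_mtu - OUTER_IP_HEADER[outer] - UDP_HEADER
            - WG_DATA_HEADER - WG_AUTH_TAG)


def inner_tcp_mss(mtu: int, inner: str = "ipv4") -> int:
    """TCP payload per segment inside the tunnel."""
    return mtu - OUTER_IP_HEADER[inner] - TCP_HEADER


print(tunnel_mtu(1500, "ipv4"))  # 1440
print(tunnel_mtu(1500, "ipv6"))  # 1420
print(inner_tcp_mss(tunnel_mtu(1500, "ipv6")))  # 1380
```

If the tunnel interface's MTU is set higher than the path really allows, inner packets get fragmented or dropped, which tends to show up as stalls on large transfers rather than as an outright failure.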

> just add performant TCP support to Wireguard

But IP over TCP is in principle non-performant. There's no (non-evil) magic Wireguard could perform to get around that.

Adding TCP support to Wireguard would add a whole bunch of complexity that it doesn't need – for a very niche use case (i.e. where you absolutely have to get an IP VPN to work over a restrictive firewall).

> Wireguard won't do it directly, but there's hacks involving udp2raw.

Which significantly does not do UDP over TCP in the problematic sense (it just masquerades UDP as TCP, without providing a second set of TCP control loops on top of the first one).

> AWS killed all internet access on the instance with zero warning and sent us an email indicating that they suspected the instance was compromised and being used as part of a DDOS attack.

It makes no sense for that to be due to Wireguard usage, though (not saying I don't believe you that it happened, just their explanation or your assumption of their motivation seems strange). Things like Tailscale use Wireguard and should be common enough for AWS to know about them by now, I'd assume?

> But IP over TCP is in principle non-performant.

No it's not. In principle it risks meltdown, which is different. A link that occasionally breaks can be performant while it's working.

I run WireGuard to all my ec2 and AWS instances with no problem. I also run UDP video streams into AWS with little issue.
Yeah, I think there's either more to the story or a misconfiguration. I've done WireGuard at Azure, Hetzner, and AWS. All work fine.
It is very difficult to misconfigure Wireguard -- there's just not that much to tune aside from MTU. We've had a 1 Gbps tunnel between AWS and OVH for years and it worked mostly fine, except for the handful of times OVH's DDOS mitigation kicked in and killed the tunnel. The issue is when you start wanting to go beyond 1 Gbps.

I think AWS will do 5 Gbps with a capable peer -- which is their limit for a single flow [1] -- but you might need to tell them first so they don't kill public networking on the instance though. I found that UDP iperf tests reliably got my instance's internet shut off, so keep that in mind. On the other hand, OVH will happily do 5-ish Gbps to/from my EC2 instance in a TCP iperf test, but won't tolerate more than 1 Gbps of inbound UDP. OVH support has indicated that this is expected, though they do not document that limitation and it seemed that both their support and network engineering people were themselves unaware of that limit until we complained. They don't seem to have the same limits on ESP, which is why I developed an interest in ipsec arcana.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-inst...

Enter TunSafe: https://github.com/TunSafe/TunSafe

This comes with TCP implementation https://github.com/TunSafe/TunSafe/blob/master/docs/WireGuar...

Bad news is that runs only between TunSafe instances.

Worst case, can't you run a minimal TURN server and have TCP over Wireguard/UDP over TURN/TCP?

For a site to site VPN, something where you use transparent proxying at the routers to turn TCP into TCP over SOCKS (over TLS) might work. TCP proxying with 1:1 sockets avoids most of the issues with TCP over TCP, at the expense of needing to keep socket buffers at the proxy hosts.
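
The 1:1 socket proxying described above can be sketched as a plain TCP relay. This is a minimal sketch only: it forwards to one fixed upstream, whereas a real transparent setup would also need firewall redirection and original-destination lookup, plus SOCKS and TLS between the two proxies. `serve_relay` and its parameters are hypothetical names.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then close the destination socket."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            # drain() applies backpressure: a slow leg fills this proxy's
            # socket buffer instead of causing nested retransmissions.
            await writer.drain()
    finally:
        writer.close()


async def serve_relay(listen_port: int, upstream_host: str, upstream_port: int):
    """Accept TCP clients and give each its own TCP connection upstream."""

    async def on_client(client_r, client_w):
        up_r, up_w = await asyncio.open_connection(upstream_host, upstream_port)
        # Each leg is an independent TCP connection with its own loss
        # recovery and congestion control; only payload bytes cross over.
        await asyncio.gather(pipe(client_r, up_w), pipe(up_r, client_w))

    return await asyncio.start_server(on_client, "127.0.0.1", listen_port)
```

To run it, something like `server = await serve_relay(8080, "127.0.0.1", 9000)` followed by `await server.serve_forever()` inside an `asyncio.run(...)` entry point would do; the port numbers are placeholders. Because each TCP connection is terminated at the proxy, no TCP segment is ever carried inside another TCP stream.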

Did ipsec over udp for client vpn, to datacenter and even to Azure from AWS. No issues whatsoever, never did more than a Gbit over 1 tunnel though.
We've run Wireguard tunnels that max out at 1 Gbps in AWS for years with no issues (on the AWS side, anyways). It seems like things get hairy once you want to do more than that.
Did you try udp/443 to see if OVH clobbers that traffic?

I was quite hoping that the advent of QUIC would let us all use UDP again, albeit on one port.

Did you go down the shadowsocks path at all?
I did not. I'm not terribly familiar with it, but it doesn't look like I can do general routing with it, right? My end goal is to route between two subnets.
Nope, shadowsocks is just a plain TCP-in-TCP (not TCP-over-TCP) proxy. If you cannot have performant routing between clouds due to UDP QoS, then the only sensible solution would be to set up proxy nodes on both sides and transparently redirect TCP traffic (if that's all you need) through the proxy.

(I wrote https://github.com/shadowsocks/go-shadowsocks2)

> strongSwan cannot be configured to do unencapsulated ESP anymore -- they removed the option

wait, what? Pretty sure I still used unencapsulated ESP a few months ago… though I wouldn't necessarily notice if it negotiates UDP after some update I guess… starts looking at things

Edit: strongswan 6.0 Beta documentation still lists "<conn>.encap default: no" as config option — this wouldn't make any sense if UDP encapsulation was always on now. Are you sure about this?

Sorry, I misremembered the issue. Looking at my notes the issue is they don't allow disabling their NAT-T implementation, which detects NAT scenarios and automatically forces encapsulation on port 4500/udp. The issue is that every public IP on an EC2 instance is a 1:1 NAT IP. Every packet sent to the public IP is forwarded to the private IP -- including ESP -- but it is technically NAT and looks like NAT to strongSwan.

There's an issue open for years; it will probably never be fixed:

https://wiki.strongswan.org/issues/1265
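
To show why a 1:1 NAT still "looks like NAT", here is a minimal sketch of the IKEv2 NAT detection check as I understand it from RFC 7296, section 2.23: each side sends a SHA-1 digest over the SPIs plus the IP and port it believes it is using, and the receiver recomputes it from the addresses it actually observed. This is not strongSwan's code; the function names, SPI values, and addresses are illustrative.

```python
import hashlib
import ipaddress
import struct


def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """Digest carried in a NAT_DETECTION_*_IP notify payload."""
    return hashlib.sha1(
        spi_i + spi_r + ipaddress.ip_address(ip).packed + struct.pack("!H", port)
    ).digest()


def nat_detected(spi_i: bytes, spi_r: bytes,
                 claimed: tuple[str, int], observed: tuple[str, int]) -> bool:
    """True if the sender's own view of its address differs from what
    the receiver saw on the wire."""
    return (nat_detection_hash(spi_i, spi_r, *claimed)
            != nat_detection_hash(spi_i, spi_r, *observed))


spi_i = bytes.fromhex("0102030405060708")
spi_r = bytes(8)  # responder SPI is zero in the first IKE_SA_INIT request

# The instance only knows its private address; the peer sees the public one.
private_view = ("10.0.0.5", 500)
public_view = ("203.0.113.10", 500)
print(nat_detected(spi_i, spi_r, private_view, public_view))
```

The check only compares addresses, so it cannot tell a 1:1 mapping that forwards ESP untouched from a NAT that would break it. Any mismatch is treated as NAT, and the daemon then switches to UDP encapsulation on port 4500.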

Ah, OK, yeah that makes sense.

FWIW, using IPv6 might be an option here?