Hacker News
by petethepig 1990 days ago
This happens because there's NAT (network address translation) happening somewhere.

Without NAT the only 2 parties that need to know anything about a TCP connection are client and server.

With NAT you have this problem where the router now also has to keep track of open TCP connections.

E.g., if you have a router with local IP 10.0.0.1 and external IP 30.0.0.1, and you are at 10.0.0.2:55000 connecting to 230.0.0.1:443, the router will have to allocate a port on its external interface (let's say 56000) and remember it (this is the key part). So the connection will look like this:

10.0.0.2:55000 <-> NATing router 10.0.0.1 - 30.0.0.1:56000 <-> 230.0.0.1:443

When the router receives packets on 30.0.0.1:56000, it has to remember to forward them to 10.0.0.2:55000.
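A minimal Python sketch of that bookkeeping, using the addresses above. It assumes a simple counter for picking external ports starting at 56000 (real routers have their own allocation policies and also handle port exhaustion and reuse, which this ignores); the class and method names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class NaptRouter:
    """Sketch of a NAPT router's translation table (hypothetical API)."""

    external_ip: str
    next_port: int = 56000  # assumed start of the external port pool
    # (inside endpoint, remote endpoint) -> external port
    outbound: Dict[Tuple[Endpoint, Endpoint], int] = field(default_factory=dict)
    # external port -> (inside endpoint, remote endpoint)
    inbound: Dict[int, Tuple[Endpoint, Endpoint]] = field(default_factory=dict)

    def translate_out(self, inside: Endpoint, remote: Endpoint) -> Endpoint:
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (inside, remote)
        if key not in self.outbound:
            port = self.next_port
            self.next_port += 1
            self.outbound[key] = port  # "remember it": the key part
            self.inbound[port] = key
        return (self.external_ip, self.outbound[key])

    def translate_in(self, remote: Endpoint, external_port: int) -> Optional[Endpoint]:
        """Find where a reply should go; None means no mapping, so it is dropped."""
        entry = self.inbound.get(external_port)
        # This sketch also requires the reply to come from the original remote endpoint.
        if entry is None or entry[1] != remote:
            return None
        return entry[0]


router = NaptRouter(external_ip="30.0.0.1")
print(router.translate_out(("10.0.0.2", 55000), ("230.0.0.1", 443)))  # ('30.0.0.1', 56000)
print(router.translate_in(("230.0.0.1", 443), 56000))                 # ('10.0.0.2', 55000)
```

If the entry in `inbound` disappears, replies have nowhere to go, which is exactly the dropped-connection symptom.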

Memory is a limited resource, so you can't have an unlimited number of these open connections floating around. This also makes your router vulnerable to an attack where an attacker opens a bunch of connections and never closes them, eventually making your router run out of memory.

So the classic solution to this problem is to use an LRU (least recently used) cache: when your router is close to running out of space, it just drops the connection that has been idle the longest.
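Roughly what that looks like, as a Python sketch built on `OrderedDict`. The capacity and the `touch` method are hypothetical; any packet on a mapping (including a keep-alive) counts as "use":

```python
from collections import OrderedDict
from typing import Optional, Tuple


class LruNatTable:
    """Sketch of a NAT table with LRU eviction (capacity is a hypothetical knob)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # external port -> inside endpoint, ordered from least to most recently used
        self.entries: "OrderedDict[int, Tuple[str, int]]" = OrderedDict()

    def touch(self, external_port: int, inside: Tuple[str, int]) -> Optional[int]:
        """Record traffic on a mapping; return the evicted port, if any."""
        evicted = None
        if external_port in self.entries:
            self.entries.move_to_end(external_port)  # traffic makes it most recent
        elif len(self.entries) >= self.capacity:
            evicted, _ = self.entries.popitem(last=False)  # idle the longest
        self.entries[external_port] = inside
        return evicted


table = LruNatTable(capacity=2)
table.touch(56000, ("10.0.0.2", 55000))
table.touch(56001, ("10.0.0.2", 55001))
table.touch(56000, ("10.0.0.2", 55000))  # e.g. a keep-alive refreshes 56000
print(table.touch(56002, ("10.0.0.3", 40000)))  # 56001 (the idlest entry) is evicted
```

Note that with LRU, whether your idle connection survives depends on how much *other* traffic is creating new mappings.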

Unfortunately, a) some routers are less sophisticated and will still drop your connections even if you send keep-alives and such, and b) no matter what you do, memory is a finite resource, and if the router doesn't have a lot of RAM, connections will be dropped.

¯\_(ツ)_/¯


It's not just NATs that cause this. Stateful firewalls must also keep track of connections to let in the responses to outbound requests that would otherwise not be allowed into the network. E.g., when you make a request to www.example.com:443 from source port 12345, and you or your ISP has a firewall that blocks everything that isn't explicitly allowed (this is common in corporate networks), the response can be allowed using a firewall rule such as:

iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

This has the benefit of being general, but the drawback is that the firewall now needs to track every connection, with consequences similar to those in your NAT example.
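A simplified Python sketch of the check behind the ESTABLISHED part of that rule. It ignores TCP state and the RELATED case (e.g. ICMP errors), and the addresses are documentation examples, not real hosts. Nothing in the packet is rewritten, so this is connection tracking, not NAT:

```python
from typing import Set, Tuple

# (protocol, source IP, source port, destination IP, destination port)
FiveTuple = Tuple[str, str, int, str, int]


class StatefulFirewall:
    """Sketch: remember outbound flows, admit inbound packets only if they are replies."""

    def __init__(self) -> None:
        self.flows: Set[FiveTuple] = set()

    def outbound(self, pkt: FiveTuple) -> bool:
        self.flows.add(pkt)  # start tracking this flow
        return True          # outbound traffic is allowed in this sketch

    def inbound(self, pkt: FiveTuple) -> bool:
        proto, src_ip, src_port, dst_ip, dst_port = pkt
        # A reply has source and destination swapped relative to the request.
        return (proto, dst_ip, dst_port, src_ip, src_port) in self.flows


fw = StatefulFirewall()
fw.outbound(("tcp", "192.0.2.10", 12345, "203.0.113.80", 443))
print(fw.inbound(("tcp", "203.0.113.80", 443, "192.0.2.10", 12345)))  # True
print(fw.inbound(("tcp", "203.0.113.80", 443, "192.0.2.10", 12346)))  # False
```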

Firewalls are also more likely to time out connections than to use some kind of LRU scheme. In my opinion, time-based eviction is more predictable, so I prefer it. (Of course, once you run out of memory you still need to evict "live" connections.)
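A Python sketch of time-based eviction, for contrast with LRU. The 300-second timeout is an illustrative value, timestamps are passed in explicitly so the example is deterministic, and the memory-pressure eviction mentioned above is not modeled:

```python
from typing import Dict, Hashable, List


class TimeoutTable:
    """Sketch: a flow expires once it has been idle for more than idle_timeout seconds."""

    def __init__(self, idle_timeout: float) -> None:
        self.idle_timeout = idle_timeout
        self.last_seen: Dict[Hashable, float] = {}

    def touch(self, flow: Hashable, now: float) -> None:
        """Record a packet on this flow at time `now`."""
        self.last_seen[flow] = now

    def expire(self, now: float) -> List[Hashable]:
        """Drop and return every flow idle for longer than the timeout."""
        expired = [f for f, t in self.last_seen.items() if now - t > self.idle_timeout]
        for flow in expired:
            del self.last_seen[flow]
        return expired


flows = TimeoutTable(idle_timeout=300)
flows.touch("ssh-session", now=0)
flows.touch("web-request", now=250)
print(flows.expire(now=400))  # ['ssh-session']
```

Unlike LRU, a flow's fate here depends only on its own idle time, not on how busy everyone else is, which is what makes it predictable.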

Indeed. It's fairly common to mix up stateful firewalls with NAT. You can have a stateful firewall without NAT, but you can't have NAT without a firewall. It's actually the firewall that is keeping track of connections.

The big difference here, though, is carrier-grade NAT. That means the firewall is not under your control and might have a tiny state table. NAT is bad enough as it is, but CGN should never have happened. It's just depressing to think about, to be honest.

Even with IPv6, many ISPs are still doing it wrong. They give subscribers dynamic prefixes, which means having to use unique local addresses (ULAs) in addition to their Internet-routable addresses because the latter keep changing. This kind of stupidity makes people at home want to hang on to their IPv4 LANs because they seem more under their control.

If only I could get an ISP like Hurricane Electric to provide me with a DSL line at home for a reasonable price. Consumer-grade ones are all hopelessly bad.

> but you can't have NAT without a firewall

While it is true that most NAT arrangements are provided by firewalls, it is quite possible for a device to provide NAT with no other firewalling features at all, so not be considered a firewall. In this case the device would just be a router that provides NAT.

Some confuse NAT and firewalling because NAT effectively implements a default-deny-all-not-initiated-here rule in one direction, which is what most home users want in a firewall.

> Some confuse NAT and firewalling because NAT effectively implements a default-deny-all-not-initiated-here rule in one direction which is what most home users want in a firewall.

To make it even more confusing, what most people confuse with firewalling is actually NAPT, which is the specific type of NAT described in this thread. There are other types of NAT that don't require keeping track of state and that don't provide the default-deny-all-not-initiated-here side benefit.
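For example, a static one-to-one NAT between two equal-sized prefixes is just a fixed function, similar in spirit to iptables' NETMAP target. A Python sketch with hypothetical prefixes:

```python
import ipaddress

# Hypothetical prefixes of equal size: every inside host gets a fixed outside address.
INSIDE = ipaddress.ip_network("10.0.0.0/24")
OUTSIDE = ipaddress.ip_network("198.51.100.0/24")


def translate(addr: str, src_net: ipaddress.IPv4Network, dst_net: ipaddress.IPv4Network) -> str:
    """Map addr to the same host offset in dst_net; no table, no per-flow state."""
    if src_net.num_addresses != dst_net.num_addresses:
        raise ValueError("prefixes must be the same size")
    ip = ipaddress.ip_address(addr)
    if ip not in src_net:
        raise ValueError(f"{addr} is not in {src_net}")
    offset = int(ip) - int(src_net.network_address)
    return str(dst_net.network_address + offset)


print(translate("10.0.0.2", INSIDE, OUTSIDE))      # 198.51.100.2
print(translate("198.51.100.2", OUTSIDE, INSIDE))  # 10.0.0.2
```

Because the mapping needs no state, an unsolicited packet to 198.51.100.2 gets translated to 10.0.0.2 just like a reply would, so there is no default-deny side effect.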

> what most people are confusing with firewalling is actually NAPT

Yes. I should be clearer myself, as just referring to NAT this way could add to the confusion.

What most people just call NAT, what is offered by simple home/office routers (or APs when not in bridge mode or similar) and phones in tethered wireless mode, is actually NAPT (Network Address Port Translation), which is a subset of SNAT (Source Network Address Translation), which is in turn a subset of NAT.

Indeed. A misconfigured NAT setup can also result in some traffic being NAT'd correctly and other traffic not being NAT'd but still leaking out onto the wire (in either direction).

Beware when you're doing pure NAT, it doesn't always do what you think!

Isn't the firewall config you describe essentially just a software NAT?

No, not at all. NAT means Network (and port) Address Translation. If you don't change the contents of packets, it's not NAT.

> If you don't change the contents of packets, it's not NAT.

Oh well, yes. I agree.

I was thinking more of the behavior it causes: essentially establishing some sort of lookup table to check whether an incoming packet corresponds to a previous outgoing one, so that if the entry in that table gets deleted, incoming packets suddenly start being rejected.

AT&T's home gateways have a maximum NAT translation table of 1024^H^H^H^H8192 connections. Some websites will go past that. A torrent client almost certainly will. And, now that people are working from home, there's a good chance that having multiple computers will only make that 1024 table limit even more laughable.

EDIT: Okay, I'm wrong. It's 8192 connections, not 1024. But that's still ridiculously low.

Just as an FYI/aside, it is fairly trivial to root AT&T home gateways, pull the certs, and use your own hardware to authenticate to the network, removing their hardware from your stack entirely except for the ONT (goodbye, internet downtime due to random, uncontrolled gateway "upgrades"). You just need a router capable of 802.1X client auth.

Throughput in both directions actually gets really close to what I am paying for with this configuration, whereas before, with the default gateway (regardless of configuration), I was lucky to see half of the gigabit speeds I have been paying for.

I have such AT&T hardware also, but you and I have very different ideas about what's trivial.

I didn't know their box even had certs, or what "ONT" is. Is there like... a written series of steps I could follow?

If you are willing to move to Ubiquiti hardware (recommended, security breach from today notwithstanding) there's a relatively straightforward bypass method where the authentication packets are forwarded from the ONT to the AT&T box but it's otherwise out of the loop, and you have fully native routing with the Ubiquiti USG (a really nice router and ecosystem).

Instructions: https://medium.com/@mrtcve/at-t-gigabit-fiber-modem-bypass-u... Github project that makes it possible: https://github.com/jaysoffian/eap_proxy

It's definitely not plug and play, but I've been using this setup for a year and a half and I get my full 1 Gbps of bandwidth throughout my network with lots of hosts.

AT&T has started using a much newer gateway for new installations.

Damn, that's a serious bummer. I hope mine doesn't break anytime soon.

If you have the BGW210 gateway, there is a written series of steps for rooting it here: https://github.com/Archerious/bgw210-root

There is also a step-by-step configuration for a complete gateway bypass on MikroTik router hardware here: https://forum.mikrotik.com/viewtopic.php?t=154954

If you are stuck with the newer XG-PON hardware, it looks like you might be out of luck for now.

This is true for existing installs. But recently AT&T moved to XGPON gateways with an integrated ONT. You can no longer bypass these gateways. Also, to my knowledge, you can’t extract the certs from Pace gateways.

And these gateways use NAT even when in "bridged mode".

You can request to go into bridge mode, which will bypass the internal residential gateway (NAT).

If an ISP is NAT'ing everyone (which I've heard referred to as an "InterNAT Service Provider"), does "bridge mode" mean you get a real public IP? How does that work with everyone else still behind the NAT?

(I have an actual end-to-end-connectable public IP from my ISP, which from the general discussion seems like an increasingly rare thing --- they keep pestering me to "upgrade" to outrageously faster yet slightly cheaper plans with a "free router included", so I suspect they are trying to get me to give up that IP...)

There are two different topics here. One is carrier-grade NAT (CGNAT), which is used by ISPs that have run out of IPv4 addresses, so you don’t get a real public IPv4 address, although you should get a public IPv6 address. If you’re unlucky enough to be on one of these ISPs, there’s likely not much you can do.

The other issue is ISP-provided gateways that handle authentication onto the ISP network, like AT&T Fiber's. These devices contain the certificates/keys needed to gain access to the network. Unfortunately, these devices also try to be more than just an auth device/gateway. In AT&T’s case, the gateway also handles some U-verse/IPTV services, so it doesn’t have a true bridge mode that sends all traffic to another device. This approach then causes issues like update downtime or NAT table issues.

Neither of these issues should be caused simply by an ISP-provided router. If an ISP wants to implement either approach, it will do so without your approval.

The AT&T gateways do not have a true bridge mode. They still use NAT even if they look like they are just passing the connection on.

It's even more trivial with CenturyLink's fiber. You don't even need any certs.

1k certainly seems absurdly small considering how much RAM routers likely have, the fact that they can use most of it, and the amount of data needed for a single connection table entry (2 bytes for the external port, 2 bytes for the internal port, and 4 bytes for the internal IP add up to 8 bytes per entry; even being very generous at 16 bytes including overhead, that's still only 16K for 1024 entries), on a device that likely has several MB if not more, and whose primary function is likely NAT.
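
The arithmetic from that comment, spelled out in Python for both the 1024 and 8192 figures. Real implementations store more per entry (for example the remote address and port, protocol, and timer state), so treat these as lower bounds; even at 64 bytes per entry, 8192 entries come to 512 KiB:

```python
def table_bytes(entries: int, bytes_per_entry: int) -> int:
    """Memory for a translation table, ignoring any per-table overhead."""
    return entries * bytes_per_entry


# Fields from the comment: external port (2) + internal port (2) + internal IPv4 (4).
MINIMAL_ENTRY = 2 + 2 + 4  # 8 bytes
GENEROUS_ENTRY = 16        # the comment's allowance for overhead

for entries in (1024, 8192):
    print(entries, table_bytes(entries, MINIMAL_ENTRY), table_bytes(entries, GENEROUS_ENTRY))
# 1024 8192 16384
# 8192 65536 131072
```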

Some providers do this to force you to upgrade to business plans. Comcast Business, though, at least a while back, still had a limit too low for the office I worked at. We switched to AT&T business fiber and used our own gateway.

> Some websites will go past that.

Do you literally mean a website? Using a browser? What’s an example website that would go past that?

You can overwhelm a NAT in several ways.

UDP is connectionless, but UDP communication is typically bidirectional. This means a NAT needs to inspect UDP packets and retain a mapping to direct incoming UDP packets to the right place. With no connection information, this can only be done with an LRU cache or similar.

TCP is connection-oriented, and a NAT might rapidly free up resources when a connection is closed (i.e., when the final FIN has been ACKed). But if there's no FIN, the NAT is in the same situation as it is with UDP. Making a lot of connections without closing them fills up NAT tables.
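A deliberately simplified Python sketch of that early cleanup: the mapping can go on RST, or once both sides have sent a FIN and the last FIN has been ACKed. It ignores sequence and acknowledgement numbers, which a real connection tracker must check, and the class is hypothetical:

```python
from typing import Dict, Set


class TcpMapping:
    """Sketch: decide when a NAT may free a TCP mapping without waiting for a timeout."""

    def __init__(self) -> None:
        self.fin_seen: Dict[str, bool] = {"out": False, "in": False}
        self.last_fin_from = ""
        self.closed = False

    def on_packet(self, direction: str, flags: Set[str]) -> bool:
        """Process one packet ('out' or 'in'); return True once the mapping can go."""
        if "RST" in flags:
            self.closed = True
        elif all(self.fin_seen.values()) and "ACK" in flags and direction != self.last_fin_from:
            self.closed = True  # the ACK for the final FIN
        elif "FIN" in flags:
            self.fin_seen[direction] = True
            self.last_fin_from = direction
        return self.closed


teardown = [("out", {"FIN", "ACK"}), ("in", {"ACK"}), ("in", {"FIN", "ACK"}), ("out", {"ACK"})]
graceful = TcpMapping()
print([graceful.on_packet(d, f) for d, f in teardown])  # [False, False, False, True]

silent = TcpMapping()
silent.on_packet("out", {"ACK"})  # some data, then the peer just disappears
print(silent.closed)  # False: only an idle timeout or LRU eviction will free it
```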

When you have a home NAT and a carrier-grade NAT you may get an impedance mismatch of sorts. The CGNAT might have insufficient ports allocated to your service to keep up with your home NAT, resulting in timeouts or dropped mappings. Your home NAT will have one set of mappings and the CGNAT another, and the two sets probably won't be exactly the same. This means some portion of the mappings held in memory are useless.
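One way to picture the mismatch, as a Python sketch assuming a CGNAT that hands each subscriber a fixed block of ports on a shared public address (block size and first port are hypothetical). When the home NAT has more active flows than the block holds, the extra flows get no CGNAT mapping even though the home router still has entries for them:

```python
from typing import Dict, List, Optional


class PortBlockCgnat:
    """Sketch of a CGNAT giving each subscriber a fixed block of ports on one address."""

    def __init__(self, block_size: int, first_port: int = 1024) -> None:
        self.block_size = block_size
        self.first_port = first_port
        self.next_block_start = first_port
        self.free: Dict[str, List[int]] = {}  # subscriber -> unused ports in its block

    def allocate(self, subscriber: str) -> Optional[int]:
        """Return a public port for a new flow, or None if the block is used up."""
        if subscriber not in self.free:
            start = self.next_block_start
            if start + self.block_size > 65536:
                raise RuntimeError("no room for another subscriber on this address")
            self.free[subscriber] = list(range(start, start + self.block_size))
            self.next_block_start += self.block_size
        ports = self.free[subscriber]
        return ports.pop(0) if ports else None

    def release(self, subscriber: str, port: int) -> None:
        """Return a port to the subscriber's block when its mapping is dropped."""
        self.free[subscriber].append(port)

    def subscribers_per_address(self) -> int:
        return (65536 - self.first_port) // self.block_size


cgnat = PortBlockCgnat(block_size=20)
# The home NAT behind "home-A" tries to open 25 flows at once.
results = [cgnat.allocate("home-A") for _ in range(25)]
print(sum(port is None for port in results))  # 5 flows get no CGNAT mapping
print(cgnat.subscribers_per_address())        # 3225
```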

As a specific example, many years ago Google Maps would routinely trigger failures. Using Maps would load many tile images, which could overwhelm a NAT or CGNAT. The result was a map with holes in it where some tiles failed to load.

Browsers have long had limits on concurrent connections per domain. Total concurrent connection limits are also old news, but are not quite as old as per-domain limits. You probably can't make a NAT choke with just simple web requests (even AJAX) any more. You might be able to do it using, e.g., WebRTC APIs, though I would be surprised if those aren't also subject to limits.

I remember being able to overwhelm my first "home router" with the "Browse for servers" tab in Counter-Strike 1.6! It would fetch a list of all servers from Steam and then connect to each of them individually, eventually killing my router.

> Using Maps would load many tile images, which could overwhelm a NAT or CGNAT.

Just curious, were these image resources all hosted on the same domain?

No, and that's by design, as many browsers limit you to two HTTP connections per domain. When you're loading tens of images (like map tiles), you want to use as many different subdomains as possible to load them in parallel.

With HTTP/2, you're much better off using one connection to one domain instead.

How the world has changed.

I'm afraid I don't recall. I suspect that they could not have been, based on best practices for performance at the time and the fact that the problem existed at all. I did, however, find a reference to the problem:

https://meetings.apnic.net/32/pdf/Miyakawa-APNIC-KEYNOTE-IPv...

Slides 12-15 show the degradation of Maps in action. 20 connections per user is a heavily over-committed CGNAT, but that level of port sharing does happen.

But the "connections per user" limit is per-webserver. You'd have to have thousands of users simultaneously loading maps off the same google server just to run out of ports on one IP.

I bet you could put 10k people behind each IP and never even get close to an issue of this type.
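The reasoning above assumes the NAT only needs an external port to be unique per remote endpoint. A Python sketch of that kind of allocation (names hypothetical, linear scan for simplicity). This only holds for NATs whose mappings depend on the destination; with endpoint-independent mapping, which RFC 4787 asks NATs to implement, each external port belongs to one inside endpoint whatever the destination, and this reuse is not possible:

```python
from typing import Dict, Optional, Set, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class PerRemoteAllocator:
    """Sketch: external ports need only be unique per remote endpoint."""

    def __init__(self, first_port: int = 1024, last_port: int = 65535) -> None:
        self.first_port = first_port
        self.last_port = last_port
        self.in_use: Dict[Endpoint, Set[int]] = {}

    def allocate(self, remote: Endpoint) -> Optional[int]:
        """Return the lowest free external port for flows to `remote`."""
        used = self.in_use.setdefault(remote, set())
        for port in range(self.first_port, self.last_port + 1):
            if port not in used:
                used.add(port)
                return port
        return None  # ports to this particular server are exhausted


alloc = PerRemoteAllocator()
print(alloc.allocate(("203.0.113.80", 443)))  # 1024
print(alloc.allocate(("203.0.113.81", 443)))  # 1024 again: different server
print(alloc.allocate(("203.0.113.80", 443)))  # 1025
```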

Wouldn’t WebSockets be impacted by this limit?

Yes, but I've yet to see a website use more than 10 simultaneous WebSocket connections, let alone 1000.

Chromium has a limit of something like 256 WebSockets in total, and 30 per domain.

A malicious website could open up 256 websockets and as many HTTP connections as the browser allows, and that might be enough to swamp cheaper NATs.

See https://bugs.chromium.org/p/chromium/issues/detail?id=12066 for some 2009 discussion about people having troubles using the web when background tabs held connections open for polling. That wasn't a NAT issue, but it does highlight that a decade or two ago we all thought we only needed to manage tens of connections for a host to be online but that rapidly spiralled into hundreds.

I know it's not 2002 anymore, but I'm pretty sure no website on this planet would even come close to 1000 open connections unless it actively tried to, and even then I think browsers still limit the number of concurrent open connections, per tab and maybe in total.

I was also very surprised by that number, so I checked with tcpdump and Google Maps in a new browser instance: I count just 31 SYNs after zooming in, moving around, and clicking on a pub :?

I'm pretty sure this guy is not using anything from AT&T, though, as the chances seem really good he's in Denmark.

I just want to say thank you; this was a very concise explanation of a very complex concept.

I've been working with NATs for years and your comment helped me "click" and understand them at a different level.

Similarly, I found that OP's article provided an excellent primer on many concepts -- it certainly clarified the relationship between NAT and firewalls: that is, the latter being somewhat of an unintended consequence of the former.

Stumbling upon a great blog post that makes something click is always a pleasant experience.

Thanks! I guess I should keep blogging then :)

Me too. I’ve always wondered how a NAT knows where to route traffic. I figured it would use a lookup table, but I never knew what the “keys” were. For some reason, using different ports for each device behind the NAT never crossed my mind! I knew it couldn’t be done by adding routing data to the packets (which is what IPv6 ended up doing) because that isn’t sustainable over multiple NATs. Port-based routing with a table makes so much sense! It also explains why idle sessions are dropped.

> Without NAT the only 2 parties that need to know anything about a TCP connection are client and server.

Even without NAT there may be multiple devices between a client and server which need to know about the TCP connection. Stateful firewalls, WAN accelerators, and load balancers are some examples.

They should all be under your control, though. Once you hand off to your ISP, it should be nothing but routers all the way.

I wish they were all under my control.

As noted in this thread:

Set ServerAliveInterval in your ssh_config to avoid this (it's 0, i.e. disabled, by default).
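A rough Python check for picking a value, assuming a NAT that expires mappings after a fixed idle timeout (the 300 seconds below is a hypothetical figure, not a measured one). ServerAliveInterval is in seconds, and ssh sends a keepalive after that long without hearing from the server, so the interval needs to be shorter than the NAT's timeout. As the top comment says, this won't save you from a router that evicts by LRU under memory pressure:

```python
def keepalive_ok(keepalive_interval_s: int, nat_idle_timeout_s: int) -> bool:
    """Rough check that keepalives refresh a NAT entry before it times out.

    Assumes a NAT that drops an entry after nat_idle_timeout_s seconds of
    silence, and a client that sends a keepalive after keepalive_interval_s
    seconds of silence. An interval of 0 means keepalives are disabled.
    """
    if keepalive_interval_s <= 0:
        return False  # nothing ever refreshes an idle session
    # Strictly shorter, so the keepalive (plus its round trip) lands before expiry.
    return keepalive_interval_s < nat_idle_timeout_s


NAT_IDLE_TIMEOUT_S = 300  # hypothetical; real timeouts vary by device
for interval in (0, 60, 600):
    print(interval, keepalive_ok(interval, NAT_IDLE_TIMEOUT_S))
# 0 False
# 60 True
# 600 False
```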