by aednichols 1045 days ago
NAT is pretty computationally intensive; this is why e.g. ISPs & mobile carriers are pushing IPv6 over CGNAT.

AWS NAT gateway is $0.045 per hour plus $0.045 per GB. The hourly fee seems mostly okay - for largish users, one or two per region is fine.

$0.045 per GB is nuts. That’s $20.25/hour or $14580/mo for 1 Gbps. One can buy a cheap gadget using very little power that can NAT 1 Gbps at line rate for maybe $200 (being generous). One can buy a perfectly nice low power server that can NAT 10Gbps line rate for $1k with some compute to spare. One can operate one of these systems, complete with a rack and far more power than needed, plus the Internet connection, for a lot less money than $14580/mo. (Never mind that your $14580 doesn’t actually cover the egress fee on AWS.)
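For anyone who wants to check the arithmetic above, here is a quick sketch. The per-GB price is the figure quoted in the comment; the 720-hour month is my assumption.

```python
# Back-of-the-envelope check of the NAT gateway data-processing cost.
PRICE_PER_GB = 0.045      # USD per GB processed (figure quoted in the comment)
LINK_GBPS = 1.0           # sustained throughput in gigabits per second
HOURS_PER_MONTH = 720     # assumption: 30-day month


def nat_processing_cost_per_hour(gbps: float, price_per_gb: float) -> float:
    """Cost of pushing `gbps` through the gateway for one hour."""
    gigabytes_per_hour = gbps / 8 * 3600   # bits -> bytes, seconds -> hour
    return gigabytes_per_hour * price_per_gb


hourly = nat_processing_cost_per_hour(LINK_GBPS, PRICE_PER_GB)
monthly = hourly * HOURS_PER_MONTH
print(f"${hourly:.2f}/hour, ${monthly:,.0f}/month")
```

1 Gbps is 450 GB per hour, which is where the $20.25/hour and $14580/mo figures come from.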

A company with a couple full time employees could easily operate quite a few of these out of any normal datacenter, charge AWS-like fees, and make a killing, without breaking a sweat. But they wouldn’t get many clients because most datacenter customers already have a NAT-capable router and don’t need this service to begin with.

In other words, the OpEx associated with a service like this, including the sysadmin time, is simply not in the ballpark of what AWS charges.

Is that $0.045/GB for all data transferred through it, or just egress to the public internet? If it's the latter, that's half the price of normal EC2 instance egress to the public internet.

If it's the former... oh sweet jesus, what? Probably way cheaper to just run an a1.large or something with Linux on it, plus a very short shell script to set up NAT. That's assuming well more than half of the traffic going through it is ingress from the internet. If it's 50/50 ingress and egress, then it's basically the same pricing as NAT gateway.

No, it’s so much worse than that. Look closely at https://aws.amazon.com/vpc/pricing/ and note this line:

> You also incur standard AWS data transfer charges for all data transferred via the NAT gateway.

Yes, the $0.045/GB “data processing” charge is in addition to the usual $0.09/GB egress charge. You are paying an effective $0.135/GB for all of your egress, in addition to the $0.045/hr just to keep the NAT gateway running.

And yes, your ingress and even internal-to-AWS traffic is also billed at the $0.045/GB rate. (An example given on the aforementioned page is traffic from an EC2 instance to a same-region S3 bucket, which they note doesn’t generate an egress charge but does generate a NAT processing charge.) As far as I can tell, the only traffic which isn’t billed is traffic routed with internal VPC private IP addresses, which don’t hit the NAT gateway and thus aren’t counted.

There are highly paid AWS consultants who shave literal millions of dollars off of many companies’ AWS bills by just setting up a cheap EC2 box to handle their NAT instead of using the built-in solution. Doing that instantly wipes out the ingress charges and cuts the effective egress charge by a third (from $0.135/GB to $0.09/GB), and it’s probably a lower hourly cost than they’re already paying: an a1.large is $0.051/hr on-demand but that immediately drops to just $0.032/hr with a 1 year no upfront reserved plan. If you’re willing to pay upfront and/or sign a longer contract, you can get it as low as $0.019/hr.
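A rough sketch of that comparison, using only the prices quoted in this thread (not necessarily current AWS pricing). The function names and the 720-hour month are my own assumptions, and the model ignores cross-AZ transfer and any other charges.

```python
# Prices are the figures quoted in the thread, not a pricing reference.
NAT_GW_HOURLY = 0.045      # USD/hour for the managed NAT gateway
NAT_GW_PER_GB = 0.045      # USD/GB "data processing", charged in every direction
EGRESS_PER_GB = 0.09       # USD/GB standard internet egress
A1_LARGE_HOURLY = 0.051    # USD/hour on-demand, per the comment
HOURS_PER_MONTH = 720      # assumption: 30-day month


def nat_gateway_monthly(egress_gb: float, other_gb: float) -> float:
    """Managed gateway: hourly fee + processing on all traffic + egress."""
    processing = (egress_gb + other_gb) * NAT_GW_PER_GB
    return NAT_GW_HOURLY * HOURS_PER_MONTH + processing + egress_gb * EGRESS_PER_GB


def nat_instance_monthly(egress_gb: float, other_gb: float) -> float:
    """Self-managed instance: hourly fee + egress; other_gb is not billed."""
    return A1_LARGE_HOURLY * HOURS_PER_MONTH + egress_gb * EGRESS_PER_GB


# Hypothetical workload: 10 TB out, 10 TB of ingress and same-region traffic.
for name, fn in [("gateway", nat_gateway_monthly), ("instance", nat_instance_monthly)]:
    print(f"{name}: ${fn(10_000, 10_000):,.2f}/month")
```

The more ingress and same-region traffic you push through the NAT, the wider the gap gets, since the instance pays nothing for it in this model.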

It's quite unfortunate they sunsetted the NAT instance AMI.
Bit confused. Couldn't you just run a Linux VM to do your NAT and only pay normal egress?
Yes. And AWS do (sorta) offer a NAT AMI (Amazon Machine Image) if you want to do more management yourself and not get extorted for bandwidth.

https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Ins...

I say sorta because it's built on an old version of Amazon Linux and is headed towards EOL with no replacement except "go build your own" as you suggest.

https://www.lastweekinaws.com/blog/an-alternat-future-we-now...

AlterNAT uses managed NAT Gateways as a fallback when the NAT Instance is out of service, but again you will have to make your own NAT AMI.

This is not to excuse AWS' frankly absurd NATGW pricing, but to point out other ways around it.

You don’t actually need to use the AMI. Here’s an example of a NAT instance we build from scratch:

https://github.com/somleng/somleng-project/blob/main/infrast...

Thanks! That is exactly what I wanted to know.
Another thing: EC2 instances (VMs) have a "Source/Destination IP check" which makes them ignore any packets not intended for them. If you want an instance to do NAT, you need to turn this off.
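To make the moving parts concrete, here is a minimal sketch that only prints the commands a do-it-yourself NAT instance typically needs; it does not run anything. The instance ID and interface name are placeholders I made up, and the helper name is hypothetical. Check the interface name on your own AMI.

```python
import shlex


def nat_instance_commands(instance_id: str, out_iface: str = "eth0") -> list[list[str]]:
    """Return the commands for a basic NAT instance (nothing is executed)."""
    return [
        # AWS side: stop EC2 from dropping packets not addressed to the instance.
        ["aws", "ec2", "modify-instance-attribute",
         "--instance-id", instance_id, "--no-source-dest-check"],
        # Host side: let the kernel forward packets between interfaces.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Host side: rewrite the source address of forwarded traffic.
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-o", out_iface, "-j", "MASQUERADE"],
    ]


# "i-0123456789abcdef0" is a placeholder instance ID.
for cmd in nat_instance_commands("i-0123456789abcdef0"):
    print(shlex.join(cmd))
```

The private subnets' route tables also need a default route pointing at the instance, which this sketch leaves out.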
I've also got an open source terraform module for this:

https://github.com/tedivm/terraform-aws-nat

Weird, I was just looking into this yesterday and found https://fck-nat.dev/
> just run a Linux VM

+ Run extra for failover, HA, etc.
+ Manage security
+ Monitor performance
+ ...

You would have to run that in your own data center, which is what the original poster was comparing to.
You also have to do it in AWS if you don't want to use the NAT Gateway service and still desire reliability over and above the MTBF for an EC2 instance or AZ, or ever want to do anything requiring a reboot.
For example, rather than simply routing IP packets and then forgetting them, you need to statefully inspect every TCP segment and every supposedly connectionless UDP conversation, you need to maintain state for every live conversation, and you need to mitigate DOS with all those resources.

At that point, you might as well be running a Layer 7 Firewall or an Intrusion Protection System.
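The per-conversation state described above can be sketched as a toy translation table. This is an illustration with made-up names, not how any real NAT is implemented: it has one public IP, no timeouts, no port-exhaustion handling, and no DoS mitigation.

```python
import itertools


class NatTable:
    """Toy port-translating NAT: one public IP, per-flow state in two dicts."""

    def __init__(self, public_ip: str, first_port: int = 1024):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)  # never wraps: toy only
        self._out = {}   # (proto, private_ip, private_port) -> public_port
        self._in = {}    # (proto, public_port) -> (private_ip, private_port)

    def outbound(self, proto: str, src_ip: str, src_port: int):
        """Translate an outgoing packet, creating state on first sight."""
        key = (proto, src_ip, src_port)
        if key not in self._out:
            port = next(self._next_port)
            self._out[key] = port
            self._in[(proto, port)] = (src_ip, src_port)
        return self.public_ip, self._out[key]

    def inbound(self, proto: str, dst_port: int):
        """Translate a reply; None means no prior outbound flow, so drop."""
        return self._in.get((proto, dst_port))


nat = NatTable("203.0.113.1")            # documentation-range address
print(nat.outbound("udp", "10.0.0.5", 5353))
print(nat.inbound("udp", 1024))
print(nat.inbound("udp", 9999))
```

Every new flow costs two dictionary entries that must live until the flow ends, which is the state a plain router never has to keep.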

> At that point, you might as well be running a Layer 7 Firewall or an Intrusion Protection System.

If you go down this path, consider using Transit Gateway so you can route traffic from multiple VPCs to a central security VPC in a region. I’ve done this with a Palo Alto VM and it seems to work well.

UDP is connectionless precisely so you can build novel stateful protocols on it. There’s no promise in UDP that you’ll be able to statelessly monitor it.
UDP is actually more expensive to NAT than TCP is. The reason is UDP fragmentation, which is my vote for the worst, and least forgivable, design error of TCP/IP.

Instead of putting the fragmentation in L4 (like QUIC now does) and including a UDP header on every fragmented packet in a datagram, UDP only includes the header on the first packet. When fragmentation happens, firewalls, NATs, and end-hosts have to buffer and coalesce IP packets based on IP IDs before the destination can be identified. It's a real nuisance. A lot of CGNAT "stateless" implementations can't handle this, and you get very hard-to-debug issues when there are fragmentation and MTU mismatches.
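A small sketch of why this hurts a NAT. It assumes IPv4 with a 20-byte header and no options, and the helper names are made up; it only models the fragment payloads, not full packets.

```python
import struct

IP_HEADER_LEN = 20   # assumption: IPv4 header without options


def fragment_udp(src_port: int, dst_port: int, payload: bytes, mtu: int = 1500):
    """Split one UDP datagram into IPv4 fragment payloads.

    Returns a list of (offset_in_8_byte_units, more_fragments, data).
    The 8-byte UDP header is part of the data and lands in fragment 0 only.
    """
    length = 8 + len(payload)
    # Checksum 0 means "not computed" for UDP over IPv4.
    udp = struct.pack("!HHHH", src_port, dst_port, length, 0) + payload
    # Every fragment except the last must carry a multiple of 8 bytes.
    chunk = (mtu - IP_HEADER_LEN) // 8 * 8
    pieces = [udp[i:i + chunk] for i in range(0, len(udp), chunk)]
    return [(i * chunk // 8, i < len(pieces) - 1, p) for i, p in enumerate(pieces)]


def ports_of(fragment):
    """A NAT can read the ports only from the fragment at offset 0."""
    offset, _more, data = fragment
    if offset != 0:
        return None   # no UDP header here; must be matched via the IP ID
    return struct.unpack("!HH", data[:4])


frags = fragment_udp(53, 40000, b"x" * 3000)
for frag in frags:
    print(frag[0], frag[1], len(frag[2]), ports_of(frag))
```

Only the first fragment yields a port pair. For the rest, a translator has to remember which (source, destination, protocol, IP ID) it saw earlier, or buffer until the first fragment shows up.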

This is probably more accurately called IP fragmentation (since that is the layer where the fragmentation happens), and a lot of companies make it optional to support in networking gear. I'm surprised that you are using it or seeing it, because it is essentially obsolete today.

It has a legitimate purpose in old-timey systems which have bespoke MTUs on each link, but now the usual thing is to use 1500 bytes for WAN traffic, which is the generic Ethernet MTU, and reserve larger sizes for intra-datacenter communications.

There's a number of UDP protocols that have large enough payloads to fragment. DNSSEC and EDNS0 in particular made it much more common, though the EDNS0 flag day in 2020 partially undid some of the damage by getting folks to ratchet down their EDNS0 buffer sizes.

1500 is absolutely not a pervasively usable WAN MTU, you're going to need pMTUd if you're sending 1500 byte packets broadly. Plenty of WAN links won't tolerate it. If you don't want to deal with fragmentation at all ... 500 is the minimum guaranteed MTU, but in practice it's exceptionally rare to see anything below about 1200 require fragmentation. But you can always only control what you send, not what others are sending you.

One thing I've learned since joining Fly.io in 2020 is to laugh when people point to the 1500 MTU. You absolutely can't count on that: IPv6 cuts into it, and so does every additional layer of encapsulation on your path.
Yeah, you have to account for the headers in the 1500 byte MTU, which I suppose can be substantial if you have several VLAN tags, IPSec, IPv6, and a bunch of IP options. Presumably most of that encapsulation happens inside a datacenter, though, where you can use jumbo frames.
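To put rough numbers on how headers eat into 1500 bytes, here is a sketch using the standard minimum header sizes (no IP options, no IPv6 extension headers, no TCP options). The layer stacks are just examples I picked, and the function name is hypothetical.

```python
HEADER_BYTES = {
    "ipv4": 20,       # without options
    "ipv6": 40,       # fixed header, without extension headers
    "udp": 8,
    "tcp": 20,        # without options
    "vxlan": 8,
    "ethernet": 14,   # inner frame header carried inside a VXLAN tunnel
}


def payload_left(link_mtu: int, *layers: str) -> int:
    """Bytes left for application data after the listed headers."""
    return link_mtu - sum(HEADER_BYTES[name] for name in layers)


print(payload_left(1500, "ipv4", "tcp"))
print(payload_left(1500, "ipv6", "tcp"))
# TCP over IPv4, tunnelled in VXLAN over IPv4:
print(payload_left(1500, "ipv4", "udp", "vxlan", "ethernet", "ipv4", "tcp"))
```

Each extra layer of encapsulation on the path subtracts again, so a sender that assumes 1500 bytes end to end will hit fragmentation or drops somewhere.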
With IPv6 only the endpoint can fragment, not any hop in between.
Even well-behaved unfragmented UDP should be more expensive to NAT because it doesn't have an end-of-stream "FIN" marker, meaning stateful middleboxes need to retain state for longer because they can only time out.
Timeouts on UDP are usually much shorter than TCP, so it's not as bad as it sounds.
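A sketch of the trade-off in the two comments above. The timeout values are illustrative assumptions, not taken from any particular implementation, and the TCP handling is heavily simplified (a real tracker follows both FINs, RSTs, and TIME_WAIT).

```python
IDLE_TIMEOUT = {"udp": 30.0, "tcp": 7200.0}   # seconds; illustrative assumptions


class FlowTable:
    """Toy flow tracker: TCP can end explicitly, UDP can only time out."""

    def __init__(self):
        self._last_seen = {}   # (proto, flow_id) -> timestamp of last packet

    def packet(self, proto: str, flow_id: str, now: float, fin: bool = False):
        key = (proto, flow_id)
        if proto == "tcp" and fin:
            self._last_seen.pop(key, None)   # end-of-stream seen: free state now
        else:
            self._last_seen[key] = now

    def expire(self, now: float) -> int:
        """Drop idle flows and return how many are still tracked."""
        stale = [k for k, t in self._last_seen.items()
                 if now - t > IDLE_TIMEOUT[k[0]]]
        for k in stale:
            del self._last_seen[k]
        return len(self._last_seen)


table = FlowTable()
table.packet("udp", "a", now=0)
table.packet("tcp", "b", now=0)
table.packet("tcp", "c", now=0)
table.packet("tcp", "c", now=1, fin=True)   # closed cleanly, state gone at once
print(table.expire(now=10), table.expire(now=31), table.expire(now=8000))
```

The UDP entry lingers until its idle timer fires, but a short timer keeps that cost bounded. A TCP flow that never sends a FIN sits in the table far longer.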
But TCP fragments in the same way?
TCP does not use IP fragmentation, and the IP packets are marked "Don't fragment". TCP performs its own fragmentation and every packet gets a TCP header in its leading section. A NAT, Firewall, or end-host can L4 route the TCP packet as-is and does not need to correlate with other packets.

Edited to extend: this is why TCP has a "Maximum Segment Size", and why Path MTU Discovery information has to be passed into the TCP state machine. It is TCP that takes responsibility for carving up the data into the packets, not IP.

One of the goals of UDP was to avoid needing this kind of state, which is why the IP layer handles fragmentation for it instead. This is allowed on a hop-by-hop basis, unless the DF bit is set; so when a "too big" packet gets to a node with a smaller MTU, it can just split it and send on the fragments. No PMTUD needed.

The design could have been for the fragmenting node to also add a UDP header as part of that process, but was not. It would have been a simple change at the time. It's had a lot of consequences since and is responsible for a decent amount of complexity in hardware and software packet pipelines.

It could not have copied the UDP header. Otherwise you wouldn't be able to put any new protocol on IP without teaching it to every router.
It's been a while since I've thought about this; thanks for the refresher.
Which is why game networking libraries put a lot of emphasis on NAT traversal, forcing NATs to recognise the "connection". And why game console manufacturers tell users to just forward all incoming traffic unmanaged by the NAT to the console.
> ISPs & mobile carriers are pushing IPv6 over CGNAT

LOL. Not Metronet. They are doubling down on CGNAT. They've acquired ISPs with IPv6 and killed it in favor of CGNAT.

This mostly misses the point. My own sites have supported IPv6 for going on a decade because it was fun to get it working, but that's a very different thing than supporting only IPv6.
It's best for an ISP to deploy IPv6 and CGNATv4 in parallel, so the NAT only needs to handle traffic for services that don't support IPv6 (e.g. news.ycombinator.com)
It's not really computationally expensive, it's memory expensive. You need per connection state.
it already has stateful firewall

so that's: source ip, dest ip, protocol, source port, dest port, connection state (say 16 bytes total)

doing NAT too is what, 3 more bytes per connection (8 bits for an offset into an IP table and 16 bits for the translated port)
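Putting numbers on that estimate: the field layout below follows the comment (IPv4 only), packed without padding. Real connection tables also carry hash-table overhead, timers, and counters, so treat this as a lower bound. The names are my own.

```python
import struct

# Field sizes follow the comment above; "!" means no alignment padding.
FIREWALL_ENTRY = struct.Struct("!4s4sBHHB")  # src ip, dst ip, proto, sport, dport, state
NAT_EXTRA = struct.Struct("!BH")             # index into public-IP table, translated port


def table_bytes(connections: int, with_nat: bool) -> int:
    """Raw bytes of per-connection state, ignoring container overhead."""
    per_entry = FIREWALL_ENTRY.size + (NAT_EXTRA.size if with_nat else 0)
    return connections * per_entry


for with_nat in (False, True):
    mb = table_bytes(1_000_000, with_nat) / 1e6
    print(f"with_nat={with_nat}: {mb:.0f} MB per million connections")
```

The firewall tuple packs to 14 bytes (the comment rounds up to 16) and port translation adds 3, so under these assumptions the extra NAT state is a small fraction of what the stateful firewall already holds.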

NAT and Stateful firewalling are commonly bundled together (especially on home systems) but I would not go so far as to say “NAT has a stateful firewall”-

I hear such takes all the time and it's really frustrating; usually in threads regarding IPv6, incidentally it is usually programmers who think they understand everything about networks because they know how tcp operates.

> but I would not go so far as to say “NAT has a stateful firewall”-

> I hear such takes all the time and it's really frustrating

maybe you'd be less frustrated if you understood what people were saying, because I didn't say that

AWS already do 1:1 NAT and there's additionally a stateful firewall, which necessitates connection state tracking

adding the extra few bytes to do port translation shouldn't vastly increase the memory required

> incidentally it is usually programmers who think they understand everything about networks because they know how tcp operates.

from someone who has written a commercial packet filter: in terms of complexity, TCP blows the preceding layers of the stack out of the water

In almost all NAT implementations, public-side ports are dynamically assigned, which implies that inbound connections aren't possible (unless port forwarding is explicitly configured).

Is that really conceptually so different from a stateful firewall allowing inbound packets only for established connections/flows?

"NATs are good because otherwise people wouldn't have any firewalls" is a tired take, yes, but I don't see the point being needlessly pedantic about the semantics of NAT vs. stateful firewalls when in this case, the effect is the same: No inbound packets without prior outbound packets (or a connection establishment for TCP).

Generally an ISP does not have a stateful firewall prior to deploying CGNAT.