
	by geggam 3055 days ago
	My question is this: why does the container world use NAT (3 layers to get out of a container to the base host in k8s) and not use routing? Is it just that the container devs don't know routing?

5 comments

puzzle 3055 days ago

Kubernetes is the opposite. NAT is explicitly not required:

https://kubernetes.io/docs/concepts/cluster-administration/n...

E.g. on AWS you might have all of a node's pod IPs on a bridge interface, then you talk to pods on other nodes thanks to VPC route table entries that the AWS cloud provider manages. NAT happens only when talking to the outside world or for traffic to Amazon DNS servers, which don't like source IP addresses other than those from the subnet they live in.

falcolas 3055 days ago

My memory is a touch fuzzy, but to route traffic out of a container in AWS, you have to either NAT through the instance's network adapter, or attach an ENI to the container. However, you only get one ENI per vCPU in a VM (at least until Amazon finishes its custom NICs). What I'm really fuzzy on is whether the instance itself consumes one of those allocated ENIs.

That is, if you're running off an m4.2xlarge instance, you get a maximum of 8 ENIs - 8 containers if you want to use only VPC routing. For some services, this may be OK, but for many others (most?), it's far too few.

puzzle 3055 days ago

What's the destination? If it's the outside world, yes, you need NAT for state tracking and address rewriting, since the rest of the AWS infrastructure knows nothing about the pod CIDR (I guess you could set up a subnet for it and run a GW there).

For pod to pod, if you're OK with the limitations of 50 routes per VPC route table (you can open a ticket to bump that to 100, at the cost of some unspecified performance penalty), then you don't need NAT.
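A minimal sketch of the routed pod-to-pod setup described above, assuming each node is given one /24 out of a cluster-wide pod CIDR and the route table holds one entry per node. The CIDR `10.244.0.0/16`, the node names, and the function `build_pod_routes` are hypothetical; only the 50-route figure comes from the comment, and a real route table also carries a few non-pod routes that count toward the limit.

```python
import ipaddress

ROUTE_LIMIT = 50  # per-route-table limit quoted in the comment above


def build_pod_routes(cluster_cidr, nodes, prefix=24):
    """Return {pod subnet: next-hop node}, one routed subnet per node.

    With entries like these in the route table, pods on different nodes
    can reach each other with their own addresses, so no NAT is needed.
    """
    subnets = ipaddress.ip_network(cluster_cidr).subnets(new_prefix=prefix)
    routes = {str(subnet): node for node, subnet in zip(nodes, subnets)}
    if len(routes) < len(nodes):
        raise ValueError("cluster CIDR is too small for this many nodes")
    if len(routes) > ROUTE_LIMIT:
        raise ValueError(f"{len(routes)} routes exceeds the limit of {ROUTE_LIMIT}")
    return routes


# Hypothetical two-node cluster.
for destination, next_hop in build_pod_routes("10.244.0.0/16", ["node-a", "node-b"]).items():
    print(destination, "->", next_hop)
```

The route limit is what caps cluster size in this model: one route per node means roughly 50 nodes per route table.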

Otherwise, you can use something like Lyft's plugin, which does roughly what you describe. On an m4.2xlarge you only get 4 ENIs, but each of them can have 15 IPv4 and 15 IPv6 addresses, which the plugin manages. They assign the default ENI to the control plane (Kubelet and DaemonSets), so you should get 3 x 15 = 45 pods.
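The pod count above is simple arithmetic; here is a small sketch of it, taking the comment's figures (4 ENIs, 15 IPv4 addresses per ENI, one ENI reserved for the control plane) at face value. The function name `max_pods` is made up for illustration.

```python
def max_pods(enis, ips_per_eni, reserved_enis=1):
    """Pod addresses left once some ENIs are set aside for the control plane."""
    if reserved_enis > enis:
        raise ValueError("cannot reserve more ENIs than the instance has")
    return (enis - reserved_enis) * ips_per_eni


# m4.2xlarge figures from the comment: 4 ENIs, 15 IPv4 addresses each.
print(max_pods(enis=4, ips_per_eni=15))  # 45
```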

user5994461 3055 days ago

AWS instances can do IP routing just fine. There is a flag to set when the instance is created or else it drops all traffic not from its own IP.
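The flag in question is the EC2 source/destination check. Below is a toy model of what it does, not AWS code: with the check on, the instance only handles packets where it is itself the source or the destination, which is what stops it from forwarding pod traffic. The addresses and the function name are hypothetical.

```python
def passes_source_dest_check(src, dst, instance_ips, check_enabled=True):
    """Model of the check: allow a packet only if the instance is an endpoint."""
    if not check_enabled:
        return True  # the instance may forward traffic on behalf of others
    return src in instance_ips or dst in instance_ips


node_ips = {"172.31.0.10"}  # hypothetical node address
pod_packet = ("10.244.1.5", "10.244.2.9")  # pod-to-pod traffic routed via the node

print(passes_source_dest_check(*pod_packet, node_ips))                       # False
print(passes_source_dest_check(*pod_packet, node_ips, check_enabled=False))  # True
```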

raesene9 3055 days ago

In my experience NAT is almost always involved in a Kubernetes setup (for on-prem).

The container network is generally not routable to the wider corporate WAN (it'll use RFC1918 addresses by default). You typically get one set of addresses for the main container network, a different set of addresses for the service IPs and then a routable set on the ingress.

zaat 3055 days ago

What you describe is not NAT. The container network segment is a separate segment that is not accessible from outside the cluster, neither directly nor through address translation. The ingress and service addresses are externally reachable addresses that expose services. NAT is not required for the setup.

raesene9 3052 days ago

If traffic flows from the pod network to an external network, NAT is involved, as the pod network is not routable.
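The distinction being argued here can be written as a rough rule: traffic to a destination that knows a route back to the pod network can keep the pod's address, and everything else gets its source rewritten on the way out. A sketch, assuming a hypothetical pod CIDR of `10.244.0.0/16`; `needs_snat` is an illustrative name, not a Kubernetes API.

```python
import ipaddress


def needs_snat(dst_ip, routable_cidrs):
    """True when a packet leaving a pod has to be source-NATed.

    Destinations inside a CIDR that can route replies back to the pod
    network are reached directly; any other destination would have no
    return path, so the node rewrites the source address.
    """
    dst = ipaddress.ip_address(dst_ip)
    return not any(dst in ipaddress.ip_network(cidr) for cidr in routable_cidrs)


POD_CIDR = "10.244.0.0/16"  # hypothetical

print(needs_snat("10.244.3.7", [POD_CIDR]))    # False: pod to pod
print(needs_snat("203.0.113.10", [POD_CIDR]))  # True: pod to the outside world
```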

puzzle 3055 days ago

I can see how it's more likely on prem, but at my job, we run Kubernetes in production on AWS and most traffic is pod to pod, without NAT involved.

geggam 3055 days ago

This is an interesting article about k8s

https://medium.com/google-cloud/kubernetes-from-load-balance...

puzzle 3055 days ago

That's inbound traffic coming from the outside world. You need NAT because the load balancer only knows about nodes, not individual pods (perhaps you can pull it off with e.g. ELBv2, but definitely not with v1).

There's more iptables magic if you talk to a service's virtual cluster IP, because of the load balancing, but from pod to pod, which is what I thought you were referring to, NAT is usually not involved.

geggam 3055 days ago

No point in having a service you can't use :)

puzzle 3055 days ago

Are you referring to the service cluster IPs? Those are great for short lived or low volume connections. If you want to balance load over long lived connections or have high volume, you really want to know the addresses of all your backends, whether that's done in your code or in a sidecar like Istio's.
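A toy illustration of that point, assuming three hypothetical pod IPs: a connection to the virtual cluster IP is assigned a backend once, at connect time, so every request on a long-lived connection lands on the same pod. A client that knows all the backends can spread requests itself (plain round robin here, which is only one possible policy).

```python
import itertools
from collections import Counter

BACKENDS = ["10.244.1.5", "10.244.2.9", "10.244.3.4"]  # hypothetical pod IPs


def via_cluster_ip(requests, chosen_backend):
    """One long-lived connection: every request goes to the backend picked at connect time."""
    return Counter({chosen_backend: requests})


def via_known_backends(requests, backends):
    """Client-side round robin across all known backend addresses."""
    rotation = itertools.cycle(backends)
    return Counter(next(rotation) for _ in range(requests))


print(via_cluster_ip(9, BACKENDS[0]))    # all 9 requests on one pod
print(via_known_backends(9, BACKENDS))   # 3 requests on each pod
```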

lima 3055 days ago

Look into Project Calico, they get it right: https://www.projectcalico.org/

falcolas 3055 days ago

A lot of it is due to an effort to make it work in as many environments with as few external dependencies (and environment control) as possible. The "simplest solution which could possibly work".

Personally, I'd rather just bring on IPv6. But, in my case, we don't have enough people who understand IPv6 (and it's barely supported in AWS) to use it ourselves.

takeda 3055 days ago

Because that's the easiest thing to do when you don't know anything about networking. Ironically this also makes everything else much more complex and failure prone.

geggam 3055 days ago

This is the answer surprisingly missing in the industry overall. It amazes me that I work with highly educated folks who cannot grasp some of the fundamental issues with k8s and the container ecosystem.

notzorbo3 3055 days ago

Because NATting encapsulates while routing doesn't? And encapsulation is the whole idea behind containers. Until everything is ready for IPv6 (lol, yeah right), NATting seems the only way to me.

lima 3055 days ago

You can encapsulate just fine with a routed architecture.

You still need NAT to talk to the outside world (your services are behind a load balancer either way).

geggam 3055 days ago

Why do you need NAT?