|
|
|
|
|
by tedreed
3621 days ago
|
|
Anyone working on K8s at Box or I guess anywhere else that has deployed it partially feel free to answer this, but: How do you handle gatewaying traffic into Kubernetes from non-K8s services? I've been trying to get a basic cluster out the door with one of our most stateless services, but I'm having a having a hard time just getting the traffic into it. The mechanism I'm using is having a dedicated K8s nodes that don't run pods hold onto a floating IP to act as gateway routers into k8s. They run kube-proxy and flannel so they can get to the rest of things, but ksoftirqd processes are maxing CPU cores on relatively recent CPUs trying to handle about 2Gbps of traffic (2Mpps) which is a bit below the traffic level the non-k8s version of the service is handling. netfilter runs in softirq context, so I figure that's where the problem is. Are you using Calico+BGP to get routes out to the other hosts? What about kube-proxy? |
|
Our network setup is constantly evolving due to a number of internal networking limitations related to nearly static ip-addressing and network acls. I'll describe our current setup and then describe where we'd like to go. The big piece of context is that we already have a number of services already being managed via puppet and a smaller number of new and transitioned services in Kubernetes so we need to allow interop though a number of different mechanisms.
We are currently using Flannel for ip-per-pod addressability within our cluster. No services are communicating inside the cluster so they aren't using kube-proxy yet. For services outside the cluster talking into the cluster we are using a heavily modified (https://github.com/kubernetes/contrib/tree/master/service-lo...) which we have contributed back yet. It supports SNI and virtual hosts. And we get HA and throughput for the individual loadbalancers by using anycast.
We have a number of internal services outside the cluster slowly moving to SmartStack. So I assume we will be figuring out interop with that and running it as a sidecar at some point. We would like to move to calico as we have some fairly high throughput services running outside of the cluster which we need to avoid bottlenecking on a loadbalancer for. We have separate project running internally to move our network acls from network routers to every host via Calico.
Hope that is more helpful than confusing.