by 79d697i6fdif 3453 days ago
It's mostly because the point of DPDK and similar frameworks is to bypass a lot of the kernel's packet processing, and IPVS does exactly this. I'm surprised IPVS isn't more popular; it's built into the kernel and extremely fast.

HTTP-proxy-type load balancers are slugs in comparison.

Scaling app servers to nearly unlimited size is easy to explain but really hard in practice. It basically amounts to this:

1) Balance requests using DNS anycast so you can spread load before it hits your servers

2) Set up "Head End" machines with pipes as large as possible (40Gbps?) and load balance at the lowest layer you can. Balance at the IP level using IPVS and direct server return. A single reasonable machine can handle a 40Gbps pipe. I guess you could set up a bunch of these, but I doubt many people are over 40Gbps. Oh, and don't use cloud services for these. The virtualization overhead on the network plane is high, and even with SR-IOV you don't get access to all the hardware NIC queues. Also, I don't know of any cloud provider that's compatible with direct server return, since they typically virtualize your "private cloud" at layer 3, whereas IPVS actually touches layer 2 a little. Do yourself a favor and get a few colos for your load balancers.

3) Set up a ton of HTTP-proxy-type load balancers. This includes Nginx, Varnish, HAProxy, etc. One of these machines can probably handle 1-5 Gbps of traffic, so expect 20 or so behind each layer 3 balancer. These NEED to be hardened substantially, because most attacks will be at layer 4 and up once an adversary realizes they can't just flood you out (due to the powerful IPVS balancers above). SYN cookies are extremely important here since you're dealing with TCP; just try to set everything up to avoid storing TCP state at all costs. This also means no NAT. You might want to keep these in the colo with your L3 load balancers.

4) Now for your app servers. Depending on whether you're using a dog-slow language, you'll want between 3 and 300 app servers behind each HTTP proxy. You don't really need to harden these as much, since the traffic is lower and any traffic that reaches them is clean HTTP. Go ahead and throw these on the cloud if you want.
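
The SYN cookies recommended in step 3 are what let a proxy survive a SYN flood without storing per-connection state: the server's initial sequence number encodes everything it needs, and it only allocates state once the client's ACK echoes a valid cookie back. Below is a minimal Python sketch of the classic layout described by D. J. Bernstein (5 bits of a 64-second counter, 3 bits of MSS index, 24 bits of keyed hash). The secret, the MSS table, and the function names are assumptions for illustration; Linux's in-kernel implementation (enabled via the `net.ipv4.tcp_syncookies` sysctl) differs in detail.

```python
import hashlib
import hmac
import time

SECRET = b"hypothetical-secret"  # assumption: random and rotated in a real deployment
# Hypothetical MSS buckets; the 3-bit index allows at most 8 entries.
MSS_TABLE = [536, 1220, 1440, 1460, 4312, 8960]


def _counter(now):
    """64-second time counter t."""
    return int((time.time() if now is None else now) // 64)


def _hash24(src, sport, dst, dport, t):
    """Keyed 24-bit hash of the connection 4-tuple and the counter."""
    msg = f"{src}:{sport}>{dst}:{dport}@{t}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(src, sport, dst, dport, client_mss, now=None):
    """Build the 32-bit initial sequence number sent in the SYN-ACK (no state kept)."""
    t = _counter(now)
    # Largest table entry not above the client's MSS; smallest entry as fallback.
    idx = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    return ((t % 32) << 27) | (idx << 24) | _hash24(src, sport, dst, dport, t)


def check_cookie(src, sport, dst, dport, cookie, now=None, max_age=1):
    """Validate (ack_number - 1) from the client's ACK; return the MSS or None."""
    t_now = _counter(now)
    idx = (cookie >> 24) & 0x7
    if idx >= len(MSS_TABLE):
        return None
    # Accept the current counter value and up to max_age previous ones.
    for t in range(t_now, t_now - max_age - 1, -1):
        if t % 32 == cookie >> 27 and _hash24(src, sport, dst, dport, t) == cookie & 0xFFFFFF:
            return MSS_TABLE[idx]
    return None


if __name__ == "__main__":
    now = 1_000_000.0
    c = make_cookie("198.51.100.7", 51515, "203.0.113.10", 80, 1460, now=now)
    print(check_cookie("198.51.100.7", 51515, "203.0.113.10", 80, c, now=now + 30))   # 1460
    print(check_cookie("198.51.100.7", 51515, "203.0.113.10", 80, c, now=now + 600))  # None
```

The only thing the server remembers between SYN-ACK and ACK is the secret, which is exactly the "avoid storing TCP state" property the post asks for; the trade-off is that the MSS can only be one of a few coarse buckets.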


> "I'm surprised IPVS isn't more popular, it's built into the kernel and extremely fast."

I feel it actually is popular at places that do tens of gigs of traffic and up, usually in combination with a routing daemon (BIRD, Quagga, etc.). I have worked at a couple of shops now that used a similar architecture. I also read recently about a Google LB that leveraged IPVS, and now this, of course.

I saw IPVS was implemented in the kernel but didn't realize it bypassed the stack. Thanks for clarifying.

Not all of it. IPVS runs between layers 2 and 3 when using direct server return. It does bypass quite a bit, though.
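
To make the layer 2 point concrete: in direct-routing mode the director leaves the IP header (destination = the virtual IP) alone and only rewrites the destination MAC address to that of the chosen real server, which has the VIP configured on a non-ARPing interface and replies straight to the client. That is also why director and real servers must share an L2 segment, which L3-virtualized cloud networks typically don't give you. The Python below is a toy model of that forwarding decision, not IPVS code; the addresses, the `Frame` type, and the SHA-256-based source hashing (standing in for IPVS's `sh` scheduler) are all assumptions.

```python
import hashlib
from dataclasses import dataclass, replace

VIP = "203.0.113.10"  # hypothetical virtual IP owned by the director

# Hypothetical real servers on the same L2 segment, identified by MAC address.
REAL_SERVER_MACS = ["02:00:00:00:00:01", "02:00:00:00:00:02", "02:00:00:00:00:03"]


@dataclass(frozen=True)
class Frame:
    """Only the fields the forwarding decision cares about."""
    dst_mac: str
    src_ip: str
    dst_ip: str


def pick_real_server(src_ip: str) -> str:
    """Source hashing: a given client IP always lands on the same real server."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return REAL_SERVER_MACS[int.from_bytes(digest[:4], "big") % len(REAL_SERVER_MACS)]


def direct_route(frame: Frame) -> Frame:
    """Forward a frame for the VIP by rewriting only its destination MAC."""
    if frame.dst_ip != VIP:
        raise ValueError("frame is not addressed to the virtual service")
    # The IP header (src_ip, dst_ip == VIP) is untouched, so the real server
    # can answer the client directly with the VIP as its source address.
    return replace(frame, dst_mac=pick_real_server(frame.src_ip))


if __name__ == "__main__":
    incoming = Frame(dst_mac="02:00:00:00:00:ff", src_ip="198.51.100.7", dst_ip=VIP)
    forwarded = direct_route(incoming)
    print(forwarded.dst_ip == VIP, forwarded.dst_mac in REAL_SERVER_MACS)  # True True
```
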
What if you're not dealing with millions of connections but only a few thousand from whitelisted IPs, and you need to optimise for high availability and latency? Could it be done with just anycast -> IPVS layer -> app servers?

If it's stateless traffic, then yes.

ECMP/anycast just gets you beyond the limit of a single pair of IPVS boxes, which are kept in sync with keepalived/VRRP for HA.
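
ECMP is what lets several IPVS pairs share one anycast address: the upstream router hashes each packet's flow identifiers and picks one of several equal-cost next hops, so every packet of a given TCP connection reaches the same pair. A minimal sketch, assuming a 5-tuple hash; real routers use their own (often vendor-specific) hash functions, and the pair names are hypothetical.

```python
import hashlib
from collections import Counter


def ecmp_next_hop(proto, src_ip, src_port, dst_ip, dst_port, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple (same flow -> same hop)."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
    return next_hops[h % len(next_hops)]


if __name__ == "__main__":
    pairs = ["ipvs-pair-a", "ipvs-pair-b", "ipvs-pair-c"]  # hypothetical names
    # Many client connections to one anycast VIP spread across the pairs.
    spread = Counter(
        ecmp_next_hop("tcp", f"198.51.100.{i % 250}", 40000 + i, "203.0.113.10", 443, pairs)
        for i in range(3000)
    )
    print(spread)
```

With plain modulo hashing, adding or removing a next hop remaps most existing flows to a different pair; some routers offer resilient hashing to limit this.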

But a pair of boxes with IPVS + keepalived + iptables should be able to handle a few thousand connections, no problem. Your concern would then likely be the bandwidth going through the box. But if your clients pull rather than push, direct server return should get you past the bandwidth limitations of a single box, since only the inbound requests pass through the director while the (larger) responses go straight from the app servers to the clients.
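
The pull-versus-push point comes down to which direction carries the bytes. A back-of-the-envelope sketch, assuming IPVS NAT mode as the non-DSR baseline (both directions pass through the director) and hypothetical traffic figures:

```python
def director_load_gbps(inbound_gbps: float, outbound_gbps: float, dsr: bool) -> float:
    """Traffic the IPVS box must forward.

    inbound: clients -> servers; outbound: servers -> clients.
    Without DSR (e.g. NAT mode) replies also flow back through the director;
    with DSR the real servers answer clients directly.
    """
    return inbound_gbps if dsr else inbound_gbps + outbound_gbps


if __name__ == "__main__":
    # Hypothetical pull-heavy workload: small requests, large responses.
    print(director_load_gbps(0.5, 10.0, dsr=False))  # 10.5
    print(director_load_gbps(0.5, 10.0, dsr=True))   # 0.5
    # Hypothetical push-heavy workload: DSR barely helps.
    print(director_load_gbps(10.0, 0.5, dsr=True))   # 10.0
```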

Yeah, it works pretty much the same. If your clients aren't geographically dispersed, replace anycast with DNS round robin, or use both like most huge sites do.
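
DNS round robin, in its simplest form, means the authoritative server returns the same set of A records but rotates their order on each query, so clients that take the first address spread across your IPVS head ends. A minimal sketch with hypothetical documentation-range addresses; in practice resolver caching and client address-selection rules make the resulting split only approximate.

```python
from collections import deque


class RoundRobinRecords:
    """Hypothetical A-record set whose order rotates on every query."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        """Return the records in the current order, then rotate left by one."""
        current = list(self._records)
        self._records.rotate(-1)
        return current


if __name__ == "__main__":
    zone = RoundRobinRecords(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
    for _ in range(3):
        print(zone.answer()[0])  # 192.0.2.1, then 192.0.2.2, then 192.0.2.3
```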

Also, there are three layers :) DNS -> IPVS -> HTTP proxy -> app servers.
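
Plugging the original post's rough figures into that chain gives a feel for the fan-out at each layer. A sketch in Python; the per-proxy request rate and the per-app-server rates are hypothetical numbers chosen to reproduce the post's 3-to-300 range, not measurements.

```python
import math


def fan_out(upstream_capacity: float, downstream_capacity: float) -> int:
    """Downstream boxes needed to absorb one upstream box's traffic (rounded up)."""
    return math.ceil(upstream_capacity / downstream_capacity)


if __name__ == "__main__":
    # IPVS head end -> HTTP proxies, in Gbps (40 Gbps pipe, 1-5 Gbps per proxy).
    for proxy_gbps in (1, 2, 5):
        print(f"{proxy_gbps} Gbps proxies: {fan_out(40, proxy_gbps)} per head end")
    # HTTP proxy -> app servers, in requests/s (hypothetical rates).
    proxy_rps = 30_000
    for app_rps in (10_000, 100):  # fast runtime vs. "dog slow" language
        print(f"{app_rps} req/s app servers: {fan_out(proxy_rps, app_rps)} per proxy")
```

At about 2 Gbps per proxy this lands on the post's "20 or so" proxies per head end.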

You could ditch the HTTP proxy layer if your app servers are extremely fast, like Netty/Go/Grizzly.