by KenanSulayman 2877 days ago
Why would one use this over HAProxy?
3 comments

HAProxy is a Layer 7 (i.e. HTTP, for the most part) load balancer and only handles the use case of spreading load across multiple backends. A single instance binds to a single IP. There's no redundancy; lose the HAProxy and you lose traffic.

For true redundancy, you need a layer above that handles the distribution of traffic to multiple redundant load balancer instances, and GLB does that via ECMP (Equal-Cost Multi-Path) routing. Github supposedly uses HAProxy as their L7 load balancer.
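A rough sketch of what "distribution via ECMP" means in practice, assuming the router does plain hash-based next-hop selection over the 5-tuple. SHA-256 and the lb-a..lb-d names are illustrative stand-ins, not what any particular router or GitHub uses:

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def ecmp_next_hop(flow: FiveTuple, next_hops: list[str]) -> str:
    """Pick an equal-cost next hop for a flow, hash-based ECMP style.

    Every packet of one connection has the same 5-tuple, so it always hashes
    to the same load balancer; different connections spread across all hops.
    SHA-256 is only a stand-in for whatever hash a real router uses.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]


# Hypothetical L7 load balancers all announcing the same service IP.
lbs = ["lb-a", "lb-b", "lb-c", "lb-d"]
flow = FiveTuple("198.51.100.7", 51234, "203.0.113.10", 443, "tcp")
print(ecmp_next_hop(flow, lbs))
```

The point is that no box in the path owns the service IP alone: the router spreads connections, and each load balancer only sees its share.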

All of this is thoroughly explained in the article.

I thought best practice for HAProxy was to run two HAProxy instances in parallel with VRRP or DNS load balancing?

Does that not achieve the same outcome? I've used HAProxy for layer 4 in the past without any issue this way.
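For reference, a simplified sketch of the VRRP side of that setup: among the live routers, the highest priority wins the virtual IP and ties go to the highest primary IP address. Timers, advertisements and preemption are left out, and the haproxy-1/haproxy-2 hosts are hypothetical:

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class VrrpRouter:
    name: str
    primary_ip: str
    priority: int  # 1-254 for backups; 255 is reserved for the address owner
    alive: bool = True


def elect_master(routers: list[VrrpRouter]) -> Optional[VrrpRouter]:
    """Return the live router that should hold the virtual IP.

    Simplified VRRP-style election: highest priority wins, ties go to the
    highest primary IP address. Advertisement timers and preemption are ignored.
    """
    live = [r for r in routers if r.alive]
    if not live:
        return None
    return max(live, key=lambda r: (r.priority, ipaddress.ip_address(r.primary_ip)))


# Hypothetical pair of HAProxy hosts sharing one virtual IP.
pair = [VrrpRouter("haproxy-1", "10.0.0.11", 150), VrrpRouter("haproxy-2", "10.0.0.12", 100)]
print(elect_master(pair).name)  # haproxy-1 holds the VIP
pair[0].alive = False
print(elect_master(pair).name)  # haproxy-2 takes over
```

Only one box holds the VIP at any moment, so this buys failover, but total throughput is still capped by whichever single host is master.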

It does, but that setup runs into the throughput limits of individual servers, and it doesn't have the same drain/fill/failover capabilities discussed in the article.

To be clear, HAProxy + VRRP + DNS is often the better solution, but this describes an interesting design for a load balancing system that can handle many orders of magnitude more traffic and allows maintenance without breaking established connections (one of its core design features).

Imagine if you need a lot more than two HAProxies and they aren't in the same rack/subnet; that's where more sophisticated techniques come in.
I have used HAProxy very successfully as a layer 4 LB in the past, and still do. To my knowledge it's one of the fastest. I have used HAProxy as the entry point to big Mesos clusters without any issue.

One example of using HAProxy as an L4 LB instead of letting it terminate connections is proxying TLS traffic to and from multiple backends. Or WebSocket. Or even as a bastion LB for SSH, in case one bastion goes down.

It's not that HAProxy doesn't do L4. As I said, projects like GLB solve the problem of making the load balancer itself redundant; how to load balance the load balancer, so to speak.
For the cost of a ton of added complexity though? What do you get out of this solution that other solutions don't provide? Say DNS load-balancing, VRRP, CARP, or any other HA solution.
Not for most companies, but Github's scalability requirements are pretty extensive, and really cry out for something more sophisticated than the technologies you mention.

This stuff isn't exactly new; it's essentially the Maglev system described by Google in a 2016 paper. Other companies are now catching up to Google (which is of course 2+ years ahead).
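The core of the Maglev paper is its lookup-table population: each backend walks its own permutation of table slots, and backends take turns claiming their next free preferred slot, so shares come out nearly equal. A rough sketch of that step, assuming salted SHA-256 for the paper's two hash functions and a tiny prime table size for the demo; this illustrates Maglev's technique, not necessarily how GLB builds its table:

```python
import hashlib
from collections import Counter


def _hash(name: str, salt: str) -> int:
    # Two independent hash functions are derived by salting SHA-256.
    return int.from_bytes(hashlib.sha256(f"{salt}:{name}".encode()).digest()[:8], "big")


def maglev_table(backends: list[str], m: int = 65537) -> list[str]:
    """Fill a Maglev-style lookup table of size m (m must be prime)."""
    if not backends:
        raise ValueError("need at least one backend")
    n = len(backends)
    offset = [_hash(b, "offset") % m for b in backends]
    skip = [_hash(b, "skip") % (m - 1) + 1 for b in backends]
    next_j = [0] * n  # position reached in each backend's permutation
    entry = [-1] * m  # -1 marks an empty slot
    filled = 0
    while True:
        for i in range(n):  # backends take turns claiming their next free preferred slot
            c = (offset[i] + next_j[i] * skip[i]) % m
            while entry[c] >= 0:
                next_j[i] += 1
                c = (offset[i] + next_j[i] * skip[i]) % m
            entry[c] = i
            next_j[i] += 1
            filled += 1
            if filled == m:
                return [backends[k] for k in entry]


def lookup(table: list[str], flow: str) -> str:
    # A packet's flow hash indexes the table to find its backend.
    return table[_hash(flow, "flow") % len(table)]


table = maglev_table(["proxy-a", "proxy-b", "proxy-c"], m=13)
print(Counter(table))
```

The table size must be prime so every skip value visits every slot; the paper uses a much larger table (65537 is used as the default here) so that shares are close to equal in practice.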

Well, you can't have 100% redundancy without a virtual IP or BGP. So basically glb-director is the same as just using HAProxy + BGP. (BGP can do anycast/ECMP multipath really easily. Well, you still need redundant network routers.)
> basically glb-director is the same as just using HAProxy + BGP

BGP (really ECMP) doesn't handle failures gracefully; that's the benefit of GLB.
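To put a number on "doesn't handle failures gracefully": assuming the router does plain hash-mod-N over its next hops, removing one of eight proxies reshuffles about 7/8 of all flows, breaking their connections. Rendezvous (highest-random-weight) hashing, shown here as one well-known alternative and not as GLB's exact scheme, moves only the flows that were on the failed proxy:

```python
import hashlib


def _h(*parts: str) -> int:
    return int.from_bytes(hashlib.sha256("|".join(parts).encode()).digest()[:8], "big")


def mod_n(flow: str, servers: list[str]) -> str:
    # Plain ECMP-style choice: hash the flow, take it modulo the number of hops.
    return servers[_h(flow) % len(servers)]


def rendezvous(flow: str, servers: list[str]) -> str:
    # Highest-random-weight hashing: score every server for this flow, keep the best.
    return max(servers, key=lambda s: _h(flow, s))


def moved_fraction(pick, flows, before, after) -> float:
    """Share of flows whose server changes when the pool goes from before to after."""
    return sum(pick(f, before) != pick(f, after) for f in flows) / len(flows)


servers = [f"proxy-{i}" for i in range(8)]
survivors = servers[:-1]  # proxy-7 fails
flows = [f"flow-{i}" for i in range(10_000)]

print(f"mod-N:      {moved_fraction(mod_n, flows, servers, survivors):.0%} of flows moved")
print(f"rendezvous: {moved_fraction(rendezvous, flows, servers, survivors):.0%} of flows moved")
```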

Wouldn't it be possible to use DNS for this, with multiple A records per LB and a TTL of 30 or 60 seconds, removing unhealthy servers from the list? That would even come with IPv6 support.

Then you could address the LB with an address like some-service.lb.intranet and just use that wherever you would use the original service.
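A minimal sketch of that idea, assuming some external health checker supplies a healthy/unhealthy flag per LB instance (hypothetical here) and the output is fed into whatever DNS server serves the zone. Only healthy instances are published, with A records for IPv4 and AAAA for IPv6:

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class LbInstance:
    ip: str
    healthy: bool  # result of some external health check (not shown)


def dns_records(name: str, lbs: list[LbInstance], ttl: int = 30) -> list[str]:
    """Render zone-file style records for the healthy load balancers only.

    IPv4 addresses become A records and IPv6 addresses AAAA records; an
    unhealthy instance simply drops out of the answer set.
    """
    records = []
    for lb in lbs:
        if not lb.healthy:
            continue
        rtype = "AAAA" if ipaddress.ip_address(lb.ip).version == 6 else "A"
        records.append(f"{name}. {ttl} IN {rtype} {lb.ip}")
    return records


pool = [
    LbInstance("10.0.0.11", True),
    LbInstance("10.0.0.12", False),  # failed its health check
    LbInstance("fd00::13", True),
]
for line in dns_records("some-service.lb.intranet", pool):
    print(line)
```

How quickly a dropped record actually stops receiving traffic still depends on resolvers and clients honoring the TTL.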

Designs such as GLB can handle director-level failures mid-flow (I haven't looked deeply enough into GLB specifically to see whether it does, but I would assume so), i.e. a connection won't be interrupted even if one of the directors dies (packet loss is still likely, but TCP will take care of it). That allows much faster recovery than solutions that depend on clients' DNS settings.
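One way a design can survive a director dying mid-flow is to keep no per-connection state in the directors at all: if the backend is a pure function of the flow and a backend list every director shares, it doesn't matter which director ECMP hands a packet to. A sketch under that assumption, using rendezvous hashing and hypothetical names; real designs also have to cope with the backend list changing while connections are open, which this ignores:

```python
import hashlib


class Director:
    """Hypothetical stateless director: it keeps no per-connection table.

    The backend for a flow is a pure function of the flow and the shared
    backend list (rendezvous hashing here), so any director gives the same answer.
    """

    def __init__(self, name: str, backends: list[str]):
        self.name = name
        self.backends = list(backends)

    def backend_for(self, flow: str) -> str:
        return max(self.backends, key=lambda b: hashlib.sha256(f"{flow}|{b}".encode()).digest())


backends = ["haproxy-1", "haproxy-2", "haproxy-3"]
directors = [Director("dir-a", backends), Director("dir-b", backends)]

flow = "198.51.100.7:51234->203.0.113.10:443"
before = directors[0].backend_for(flow)  # ECMP happened to send the flow to dir-a
after = directors[1].backend_for(flow)   # dir-a dies, ECMP now sends it to dir-b
print(before == after)                   # True: the TCP connection keeps its backend
```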

Additionally, DNS leaves your load balancing at the mercy of ISPs' DNS server settings. At least in the past, it wasn't unheard of for ISPs to cache only a single A record, so all of their clients would be directed to a single server.

That said, DNS-based load balancing is generally a good enough solution for most people.

Problem here is you assume that every client honors the TTL. That is a very bad assumption to make.
DNS failover works well in practice.
DNS failover looks like a neat idea, but it doesn't work that well. A new DNS entry can take a really, really long time to propagate. Also, using anycast/ECMP via BGP means you have a single IP that is highly redundant, because it can be backed by many servers.
>GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.

It's right there in 2nd paragraph.

From the article:

>GLB Director does not replace services like haproxy and nginx, but rather is a layer in front of these services (or any TCP service) that allows them to scale across multiple physical machines without requiring each machine to have unique IP addresses.

Yes, I get that. But in Mesos, for instance, you could use overlay-networks to get one virtual IP per HAProxy cluster (i.e. 3 HAProxy instances). They are then round-robin'd on each server in the whole cluster so that you have a transparent LB with a singular IP.
Round-robin routing just doesn't cut it at the scale Github and others operate at. Overlay networks also typically have no notion of load, or of how to send traffic to the hop with the least load.
Use least connections for HTTP load balancing, it is load aware.
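The difference between the two policies in a few lines, with made-up connection counts per instance. HAProxy exposes these as `balance roundrobin` and `balance leastconn`; the Python below is only an illustration of the selection rule, not HAProxy's implementation:

```python
import itertools


class RoundRobin:
    """Hands out servers in a fixed rotation, ignoring how busy each one is."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self, active: dict[str, int]) -> str:
        return next(self._cycle)


def least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest open connections (first one wins ties)."""
    return min(active, key=active.get)


# Hypothetical snapshot of open connections per HAProxy instance.
active = {"haproxy-1": 120, "haproxy-2": 3, "haproxy-3": 45}
rr = RoundRobin(list(active))
print(rr.pick(active), least_connections(active))  # haproxy-1 haproxy-2
```

Round robin sends the next connection to the busiest instance simply because it's next in line; least connections looks at the counts first.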
That's an HAProxy option. What do you do to load-balance HAProxy itself? That's what the article is about.