by Mavvie 1155 days ago
So what do you think the disadvantage is of a pull approach? Presumably it's not just better or else tools would use it?
4 comments

Not an expert in load balancing, but in similar problems (work sharing / work stealing, MPI get/put) it makes sense to pull only if you can pull fast enough to avoid incurring prohibitive latency at every request/message.

Multithreading-based work stealing à la Cilk relies on extremely cheap thread mechanisms and implementation to minimize communication.

In another similar situation, HPC switches are credit based so that until you hit congestion, you can “instantaneously” know if a remote is ready to receive.

This isn't a formal explanation, of course.

Edit: after some thought, that's not really the distinction that is made for load balancing. Pushing to the least loaded queue already requires knowledge of the remote state. So the difference between pull and push is about having one queue vs. several. In that sense it is like supermarkets that use the more efficient single queue feeding every cashier vs. the more traditional one queue per cashier. In supermarkets there is a choice to make because there are other constraints, but if you are only optimising for load balancing, a single queue is strictly better, provided you only have one input.
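To make the supermarket comparison concrete, here is a minimal sketch (not from the article; the function names and the four-job workload are made up for illustration). It compares one shared FIFO queue that idle workers pull from against per-worker queues filled round robin.

```python
import heapq

def shared_queue_waits(arrivals, services, workers):
    """One FIFO queue: whichever worker frees up first pulls the next job."""
    free_at = [0] * workers            # times at which each worker becomes idle
    heapq.heapify(free_at)
    waits = []
    for arrive, service in zip(arrivals, services):
        start = max(arrive, heapq.heappop(free_at))  # earliest idle worker
        waits.append(start - arrive)
        heapq.heappush(free_at, start + service)
    return waits

def round_robin_waits(arrivals, services, workers):
    """One queue per worker: job i is pushed to worker i % workers."""
    free_at = [0] * workers
    waits = []
    for i, (arrive, service) in enumerate(zip(arrivals, services)):
        w = i % workers
        start = max(arrive, free_at[w])
        waits.append(start - arrive)
        free_at[w] = start + service
    return waits

# Hypothetical workload: four jobs arrive at t=0, the first one is slow.
arrivals = [0, 0, 0, 0]
services = [10, 1, 1, 1]
print(shared_queue_waits(arrivals, services, workers=2))  # [0, 0, 1, 2]
print(round_robin_waits(arrivals, services, workers=2))   # [0, 0, 10, 1]
```

With this toy input the shared queue has a total wait of 3 against 11 for round robin. The last job actually waits less under round robin (1 vs. 2), while the job stuck behind the slow one waits 10.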

Having enough workers to do the work shouldn't be any more constraining than it was before.

The article does nicely mention that simple round robin actually has lower latency, because some traffic gets lucky & goes to under-utilized machines. Unfairness helps some traffic go faster. The queue is probably going to eliminate this, but the unfairness advantage comes at the cost of a lot of other traffic getting put into long queues on workers, so it wasn't really a good thing anyway. The p90+ is usually awful.

A pull approach seems difficult to manage when you have many layers in your load balancing.

In small setups, you may just have one layer with a single load balancer (well, hopefully at least a hot-warm pair), but larger setups often have multiple levels. There may be a network level traffic split to multiple frontend load balancers via something like ECMP; those frontends may connect directly to the origin hosts, or maybe there are frontends in many locations and they connect to backend load balancers near the origins.

In this bigger case, managing pull requests becomes difficult, because balancing may be unequal at earlier layers. If your origin can handle N concurrent requests, it sends N pulls, but how many should it send to which of the upstreams? And if some upstreams get many requests and some get zero, those many requests will have unnecessary delay.
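A rough sketch of that mismatch, assuming the origin simply splits its N pulls evenly across the upstream load balancers (the even split and both function names are my assumptions, not a policy anyone here proposed):

```python
def split_pulls(capacity, upstreams):
    """Spread `capacity` outstanding pulls as evenly as possible."""
    base, extra = divmod(capacity, upstreams)
    return [base + (1 if i < extra else 0) for i in range(upstreams)]

def match(pulls, demand):
    """Return (served now, idle origin slots, requests left waiting)."""
    served = sum(min(p, d) for p, d in zip(pulls, demand))
    return served, sum(pulls) - served, sum(demand) - served

pulls = split_pulls(capacity=8, upstreams=4)   # [2, 2, 2, 2]
# Hypothetical skew: the earlier layer sent all 8 requests to one upstream.
print(match(pulls, demand=[8, 0, 0, 0]))       # (2, 6, 6)
```

In this example six requests wait at one upstream while the origin has six idle slots whose pulls are parked at the other three.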

There's also unnecessary delay when at capacity between when one request finishes and the round trip of sending a pull and getting the next request.
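As back-of-the-envelope arithmetic for that gap, assume a worker holds one request at a time and only pulls the next one after finishing, so each request costs its service time plus one pull round trip (a simplification; the function name and the numbers are illustrative):

```python
def pull_utilization(service_time_ms, pull_rtt_ms):
    """Fraction of time a worker is busy when it pulls one request at a time."""
    return service_time_ms / (service_time_ms + pull_rtt_ms)

# Hypothetical 1 ms pull round trip against different service times.
for service_ms in (1, 10, 100):
    print(service_ms, round(pull_utilization(service_ms, 1.0), 3))
```

Under these assumptions a 1 ms round trip leaves a worker about 50% busy for 1 ms requests, about 91% for 10 ms requests, and about 99% for 100 ms requests, so the cost matters mostly when requests are short.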

But, it's always tradeoffs. It depends on the volume of requests, the typical time to process a request, behavior at or near capacity, etc.

I also think a pull-based system is more work for the load balancer, and load balancers are harder to scale. I prefer to move the work to the origins as much as possible, because it's typically easy to add more of those; that's what the load balancer enables. But that doesn't seem to be a commonly held opinion: direct server return is rarely available, and load balancers commonly do TLS termination and often intense traffic inspection and manipulation. Again, there are tradeoffs.

Nice to see someone mention direct server return or, as BigIP called it, nPath routing. This was an effective scaling method for handling small requests that returned large payloads (audio and video files). I don't know how well known this configuration is or whether it is still viable in an all-TLS world.

It doesn't seem particularly well known. It works fine with TLS, but the origin servers need to do the TLS termination (IMHO, this is better for security than having your load balancers do it, but it does mean you have to work harder on key distribution). On a non-DSR load balancer, doing TLS termination on the origins means the load balancer has less application data to work with (request path, response status code, etc.) in making load balancing decisions, but for DSR, the load balancer never had any of that, so adding TLS doesn't disadvantage the load balancer any more.

TLS session establishment is expensive, so why would I want my load balancers to do it anyway? :P

Should you wish to give DSR load balancing a go without having to invest in hardware/licenses you could try https://github.com/davidcoles/vc5

Put that in front of some HAProxy servers to do TLS termination and farm out requests to another layer of NGINX/uWSGI boxes and Robert is your cousin's father.

It kinda fell out of favour when even a midrange server can do tens of gigabits of TLS traffic. So you need to be very big traffic-wise to make it worth it.

Making a high reliability queue that can also soak load spikes is non-trivial.

Now that we have hundreds of Gbps ethernet and TB of memory the idea has more merit, can scale pretty absurdly high with mundane systems. Or maybe you have sharding, which means now you have a load balancing problem again, of picking which work queue to take work from.

The HA bit is still hard. You have to figure out if there's a netsplit (some folks can't connect to one server) or if one server really is gone. Probably just multicast to each queue all the incoming work & all the incoming pulls. Ideally each queue could also hear all the outgoing traffic. If ethernet capacity were unidirectional this would be great, box #2 could autonomously detect faults & take over. But ethernet is bidirectional, and now it needs all box #1's incoming traffic and its outgoing traffic too. So instead maybe have the clients fail over. We can iterate on the design but HA is non-trivial.

i never ran this in prod, so not sure

i think the retries are a bit strange, and debugging was weirder as well, but it's probably just me not being used to it

even though i had multiple chances to use it in prod, i always go for http somehow, it just feels so familiar

also because of the way the REPLY_TO address:port worked in my experiments, sometimes having half open tcp connections really messes up things