|
A pull approach seems difficult to manage when you have many layers in your load balancing. In small setups, you may have just one layer with a single load balancer (well, hopefully at least a hot-warm pair), but larger setups often have multiple levels. There may be a network-level traffic split to multiple frontend load balancers via something like ECMP; those frontends may connect directly to the origin hosts, or there may be frontends in many locations that connect to backend load balancers near the origins. In this bigger case, managing pull requests becomes difficult, because balancing may be unequal at earlier layers. If your origin can handle N concurrent requests and so sends N pulls, how many should it send to each of the upstreams? If some upstreams get many requests and some get zero, the requests at the busy upstreams wait unnecessarily while pulls sit unused at the idle ones. At capacity, there's also unnecessary delay between one request finishing and the next one arriving: the round trip of sending a pull and getting the next request. But it's always tradeoffs. It depends on the volume of requests, the typical time to process a request, behavior at or near capacity, etc. I also think a pull-based system is more work for the load balancer, and load balancers are harder to scale. I prefer to move the work to the origins as much as possible, because it's typically easy to add more of those; that's what the load balancer enables. But that doesn't seem to be a commonly held opinion: direct server return is rarely available, and load balancers commonly do TLS termination and often intense traffic inspection and manipulation. Again, there are tradeoffs. |
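
To make the "N pulls, which upstreams?" problem concrete, here's a rough sketch. It assumes the origin splits its pull credits evenly across the load balancers, which is only one possible policy, and the names `even_split` and `dispatch` are made up for illustration rather than taken from any real load balancer.

```python
def even_split(capacity, n_balancers):
    """Split `capacity` pull credits as evenly as possible across balancers."""
    base, extra = divmod(capacity, n_balancers)
    return [base + (1 if i < extra else 0) for i in range(n_balancers)]


def dispatch(credits, arrivals):
    """For each balancer, match queued requests against the pulls it holds.

    Returns three lists: requests served now, requests left waiting,
    and pull credits left idle.
    """
    served = [min(c, a) for c, a in zip(credits, arrivals)]
    waiting = [a - s for a, s in zip(arrivals, served)]
    idle = [c - s for c, s in zip(credits, served)]
    return served, waiting, idle


# Hypothetical numbers: one origin that can handle 8 concurrent requests,
# 4 load balancers, and an earlier layer that balanced unevenly.
credits = even_split(8, 4)
arrivals = [6, 2, 0, 0]
served, waiting, idle = dispatch(credits, arrivals)
print("served:", sum(served), "waiting:", sum(waiting), "idle pulls:", sum(idle))
```

With these made-up numbers the origin has room for all 8 requests, but 4 of them wait at the busy balancer while 4 pulls sit idle at the balancers that got no traffic. A smarter split would need the origin to know the arrival pattern, which is the hard part.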