by andyg_blog 546 days ago
From TFA:

> PReQuaL does not balance CPU load, but instead selects servers according to estimated latency and active requests-in-flight

So, still load balancing

5 comments

Load balancing is a term of art; the actual algorithm for distributing requests need not be load-based. A more accurate term for the component might be "request distributor," but I don't foresee people changing their vocabulary any time soon.
I’ve never heard of a load balancer that balances CPU load. They balance queuing depth, and that’s only a proxy for CPU load, and a pretty terrible one at that.

I really don’t understand how their claim is anything more than a least-conn with a better weighting algorithm.

We don’t generally use heterogeneous server clusters anymore. Noisy neighbors and differences from one data center to the next are definitely things, but outside of microservices, you’ve got a lot of requests with different overhead to them. Route B might be five times as expensive as route A. So it’s not server predictors that I want, but route predictors. Those need a weight or cost estimator based on previous traffic.

Poor man’s version of this: we had ingress load balancers and then a local load balancer, like one does for Ruby or NodeJS or a handful of other languages. I found that we got much better tail latency running a more “square” arrangement. We initially had a little under 3 times as many boxes as cores per box, and I switched to the next biggest EC2 instance, which takes you to a 3:4 ratio. That not only cancelled out a slight latency increase from moving to Docker containers but also let me reduce the cluster size by about 5% and still have slightly better p95 times.

I get two equally weighted attempts to balance the load fairly, instead of one and change.
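
For what it's worth, the "route predictor" idea above could be sketched roughly like this. It is only an illustration, not anything from the paper or from the setup described in this comment: the class names `RouteCostEstimator` and `WeightedLeastLoad`, the EWMA smoothing factor `alpha`, and the choice of "cost" as measured service time are all assumptions. The idea is plain least-conn, except each in-flight request counts as its route's estimated cost instead of as 1.

```python
class RouteCostEstimator:
    """Hypothetical per-route cost estimate: an exponentially weighted
    moving average (EWMA) of observed cost, e.g. service time in ms."""

    def __init__(self, alpha=0.2, default_cost=1.0):
        self.alpha = alpha                # weight given to the newest sample
        self.default_cost = default_cost  # used for routes never seen before
        self.cost = {}

    def observe(self, route, measured_cost):
        prev = self.cost.get(route)
        if prev is None:
            self.cost[route] = measured_cost
        else:
            self.cost[route] = self.alpha * measured_cost + (1 - self.alpha) * prev

    def estimate(self, route):
        return self.cost.get(route, self.default_cost)


class WeightedLeastLoad:
    """Least-conn variant: tracks outstanding estimated cost per server
    rather than a raw count of open requests."""

    def __init__(self, servers, estimator):
        self.estimator = estimator
        self.outstanding = {s: 0.0 for s in servers}

    def pick(self, route):
        # Choose the server with the least outstanding estimated cost.
        server = min(self.outstanding, key=self.outstanding.get)
        weight = self.estimator.estimate(route)
        self.outstanding[server] += weight
        return server, weight

    def done(self, server, weight):
        # Call when the request finishes, with the weight returned by pick().
        self.outstanding[server] -= weight
```

With this, one request on an expensive route keeps a server "busy" for as long as several cheap ones would, which is the behavior a raw connection count misses.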

Abstract:

We present PReQuaL (Probing to Reduce Queuing and Latency), a load balancer for...

“Don’t load balance, erm, here’s our loadbalancer” struck me as quite humorous too. :)
Maybe RIF-balancing is a better term.

Fascinating that 2-3 probes per request is a sweet spot; intuitively it seems like a lot of overhead.
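
To make the probing idea concrete, here is a minimal sketch of "probe a few replicas, pick the least loaded one". It is not PReQuaL's actual selection rule, just a generic power-of-d-choices style picker: the function name `pick_by_probing`, the `probe` callback returning `(requests_in_flight, estimated_latency_ms)`, and the tie-breaking on latency are assumptions made for illustration.

```python
import random


def pick_by_probing(replicas, probe, d=3, rng=random):
    """Probe d randomly chosen replicas and return the one with the fewest
    requests in flight, breaking ties on lower estimated latency.

    probe(replica) is a hypothetical callback returning a tuple
    (requests_in_flight, estimated_latency_ms).
    """
    sampled = rng.sample(replicas, min(d, len(replicas)))
    results = [(probe(r), r) for r in sampled]
    # Tuples compare element by element: RIF first, then latency.
    return min(results, key=lambda pr: pr[0])[1]
```

Each request only pays for d probes rather than a poll of the whole fleet, which is why a small d can be affordable even though it sounds like a lot of overhead.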

Requests (in flight or currently processing) are the load in this case. But I guess "queue balancing" captures the intuition better: what matters for latency is the future delay more than the current delay.
Least-conn is requests in flight. It’s like they’re trying to make it hard for people to search for prior art. My ass feels very smoky right now.
They estimate what load will be in the future too.