Hacker News new | ask | show | jobs
by jedberg 2453 days ago
> Load-sensitivity is one “smart” approach. The idea is that you keep track of the load on each shard, and selectively route traffic to the lightly-loaded ones and away from the busy ones. Simplest thing is, if you have some sort of load metric, always pick the shard with the lowest value.

Gotta be super careful with this one. We did this at reddit and it bit us bad. The problem was as soon as the load on a machine went down it got pounded with new requests and the load shot up, but it takes a few seconds for the load number to react to all the new requests. So we saw really bad see-saw affect.

We had to add extra logic to mark how long a machine had beed at a certain load and also randomly send requests to slightly more loaded machines to keep things even.

The moral of the story here is make sure you pick a metric that reacts to the change in request rate as quickly as your request rate changes!

10 comments

This sounds a lot like the control-theory problem of balancing the proportional, integral, and derivative coefficients of a PID controller [1]?

I'm curious how you reached this condition as a requirement:

> The moral of the story here is make sure you pick a metric that reacts to the change in request rate as quickly as your request rate changes!

It makes sense intuitively, but I'm having trouble proving to myself that this is necessary+sufficient.

[1]: https://en.wikipedia.org/wiki/PID_controller

So I studied control theory after I left Reddit and you’re indeed right, it’s a PID issue.

Picking a metric that reacts to changes quickly is neither necessary nor sufficient, but it certainly helps reduce the error on your calculation. You need to know how far off your set point you are and so you need as accurate a measurement as possible.

Imagine a cruise control for a car where the speedometer had a five second delay. You’d still stay at your desired speed on average but it would vary a lot more and require more work to get back to the desired speed. It would have to accelerate harder and brake harder.

Oh I see! I was misunderstanding what you meant by "quickly"; it seems you're referring to the sensor delay / how tight the feedback loop is.

Thanks for explaining!

See also Bode stability analysis.

Delay in the feedback path counts against phase margin, requiring you to reduce the loop bandwidth to maintain stability.

Adding hysteresis definitely helps to stabilize issues like this. Using rolling windows or exponentially decayed weighting has worked out well in my experience. In general, it seems like load based routing can be quite perfidious if you get the heuristic for “load” wrong.

I worked on a system that used total connections as our heuristic, measured by the load balancer. The problem we experienced was that some failure scenarios could cause requests to fail quickly compared to normal traffic. In effect what would happen is that a host would go into a bad state, start failing requests with a lower latency than normal traffic causing the load balancer to route an increasing amount of traffic to the bad host. This happened because the load balancer was only capable of measuring connections and didn’t discriminate between good/bad responses.

We ended up injecting fake latency into bad responses at the application layer which worked to prevent this sort of “black hole” effect.

It’d seem you could detect the “black hole” effect by checking the rate of connection change per node and if it goes beyond a certain limit to blacklist the node.
Xerox Grapevine experienced this in 1983. Servers that had free disk space for messages in a congested cluster would announce this, and other nodes would swarm and relocate objects there and you'd get oscillation.

> The moral of the story here is make sure you pick a metric that reacts to the change in request rate as quickly as your request rate changes!

The key things are that you don't react to changes in the metric faster than the metric can move (be appropriately damped), and that you react to the metric "smoothly" (e.g. pick a random server where the odds to get a specific server vary with the loading metric, like you mention).

As others say, it's fundamentally a controls problem... a controls problem where there are many, many actuators and the delay/phase shift is relatively unpredictable. Making things easy for the control system by making reaction smooth and the system overall react slowly (overdamped) is important.

That reminds me of a series of blogposts a while back about "the power of two choices", where you randomly pick/sample two nodes and pick the one with the least contention.

The claim is that it results in pretty decent behavior under load while avoiding some problems caused by delayed information.

https://ieeexplore.ieee.org/document/963420

I worked for a major cloud service a few years back, and got a wonderful introduction in to how loadbalancers are, for the most part, somewhat awful.

They work better when it's just one in front of a fleet of servers, and so have the total picture of what is going on, but of course that's quite the bottleneck. So you get two LBs, or more, and they each only have their notion of what the back end fleet is doing. There's no standard feedback mechanism to them at all.

Some offer approaches like measuring response time, but that doesn't work so great as soon as you consider APIs where no two requests perform the same. Was it a fast request that got answered slowly (back-end overloaded?), or a slow request that got answered quickly (back-end bored?). Who knows.

For the service I was working on a few years ago, no two requests are the same by any stretch of the imagination, even for the same API call, and came with a variation on request size, and computational power required to process them.

As you'd expect, traditional loadbalancer behaviour actually handled about things to an okay degree probably 90% of the time. That 10% was a real killer though.

I have similar experience. I once worked with a pretty big site which used a load-balancer configured to route all traffic to the server based on average response-time.

The intention was that if a server was returning results "quickly" that meant it was least-loaded, and could handle the newest requests.

What it actually meant though was that the server disk filled up, and it started returning "500, Internal Server Error" errors. Very quickly.

At the point the alarms were raised almost all incoming traffic had been routed to this dead/dying host.

I had almost the same thing happen once - but the server was serving requests 'very quickly' because it was caching everything based on the LB requesting it.
Netflix also wrote about the drawbacks on this approach https://medium.com/netflix-techblog/netflix-edge-load-balanc...
We had the same problem at Justin.tv with the video servers, with the added wrinkle that every choice had the potential to meaningly affect load for an hour or more. We eventually ended up putting extremely detailed information into a central database for the load balancer so that it could consider not only server load, but also the load on all of our internal and external network links.

We also had to keep track of how many people were on the webpage for a channel when it went live so that we could preemptively replicate the video stream to enough, but not too many, servers.

Sounds fun.
Yeah, I certainly wouldn't start at that end.

A useful metric for "load" is just as hard as doing the load balancing itself.

Requests can be vastly different, unless you have only one application they're also constantly changing and there are more load balancers involved (both horizontally and vertically as a single request can pass multiple). There are also numerous failure conditions under which responses are very fast.

In that situation it's easier to design for an even spread, then work to improve that metric as much as possible as more information becomes available.

Per Brendan Gregg's Gesamptkunstwerk, BPF Performance Tools, I feel like you should be able to measure instructions per cycle at the service level. Even in the cloud if exposed by Xen. And even at the resource utilization level for each container.

http://www.brendangregg.com/bpf-performance-tools-book.html

Of course, you can always just use cloudflare ;)

The problem is how fast and how frequently can you get that information back to the one more nodes making load balancing decisions? The local measurements, no matter the mechanism of collection, are usually the easiest/fastest part.
Obligatory racelbythebay post: https://rachelbythebay.com/w/2018/04/21/lb/