by plandis
2451 days ago
Adding hysteresis definitely helps to stabilize issues like this. Using rolling windows or exponentially decayed weighting has worked out well in my experience. In general, load-based routing can be quite perfidious if you get the heuristic for “load” wrong. I worked on a system that used total connections, as measured by the load balancer, as the heuristic. The problem we experienced was that some failure scenarios caused requests to fail much faster than normal traffic completed. A host would go into a bad state and start failing requests with lower latency than normal traffic, so its connection count dropped and the load balancer routed an increasing share of traffic to the bad host. This happened because the load balancer could only measure connections and didn’t discriminate between good and bad responses. We ended up injecting fake latency into bad responses at the application layer, which prevented this sort of “black hole” effect.
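
To make the idea concrete, here is a rough Python sketch of exponentially decayed weighting combined with a penalty on failed responses. It is not the system described above: that one measured connections at the load balancer and added the delay in the application. This is a balancer-side variant of the same idea, and `HostLoad`, `pick_host`, `alpha`, and `failure_penalty_ms` are all made-up names and values for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HostLoad:
    """Exponentially decayed cost estimate for one backend (hypothetical)."""
    alpha: float = 0.2                  # smoothing factor; higher reacts faster
    failure_penalty_ms: float = 1000.0  # assumed floor charged to failed requests
    ewma_ms: float = 0.0
    samples: int = 0

    def record(self, latency_ms: float, ok: bool) -> None:
        # Charge a failed request at least the penalty, so failing fast
        # never makes a host look cheaper than a healthy one.
        cost = latency_ms if ok else max(latency_ms, self.failure_penalty_ms)
        if self.samples == 0:
            self.ewma_ms = cost  # seed the average with the first sample
        else:
            self.ewma_ms = self.alpha * cost + (1 - self.alpha) * self.ewma_ms
        self.samples += 1


def pick_host(hosts: Dict[str, HostLoad]) -> str:
    """Return the name of the host with the lowest smoothed cost."""
    return min(hosts, key=lambda name: hosts[name].ewma_ms)


if __name__ == "__main__":
    hosts = {"healthy": HostLoad(), "bad": HostLoad()}
    for _ in range(10):
        hosts["healthy"].record(100.0, ok=True)  # normal traffic
        hosts["bad"].record(5.0, ok=False)       # fails fast
    print(pick_host(hosts))
```

With the penalty set to zero, the same traffic makes the fast-failing host look like the least loaded one, which is the “black hole” behavior. The penalty value is a judgment call: it only needs to be comfortably above normal healthy latency.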