Hacker News
by nickvanw 2457 days ago
This is somewhat valuable, but it misses any investigation into why there were outlying latency spikes when using non-Envoy software for load balancing. Furthermore, plotting the average latency like this doesn't tell us much, especially when the outliers make the graphs worthless for steady-state performance analysis.
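
To make the point about averages concrete, here is a small Python sketch using made-up numbers (not the article's data): 980 requests at 2 ms and 20 outliers at 1000 ms. The helper `latency_summary` is hypothetical and uses the nearest-rank percentile method.

```python
import statistics


def latency_summary(samples_ms):
    """Summarize request latencies (in ms) with a mean and nearest-rank percentiles."""
    ordered = sorted(samples_ms)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank: the smallest sample with at least p% of samples <= it.
        # Integer ceiling of p * n / 100 avoids float rounding surprises.
        rank = max(1, -(-p * n // 100))
        return ordered[rank - 1]

    return {
        "mean": statistics.mean(ordered),
        "p50": percentile(50),
        "p99": percentile(99),
        "max": ordered[-1],
    }


# Hypothetical data: a steady state of 2 ms with a 2% tail of 1 s outliers.
samples = [2.0] * 980 + [1000.0] * 20
summary = latency_summary(samples)
print(summary)
# {'mean': 21.96, 'p50': 2.0, 'p99': 1000.0, 'max': 1000.0}
```

The mean (about 22 ms) matches no request that actually happened: p50 shows the 2 ms steady state, while p99 exposes the 1 s tail. That is why latency comparisons usually report percentiles rather than a single average.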

My first thought is that the spikes are somewhat clearly the result of requests being sent to pods that no longer exist, or that are still starting and not yet ready to serve requests. This might just speak to how all three of these underlying proxies were configured, and say absolutely nothing about how well they actually perform at load balancing.

If someone came to me with this at work, I would say it is the beginning of a series of troubleshooting steps to answer the question of why there are such outlying requests when using our load balancer of choice, and not an analysis of which software to pick.

Edit: Even worse, this appears to be from a company that sells... an API gateway built on top of Envoy.


(one of the authors here)

Thanks for the feedback.

So, regarding your hypothesis that the spikes come from requests being sent to pods that no longer exist or are still starting: 1) it is the responsibility of the Kubernetes ingress controller to handle that situation properly, 2) it would be highly unlikely for people to implement their own custom ingress controller around a given proxy (it's actually somewhat complicated), and 3) the pod theory wouldn't explain the latency spikes seen on reconfiguration.

But you're right that there probably should be some explanation of why we think this is happening. I just didn't want to speculate too much; I suspect the issue is with the hitless reload implementation in the proxies, which is tricky to do well.

Could it be at all related to the circuit-breaking behavior that NGINX describes [1] in their reference architecture? It's unclear to me which (if any) of these properties might be in play for this test.

[1] https://www.nginx.com/blog/microservices-reference-architect...

I suspect it's because of reloads:

https://kubernetes.github.io/ingress-nginx/how-it-works/#whe...

The NGINX ingress controller goes to some lengths to avoid reloads because it recognizes the performance hit they cause. In Ambassador-land, we use Envoy's xDS APIs to avoid this problem. It's not clear what the HAProxy ingress controller does.
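
For readers unfamiliar with the distinction, here is a toy Python model of what "avoiding reloads" means: the endpoint list is swapped in place while the process keeps serving, instead of restarting with a regenerated config. This is only an illustration of the idea, not how Envoy's xDS or the NGINX ingress controller is implemented; `DynamicUpstream` and the addresses are hypothetical.

```python
import itertools
import threading


class DynamicUpstream:
    """Toy round-robin upstream whose endpoint set can be replaced at runtime.

    Hypothetical class: it models the idea of updating endpoints in place
    (no process restart), not any real proxy's API.
    """

    def __init__(self, endpoints):
        self._lock = threading.Lock()
        self.version = 0
        self._set_endpoints(endpoints)

    def _set_endpoints(self, endpoints):
        # Snapshot the list so later mutation by the caller has no effect.
        self._endpoints = tuple(endpoints)
        self._cycle = itertools.cycle(self._endpoints)

    def update(self, endpoints):
        """Atomically swap in a new endpoint set; pickers never see a partial list."""
        with self._lock:
            self._set_endpoints(endpoints)
            self.version += 1

    def pick(self):
        """Return the next endpoint in round-robin order."""
        with self._lock:
            if not self._endpoints:
                raise RuntimeError("no endpoints available")
            return next(self._cycle)


upstream = DynamicUpstream(["10.0.0.1:8080", "10.0.0.2:8080"])
print([upstream.pick() for _ in range(3)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080']

# A pod was replaced: push the new set without restarting anything.
upstream.update(["10.0.0.2:8080", "10.0.0.3:8080"])
print([upstream.pick() for _ in range(2)])
# ['10.0.0.2:8080', '10.0.0.3:8080']
```

In a reload-based model, the same change would mean generating a new config and starting new workers while the old ones drain, which is where the extra latency on reconfiguration can come from.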

The official HAProxy Ingress Controller uses the Runtime API [1] to avoid restarts/reloads and also has hitless reloads configured by default. HAProxy Technologies also contributed the ability to use the Runtime API to the jcmoraisjr/haproxy-ingress controller, but you need to enable it with the dynamic-scaling option [3].
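
As a rough illustration of what using the Runtime API looks like (as opposed to rewriting haproxy.cfg and reloading), here is a Python sketch that sends commands over HAProxy's stats socket. It assumes the socket is enabled with something like `stats socket /var/run/haproxy.sock level admin` in the `global` section; the socket path, the backend name `be_app`, and the server name `srv1` are hypothetical. `set server <backend>/<server> addr ...` and `set server <backend>/<server> state ...` are Runtime API commands, but check the management guide for your HAProxy version.

```python
import socket

VALID_STATES = ("ready", "drain", "maint")


def set_server_addr_cmd(backend, server, ip, port):
    """Build a Runtime API command that repoints an existing server slot."""
    return f"set server {backend}/{server} addr {ip} port {port}\n"


def set_server_state_cmd(backend, server, state):
    """Build a Runtime API command that changes a server slot's admin state."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    return f"set server {backend}/{server} state {state}\n"


def send_runtime_command(socket_path, command, timeout=2.0):
    """Send one command to HAProxy's stats socket and return its reply.

    Outside the interactive "prompt" mode, HAProxy answers a single
    command and then closes the connection, so we read until EOF.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example usage against a running HAProxy (hypothetical names and path):
#   cmd = set_server_addr_cmd("be_app", "srv1", "10.0.0.7", 8080)
#   print(send_runtime_command("/var/run/haproxy.sock", cmd))
#   state_cmd = set_server_state_cmd("be_app", "srv1", "ready")
#   print(send_runtime_command("/var/run/haproxy.sock", state_cmd))
```

Because these commands change the running process's state directly, no new process has to start. Changes made this way are not written back to haproxy.cfg, though, so a controller still has to keep the file in sync for the next real reload.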

One thing I wanted to point out is that the HAProxy Ingress Controller actually had over 25 configuration options [2] at the time of publishing, not 8 as stated in the article.

While we have identified a few on our own, we'd love to work with you to identify any missing configuration directives that would allow more accurate benchmarks with the official HAProxy Ingress Controller.

disclosure: I work at HAProxy Technologies

[1] https://www.haproxy.com/blog/dynamic-configuration-haproxy-r...

[2] https://github.com/haproxytech/kubernetes-ingress/tree/maste...

[3] https://github.com/jcmoraisjr/haproxy-ingress#dynamic-scalin...

Thanks! Drop me a line (email in my profile) and would love to chat.

I updated the article to clarify that there were 8 configuration options at the time of testing (we started this effort a while ago) and that there are now 25.

We'd definitely like to rerun the tests with the official controller to use the Runtime API.