| HN Mirror

In a perfect world, I would recommend http2 all the way. But since an HTTP1.1 implementation is much easier to dissect I think it's a better choice for someone writing their own web service.

In response to your first 2 questions:

In front of HAProxy is HTTP2 and in the backend is say 32 long-lived HTTP1.1 connections to your web server each handled on its own thread. HAProxy demuxes the HTTP2 request and then parallelizes the parts over the 32 connections. Then whichever order they come back from the backend is the order in which they are returned to the sender. HAProxy also knows that the connection that just returned is now available to immediately work on a new request. It is not necessary to wait for a request to finish before scheduling the next request on a connection by virtue of pipelining. If you were round robin-ing through your connections and reusing available connections before pipelining, then head-of-line blocking will be fairly rare.

In response to question 3:

Now it's important to remember that thread performance is directly related to the number of physical cores. If the system has 4 cores, then running the web server with 4 threads will perform better than running with 40 threads due to the overhead of context switching. If you have 4 HTTP2 connections using a shared thread-pool to handle the multiplexed requests then there will be a performance hit as each connection waits for a thread to be available as another connection might use multiple threads simultaneously. In the solution I propose, since HTTP1.1 requests are already sequential in nature, each connection can be given its own thread and run as fast as it can. You can't do that with HTTP2 because then you would lose all ability to parallelize over a single connection. Really what it comes down to is scheduling tasks for CPU cores and a thread that has constant work in its queue will work faster than one that continuously returns a value and waits to be given another request to process. However, HTTP2 does have the advantage of scheduling on the thread-level vs the connection level. I suspect this results in better performance under very high loads and when there is a significant size/time-complexity differential between various assets which causes frequent head-of-line blocking.

In response to question 4:

I figured that if you already implemented HTTP2 and you were that concerned with speed that you wouldn't want to introduce a middleman, but HAProxy's overhead is pretty low.