| HN Mirror



	by KirinDave 2655 days ago
	Right, and this is "better", but if you start handling significant volume you're going to want HTTP/2 behind your proxy. Envoy does this, and it's basically a free way to just get more out of your hardware and network.

1 comments

blackflame7000 2654 days ago

Not necessarily. Let's say your backend server is multi-threaded and assigns each connection to its own thread for DoS safety. With HTTP2, you will make one connection to the backend server, so even though you are multiplexing those requests, they’re still being served serially by one thread+socket. Even if you demux and use thread handlers, you have to remux and are still limited to the single socket.

Now if HAProxy makes multiple connections to the backend each one gets served by its own thread+socket and that's going to load much faster because at the very least it's going to get more attention from the OS. Furthermore, if you use the keepAlive header, and set long timeouts, then you don’t even have to pay the connection penalty. So essentially you’ve shifted thread management to HAProxy by virtue of its parallel connections which keeps the webserver code pretty simple. And in a C++ program simplicity is key to correctness


KirinDave 2654 days ago

I'm not sure I agree with most of this post. Firstly, a userland socket mux/demux implementation isn't such an insurmountable challenge. It's essentially the core of a good http/2 implementation. If you've got a good HTTP/2 implementation, you've necessarily got a good mux/demux solution.
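As a rough illustration of what "userland mux/demux" means here, this is a minimal sketch that splits one read buffer into HTTP/2 frames and groups the payloads by stream ID. It relies only on the 9-byte frame header layout from the HTTP/2 spec (24-bit length, 8-bit type, 8-bit flags, reserved bit plus 31-bit stream ID). The helper names `make_frame` and `demux` are made up for this example, and it ignores frame types, flow control, and header compression, so it is nowhere near a full implementation.

```python
import struct
from collections import defaultdict

FRAME_HEADER_LEN = 9  # 3-byte length, 1-byte type, 1-byte flags, 4-byte stream id
DATA = 0x0            # frame type code for DATA frames


def make_frame(stream_id, payload, frame_type=DATA, flags=0):
    """Build one frame: 9-byte header followed by the payload (hypothetical helper)."""
    length = len(payload)
    header = struct.pack(
        ">BHBBI",
        length >> 16, length & 0xFFFF,   # 24-bit payload length
        frame_type, flags,
        stream_id & 0x7FFFFFFF,          # top bit is reserved
    )
    return header + payload


def demux(buffer):
    """Split a read buffer into complete frames grouped by stream id.

    Returns (streams, leftover), where leftover holds any trailing
    partial frame that needs more bytes from the socket.
    """
    streams = defaultdict(list)
    offset = 0
    while len(buffer) - offset >= FRAME_HEADER_LEN:
        hi, lo, _ftype, _flags, sid = struct.unpack_from(">BHBBI", buffer, offset)
        length = (hi << 16) | lo
        end = offset + FRAME_HEADER_LEN + length
        if end > len(buffer):
            break  # frame is incomplete, wait for the next read
        streams[sid & 0x7FFFFFFF].append(buffer[offset + FRAME_HEADER_LEN:end])
        offset = end
    return dict(streams), buffer[offset:]
```

A single read that happens to contain frames from streams 1 and 3 comes back as two per-stream lists, which is the "more than one frame per read" case.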

> Now if HAProxy makes multiple connections to the backend each one gets served by its own thread+socket and that's going to load much faster because at the very least it's going to get more attention from the OS.

I'm not sure what you're basing this on. What is the technical definition of "more attention from the OS"? If anything, limiting things down to a single process over one connection will improve latency. It can help minimize memory copies if you get volume, because you'll get more than one frame per read (and you certainly aren't serving a L>4 protocol out of your NIC). Most importantly, it'll remove connection establishment and teardown costs. These aren't free.

Now, if you really are loading backends so that responsiveness is a problem, you'll need to appeal to whatever load balancing solution you have on hand. But most folks agree that this is faster.

> Furthermore, if you use the keepAlive header, and set long timeouts, then you don’t even have to pay the connection penalty.

But you still have head-of-line blocking, so you're still spamming N connections per client to get that concurrency factor. Connections aren't free; they have state and take up system resources. You'll serve more clients with one backend if you take up fewer resources per connection.

> So essentially you’ve shifted thread management to HAProxy by virtue of its parallel connections which keeps the webserver code pretty simple.

I don't think this is true at all. If you want to serve connections in parallel or access resources relevant to your service in parallel, you're going to need to do that. You can of course choose to NOT do this and spam a billion processes rather than use concurrency.

> And in a C++ program simplicity is key to correctness

I don't think this is unique to C++, but I'm also not sure it's really relevant here. From an application backend's perspective, it's very much the same model. They need a model that supports re-entrant request serving.


shawnz 2654 days ago

Why not just make multiple HTTP2 connections then?


blackflame7000 2654 days ago

First, let me give some background. In the beginning there was HTTP1.0, where you send a request, receive a reply, and then terminate the connection. Every TCP connection needed a 3-packet handshake, so with one request and one reply, 60% of the exchange (3 of 5 packets) was merely getting ready to talk.
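The 60% figure is simple packet counting, and a small sketch makes it easy to see how reusing the connection changes it. This assumes one packet per request and one per reply, and it counts packets rather than wall-clock time, so treat it as back-of-the-envelope only. The function name `setup_share` is made up for the example.

```python
HANDSHAKE_PACKETS = 3   # SYN, SYN-ACK, ACK
REQUEST_PACKETS = 1     # assumption: the request fits in one packet
RESPONSE_PACKETS = 1    # assumption: the reply fits in one packet


def setup_share(requests_per_connection):
    """Fraction of packets spent on the TCP handshake for one connection."""
    useful = requests_per_connection * (REQUEST_PACKETS + RESPONSE_PACKETS)
    return HANDSHAKE_PACKETS / (HANDSHAKE_PACKETS + useful)


print(setup_share(1))    # one request per connection, the HTTP/1.0 case
print(setup_share(10))   # ten requests over one kept-alive connection
```

With one request per connection the handshake is 3 of 5 packets; with ten requests on a reused connection it drops to 3 of 23.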

Http1.1 brought pipelining where you could use the keepalive header and send req1, req2, reqN and then expect to receive reply1, reply2, replyN. The replies are expected in the order they are requested.

Http2 adds a bunch of things. For one, the requests are in a binary format instead of text in order to achieve better compression. Another thing is that it allows multiplexing. This is different from pipelining because now you can receive replies out of order, which allows small files not to get stalled out by large files.
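To make the "small files stalled by large files" point concrete, here is a toy model of one connection that can send one unit of data per tick. Pipelining sends whole replies in request order; multiplexing is modeled as round-robin interleaving, one unit per stream per turn. Both the model and the function names are assumptions for illustration; real HTTP/2 scheduling depends on priorities and flow control.

```python
def pipelined_finish_times(sizes):
    """Replies are sent whole, strictly in request order."""
    done, clock = [], 0
    for size in sizes:
        clock += size
        done.append(clock)
    return done


def multiplexed_finish_times(sizes):
    """Replies are interleaved round-robin, one unit per stream per turn."""
    remaining = list(sizes)
    done = [0] * len(sizes)
    clock = 0
    while any(remaining):
        for i, left in enumerate(remaining):
            if left > 0:
                clock += 1            # one unit of this stream goes on the wire
                remaining[i] -= 1
                if remaining[i] == 0:
                    done[i] = clock   # stream i is fully delivered
    return done


sizes = [100, 1, 1]  # one large reply requested first, then two small ones
print(pipelined_finish_times(sizes))    # [100, 101, 102]
print(multiplexed_finish_times(sizes))  # [102, 2, 3]
```

The total transfer time is the same either way; what changes is that the two small replies no longer wait behind the large one.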

However, HAProxy will demux the http2 requests and separate them into multiple parallel connections to the backend server, where each connection supports pipelining so as not to close immediately. Each request will use a connection that’s free (ie doesn't already have a pipelined req in progress) so this is effectively the same as http2 multiplexing, since HAProxy will send them back to the client in the order they are received from the backend (which isn't necessarily the requested order, aka multiplexed).

The benefit here is that

1) If you have multiple webservers, they don't each have to deal with the muxing/demuxing of streams and converting the binary to http (it might be possible to skip binary translation, but the code will be ugly).

2) If you have parallel connections each using thread pools to handle each request stream then you could start running into thread contention problems

3) You protect your C++ webservice with a battle tested service like HAProxy


KirinDave 2654 days ago

> Http1.1 brought pipelining where you could use the keepalive header and send req1, req2, reqN and then expect to receive reply1, reply2, replyN. The replies are expected in the order they are requested.

Remember though that because of the model, you cannot possibly serve these in parallel. You must serve them serially to be on spec.

> Each request will use a connection that’s free (ie doesn't already have a pipelined req in progress) so this is effectively the same as http2 multiplexing

It's not the same at all though, is it? HTTP/2 doesn't wait for each request to return. You could easily do exactly that same process with HTTP/2, and by decoupling the notion of "utilization" from "that connection is busy", you can actually balance to servers based on more sophisticated metrics.
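One example of a "more sophisticated metric" is the number of requests currently in flight on each backend, rather than a busy/free flag per connection. The sketch below is a hypothetical balancer (the class and method names are invented here, not taken from any proxy) that always picks the backend with the fewest outstanding requests.

```python
class LeastInFlightBalancer:
    """Pick the backend with the fewest outstanding requests (illustrative only)."""

    def __init__(self, backends):
        # in-flight request count per backend name
        self.in_flight = {name: 0 for name in backends}

    def acquire(self):
        """Choose a backend for a new request and count it as in flight."""
        name = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[name] += 1
        return name

    def release(self, name):
        """Mark one request on this backend as finished."""
        self.in_flight[name] -= 1


balancer = LeastInFlightBalancer(["app1", "app2"])
first = balancer.acquire()    # app1 (tie, first in order)
second = balancer.acquire()   # app2
balancer.release(second)      # app2 finishes early
third = balancer.acquire()    # app2 again, since it now has fewer in flight
```

Because a multiplexed connection can carry many streams, the counter can go well above one per backend, which is what lets the balancer see real load instead of just "busy".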

> 2) If you have parallel connections each using thread pools to handle each request stream then you could start running into thread contention problems

You already have these problems with resource management for application services.

> You protect your C++ webservice with a battle tested service like HAProxy

Why does the http/2 architecture not get a load balancer but the HTTP 1.1 architecture does?


blackflame7000 2651 days ago

In a perfect world, I would recommend http2 all the way. But since an HTTP1.1 implementation is much easier to dissect I think it's a better choice for someone writing their own web service.

In response to your first 2 questions:

In front of HAProxy is HTTP2 and in the backend is say 32 long-lived HTTP1.1 connections to your web server each handled on its own thread. HAProxy demuxes the HTTP2 request and then parallelizes the parts over the 32 connections. Then whichever order they come back from the backend is the order in which they are returned to the sender. HAProxy also knows that the connection that just returned is now available to immediately work on a new request. It is not necessary to wait for a request to finish before scheduling the next request on a connection by virtue of pipelining. If you were round robin-ing through your connections and reusing available connections before pipelining, then head-of-line blocking will be fairly rare.
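The dispatch described above can be sketched as a small simulation: every request goes to whichever backend connection frees up first, and replies are returned in completion order. This models the simpler case of one outstanding request per connection with all requests arriving at time 0; the function name and the durations are made up, and it says nothing about how HAProxy actually schedules internally.

```python
import heapq


def dispatch(durations, connections):
    """Assign each request to the earliest-free connection.

    Returns (finish_time, request_index) pairs in completion order,
    i.e. the order the proxy would hand replies back to the client.
    """
    free_at = [(0, conn) for conn in range(connections)]
    heapq.heapify(free_at)
    finished = []
    for index, duration in enumerate(durations):
        start, conn = heapq.heappop(free_at)   # connection that frees up first
        end = start + duration
        finished.append((end, index))
        heapq.heappush(free_at, (end, conn))   # busy until this request ends
    return sorted(finished)


durations = [10, 1, 1, 1]        # one slow request followed by three fast ones
print(dispatch(durations, 1))    # [(10, 0), (11, 1), (12, 2), (13, 3)]
print(dispatch(durations, 2))    # [(1, 1), (2, 2), (3, 3), (10, 0)]
```

With a single connection the fast requests queue behind the slow one; with two, they finish first and come back out of request order.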

In response to question 3:

Now it's important to remember that thread performance is directly related to the number of physical cores. If the system has 4 cores, then running the web server with 4 threads will perform better than running with 40 threads due to the overhead of context switching. If you have 4 HTTP2 connections using a shared thread-pool to handle the multiplexed requests then there will be a performance hit as each connection waits for a thread to be available as another connection might use multiple threads simultaneously. In the solution I propose, since HTTP1.1 requests are already sequential in nature, each connection can be given its own thread and run as fast as it can. You can't do that with HTTP2 because then you would lose all ability to parallelize over a single connection. Really what it comes down to is scheduling tasks for CPU cores and a thread that has constant work in its queue will work faster than one that continuously returns a value and waits to be given another request to process. However, HTTP2 does have the advantage of scheduling on the thread-level vs the connection level. I suspect this results in better performance under very high loads and when there is a significant size/time-complexity differential between various assets which causes frequent head-of-line blocking.
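The trade-off between a shared thread pool and one thread per connection can be shown with a deterministic toy model. It assumes all tasks are queued at time 0, that each dedicated thread gets its own core, and it ignores context-switch cost entirely, so it only illustrates who waits for whom. Function names and task durations are invented for the example.

```python
import heapq
from collections import defaultdict


def shared_pool_finish(tasks, workers):
    """tasks: (connection, duration) pairs in submission order.

    All connections share one pool; returns when each connection's
    last task completes.
    """
    free_at = [0] * workers
    heapq.heapify(free_at)
    finish = defaultdict(int)
    for conn, duration in tasks:
        start = heapq.heappop(free_at)     # next worker to become idle
        end = start + duration
        heapq.heappush(free_at, end)
        finish[conn] = max(finish[conn], end)
    return dict(finish)


def dedicated_thread_finish(tasks):
    """Each connection has its own thread and runs its tasks serially."""
    finish = defaultdict(int)
    for conn, duration in tasks:
        finish[conn] += duration
    return dict(finish)


tasks = [("A", 5), ("A", 5), ("B", 1)]
print(shared_pool_finish(tasks, workers=2))  # {'A': 5, 'B': 6}
print(dedicated_thread_finish(tasks))        # {'A': 10, 'B': 1}
```

In the shared pool, connection B's short task waits behind A's burst, while A itself finishes sooner; with dedicated threads the opposite holds, which matches both halves of the argument above.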

In response to question 4:

I figured that if you already implemented HTTP2 and you were that concerned with speed that you wouldn't want to introduce a middleman, but HAProxy's overhead is pretty low.


shawnz 2654 days ago

I understand what you are saying. But why do you need to use http 1.1 behind the proxy for this setup? Why can't you use http 2 both behind and in front of the reverse proxy, but still demux into multiple connections at the proxy? Just because http 2 supports multiplexing, doesn't mean that you need to use it.


blackflame7000 2654 days ago

You definitely can, I'm just arguing that the complexity involved to implement the feature is not worth the real world performance gain. Furthermore, by adding that complexity you make it that much harder to make sure the program is free of undefined behavior.


shawnz 2654 days ago

Fair point, although this doesn't need to be implemented in the application code itself. In reality it should be implemented by an HTTP library like this, and any general purpose HTTP library ought to implement HTTP 2 anyway. So it could end up being that by consistently using HTTP 2 across the whole stack, you end up with less complexity. Plus you get the potential performance gains, even if small.
