Hacker News
by nocarrier 3619 days ago
This works for your use case, which is great, but in my experience you get gains from full http2 between edge and origin that you miss out on if you demux or fall back to http 1.1. We saw slightly better median latencies and much better >90th percentile latencies with full http2 from edge to origin than when we did http2 between users and edge and http 1.1 to origin.

At Facebook, we had clients talk http2 to edge nodes near the users. The edge terminated TLS and opened up the request to figure out which upstream was best to forward it to. We kept http2 conn pools to each upstream from a given edge location. We also had tunable amounts of domain sharding so that the client could open N http2 conns at once to a given edge hostname (cdn[1-4].facebook.com vs just api.facebook.com or www.facebook.com). This helped with head of line blocking, assuming that your ISP wasn't hopelessly over-congested. If it is, no amount of sharding will help you, and sometimes sharding even works against you depending on what tcp middleware your mobile ISP is using.
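To make the sharding idea concrete, here is a minimal sketch of how a client might pick one of N shard hostnames for a resource. The hostname pattern comes from the example above. The function name `shard_host` and the CRC32-based mapping are assumptions for illustration, not a description of what Facebook actually ran.

```python
import zlib


def shard_host(path: str, shards: int = 4, pattern: str = "cdn{}.facebook.com") -> str:
    """Map a resource path to one of `shards` hostnames (hypothetical helper).

    CRC32 is used because it is stable across processes, so the same path
    always lands on the same hostname and stays cacheable.
    """
    if shards < 1:
        raise ValueError("shards must be >= 1")
    # Shard numbers are 1-based to match cdn1..cdnN.
    index = zlib.crc32(path.encode("utf-8")) % shards + 1
    return pattern.format(index)


if __name__ == "__main__":
    for p in ("/static/app.js", "/static/app.css", "/img/logo.png"):
        print(p, "->", shard_host(p))
```

Making `shards` a parameter is what keeps the amount of sharding tunable: setting it to 1 collapses everything back onto a single hostname and a single connection.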

We used different TCP settings for edge<->origin conns than for edge<->user conns, which let us balance between latency and throughput for origin traffic as needed, and we did some interesting work tuning edge tcp settings differently for different user networks too. Things like determining the best congestion window for a given ISP's users, how large to set the send buffers, etc.
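As a rough illustration of per-link TCP tuning, the sketch below applies a different socket profile depending on which side of the edge a connection faces. The profile names and the numbers are made up for the example. Only the send buffer and Nagle settings are shown, since those are plain socket options; the congestion window is not something a standard socket option exposes (on Linux the initial window is usually set per route instead).

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpProfile:
    send_buffer_bytes: int  # requested SO_SNDBUF size
    no_delay: bool          # True disables Nagle's algorithm (TCP_NODELAY)


# Hypothetical values, chosen only to show two different profiles.
PROFILES = {
    "edge_to_origin": TcpProfile(send_buffer_bytes=256 * 1024, no_delay=False),
    "edge_to_user": TcpProfile(send_buffer_bytes=64 * 1024, no_delay=True),
}


def make_socket(profile_name: str) -> socket.socket:
    """Create an unconnected TCP socket with the named profile applied."""
    profile = PROFILES[profile_name]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The kernel may round, double, or clamp the requested buffer size.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, profile.send_buffer_bytes)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if profile.no_delay else 0)
    return sock


if __name__ == "__main__":
    for name in PROFILES:
        s = make_socket(name)
        print(name, s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
        s.close()
```

Tuning per user network would then amount to keying the profile on something like the client's ISP rather than on a fixed name.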

Admittedly, this work was much easier since we had good http and tcp instrumentation, and we had high quality links between edge and origin. I just wanted people reading your comment to understand the tradeoffs. We saw a measurable difference in perf with a full http2 stack that made up for the extra complexity of running http2 top to bottom.

1 comments

Your experience with http 1.1 to origin matches mine. I'm not suggesting using http 1.1 from edge > origin. Most CDNs are currently doing that, and it doesn't perform well.

I should be more specific in my description: We are using http2 everywhere.

There's a single http2 connection between the client and edge (the customer could set up sharding, but there's no way for us to force it).

That edge node has a pool of http2 connections to other PoPs, and maintains QoS stats for each PoP. It'll pick the fastest path to send the request. The pool is large enough to service all of the requests that are being handled at any one time. So in effect, if the client sends 10 requests to the edge node, each of those will be sent edge<->edge over 10 http2 connections. (The connections in the pool are kept active, so there's no tls resumption delay either.)
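A minimal sketch of the "keep QoS stats per PoP and pick the fastest" step might look like this. The class name `PopSelector`, the PoP names, and the use of an exponentially weighted moving average of latency are all assumptions for illustration; the real stats and selection logic are not described here.

```python
from typing import Dict, Optional


class PopSelector:
    """Track a smoothed latency per PoP and pick the lowest (hypothetical)."""

    def __init__(self, alpha: float = 0.2) -> None:
        # alpha is the weight given to each new sample in the moving average.
        self.alpha = alpha
        self.latency_ms: Dict[str, float] = {}

    def record(self, pop: str, sample_ms: float) -> None:
        """Fold one latency measurement into the running average for `pop`."""
        previous = self.latency_ms.get(pop)
        if previous is None:
            self.latency_ms[pop] = sample_ms
        else:
            self.latency_ms[pop] = (1 - self.alpha) * previous + self.alpha * sample_ms

    def fastest(self) -> Optional[str]:
        """Return the PoP with the lowest smoothed latency, or None if empty."""
        if not self.latency_ms:
            return None
        return min(self.latency_ms, key=lambda pop: self.latency_ms[pop])


if __name__ == "__main__":
    selector = PopSelector(alpha=0.5)
    selector.record("pop-a", 40.0)
    selector.record("pop-b", 30.0)
    print(selector.fastest())
```

The smoothing means a single slow response doesn't immediately reroute traffic, while a sustained slowdown on one path eventually does.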

Once the edge node near the origin has received it, there's a smaller http2 pool to the origin server.

So at the origin, it's handling a handful of requests in parallel, even if there's only a single client making all of the requests.

I think that's a bit clearer. I've been happy with the performance we're getting from this, but plan to test sctp and quic later this year for edge<->edge.

Ah cool, that's similar to what we did. I think http2 gets a bad rap; we saw a lot more positives than negatives from it. You should do a writeup after you test sctp and quic. I'd be pretty interested in reading it, and I bet others would too.