
by kdunglas 2052 days ago

Server Push has a use case for web APIs. I just published a benchmark showing that, under certain conditions, APIs using Server Push (such as APIs implementing the https://vulcain.rocks specification) can be 4x faster than APIs generating compound documents (GraphQL-like): https://github.com/dunglas/api-parallelism-benchmark

The key point for performance is to send relations in parallel in separate HTTP streams. Even without Server Push, Vulcain-like APIs are still faster than APIs relying on compound documents, thanks to Preload links and to HTTP/2 / HTTP/3 multiplexing.

Using Preload links also fixes the over-pushing problem (pushing a relation already in a server-side or client-side cache) and some limitations regarding authorization (by default most servers propagate neither the Authorization HTTP header nor cookies in the push request), and is easier to implement.

(By the way Preload links were supported from day 1 by the Vulcain Gateway Server.)

However, using Preload links introduces a bit more latency than using Server Push. Is the theoretical performance gain of Server Push worth the added complexity? To be honest, I don't know. I guess it isn't.

Using Preload links combined with Early Hints (the 103 status code - RFC 8297) may totally remove the need for Server Push. And Early Hints are way easier than Server Push to implement (it's even possible in PHP!).
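To make the idea concrete, here is a minimal sketch of what a server would put on the wire for Early Hints followed by the final response. It only builds the raw HTTP/1.1 bytes; the function name and the resource paths (`/authors/1`, `/publishers/7`) are made up for illustration and are not taken from Vulcain.

```python
def early_hints_then_final(preload_paths, body):
    """Build the raw HTTP/1.1 bytes: a 103 Early Hints response, then the 200.

    preload_paths are hypothetical relation URLs the client should start
    fetching while the server is still preparing the final response.
    """
    # One Link header per relation, using the preload relation type.
    links = "".join(
        f"Link: <{path}>; rel=preload; as=fetch\r\n" for path in preload_paths
    )
    # Informational response: status line, Link headers, empty line, no body.
    early = f"HTTP/1.1 103 Early Hints\r\n{links}\r\n"
    # Final response repeats the Link headers and carries the document.
    final = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        f"{links}"
        "\r\n"
        f"{body}"
    )
    return (early + final).encode()


if __name__ == "__main__":
    raw = early_hints_then_final(["/authors/1", "/publishers/7"], '{"id": 1}')
    print(raw.decode())
```

The client can start requesting the hinted relations as soon as the 103 arrives, without the server having to push anything.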

Unfortunately browsers don't support Early Hints yet.

- Chrome bug: https://bugs.chromium.org/p/chromium/issues/detail?id=671310

- Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1407355

For the API use case, it would be nice if Blink added support for Early Hints before killing Server Push!

1 comments

littlecranky67 2052 days ago

I'm sorry to disappoint you, but your benchmark methodology is flawed. You did not consider TCP congestion control/window scaling. TCP connections between two peers are "cold" (=slow) after the 3-way handshake, and it takes several round trips to "warm" them up (allow data to be sent at a rate that saturates your bandwidth). The mistake you (and most other people performing HTTP load benchmarks) made is overlooking that the kernel (Linux, but also all other major OS kernels) caches the state of the "warm" connection based on the IP address. So basically, when you run this kind of benchmark with 1000 subsequent runs, only your first run uses a "cold" TCP connection. All other 999 runs will reuse the cached TCP congestion control send window and start with a "hot" connection.

The bad news: for website requests <2MB, you spend most of your time waiting for the round trips to complete, that is, you spend most of the time warming up the TCP connection. So it's very likely that if you redo your benchmarks clearing the window cache between runs (google tcp_no_metrics_save) you will get completely different results.

Here is an analogy: if you want to compare the acceleration of 2 cars, you would have to race them from point A to point B, starting at a velocity of 0 mph at point A, and measure the time it takes to reach point B. In your benchmark, you basically allowed the cars to start 100 meters before point A and measured the time between passing point A and point B. Admittedly, for cars, acceleration decreases with increasing velocity; for TCP it's the other way around: the amount of data allowed to be sent per round trip gets larger with every round trip (usually somewhat exponentially).
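A rough way to see the effect is to count round trips under an idealized slow start. The sketch below assumes an initial congestion window of 10 segments of 1460 bytes that doubles every round trip, and ignores the handshake, packet loss, the receive window and the slow-start threshold; the function name and parameters are illustrative only.

```python
def round_trips_to_send(total_bytes, init_cwnd_segments=10, mss=1460):
    """Count round trips needed to deliver total_bytes under idealized slow start.

    Assumptions (not measured): the window starts at init_cwnd_segments * mss
    bytes and doubles after every round trip, with no loss and no other limit.
    """
    cwnd = init_cwnd_segments * mss  # bytes allowed in the first round trip
    sent = 0
    rtts = 0
    while sent < total_bytes:
        sent += cwnd   # send a full window in this round trip
        cwnd *= 2      # the window doubles for the next round trip
        rtts += 1
    return rtts


if __name__ == "__main__":
    # Cold connection vs. a connection whose cached window already covers 2 MB.
    print(round_trips_to_send(2_000_000))
    print(round_trips_to_send(2_000_000, init_cwnd_segments=1370))
```

Under these assumptions a cold connection needs 8 round trips for a 2 MB response, while a connection that starts with a large cached window needs 1, which is the gap a benchmark on "hot" connections hides.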


kdunglas 2052 days ago

Hi, and thanks for the feedback.

I'm aware of this "issue" (I should mention it in the repo, and I will). However, I don't think it matters much for a web API: in most cases, inside web browsers, the TCP connection will already be "warmed" when the browser sends the first (and subsequent) requests to the API, because the browser will have loaded the HTML page, the JS code, etc., usually from the same origin. And even if that isn't the case (mobile apps, API served from a third-party origin...), only the first requests will have to "warm" the connection (it doesn't matter whether you use compound or atomic documents then); all subsequent requests, during the lifetime of the TCP connection, will use a "warmed" connection.

Or am I missing something?

Anyway, a PR to improve this benchmark (which aims at measuring the difference - if any - between serving atomic documents vs serving compound documents in real-life use cases) and show all cases will be very welcome!


littlecranky67 2052 days ago

What you say would be true if images/JS/CSS were truly served from the same IP address (not hostname!). In reality, people use CDNs to deliver static assets like images/JS/CSS, so only the API calls themselves warm up the TCP connection to the actual data backend. Also, things like DNS load balancing would break the warm-up, because the congestion control caches operate on IPs, not hostnames.

Additionally, it's really hard to benchmark and claim something is "faster". You will always measure under the same networking conditions (latency, packet loss rate, bandwidth). So if a benchmark between two machines yields faster results using technology A, the same benchmark may return completely different results for different link parameters. Point being: optimizing for a single set of link parameters is not meaningful; you'd have to vary the network link conditions and find some kind of metric to determine what "fast" means: an average across all parameters? Or rather weighted averages depending on the 98th percentile of your user base, etc.?

Regarding improving the benchmarks: it is really hard, since (a) Docker images cannot really modify TCP stack settings on the Docker host and (b) client and server would have to flush their TCP congestion control caches at the same time, and only after both have flushed can the next run be conducted.

EDIT: Regarding serving static assets to warm up the connection: in that case, you'd have to include the time to download those assets (including the time to parse and execute the JS) in your measurement and in the overall time comparison. Switching the API protocol from REST to something else will probably not have that big an impact on the total load time then. In other words: if you spend 80% of your time downloading index.html, CSS, some JavaScript, etc., and querying your API accounts for only 20% of the time, you can only optimize that 20%. Even if you cut load times for the API calls in half, the whole page load would take only 10% less time.
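The arithmetic behind that last figure, as a small sketch (the function name is made up for illustration; the 20% share and the 2x speedup are the example numbers from this comment, not measurements):

```python
def overall_time_saved(api_share, api_speedup):
    """Fraction of total page-load time saved when only the API part gets faster.

    api_share:   fraction of total load time spent on API calls (0.2 = 20%).
    api_speedup: how many times faster the API calls become (2 = halved time).
    """
    # The API part shrinks from api_share to api_share / api_speedup;
    # everything else (HTML, CSS, JS) is unchanged.
    return api_share * (1 - 1 / api_speedup)


if __name__ == "__main__":
    # 20% of the time in API calls, API calls twice as fast.
    print(f"{overall_time_saved(0.2, 2):.0%}")
```

With these numbers the saving is 10% of the total load time, and even an infinitely fast API could never save more than the 20% share itself.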
