by TechTeam12 2979 days ago
>http2_push /static/css/main.css;

Silly question, but what's the use case for HTTP/2 Push? Their example with pushing doesn't make sense to me. Why would you want to push static content?

10 comments

In 2016, the Chromium team at Google produced a document [1] that examines use cases for HTTP/2 Push, discusses deployment models, and analyzes whether it's worth it. In this particular case, you'd push static content because you know it will be needed later: the server sends it on its own stream alongside the HTML response instead of waiting for the browser to find the reference in the body and request it, so by the time 'main.css' is needed, the UA may already have the file.

That being said, I fail to see how, in the general case, statically configuring pushes in the server software's config is useful [2][3], and I wish more implementations converged on a common way of describing what to push [4], so that tools could be built around discovering dependencies and around interpreting that manifest to execute pushes.

[1] https://docs.google.com/document/d/1K0NykTXBbbbTlv60t5MyJvXj...
[2] https://news.ycombinator.com/item?id=14077955#14081237
[3] https://news.ycombinator.com/item?id=12719563#12722383
[4] https://github.com/GoogleChromeLabs/http2-push-manifest

Pushes are probably best implemented in a caching layer rather than by manually describing what to push. A web server should not just cache resources, but also learn which resources are often requested with each page and push those the next time someone requests that page. Some sort of push prediction policy should also be configurable.
It's not sensible for pushes to be implemented in a caching layer, because pushes are effectively manual overrides of the User-Agent's own caching; conversely, the User-Agent's cache is perfectly appropriate as a cache, and doesn't need HTTP/2 Push to work. HTTP/2 Push is effectively the server declaring that it knows better, so it primes the UA's cache to avoid additional round trips.

Nginx does have a module [1] and a corresponding configuration option (http2_push_preload) that scans outgoing headers for Link header preload directives and pushes the resources declared there. The nginx post explains the justification for this feature, and they too admit that statically configuring pushes in the server config is not terribly useful: it's quite often the wrong place to specify relationships between resources.

[1] https://www.nginx.com/blog/nginx-1-13-9-http2-server-push/
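As a rough illustration of the preload-to-push conversion described above, here is a minimal Python sketch that extracts push candidates from a Link header value. It is a simplified parser, not nginx's implementation: it only handles the common `<url>; rel=preload; as=...` shape, and it treats a `nopush` parameter as an opt-out (a convention some servers use; treat that detail as an assumption). `preload_targets` is a hypothetical name.

```python
import re

# Matches one "<url>; param; param" entry. Simplified: it does not handle
# every RFC 8288 edge case (e.g. commas or semicolons inside quoted values).
LINK_RE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def preload_targets(link_header: str) -> list:
    """Return the URLs marked rel=preload that are not marked nopush."""
    targets = []
    for match in LINK_RE.finditer(link_header):
        url, params = match.group(1), match.group(2)
        # "; rel=preload; as=style" -> ["rel=preload", "as=style"]
        tokens = [p.strip().lower() for p in params.split(";") if p.strip()]
        rels = set()
        for tok in tokens:
            if tok.startswith("rel="):
                # rel may be quoted and may hold several space-separated values.
                rels.update(tok[4:].strip('"').split())
        if "preload" in rels and "nopush" not in tokens:
            targets.append(url)
    return targets
```

For example, a header listing a preloaded stylesheet, a preloaded script marked `nopush`, and a prefetched page would yield only the stylesheet.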

If you can predict with high enough accuracy what resource is going to be requested by the client next, I don't see why pushing it would be a bad idea. Speculation is how we hide latency after all.

And if you think about it, static pushes in general have very limited usefulness, almost nonexistent. Imagine some URL becomes popular and almost all requests to it come from people who have never visited the website before. It would make sense for a web server to learn which resources clients request with that URL and start pushing those resources to people ahead of time.
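A minimal Python sketch of the kind of learning described here. `PushLearner`, `threshold` and `min_samples` are hypothetical names; the threshold plays the role of the configurable push prediction policy mentioned earlier in the thread. It assumes the server can attribute each follow-up request to the page that triggered it (for example via the Referer header).

```python
from collections import Counter, defaultdict


class PushLearner:
    """Hypothetical sketch: learn which subresources tend to accompany a page."""

    def __init__(self, threshold: float = 0.8, min_samples: int = 20):
        self.threshold = threshold      # fraction of page views that must request the asset
        self.min_samples = min_samples  # don't push anything until we've seen enough views
        self.page_views = Counter()
        self.co_requests = defaultdict(Counter)

    def record(self, page: str, subresources: set) -> None:
        """Record one page view and the subresources requested alongside it."""
        self.page_views[page] += 1
        for res in subresources:
            self.co_requests[page][res] += 1

    def push_list(self, page: str) -> list:
        """Assets requested on at least `threshold` of this page's views, sorted."""
        views = self.page_views[page]
        if views < self.min_samples:
            return []
        return sorted(
            res for res, n in self.co_requests[page].items()
            if n / views >= self.threshold
        )
```

With a threshold of 0.8, an asset fetched on nearly every view of a page gets pushed, while an image fetched on only a few views does not; `min_samples` keeps the server from guessing on too little data.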

For that, it's easier to parse the outgoing content. If it's HTML, catch stylesheets, JS, and other static <img src=.../> things. It doesn't have to be flawless; after all, it's just a speed-up. (And if you want a speed-up, write nice markup.)
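A best-effort version of that parsing step, sketched with Python's standard `html.parser`. It only looks at `<link rel="stylesheet" href>`, `<script src>` and `<img src>`, ignores CSS `@import`s and anything added by JavaScript, and does no origin checks; as the comment says, it doesn't have to be flawless. `find_subresources` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class SubresourceFinder(HTMLParser):
    """Collect stylesheet, script and image URLs from served HTML, in order."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes (e.g. async) map to None
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower().split():
            url = a.get("href")
        elif tag in ("script", "img"):
            url = a.get("src")
        else:
            url = None
        if url and url not in self.found:
            self.found.append(url)


def find_subresources(html: str) -> list:
    finder = SubresourceFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Self-closing tags such as `<img src="/logo.png"/>` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.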

Similarly, it's the backend behind the reverse proxy that knows which page has just been rendered and knows about the user's session (is it brand new, or not new but old enough that things still need to be pushed because that particular page's background has changed since, etc.).

And in the case of an Angular/React SPA, the "bundler/compiler" should create a list of things to push for various URLs, or the Angular/React teams should talk with the Nginx team about how to speed things up. (With SSR, server-side rendering, the Node.js server can emit the necessary Link headers, for example.)
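For the SSR case, a sketch of how a server could turn a per-route asset list (assumed to come from the bundler) into a Link header. `ASSETS_BY_ROUTE` and `link_header` are hypothetical names, and the mapping's shape is an assumption, not any particular bundler's output format.

```python
from typing import Optional

# Hypothetical bundler output: the assets each route needs, with their `as` type.
ASSETS_BY_ROUTE = {
    "/": [("/static/css/main.css", "style"), ("/static/js/main.js", "script")],
    "/about": [("/static/css/main.css", "style")],
}


def link_header(route: str) -> Optional[str]:
    """Build a Link header value with one rel=preload entry per asset, or None."""
    assets = ASSETS_BY_ROUTE.get(route)
    if not assets:
        return None
    return ", ".join(f"<{path}>; rel=preload; as={kind}" for path, kind in assets)
```

A front proxy that converts preload links into pushes (nginx's `http2_push_preload`, for instance) could then act on the header without any per-page server config.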

Come on, how is parsing things easier than gathering some very basic stats?
> It's not sensible for pushes to be implemented in a caching layer, because pushes are effectively the manual overrides to the User-Agent's own caching; conversely, the User-Agent's cache is perfectly appropriate as a cache, and doesn't need HTTP/2 Push to work.

By "caching layer" he probably meant a proxy or load balancer level cache not the user agent cache. It would make total sense for a load balancer to statistically discover relationships.

Once you have a config setting, you've done all the work to actually get Push support, which is the hard part. Support for reading a manifest can be added later, or other people can write tools to read manifests and generate config files for the server.
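To make the "generate config files from a manifest" idea concrete, here is a small Python sketch that turns a push manifest into nginx `location` blocks using the `http2_push` directive quoted at the top of the thread. The manifest shape (`{page: {resource: {...}}}`) is loosely modeled on the http2-push-manifest project linked earlier, but treat the exact schema as an assumption; `manifest_to_nginx` is a hypothetical name.

```python
import json


def manifest_to_nginx(manifest_json: str) -> str:
    """Turn {page: {resource: {...}}} into nginx location blocks with http2_push lines."""
    manifest = json.loads(manifest_json)
    blocks = []
    for page, resources in sorted(manifest.items()):
        lines = [f"location = {page} {{"]  # exact-match location for the page
        for resource in sorted(resources):
            lines.append(f"    http2_push {resource};")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Each generated block would still need the directives that actually serve the page (`root`, `proxy_pass`, and so on) merged in.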
Let's say you have an HTML page which links to main.css. Ordinarily, the request goes:

    Client: GET /index.html
    Server: <index.html>
    Client (after parsing): GET /main.css
    Server: <main.css>
Loading the page thus takes 2 round trips, one for the main page and one for the stylesheet. (Or more if you have, e.g., includes in the CSS.) Here's what it would look like with HTTP/2 Push:

    Client: GET /index.html
    Server: <index.html>, <main.css> (PUSH)
This only takes 1 round trip; since the server knows that main.css will be required shortly it can preemptively send it. In particular, this might offer a significant speedup for high-latency connections; it also theoretically reduces the need for bundling tools since you can have the server just push all of the individual files.
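The same counting generalizes to deeper dependency chains. A toy Python model, assuming each fetch costs exactly one round trip and ignoring handshakes, bandwidth, and server think time: without push, the count is the depth of the dependency tree; if the server pushes everything, it is one. `round_trips` and `DEPS` are hypothetical names.

```python
def round_trips(deps: dict, url: str, push_all: bool = False) -> int:
    """Sequential round trips to load `url` and everything it (transitively) references.

    Toy model: each request costs one round trip, a child is only requested after
    its parent has arrived and been parsed, and independent siblings are fetched
    in parallel. With push_all=True the server sends everything with the first response.
    """
    if push_all:
        return 1
    children = deps.get(url, [])
    if not children:
        return 1
    return 1 + max(round_trips(deps, child) for child in children)


# Hypothetical page: index.html links main.css and app.js; main.css imports a font.
DEPS = {
    "/index.html": ["/main.css", "/app.js"],
    "/main.css": ["/font.woff2"],
}
```

For `DEPS` that's 3 round trips versus 1; at a hypothetical 100 ms RTT, roughly 300 ms versus 100 ms of pure latency.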

The obvious problem with this scheme is that if the client already has main.css then it's a waste of bandwidth to send it again. The client can cancel the push, but by the time it finds out about it a bunch of data has already been sent. There is a proposal for 'Cache Digests' which will allow the client to send a Bloom filter of its cache so the server can tell whether or not it has the file already, but as far as I'm aware no major client or server has implemented this yet.
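A sketch of the idea behind Cache Digests: the client summarizes its cache in a Bloom filter and the server skips pushes for anything the filter claims is present. This uses a plain Bloom filter with SHA-256-derived bit positions for illustration; the Cache Digests drafts define their own encoding, so this is not the wire format, and the class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter standing in for a client's cache digest."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(item))


def resources_to_push(candidates, client_digest: BloomFilter) -> list:
    """Server side: push only what the client's digest does not claim to have."""
    return [url for url in candidates if not client_digest.might_contain(url)]
```

The asymmetry is the point: a Bloom filter never reports a stored item as absent, so a cached file is never re-pushed, while a false positive only costs a skipped push, after which the browser requests the file normally.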

I think there's a common misconception with the term "push". HTTP2 doesn't push in terms of a push notification, but rather "pushes" assets down the connection that are known to be needed by the currently transferred document (whatever that may be).

This way, the web server can proactively push the named stylesheet to the client, since it knows the stylesheet is needed to render the page, and the client doesn't have to ask the server for it (which would cost a new round trip).

"that are known to be needed by the currently transferred document"

How does the server know what the browser/client "needs"? The client may already have the stylesheet cached. Putting the server "in control" seems wrong and makes things even more complicated.

That's the thing: it doesn't. HTTP/2 Push is one of the big open questions of HTTP/2, and knowing whether a document needs to be pushed relies on good heuristics and black magic.

There is, however, a small standard emerging, pioneered by h2o, called casper (https://h2o.examp1e.net/configure/http2_directives.html#http...). The idea is that all resources ever sent to the client are recorded in a probabilistic data structure stored in a cookie. On every request the cookie is sent back to the server, which can then check whether the browser is likely to already have the resource.
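A rough sketch of the casper approach in Python: decode a fingerprint from the cookie, push only resources whose bit isn't set, set their bits, and send the updated cookie back. The 256-bit single-hash bitset, the base64 cookie encoding, and the function names are all illustrative assumptions; h2o's real implementation uses its own encoding.

```python
import base64
import hashlib
from typing import Optional

BITS = 256  # hypothetical fingerprint size; real deployments trade this off against cookie size


def _bit(path: str) -> int:
    """Map a path to one bit position in the fingerprint."""
    return int.from_bytes(hashlib.sha1(path.encode()).digest()[:4], "big") % BITS


def decode_cookie(value: Optional[str]) -> int:
    if not value:
        return 0
    return int.from_bytes(base64.urlsafe_b64decode(value), "big")


def encode_cookie(bits: int) -> str:
    return base64.urlsafe_b64encode(bits.to_bytes(BITS // 8, "big")).decode()


def plan_pushes(cookie_value: Optional[str], candidates: list):
    """Return (paths to push, new cookie value) for one request."""
    bits = decode_cookie(cookie_value)
    to_push = []
    for path in candidates:
        b = _bit(path)
        if not bits >> b & 1:
            to_push.append(path)
            bits |= 1 << b  # remember that this client has now been sent the file
    return to_push, encode_cookie(bits)
```

Note that the cookie is updated as soon as the push is planned, so a push the client cancels is still recorded, and a hash collision between two paths means one of them may never be pushed to that client.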

By the way, there are some benchmarks by h2o's author here: http://blog.kazuhooku.com/2015/10/performance-of-http2-push-.... Draw your own conclusions.

The client can cancel the push. But yes, there's definitely wasted bandwidth here - the reason to still do it is that connections are now fast enough that the extra download time is small compared to the time required to parse HTML/send new HTTP request/receive response/render CSS.
That's a very Silicon Valley way to look at things. How much does that bandwidth cost on dial-up or 2G?
The client will cancel the push (reset that stream) if it has the file cached, sooooo, not much.
I'm more concerned with a more "traditional" setup - say a festival providing WiFi to many people through limited upstream. Used to be, you could provide a caching proxy locally.

With the war on MITM, it's really hard to set up something that scales traffic this way, even if the actual data requested by clients could readily be shared.

I know it's a trade-off between security and features - but it still makes me sad.

It's 2G. By the time the cancel is received by the server, the server will have sent the resource, the bytes will have traveled and the user will be billed.
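The 2G point can be put in numbers. A back-of-the-envelope Python sketch, assuming the server starts streaming the pushed body right after the PUSH_PROMISE and the client sends RST_STREAM as soon as the promise arrives: roughly one round trip's worth of data is already on the wire before the cancel lands. The function name and the figures below are hypothetical.

```python
def bytes_sent_before_cancel(bandwidth_bps: float, rtt_s: float) -> float:
    """Rough upper bound on pushed bytes already sent when the client's cancel arrives.

    Assumes a saturated link and an immediate RST_STREAM from the client; ignores
    HTTP/2 flow-control windows, which can cap this further.
    """
    return bandwidth_bps / 8 * rtt_s
```

With hypothetical 2G-ish figures of 50 kbit/s and a 500 ms RTT, that's about 3,125 bytes. HTTP/2 flow control (the default initial stream window is 65,535 bytes) caps it further, and a resource smaller than this arrives in full anyway.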
A copy of the spec can be found here:

https://http2.github.io/http2-spec/#PushResources

There are a few interesting things here that I want to point out:

* "A client can request that server push be disabled." This is an explicit setting the client can send to the server on any connection: https://http2.github.io/http2-spec/#SETTINGS_ENABLE_PUSH.

* "Pushed responses that are cacheable (see [RFC7234], Section 3) can be stored by the client, if it implements an HTTP cache. Pushed responses are considered successfully validated on the origin server (e.g., if the "no-cache" cache response directive is present ([RFC7234], Section 5.2.2)) while the stream identified by the promised stream ID is still open"

Note that pushed content starts with a PUSH_PROMISE frame sent to the client, which the client can decide of its own volition to reject. The spec for the PUSH_PROMISE frame is here, https://http2.github.io/http2-spec/#PUSH_PROMISE, and the frame is extremely small. Even on 2G or dial-up it's negligible by design.

* "Once a client receives a PUSH_PROMISE frame and chooses to accept the pushed response, the client SHOULD NOT issue any requests for the promised response until after the promised stream has closed.

If the client determines, for any reason, that it does not wish to receive the pushed response from the server or if the server takes too long to begin sending the promised response, the client can send a RST_STREAM frame, using either the CANCEL or REFUSED_STREAM code and referencing the pushed stream's identifier. "
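The two client-side controls quoted above are ordinary HTTP/2 frames. A minimal Python sketch that serializes them with `struct`, using the frame layout and code values from the spec: a SETTINGS frame with SETTINGS_ENABLE_PUSH set to 0, and an RST_STREAM on a promised stream. The helper names are hypothetical, and a real client would send these through its HTTP/2 library rather than building bytes by hand.

```python
import struct

# Frame types, setting identifier, and error codes from the HTTP/2 spec (RFC 7540).
FRAME_RST_STREAM = 0x3
FRAME_SETTINGS = 0x4
SETTINGS_ENABLE_PUSH = 0x2
REFUSED_STREAM = 0x7
CANCEL = 0x8


def encode_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    """Serialize one frame: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id."""
    length = struct.pack(">I", len(payload))[1:]  # drop the top byte for a 24-bit length
    header = length + struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def disable_push() -> bytes:
    """SETTINGS frame (always on stream 0) telling the server not to send PUSH_PROMISE."""
    payload = struct.pack(">HI", SETTINGS_ENABLE_PUSH, 0)  # 16-bit id, 32-bit value
    return encode_frame(FRAME_SETTINGS, 0, 0, payload)


def reject_push(promised_stream_id: int, error_code: int = CANCEL) -> bytes:
    """RST_STREAM on the promised stream, with CANCEL or REFUSED_STREAM."""
    if error_code not in (CANCEL, REFUSED_STREAM):
        raise ValueError("expected CANCEL or REFUSED_STREAM")
    return encode_frame(FRAME_RST_STREAM, 0, promised_stream_id, struct.pack(">I", error_code))
```

Both frames are tiny: 15 bytes for the SETTINGS frame and 13 for the RST_STREAM.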

Wittingly or otherwise, your message comes across as "everyone on the standards boards are idiots, don't think about anything beyond the valley, and I'm smarter than they are." That's beyond ridiculous. The standard was designed by subject matter experts from right across the world, with interests in web technologies across all sorts of markets, including the developing nations where every single byte is important. A lot has been designed into the HTTP/2 specification to account for that and to explicitly try to improve end user experience under those conditions.

The server doesn't send the data every time. First it sends a PUSH_PROMISE frame letting the client know "hey, I've got this thing if you need it" and the browser can respond with a frame saying "nah, don't need it".
The main problem with http2 push and why it’s pointless is that it’s not cache aware.

So you’re pushing unrequested data to everyone regardless.

h2o tries to solve this problem with a special cookie. More here:

http://blog.kazuhooku.com/2015/12/optimizing-performance-of-...

But without something like that it’s a feature that will never really gain traction.

The danger here is that you push too much, but the actual response will still be delivered almost as fast (due to non-blocking behavior in HTTP/2 connections), so sure, it's not optimal, but there are a lot of use cases besides static assets where it is very useful.
Basically to change this...

- Request index.html

- Parse index.html

- Request .js and .css and .jpg/png, etc found inside

- Parse .js and .css found inside

- Request additional .js, .css and .jpg/png, etc found within js/css

- Parse....

Into this...

- Request index.html, .css, .js, *.jpg/png

- Parse all

To avoid multiple round trips and improve initial page load time if you "know" what's going to be requested from the beginning.

So that the files can be downloaded before the HTML is parsed, if I recall correctly.
You push it along with the initial page, before the browser has even requested it.
That command probably goes in a "location" block that matches the set of HTML pages that use main.css.

Normally, the browser parses HTML, finds a <link> tag that mentions main.css, and then requests main.css. With HTTP/2 push, by the time the browser has finished parsing the <link> tag, main.css has already been delivered.

If the browser already has main.css in its cache, it can reject the push.

If you agree that using a CDN for static content is a good idea, then it would seem HTTP/2 Push is useless. The website is served from your servers while the static content is served from a CDN, so you can't "push" it in the same stream as your webpage content. Am I missing something here?
Yes, you can't push cross-origin.[1] However, there's still a lot of use-cases where this is useful, such as if your entire site is static content, or if your app servers are behind the CDN as well.

[1]: Yet. I believe the web packaging standard (intended, among other things, to replace AMP) allows pushing bundles signed by other origins.

So there are 3 basic cases that websites use:

1. Site and static content are served directly by your webserver. ( HTTP2/push helpful )

2. Site served by your web servers but static content is served by a CDN ( HTTP2/push NOT helpful )

3. Site and static content proxied by some service. ( HTTP2/push helpful )
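A sketch of the distinction in case 2, assuming the simplified same-origin rule from the reply above (a pushed resource must share the page's scheme, host, and port). Real clients apply their own authority checks, so this is only an approximation; `origin` and `pushable` are hypothetical helper names.

```python
from urllib.parse import urljoin, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """(scheme, host, port) with default ports filled in."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def pushable(page_url: str, asset_url: str) -> bool:
    """Simplified rule: an asset can be pushed only if it resolves to the page's origin."""
    absolute = urljoin(page_url, asset_url)  # resolve relative and scheme-relative URLs
    return origin(page_url) == origin(absolute)
```

A relative `/static/main.css` on your own site passes, while the same file on a CDN hostname does not, which is exactly why case 2 gets no benefit.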

It's a way to preemptively send assets to the client before they request them.
Pretty much the same effect is granted to any HTTP version by simply including stylesheets or scripts verbatim into the HTML.
That's anything but "simple" - you might want to reuse those stylesheets/scripts on other pages as well, for example; if you inline them into HTML, you're now wasting far more bandwidth, as you're unconditionally pushing them with every HTML response.
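A back-of-the-envelope comparison of the bandwidth argument, for one visitor viewing several pages that share a stylesheet. It ignores compression, HTTP headers, and any savings from cancelled pushes; the function and strategy names are made up for the example.

```python
def bytes_for_css(page_views: int, css_bytes: int, strategy: str) -> int:
    """Rough CSS bytes sent to one visitor over several page views.

    'inline':      CSS embedded in every HTML response.
    'cached_file': separate file, fetched (or pushed) once, then served from the browser cache.
    'blind_push':  separate file pushed on every page view with no cache awareness.
    """
    if strategy in ("inline", "blind_push"):
        return page_views * css_bytes
    if strategy == "cached_file":
        return css_bytes if page_views > 0 else 0
    raise ValueError(f"unknown strategy: {strategy}")
```

For 10 page views of a 20 KB stylesheet, inlining sends 200 KB of CSS versus 20 KB for a cached file; a push with no cache awareness is no better than inlining on repeat views.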
You guess what the page might need and 'push' the contents down so that the page loads faster.