by bgentry 4140 days ago
It's time to begin the long process of unwinding all the hacks that we've built to make HTTP/1.1 fast. No more concatenation of static assets, no more domain sharding.

The future looks more like this, as the default, with no special effort required: https://http2.golang.org/gophertiles

May nobody else have to suffer through writing an interoperable HTTP/1.1 parser!

6 comments

> May nobody else have to suffer through writing an interoperable HTTP/1.1 parser!

Yes, now it'll be much easier than parsing plain-text. Now they just have to write a TLS stack (several key exchange algorithms; block ciphers; stream ciphers; and data integrity algorithms); then implement the new HPACK compression; then finally a new parser for the HTTP/2 headers themselves.

Now instead of taking maybe one day to write an HTTP/1.1 server, it'll only take a single engineer several years to write an HTTP/2 server (and one mistake will undermine all of its attempts at security.)

If you are going to say, "well use someone else's TLS/HPACK/etc library!", then I'll say the same, "use someone else's HTTP/1.1 header parsing library!"

HTTP/2 may turn out to be great for a lot of things. But making things easier/simpler to program is certainly not one of them. This is a massive step back in terms of simplicity.

I was under the impression that the parent was making the simplicity argument for people on the other side of things - people writing frameworks and websites. Using someone else's HTTP1.1 stack doesn't solve those problems.

Then, separately, writing interoperable HTTP 1.1 is hard because it was designed/taken up ad-hoc in a time of relatively immature browsers. I would expect HTTP2 to increase standardisation in the same way newer HTML/CSS specs have relative to the late-nineties. That doesn't mean that initial implementation will not be more difficult, but it's done once every 20 years (per-vendor).

Have you tried writing anything more than a very simple HTTP/1.1 parser/server? It's actually not as easy as it seems at first - edge cases everywhere, different user agents doing subtly different things, etc. etc.

Your argument is invalid in my opinion. HTTP/1.1 is not simple to implement to any decent level of completeness and correctness, and HTTP/2 does fix a fair few things.

Anyway, there are already plenty of good tools for debugging HTTP/2 streams (Wireshark filters, etc.), and there's only going to be plenty more as time goes by.

> Have you tried writing anything more than a very simple HTTP/1.1 parser/server?

Honestly? No. I wrote an HTTP server that runs my site just fine. It also functions as a proxy server so that I can run Apache+PHP (for phpBB) on the same port 80. (The reason I don't just use Apache is that I generate my pages via C++, because I like that language a hell of a lot more than PHP.) I also have had the HTTP client download files from the internet for various projects (my favorite was to work around a licensing restriction.)

I get around 100,000 hits a month, and have not had any problems. If you think issues will arise when I start reaching into Facebook levels of popularity ... I'm sure they will. But, I'll never get there, so to me it doesn't really matter.

So for my use case, HTTP/2 is unbelievably more challenging and costly to support. Especially as I have about seven subdomains, and nobody's giving out free wildcard SSL certs.

I also didn't even say the added complexity is a bad thing+. Modern Windows/Linux/BSD are infinitely more complex than MS-DOS was, too. I was just pointing out that the OP's elation was misguided. (+ though to be fair, I do believe things should be kept as simple as is reasonable.)

Also, I strongly challenge this notion that you have to be 100% standards-compliant with the entire RFC to run an HTTP/1 server successfully. Because not even Apache is remotely close to that. The mantra is liberal on your input, conservative on your output. And everyone follows that. And as a result, no major projects out there are spitting out headers split into 300 lines using arcane edge cases of the RFCs.

HTTP/1.1 is only complicated if you want to support all of the optional features.
So won't HTTP/2 have these edge cases also?
No. Most of the edge cases arise from trying to parse an underspecified plaintext protocol. Everything in HTTP/2 is length-prefixed and unambiguously specified. That makes it dramatically easier to write a compliant parser or client.
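To make the length-prefixed point concrete, here is a minimal Python sketch of reading one HTTP/2 frame. It assumes the 9-byte frame header layout from the HTTP/2 framing spec (24-bit length, 8-bit type, 8-bit flags, one reserved bit, 31-bit stream identifier). The names `Frame` and `parse_frame` are made up for illustration, and the sketch ignores frame-size limits, HPACK, and stream state entirely.

```python
import struct
from typing import NamedTuple, Tuple

FRAME_HEADER_LEN = 9  # fixed-size header in front of every frame


class Frame(NamedTuple):
    length: int
    type: int
    flags: int
    stream_id: int
    payload: bytes


def parse_frame(buf: bytes) -> Tuple[Frame, bytes]:
    """Parse one frame from the front of buf; return (frame, remaining bytes)."""
    if len(buf) < FRAME_HEADER_LEN:
        raise ValueError("incomplete frame header")
    # The 24-bit length is unpacked as a high byte plus a low 16-bit word.
    len_hi, len_lo, ftype, flags, stream = struct.unpack(
        ">BHBBI", buf[:FRAME_HEADER_LEN]
    )
    length = (len_hi << 16) | len_lo
    stream_id = stream & 0x7FFFFFFF  # mask off the reserved top bit
    end = FRAME_HEADER_LEN + length
    if len(buf) < end:
        raise ValueError("incomplete frame payload")
    return Frame(length, ftype, flags, stream_id, buf[FRAME_HEADER_LEN:end]), buf[end:]
```

The parser never scans for a delimiter: the header says exactly how many payload bytes follow, so the only failure mode at this layer is "not enough bytes yet".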
Essentially, no.

HTTP/1.1 contains tons of optional features. Practically no two implementations support the same set.

HTTP/2 is all 100% mandatory. Any compliant HTTP/2 implementation will support an EXACT set of known features.

Won't that create the same problems as XML and XHTML where full compliance is/was mandatory -- and the reality turned out to be different?
Probably not. Most, if not all, non-compliant XML/XHTML is either written by hand or by very bad generation tools. HTTP/2 is a protocol that needs to be implemented by browsers and web servers, and there are only so many of those. In the case of XML/XHTML, any person throwing up a website or sending a document does the generation with a different set of tools (or by hand).
> HTTP/2 is all 100% mandatory

That's a nice idea in theory, but what makes you think anyone is going to adhere to that?

Developers have always done, and will always do, whatever they want when they implement your standards.

It was bad enough that when I worked on a binary delta patching format, I made sure there were absolutely zero possible undefined values, because I knew someone might try and use them to add new functionality in.

For something as complex as HTTP, I can guarantee you people will ignore parts of the spec they don't care about. And you can yell at them and say it's not a valid/legal HTTP/2 implementation, but they won't care. They'll keep on doing what they're doing.

It's still a much better situation than HTTP/1.1. At least an HTTP/2 compliant implementation has an exact definition, that it either is, or isn't. An HTTP/1.1 implementation has a vast number of corner cases and optional bits.

Sure, if you're out-of-spec, all bets are off. It's just that with HTTP/1.1 even 100% in-spec implementations are a pretty wide target zone.

> HTTP/2 is all 100% mandatory. Any compliant HTTP/2 implementation will support an EXACT set of known features.

But over the next couple of years, won't people come up with new ideas and add them as optional extensions? How is that handled?

I suspect some of these optional extensions will be really useful in special cases such as support for LZMA/LZHAM compression in addition to just gzip.

There is support for extensions, but they're, well, extensions. The only thing the protocol specifies is that a compliant implementation must pass-through unchanged any block it doesn't understand.

Compare with HTTP/1.1 where for instance the entire content negotiation mechanism is optional and clients need to be able to deal with it not being available.

Well, in that case Apache, Nginx, Microsoft, and the IETF need to keep control of the standard; anyone who tries to add the HTTP/2 equivalent of <blink> gets taken out and shot.

I have seen this before with OSI, when MCI decided that part of the X.400 standard was optional. And not to mention ICL, who thought that starting counting from 0 was a good idea when the standard said MUST START from 1 (and you wonder why the UK doesn't have a mainframe maker any more).

And the lack in simplicity is why it will fail.

People are good with dealing with a small number of simple things that can be stacked together. Throw in a human-readable data stream, and you're set to understand and use a stack of simple programs.

People are not good with dealing with a single monstrous object of unfathomable proportions, they will try to break it down in things they understand. If the thing is too complex, with too many inputs, too many outputs and too many states, this is a recipe for confusion. This is why overly complicated things always fail in face of simple things.

One could argue that FTP/SFTP was just as good as transferring bytes over network, but HTTP/1.0 won because it was simpler.

HTTP/2 was written to tickle the egos of its developers, following the principle - it is hard to write, it is hard to read. And its downfall is going to come from this problem.

I'm really confused by comments like these. Are you somebody who implements low-level network protocols?

I've written parsers and generators for plenty of binary protocols. It's actually really not that bad - you just need slightly different tooling. Yes, if somebody else hasn't written those it takes a bit longer because you have to do that yourself, but you save a lot of time because it's far easier to parse than text. And guess what - people have already written plenty of tooling for HTTP/2 already... And HTTP/2 is fairly straightforward as protocols go (you wouldn't believe the crazy proprietary control protocols around the place - trust me, HTTP/2 is not at all bad)

The 'downfall' of HTTP/2 is also a real long shot - most people are already browsing in browsers that support it, and for many web site owners, using it is literally adding two lines to an nginx configuration file...

> People are good with dealing with a small number of simple things that can be stacked together. Throw in a human-readable data stream, and you're set to understand and use a stack of simple programs.

Not true! Text parsing is a pain in the ass; give me a well-documented binary protocol any day. On the upside, binary protocols tend to force good documentation. HTTP/1.1 is far from simple; every browser supports a slightly different implementation and the server is expected to serve to all of them. But a binary protocol is not any more difficult than a text-based protocol for someone with a decent knowledge of CS. If you don't have a decent knowledge of CS, you probably shouldn't be writing code at the protocol level.

Besides, who in their right mind outputs directly to ANY protocol these days? Unless you're building a web server, you should be doing it through an abstraction layer because it's a proper architecture practice. Once abstraction layers are built for all of the major languages (which I'm willing to bet has already happened) it will become a non-issue.

I haven't looked more into that, but wouldn't it also be viable now to start HTTP/1.2 with e.g. a more restrictive header grammar, restricting all existing features to what's actually used, at least on the server side? Clients with 1.1 support would keep working, but future clients would be simplified.
That would be absolutely lovely.

Since we're not viewing headers manually on 80x25 terminals anymore, we could do away with multi-line header values. That alone would drop off most of the complexity. (Being perfectly honest, even though it's part of the standard, you don't have to parse them now, anyway. I don't, and I've never had anyone complain to me about the site not working. Nothing mainstream sends them for the important fields.)
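For anyone wondering what the multi-line case actually costs, here is a rough Python sketch of a header parser that does handle folded values. It assumes the usual HTTP/1.1 rule that a line starting with a space or tab continues the previous header; `parse_headers` is a hypothetical helper, not code from any real server, and it keeps only the last value of a repeated header.

```python
from typing import Dict, List


def parse_headers(raw: str) -> Dict[str, str]:
    """Parse a CRLF-separated header block, joining folded continuation lines."""
    unfolded: List[str] = []
    for line in raw.split("\r\n"):
        if not line:
            break  # a blank line ends the header block
        if line[0] in " \t" and unfolded:
            # Continuation line: append to the previous header,
            # collapsing the fold into a single space.
            unfolded[-1] += " " + line.strip()
        else:
            unfolded.append(line)

    headers: Dict[str, str] = {}
    for line in unfolded:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive; a repeated name overwrites here.
        headers[name.strip().lower()] = value.strip()
    return headers
```

Dropping folding removes the first loop entirely, which is the simplification being argued for: each line becomes exactly one header.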

Add a Server-Push header (filename+ETag), and we could eliminate most extraneous 304 Not Modified requests. Have browsers actually acknowledge Connection: keep-alive instead of opening tons of parallel requests. And leave this as the "hobbyist level, can't afford wildcard SSL certs" option, and I think it'd be quite beneficial.

If browsers want to warn that it's not encrypted, fine. So long as they don't go into ridiculous hysteria levels like they do now with self-signed certs.

One immediate potential downside is Apache. It completely ignores the protocol request. If you ask for "GET / HTTP/1.2", or even "GET / HackerNewsTP/3.141e", it will happily reply with "HTTP/1.1 200 OK"

As a result, the negotiation would be trickier than with HTTP/2.
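As a rough illustration of the stricter alternative, the Python sketch below parses the request line and only accepts a version token of the form `HTTP/<digit>.<digit>`, which is the grammar HTTP/1.1 defines. `parse_request_line` is a hypothetical helper; what a server should then do with an unfamiliar but well-formed version such as 1.2 is left to the caller, and this says nothing about how Apache is implemented.

```python
import re
from typing import Tuple

# method SP request-target SP "HTTP/" DIGIT "." DIGIT
REQUEST_LINE = re.compile(r"([A-Z]+) (\S+) HTTP/([0-9])\.([0-9])")


def parse_request_line(line: str) -> Tuple[str, str, Tuple[int, int]]:
    """Return (method, target, (major, minor)) or raise ValueError."""
    m = REQUEST_LINE.fullmatch(line)
    if m is None:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, major, minor = m.groups()
    return method, target, (int(major), int(minor))
```

With this, "GET / HTTP/1.2" parses to version (1, 2), while "GET / HackerNewsTP/3.141e" is rejected as malformed instead of being answered as if it were HTTP/1.1.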

But like you said, it could be done in a way that it's 100% backward-compatible with existing 1.1 software, so long as their responses are also in a compatible, simplified format (and most already are.)

> If browsers want to warn that it's not encrypted, fine. So long as they don't go into ridiculous hysteria levels like they do now with self-signed certs.

I don't believe they can do anything apart from what happens now. Imagine someone manages to redirect your traffic. You were talking to some website which used known certificate, but this time you got a self-signed one. The browser has two options essentially:

- continue the connection - in this case you just handed over your session cookie, the person on the other side can act as you on that website

- go into "ridiculous hysteria levels" and tell you that the cert presented by the server is not trusted - so do what browsers do right now

There's really no situation where the first option should be allowed. How option 2 is implemented is the interesting detail.

I am with you up to letting people use unencrypted HTTP. I assume people in this can use telnet instead of ssh because it is simpler. No, browsers should drop support for unencrypted HTTP soon after Let's Encrypt goes live.
> browsers should drop support for unencrypted HTTP soon after Let's Encrypt goes live.

To the people who keep insisting everything needs encryption: No it doesn't. Fuck off!

You don't see me forcing PGP on your email, do you? No? Fine, then let us non-weirdoes keep using plain HTTP where we want it, where we have determined that it is a good fit for our needs.

Besides, this is purely a theoretical concern, because a browser which drops support for plain HTTP won't have any user base as soon as people discover that 95% of the internet is broken when using that browser.

> You don't see me forcing PGP on your email, do you?

Not pgp (as in, not end-to-end encryption). But hopefully in most cases you are forced to encrypt your email (smtp/tls), servers forwarding your email are likely using encryption (smtp/tls between servers), and you're pulling the email over encrypted channel (imaps). Alternatively your mail submission/collection goes over https to the email provider.

And yes, I will insist on everyone using encryption in mail, web, everything. Because once you actually want to use it for some reason, you don't want it to be completely different from all your other traffic, basically screaming "hey, I'm trying to hide some data here, because all my other connections are in plaintext".

Fortunately we're at the stage where everyone is actually forced to use encryption for a lot of their traffic.

So, is the problem encryption or manual encryption?

I presume you don't care about encryption when you send emails, and yet if you're using a big name your emails will be encrypted without you even knowing it.

That's why I keep wanting to put "automatic" encryption everywhere, and would rather have your browser demote plain HTTP as insecure as a TLS connection with RC4-MD5, and display good security connection with a higher "indicator" than those, even if the certificate is self-signed (yet not as high as a trusted communication)

In practice that would mean "this connection is PROBABLY secure. If you really care about what you're about to do, STOP NOW. If you don't care just go on".

"Automatic" PGP (or really E2E encryption) would be awesome, but there is still far too much manual work for it to happen. Maybe one day we'll be there.

I assume you use telnet instead of ssh too. Once you do a little research and spend a little time figuring out what can actually be done (and is done constantly) to the unencrypted HTTP (anything from user tracking, to ad injections, to identity theft), you will realize just how wrong you are. Yes, HTTP needs to die. Sorry it's taking you a while to see it.
HTTP/2 is easier to parse than HTTP/1.1 because there are fewer edge cases.

Also a bonus: no more "Referer" (sic)

If anyone wants to learn more about optimizing for HTTP/2, unwinding HTTP/1.1 hacks, and strategies to optimize for both versions at the same time, Ilya Grigorik's "High Performance Browser Networking" is an excellent resource:

http://chimera.labs.oreilly.com/books/1230000000545/ch13.htm...

Couldn't agree more, it's also one of the best-written tech books I've ever read.
Thanks. That book is a great resource.
If it requires a book to optimize for HTTP2, doesn't that counter your comment's parent's point? It's supposed to be simple.
One can write a book about literally anything. But besides that, it already "required" a book – at least two of them in fact, both published before Google even announced SPDY: "High Performance Web Sites" [1] in 2007 and its sequel "Even Faster Web Sites" [2] in 2009, both by Steve Souders.

What are they about? Essentially, optimizing your HTTP responses for the ways in which actual web browsers make HTTP requests. Any web performance analysis tool worth using (like YSlow and PageSpeed – both of which Steve Souders was involved in btw) recommended the practices outlined in those books.

So, no. I don't think a new book with updated practices says anything about the protocol. The optimization tips from this book will simply become widespread common knowledge the same way they did in the past.

[1] http://shop.oreilly.com/product/9780596529307.do [2] http://shop.oreilly.com/product/9780596522315.do

Simple to use doesn't mean simple to create. Even a simple and small code base doesn't mean the think-process preceding the actual programming was simple.
It requires a small section in a book to tell you how to undo all the tricks you've had to learn in the past 10 years to make an HTTP/1.1 website fast. Most of that is irrelevant and detrimental with HTTP/2, which is kind of the point. You'll get the benefits of those optimizations without having to do anything special to get them.

That book is also a phenomenal resource on the performance of all things web-related, so you should check it out regardless of any concerns you have about HTTP/2.

The real problem, I think, will be moving to "idiomatic" HTTP2-centric design (lots of little resources, relying on parallel chunked delivery and server-suggested retrieval) while still keeping HTTP/1.1 clients fast.

I'm betting there will come a polyfill to make HTTP/2 servers able to deliver content to HTTP/1.1-but-HTML5 web browsers in an HTTP/2-idiomatic way—perhaps, for example, delivering the originally-requested page over HTTP/1.1 but having everything else delivered in HTTP2-ish chunks over a websocket.

Now that all of the browsers have moved to an auto update model and the only ones that haven't are on mobiles that have short lifespans, that's really only going to be important to the big players for about a year or two.

Or maybe I'm overly optimistic!

Think all software written for all devices involving every single embedded thingie with a webclient reporting in or polling data everywhere: Every ATM, every POS device, every IOT-device made yet and everyone to be made in the future. Every little gadget with a network stack ever made.

And you're telling me all of those will have an updated HTTP-stack within 2 years?

You're not being "overly optimistic", you're being tragically unrealistic.

Like every other published internet-standard, HTTP/1.0 and HTTP/1.1 will be here until the end of the internet. Sadly the clusterfuck that is HTTP/2.0 will too now.

The people talking about HTTP/3.0 already really seem to have missed this bit. (They're talking about 3.0 because HTTP/2.0 didn't really solve the problems we have with HTTP/1.1, but never mind that, Google steam-rolled this one through and we want to be trendy)

The question is now: How many HTTP-stacks do you want to support? Is 2 OK? 3? 4? When do you say enough is enough?

You're missing the point. For all of them it doesn't matter if my consumer website serves multiple assets.

That's what I'm talking about.

Our point is that some people are treating this internet-protocol with a lifetime of decades like it was this week's update of Chrome.

It isn't. And it needs to be treated differently.

> The people talking about HTTP/3.0 already really seem to have missed this bit. (They're talking about 3.0 because HTTP/2.0 didn't really solve the problems we have with HTTP/1.1, but never mind that, Google steam-rolled this one through and we want to be trendy)

Exactly! There are more important problems to solve, page load time isn't one of them. My list includes:

* Better authentication

* More secure caching

* Improved ability to download large files

* Better methods to find alternate downloads locations

* Making each request contain less information about the sender

* Improved Metadata

I brain-dump a bit here: https://github.com/jimktrains/http_ng

EDIT: Formatting

I'm afraid you are - while things have improved for the reasons you outlined, corporate IT is still going to be a major party pooper.
If at any point you need a reference guide of all the HTTP/1.1 hacks that we {could,should,would} change for HTTP/2, I found the following post really useful: http://ma.ttias.be/architecting-websites-http2-era/
nginx doesn't support HTTP/2, and neither does Apache. So maybe it's time to make some effort to implement it on servers first. Maybe make some donations to the Apache and nginx dev teams to speed up this process.
nginx fully supports SPDY. HTTP/2 is largely based on SPDY so I expect that nginx will support HTTP/2 very soon.
This is my real worry, at least nginx supports SPDY so has a base to evolve from, Apache support appears to be in a really bad place last time I checked.
The two most largely deployed servers on the Internet don't have HTTP2 support so their experience implementing and using it was never factored into the protocol.

And people wonder why I don't like it.

MS seem to be doing a good job of building it into IIS, Traffic Server has it, as does H2O, nghttp2 and others.

Google, FB, Twitter, Akamai have adapted their http daemons for it.

nginx has SPDY support so HTTP/2 should be forthcoming but I wonder if this will be the death of Apache - they didn't seem to be able to update the SPDY plugin to 2.4

Yes, it's a pity, Apache and mod_spdy is really a pain currently.
Try hacking together a rudimentary one over a socket opened by AT commands. Embedded programming including cellular modems for the win, Bob.