by byuu 4139 days ago
> May nobody else have to suffer through writing an interoperable HTTP/1.1 parser!

Yes, now it'll be much easier than parsing plain-text. Now they just have to write a TLS stack (several key exchange algorithms; block ciphers; stream ciphers; and data integrity algorithms); then implement the new HPACK compression; then finally a new parser for the HTTP/2 headers themselves.
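
To show what "implement the new HPACK compression" involves at the lowest level, here is a minimal Python sketch of HPACK's prefixed-integer representation (RFC 7541, section 5.1). Header table indices and string lengths are built on it. The function names are hypothetical. The sketch covers only integers, not the static/dynamic tables or Huffman coding.

```python
from typing import Tuple


def hpack_encode_int(value: int, prefix_bits: int, flags: int = 0) -> bytes:
    """Encode a non-negative integer using an N-bit prefix (RFC 7541, section 5.1).

    `flags` holds whatever bits occupy the high part of the first byte
    (for example, the representation-type bits of a header field).
    """
    max_prefix = (1 << prefix_bits) - 1
    if value < max_prefix:
        # Small values fit entirely in the prefix.
        return bytes([flags | value])
    out = [flags | max_prefix]
    value -= max_prefix
    # The remainder is emitted 7 bits at a time, least significant group first,
    # with the high bit set on every byte except the last.
    while value >= 128:
        out.append((value % 128) + 128)
        value //= 128
    out.append(value)
    return bytes(out)


def hpack_decode_int(data: bytes, prefix_bits: int) -> Tuple[int, int]:
    """Decode an N-bit-prefix integer and return (value, bytes_consumed).

    Raises IndexError if the input is truncated.
    """
    max_prefix = (1 << prefix_bits) - 1
    value = data[0] & max_prefix
    if value < max_prefix:
        return value, 1
    shift = 0
    i = 1
    while True:
        byte = data[i]
        value += (byte & 0x7F) << shift
        shift += 7
        i += 1
        if not byte & 0x80:
            return value, i
```

For example, `hpack_encode_int(1337, 5)` yields the bytes 31, 154, 10, which matches the worked example in RFC 7541, Appendix C.1.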

Now instead of taking maybe one day to write an HTTP/1.1 server, it'll only take a single engineer several years to write an HTTP/2 server (and one mistake will undermine all of its attempts at security.)

If you are going to say, "well use someone else's TLS/HPACK/etc library!", then I'll say the same, "use someone else's HTTP/1.1 header parsing library!"

HTTP/2 may turn out to be great for a lot of things. But making things easier/simpler to program is certainly not one of them. This is a massive step back in terms of simplicity.

I was under the impression that the parent was making the simplicity argument for people on the other side of things: people writing frameworks and websites. Using someone else's HTTP/1.1 stack doesn't solve those problems.

Then, separately, writing interoperable HTTP/1.1 is hard because it was designed and adopted ad hoc in an era of relatively immature browsers. I would expect HTTP/2 to increase standardisation in the same way newer HTML/CSS specs have compared with those of the late nineties. That doesn't mean the initial implementation won't be more difficult, but it's done once every 20 years (per vendor).

Have you tried writing anything more than a very simple HTTP/1.1 parser/server? It's actually not as easy as it seems at first - edge cases everywhere, different user agents doing subtly different things, etc. etc.

Your argument is invalid in my opinion. HTTP/1.1 is not simple to implement to any decent level of completeness and correctness, and HTTP/2 does fix a fair few things.

Anyway, there are already plenty of good tools for debugging HTTP/2 streams (Wireshark filters, etc.), and there's only going to be plenty more as time goes by.

> Have you tried writing anything more than a very simple HTTP/1.1 parser/server?

Honestly? No. I wrote an HTTP server that runs my site just fine. It also functions as a proxy server so that I can run Apache+PHP (for phpBB) on the same port 80. (The reason I don't just use Apache is that I generate my pages via C++, because I like that language a hell of a lot more than PHP.) I've also used its HTTP client to download files from the internet for various projects (my favorite was to work around a licensing restriction.)

I get around 100,000 hits a month, and have not had any problems. If you think issues will arise when I start reaching into Facebook levels of popularity ... I'm sure they will. But, I'll never get there, so to me it doesn't really matter.

So for my use case, HTTP/2 is unbelievably more challenging and costly to support. Especially as I have about seven subdomains, and nobody's giving out free wildcard SSL certs.

I also didn't even say the added complexity is a bad thing+. Modern Windows/Linux/BSD are infinitely more complex than MS-DOS was, too. I was just pointing out that the OP's elation was misguided. (+ though to be fair, I do believe things should be kept as simple as is reasonable.)

Also, I strongly challenge this notion that you have to be 100% standards-compliant with the entire RFC to run an HTTP/1 server successfully, because not even Apache is remotely close to that. The mantra is: be liberal on your input, conservative on your output. And everyone follows that. As a result, no major projects out there are spitting out headers split into 300 lines using arcane edge cases of the RFCs.
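
Here is a minimal Python sketch of what "liberal on your input, conservative on your output" can look like for HTTP/1.x header fields. The helper names are hypothetical, and the sketch simplifies on purpose. Repeated header names overwrite each other instead of being combined, and folded (multi-line) values are not handled. In places it is also more liberal than RFC 7230 permits. For example, it strips whitespace before the colon, which section 3.2.4 says a server MUST reject.

```python
def parse_headers(raw: str) -> dict:
    """Leniently parse a header block into {lowercase-name: value}."""
    headers = {}
    # Liberal: accept bare LF as well as CRLF line endings.
    for line in raw.replace("\r\n", "\n").split("\n"):
        if not line.strip():
            continue
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            continue  # Liberal: skip a malformed line rather than rejecting the request.
        # Liberal: tolerate stray whitespace around names and values.
        headers[name.strip().lower()] = value.strip()
    return headers


def format_headers(headers: dict) -> str:
    """Conservative: one field per line, canonical capitalization, CRLF endings."""
    lines = []
    for name, value in headers.items():
        canonical = "-".join(part.capitalize() for part in name.split("-"))
        lines.append(f"{canonical}: {value}\r\n")
    return "".join(lines)
```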

HTTP/1.1 is only complicated if you want to support all of the optional features.

So won't HTTP/2 have these edge cases also?

No. Most of the edge cases arise from trying to parse an underspecified plaintext protocol. Everything in HTTP/2 is length-prefixed and unambiguously specified. That makes it dramatically easier to write a compliant parser or client.
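
To illustrate the length-prefixed point: every HTTP/2 frame starts with a fixed 9-byte header (RFC 7540, section 4.1). The header holds a 24-bit payload length, an 8-bit type, an 8-bit flags field, and then a reserved bit followed by a 31-bit stream identifier. Below is a minimal Python sketch that splits a byte buffer into frames. The function names are hypothetical. The sketch skips checks a real implementation needs, such as enforcing SETTINGS_MAX_FRAME_SIZE.

```python
FRAME_HEADER_LEN = 9


def parse_frame_header(buf: bytes):
    """Return (length, type, flags, stream_id) from a 9-byte HTTP/2 frame header."""
    if len(buf) < FRAME_HEADER_LEN:
        raise ValueError("an HTTP/2 frame header is 9 bytes")
    length = int.from_bytes(buf[0:3], "big")
    frame_type = buf[3]
    flags = buf[4]
    # The top bit is reserved and must be ignored on receipt.
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


def split_frames(data: bytes):
    """Split complete frames off the front of `data`.

    Returns (frames, leftover), where each frame is (type, flags, stream_id, payload)
    and `leftover` holds an incomplete trailing frame, if any.
    """
    frames = []
    offset = 0
    while len(data) - offset >= FRAME_HEADER_LEN:
        length, frame_type, flags, stream_id = parse_frame_header(
            data[offset:offset + FRAME_HEADER_LEN])
        end = offset + FRAME_HEADER_LEN + length
        if end > len(data):
            break  # Payload not fully received yet; the length says exactly how much to wait for.
        frames.append((frame_type, flags, stream_id, data[offset + FRAME_HEADER_LEN:end]))
        offset = end
    return frames, data[offset:]
```

Compare this with HTTP/1.1. There, finding the end of a message means scanning for CRLF sequences and then consulting Content-Length or the chunked transfer coding.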

Essentially, no.

HTTP/1.1 contains tons of optional features. Practically no two implementations support the same set.

HTTP/2 is all 100% mandatory. Any compliant HTTP/2 implementation will support an EXACT set of known features.

Won't that create the same problems as XML and XHTML, where full compliance is/was mandatory, and the reality turned out to be different?

Probably not. Most, if not all, non-compliant XML/XHTML is either written by hand or by very bad generation tools. HTTP/2, by contrast, is a protocol that needs to be implemented by browsers and web servers, and there are only so many of those. With XML/XHTML, anyone throwing up a website or sending a document does the generation with a different set of tools (or by hand).

I've seen tons of terrible computer-generated XML. It's in use in so many custom APIs I can't even describe it to you. Poor escaping, nesting, &c.
> HTTP/2 is all 100% mandatory

That's a nice idea in theory, but what makes you think anyone is going to adhere to that?

Developers have always done, and will always do, whatever they want when they implement your standards.

It was bad enough that when I worked on a binary delta patching format, I made sure there were absolutely zero possible undefined values, because I knew someone might try and use them to add new functionality in.
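
The comment doesn't describe the patch format's actual layout. What follows is a purely hypothetical Python sketch of the same design idea, not that format. The low two bits of a command word select one of four actions, and all four values are assigned. The length is stored minus one, so every remaining value is a valid length. No value (not even zero) is left undefined for someone to repurpose. The action names are illustrative only.

```python
from enum import IntEnum
from typing import Tuple


class Action(IntEnum):
    # Hypothetical action names; all four 2-bit values are assigned.
    SOURCE_READ = 0
    TARGET_READ = 1
    SOURCE_COPY = 2
    TARGET_COPY = 3


def decode_command(word: int) -> Tuple[Action, int]:
    """Split a command word into (action, length).

    Every 2-bit action value is assigned, so no pattern is left over, and the
    length is stored minus one, so zero-length commands cannot be expressed.
    """
    if word < 0:
        raise ValueError("command words are unsigned")
    return Action(word & 0b11), (word >> 2) + 1


def encode_command(action: Action, length: int) -> int:
    if length < 1:
        raise ValueError("length must be at least 1")
    return ((length - 1) << 2) | int(action)
```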

For something as complex as HTTP, I can guarantee you people will ignore parts of the spec they don't care about. And you can yell at them and say it's not a valid/legal HTTP/2 implementation, but they won't care. They'll keep on doing what they're doing.

It's still a much better situation than HTTP/1.1. At least an HTTP/2 compliant implementation has an exact definition, that it either is, or isn't. An HTTP/1.1 implementation has a vast number of corner cases and optional bits.

Sure, if you're out of spec, all bets are off. It's just that with HTTP/1.1, even 100% in-spec implementations are a pretty wide target zone.

> HTTP/2 is all 100% mandatory. Any compliant HTTP/2 implementation will support an EXACT set of known features.

But over the next couple of years, won't people come up with new ideas and add them as optional extensions? How is that handled?

I suspect some of these optional extensions will be really useful in special cases such as support for LZMA/LZHAM compression in addition to just gzip.

There is support for extensions, but they're, well, extensions. The only thing the protocol specifies is that a compliant implementation must pass-through unchanged any block it doesn't understand.
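
For reference, RFC 7540 words this a little differently from "pass-through". Section 4.1 says implementations MUST ignore and discard frames of unknown type. Section 5.5 adds one exception: an unknown frame that arrives in the middle of a header block is a connection error. Below is a minimal Python sketch of that dispatch rule with hypothetical names. Frames are (type, flags, stream_id, payload) tuples. The sketch leaves out the check that CONTINUATION frames stay on the same stream.

```python
# Frame type codes defined in RFC 7540, section 6.
KNOWN_FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}

END_HEADERS = 0x4  # flag on HEADERS, PUSH_PROMISE and CONTINUATION frames


class ProtocolError(Exception):
    """Stands in for a connection error of type PROTOCOL_ERROR."""


def dispatch(frames):
    """Return the names of frames that would be handled, in order."""
    handled = []
    in_header_block = False
    for frame_type, flags, stream_id, payload in frames:
        name = KNOWN_FRAME_TYPES.get(frame_type)
        if in_header_block and name != "CONTINUATION":
            # Only CONTINUATION may appear until END_HEADERS is seen.
            raise ProtocolError("header block interrupted")
        if name is None:
            continue  # Unknown (extension) frame type: ignore and discard.
        if name in ("HEADERS", "PUSH_PROMISE", "CONTINUATION"):
            in_header_block = not flags & END_HEADERS
        handled.append(name)
    return handled
```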

Compare with HTTP/1.1 where for instance the entire content negotiation mechanism is optional and clients need to be able to deal with it not being available.
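
In HTTP/1.1, a new compression scheme offered alongside gzip would go through that negotiation. The client lists the content-codings it accepts in Accept-Encoding. The server picks one it supports, or falls back to sending no coding at all. Below is a minimal Python sketch of the server side with hypothetical function names. Ranking an unlisted "identity" below everything else is this sketch's heuristic, not a rule from the spec.

```python
from typing import Dict, Optional, Sequence


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each listed content-coding to its q-value (1.0 when no q is given)."""
    prefs = {}
    for part in header.split(","):
        token, _, params = part.partition(";")
        token = token.strip().lower()
        if not token:
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Treat an unparseable q-value as "not acceptable".
        prefs[token] = q
    return prefs


def choose_encoding(header: str, supported: Sequence[str] = ("gzip", "identity")) -> Optional[str]:
    """Pick the supported coding with the highest q-value; ties go to the earlier entry."""
    prefs = parse_accept_encoding(header)

    def q_for(coding: str) -> float:
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:
            return prefs["*"]
        # An unlisted identity stays acceptable; ranking it last is a heuristic.
        return 0.001 if coding == "identity" else 0.0

    candidates = [(q_for(c), c) for c in supported if q_for(c) > 0]
    if not candidates:
        return None  # Nothing acceptable: a server might answer 406 Not Acceptable.
    return max(candidates, key=lambda pair: pair[0])[1]
```

Supporting another coding would mean adding its token to `supported` and implementing the encoder. A client that doesn't list that token never receives it.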

> There is support for extensions, but they're, well, extensions.

So down the line, it will be pretty much exactly like HTTP/1.0 and 1.1, then.

Good to hear someone thought this thoroughly through before creating a mega-complex protocol unimplementable by most industry-grade engineers, which will also need to be debugged and maintained for all internet-eternity.

Well, in that case Apache, Nginx, Microsoft and the IETF need to keep control of the standard: anyone who tries to add the HTTP/2 equivalent of <blink> gets taken out and shot.

I have seen this before with OSI, when MCI decided that part of the X.400 standard was optional. Not to mention ICL, who thought that starting counting from 0 was a good idea when the standard said MUST START from 1 (and you wonder why the UK doesn't have a mainframe maker any more).

And the lack in simplicity is why it will fail.

People are good with dealing with a small number of simple things that can be stacked together. Throw in a human-readable data stream, and you're set to understand and use a stack of simple programs.

People are not good with dealing with a single monstrous object of unfathomable proportions; they will try to break it down into things they understand. If the thing is too complex, with too many inputs, too many outputs and too many states, that is a recipe for confusion. This is why overly complicated things always fail in the face of simple things.

One could argue that FTP/SFTP was just as good at transferring bytes over the network, but HTTP/1.0 won because it was simpler.

HTTP/2 was written to tickle the egos of its developers, following the principle - it is hard to write, it is hard to read. And its downfall is going to come from this problem.

I'm really confused by comments like these. Are you somebody who implements low-level network protocols?

I've written parsers and generators for plenty of binary protocols. It's actually really not that bad; you just need slightly different tooling. Yes, if somebody else hasn't written that tooling, it takes a bit longer because you have to do it yourself, but you save a lot of time because binary is far easier to parse than text. And guess what: people have already written plenty of tooling for HTTP/2... And HTTP/2 is fairly straightforward as protocols go (you wouldn't believe the crazy proprietary control protocols around the place; trust me, HTTP/2 is not at all bad).

The 'downfall' of HTTP/2 is also a real long shot - most people are already browsing in browsers that support it, and for many web site owners, using it is literally adding two lines to an nginx configuration file...

> People are good with dealing with a small number of simple things that can be stacked together. Throw in a human-readable data stream, and you're set to understand and use a stack of simple programs.

Not true! Text parsing is a pain in the ass; give me a well-documented binary protocol any day. On the upside, binary protocols tend to force good documentation. HTTP/1.1 is far from simple; every browser supports a slightly different implementation and the server is expected to serve to all of them. But a binary protocol is not any more difficult than a text-based protocol for someone with a decent knowledge of CS. If you don't have a decent knowledge of CS, you probably shouldn't be writing code at the protocol level.

Besides, who in their right mind outputs directly to ANY protocol these days? Unless you're building a web server, you should be doing it through an abstraction layer because it's a proper architecture practice. Once abstraction layers are built for all of the major languages (which I'm willing to bet has already happened) it will become a non-issue.

I haven't looked more into that, but wouldn't it also be viable now to start HTTP/1.2 with e.g. a more restrictive header grammar, restricting all existing features to what's actually used, at least on the server side? Clients with 1.1 support would keep working, but future clients would be simplified.

That would be absolutely lovely.

Since we're not viewing headers manually on 80x25 terminals anymore, we could do away with multi-line header values. That alone would drop off most of the complexity. (Being perfectly honest, even though it's part of the standard, you don't have to parse them now, anyway. I don't, and I've never had anyone complain to me about the site not working. Nothing mainstream sends them for the important fields.)
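
For context, these multi-line values are "obs-fold" continuation lines. A line that begins with a space or tab continues the previous header's value. RFC 7230 (section 3.2.4) deprecates them and lets a recipient replace each fold with a space before interpreting the value. Below is a minimal Python sketch of that unfolding step, with a hypothetical function name.

```python
def unfold_header_lines(lines):
    """Join obs-fold continuation lines onto the preceding header line."""
    unfolded = []
    for line in lines:
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation: replace the fold (line break plus leading whitespace) with one space.
            unfolded[-1] = unfolded[-1].rstrip() + " " + line.strip()
        else:
            unfolded.append(line)
    return unfolded
```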

Add a Server-Push header (filename+ETag), and we could eliminate most extraneous 304 Not Modified requests. Have browsers actually acknowledge Connection: keep-alive instead of opening tons of parallel requests. And leave this as the "hobbyist level, can't afford wildcard SSL certs" option, and I think it'd be quite beneficial.
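
The proposed Server-Push header is hypothetical. The revalidation round trip it aims to eliminate works like this today. The browser resends a cached resource's ETag in If-None-Match, and the server answers 304 Not Modified if the tag still matches. Below is a minimal Python sketch of the server-side check with hypothetical function names. It uses the weak comparison that RFC 7232 specifies for If-None-Match. The naive comma split assumes no ETag contains a comma, and `*` is treated as matching on the assumption that the resource exists.

```python
from typing import Optional


def _opaque(tag: str) -> str:
    """Strip the weak-validator prefix, for weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidation_status(if_none_match: Optional[str], current_etag: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_none_match is None:
        return 200  # Unconditional request: send the full response.
    candidates = [t.strip() for t in if_none_match.split(",")]
    if "*" in candidates:
        return 304  # Assumes the resource exists.
    current = _opaque(current_etag)
    if any(_opaque(t) == current for t in candidates):
        return 304
    return 200
```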

If browsers want to warn that it's not encrypted, fine. So long as they don't go into ridiculous hysteria levels like they do now with self-signed certs.

One immediate potential downside is Apache. It completely ignores the protocol request. If you ask for "GET / HTTP/1.2", or even "GET / HackerNewsTP/3.141e", it will happily reply with "HTTP/1.1 200 OK".

As a result, the negotiation would be trickier than with HTTP/2.

But like you said, it could be done in a way that it's 100% backward-compatible with existing 1.1 software, so long as their responses are also in a compatible, simplified format (and most already are.)
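
Below is a minimal Python sketch of strict request-line parsing, so that a made-up protocol name is rejected rather than ignored. The function names are hypothetical. Note that answering "HTTP/1.2" with "HTTP/1.1" is what RFC 7230 (section 2.6) recommends. It says a server SHOULD reply with the highest version it conforms to whose major version does not exceed the client's. The odd part of the behavior described above is accepting a protocol name that isn't HTTP at all.

```python
import re
from typing import Optional, Tuple

# method SP request-target SP HTTP-version (RFC 7230, section 3.1.1).
REQUEST_LINE = re.compile(r"([!#$%&'*+\-.^_`|~0-9A-Za-z]+) (\S+) HTTP/(\d)\.(\d)")


def parse_request_line(line: str) -> Tuple[str, str, int, int]:
    """Return (method, target, major, minor); raise ValueError if malformed."""
    match = REQUEST_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, major, minor = match.groups()
    return method, target, int(major), int(minor)


def response_version(major: int, minor: int) -> Optional[str]:
    """Pick the reply version for a server that implements HTTP/1.0 and 1.1.

    Returns None for other major versions; the caller would then send
    505 HTTP Version Not Supported.
    """
    if major != 1:
        return None
    return "HTTP/1.1" if minor >= 1 else "HTTP/1.0"
```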

> If browsers want to warn that it's not encrypted, fine. So long as they don't go into ridiculous hysteria levels like they do now with self-signed certs.

I don't believe they can do anything apart from what happens now. Imagine someone manages to redirect your traffic. You were talking to some website which used a known certificate, but this time you got a self-signed one. The browser essentially has two options:

- continue the connection - in this case you just handed over your session cookie, the person on the other side can act as you on that website

- go into "ridiculous hysteria levels" and tell you that the cert presented by the server is not trusted - so do what browsers do right now

There's really no situation where the first option should be allowed. How option 2 is implemented is the interesting detail.
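
Option 2 is also what mainstream TLS client libraries do by default. In Python's standard library, for example, `ssl.create_default_context()` requires a certificate that chains to a trusted CA and checks that it matches the hostname. A self-signed certificate therefore makes the handshake fail instead of silently continuing. Below is a minimal sketch; the function names are hypothetical.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # Defaults: verify_mode == CERT_REQUIRED, check_hostname == True,
    # and the platform's trusted CA certificates are loaded.
    return ssl.create_default_context()


def open_verified_connection(host: str, port: int = 443) -> ssl.SSLSocket:
    """Connect and complete a TLS handshake, or raise ssl.SSLCertVerificationError
    if the server presents an untrusted (e.g. self-signed) or mismatched certificate."""
    context = make_client_context()
    sock = socket.create_connection((host, port), timeout=10)
    try:
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()  # Don't leak the TCP connection if the handshake fails.
        raise
```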

I am with you up until letting people use unencrypted HTTP. I assume people in this camp also use telnet instead of ssh because it is simpler. No, browsers should drop support for unencrypted HTTP soon after Let's Encrypt goes live.

> browsers should drop support for unencrypted HTTP soon after Let's Encrypt goes live.

To the people who keep insisting everything needs encryption: No it doesn't. Fuck off!

You don't see me forcing PGP on your email, do you? No? Fine, then let us non-weirdos keep using plain HTTP where we want it, where we have determined that it is a good fit for our needs.

Besides, this is a purely theoretical concern, because a browser which drops support for plain HTTP won't have any user base as soon as people discover that 95% of the internet is broken in that browser.

> You don't see me forcing PGP on your email, do you?

Not pgp (as in, not end-to-end encryption). But hopefully in most cases you are forced to encrypt your email (smtp/tls), servers forwarding your email are likely using encryption (smtp/tls between servers), and you're pulling the email over encrypted channel (imaps). Alternatively your mail submission/collection goes over https to the email provider.

And yes, I will insist on everyone using encryption in mail, web, everything. Because once you actually want to use it for some reason, you don't want it to be completely different from all your other traffic, basically screaming "hey, I'm trying to hide some data here, because all my other connections are in plaintext".

Fortunately we're at the stage where everyone is actually forced to use encryption for a lot of their traffic.

So, is the problem encryption, or manual encryption?

I presume you don't care about encryption when you send emails, and yet if you're using a big-name provider, your emails will be encrypted without you even knowing it.

That's why I keep wanting to put "automatic" encryption everywhere. I would rather have browsers demote plain HTTP to the same insecure level as a TLS connection using RC4-MD5. A connection with good security would then get a higher "indicator" than those, even if the certificate is self-signed (though not as high as a trusted connection).

In practice that would mean "this connection is PROBABLY secure. If you really care about what you're about to do, STOP NOW. If you don't care just go on".
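
The three-level scheme proposed above could be expressed as a small classification function. This is a hypothetical Python sketch. The level names are made up, and the weak-cipher check is a crude substring match on OpenSSL-style cipher names such as RC4-MD5, not a real policy.

```python
from enum import IntEnum


class Indicator(IntEnum):
    INSECURE = 0         # plain HTTP, or TLS with a broken cipher suite
    PROBABLY_SECURE = 1  # strong TLS, but the certificate is not trusted (e.g. self-signed)
    TRUSTED = 2          # strong TLS and a certificate that chains to a trusted root


# Hypothetical, deliberately crude markers for broken OpenSSL-style cipher names.
WEAK_CIPHER_MARKERS = ("RC4", "MD5", "NULL", "EXPORT")


def classify_connection(uses_tls: bool, cipher_name: str, cert_trusted: bool) -> Indicator:
    if not uses_tls:
        return Indicator.INSECURE
    if any(marker in cipher_name.upper() for marker in WEAK_CIPHER_MARKERS):
        return Indicator.INSECURE  # e.g. RC4-MD5: treated the same as plain HTTP
    return Indicator.TRUSTED if cert_trusted else Indicator.PROBABLY_SECURE
```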

"Automatic" PGP (or really E2E encryption) would be awesome, but there is still far too much manual work for it to happen. Maybe one day we'll be there.

I assume you use telnet instead of ssh too. Once you do a little research and spend a little time figuring out what can actually be done (and is done constantly) to unencrypted HTTP traffic (anything from user tracking, to ad injections, to identity theft), you will realize just how wrong you are. Yes, HTTP needs to die. Sorry it's taking you a while to see it.

HTTP/2 is easier to parse than HTTP/1.1 because there are fewer edge cases.

Also a bonus: no more "Referer" (sic)