by throwawasiudy 3402 days ago
It's kinda sad that TCP was chosen so long ago for HTTP that there's effectively no changing it. With modern TLS the underlying data guarantees TCP gives you just aren't that useful. We have kinda a weird situation where we have TCP->TLS->HTTP in layers when it could all be one protocol layer. We also wrap a stateless protocol (HTTP) inside a stateful (TCP) one which causes some insanity.

What's the problem with doing it the current way? Massive routing inefficiency at scale. Since the layers for persistence and routing (L2-4) don't carry all the info needed to connect to a server (some of it, like headers and URL, is up in HTTP at L7), it's mandatory to "unwrap" through the protocol layers before you can determine where a stream/packet/HTTP req is supposed to go.

This means you can use something like IPVS as your L2-3 load balancer, but once the streams are divided out by IP/port you need to do the TLS+HTTP in one step. There are also some hard limits on how much traffic a single IPVS instance can handle, because balancing TCP even at a low level requires the router to keep track of connection state (conntrack). So we have this situation where there's a main low-level balancer with some arbitrary traffic limit imposed by TCP overhead, and behind that we have a bunch of child balancers doing way more work than they should, handling the connection from the TCP level through TLS and HTTP before they can pass it on to a back-end app server.
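
To make the conntrack point concrete, here is a rough Python sketch of the trade-off. It is not how IPVS is implemented; the backend names, the `ConntrackBalancer` class and the `hash_route` helper are all made up for illustration. The stateful balancer has to keep one table entry per open flow, while the hash-based one derives the backend from the packet's addressing alone.

```python
import hashlib

BACKENDS = ["app-1", "app-2", "app-3"]  # hypothetical backend names


class ConntrackBalancer:
    """Stateful: remembers which backend each flow was assigned to."""

    def __init__(self, backends):
        self.backends = backends
        self.table = {}      # flow tuple -> backend; grows with open connections
        self.next_index = 0  # round-robin cursor for new flows

    def route(self, flow):
        if flow not in self.table:
            self.table[flow] = self.backends[self.next_index % len(self.backends)]
            self.next_index += 1
        return self.table[flow]


def hash_route(flow, backends):
    """Stateless: the backend is a pure function of the flow tuple."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# 1000 client flows: (src ip, src port, dst ip, dst port), documentation addresses
flows = [("203.0.113.7", 10000 + i, "198.51.100.1", 443) for i in range(1000)]

lb = ConntrackBalancer(BACKENDS)
for flow in flows:
    lb.route(flow)

print(len(lb.table))                  # one table entry per tracked flow
print(hash_route(flows[0], BACKENDS))  # no table needed at all
```

The catch with the stateless version is that adding or removing a backend remaps most flows, which is why real deployments tend to reach for consistent hashing or fall back to some connection tracking anyway.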

This could all be avoided if HTTP was a stateless UDP based protocol, and TLS was baked in rather than being an additional layer. It would make routing and load balancing far more effective at scale. You probably wouldn't see nearly as many DDoS attacks succeeding, because the vast majority of them exhaust CPU power far before they actually flood you off the net.

9 comments

It makes sense to separate the problem into multiple different protocols, because that gives you re-usability and greatly reduces the complexity of implementation. Can you imagine the amount of effort if every application protocol defined its own mechanisms for retransmissions, congestion control, security, etc.?

It also makes it possible to swap out parts more easily. For example, to put SSL under HTTP. Or QUIC under HTTP/2 when people realized TCP is not a good fit for it. You could, if you wanted to, run HTTP over UDP. Although you'd quickly realize you actually want many things that TCP and TLS give you for free, so you'd have to start re-implementing the same functionality on top of UDP.

Don't forget the history of TCP. TCP's congestion avoidance is the evolution of 20 years of managing large scale networks. Many papers later and the CA algorithms of TCP have been proven to work at large scale for a variety of traffic types. The internet as we know it simply wouldn't work without massive adoption of TCP. Even QUIC looks very similar to TCP, and is just an evolution of it in many ways.

It's a hard distributed problem: how do you coordinate many independent flows of traffic to efficiently utilise the many network links in the internet? Do this wrong at scale, and even broadband internet will feel like you're on a 2G mobile network.

Also I don't think it makes sense for HTTP to be on UDP. While it's stateless, you still can't tolerate packet loss with HTTP. Otherwise you'll try loading a page and maybe nothing comes back, or only part of it comes back. What then?
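
A toy illustration of the "only part of it comes back" case, with no real sockets involved. The helper names (`split_into_datagrams`, `reassemble`) and the 8-byte chunk size are arbitrary choices for the example. The point is that the receiver needs sequence numbers to even notice the gap, and a retransmission step to fill it, which is the beginning of reinventing TCP.

```python
def split_into_datagrams(body: bytes, size: int):
    """Cut a response body into (sequence number, payload) datagrams."""
    return [(seq, body[i:i + size])
            for seq, i in enumerate(range(0, len(body), size))]


def reassemble(received, total):
    """Rebuild what we can and report which sequence numbers never arrived."""
    by_seq = dict(received)
    missing = [seq for seq in range(total) if seq not in by_seq]
    data = b"".join(by_seq[seq] for seq in sorted(by_seq))
    return data, missing


body = b"<html><body>hello world</body></html>"
datagrams = split_into_datagrams(body, 8)

# Simulate the network: datagram 2 is lost, the rest arrive out of order.
delivered = [d for d in reversed(datagrams) if d[0] != 2]
partial, missing = reassemble(delivered, len(datagrams))
print(partial, missing)   # a page with a hole in it

# The receiver has to ask for the missing pieces again.
delivered += [datagrams[seq] for seq in missing]
full, missing_after = reassemble(delivered, len(datagrams))
print(full == body, missing_after)
```

This still ignores lost retransmissions, timeouts, duplicate datagrams and congestion control, all of which a real protocol would have to handle.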

TCP makes a lot of sense for HTTP, since once you strip off the HTTP headers on the request and response, HTTP is basically just TCP: a continuous bidirectional stream of bytes. If you want to rebuild that with all its properties (flow control, ordered and reliable delivery), you rebuild half of TCP anyway.

The CoAP protocol for embedded devices tries to achieve HTTP semantics on top of UDP. But it's a lot harder to implement correctly, especially if you want to support large request payloads. And I think it isn't easier to load balance or proxy than TCP-based HTTP, since it's also not sufficient to look at a packet header: you would also need to parse and keep around the whole request/response state between packets. Only now the state is no longer associated with a TCP connection but with a plain UDP socket which must handle dozens of parallel requests.

Perhaps the massive pervasiveness of the internet is not obvious on a day to day basis, but you should consider it when judging whether or not TCP was a good decision.
I think the question is not so much whether HTTP could have used UDP, but rather whether the browser is the right place to build a complex piece of software. The fact that you can doesn't mean that you should. I am a bit concerned that the browser is becoming the only cross-platform API and we are forced to build complex software on top of a scripting language that was only designed to fire up porn advertisement pop-ups.
Ignoring the (in)validity of your characterization of JS, WebAssembly will remove this reliance on JS.
When one considers what HTML and HTTP were meant for initially, TCP makes perfect sense. That they have since been bastardized into a UI framework is a very different matter.
I'm not so sure the internet would have been as successful without the guaranteed transmission of TCP.
TCP gets changed all the time. In fact modern TCP stacks are highly tuned for HTTP.
Perhaps this is also why HTTP/2 and QUIC have been proposed, but I am not a network guy, so take my guess with a grain of salt.