Hacker News
by samtho 1125 days ago
This may sound sort of “old man waves at cloud” of me but one thing I’ve found sad is the gross over-complication of later versions of standards such that the sort of project linked here may not be as practical for something like HTTP/3 for example. Similarly, the large, muddled tool chain that is “required” to make modern JavaScript applications makes it hard for newer learners to really understand what is going on because the minimal code version still needs its own transpiler, build system, linter, process managers, etc. Maybe we need all this complexity, but I suspect that some of the overzealous, solve-everything systems design we have become accustomed to is mainly serving to create a larger problem set instead of creating elegant abstractions that are agreed upon.
9 comments

The big thing that makes HTTP/2, HTTP/3 and even Gemini hard to implement is TLS. In theory, HTTP/2 doesn't require it, but in practice, it does.

You need to implement a variety of ciphers, manage certificates with expiration dates, etc... And if you wanted to implement all that yourself, people will yell at you for doing your own crypto. So yeah, you need a library. But not just that. You need a way to update your certificates, so you can't have a package (or even a single executable) that you can just run and have a server that serves static pages. You could make a self-signed certificate that lasts a thousand years, but good luck getting it accepted.

Generally speaking, you also need some sort of an operating system to make use of HTTP, and yet that doesn't figure into the complexity of HTTP.

In classic HTTP TLS was layered beneath it, providing important degrees of freedom, including the freedom to not use TLS, which can be especially important for experimentation and development.

Prediction: If HTTP/3 manages to substantially replace classic HTTP+TLS, QUIC is destined to become a kernel-provided service like TCP, shunting all that complexity behind an OS abstraction and freeing user space. The fact QUIC uses UDP is an important aspect here because a performant userspace QUIC stack conflicts with classic, high-value abstractions like file descriptors and processes; abstractions which make it viable (i.e. cheap) to have a rich, diverse ecosystem of languages and execution environments in userspace. More importantly, HTTP will have come full circle.

> QUIC is destined to become a kernel-provided service like TCP

I think this might happen in the opposite direction to how you think: the prevalence of containers is likely to thin down the OS layer to just arbitrating the PCIe bus, and you'll get monolithic applications that handle all the layers inside themself. So the QUIC layer and the web server and the app-routing and the app itself will all happen inside the same process.

Service VMs, in IBM-speak, where the "OS" is a hypervisor called VM.

Reinvented a bit later as libOS (library operating systems) and unikernels.

Even Linux got in on that a bit:

https://lwn.net/Articles/637658/

It isn't a stupid idea, but the only way to update anything is to rebuild the world. Luckily it's a small world after all.

> The fact QUIC uses UDP is an important aspect here because a performant userspace QUIC stack conflicts with classic, high-value abstractions like file descriptors and processes

Only because the implementations of fds in popular kernels are absolute rubbish at allowing userspace to extend them, except for perhaps ptys. A userspace implementation of a protocol can make a domain socket or just pass the client one half of a socketpair, but it cannot change the way accept() behaves, let alone add its own sockopts.

Even if you extend a kernel to do this, I suspect there are going to be interesting security implications when processes can suddenly receive fds that behave in funny unexpected ways. Much like the current mess with Linux namespaces: not because the idea is inherently bad, simply because we waited this long to try it.

> QUIC stack conflicts with classic, high-value abstractions like file descriptors and processes; abstractions which make it viable (i.e. cheap) to have a rich, diverse ecosystem of languages and execution environments in userspace

Do you mind expanding on that with a sentence or two?

UDP is streams in the very classical definition: stream objects (bytes, probably, though a more faithful implementation would contain chunks/arrays) go out without a predefined start/end. In other words, imagine that a single `read()` (or whatever) call returns a random chunk of the HTML source and it is your job to make sense of where in the whole page that chunk comes from. There is not even a guarantee that consecutive reads will return data "in order".

"File" abstraction layer has predefined start, end, length, order, and so on. You need some buffering magic layer on top of UDP that provides facilities mandated by file abstraction layer.

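A rough sketch of the kind of "buffering magic layer" described above, in Python. Everything here is hypothetical: the class name `ReassemblyBuffer`, the idea that each chunk arrives tagged with its byte offset, and the assumption that chunks never partially overlap. It is not how any particular QUIC stack does it, just the smallest thing that turns out-of-order pieces into an ordered byte stream.

```python
class ReassemblyBuffer:
    """Turns (offset, chunk) pieces arriving in any order into an in-order byte stream.

    Hypothetical helper: assumes chunks never partially overlap each other.
    """

    def __init__(self):
        self._pending = {}  # offset -> chunk that cannot be delivered yet
        self._next = 0      # offset of the next byte the reader expects

    def receive(self, offset, chunk):
        """Store one chunk; empty chunks and already-delivered data are ignored."""
        if not chunk or offset + len(chunk) <= self._next:
            return
        self._pending.setdefault(offset, chunk)

    def read(self):
        """Return every byte that is now contiguous with what was read before."""
        out = bytearray()
        while self._next in self._pending:
            chunk = self._pending.pop(self._next)
            out += chunk
            self._next += len(chunk)
        return bytes(out)


buf = ReassemblyBuffer()
buf.receive(6, b"world")    # arrives first, but belongs second
early = buf.read()          # nothing deliverable yet: there is a gap at offset 0
buf.receive(0, b"hello ")
late = buf.read()           # the gap is filled, so both chunks come out in order
```

The reader only ever sees bytes in order; the gaps, duplicates and reordering stay hidden inside the buffer, which is roughly what the file abstraction expects.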
Worse, a socket read can return a UDP packet from any client as QUIC associates connections using 64-bit identifiers in the UDP payload, not using IP/port pairs. Ditto for the fact there can be multiple logical streams per connection. So any user space QUIC stack is responsible for routing packets for any particular connection and logical stream to the correct context using whatever bespoke API the stack provides.

IOW, with QUIC a file descriptor no longer represents a particular server/client connection, let alone a particular logical stream. From a server perspective, a user space QUIC stack is effectively equivalent to a user space TCP/IP stack. (Relatedly, user space QUIC server stacks are less CPU performant than HTTP stacks using TCP sockets, unless they use something like DPDK to skip the kernel IP stack altogether, avoiding the duplicative processing.)
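To make the routing point concrete, here is a toy demultiplexer in Python. The packet layout (8-byte connection ID, 2-byte stream ID, then payload) is made up for illustration and is NOT the real QUIC wire format; real QUIC connection IDs are variable length. `parse_packet` and `Demux` are hypothetical names. The point is only that the lookup key comes from the payload, not from the sender's IP/port.

```python
import struct
from collections import defaultdict

CID_LEN = 8     # fixed 64-bit connection ID, an assumption for this sketch
HEADER_LEN = CID_LEN + 2


def parse_packet(datagram):
    """Split a toy packet into (connection_id, stream_id, payload).

    Hypothetical layout, not the QUIC wire format.
    """
    if len(datagram) < HEADER_LEN:
        raise ValueError("datagram too short")
    cid, stream_id = struct.unpack_from("!QH", datagram)
    return cid, stream_id, datagram[HEADER_LEN:]


class Demux:
    """Routes datagrams read from ONE socket to per-connection, per-stream queues."""

    def __init__(self):
        # connection ID -> stream ID -> list of payloads
        self.connections = defaultdict(lambda: defaultdict(list))

    def handle(self, datagram, addr):
        cid, stream_id, payload = parse_packet(datagram)
        # addr is deliberately ignored for lookup: the same connection
        # may show up from a different IP/port pair.
        self.connections[cid][stream_id].append(payload)
        return cid, stream_id


demux = Demux()
pkt = lambda cid, sid, data: struct.pack("!QH", cid, sid) + data
demux.handle(pkt(0xAA, 1, b"GET /"), ("10.0.0.1", 5000))
demux.handle(pkt(0xBB, 1, b"other client"), ("10.0.0.2", 6000))
demux.handle(pkt(0xAA, 1, b" HTTP"), ("10.0.0.9", 7000))  # same connection, new address
```

All three packets would come out of the same `recvfrom()` loop on a single file descriptor; the dictionary is what stands in for the per-connection sockets the kernel would otherwise hand you.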

The BSD Sockets API for SCTP (the original, non-UDP encapsulated version) permits 1-to-1 socket descriptors for associations--logical streams within a connection--in addition to an API for 1-to-many--1 socket descriptor representing a connection, over which you can use recvmsg/sendmsg to multiplex associations (logical streams). So you have 3 options for using SCTP from user space, from a dead simple, backwards compatible sockets API that matches how TCP/IP sockets work, to the low-level QUIC model where user space handles all routing and reassembly. I would expect kernel-based QUIC to work similarly, except combined with kernel-based TLS extensions, which is already a thing for TCP sockets.

> UDP is streams in the very classical definition

What do you mean exactly? UDP is not a stream protocol, it is an unreliable datagram one. A single read will return exactly one UDP datagram.

It is TCP where a read will return incremental data which is not necessarily aligned to what the other end wrote.

Of course QUIC builds a reliable stream abstraction on top of it, but that's not different from TCP.
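A small Python sketch of the datagram vs. stream read behaviour described in this comment. It assumes loopback UDP is available, and it uses `socket.socketpair()` as a local stand-in for a TCP connection (on Unix that is a connected stream socket, not literally TCP). The function names are made up for the example.

```python
import socket


def udp_reads():
    """Send two datagrams over loopback UDP; each recv() yields exactly one of them."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        rx.settimeout(2.0)          # do not hang forever if a datagram is lost
        tx.sendto(b"hello", rx.getsockname())
        tx.sendto(b"world", rx.getsockname())
        # Message boundaries are preserved: one recv() per datagram,
        # never half of one or two glued together.
        return [rx.recv(1024), rx.recv(1024)]
    finally:
        rx.close()
        tx.close()


def stream_reads():
    """Write twice to a stream socket; reads need not line up with the writes."""
    a, b = socket.socketpair()      # stand-in for a TCP connection
    try:
        a.sendall(b"hello")
        a.sendall(b"world")
        a.shutdown(socket.SHUT_WR)  # signal end-of-stream to the reader
        chunks = []
        while True:
            data = b.recv(1024)
            if not data:
                break
            chunks.append(data)
        # Only the concatenation is guaranteed, not how it is split into chunks.
        return chunks
    finally:
        a.close()
        b.close()
```

With the stream socket the two writes may come back as one chunk or several; the only thing you can rely on is the joined bytes. With UDP the boundaries survive, but ordering and delivery are not promised in general.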

In the classical definition stream is "sequence of data elements", without any additional guarantees. In this context TCP is stream transformer/filter: stream of "raw" packets is transformed into payload stream.

> What do you mean exactly? UDP is not a stream protocol, it is an unreliable datagram one. A single read will return exactly one UDP datagram.

This is exactly what makes UDP a stream of datagrams. Data is defined over one single UDP datagram - UDP gives exactly zero guarantees about data across datagram boundaries. There is no first, there is no last, there is no next/previous datagram, there are no guarantees or expectations about whether more datagrams will follow.

> Of course QUIC builds a reliable stream abstraction on top of it, but that's not different from TCP.

Key word "reliable", which is subset of streams. In goes stream of datagrams, out goes ordered (and otherwise identified) stream of bytes. Reading your comment it seems that you equate "stream" with "asynchronously fetched buffer" meaning from C++ and others, which is a stricter definition of streams.

Seems like there are some unspecified assumptions here. For one, an assumption that TLS is the only game in town. Another is that "you", as in "you must do this" or "you must not do that", applies to every person equally.

When one uses TLS today, chances are very good that it's using something from djb, someone who "made his own crypto". Maybe the assumptions, stated as "rules", do not apply equally to everyone. For example,

"And if you wanted to implement all that yourself, people will yell at you for doing your own crypto."

In fact, before HTTP/2 existed, djb did exactly that, as a demonstration that it could be done.^1 It succeeded IMHO because it worked. People can "yell" all they want, but as above, these same people if they use TLS are probably using cryptography developed by the person at which they are "yelling". Someone who broke the "rules". Perhaps there is evidence that HTTP/2 would exist even were it not for the prior CurveCP experiment. But I have yet to find it.

The word used in the parent comment was "implement" and the suggestion is that attempts to "implement" would not succeed. Perhaps the reason they might "fail" is not a technical one. Perhaps "success" in this instance really refers to acceptance by certain companies that are making "rules" (standards) for the internet to benefit their own commercial interests. It may be possible to implement a system that works even if these companies do not "accept" it. If so, then the problem here is the companies, their fanboys/fangirls (watch for them in the comment replies), and the undue influence they can exert, not the difficulty of implementing something that works.

IMHO, getting something "accepted" by some third party or group of third parties is a different type of "success" than getting something to work (i.e., "implementing"). It's the latter I find more interesting.

1. https://curvecp.org

i'm concerned about this too

google isn't really concerned about creating elegant abstractions so much as they are about improving the performance of their browser talking to their server, though sometimes these do coincide

> Similarly, the large, muddled tool chain that is “required” to make modern JavaScript applications makes it hard for newer learners to really understand what is going on because the minimal code version still needs its own transpiler, build system, linter, process managers, etc.

We built pianojacq.com without all that, just plain vanilla JS and a minimum of dependencies, see: https://gitlab.com/jmattheij/pianojacq

Hey, I really like the choice of default song you chose :)
Bach WTK prelude 1? It was the first real piece I ever learned so that's why it's there :)
I was thinking about Frère Jacques :)
Haha, ah, that one :) Yes, but that was just to have something (anything) really on the display after you load the program, pure coincidence ;)
This is a result of trying to make HTTP "do everything well", right? Instead of leaving it to each application to select a layering of successive protocols tailored to its needs, everything has been jammed into the HTTP spec, which has the nice property of giving you all its features in web browser clients (which is the delivery mechanism for this wide array of apps).

Certainly some applications need not have access to the entire HTTP suite. If the goal is not to offer a "full-featured" client then an HTTP client may not be terribly difficult, it can just fail gracefully if the other party tries to do something it doesn't support?
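As a rough illustration of "fail gracefully", here is a sketch of a deliberately limited HTTP/1.x response parser in Python. `Unsupported` and `parse_response` are hypothetical names, and the choice of what to support (plain `Content-Length` bodies only, no chunked transfer, no compression) is an assumption made for the example, not a recommendation.

```python
class Unsupported(Exception):
    """The peer used a feature this minimal client chose not to implement."""


def parse_response(raw):
    """Parse a complete HTTP/1.x response held in memory as bytes.

    Supports only bodies delimited by Content-Length (or by end of input).
    Anything fancier raises Unsupported instead of being misparsed.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header block")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, *_ = lines[0].split(" ", 2)
    if version not in ("HTTP/1.0", "HTTP/1.1"):
        raise Unsupported("protocol version " + version)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    if "transfer-encoding" in headers:
        raise Unsupported("transfer-encoding: " + headers["transfer-encoding"])
    if headers.get("content-encoding", "identity") != "identity":
        raise Unsupported("content-encoding: " + headers["content-encoding"])

    length = int(headers.get("content-length", len(body)))
    if len(body) < length:
        raise ValueError("truncated body")
    return int(status), headers, body[:length]


ok = parse_response(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
try:
    parse_response(b"HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n5\r\nhello\r\n0\r\n\r\n")
    chunked_error = None
except Unsupported as exc:
    chunked_error = str(exc)
```

The caller gets a clear "I don't do that" error instead of silently returning a garbled body, which is about as graceful as a small client can be.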

I will yell at the cloud too. I get why TypeScript exists. For large projects it really helps.

But losing one of the main benefits of scripting languages for some convenience? You could argue that you use webpack anyway. Same question applies here...

If you have to use JS anyway, be glad that you can have an extremely shallow toolchain for deployment for smaller projects. I am not a JS dev, I just use it for stabbing at bits occasionally.

I relate to the general concern about complexity creep a lot, but in the HTTP/1/2/3 case I'm just not terribly worried. So much effort has been put into backwards compatibility. The only potential true new concept that comes to mind is PUSH, which clients can ignore and mostly failed. HTTP/2 has been around for almost 10 years and I haven't seen a single implementation require it.
I see what you’re saying and agree that HTTP3 is complicated, but I would argue that since it’s a backwards compatible standard, the added complexities are completely optional. For most use cases the basic protocol is perfectly suitable and only as the scale evolves does it require the additional complexity.
I understand what you’re saying, but if someone decides to post a link to their project that is an HTTP/3 server in under X lines of code but only implements HTTP/2 features, is it really an HTTP/3 web server?
the HTTP/3 standard for servers is not backward compatible with HTTP/2 servers, it is backward compatible with HTTP/2 clients. And vice versa

And therefore, an HTTP/2 server calling itself compatible with HTTP/3 clients is OK

It might be OK on a technical level, but my concern is where someone is showing off a project they used to supposedly learn how to implement HTTP/3 but only goes far enough to use an HTTP/2 implementation because fully implementing 3 is too complex or not worth it.
Where does it say it is an HTTP/3 server? Looking at the tests it seems that it’s just a HTTP/1 server I think.
Some elegant abstractions are created too. JS modules, for example. Or, in another field, the Language Server Protocol.

But yes, humans are humans and everything that is "simple" will get built upon and then become the backbone of something complex that will engulf and smother it as it evolves.

It can be simple, or it can be fast.

Or it can be neither. But it can't be both.

If you want fast web at all cost (and you need encryption), you get HTTP3/QUIC.

At the end of the day, the question is never one debating simplicity. If you consider the amount of complexity implemented just to even boot into an operating system to run these systems, the lack of simplicity is already a foregone conclusion. The crux of the issue goes back to abstractions, and my worry is specifically that our abstractions are getting larger and larger and doing more work at every level with very tight coupling, rather than creating systems that use more levels as needed.
And that's a question that goes back to the very first network discussions.

The IP model won over the OSI model due to its smaller number of larger layers.

I think it's a very reasonable question, but HTTP3/QUIC does seem to offer advantages.