This may sound sort of “old man waves at cloud” of me but one thing I’ve found sad is the gross over-complication of later versions of standards such that the sort of project linked here may not be as practical for something like HTTP/3 for example. Similarly, the large, muddled tool chain that is “required” to make modern JavaScript applications makes it hard for newer learners to really understand what is going on because the minimal code version still needs its own transpiler, build system, linter, process managers, etc. Maybe we need all this complexity, but I suspect that some of the overzealous, solve-everything systems design we have become accustomed to is mainly serving to create a larger problem set instead of creating elegant abstractions that are agreed upon.
The big thing that makes HTTP/2, HTTP/3 and even Gemini hard to implement is TLS. In theory, HTTP/2 doesn't require it, but in practice, it does.
You need to implement a variety of ciphers, manage certificates with expiration dates, etc... And if you wanted to implement all that yourself, people will yell at you for doing your own crypto. So yeah, you need a library. But not just that. You need a way to update your certificates, so you can't have a package (or even a single executable) that you can just run and have a server that serves static pages. You could make a self-signed certificate that lasts a thousand years, but good luck getting it accepted.
Generally speaking, you also need some sort of an operating system to make use of HTTP, and yet that doesn't figure into the complexity of HTTP.
In classic HTTP, TLS was layered beneath it, providing important degrees of freedom, including the freedom to not use TLS, which can be especially important for experimentation and development.
Prediction: If HTTP/3 manages to substantially replace classic HTTP+TLS, QUIC is destined to become a kernel-provided service like TCP, shunting all that complexity behind an OS abstraction and freeing user space. The fact QUIC uses UDP is an important aspect here because a performant userspace QUIC stack conflicts with classic, high-value abstractions like file descriptors and processes; abstractions which make it viable (i.e. cheap) to have a rich, diverse ecosystem of languages and execution environments in userspace. More importantly, HTTP will have come full circle.
> QUIC is destined to become a kernel-provided service like TCP
I think this might happen in the opposite direction to how you think: the prevalence of containers is likely to thin down the OS layer to just arbitrating the PCIe bus, and you'll get monolithic applications that handle all the layers inside themselves. So the QUIC layer and the web server and the app-routing and the app itself will all happen inside the same process.
> The fact QUIC uses UDP is an important aspect here because a performant userspace QUIC stack conflicts with classic, high-value abstractions like file descriptors and processes
Only because the implementations of fds in popular kernels are absolute rubbish at allowing userspace to extend them, except for perhaps ptys. A userspace implementation of a protocol can make a domain socket or just pass the client one half of a socketpair, but it cannot change the way accept() behaves, let alone add its own sockopts.
Even if you extend a kernel to do this, I suspect there are going to be interesting security implications when processes can suddenly receive fds that behave in funny unexpected ways. Much like the current mess with Linux namespaces: not because the idea is inherently bad, simply because we waited this long to try it.
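For anyone who wants the socketpair option spelled out, here is a minimal C sketch. The names `make_stream_pair` and `deliver_to_client` are made up for illustration, and the "stack" side just writes a payload it is assumed to have already reassembled. It only shows the plumbing, not a real protocol stack.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a connected pair of stream sockets.
 * fds[0] would stay with the userspace protocol stack, fds[1] would be
 * handed to the client code. Returns 0 on success, -1 on error. */
int make_stream_pair(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
}

/* Hypothetical demo: the "stack" side writes a payload and the "client"
 * side receives it through a plain read(), as if it were a normal socket.
 * Returns the number of bytes the client side read, or -1 on error. */
ssize_t deliver_to_client(const char *payload, char *out, size_t out_len)
{
    int fds[2];
    if (make_stream_pair(fds) < 0)
        return -1;

    size_t len = strlen(payload);
    ssize_t n = -1;
    if (write(fds[0], payload, len) == (ssize_t)len)
        n = read(fds[1], out, out_len);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

The client end behaves like any other stream descriptor for read() and write(), which is the point. What it cannot do is the part complained about above: there is no way for the stack to hook accept() or to answer its own socket options on that descriptor.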
> QUIC stack conflicts with classic, high-value abstractions like file descriptors and processes; abstractions which make it viable (i.e. cheap) to have a rich, diverse ecosystem of languages and execution environments in userspace
Do you mind expanding on that with a sentence or two?
UDP is streams in the very classical definition: stream objects (bytes, probably, though a more faithful implementation would contain chunks/arrays) go out without predefined start/end. In other words, imagine that a single `read()` (or whatever) call returns a random chunk from the HTML source and it is your job to make sense of where in the space of the whole page that chunk comes from. There is not even a guarantee that consecutive reads will return data "in order".
"File" abstraction layer has predefined start, end, length, order, and so on. You need some buffering magic layer on top of UDP that provides facilities mandated by file abstraction layer.
Worse, a socket read can return a UDP packet from any client as QUIC associates connections using 64-bit identifiers in the UDP payload, not using IP/port pairs. Ditto for the fact there can be multiple logical streams per connection. So any user space QUIC stack is responsible for routing packets for any particular connection and logical stream to the correct context using whatever bespoke API the stack provides.
IOW, with QUIC a file descriptor no longer represents a particular server/client connection, let alone a particular logical stream. From a server perspective, a user space QUIC stack is effectively equivalent to a user space TCP/IP stack.
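To illustrate the routing job that lands on the user space stack, here is a rough C sketch of demultiplexing datagrams by connection ID. Big assumption: it pretends the first 8 bytes of every payload are a big-endian connection ID, which is an illustrative layout and not the real QUIC header format. `struct conn`, `struct conn_table` and `route_datagram` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 8

/* Hypothetical per-connection state; a real stack keeps far more. */
struct conn {
    uint64_t id;
    int in_use;
    size_t bytes_seen;   /* payload bytes received after the ID */
};

struct conn_table {
    struct conn slots[MAX_CONNS];
};

/* ASSUMPTION: the first 8 bytes of each payload are a big-endian
 * connection ID. This is an illustrative layout, not the QUIC header. */
static uint64_t read_conn_id(const uint8_t *p)
{
    uint64_t id = 0;
    for (int i = 0; i < 8; i++)
        id = (id << 8) | p[i];
    return id;
}

/* Find the connection a datagram belongs to, creating one if the ID is
 * new. The sender's IP/port is deliberately not consulted.
 * Returns NULL if the datagram is too short or the table is full. */
struct conn *route_datagram(struct conn_table *t,
                            const uint8_t *payload, size_t len)
{
    if (len < 8)
        return NULL;

    uint64_t id = read_conn_id(payload);
    struct conn *free_slot = NULL;

    for (int i = 0; i < MAX_CONNS; i++) {
        struct conn *c = &t->slots[i];
        if (c->in_use && c->id == id) {
            c->bytes_seen += len - 8;
            return c;
        }
        if (!c->in_use && free_slot == NULL)
            free_slot = c;
    }

    if (free_slot == NULL)
        return NULL;
    free_slot->in_use = 1;
    free_slot->id = id;
    free_slot->bytes_seen = len - 8;
    return free_slot;
}
```

Every datagram read from the one UDP socket would go through something like `route_datagram` before any per-connection code sees it, which is exactly the work the kernel does for you with TCP.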
(Relatedly, user space QUIC server stacks are less CPU performant than HTTP stacks using TCP sockets, unless they use something like DPDK to skip the kernel IP stack altogether, avoiding the duplicative processing.)
The BSD Sockets API for SCTP (the original, non-UDP encapsulated version) permits 1-to-1 socket descriptors for associations--logical streams within a connection--in addition to an API for 1-to-many--1 socket descriptor representing a connection, over which you can use recvmsg/sendmsg to multiplex associations (logical streams). So you have 3 options for using SCTP from user space, from a dead simple, backwards compatible sockets API that matches how TCP/IP sockets work, to the low-level QUIC model where user space handles all routing and reassembly. I would expect kernel-based QUIC to work similarly, except combined with kernel-based TLS extensions, which is already a thing for TCP sockets.
Seems like there are some unspecified assumptions here. For one, an assumption that TLS is the only game in town. Another is that "you", as in "you must do this" or "you must not do that", applies to every person equally.
When one uses TLS today, chances are very good that it's using something from djb, someone who "made his own crypto". Maybe the assumptions, stated as "rules", do not apply equally to everyone. For example,
"And if you wanted to implement all that yourself, people will yell at you for doing your own crypto."
In fact, before HTTP/2 existed, djb did exactly that, as a demonstration that it could be done.^1 It succeeded IMHO because it worked. People can "yell" all they want, but as above, these same people if they use TLS are probably using cryptography developed by the person at which they are "yelling". Someone who broke the "rules". Perhaps there is evidence that HTTP/2 would exist even were it not for the prior CurveCP experiment. But I have yet to find it.
The word used in the parent comment was "implement" and the suggestion is that attempts to "implement" would not succeed. Perhaps the reason they might "fail" is not a technical one. Perhaps "success" in this instance really refers to acceptance by certain companies that are making "rules" (standards) for the internet to benefit their own commercial interests. It may be possible to implement a system that works even if these companies do not "accept" it. If so, then the problem here is the companies, their fanboys/fangirls (watch for them in the comment replies), and the undue influence they can exert, not the difficulty of implementing something that works.
IMHO, getting something "accepted" by some third party or group of third parties is a different type of "success" than getting something to work (i.e., "implementing"). It's the latter I find more interesting.
google isn't really concerned about creating elegant abstractions so much as they are about improving the performance of their browser talking to their server, though sometimes these do coincide
> Similarly, the large, muddled tool chain that is “required” to make modern JavaScript applications makes it hard for newer learners to really understand what is going on because the minimal code version still needs its own transpiler, build system, linter, process managers, etc.
This is a result of trying to make HTTP "do everything well", right? Instead of leaving it to each application to select a layering of successive protocols tailored to its needs, everything has been jammed into the HTTP spec, which has the nice property of giving you all its features in web browser clients (which is the delivery mechanism for this wide array of apps).
Certainly some applications need not have access to the entire HTTP suite. If the goal is not to offer a "full-featured" client then an HTTP client may not be terribly difficult, it can just fail gracefully if the other party tries to do something it doesn't support?
I will yell at the cloud too. I get why TypeScript exists. For large projects it really helps.
But losing one of the main benefits of scripting languages for some convenience? You could argue that you use webpack anyway. Same question applies here...
If you have to use JS anyway, be glad that you can have an extremely shallow toolchain for deployment for smaller projects. I am not a JS dev, I just use it for stabbing at bits occasionally.
I relate to the general concern about complexity creep a lot, but in the HTTP/1/2/3 case I'm just not terribly worried. So much effort has been put into backwards compatibility. The only potential true new concept that comes to mind is PUSH, which clients can ignore and mostly failed. HTTP/2 has been around for almost 10 years and I haven't seen a single implementation require it.
I see what you’re saying and agree that HTTP/3 is complicated, but I would argue that since it’s a backwards compatible standard, the added complexities are completely optional. For most use cases the basic protocol is perfectly suitable and only as the scale evolves does it require the additional complexity.
I understand what you’re saying, but if someone decides to post a link to their project that is an HTTP/3 server in under X lines of code but only implements HTTP/2 features, is it really an HTTP/3 web server?
It might be OK on a technical level, but my concern is where someone is showing off a project they used to supposedly learn how to implement HTTP/3 but only goes far enough to use an HTTP/2 implementation because fully implementing 3 is too complex or not worth it.
Some elegant abstractions are created too. JS modules, for example. Or, in another field, the Language Server Protocol.
But yes, humans are humans and everything that is "simple" will get built upon and then become the backbone of something complex that will engulf and smother it as it evolves.
At the end of the day, the question is never one debating simplicity. If you consider the amount of complexity implemented just to even boot into an operating system to run these systems, the lack of simplicity is already a foregone conclusion. The crux of the issue goes back to abstractions, and my worry is specific to how our abstractions are getting larger and larger and doing more work at every level with very tight coupling rather than creating systems that use more levels as needed.
Cool project, but this project demonstrates the reason I've stopped writing things in C. The standard library has garbage string functions and it seems every project has its own version of this file:
That's the "lazy, dumb" way of doing it --- write another string library. A much better way is to design your algorithms so they need a minimum of string manipulation, which is unfortunately on the more difficult side for text-based protocols like HTTP.
Personally, I wish HTTP messages were closer to something like ASN.1 DER; there's little in the way of string manipulation necessary for those, and all the lengths are prefixes instead of "try to find the terminator" (and don't forget to not run past the end of the buffer...)
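A small C sketch of what the length-prefix style buys you. This is only the simplest DER-like case, assuming a one-byte tag and a one-byte short-form length (under 128); long-form lengths are rejected rather than handled. `struct tlv` and `tlv_parse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tlv {
    uint8_t tag;
    size_t len;
    const uint8_t *value;   /* points into the caller's buffer, no copy */
};

/* Parse one tag-length-value element from buf[0..buf_len).
 * Returns the number of bytes consumed, or 0 if the element is
 * truncated or uses a length form this sketch does not handle. */
size_t tlv_parse(const uint8_t *buf, size_t buf_len, struct tlv *out)
{
    if (buf_len < 2)
        return 0;
    if (buf[1] & 0x80)          /* long-form length: out of scope here */
        return 0;

    size_t len = buf[1];
    if (len > buf_len - 2)      /* declared length runs past the buffer */
        return 0;

    out->tag = buf[0];
    out->len = len;
    out->value = buf + 2;
    return 2 + len;
}
```

There is no scanning for a terminator anywhere: one bounds check against the declared length replaces the whole "find the CRLF without running off the end" dance.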
Impossible– no part of the thread has involved someone suggesting rewriting it in Rust, which means there's no opportunity for someone else to reply "This project seems like a perfect fit for golang, why would you suggest they use Rust instead?"
I have the same issue, but I blame the absence of good package management. If it had that, one of the thousands of these libraries would have won out and become quasi-standard.
glib is a massive dependency just for this, and I think many would argue that gtk in general is not a good idea for painless cross-platform development in 2023.
This is how I felt writing Go. Writing the same nongeneric functions with slightly different type signatures and/or endless interfaces and/or interface{} signatures. It's 2023.
never mind that MANY people don't need generics, and that generics have a significant compilation and runtime cost, in terms of time and memory. who cares right?
and never mind that Go has had generics for over a year now right? sometimes having a small, stripped down language is better than having a huge bloated monster. I would point to examples, but you know what they are.
> 1. The current proposals for generics are bloated and don't fit properly with our vision for Go.
> 2. We will do heaps and heaps of work over many years until we get something we like.
I suppose repeatedly suffering the pain of not having generics until you finally get enough of your community to admit generics aren't just bloat can be described as "heaps and heaps of work". ;)
I'd love to hear what meaningful differences you think exist between early proposals for Go generics, and later proposals for Go generics. Or Go generics and generics which had existed in other languages for decades, for that matter.
The result of this silly delay will be felt basically forever, because millions of lines of Go were written with error codes, which could have been much less error-prone option types. I'm not sure it would even be a good idea to start using option types at this point because it would create such inconsistencies in codebases.
> Also, AFAIK Go is not a single-pass compiler, at least not in the way I learned about compilers.
Correct! But it was single pass and that was very important because multiple passes were bloat. But multiple passes aren't bloat any more, because now Go has multiple passes and Go doesn't have the bloat of other languages. See how that works?
"The past was alterable. The past never had been altered. Oceania was at war with Eurasia. Oceania had always been at war with Eurasia."
> never mind that MANY people don't need generics, and that generics have a significant compilation and runtime cost, in terms of time and memory. who cares right?
I'm saying that I wanted them...? I didn't say need. I was able to operate without them. I simply _wanted_ them.
Also maybe consider the way you're communicating? I didn't say Go was wrong. I said I didn't enjoy it. I'm allowed to not like things, and I'm allowed to post about them.
_Also_ this concept of "need" is amusing to me. You don't _need_ a garbage collector, or a statically linked binary, or static typing, or IDE support, or a debugger, or, or, or.
And yet, people _want_ them.
> and never mind that Go has had generics for over a year now right?
Their generics implementation is pretty bad IME. Interfaces lacking type parameters seems pretty untenable to me. I'd rather just not use them.
> sometimes having a small, stripped down language
Of all the things Go is, I wouldn't say it's either of these.
But also, if you respond, please try to not be so defensive. You can like Go, that is a valid thing to do.
Very good points about need vs want. Every time I hear a reasonable software request at work answered with, "Well, do you really need it?", I pat my left leg. It's a good leg; I like it. Very helpful, but sadly not strictly necessary.
It's still on my "I keep meaning to play with this" list because I keep getting distracted by other shiny things but you might find gomacro interesting.
Many people don't need generics? Significant runtime cost? I would argue that everyone should be using type-safe collections like maps, lists, etc. (I know these are built into the language in Go, but they're basically special-cased generics). And, if anything, not having generics has more cost because you end up boxing your values to place into collections rather than letting the compiler monomorphize it. The compilation cost can be more expensive, yes, but it saves a lot of work you would be doing by hand.
While Go has had generics for a while now, the vast swath of existing Go code doesn't. The situation is definitely improving but there is still a lot of non-generic legacy cruft.
> you end up boxing your values to place into collections rather than letting the compiler monomorphize it.
you know that Go has interfaces right? and I dont just mean the "any" interface. you can write your own interface similar to io.Reader and others, then add types that implement that interface, then no type matching is needed.
> non-generic legacy cruft
AKA normal, fine code. generic code is not some magic pixie dust that makes bad code into good code. plenty of awful bloated slow generic code around as well.
If you are using interfaces, the value is necessarily boxed as the storage for the value may be heterogeneous. Once a value is typed as e.g. `io.Reader`, dispatching to its methods necessarily requires a vtable lookup (i.e. runtime cost!). Compare this to parametric polymorphism where you can avoid the type erasure and perform static dispatch at compile time. Though, unfortunately, Go's implementation of generics ("GC shape with stenciling" instead of full monomorphization) still ends up incurring some runtime cost.
With regards to your second point, there are definitely situations where generics are vastly preferred: type-safe collections being a big one. For instance, the standard library containers (https://pkg.go.dev/container) are still non-generic with no generic versions in the standard library yet. This is the kind of cruft I mean: generic collections can be turned into concrete collections with type safety, but not the other way around. I make no claims about use of generics making your code absolutely good or bad, but I do claim that use of generics can make your code less error-prone and safer.
What kind of argument is this? Nobody says it’s “magical pixie dust that makes bad code into good code.” It’s a way to avoid writing repetitive code over and over and the runtime cost can be insignificant depending on implementation.
That assert is checking a library invariant; it should never fail unless there’s a bug in the string library itself (although I’m not entirely sure this string library would tolerate a malloc failure from a quick glance through).
This is distinct from checking the parameters; if lks is null then the user of the API has made an error. Some libraries may sanitise user parameters, others don’t. At any rate, an assert would be the wrong choice to check user parameters since this would result in a (recoverable) user error leading to an abort unless the assert is disabled at compile time (-DNDEBUG), returning an error would be a better choice.
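Here is a minimal C sketch of that split, using a made-up `struct str` (not the project's actual string type): caller mistakes and allocation failures come back as error codes, while `assert` is reserved for the library's own invariant.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type for illustration only. */
struct str {
    char *data;   /* always NUL-terminated */
    size_t len;   /* bytes in use, excluding the NUL */
    size_t cap;   /* bytes allocated, always > len */
};

/* Initialise an empty string. Returns 0, EINVAL or ENOMEM. */
int str_init(struct str *s)
{
    if (s == NULL)
        return EINVAL;
    s->data = malloc(16);
    if (s->data == NULL)
        return ENOMEM;
    s->data[0] = '\0';
    s->len = 0;
    s->cap = 16;
    return 0;
}

/* Append text to s. Returns 0, EINVAL or ENOMEM. */
int str_append(struct str *s, const char *text)
{
    /* User parameters: a recoverable caller error, so report it. */
    if (s == NULL || text == NULL)
        return EINVAL;

    /* Library invariant: can only fail if this library has a bug. */
    assert(s->data != NULL && s->len < s->cap);

    size_t add = strlen(text);
    size_t need = s->len + add + 1;
    if (need > s->cap) {
        size_t new_cap = s->cap * 2;
        if (new_cap < need)
            new_cap = need;
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return ENOMEM;   /* malloc failure is reported, not asserted */
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, text, add + 1);   /* copies the NUL too */
    s->len += add;
    return 0;
}
```

Building with -DNDEBUG removes the invariant check but leaves the parameter and allocation checks in place, so the behaviour callers can observe stays the same.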
Thanks, many of the asserts were scaffolding while I was programming the helper functions. They function similarly to comments, reminding me of the internal conditions that should always be valid.
Take care that select() is not good for a webserver. From manpage:
WARNING: select() can monitor only file descriptors numbers that
are less than FD_SETSIZE (1024)—an unreasonably low limit for
many modern applications—and this limitation will not change.
All modern applications should instead use poll(2) or epoll(7),
which do not suffer this limitation.
22 years ago, I worked for Zeus Web Server, which was built entirely around one-process-per-core webserving off select(), and it was so much faster than Apache for serving static content that the developers had built a business out of it. At the time it could saturate a gigabit ethernet link off the largest HP-UX server we could find.
Sure, you should use the modern interfaces, but 1024 connections per process can get you surprisingly far.
Thanks for this. Saw the bug you logged on github too. Agree that poll() should be used instead to avoid the FD_SETSIZE limitation. Maybe in the future when littlekitten webserver grows to be a big cat...
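For reference, a rough sketch of what a poll()-based readiness check could look like. `wait_readable` is a made-up name and this is not littlekitten's actual code; it just shows that poll() takes an array of descriptors, so the numeric value of each fd is not capped by FD_SETSIZE.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Wait up to timeout_ms for any of the nfds descriptors to become
 * readable (or to report hang-up). Returns the index of the first
 * ready descriptor, or -1 on timeout or error. */
int wait_readable(const int *fds, size_t nfds, int timeout_ms)
{
    if (nfds == 0)
        return -1;

    struct pollfd *pfds = calloc(nfds, sizeof *pfds);
    if (pfds == NULL)
        return -1;

    for (size_t i = 0; i < nfds; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }

    int found = -1;
    if (poll(pfds, (nfds_t)nfds, timeout_ms) > 0) {
        for (size_t i = 0; i < nfds; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP)) {
                found = (int)i;
                break;
            }
        }
    }

    free(pfds);
    return found;
}
```

A real server would keep the pollfd array around between calls instead of allocating it each time; the allocation here just keeps the sketch short.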
it's a 2-kilobyte executable written in i386 assembly that can handle 20000 requests per second on my laptop, but only serves up files from the filesystem; no cgi or reverse proxy
instead of being single-threaded or preforking it just forks a child per request
this is a great opportunity to ask something I've been wondering about for awhile:
what are the best options out there for hosting websites built as HTTP-serving executables (either Windows or Linux)? is it possible to do this relatively cheaply?
I ask because I've been working on a framework[0] for building websites in a compiled language recently, and while it's been a ton of fun to build and test locally as a hobby project, I have absolutely no idea if it's even remotely financially viable to host a (small- to medium-sized) website made this way, compared to all of the managed hosting solutions out there for PHP/Node/etc.
I don't want/need to pay for a whole dedicated server—I just want to serve HTTP (eventually HTTPS) from a single executable, using one or more SQLite database files. ideally, it would cost as close to your typical shared PHP host as possible.
I have almost zero experience with "cloud" hosting—I made a small game with Node on Azure years ago, and accidentally racked up charges just playing around with it in development—so I don't know if this, or AWS, or whatever else is a viable solution for this. I've seen that it is indeed possible to host a single executable on Azure, but I haven't actually tried it myself, or determined what the pricing for this would end up being.
I am currently working on a low traffic (back office) web app for a client. It runs off of a single AWS EC2 instance (2 vcpu arm). When I looked at the hourly pricing I thought it would be $30/month, but the actual bill has been more like $4/month, I suppose due to an idling discount or perhaps some free tier credit or something else.
The web app is fronted by Cloudflare (free tier). On the box itself I have caddy set up as a reverse proxy with the Cloudflare cert, then uvicorn serving my python asgi (starlette) app. The app uses a local SQLite db.
I’m still working out some of the operational stuff like backups and monitoring but so far I am very pleased with the setup. I’m learning a lot and for the first time I do not feel like there is some monstrous pile of complexity behind a curtain.
Setup takes some time but I have detailed notes and it gets easier every time I run through. Feel free to get in touch if you’d like to hear more details.
Maybe look at container hosting with your (statically linked) binary being the only thing in the container?
Otherwise, why not just get a cheap VPS and host the binary there? I’ve used Vultr and it’s $3.50/month all-in, at the low end. There are even cheaper providers, although I don’t know about their quality. I bet this option would be the cheapest.
This is the beauty of a single binary—it’s trivial to deploy!
that was my thought exactly! I wonder if we're ever going to see a shift toward single-compiled-binary websites, away from increasingly complex deployments that exist solely to facilitate interpreted language stacks.
this is the question driving the framework I'm building—it even has support for simple HTML templates, but they're interpreted, type-checked against the structs that get passed into them, and baked into the executable, all at compile time. this is all coming off of building a website for a client using PHP for the first time in over a decade—on one hand, I appreciate the relative simplicity and ease of deployment compared to modern backend stacks, but on the other hand, it's still an interpreted language, with all the baggage associated with that. I believe it is possible to take the ease of use and speed of iteration of interpreted languages, and the benefits of strongly-typed compiled languages, and get the best of both worlds—at least, for the scale and complexity of website that I want to build and maintain.
For AWS, the simplest thing is to throw it into a Docker container (`FROM scratch` can work) and run it in Fargate (AWS' container runtime). But it's not a super-cheap solution; the baseline cost is something like $20/mo for Fargate and an additional $20/mo for a load balancer. Lambda can work too, but it is in some ways a bit more involved.
Cheapest AWS solution would be just EC2 instances (=basic virtual machines). t3a.nano instances cost just $3.5/mo and do not require additional load-balancers.
The modern cloudy approach would be to look into stuff like CloudFlare Workers, iirc they can run WASM, so if you manage to compile your code to that then it might work.
I hate to say it, old man, we have Rust today and we can do things as efficient and as performant as you while being safer overall.
We have tokio to handle all the IO stuff, we have hyper to handle HTTP parsing, and we even have tungstenite to handle websocket out of the box. While I appreciate your work, it is no longer practical to write C in the modern age. Well, unless you need to target something LLVM doesn't support yet and maybe you need some weird GCC toolchain (cough cough AVR)
If your goal is to just reuse as much stuff as possible, which you seem to allude to, then you should just use Apache/nginx/caddy directly. But the goal here doesn't seem to be that, so it makes sense to implement it without using as many libraries as possible.
Both approaches are valid and serve different purposes, you seem to have misunderstood the purpose here.
Confession: I write in C because it's fun and feels like you can do anything in it. Agree it's probably not the most practical or commercial-friendly solution. I like C!
C is like a footgun with naive bullets while Rust is also a footgun but with smart AI assistant. One will faithfully cripple you, but the other will cripple your mind before you shoot. I prefer the latter as it makes you consider