It's great to see people still using C to make web servers!
But can you clarify what the "state-of-the-art technologies and solutions" are?
I just skimmed the project and I don't think it is much better in terms of performance than other servers like Nginx or H2O. There is also Lwan[0], which uses epoll too and has a cool coroutine and router implementation (I implemented a framework for Kotlin/Native on top of it[1]).
Another framework with really fast and fancy technologies in it is Seastar[2]. I think implementing kernel bypass, AIO and DPDK stuff makes it the fastest web server around currently. Maybe you can try to push the boundary with those things: SPDK, aggressive polling, a fast router...?
Currently, its architecture is similar to modern nginx, with simplicity being the point that makes it faster.
I'll surely look at your links and ideas, as this is still highly WIP.
If you want to help create this, feel free to join! :)
If you're looking for low-hanging fruit to optimise, there's a ton of "foo = create_string(<some constant>)" followed by "delete_string(foo)" not long afterwards in the code.
I also think I've found a bug: the code seems to assume that the colon in a header will be followed by one space (see parse_request_line in http.c) but according to https://tools.ietf.org/html/rfc7230#section-3.2 and my experience, that space is optional and there may be multiple of them.
(This is part of the reason I'm not a fan of text-based protocols: parsing is full of annoying edge-cases.)
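To make the whitespace point concrete, here is a minimal sketch of a tolerant header split. The names `parse_header_line` and `struct span` are made up for illustration and are not taken from the project's http.c; the sketch assumes the line has already been cut at the CRLF. It accepts zero or more spaces or tabs after the colon, trims them at the end of the value, and rejects whitespace before the colon, which RFC 7230 does not allow.

```c
#include <stddef.h>
#include <string.h>

/* A borrowed view into the request buffer; nothing is copied. */
struct span {
    const char *ptr;
    size_t len;
};

/* OWS in RFC 7230 is any run of spaces and horizontal tabs. */
static int is_ows(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * Hypothetical helper: split one header line (without the trailing CRLF)
 * into field name and field value.
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_header_line(const char *line, size_t len,
                      struct span *name, struct span *value)
{
    const char *colon = memchr(line, ':', len);
    if (colon == NULL || colon == line)
        return -1;                  /* no colon, or empty field name */
    if (is_ows(colon[-1]))
        return -1;                  /* whitespace before ':' is not allowed */

    const char *start = colon + 1;
    const char *end = line + len;
    while (start < end && is_ows(*start))
        start++;                    /* skip zero or more leading SP/HTAB */
    while (end > start && is_ows(end[-1]))
        end--;                      /* trim trailing SP/HTAB */

    name->ptr = line;
    name->len = (size_t)(colon - line);
    value->ptr = start;
    value->len = (size_t)(end - start);
    return 0;
}
```

With this, "Host:example.com", "Host: example.com" and "Host:    example.com" all yield the same value, while "Host : example.com" is refused.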
I feel like it should be possible to avoid dynamic memory allocation completely during the request processing/parsing. Sure, fixed-sized buffers generally imply some memory overhead, but I'd think the overall effect on performance would be beneficial.
This is what I'm trying to do with wwwee [1], a low-resource web server written in Rust. The request / response buffer is a growing anonymous mmap mapping, but all parsing (HTTP headers, ...) and decoding (base64, JSON) is done by borrowing from the buffer. It works especially well in Rust, with its borrow checker preventing use-after-free.
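The same borrowing idea can be sketched in C, just without the compiler checking lifetimes for you. This is only an illustration under stated assumptions: `struct slice`, `struct request_line` and `split_request_line` are hypothetical names (not from wwwee or from the server under discussion), and the input is assumed to be one request line with the CRLF already removed. Every field is a pointer plus length into the caller's buffer, so no allocation or copying happens.

```c
#include <stddef.h>
#include <string.h>

/* Pointer + length into the caller's buffer; valid only while it lives. */
struct slice {
    const char *ptr;
    size_t len;
};

struct request_line {
    struct slice method;
    struct slice target;
    struct slice version;
};

/*
 * Hypothetical helper: split "METHOD SP target SP version" found in
 * buf[0..len). Fills `out` with slices pointing into buf.
 * Returns 0 on success, -1 on malformed input.
 */
int split_request_line(const char *buf, size_t len, struct request_line *out)
{
    const char *end = buf + len;

    const char *sp1 = memchr(buf, ' ', len);
    if (sp1 == NULL || sp1 == buf)
        return -1;                          /* missing or empty method */

    const char *target = sp1 + 1;
    const char *sp2 = memchr(target, ' ', (size_t)(end - target));
    if (sp2 == NULL || sp2 == target)
        return -1;                          /* missing or empty target */

    const char *version = sp2 + 1;
    if (version == end ||
        memchr(version, ' ', (size_t)(end - version)) != NULL)
        return -1;                          /* empty version or extra space */

    out->method.ptr = buf;
    out->method.len = (size_t)(sp1 - buf);
    out->target.ptr = target;
    out->target.len = (size_t)(sp2 - target);
    out->version.ptr = version;
    out->version.len = (size_t)(end - version);
    return 0;
}
```

The catch in C is that nothing stops you from reusing or unmapping the buffer while a slice still points into it; that discipline has to be kept by hand.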
- What are the advantages of using this over something more established such as Nginx or H2O?
- The README mentions a "fully non-blocking architecture"; this only refers to network IO, correct? My understanding is that Linux doesn't have truly non-blocking file IO. Is that right?
The Linux AIO syscalls (io_submit, ...) work well on some filesystems (xfs, ext4, ...) but block on others (btrfs). It limits your file system choices but allows for a single-threaded, concurrent web server. This is what I ended up using for my hobby web server wwwee [1], as low memory usage and good performance on a single core were important constraints. It is sort of an opinionated design though.
Some of the lessons not learned from publicfile by this include:
* It entirely lacks doco, having zero manual pages or even --help text. I strongly encourage remedying this as soon as possible. Start as you mean to go on, with decent user doco right from the beginning. Instil a culture of keeping the doco up to date as the program changes, and rejecting changes that do not keep the doco in synch.
* It uses a single logfile, with unbounded growth, potentially with superuser privileges, and yet another idiosyncratic logfile configuration mechanism. Just write to standard error and let the system's service/logging management take care of things. daemontools family systems will run it through multilog, cyclog, or similar. systemd will run it through systemd-journald. Both will do the proper rotation by the writer; and daemontools family loggers will even (conventionally) use unprivileged logging processes that cannot eat into the superuser-reserved emergency disc space.
* There is very poor error handling and recovery in some places. A particularly noteworthy example is that if the master cannot fork enough worker processes, which does happen in real life, it carries on regardless and erroneously falls into the child process code. M. Bernstein's approach was to error check everything, from out of memory conditions in string concatenations to the result of chdir().
* There's no protection at all against malicious client requests. Do not think that a server being read-only is enough. publicfile documents (q.v.) what it does to stop requests escaping the data directory root, to stop upwards directory traversals, and to avoid things like attempts to read from non-regular files.
* There's not even rudimentary virtual hosting.
There are a few other problems with this, such as the amount of static configuration information that is needlessly re-decoded on every read(), the laborious string handling and head-body parsing, and the faulty implementation of HTTP/1.1; but those aren't direct cases of not learning the fundamentals from existing static-content-servers as the aforementioned are.
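For the fork() point in particular, here is a minimal sketch of what checking the result looks like. It is not the project's code: `spawn_workers` and `example_worker` are hypothetical names, and the policy on failure (stop and report how many workers were started) is just one assumption among several reasonable ones. The important part is that the three outcomes of fork() are kept apart, so a failure in the master can never be mistaken for "I am the child".

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder for the real worker loop (hypothetical). */
void example_worker(void)
{
}

/*
 * Hypothetical helper: try to start `want` workers. Returns how many were
 * actually started and stores their PIDs in `pids`.
 * Only the parent ever returns from this function.
 */
int spawn_workers(int want, void (*worker)(void), pid_t *pids)
{
    int started = 0;

    while (started < want) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");     /* e.g. EAGAIN or ENOMEM: stay the master */
            break;
        }
        if (pid == 0) {
            worker();           /* child: run the worker ... */
            _exit(0);           /* ... and never fall back into master code */
        }
        pids[started++] = pid;  /* parent: remember the child */
    }
    return started;
}
```

The caller can then compare the return value with what it asked for and decide whether to run with fewer workers or to shut down cleanly.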
- The main advantage is simplicity, but the architecture is in fact very similar to nginx. So the server should be faster than nginx but scale similarly. But it's all still highly DIY.
- You're right, I mean non-blocking architecture of network IO.
I agree, I don't claim it's ready.
Please treat it as an interesting project to follow that has just started, and is surely not meant to be deployed in production environments, at least not in this state.
Again, I'm a noob in C but I really want to understand the state of the art in file serving.
I imagine it isn't as simple as serving a big file by reading it byte by byte and writing the response. There must be some tricks here that I don't know.
Kind of off topic but it doesn’t look like there are any tests in this repo. Is that common in the open source C community? I see several people suggesting benchmarks but what about functional tests? I don’t write much C and am curious about community norms.
Kind of sad that compliance is never given as much importance as it should be... What good is a fast server if it doesn't fully comply with the spec? Luckily, there are some, like Apache httpd, that do both.
Looking at the GitHub page I'm wondering how it performs compared to something like nginx.
Since I won't use something without HTTPS for production, performance testing seems the only real use case...
HTTPS is one of the first features I'd like to implement in the near future :)
Do you know if there is some framework that I could test my server with, against e.g. nginx? I was looking for one to do profiling, but am unsure whether any exist that let you create high traffic with many clients.
Currently we have only what CapacitorSet prepared: https://github.com/Glorf/lear/issues/1
I'll try to fix HTTP/1.0 support and post a more detailed benchmark in a few hours.
Why? Performance doesn't matter for that use case. Maybe fewer features means more secure; on the other hand, established projects (or even the basic HTTP implementations in other languages' stdlibs) are probably better covered security-review wise. (edit: unless I missed it, it doesn't even check against directory traversal?)
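For reference, a lexical traversal check is small. The sketch below is an assumption-laden illustration, not what this server or publicfile does: `path_stays_in_root` is a made-up name, the path is assumed to be percent-decoded already, and symlinks are not considered at all. It walks the path segment by segment and refuses anything that would climb above the root.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if the (already percent-decoded) request
 * path stays inside the document root, 0 otherwise.
 * Purely lexical: it does not resolve symlinks or touch the filesystem.
 */
int path_stays_in_root(const char *path)
{
    if (path[0] != '/')
        return 0;                       /* only absolute request paths */

    int depth = 0;
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/')
            p++;                        /* skip separators */
        size_t n = strcspn(p, "/");     /* length of the next segment */
        if (n == 0)
            break;                      /* trailing slash, nothing left */
        if (n == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return 0;               /* climbed above the root */
        } else if (!(n == 1 && p[0] == '.')) {
            depth++;                    /* ordinary segment */
        }
        p += n;
    }
    return 1;
}
```

A real server would still want to check after opening that the target is a regular file, since a lexical check alone says nothing about symlinks or device nodes.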
[0]: https://github.com/lpereira/lwan
[1]: https://github.com/KwangIO/kwang
[2]: https://github.com/scylladb/seastar