Show HN: LEAR – A simple and fast HTTP server to serve static resources | HN Mirror

	Show HN: LEAR – A simple and fast HTTP server to serve static resources (github.com)
	66 points by mbien 2739 days ago

15 comments

quangio 2739 days ago

It's great to see people still using C to make web servers!

But can you clarify what the "state-of-the-art technologies and solutions" are?

I just skimmed the project and think it is not much better in terms of performance than other servers like Nginx or H2O. There is also Lwan[0], which uses epoll too and has a cool coroutine and router implementation (I implemented a framework for Kotlin/Native on top of it[1]).

Another framework with really fast and fancy technologies in it is Seastar[2]. I think implementing kernel bypass, AIO, and DPDK makes it the fastest web server around currently. Maybe you could try to push the boundary with those things: SPDK, aggressive polling, a fast router...?

[0]: https://github.com/lpereira/lwan

[1]: https://github.com/KwangIO/kwang

[2]: https://github.com/scylladb/seastar

newaccoutnas 2739 days ago

If C based web servers float your boat, have you seen one written in asm? https://2ton.com.au/rwasa/

mbien 2739 days ago

Currently, its architecture is similar to modern nginx, while simplicity is the point that makes it faster. I'll surely look at your links and ideas, as this is still highly WIP. If you want to help create this, feel free to join! :)

userbinator 2739 days ago

If you're looking for low-hanging-fruit to optimise, there's a ton of "foo = create_string(<some constant>)" and then "delete_string(foo)" not long afterwards in the code.

I also think I've found a bug: the code seems to assume that the colon in a header will be followed by one space (see parse_request_line in http.c) but according to https://tools.ietf.org/html/rfc7230#section-3.2 and my experience, that space is optional and there may be multiple of them.
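A minimal sketch of header-field splitting that tolerates zero, one, or many spaces (or tabs) after the colon, as RFC 7230 section 3.2 allows. The names `struct span` and `parse_header_field` are hypothetical and are not taken from LEAR's http.c; the sketch assumes the caller has already stripped the trailing CRLF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A borrowed slice of the input line: no copy, no allocation. */
struct span {
    const char *ptr;
    size_t len;
};

/* Optional whitespace (OWS) in RFC 7230 is any run of SP or HTAB. */
static int is_ows(char c) { return c == ' ' || c == '\t'; }

/*
 * Split "Name: value" into name and value (hypothetical helper).
 * `line` must not include the trailing CRLF.
 * Returns 0 on success, -1 if there is no colon, the name is empty,
 * or whitespace precedes the colon (which RFC 7230 forbids).
 */
int parse_header_field(const char *line, size_t len,
                       struct span *name, struct span *value)
{
    assert(line != NULL && name != NULL && value != NULL);

    const char *colon = memchr(line, ':', len);
    if (colon == NULL || colon == line)
        return -1;
    if (is_ows(colon[-1]))
        return -1;

    const char *start = colon + 1;
    const char *end = line + len;

    while (start < end && is_ows(*start))   /* skip leading OWS */
        start++;
    while (end > start && is_ows(end[-1]))  /* trim trailing OWS */
        end--;

    name->ptr = line;
    name->len = (size_t)(colon - line);
    value->ptr = start;
    value->len = (size_t)(end - start);
    return 0;
}
```

With this shape, "Host:example.com", "Host: example.com" and "Host:   example.com  " all yield the same value slice.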

(This is part of the reason I'm not a fan of text-based protocols: parsing is full of annoying edge-cases.)

zokier 2739 days ago

I feel like it should be possible to avoid dynamic memory allocation completely during the request processing/parsing. Sure, fixed-sized buffers generally imply some memory overhead, but I'd think the overall effect on performance would be beneficial.
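One way to picture the allocation-free approach is a fixed per-connection buffer with the parsed fields stored as slices pointing into it. This is only a sketch under assumptions: `REQ_BUF_SIZE`, `struct slice`, `struct request`, and `parse_request_line_inplace` are hypothetical names, and the 8 KiB limit is an arbitrary choice, not something LEAR uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REQ_BUF_SIZE 8192  /* assumed per-connection limit */

/* A view into the request buffer; it owns nothing. */
struct slice {
    const char *ptr;
    size_t len;
};

struct request {
    char buf[REQ_BUF_SIZE];                /* raw bytes read from the socket */
    size_t used;                           /* number of valid bytes in buf */
    struct slice method, target, version;  /* all point into buf */
};

/*
 * Parse "METHOD SP target SP version CRLF" from the start of req->buf.
 * No malloc and no copying: the slices borrow from the buffer.
 * Returns 0 on success, -1 if the line is incomplete or malformed.
 */
int parse_request_line_inplace(struct request *req)
{
    assert(req != NULL && req->used <= REQ_BUF_SIZE);

    const char *p = req->buf;
    const char *end = req->buf + req->used;

    /* Find the CRLF that terminates the request line. */
    const char *eol = NULL;
    for (const char *q = p; q + 1 < end; q++) {
        if (q[0] == '\r' && q[1] == '\n') {
            eol = q;
            break;
        }
    }
    if (eol == NULL)
        return -1;

    const char *sp1 = memchr(p, ' ', (size_t)(eol - p));
    if (sp1 == NULL || sp1 == p)
        return -1;
    const char *sp2 = memchr(sp1 + 1, ' ', (size_t)(eol - (sp1 + 1)));
    if (sp2 == NULL || sp2 == sp1 + 1 || sp2 + 1 == eol)
        return -1;

    req->method  = (struct slice){ p, (size_t)(sp1 - p) };
    req->target  = (struct slice){ sp1 + 1, (size_t)(sp2 - sp1 - 1) };
    req->version = (struct slice){ sp2 + 1, (size_t)(eol - sp2 - 1) };
    return 0;
}
```

The trade-off is the one mentioned above: every connection pays for the full buffer even when the request is tiny, and requests larger than the buffer have to be rejected.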

bwindels 2739 days ago

This is what I'm trying to do with wwwee [1], a low-resource web server written in Rust. The request / response buffer is a growing anonymous mmap mapping, but all parsing (http headers, ...) and decoding (base64, JSON) is done borrowing from the buffer. It works especially well in Rust, with its borrow checker preventing use-after-free.

1: https://github.com/bwindels/wwwee/

continuations 2739 days ago

- What are the advantages of using this over something more established such as Nginx or H2O?

- README mentioned "fully non-blocking architecture," this only refers to network IO, correct? My understanding is that Linux doesn't have truly non-blocking file IO. Is that right?

bwindels 2739 days ago

The Linux aio syscalls (io_submit, ...) work well on some filesystems (xfs, ext4, ...) but block on others (btrfs). It limits your file system choices but allows for a single-threaded, concurrent web server. This is what I ended up using for my hobby web server wwwee [1], as low memory usage and good performance on a single core were important constraints. It is sort of an opinionated design though.

1: https://github.com/bwindels/wwwee

JdeBP 2739 days ago

... or indeed publicfile.

* http://cr.yp.to/publicfile.html

* http://jdebp.eu./Softwares/djbwares/guide/httpd.html

* http://jdebp.eu./Softwares/nosh/guide/tcp-socket-accept.html

Some of the lessons not learned from publicfile by this include:

* It entirely lacks doco, having zero manual pages or even --help text. I strongly encourage remedying this as soon as possible. Start as you mean to go on, with decent user doco right from the beginning. Instil a culture of keeping the doco up to date as the program changes, and rejecting changes that do not keep the doco in synch.

* It uses a single logfile, with unbounded growth, potentially with superuser privileges, and yet another idiosyncratic logfile configuration mechanism. Just write to standard error and let the system's service/logging management take care of things. daemontools family systems will run it through multilog, cyclog, or similar. systemd will run it through systemd-journald. Both will do the proper rotation by the writer; and daemontools family loggers will even (conventionally) use unprivileged logging processes that cannot eat into the superuser-reserved emergency disc space.

* There is very poor error handling and recovery in some places. A particularly noteworthy example is that if the master cannot fork enough worker processes, which does happen in real life, it carries on regardless and erroneously falls into the child process code. M. Bernstein's approach was to error check everything, from out of memory conditions in string concatenations to the result of chdir().

* There's no protection at all against malicious client requests. Do not think that a server being read-only is enough. publicfile documents (q.v.) what it does to stop requests escaping the data directory root, to stop upwards directory traversals, and to avoid things like attempts to read from non-regular files.

* There's not even rudimentary virtual hosting.
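The fork() point in the list above can be sketched as follows. fork() has three outcomes (negative on failure, zero in the child, positive in the parent), and code that only distinguishes "parent" from "everything else" will run child code in the master when fork fails. This is a generic illustration, not LEAR's code; `spawn_workers`, `reap_workers`, and `example_worker` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder worker body; a real one would run the accept loop. */
int example_worker(void) { return 0; }

/*
 * Fork up to `wanted` workers. Each child runs worker() and exits with
 * its return value; it never returns into the master's code.
 * Returns how many workers were actually started, which may be fewer
 * than requested if fork() fails. Their pids are stored in pids[].
 */
int spawn_workers(int wanted, pid_t *pids, int (*worker)(void))
{
    assert(wanted >= 0 && pids != NULL && worker != NULL);

    int started = 0;
    for (int i = 0; i < wanted; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            /* fork failed: we are still the master, so report and stop. */
            perror("fork");
            break;
        }
        if (pid == 0) {
            /* Child process only. */
            _exit(worker());
        }
        pids[started++] = pid;  /* Parent: remember the child. */
    }
    return started;
}

/* Wait for the given workers; returns how many exited with status 0. */
int reap_workers(const pid_t *pids, int count)
{
    int ok = 0;
    for (int i = 0; i < count; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

The caller can then decide what to do when fewer workers than requested were started, for example run degraded or exit with an error.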

There are a few other problems with this, such as the amount of static configuration information that is needlessly re-decoded on every read(), the laborious string handling and head-body parsing, and the faulty implementation of HTTP/1.1; but those aren't direct cases of not learning the fundamentals from existing static-content-servers as the aforementioned are.

mbien 2739 days ago

- The main advantage is simplicity, but the architecture is in fact very similar to nginx. So the server should be faster than nginx but scale similarly. But it's all still highly DIY.

- You're right, I mean non-blocking architecture of network IO.

sigi45 2739 days ago

Luckily, 'should be' is not an accepted benchmark ;)

Don't get me wrong, nginx is big and has more experience. Different headers, compression, security etc.

I would not even advise people to use something like LEAR, if only because of possible undetected security issues.

mbien 2739 days ago

I agree, I don't claim it's ready. Please treat it as an interesting project to follow that has just started and is surely not meant to be deployed in production environments, at least not in this state.

tjpnz 2739 days ago

I think what's missing from this is "Show HN". I don't think anybody is suggesting it be used in place of NGINX or H2O - not yet.

kureikain 2739 days ago

I'm not a C expert. I'm trying to understand if this offers other novel techniques.

So in Go we already have this capability in

    http.FileServer(http.Dir("static"))

Also, in Rust, actix web has similar built-in function:

https://github.com/actix/actix-website/blob/master/content/d...

Again, I'm a noob in C but I really want to understand state of the art in file serving.

I imagine it isn't as easy as serving a big file by reading it byte by byte and writing the response. There must be some tricks here that I don't know.

haney 2739 days ago

Kind of off topic but it doesn’t look like there are any tests in this repo. Is that common in the open source C community? I see several people suggesting benchmarks but what about functional tests? I don’t write much C and am curious about community norms.

rkeene2 2739 days ago

In this same vein is Filed; it does the minimal amount of work possible before handing off to the kernel to sendfile() the static resource.
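For readers unfamiliar with sendfile(), here is a minimal Linux-only sketch of handing a file to the kernel instead of copying it through a userspace buffer. It is not Filed's implementation; `serve_path` is a hypothetical name, the sketch assumes a blocking `client_fd`, and it sends only the file body (no HTTP status line or headers).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send the contents of the regular file at `path` to `client_fd`
 * using sendfile(2), so the data never passes through userspace.
 * Assumes client_fd is blocking. Returns 0 on success, -1 on error.
 */
int serve_path(int client_fd, const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
        close(fd);  /* refuse directories, devices, FIFOs, ... */
        return -1;
    }

    off_t offset = 0;  /* sendfile advances this as it sends */
    while (offset < st.st_size) {
        ssize_t n = sendfile(client_fd, fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0 && errno == EINTR)
            continue;  /* interrupted by a signal: retry */
        if (n <= 0) {
            close(fd);  /* real error, or file shrank underneath us */
            return -1;
        }
    }

    close(fd);
    return 0;
}
```

With a non-blocking socket the loop would instead have to stop on EAGAIN, keep `offset`, and resume when the socket becomes writable again.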

http://filed.rkeene.org/

jimjag 2739 days ago

Kind of sad that compliance is never given as much importance as it should have... What good is a fast server if it doesn't fully comply w/ the spec? Luckily, there are some, like Apache httpd, that do both.

Traubenfuchs 2739 days ago

Why use a multiprocess architecture instead of a multithreaded one?

mbien 2739 days ago

TBH, it was simpler to implement and maintain, while process creation is just a little bit slower than pthread creation at initialization time.

mbien 2739 days ago

I'm truly impressed how wide the response is. It makes me sure that this project is truly worth continuing. Thanks HN!

known 2739 days ago

Check performance of khttpd https://github.com/cobyism/edimax-br-6528n/tree/master/linux... in http://www.fenrus.demon.nl/performance.html

StreamBright 2739 days ago

Why not use Nginx for this task?

guidedlight 2739 days ago

This would be very useful for performance testing

BlackLotus89 2739 days ago

Was one of the first things I thought as well.

Looking at the GitHub page I'm wondering how it performs compared to something like nginx. Since I won't use something without https for production, performance testing seems the only real use case...

mbien 2739 days ago

https is one of the first features I'd like to implement in the near future :) Do you know if there is some framework that I could test my server with, against e.g. nginx? I was looking for one to do profiling, but am unsure if one exists that lets you create high traffic with many clients.

CapacitorSet 2739 days ago

You can use `ab`. I'll post an issue about that.

atakurt 2739 days ago

wrk is also a good tool.

matte_black 2739 days ago

Any benchmarks?

mbien 2739 days ago

Currently we have only what CapacitorSet prepared: https://github.com/Glorf/lear/issues/1 I'll try to fix HTTP/1.0 support and post some more detailed benchmarks in a few hours.

mirimir 2739 days ago

This would be great for Tor .onion servers, yes?

detaro 2739 days ago

Why? Performance doesn't matter for that use case. Maybe fewer features means more secure; on the other hand, established projects (or even the basic HTTP implementations in other languages' stdlibs) are probably better covered security-review wise. (edit: unless I missed it, it doesn't even check against directory traversal?)
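As an illustration of the directory traversal check mentioned here, a rough sketch that rejects any request path containing a ".." segment. `is_safe_request_path` is a hypothetical name, the path is assumed to be already percent-decoded, and this does nothing about symlinks that point outside the document root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if `path` is absolute and has no ".." segment.
 * Assumes the path has already been percent-decoded.
 */
bool is_safe_request_path(const char *path)
{
    assert(path != NULL);

    if (path[0] != '/')
        return false;

    const char *seg = path + 1;
    for (;;) {
        const char *slash = strchr(seg, '/');
        size_t len = slash ? (size_t)(slash - seg) : strlen(seg);

        /* A segment that is exactly ".." would climb out of the root. */
        if (len == 2 && seg[0] == '.' && seg[1] == '.')
            return false;

        if (slash == NULL)
            return true;  /* last segment checked */
        seg = slash + 1;
    }
}
```

Rejecting outright is cruder than normalising the path, but it is easy to audit; names that merely contain dots, such as "/a/..b/c", still pass.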

mirimir 2739 days ago

Yes, less attack surface is good. So is low resource usage.

I used to like tinyhttpd, but Debian dropped it, and I find it painful to build, so I use nginx now.