by tokenizer 4572 days ago
As a web developer who isn't familiar with assembly or any web server more barebones than nginx, what benefits does something like this provide? Speed? Could this be a solution for an extremely simple directory/static file web server?
4 comments

This is a simple, single-threaded, single-process accept-read-respond-loop web server. It's vulnerable to trivial trickle DoS attacks and probably has other issues. There are no advantages; the author just did this for fun.

The TCP part comes from C code in the kernel, so this headline is a little misleading ;-).

Agreed. However, it should be safe from buffer overflows, path traversal attacks, XSS, and obviously CSRF. It should be fine other than DoS. Let me know if you find any exceptions.
It's hard to be vulnerable to XSS and CSRF with all-static content, no?

So, not only will a trickle DoS other clients, each byte will also force an O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but that's not great.

It looks like a request with no space could force you to walk (`repne scasb`) through invalid memory after $buf. Also maybe corrupt it (unescape_request_path).

It will also fail to correctly parse HTTP/0.9 (not a big deal, but part of spec). The parsing code ignores the existence of verbs other than GET. (Doesn't check that the verb is GET either.)

We don't validate that paths start with /, we just skip that byte. Okay:

        mov (path), %al
        ...
        cmp $'/, %al
        je badreq
Since valid GETs are of the form:

    GET /foo.txt HTTP/1.0
         ^-- path=buf+5
As you point out, a client close will cause SIGPIPE causing a crash (DoS).

That's all I see. But I'm not an asm expert and I'm sure I've missed something.

> It's hard to be vulnerable to XSS and CSRF with all-static content, no?

You would think, but actually Apache managed to be vulnerable to XSS by including bits of the request URL in its error pages, if I remember right. Last millennium, I think.

> So, not only will a trickle DoS other clients, each byte will also force an O(n) traversal of $buf (burning CPU). Granted, buf is only 1000 bytes, but that's not great.

Hmm, while I hadn't thought about that, and I should have, I think that's probably okay; basically you're saying that you can get the machine to burn up to, say, 2048 cycles by sending it a small TCP packet. Which means that a 4-core 2GHz server machine can't handle more than about four million packets per second (well, one million until I parallelize), which is about 85 megabytes per second, or 680 megabits per second. There are probably other bottlenecks in the code, the kernel, or your data center that will kick in first. It's probably more effective to DoS the server by just requesting files from it.

> It looks like a request with no space could force you to walk (`repne scasb`) through invalid memory after $buf.

It's possible I could have gotten this wrong, but I did try to limit the number of bytes it would scan to the bytes that it had actually read, by doing

    mov (bufp), %ecx
before the repne scasb. Did I screw that up?

> HTTP/0.9 ...verbs other than GET.

Yes, those are unimplemented features, and you're right that their lack makes the server behave incorrectly; hopefully they don't result in security bugs. I think they don't matter in practice, since nobody sends HTTP/0.9 requests or HEAD requests, except by hand, do they?

> We don't validate that paths start with /, we just skip that byte.

Right. And the $'/ check below is to keep you from saying

    GET //etc/passwd HTTP/1.0
and getting /etc/passwd. In case that matters in 2013.
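
For readers who don't read assembly, the leading-slash check described above can be modelled roughly like this. It is a Python sketch of the logic as discussed in this thread, not a translation of the server's code; `resolve_path` is a hypothetical name, and the fixed offset of 5 is taken from the `path=buf+5` diagram earlier in the thread. No other checks are modelled here.

```python
from typing import Optional

PATH_OFFSET = 5  # assumption: the path begins just past "GET /"


def resolve_path(request: bytes) -> Optional[bytes]:
    """Return the relative path from a request line, or None for a bad request."""
    # The first slash is skipped, not validated, as noted in the thread.
    space = request.find(b" ", PATH_OFFSET)
    end = space if space != -1 else len(request)
    path = request[PATH_OFFSET:end]
    # Reject a second leading slash so "//etc/passwd" cannot become "/etc/passwd".
    if path[:1] == b"/":
        return None
    return path


print(resolve_path(b"GET /foo.txt HTTP/1.0\r\n"))      # b'foo.txt'
print(resolve_path(b"GET //etc/passwd HTTP/1.0\r\n"))  # None
```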

Thank you very much for looking over it!

Doesn't ab send HEAD requests?

I know it does this when given a flag, but in some tests I have seen some HEADs between my GETs. I haven't used ab for a long time, so don't quote me on that. Have you tried httpress[1] as a benchmark tool?

How about a simple check that the first byte equals G (decimal 71) to see whether it is a GET? Shouldn't be that expensive, I think.

Thanks for creating it.

[1] https://bitbucket.org/yarosla/httpress/wiki/Home
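
To make the first-byte suggestion above concrete, here is a small Python sketch of the cheap check next to a slightly stricter one. Both function names are hypothetical and nothing here is taken from the server's source; it only illustrates the trade-off between one comparison and four.

```python
def looks_like_get(request: bytes) -> bool:
    """Cheap check: the first byte is 'G' (decimal 71)."""
    return request[:1] == b"G"


def is_get(request: bytes) -> bool:
    """Stricter check: the request starts with the four bytes 'GET '."""
    return request.startswith(b"GET ")


for line in (b"GET / HTTP/1.0\r\n", b"HEAD / HTTP/1.0\r\n", b"GARBAGE / HTTP/1.0\r\n"):
    print(line[:8], looks_like_get(line), is_get(line))
```

The one-byte check rejects HEAD, POST, PUT and so on, but still accepts anything else that happens to begin with G.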

I don't know if ab sends HEAD requests! Thanks for the link to httpress; I've been having trouble with ab failing at high concurrencies (1000 concurrent connections) and also being the bottleneck.
> before the repne scasb. Did I screw that up?

Ah, it's possible repne scasb halts when ecx drops to zero (that would explain some of the string length asm code I found when I googled it). I'm not very familiar with x86 mnemonics apart from the basics ('mov').
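
For what it's worth, `repne scasb` does stop when the count register reaches zero: each iteration compares AL with the byte at the destination pointer, advances the pointer, decrements the count, and exits early on a match. Below is a rough Python model of that behaviour, assuming the direction flag is clear (forward scan). The function name and the tuple it returns are illustrative only, and it says nothing about whether the server loads the right count beforehand.

```python
def repne_scasb(memory: bytes, edi: int, ecx: int, al: int):
    """Model of x86 `repne scasb` with DF=0; returns (edi, ecx, zf).

    Simplification: if ecx is 0 on entry the real instruction leaves the
    flags untouched; this model just reports zf as False in that case.
    """
    zf = False
    while ecx != 0:
        zf = memory[edi] == al  # compare AL with the byte at [edi]
        edi += 1                # pointer advances whether or not it matched
        ecx -= 1                # count drops on every iteration
        if zf:                  # "repne": repeat only while not equal
            break
    return edi, ecx, zf


# Ten bytes were "read"; whatever follows them is never examined.
memory = b"GETnospace" + b" beyond the bytes read"
print(repne_scasb(memory, 0, 10, ord(" ")))     # (10, 0, False)
print(repne_scasb(b"GET /x", 0, 6, ord(" ")))   # (4, 2, True)
```

So as long as the count really is the number of bytes read, a request with no space ends the scan at the end of the data with ZF clear, and the caller has to check for that case.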

> four million packets per second (well, one million until I parallelize), which is about 85 megabytes per second

Why is this more than 4 million bytes per second (4 MB/s)? A packet can contain a single byte.

To be a valid TCP packet, it needs to contain at a minimum a 20-byte IP header and a 20-byte TCP header, plus the one byte of payload. In practice your server is probably receiving the packet over Ethernet, so it probably has an Ethernet header and things like that, too, but that's a minimum. You could approach it over, say, SLIP.
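
The back-of-the-envelope numbers in this subthread can be reproduced with a few lines of Python. This is only a sketch under the assumptions stated in the comments above (about 2048 cycles burned per packet, 2 GHz cores, and a 41-byte minimum packet); the constant and function names are made up for illustration.

```python
CYCLES_PER_PACKET = 2048        # assumed cost of one rescan of the buffer
CLOCK_HZ = 2_000_000_000        # 2 GHz per core
MIN_PACKET_BYTES = 20 + 20 + 1  # IP header + TCP header + 1 byte of payload


def max_packets_per_second(cores: int) -> int:
    """Packets per second needed to saturate `cores` cores."""
    return cores * CLOCK_HZ // CYCLES_PER_PACKET


def attack_bandwidth_bytes(cores: int, packet_bytes: int = MIN_PACKET_BYTES) -> int:
    """Bytes per second an attacker must send at that packet rate."""
    return max_packets_per_second(cores) * packet_bytes


for cores in (1, 4):
    pps = max_packets_per_second(cores)
    bps = attack_bandwidth_bytes(cores)
    print(f"{cores} core(s): {pps:,} packets/s, {bps / 1e6:.0f} MB/s, {bps * 8 / 1e6:.0f} Mbit/s")
```

At the 41-byte minimum this comes to roughly 40 MB/s for one core and 160 MB/s for four. Passing a larger `packet_bytes`, for example one that includes Ethernet framing, raises the figure accordingly. Either way it is well above one byte per packet.
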
This is normally the kind of question I ask about anything involving HTML/CSS only or JS only =D PoCs based on low-level concepts are the ones that make you curious about everything from top to bottom. Even though assembly is the least abstract and most esoteric of programming spaces (some would argue the opposite), the program actually reveals itself quite quickly once you know just a few tidbits. This is how you get to see that even the most low-level aspects of programming are quite accessible.
Thank you! I'm glad to hear that you found it so readable.
Nginx will almost certainly be faster, and is somewhat robust against DoS attacks. I didn't write this to provide benefits. There are situations where this would work better than nginx (where, say, you don't want to spend any time configuring anything) but there are better existing solutions for those cases.
This is obviously a toy, or PoC.
I don't know about calling it a piece of crap. Seems a little harsh. :-)
I don't normally see 'piece of crap' written as PoC. I'm used to seeing PoS used often. I usually see PoC as 'proof of concept'.
It could mean proof of concept.
I think that smile could mean that he's being facetious.
Incidentally, I read it the other way as well.
Haha, imagine what tech news would be like if every time you saw PoC you thought it meant piece of crap. Quite hilarious actually.
Definitely meant it's ROFL web scale, asynchronous, non-blocking, event driven, message passing, nosql, sharded, clusters of highly available, reliable, high performance, real time, bad ass, rockstar, get the girls, get the funding, get the IPO, impress your mom, impress your cat ... applications.