by viraptor 6085 days ago

Parser correctness is not trivial. RFC 2616 contains the complete grammar you need, so working from it is fairly simple. OTOH, if you write a parser on your own, you're likely to miss stuff like section 4.2, which explains that a header value can span multiple lines as long as each continuation line starts with LWS (a space or tab). The parser from this article will fail on (just looking at the source, I'm 99.5% sure of this):

    abc:
     def

It also doesn't like tabs and will not support comma-separated header values. It's not rocket science to write a "good enough" HTTP parser, but writing a fully compliant one is something completely different. There are also cool parts of the spec that you can read 10 times and come to a different conclusion each time: for example, what does a "\" CR LF sequence mean inside a quoted string, and does it end the header value or not? Writing a "correct" parser is a LOT of fun...
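To make the folding and list cases concrete, here is a rough Python sketch of how I'd read section 4.2. It is not the article's parser, and the names `parse_headers` and `split_list` are made up for illustration. It joins continuation lines that start with a space or tab, and splits comma-separated values while leaving commas inside quoted strings alone. It deliberately does nothing clever about the "\" CR LF case.

```python
def parse_headers(raw: str) -> dict[str, list[str]]:
    """Parse a CRLF-delimited header block, unfolding continuation lines.

    Hypothetical helper: returns lower-cased field names mapped to the
    list of values seen for that name (field names are case-insensitive).
    """
    headers: dict[str, list[str]] = {}
    name = None
    for line in raw.split("\r\n"):
        if line == "":
            break  # a blank line ends the header block
        if line[0] in " \t":
            # Continuation line: leading SP or HT folds into the previous value.
            if name is None:
                raise ValueError("continuation line before any header")
            folded = headers[name][-1] + " " + line.strip(" \t")
            headers[name][-1] = folded.strip()
            continue
        field, sep, value = line.partition(":")
        if not sep or not field:
            raise ValueError(f"malformed header line: {line!r}")
        name = field.lower()
        headers.setdefault(name, []).append(value.strip(" \t"))
    return headers


def split_list(value: str) -> list[str]:
    """Split a comma-separated value, ignoring commas inside quoted strings."""
    items, buf = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:
            buf.append(ch)  # character after a backslash is taken literally
            escaped = False
        elif ch == "\\" and in_quotes:
            buf.append(ch)
            escaped = True
        elif ch == '"':
            buf.append(ch)
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            items.append("".join(buf).strip(" \t"))
            buf = []
        else:
            buf.append(ch)
    items.append("".join(buf).strip(" \t"))
    return [item for item in items if item]  # drop empty list elements
```

With this, the `abc:` / ` def` example above comes out as a single `abc` header whose value is `def`, and a tab works the same as a space for the continuation.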

Keeping a separate state for each character of "HTTP" probably saves you a couple of cycles, because you match as you go and can reject the message early, pointing at the exact place that didn't match. It's a bit useless for a 4-letter string though.
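For what it's worth, the per-character idea looks roughly like this in Python. This is only a sketch of the technique, not the article's code; `LiteralMatcher` is a made-up name, and the state is just an index into the literal rather than a hand-written state per letter.

```python
class LiteralMatcher:
    """Match a fixed byte literal incrementally, one byte per state step."""

    def __init__(self, literal: bytes = b"HTTP"):
        self.literal = literal
        self.state = 0      # how many bytes of the literal have matched so far
        self.consumed = 0   # absolute offset into the input stream

    def feed(self, chunk: bytes) -> bool:
        """Consume bytes from chunk; return True once the literal is matched.

        Raises ValueError at the first mismatching byte, reporting its offset.
        Bytes after a completed match are left untouched.
        """
        for byte in chunk:
            if self.state == len(self.literal):
                break
            expected = self.literal[self.state]
            if byte != expected:
                raise ValueError(
                    f"mismatch at offset {self.consumed}: "
                    f"expected {bytes([expected])!r}, got {bytes([byte])!r}"
                )
            self.state += 1
            self.consumed += 1
        return self.state == len(self.literal)
```

The upside is that input can arrive in arbitrary chunks (`b"HT"` then `b"TP/1.1"`) and a bad byte is rejected as soon as it shows up, with its position.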