| HN Mirror

regular expressions (ala cs) are equiv to finite state machines. regular expressions can't count or match ()'s. ragel allows you to mix in code within the state machine, so it is actually far, far more powerful than a finite state machine.

in http, handling things like 'Content-Length: %d' and then reading a subsequent length is a little harder. as is handling transfer-chunked encoding. http is quite fiendish in places :-)

these are 'data dependent' - the parse stream contains information (i.e length) on the subsequent tokens - although some regexps have back references, these aren't common place in parsing tools/formalisms like LR,LL,LALR,CFG or PEGs.

my point is simply that a lot of the parsing drama of late has revolved around the simple task of parsing a language, rather than parsing network protocols.

there is a larger class of parsing problems that are still to be tackled.