|
|
|
|
|
by zohebv
5663 days ago
|
|
You need to know 1. Context Free Grammars (essentially regular expressions that can be recursive) 2. Two kinds of parsing techniques LL and LR. LR is much harder to implement. LL is straightforward recursive descent. You don't have to actually study LR parsing. 3. Tools like yacc (LALR parser, weaker form of LR but more efficient) can be used to generated to emit parser code in C or your favorite language. However parser combinator libraries are preferred over code generators. These libraries are typically easier to write in functional languages but tend to be LL parsers that don't handle left recursion well. 4. This paper proposes a technique that lets you use left recursion in a parser combinator library. So, the only reason you would now use yacc is parsing speed. |
|
"speed" means a couple different things. Someone might say the only reason you would use C++ rather than Python is speed -- i.e. you can always get within a constant factor with Python. Ignoring other aspects of the languages, let's call that true.
But Russ Cox is saying that you would also use yacc when you want a O() bound on the parsing algorithm (a linear algorithm). That is, the technique they advocate will produce parsers more than a constant factor slower than yacc.
Frankly I don't understand their rebuttal at all. The rebuttal is that it doesn't matter for practical purposes? That doesn't sound like computer science to me.
And practically speaking, it does matter, because plenty of people write exponential time regexps and don't know it, and it will blow up in their face on large input.