|
|
|
|
|
by barrkel
4663 days ago
|
|
PEG parsers use backtracking and do not operate in a single pass. They use memoization to avoid the combinatorial explosion, but that trades one problem for another (though of lesser magnitude). Recursive descent parsers are normally LL(k) and are O(n) in the size of the input - choices are made based on the next k tokens and no backtracking or alternate grammar paths are generally ever tried. Though they are not particularly efficient when rules nest deeply on the left before consuming tokens. It's better to use an operator precedence parser for parsing arithmetic expressions than use LL(1) rules to handle precedence, for example. Having a separate lexer has other benefits besides eliminating explicit whitespace. It enforces consistency in the token set, which can otherwise seem arbitrary - a keyword has a special meaning in one part of the grammar, but not elsewhere. And this in turn helps with keeping backward compatibility when growing a language, because you can know for a fact that using a keyword elsewhere in the grammar doesn't introduce new problems. |
|
Backtracking + memoization is but one implementation technique for PEG parsers. Another such technique is akin to dynamic programming -- fill in a table with possible parsing outcomes as you go. No "backtracking" (which really, is moot in a memoized PEG parser) required.
Either way, the performance is identical to that of a recursive descent parser (linear in the size of the input and number of nonterminals).