|
|
|
|
|
by BruceIV
4663 days ago
|
|
It sounds to me like he wants a PEG parser - PEG is a formalization of a recursive descent parsers that operates in a single pass and can quite easily handle the "parse different things in different contexts" problem (like the inline assembly the author mentions). It's not terribly mature yet, but I've been poking away at building a PEG parser generator myself (mostly as an exercise to learn C++11): https://github.com/bruceiv/egg |
|
Recursive descent parsers are normally LL(k) and are O(n) in the size of the input - choices are made based on the next k tokens and no backtracking or alternate grammar paths are generally ever tried. Though they are not particularly efficient when rules nest deeply on the left before consuming tokens. It's better to use an operator precedence parser for parsing arithmetic expressions than use LL(1) rules to handle precedence, for example.
Having a separate lexer has other benefits besides eliminating explicit whitespace. It enforces consistency in the token set, which can otherwise seem arbitrary - a keyword has a special meaning in one part of the grammar, but not elsewhere. And this in turn helps with keeping backward compatibility when growing a language, because you can know for a fact that using a keyword elsewhere in the grammar doesn't introduce new problems.