|
|
|
|
|
by chrisaycock
2381 days ago
|
|
> However, in practice it turns out that LALR(1) isn’t always the answer. For instance, both GCC, Rust and Go have chosen to use handwritten parsers that are not based on a declarative grammar file. I find this disappointing: We have decades of experience with specifying languages in a declarative format, and apparently it’s still easier to manually write parsers. Obviously there’s something lacking with the algorithms we have today. Many production compilers have hand-written parsers because of error reporting, not because of issues with parser-generator algorithms. Consider the example of missing parenthesis. I can specify a grammar with the following: EXPR =
...
| '(' EXPR ')'
The generated parser will not gracefully report a mismatched number of parentheses! Instead, it will report that the user's input is not formatted as expected, which makes it hard for the user to understand what went wrong.It is possible to modify the grammar to look for this case specifically: EXPR =
...
| '(' EXPR ')'
| '(' EXPR // report this explicitly
But now I've just polluted my once-pristine grammar file. And adding all of the necessary error checks throughout the grammar will really make things complicated.So, many production compilers end-up with custom parsers to report unexpected input in a sane way. It is not that the algorithms are lacking; there is simply no mechanism for a generator to know what a mismatched parenthesis should be reported as. |
|
Building upon that (and obvious bias alert!) led us to come up with a couple of new algorithms. The current paper draft is https://arxiv.org/abs/1804.07133 (a future version of that paper will probably chop out MF which, in retrospect, adds too much complexity over CPCT+ for the relatively small gains). The Rust library (https://crates.io/crates/lrpar) it's implemented in is documented at https://softdevteam.github.io/grmtools/master/book/errorreco... which is probably a friendlier starting point. Summary: one can do a surprisingly decent job of error recovery for very little work.