|
|
|
|
|
by WideCharr
2104 days ago
|
|
I like the way the various approaches are presented here, though the author briefly mentions a factor which I feel like really short circuits the entire discussion: error messages. Most people readily agree that recursive descent parsers can provide the best possible error messages, which if you're writing a compiler in 2020 seems like it trumps all other considerations (especially considering you can also get the best possible performance via recursive descent). The static typing of LR is nice, but trading off user experience for developer experience seems like a bad deal. |
|
IMO there's a kind of funny progression in which parsing approach turns out to be the most appropriate depending on the scope of your project that circles back on itself:
- For pretty simple languages a hand-written recursive descent is obviously easiest
- Once your grammar is complicated enough that you start hitting precedence and ambiguity issues, or get sick of rewriting a big chunk of your parser as the grammar changes, you look into generating your parser from a BNF-like specification and end up with some variant of LL or LR
- At some point your language's grammar has mostly stabilized and you're struggling with providing good error messages or parsing performance or you've had to add one too many hacks to get around limitations of your parser generator and recursive descent starts looking good again
For my money, I tend to think that Pratt parsing/precedence climbing can extend recursive descent in a way that makes a lot of the common complaints about the complexity of dealing with operator precedence and associativity seem overstated. The trick is just that as you're building an AST, some symbols will cause you to reparent nodes that you thought you'd already placed, according to various rules. See: https://www.oilshell.org/blog/2017/03/31.html
I wrote a compiler for a vaguely C-like language by hand in javascript a while back that's intended to show how simple a hand-written parser (and code generator) can end up: https://github.com/j-s-n/WebBS
It's not that hard to statically track type information along the way - the above example requires a second pass at the AST to nail things into place and make sure all our operators are operating on the right type of thing, but a lot of that information is captured during the first parser pass or even during lexing.