| HN Mirror



	by WideCharr 2104 days ago
	I like the way the various approaches are presented here, though the author briefly mentions a factor which I feel like really short circuits the entire discussion: error messages. Most people readily agree that recursive descent parsers can provide the best possible error messages, which if you're writing a compiler in 2020 seems like it trumps all other considerations (especially considering you can also get the best possible performance via recursive descent). The static typing of LR is nice, but trading off user experience for developer experience seems like a bad deal.

6 comments

catpolice 2104 days ago

Yeah, I think there are reasons why a lot of big compilers use some recursive descent variant (a few, like gcc, even used YACC-generated parsers for a long time and then switched back to hand-written recursive descent parsers), and error message generation is a big one.

IMO there's a kind of funny progression in which parsing approach turns out to be the most appropriate depending on the scope of your project that circles back on itself:

- For pretty simple languages a hand-written recursive descent parser is obviously easiest

- Once your grammar is complicated enough that you start hitting precedence and ambiguity issues, or get sick of rewriting a big chunk of your parser as the grammar changes, you look into generating your parser from a BNF-like specification and end up with some variant of LL or LR

- At some point your language's grammar has mostly stabilized and you're struggling with providing good error messages or parsing performance or you've had to add one too many hacks to get around limitations of your parser generator and recursive descent starts looking good again

For my money, I tend to think that Pratt parsing/precedence climbing can extend recursive descent in a way that makes a lot of the common complaints about the complexity of dealing with operator precedence and associativity seem overstated. The trick is just that as you're building an AST, some symbols will cause you to reparent nodes that you thought you'd already placed, according to various rules. See: https://www.oilshell.org/blog/2017/03/31.html
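To make the reparenting idea concrete, here is a minimal precedence-climbing sketch in Python. It is not taken from the linked post or from WebBS; the operator table, the binding-power numbers, and the tuple-based AST are all assumptions chosen for illustration. The line `left = (op, left, right)` is where an already-built node gets reparented under a newly seen operator.

```python
import re

# Illustrative operator table (an assumption, not from any real language):
# operator -> (binding power, associativity)
BINARY = {
    "+": (10, "left"),
    "-": (10, "left"),
    "*": (20, "left"),
    "/": (20, "left"),
    "^": (30, "right"),
}
UNARY_MINUS_BP = 25  # binds tighter than * and /, looser than ^


def tokenize(src):
    # Toy tokenizer: integers, operators, parentheses. Other characters are skipped.
    return re.findall(r"\d+|[-+*/^()]", src)


def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr(min_bp):
        # Prefix position: a number, a parenthesized expression, or unary minus.
        tok = advance()
        if tok == "(":
            left = expr(0)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        elif tok == "-":
            left = ("neg", expr(UNARY_MINUS_BP))
        elif tok.isdigit():
            left = int(tok)
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

        # Infix position: keep absorbing operators that bind at least as
        # tightly as min_bp, reparenting `left` under each new operator.
        while True:
            op = peek()
            if op not in BINARY:
                break
            bp, assoc = BINARY[op]
            if bp < min_bp:
                break
            advance()
            right = expr(bp + 1 if assoc == "left" else bp)
            left = (op, left, right)
        return left

    tree = expr(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

Adding an operator in this sketch means adding one row to the table, which is the sense in which the precedence and associativity complaints can look overstated.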

I wrote a compiler for a vaguely C-like language by hand in JavaScript a while back that's intended to show how simple a hand-written parser (and code generator) can end up: https://github.com/j-s-n/WebBS

It's not that hard to statically track type information along the way - the above example requires a second pass at the AST to nail things into place and make sure all our operators are operating on the right type of thing, but a lot of that information is captured during the first parser pass or even during lexing.


derriz 2104 days ago

Yes - Pratt embedded in a recursive descent parser is hard to beat. I say this after writing parsers on and off for years. Like the author, I also went through a cycle of tools but in my case it was recursive descent, LR (Flex/Bison), parser combinators (Haskell), LL (Antlr) before returning to recursive descent.

In the end the recursive descent (+Pratt) beats them all, in my opinion:

- you can easily provide properly helpful error messages

- best general performance

- the parser can be debugged directly using normal debugging tools

- flexibility - all the tools and features of the host programming language are available

- zero dependencies, no build tool integration required.

The only issues I could see that the author has regarding recursive descent are excessive boilerplate and the complexity of handling precedence and associativity but:

- there should be no more boilerplate than there is for any other programming task - you write functions, methods, classes, etc. just like normal dev to reduce boilerplate.

- using Pratt provides the structure to handle all the operator rule complexity.


WideCharr 2104 days ago

runevault made a similar point below, and it's pretty valid -- if you're writing a new compiler for a new language, the target is frequently changing and you care most about iteration speed. I've mostly done compiler development for existing languages, which provides a nice fixed target and the "only" challenge is in providing a good experience for end users within that fixed target.


skybrian 2104 days ago

It seems like compilers get harder to maintain based mostly on the size of the grammar, along with how much it changes. For a small grammar, just about anything works, but for large grammars, any code that maps one-to-one with grammar nodes gets unwieldy and removing small constant factors in the amount of code per node starts looking worthwhile.


runevault 2104 days ago

I think it depends slightly. For the final version of your compiler you are entirely correct. But there is also value in starting with something easier to write, so you can rapidly iterate and experiment with your language until you are ready to let other people see it, and then swapping your parser to RD.


kazinator 2104 days ago

Recursive descent is debuggable. A grammar rule is a function, which means you can put a breakpoint on a grammar rule, and the backtrace tells you how you got there when you hit it.

Recursive descent carries no tooling dependencies.

Recursive descent is re-entrant, unless you don't know what you're doing.
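A small sketch of what "a grammar rule is a function" can look like, in Python. The grammar (nested lists of integers) and every name here are hypothetical, made up for illustration. Each rule is one method, so a breakpoint on `parse_list` shows in the backtrace which rules led there, and because all state lives on the instance, two parsers can run interleaved without interfering.

```python
class Parser:
    """Hypothetical grammar:
         value := NUMBER | list
         list  := '[' [ value { ',' value } ] ']'
    """

    def __init__(self, text):
        # Spaces are dropped up front to keep the sketch short.
        self.text = text.replace(" ", "")
        self.pos = 0  # no globals: all parser state is per instance

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    # One method per grammar rule.
    def parse_value(self):
        if self.peek() == "[":
            return self.parse_list()
        return self.parse_number()

    def parse_number(self):
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a number at offset {self.pos}")
        return int(self.text[start:self.pos])

    def parse_list(self):
        self.expect("[")
        items = []
        if self.peek() != "]":
            items.append(self.parse_value())
            while self.peek() == ",":
                self.pos += 1
                items.append(self.parse_value())
        self.expect("]")
        return items
```

Usage would be something like `Parser("[1, [2, 3], []]").parse_value()`.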


andrepd 2104 days ago

Menhir for OCaml is an LR parser generator with outstanding error messages. Static detection of possible error states is a big part of that.


benibela 2102 days ago

>Most people readily agree that recursive descent parsers can provide the best possible error messages,

Well, it could, but not all of them do.

My hand-written parsers are pretty bad at it. At the lowest level I run through a string char by char and test whether it parses correctly, i.e. like: p++; if (*p != '(') throw "expected (";

At first this does not even give a line number with the error.
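One cheap way to get from a bare "expected (" to a position-tagged message is to keep only a character offset while parsing and compute line and column lazily when an error is raised. The following Python sketch assumes exactly that; `line_col`, `expect`, and `ParseError` are hypothetical helper names, not part of any library.

```python
class ParseError(Exception):
    pass


def line_col(text, offset):
    # 1-based line: count the newlines before the offset.
    line = text.count("\n", 0, offset) + 1
    # 1-based column: distance from the previous newline.
    # rfind returns -1 when there is none, which gives offset + 1.
    col = offset - text.rfind("\n", 0, offset)
    return line, col


def expect(text, pos, ch):
    # Check the character at pos; on success return the next position.
    if pos >= len(text) or text[pos] != ch:
        line, col = line_col(text, pos)
        found = repr(text[pos]) if pos < len(text) else "end of input"
        raise ParseError(f"{line}:{col}: expected {ch!r}, found {found}")
    return pos + 1
```

The position is only computed on the error path, so the happy path stays as cheap as the char-by-char loop.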


mathgladiator 2104 days ago

I recently converted my initial ANTLR grammar to a hand-coded recursive descent version. Beyond better error handling, the benefits include preserving comments for auto-formatting and indexing types on tokens for auto-complete.
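As a rough idea of how comment preservation might work in a hand-written lexer (this is a guess at the general technique, not the poster's actual code), the Python sketch below attaches each run of `//` comments to the token that follows it, so a formatter can re-emit them. The `Token` fields and the toy token regex are assumptions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Token:
    kind: str
    text: str
    offset: int
    leading_comments: list = field(default_factory=list)


# Toy token grammar (an assumption): line comments, whitespace,
# identifiers, integers, and any other single character as punctuation.
TOKEN_RE = re.compile(
    r"(?P<comment>//[^\n]*)"
    r"|(?P<ws>\s+)"
    r"|(?P<ident>[A-Za-z_]\w*)"
    r"|(?P<number>\d+)"
    r"|(?P<punct>[^\sA-Za-z_\d])"
)


def lex(src):
    tokens, pending = [], []
    for m in TOKEN_RE.finditer(src):
        kind = m.lastgroup
        if kind == "ws":
            continue
        if kind == "comment":
            # Hold the comment until the next real token shows up.
            pending.append(m.group())
            continue
        tokens.append(Token(kind, m.group(), m.start(), pending))
        pending = []
    # Comments after the last token are returned separately.
    return tokens, pending
```

A generated parser usually discards comments in the lexer, so keeping them on tokens like this is one of the things that gets easier once the lexer is your own code.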
