Hacker News new | ask | show | jobs
by pklausler 78 days ago
Is there a production compiler out there that doesn't use recursive descent, preferably constructed from combinators? Table-driven parsers seem now to be a "tell" of an old compiler or a hobby project.
3 comments

Some people appreciate that an LR/LALR parser generator can prove non-ambiguity and linear time parse-ability of a grammar. A couple of examples are the creator of the Oil shell, and one of the guys responsible for Rust.

It does make me wonder though about why grammars have to be so complicated that such high-powered tools are needed. Isn't the gist of LR/LALR that the states of an automaton that can parse CFGs can be serialised to strings, and the set of those strings forms a regular language? Once you have that, many desirable "infinitary" properties of a parsing automaton can be automatically checked in finite time. LR and LALR fall out of this, in some way.

Production compilers must have robust error recovery and great error messages, and those are pretty straightforward in recursive descent, even if ad hoc.
Oh, I was talking much more about how you can first learn how to write a compiler. I wasn't talking about how you write a production industry-strength compiler.

Btw, I mentioned parser combinators: those are basically just a front-end. Similar to regular expressions. The implementation can be all kinds of things, eg could be recursive descent or a table or backtracking or whatever. (Even finite automata, if your combinators are suitably restricted.)

I used a small custom parser combinator library to parse Fortran from raw characters (since tokenization is so context-dependent), and it's worked well.
The thing about LR parsers is that since it is parsing bottom-up, you have no idea what larger syntactic structure is being built, so error recovery is ugly, and giving the user a sensible error message is a fool’s errand.

In the end, all the hard work in a compiler is in the back-end optimization phases. Put your mental energy there.

I have worked on compilers (mostly) for high-performance computing for over 40 years, writing every part of a production compiler twice or more. Optimization and code generation and register allocation/scheduling are definitely the most fun -- but the hardest work is in parsing and semantics, where "hardest" means it takes the most work to get things right for the language and to deal with user errors in the most graceful and informative manner. This is especially true for badly specified legacy languages like Fortran.