Hacker News new | ask | show | jobs
by CalChris 3561 days ago
tl;dr use ANTLR

But actually, I did read this excellent article for the second time. However, unless you are skilled in the art, you should be using ANTLR4, Honey Badger. Terence Parr deserves a Turing award.

For parsing, most upper div compiler classes start off with CFGs, dip briefly into LL recursive descent and then conclude with LALR. If people remember anything, it's SHIFT/REDUCE. Unfortunately, there doesn't seem to be any standard tools for the LL(1) equivalent, FIRST/FOLLOW tables. And as the article shows, they aren't equivalent. Each has strengths but LALR has tools.

Except for ANTLR. Which started out as recursive descent ...

Parsing Techniques doesn't really have any competition. The compiler books, like the compiler classes, can only give a little attention to parsing. Given how strong the available tools are and how hard the remaining subjects are (SSA, code generation, ...) maybe they have a point.

2 comments

Honestly, having written a few a parsers, i prefer PEG with recursive descent, or parser combinator.

I find recursive descent parsers are easy to understand, write, debug(you can drop in a breakpoint a follow it through), and the resultant code is actually readable and clean. It's also fairly easy to add error reporting and recovery. As long as you don't do complicated stuff, they're really fast to.

You don't particularly need to be an expert to write recursive descent parsers either.

There are plenty of other parser generators than ANTLR and interesting parsing techniques that fit the bill. Not that there's anything wrong with ANTLR.

The last few time when I've been working on programming language prototypes, I've written the parser using Parsec parser combinators in Haskell. It's super fast and easy to use, however it's more like syntactic sugar for recursive descent parsers than a rigorous parser generator that works for a certain grammar class.