Hacker News new | ask | show | jobs
by wostusername 2400 days ago
You don't necessarily need something more powerful than Yacc. I think the point was that you can encode errors directly into your grammar. So for some language with semicolon statement separators you can do something like this:

    PROGRAM -> STATEMENT*
    STATEMENT -> IF-STATEMENT | ASSIGN-STATEMENT | STATEMENT-ERROR
    IF-STATEMENT -> ...
    ASSIGN-STATEMENT -> ...
    STATEMENT-ERROR -> TOKEN* SEMICOLON
The idea being that if the if-statement and assignment statement rules fail you consume tokens until the next statement separator (semicolon in our case), produce an Error node in the AST and resume parsing the next statement. This allows your parser to actually catch multiple errors in one pass since it won't die the moment it encounters something it can't parse. After parsing you scan the AST and if it has any Error nodes you spit out the relevant error messages and exit. You can get as granular with the error rules as you want, although the more you add the more unwieldy the grammar becomes.

Naturally doing this in practice is harder than it sounds. Tweaking the Error rules is hard. If you consume too much or too little input you will resume in a spot that can have no chance of ever succeeding, giving way to a cascade of parse errors. I think we've probably all seen a compiler spit out a couple of dozen error messages all stemming from a single root error like a missing closing parentheses or brace.