by bmh 1574 days ago
I love PEGs, but their error messages are usually vague: the parser backtracks out of a deep subtree (where it should have reported the actual error) and then presents an error much higher up ("computer says no").

Is there a mechanism that works well for improving errors in PEGs (i.e. something like a non-returnable node), and how does one practically implement that?

2 comments

In an implementation I made ages ago, I created a different operator, the “naughty or”, which defines “invalid” syntax paths (for example, if the entire parse fails, allow going into a branch that matches identifiers starting with a digit). This adds a language-level facility for a strategy I see most hand-rolled parsers end up adopting, which is to parse common mistakes explicitly in order to provide better errors. It’s basically “free” because those paths are only explored once the parse is known to have failed, so the performance of successful parses isn’t affected, and it allows the same level of “craft” in error messages as a hand-rolled parser. It also lets people easily submit patches to improve confusing errors they’ve run into.
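
For anyone curious what a “naughty or” could look like, here is a rough Python sketch of the idea as described above. It is not the code from the linked repository: the combinator names (`regex`, `seq`, `naughty_or`), the `recovering` flag, and the `Diagnostic` exception are all made up for illustration. The assumption is a two-pass scheme: a normal pass that ignores the invalid branches, and a second pass, run only after the first fails, that is allowed to match them and turn the match into a specific message.

```python
import re

class Diagnostic(Exception):
    """Raised when an 'invalid syntax' branch matches during the recovery pass."""

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos, recovering):
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    def parse(text, pos, recovering):
        for p in parsers:
            pos = p(text, pos, recovering)
            if pos is None:
                return None
        return pos
    return parse

def naughty_or(valid, invalid, message):
    """Hypothetical operator: `invalid` is only tried when `recovering` is True."""
    def parse(text, pos, recovering):
        end = valid(text, pos, recovering)
        if end is not None or not recovering:
            return end  # normal pass: the invalid branch costs nothing
        end = invalid(text, pos, recovering)
        if end is not None:
            raise Diagnostic(f"{message} at offset {pos}: {text[pos:end]!r}")
        return None
    return parse

identifier = naughty_or(
    regex(r"[A-Za-z_]\w*"),
    regex(r"\d+[A-Za-z_]\w*"),  # the "known mistake" path
    "identifiers cannot start with a digit",
)
assignment = seq(regex(r"let\s+"), identifier, regex(r"\s*=\s*\d+"))

def parse_assignment(text):
    """Return None on success, otherwise an error message string."""
    if assignment(text, 0, False) == len(text):
        return None
    try:
        assignment(text, 0, True)  # second pass, only after a failed parse
    except Diagnostic as d:
        return str(d)
    return "syntax error"
```

With this sketch, `parse_assignment("let 9x = 1")` yields the hand-written message about digits, while input that matches no known mistake still falls back to the generic "syntax error".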

https://github.com/tolmasky/language

> Is there a mechanism that works well for improving errors in PEGs (i.e. something like a non-returnable node)

Yes, it's called the "cut operator".

> how does one practically implement that?

Pick a parser generator that supports it natively. ;-) If you're talking about implementing the parser generator yourself then you probably already know more about it than I do.
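
If you do want to see the mechanics, a cut can be sketched in a few lines of Python. This is a toy under my own assumptions, not how any particular parser generator does it: `CUT`, `lit`, `word`, `seq`, `choice` and `ParseError` are hypothetical names. The idea is that once a sequence has passed the cut marker, a later failure raises instead of returning "no match", so the enclosing ordered choice never gets the chance to backtrack past the real error.

```python
import re

class ParseError(Exception):
    """Raised when a sequence fails after it has committed via CUT."""

CUT = object()  # marker placed inside seq(...) to commit to this alternative

def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    parse.expected = repr(s)
    return parse

def word(text, pos):
    m = re.compile(r"[a-z]+").match(text, pos)
    return m.end() if m else None
word.expected = "a word"

def seq(*items):
    def parse(text, pos):
        committed = False
        for item in items:
            if item is CUT:
                committed = True
                continue
            nxt = item(text, pos)
            if nxt is None:
                if committed:
                    # Past the cut: report here instead of backtracking.
                    raise ParseError(f"expected {item.expected} at offset {pos}")
                return None
            pos = nxt
        return pos
    return parse

def choice(*alternatives):
    def parse(text, pos):
        for alt in alternatives:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse

# statement <- "if " CUT word " then " word / word
if_statement = seq(lit("if "), CUT, word, lit(" then "), word)
statement = choice(if_statement, word)
```

On `"if x thn y"` this raises `ParseError` pointing at the misspelled `then`. Without the cut, the choice would quietly fall through to the `word` alternative, match just `if`, and leave the caller to complain about trailing input much earlier in the string.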

Thanks! I see the <cut> operator is actually explained in an older HN post: https://news.ycombinator.com/item?id=20502032
It's fascinating how many times the Prolog cut operator has been borrowed into parsing. It shouldn't be surprising, probably, given how easily lookahead leads people down the path of some backtracking mechanism or other for ease of implementation, and how ingrained the idea of a cut to stop backtracking is if you've ever been exposed to Prolog...