by cwzwarich 2157 days ago
An even bigger flaw with most academic research into parser error recovery is that the vast majority of syntax errors arise when a valid program is modified into an invalid one, yet the recovery algorithms are oblivious to this.
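To make the point concrete, here is a minimal sketch of how an editor could use the last valid version of a file to narrow down where a syntax error was introduced. It is an illustration only, not an algorithm from any of the work discussed in this thread: the helper name `changed_lines` is hypothetical, and it simply diffs lines with Python's standard `difflib`.

```python
import difflib


def changed_lines(last_valid, current):
    """Return 1-based line numbers in `current` that differ from `last_valid`.

    Hypothetical helper: a recovery algorithm could prefer repairs that
    touch these lines, since the rest of the file parsed fine before.
    """
    old = last_valid.splitlines()
    new = current.splitlines()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    touched = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            # Lines j1..j2-1 (0-based) of `current` are new or rewritten.
            touched.extend(range(j1 + 1, j2 + 1))
        elif tag == "delete":
            # Deleted text leaves no line behind; flag the line now
            # sitting at the deletion point (clamped to the file length).
            touched.append(min(j1 + 1, max(len(new), 1)))
    return touched


last_valid = "x = 1\ny = (x + 2)\nprint(y)\n"
current = "x = 1\ny = (x + 2\nprint(y)\n"  # closing parenthesis removed

suspect_lines = changed_lines(last_valid, current)
print(suspect_lines)
```

Depending on the parser, the reported error location for `current` may be a later line than the one that was actually edited; the diff points at the edit itself.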
3 comments

Have a look at this writeup by the author of lezer, if you haven't already: https://marijnhaverbeke.nl/blog/lezer.html
tree-sitter is excellent stuff! It's heavily inspired by Tim Wagner's PhD thesis (original site seems to be down, but https://web.archive.org/web/20150919164029/https://www.cs.be... works). IMHO more people should know about that work, and the sequence of work from Susan Graham's lab that led up to it. We have also been heavily inspired by Tim's work, and Lukas's thesis extends and updates a number of aspects of that seminal work, including error recovery in Chapter 3 (https://diekmann.co.uk/diekmann_phd.pdf).

All that said, it's surprisingly difficult to compare error recovery in an online parser (i.e. one that's parsing as you type) to error recovery in a batch parser. In the worst case (e.g. loading a file that already contains a syntax error), online parsers have exactly the same problems as a batch parser; however, once they've built up sufficient context they have different, sometimes more powerful, options available to them (but they also need to be cautious about rewriting the tree too much, as that baffles users).

Approaching this from the opposite side, language designers should also take into account how code with slight mistakes (typos and confused use of features) could be detected. Sometimes adding small things to the grammar can pay a lot of dividends when writing the production-ready compiler. Alternatively, when adding a new feature they should be asking "which common mistakes will lead to worse error messages if we introduce this feature with this syntax?"
I think in some cases they do. Python 3's parser can detect when people use the old print syntax, for example.
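A quick way to see the Python 3 behaviour mentioned above is to compile a Python 2 style print statement and look at the message on the resulting `SyntaxError`. This is a small sketch; the helper name `old_print_diagnostic` is made up for illustration, and the exact wording of the message varies between CPython versions.

```python
def old_print_diagnostic(source):
    """Compile `source` and return the SyntaxError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err.msg
    return None


# Python 2 style print statement: rejected with a targeted message
# rather than a generic "invalid syntax".
message = old_print_diagnostic('print "hello"')
print(message)

# The Python 3 call form compiles without error.
print(old_print_diagnostic('print("hello")'))
```

On recent CPython releases the first message mentions missing parentheses in the call to `print`, which is far more helpful than a generic syntax error.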
This suggests that parsing should be done while editing, which would support many refactoring tools as well. The key feature enabling this is incremental parsing.