I have a somewhat controversial opinion on this. This paper, like some others I've seen, treats parsing as an academic/algorithmic topic: given some token stream, we'd like to minimize some error criterion. However, these papers ignore the fact that people don't write token streams; they write code that is formatted using whitespace. I don't know any programmer who writes their source code on a single line. On the contrary, pretty much every programmer formats their source code to some reasonable (if not complete) degree.

In other words, parser error recovery is not primarily an algorithmic problem: it's a usability problem. Newlines and indentation are a source of extra information that we can use to infer what the programmer meant. Why throw all that away during tokenization? That's crazy! We can totally use it for error recovery and error message generation.

I decided to play around with this idea in the design of my programming language Muon [1]. The language uses redundant significant whitespace for parser error recovery. This simple approach by itself has turned out to work surprisingly well! In my admittedly biased opinion, it frequently surpasses the error recovery quality of mature compilers such as the C# compiler, which has had tons of effort poured into error recovery.

Of course, a purely whitespace-based recovery scheme is not perfect: there are rough edges, like having to deal with tabs vs. spaces, and recovering inside a line is not usually possible. But the fact that such a simple approach has led to good results makes me think this would be a great area for future research, perhaps one that combines the best of both worlds.

[1] https://github.com/nickmqb/muon
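To make the idea of using indentation as redundant information more concrete, here is a minimal Python sketch. It is not Muon's actual algorithm. It assumes a brace-delimited language where `{` only appears at the end of a line and `}` only at the start, and where indentation uses spaces only (tabs are ignored, per the tabs vs. spaces caveat above). The names `check_braces`, `Diagnostic` and `indent_of` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Diagnostic:
    line: int      # 1-based line number the diagnostic points at
    message: str


def indent_of(line: str) -> int:
    """Count leading spaces. Assumption: no tabs in the input."""
    return len(line) - len(line.lstrip(" "))


def check_braces(source: str) -> list[Diagnostic]:
    """Use indentation to locate missing braces and recover from them.

    Heuristic sketch: only a '{' at the end of a line and a '}' at the
    start of a line are treated as block delimiters.
    """
    diags: list[Diagnostic] = []
    stack: list[tuple[int, int]] = []  # (line of '{', indent of that line)
    for lineno, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines carry no indentation information
        ind = indent_of(raw)
        closes = text.startswith("}")
        # A line indented no deeper than an open block's opener means that
        # block should already have ended. Unless this line is the matching
        # '}', assume the brace is missing: report it and pop the block so
        # the rest of the file is still parsed with the right nesting.
        while stack and ind <= stack[-1][1]:
            open_line, open_ind = stack[-1]
            if closes and ind == open_ind:
                break
            diags.append(Diagnostic(
                open_line,
                f"block opened here is missing '}}' "
                f"(inferred from indentation at line {lineno})"))
            stack.pop()
        if closes:
            if stack and ind == stack[-1][1]:
                stack.pop()
            else:
                diags.append(Diagnostic(lineno, "unexpected '}'"))
        if text.endswith("{"):  # also handles "} else {"
            stack.append((lineno, ind))
    for open_line, _ in stack:
        diags.append(Diagnostic(open_line, "block opened here is never closed"))
    return diags


if __name__ == "__main__":
    src = "fn main() {\n    if x {\n        foo()\n    bar()\n}\n"
    for d in check_braces(src):
        print(f"line {d.line}: {d.message}")
    # prints: line 2: block opened here is missing '}' (inferred from indentation at line 4)
```

A token-only parser would see the final `}` as closing `if x {` and only complain at end of file that `fn main() {` is unclosed. The indentation points at the real culprit instead. Popping the block at the line where indentation says it ended is the recovery step: later lines keep the nesting the programmer evidently intended. As noted above, this says nothing about errors within a single line.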